Wed Feb 27 11:06:46 GMT 2008
Best timing for an earthquake - ever
So spending some time living in the UK was supposed to give us exposure to lots of experiences that aren't really available in Australia - but it would be fair to say that I wasn't expecting an earthquake to be one of them.
It wasn't a big earthquake to be sure - quoted in the news as having a magnitude of 5.2 and felt for hundreds of miles around, the biggest immediate impact seems to be some toppled chimneys near the epicentre.
For more details:
But from my perspective the event was a particularly well staged bit of immersive television. Let me tell you a tale...
Nearing 01:00 in the morning, I've just reached the end of an old episode of the X-Files (S03E11). Scully has had a crisis of faith and has decided to unload to a local priest.
The last lines of the show are (copied from the referenced episode guide):
Priest: "Sometimes we must come full circle to find the truth. Why does that surprise you?"
Scully: "Mostly it just makes me afraid."
Priest: "Afraid?"
Scully: "Afraid that god is speaking. But that no one's listening."
And perfectly on cue, just as Scully says "listening", there's a deep rumbling sound and the house creaks and shakes.
Now that's entertainment!
Mon Feb 25 01:21:58 GMT 2008
Bundling the Amazon S3 code with distutils
Brief note - I've bundled up the Amazon S3 code I've been working on with python distutils.
This is available here: AmazonS3Store-1.0.tar.gz
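For reference, a minimal setup.py along these lines is enough to build a distutils tarball like that - this is just a sketch assuming the modules and scripts from the tarball described below, not the actual file from the package:

#!/usr/bin/env python
# Sketch of a distutils setup.py for the S3 tools - module and script
# names here are assumed from the amazon-s3-20080222 tarball contents.
from distutils.core import setup

setup(
    name = 'AmazonS3Store',
    version = '1.0',
    description = 'Tools for piping backups in and out of Amazon S3',
    py_modules = ['S3', 's3storelib'],
    scripts = ['s3store.py', 'backup.py', 'clear_environment.py'],
)

Running "python setup.py sdist" then produces the versioned source tarball.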
Fri Feb 22 23:50:40 GMT 2008
Tweaking and bundling the Amazon S3 tools
With a little bit of tweaking I've now used s3store.py to push a tarball of an entire system up to Amazon S3, which means I've now got this code to the point I needed it.
I've bundled up the code into a tarball: amazon-s3-20080222.tbz
The contents of the tarball are:
- S3.py - The Amazon S3 Library for REST in Python
- backup.py - Walk a directory tree storing regular files to Amazon S3
- clear_environment.py - Delete everything stored in Amazon S3
- s3config.cfg - Configuration file template
- s3store.py - Read, write, list and delete piped data in/out of Amazon S3
- s3storelib.py - Module supporting s3store.py
- system_backup.sh - Store a tarball built from / to Amazon S3
The main change made to the base library (s3storelib.py) was to include an error-and-retry on writing data to S3:
Index: s3storelib.py
===================================================================
--- s3storelib.py   (revision 18)
+++ s3storelib.py   (revision 20)
@@ -8,6 +8,9 @@
 # Data chunk size
 chunk_size = 10 * 1024 * 1024
 
+# Maximum number of times to retry S3 calls
+max_tries = 5
+
 def usage():
     print "\n".join([
         'Usage:'
@@ -134,8 +137,19 @@
         s3_chunk = '%s-%010d' % (tag, counter)
         print "Uploading chunk: ", s3_chunk,
         sys.stdout.flush()
-        resp = conn.put(bucket, s3_chunk, chunk)
-        assert resp.http_response.status == 200, resp.message
+        tries = 0
+        while True:
+            try:
+                resp = conn.put(bucket, s3_chunk, chunk)
+                assert resp.http_response.status == 200, resp.message
+            except Exception, e:
+                tries += 1
+                if tries > max_tries:
+                    raise Exception( "Too many failures: " + str(e) )
+                print "[RETRY]",
+            else:
+                # It worked, break the loop
+                break
         print "[DONE]"
         sys.stdout.flush()
         counter += 1
Sometimes connections seem to fail, but when I've gone looking for the failures they generally haven't recurred, so a retry loop seemed a reasonable approach.
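If the transient failures ever became more frequent it might also be worth backing off between attempts. Something along these lines could replace the inline loop - just a sketch, not part of the tarball, and put_with_retry is a hypothetical name:

import sys
import time

def put_with_retry(conn, bucket, key, data, max_tries=5):
    """ Try conn.put() up to max_tries times, sleeping a little
        longer after each failure before giving up.
    """
    tries = 0
    while True:
        try:
            resp = conn.put(bucket, key, data)
            assert resp.http_response.status == 200, resp.message
            return resp
        except Exception, e:
            tries += 1
            if tries > max_tries:
                raise Exception("Too many failures: " + str(e))
            # Simple exponential backoff before the next attempt
            print "[RETRY in %ds]" % (2 ** tries),
            sys.stdout.flush()
            time.sleep(2 ** tries)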
Fri Feb 22 01:14:13 GMT 2008
Revisiting Amazon S3 - Piping data into S3
So, the other day I was playing around with storing and deleting content on Amazon Simple Storage Service (S3).
At the time I threw together a quick backup script which walked a local directory tree and attempted to push files up to S3. I also noted that there were lots of limitations to that approach - one of the main ones being that this really just stored regular files so everything else was left behind (symlinks, empty directories etc.)
It occurred to me that we already have perfectly good tools for packaging files together (tar for instance). The problem was that to use these tools I needed disk space to store the output.
This problem has already been solved - if you need to tar up a set of files onto another computer you can simply tar to stdout and pipe that through an ssh connection:
$ tar cjBf - /source/dir | ssh host "cat > file.tbz"
So what I really needed was to be able to pipe data into S3, something akin to splitting a file, which in turn can be expressed very simply in python with something like (split.py):
#!/usr/bin/env python

import sys

chunk_size = int(sys.argv[1])
split_prefix = sys.argv[2]
counter = 0

chunk = sys.stdin.read(chunk_size)
while len(chunk) > 0:
    fh = open("%s-%05d" % (split_prefix, counter), "wb")
    fh.write(chunk)
    fh.close()
    chunk = sys.stdin.read(chunk_size)
    counter += 1
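Used in a pipe that might look something like this (the chunk size and prefix here are just illustrative):

$ tar cjf - /source/dir | ./split.py 10485760 /tmp/backup-chunk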
And that works - because all I need to do is replace those file writes with S3 RESTful PUTs and I'm there:
def write_data(cmdopts, cfg, conn):
    """ Read data from STDIN and store it to Amazon S3

        Exceptions will be raised for non-recoverable errors
    """
    bucket = cfg.get('Bucket', 'id')
    tag = cmdopts['tag']
    counter = 0

    chunk = sys.stdin.read(chunk_size)
    while len(chunk) > 0:
        s3_chunk = '%s-%010d' % (tag, counter)
        print "Uploading chunk: ", s3_chunk,
        sys.stdout.flush()
        resp = conn.put(bucket, s3_chunk, chunk)
        assert resp.http_response.status == 200, resp.message
        print "[DONE]"
        sys.stdout.flush()
        counter += 1
        chunk = sys.stdin.read(chunk_size)
And if I can do that, then I should be able to reverse the process and read my S3 content back through a pipe using something like:
def read_data(cmdopts, cfg, conn):
    """ Read data back out of Amazon S3 and write it to STDOUT
    """
    bucket = cfg.get('Bucket', 'id')
    tag = cmdopts['tag']

    assert bucket in [x.name for x in conn.list_all_my_buckets().entries]

    # S3 lists keys lexicographically, so the zero-padded counter in the
    # key names keeps the chunks in the right order.
    for name in [x.key for x in conn.list_bucket(bucket).entries]:
        if ( name[:len(tag)+1] == '%s-' % tag ):
            data = conn.get(bucket, name)
            sys.stdout.write(data.object.data)
And because that's all feeling fairly useful I've wrapped it up in a little more code which makes things easy:
- s3store.py - The main script - reads and writes from S3
- s3storelib.py - Library for s3store.py
And finally - an example of using the script:
In...
$ tar czf - /path/to/something | ./s3store.py -w -t bob
tar: Removing leading `/' from member names
Deleting old data: bob-0000000000 [DONE]
Deleting old data: bob-0000000001 [DONE]
Deleting old data: bob-0000000002 [DONE]
Uploading chunk: bob-0000000000 [DONE]
Uploading chunk: bob-0000000001 [DONE]
Uploading chunk: bob-0000000002 [DONE]
.
.
.
And out...
$ ./s3store.py -r -t bob | tar tzf -
path/to/something/
path/to/something/a/
path/to/something/a/file.txt
path/to/something/b_file.txt
.
.
.
The script uses a python ConfigParser configuration file which looks like this:
[Credentials]
aws_access_key_id: XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key: YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY

[Bucket]
id: data-backup
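Loading that file is just standard ConfigParser usage - presumably something along these lines, with the filename assumed to be the s3config.cfg template from the tarball:

import ConfigParser

# Read the config file and pull out the credentials and the bucket
# name used by the rest of the script.
cfg = ConfigParser.ConfigParser()
cfg.read('s3config.cfg')

aws_access_key_id = cfg.get('Credentials', 'aws_access_key_id')
aws_secret_access_key = cfg.get('Credentials', 'aws_secret_access_key')
bucket = cfg.get('Bucket', 'id')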
Wed Feb 20 18:24:41 GMT 2008
Demanding spam
Wow, spam just got demanding.
Usually I don't see my spam thanks to SpamAssassin but every now and then some creeps through and I have a squiz.
So today I've received one that says:
We demand that you take 5 minutes out of your online experience and renew your records to avoid running into any future problems with the online service.
Must be a sort of reverse-social-engineering tactic... :)
Tue Feb 19 00:21:51 GMT 2008
Connecting to Amazon S3 with Python
Every now and then I've looked at and discussed the various Amazon Web Services but have never actually got around to using any of them personally.
I still don't really need a dynamically and automatically scalable cluster cloud of virtual computers at my beck and call - however groovy that might seem.
What I do need at the moment is some extra storage space - both as a place to backup some data reliably and as a place to serve larger chunks of data which have a tendency to fill up the brilliant but not exactly storage-heavy VPS hosting solutions available these days.
Amazon Simple Storage Service (S3) to the rescue. S3 provides cheap storage via both REST and SOAP interfaces.
Better yet, there are libraries already available in a number of languages - information and documentation is available at the Amazon S3 Community Code site.
In this case I'm using the Amazon S3 Library for REST in Python.
So what can I do with this?
Here's a rudimentary backup script (backup.py):
#!/usr/bin/env python

import os
import os.path
import sys
import time

import S3

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'
conn = S3.AWSAuthConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

time_stamp = time.strftime("%Y%m%d-%H%M%S")
backup_bucket = "backup"

print "Storing in %s [%s]" % (backup_bucket, time_stamp),
resp = conn.create_bucket(backup_bucket)
print resp.message

# Skip sys.argv[0] (the script name) - only walk the directories
# given on the command line.
for base_dir in sys.argv[1:]:
    print base_dir
    for root, dirs, files in os.walk(base_dir):
        print root
        for file in files:
            file_path = os.path.join(root, file)
            fh = open(file_path, 'rb')
            data = fh.read()
            fh.close()

            backup_path = os.path.join(time_stamp, file_path.lstrip('/'))
            print " .. %s" % backup_path,
            resp = conn.put(backup_bucket, backup_path, data)
            print " [%s]" % resp.message
This will walk through a given set of directories and try and upload all regular files it finds. Note that no handling exists for failed uploads (I did say rudimentary) or for non-regular files like symlinks.
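So a run over a couple of directories would look something like this (the directory names are just examples):

$ ./backup.py /etc /home/bjdean/documents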
I suppose the easiest or most reliable way to make this work across all file types would be to just back up tarballs - on the other hand that means I need the space to store the tarball, which somewhat defeats the purpose of cheaper storage.
Having gone and pushed a whole lot of stuff into my S3 space I may as well delete it (if for nothing else then as an exercise in walking through the S3 contents).
So, here's a deletion script (clear_environment.py):
#!/usr/bin/env python

import S3

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'
conn = S3.AWSAuthConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

for bucket in conn.list_all_my_buckets().entries:
    print bucket.name.encode('ascii', 'replace')
    for item in conn.list_bucket(bucket.name).entries:
        print " .. %s" % item.key.encode('ascii', 'replace'),
        conn.delete(bucket.name, item.key)
        print " [DELETED]"
    conn.delete_bucket(bucket.name)
    print "Deleted bucket"
Probably the main thing to note here is that Amazon S3 does not store objects in a hierarchy. There are a number of base level buckets (in this case named 'backup') which are then just filled up with uniquely keyed items.
A convention among various S3 file-storage/backup solutions has been to name this key using a unix-style path structure. If one of these solutions were to access the files stored by the backup script above they would allow navigation by 'directory' even though no actual directories existed.
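As a purely illustrative sketch (the keys here are made up, in the style the backup script above would generate), grouping the flat keys on '/' is all it takes to present them as directories:

# Hypothetical flat S3 keys, named with the timestamp/path convention
# used by the backup script above.
keys = [
    '20080219-002151/etc/hosts',
    '20080219-002151/etc/passwd',
    '20080219-002151/home/bjdean/notes.txt',
]

# Group on the first '/' to fake a top-level 'directory' listing even
# though S3 itself only sees flat, uniquely keyed objects.
pseudo_dirs = {}
for key in keys:
    top, rest = key.split('/', 1)
    pseudo_dirs.setdefault(top, []).append(rest)

for top in sorted(pseudo_dirs):
    print "%s/ (%d entries)" % (top, len(pseudo_dirs[top]))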
Thu Feb 14 00:12:21 GMT 2008
Travelling through France
Here are the photos of our trip through France in November-December 2007:
Trip to France - November to December 2007
Wed Feb 13 01:38:43 GMT 2008
Testing out nanoblogger
So here we go, I've imported the few blog/article-like entries I had started to store on my wiki into a new nanoblogger blog.
This is a static-html style blogging system built essentially in bash and using common unix tools to get the job done.
The nicest thing is that it's a command-line blogging system, so it's pretty easy to write articles using vi, not to mention being able to import articles written offline.
Tuesday 12th February, 16:11:51 GMT
Sending ETRN to backup SMTP exchanges
Having played around with checking SMTP services for backup MX exchanges ("Testing SMTP exchanges") I then thought it would be useful to be able to easily trigger ETRN requests. Backup MX servers tend to poll the mail server periodically to do this automatically but being impatient...
Again using smtplib, this is even quicker and easier than the testing script:
#!/usr/bin/env python

import smtplib

backup_servers = { 'mx3.zoneedit.com' : [ 'bjdean.id.au'
                                        , 'orientgrove.net'
                                        ]
                 }

if __name__ == '__main__':
    for backup_mx in backup_servers.keys():
        print ">>> Connecting to", backup_mx
        server = smtplib.SMTP(backup_mx)
        #server.set_debuglevel(1)
        for domain in backup_servers[backup_mx]:
            print ">>> >>> ETRN domain", domain
            server.docmd('ETRN', domain)
And here's what I see (with debugging turned back on):
>>> Connecting to mx3.zoneedit.com
>>> >>> ETRN domain bjdean.id.au
send: 'ETRN bjdean.id.au\r\n'
reply: '250 Queuing started\r\n'
reply: retcode (250); Msg: Queuing started
>>> >>> ETRN domain orientgrove.net
send: 'ETRN orientgrove.net\r\n'
reply: '250 Queuing started\r\n'
reply: retcode (250); Msg: Queuing started
Tuesday 12th February, 11:21:14 GMT
Testing SMTP exchanges
With a few domains in tow, and a few different live and backup MX exchanges attached to those, I needed a quick way to work out what was working and what wasn't.
dnspython and smtplib make for a very quick script which tells me everything I need to know.
With a few quick code adjustments I can dissect the failures or view the complete SMTP transcript - particularly handy if I'm discussing issues with up-stream providers.
Here's the code:
#!/usr/bin/env python

import smtplib
import dns.resolver

domains = [ 'mydomain.id.au'
          , 'myotherdomain.org'
          , 'bjdean.id.au'
          ]

def test_domain(domain):
    print "Testing", domain

    for server in dns.resolver.query(domain, 'MX'):
        test_smtp(domain, str(server.exchange).strip('.'))

def test_smtp(domain, exchange):
    print "Sending test message via exchange", exchange
    fromaddr = "test_smtp_servers-FROM@%s" % (domain)
    toaddr = "test_smtp_servers-TO@%s" % (domain)
    subject = "Test via %s for %s" % (exchange, domain)
    msg = "From: " + fromaddr + "\r\n" \
        + "To: " + toaddr + "\r\n" \
        + "Subject: " + subject + "\r\n" \
        + "\r\n\r\n" \
        + subject

    server = smtplib.SMTP(exchange)
    #server.set_debuglevel(1)
    try:
        server.sendmail(fromaddr, toaddr, msg)
    except Exception, e:
        print "EXCHANGE FAILED:", e
        #import pdb; pdb.set_trace()
    server.quit()

if __name__ == '__main__':
    for domain in domains:
        test_domain(domain)
And here's what I see:
Testing mydomain.id.au
Sending test message via exchange mx1.mydomain.id.au
Sending test message via exchange mx2.mydomain.id.au
Testing myotherdomain.org
Sending test message via exchange myotherdomain.org
Testing bjdean.id.au
Sending test message via exchange mail.bjdean.id.au
Sending test message via exchange mx2.zoneedit.com
EXCHANGE FAILED: {'test_smtp_servers-TO@bjdean.id.au': (554, '5.7.1: Relay access denied')}