Wed Feb 27 11:06:46 GMT 2008
Best timing for an earthquake - ever
So spending some time living in the UK was supposed to give us exposure to lots of experiences that aren't really available in Australia - but it would be fair to say that I wasn't expecting an earthquake to be one of them.
It wasn't a big earthquake to be sure - quoted in the news as having a magnitude of 5.2 and felt for hundreds of miles around, the biggest immediate impact seems to be some toppled chimneys near the epicentre.
For more details:
But from my perspective the event was a particularly well staged bit of immersive television. Let me tell you a tale...
Nearing 01:00 in the morning, I've just reached the end of an old episode of the X-Files (S03E11). Scully has had a crisis of faith and has decided to unload to a local priest.
The last lines of the show are (copied from the referenced episode guide):
Priest: "Sometimes we must come full circle to find the truth. Why does that surprise you?"
Scully: "Mostly it just makes me afraid."
Priest: "Afraid?"
Scully: "Afraid that god is speaking. But that no one's listening."
And perfectly on cue, just as Scully says "listening", there's a deep rumbling sound and the house creaks and shakes.
Now that's entertainment!
Mon Feb 25 01:21:58 GMT 2008
Bundling the Amazon S3 code with distutils
Brief note - I've bundled up the Amazon S3 code I've been working on with python distutils.
This is available here: AmazonS3Store-1.0.tar.gz
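For reference, a minimal setup.py along these lines is enough to build a distutils tarball like that - this is just a sketch assuming the modules and scripts from the tarball described below, not the actual file from the package:

#!/usr/bin/env python
# Sketch of a distutils setup.py for the S3 tools - module and script
# names here are assumed from the amazon-s3-20080222 tarball contents.
from distutils.core import setup

setup(
    name = 'AmazonS3Store',
    version = '1.0',
    description = 'Tools for piping backups in and out of Amazon S3',
    py_modules = ['S3', 's3storelib'],
    scripts = ['s3store.py', 'backup.py', 'clear_environment.py'],
)

Running "python setup.py sdist" then produces the versioned source tarball.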
Fri Feb 22 23:50:40 GMT 2008
Tweaking and bundling the Amazon S3 tools
With a little bit of tweaking I've now used s3store.py to push a tarball of an entire system up to Amazon S3, which means I've now got this code to the point I needed it.
I've bundled up the code into a tarball: amazon-s3-20080222.tbz
The contents of the tarball are:
- S3.py - The Amazon S3 Library for REST in Python
- backup.py - Walk a directory tree storing regular files to Amazon S3
- clear_environment.py - Delete everything stored in Amazon S3
- s3config.cfg - Configuration file template
- s3store.py - Read, write, list and delete piped data in/out of Amazon S3
- s3storelib.py - Module supporting s3store.py
- system_backup.sh - Store a tarball built from / to Amazon S3
The main change made to the base library (s3storelib.py) was to include an error-and-retry on writing data to S3:
Index: s3storelib.py
===================================================================
--- s3storelib.py   (revision 18)
+++ s3storelib.py   (revision 20)
@@ -8,6 +8,9 @@
 # Data chunk size
 chunk_size = 10 * 1024 * 1024
 
+# Maximum number of times to retry S3 calls
+max_tries = 5
+
 def usage():
     print "\n".join([
         'Usage:'
@@ -134,8 +137,19 @@
         s3_chunk = '%s-%010d' % (tag, counter)
         print "Uploading chunk: ", s3_chunk,
         sys.stdout.flush()
-        resp = conn.put(bucket, s3_chunk, chunk)
-        assert resp.http_response.status == 200, resp.message
+        tries = 0
+        while True:
+            try:
+                resp = conn.put(bucket, s3_chunk, chunk)
+                assert resp.http_response.status == 200, resp.message
+            except Exception, e:
+                tries += 1
+                if tries > max_tries:
+                    raise Exception( "Too many failures: " + str(e) )
+                print "[RETRY]",
+            else:
+                # It worked, break the loop
+                break
         print "[DONE]"
         sys.stdout.flush()
         counter += 1
Sometimes connections seem to fail, but when I've gone looking for the failures they generally haven't recurred, so a retry loop seemed a reasonable approach.
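If the transient failures ever became more frequent it might also be worth backing off between attempts. Something along these lines could replace the inline loop - just a sketch, not part of the tarball, and put_with_retry is a hypothetical name:

import sys
import time

def put_with_retry(conn, bucket, key, data, max_tries=5):
    """ Try conn.put() up to max_tries times, sleeping a little
        longer after each failure before giving up.
    """
    tries = 0
    while True:
        try:
            resp = conn.put(bucket, key, data)
            assert resp.http_response.status == 200, resp.message
            return resp
        except Exception, e:
            tries += 1
            if tries > max_tries:
                raise Exception("Too many failures: " + str(e))
            # Simple exponential backoff before the next attempt
            print "[RETRY in %ds]" % (2 ** tries),
            sys.stdout.flush()
            time.sleep(2 ** tries)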
Fri Feb 22 01:14:13 GMT 2008
Revisiting Amazon S3 - Piping data into S3
So, the other day I was playing around with storing and deleting content on Amazon Simple Storage Service (S3).
At the time I threw together a quick backup script which walked a local directory tree and attempted to push files up to S3. I also noted that there were lots of limitations to that approach - one of the main ones being that this really just stored regular files so everything else was left behind (symlinks, empty directories etc.)
It occurred to me that we already have perfectly good tools for packaging files together (tar for instance). The problem was that to use these tools I needed disk space to store the output.
This problem has already been solved - if you need to tar up a set of files onto another computer you can simply tar to stdout and pipe that through an ssh connection:
$ tar cjBf - /source/dir | ssh host "cat > file.tbz"
So what I really needed was to be able to pipe data into S3, something akin to splitting a file, which in turn can be expressed very simply in python with something like (split.py):
#!/usr/bin/env python

import sys

chunk_size = int(sys.argv[1])
split_prefix = sys.argv[2]
counter = 0

chunk = sys.stdin.read(chunk_size)
while len(chunk) > 0:
    fh = open("%s-%05d" % (split_prefix, counter), "wb")
    fh.write(chunk)
    fh.close()
    chunk = sys.stdin.read(chunk_size)
    counter += 1
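Used in a pipe that might look something like this (the chunk size and prefix here are just illustrative):

$ tar cjf - /source/dir | ./split.py 10485760 /tmp/backup-chunk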
And that works - because all I need to do is replace those file writes with S3 RESTful PUTs and I'm there:
def write_data(cmdopts, cfg, conn):
    """ Read data from STDIN and store it to Amazon S3

        Exceptions will be raised for non-recoverable errors
    """
    bucket = cfg.get('Bucket', 'id')
    tag = cmdopts['tag']
    counter = 0

    chunk = sys.stdin.read(chunk_size)
    while len(chunk) > 0:
        s3_chunk = '%s-%010d' % (tag, counter)
        print "Uploading chunk: ", s3_chunk,
        sys.stdout.flush()
        resp = conn.put(bucket, s3_chunk, chunk)
        assert resp.http_response.status == 200, resp.message
        print "[DONE]"
        sys.stdout.flush()
        counter += 1
        chunk = sys.stdin.read(chunk_size)
And if I can do that, then I should be able to reverse the process and read my S3 content back through a pipe using something like:
def read_data(cmdopts, cfg, conn):
    """ Read data back out of Amazon S3 and write it to STDOUT
    """
    bucket = cfg.get('Bucket', 'id')
    tag = cmdopts['tag']

    assert bucket in [x.name for x in conn.list_all_my_buckets().entries]

    # S3 lists keys lexicographically, so the zero-padded counter in the
    # key names keeps the chunks in the right order.
    for name in [x.key for x in conn.list_bucket(bucket).entries]:
        if ( name[:len(tag)+1] == '%s-' % tag ):
            data = conn.get(bucket, name)
            sys.stdout.write(data.object.data)
And because that's all feeling fairly useful I've wrapped it up in a little more code which makes things easy:
- s3store.py - The main script - reads and writes from S3
- s3storelib.py - Library for s3store.py
And finally - an example of using the script:
In...
$ tar czf - /path/to/something | ./s3store.py -w -t bob
tar: Removing leading `/' from member names
Deleting old data: bob-0000000000 [DONE]
Deleting old data: bob-0000000001 [DONE]
Deleting old data: bob-0000000002 [DONE]
Uploading chunk: bob-0000000000 [DONE]
Uploading chunk: bob-0000000001 [DONE]
Uploading chunk: bob-0000000002 [DONE]
.
.
.
And out...
$ ./s3store.py -r -t bob | tar tzf -
path/to/something/
path/to/something/a/
path/to/something/a/file.txt
path/to/something/b_file.txt
.
.
.
The script uses a python ConfigParser configuration file which looks like this:
[Credentials]
aws_access_key_id: XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key: YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY

[Bucket]
id: data-backup
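Loading that file is just standard ConfigParser usage - presumably something along these lines, with the filename assumed to be the s3config.cfg template from the tarball:

import ConfigParser

# Read the config file and pull out the credentials and the bucket
# name used by the rest of the script.
cfg = ConfigParser.ConfigParser()
cfg.read('s3config.cfg')

aws_access_key_id = cfg.get('Credentials', 'aws_access_key_id')
aws_secret_access_key = cfg.get('Credentials', 'aws_secret_access_key')
bucket = cfg.get('Bucket', 'id')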
Wed Feb 20 18:24:41 GMT 2008
Demanding spam
Wow, spam just got demanding.
Usually I don't see my spam thanks to SpamAssassin but every now and then some creeps through and I have a squiz.
So today I've received one that says:
We demand that you take 5 minutes out of your online experience and renew your records to avoid running into any future problems with the online service.
Must be a sort of reverse-social-engineering tactic... :)
Tue Feb 19 00:21:51 GMT 2008
Connecting to Amazon S3 with Python
Every now and then I've looked at and discussed the various Amazon Web Services but have never actually got around to using any of them personally.
I still don't really need a dynamically and automatically scalable cluster cloud of virtual computers at my beck and call - however groovy that might seem.
What I do need at the moment is some extra storage space - both as a place to backup some data reliably and as a place to serve larger chunks of data which have a tendency to fill up the brilliant but not exactly storage-heavy VPS hosting solutions available these days.
Amazon Simple Storage Service (S3) to the rescue. S3 provides cheap storage via both REST and SOAP interfaces.
Better yet, there are libraries already available in a number of languages - information and documentation is available at the Amazon S3 Community Code site.
In this case I'm using the Amazon S3 Library for REST in Python.
So what can I do with this?
Here's a rudimentary backup script (backup.py):
#!/usr/bin/env python

import os
import os.path
import sys
import time

import S3

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'
conn = S3.AWSAuthConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

time_stamp = time.strftime("%Y%m%d-%H%M%S")
backup_bucket = "backup"

print "Storing in %s [%s]" % (backup_bucket, time_stamp),
resp = conn.create_bucket(backup_bucket)
print resp.message

# Skip sys.argv[0] (the script name) - only walk the directories
# given on the command line.
for base_dir in sys.argv[1:]:
    print base_dir
    for root, dirs, files in os.walk(base_dir):
        print root
        for file in files:
            file_path = os.path.join(root, file)
            fh = open(file_path, 'rb')
            data = fh.read()
            fh.close()

            backup_path = os.path.join(time_stamp, file_path.lstrip('/'))
            print " .. %s" % backup_path,
            resp = conn.put(backup_bucket, backup_path, data)
            print " [%s]" % resp.message
This will walk through a given set of directories and try and upload all regular files it finds. Note that no handling exists for failed uploads (I did say rudimentary) or for non-regular files like symlinks.
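So a run over a couple of directories would look something like this (the directory names are just examples):

$ ./backup.py /etc /home/bjdean/documents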
I suppose the easiest or most reliable way to make this work across all file types would be to just back up tarballs - on the other hand that means I need the space to store the tarball, which somewhat defeats the purpose of cheaper storage.
Having gone and pushed a whole lot of stuff into my S3 space I may as well delete it (if for nothing else then as an exercise in walking through the S3 contents).
So, here's a deletion script (clear_environment.py):
#!/usr/bin/env python

import S3

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'
conn = S3.AWSAuthConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

for bucket in conn.list_all_my_buckets().entries:
    print bucket.name.encode('ascii', 'replace')
    for item in conn.list_bucket(bucket.name).entries:
        print " .. %s" % item.key.encode('ascii', 'replace'),
        conn.delete(bucket.name, item.key)
        print " [DELETED]"
    conn.delete_bucket(bucket.name)
    print "Deleted bucket"
Probably the main thing to note here is that Amazon S3 does not store objects in a hierarchy. There are a number of base level buckets (in this case named 'backup') which are then just filled up with uniquely keyed items.
A convention among various S3 file-storage/backup solutions has been to name this key using a unix-style path structure. If one of these solutions were to access the files stored by the backup script above they would allow navigation by 'directory' even though no actual directories existed.
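As a purely illustrative sketch (the keys here are made up, in the style the backup script above would generate), grouping the flat keys on '/' is all it takes to present them as directories:

# Hypothetical flat S3 keys, named with the timestamp/path convention
# used by the backup script above.
keys = [
    '20080219-002151/etc/hosts',
    '20080219-002151/etc/passwd',
    '20080219-002151/home/bjdean/notes.txt',
]

# Group on the first '/' to fake a top-level 'directory' listing even
# though S3 itself only sees flat, uniquely keyed objects.
pseudo_dirs = {}
for key in keys:
    top, rest = key.split('/', 1)
    pseudo_dirs.setdefault(top, []).append(rest)

for top in sorted(pseudo_dirs):
    print "%s/ (%d entries)" % (top, len(pseudo_dirs[top]))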
Thu Feb 14 00:12:21 GMT 2008
Travelling through France
Here are the photos of our trip through France in November-December 2007:
Trip to France - November to December 2007
Wed Feb 13 01:38:43 GMT 2008
Testing out nanoblogger
So here we go, I've imported the few blog/article-like entries I had started to store on my wiki into a new nanoblogger blog.
This is a static-html style blogging system built essentially in bash and using common unix tools to get the job done.
The nicest thing is that it's a command-line blogging system, so it's pretty easy to write articles using vi, not to mention being able to import articles written offline.
Tuesday 12th February, 16:11:51 GMT
Sending ETRN to backup SMTP exchanges
Having played around with checking SMTP services for backup MX exchanges ("Testing SMTP exchanges") I then thought it would be useful to be able to easily trigger ETRN requests. Backup MX servers tend to poll the mail server periodically to do this automatically but being impatient...
Again using smtplib, this is even quicker and easier than the testing script:
#!/usr/bin/env python

import smtplib

backup_servers = { 'mx3.zoneedit.com' : [ 'bjdean.id.au'
                                        , 'orientgrove.net'
                                        ]
                 }

if __name__ == '__main__':
    for backup_mx in backup_servers.keys():
        print ">>> Connecting to", backup_mx
        server = smtplib.SMTP(backup_mx)
        #server.set_debuglevel(1)
        for domain in backup_servers[backup_mx]:
            print ">>> >>> ETRN domain", domain
            server.docmd('ETRN', domain)
And here's what I see (with debugging turned back on):
>>> Connecting to mx3.zoneedit.com
>>> >>> ETRN domain bjdean.id.au
send: 'ETRN bjdean.id.au\r\n'
reply: '250 Queuing started\r\n'
reply: retcode (250); Msg: Queuing started
>>> >>> ETRN domain orientgrove.net
send: 'ETRN orientgrove.net\r\n'
reply: '250 Queuing started\r\n'
reply: retcode (250); Msg: Queuing started
Tuesday 12th February, 11:21:14 GMT
Testing SMTP exchanges
With a few domains in tow, and a few different live and backup MX exchanges attached to those, I needed a quick way to work out what was working and what wasn't.
dnspython and smtplib make for a very quick script which tells me everything I need to know.
With a few quick code adjustments I can dissect the failures or view the complete SMTP transcript - particularly handy if I'm discussing issues with up-stream providers.
Here's the code:
#!/usr/bin/env python

import smtplib
import dns.resolver

domains = [ 'mydomain.id.au'
          , 'myotherdomain.org'
          , 'bjdean.id.au'
          ]

def test_domain(domain):
    print "Testing", domain

    for server in dns.resolver.query(domain, 'MX'):
        test_smtp(domain, str(server.exchange).strip('.'))

def test_smtp(domain, exchange):
    print "Sending test message via exchange", exchange
    fromaddr = "test_smtp_servers-FROM@%s" % (domain)
    toaddr = "test_smtp_servers-TO@%s" % (domain)
    subject = "Test via %s for %s" % (exchange, domain)
    msg = "From: " + fromaddr + "\r\n" \
        + "To: " + toaddr + "\r\n" \
        + "Subject: " + subject + "\r\n" \
        + "\r\n\r\n" \
        + subject

    server = smtplib.SMTP(exchange)
    #server.set_debuglevel(1)
    try:
        server.sendmail(fromaddr, toaddr, msg)
    except Exception, e:
        print "EXCHANGE FAILED:", e
        #import pdb; pdb.set_trace()
    server.quit()

if __name__ == '__main__':
    for domain in domains:
        test_domain(domain)
And here's what I see:
Testing mydomain.id.au
Sending test message via exchange mx1.mydomain.id.au
Sending test message via exchange mx2.mydomain.id.au
Testing myotherdomain.org
Sending test message via exchange myotherdomain.org
Testing bjdean.id.au
Sending test message via exchange mail.bjdean.id.au
Sending test message via exchange mx2.zoneedit.com
EXCHANGE FAILED: {'test_smtp_servers-TO@bjdean.id.au': (554, '5.7.1: Relay access denied')}