Every now and then I've looked at and discussed the various Amazon Web Services, but I've never actually got around to using any of them personally.
I still don't really need a dynamically and automatically scalable cluster cloud of virtual computers at my beck and call - however groovy that might seem.
What I do need at the moment is some extra storage space - both as a place to backup some data reliably and as a place to serve larger chunks of data which have a tendency to fill up the brilliant but not exactly storage-heavy VPS hosting solutions available these days.
Amazon Simple Storage Service (S3) to the rescue. S3 provides cheap storage via both REST and SOAP interfaces.
Better yet, there are libraries already available in a number of languages - information and documentation is available at the Amazon S3 Community Code site.
In this case I'm using the Amazon S3 Library for REST in Python.
So what can I do with this?
Here's a rudimentary backup script (backup.py):
#!/usr/bin/env python

import os
import os.path
import sys
import time

import S3

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'
conn = S3.AWSAuthConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

time_stamp = time.strftime("%Y%m%d-%H%M%S")
backup_bucket = "backup"

print "Storing in %s [%s]" % (backup_bucket, time_stamp),
resp = conn.create_bucket(backup_bucket)
print resp.message

# Skip sys.argv[0] (the script name) - only walk the directories given
for base_dir in sys.argv[1:]:
    print base_dir
    for root, dirs, files in os.walk(base_dir):
        print root
        for file in files:
            file_path = os.path.join(root, file)
            fh = open(file_path, 'rb')
            data = fh.read()
            fh.close()

            backup_path = os.path.join(time_stamp, file_path.lstrip('/'))
            print " .. %s" % backup_path,
            resp = conn.put(backup_bucket, backup_path, data)
            print " [%s]" % resp.message
This will walk through a given set of directories and try to upload all the regular files it finds. Note that there is no handling for failed uploads (I did say rudimentary) or for non-regular files like symlinks.
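A first step towards handling the non-regular files would be to filter out symlinks and anything else that isn't a plain file before uploading. Here's a minimal sketch - the should_backup helper is my own invention, not part of the original script:

```python
import os
import os.path
import tempfile

def should_backup(path):
    """Only back up regular files - skip symlinks, FIFOs, sockets etc."""
    return os.path.isfile(path) and not os.path.islink(path)

# Quick demonstration using a throw-away directory
tmp_dir = tempfile.mkdtemp()
regular = os.path.join(tmp_dir, 'regular.txt')
open(regular, 'w').close()
link = os.path.join(tmp_dir, 'link.txt')
os.symlink(regular, link)

print(should_backup(regular))  # True
print(should_backup(link))     # False - a symlink, even to a regular file
```

The walk loop would then simply skip any file_path for which should_backup returns False.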
I suppose the easiest or most reliable way to make this work across all file types would be to just backup tarballs - on the other hand that means that I need to have the space to store the tarball which somewhat defeats the purpose of cheaper storage.
Having gone and pushed a whole lot of stuff into my S3 space I may as well delete it (if for nothing else then as an exercise in walking through the S3 contents).
So, here's a deletion script (clear_environment.py):
#!/usr/bin/env python

import S3

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'
conn = S3.AWSAuthConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

for bucket in conn.list_all_my_buckets().entries:
    print bucket.name.encode('ascii', 'replace')
    for item in conn.list_bucket(bucket.name).entries:
        print " .. %s" % item.key.encode('ascii', 'replace'),
        conn.delete(bucket.name, item.key)
        print " [DELETED]"
    conn.delete_bucket(bucket.name)
    print "Deleted bucket"
Probably the main thing to note here is that Amazon S3 does not store objects in a hierarchy. There are a number of base level buckets (in this case named 'backup') which are then just filled up with uniquely keyed items.
A convention among various S3 file-storage/backup solutions has been to name this key using a unix-style path structure. If one of these solutions were to access the files stored by the backup script above they would allow navigation by 'directory' even though no actual directories existed.
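To illustrate the convention, here's a small Python sketch (independent of any S3 library) showing how flat keys can be presented as a directory tree just by splitting on a delimiter - the function name and the sample keys are hypothetical:

```python
def common_prefixes(keys, prefix='', delimiter='/'):
    """Return the pseudo-'subdirectories' directly under prefix,
    derived purely from the flat key names."""
    found = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Keep only the first path component below the prefix
            found.add(rest.split(delimiter, 1)[0] + delimiter)
    return sorted(found)

keys = [
    '20070101-120000/home/user/notes.txt',
    '20070101-120000/etc/hosts',
    'loose-object.txt',
]

print(common_prefixes(keys))                      # ['20070101-120000/']
print(common_prefixes(keys, '20070101-120000/'))  # ['etc/', 'home/']
```

The S3 REST API supports this style of navigation natively via its prefix and delimiter listing parameters, so a client never has to fetch every key to present one "directory".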
So here we go - I've imported the few blog/article-like entries I had started to store on my wiki into a new nanoblogger blog.
This is a static-html style blogging system built essentially in bash and using common unix tools to get the job done.
The nicest thing is that it's a command-line blogging system so it's pretty easy to write articles using vi, not to mention to be able to import articles that are written offline.
Having played around with checking SMTP services for backup MX exchanges ("Testing SMTP exchanges") I then thought it would be useful to be able to easily trigger ETRN requests. Backup MX servers tend to poll the mail server periodically to do this automatically but being impatient...
Again using smtplib, this is even quicker and easier than the testing script:
#!/usr/bin/env python

import smtplib

backup_servers = { 'mx3.zoneedit.com' : [ 'bjdean.id.au'
                                        , 'orientgrove.net'
                                        ]
                 }

if __name__ == '__main__':
    for backup_mx in backup_servers.keys():
        print ">>> Connecting to", backup_mx
        server = smtplib.SMTP(backup_mx)
        #server.set_debuglevel(1)
        for domain in backup_servers[backup_mx]:
            print ">>> >>> ETRN domain", domain
            server.docmd('ETRN', domain)
And here's what I see (with debugging turned back on):
>>> Connecting to mx3.zoneedit.com
>>> >>> ETRN domain bjdean.id.au
send: 'ETRN bjdean.id.au\r\n'
reply: '250 Queuing started\r\n'
reply: retcode (250); Msg: Queuing started
>>> >>> ETRN domain orientgrove.net
send: 'ETRN orientgrove.net\r\n'
reply: '250 Queuing started\r\n'
reply: retcode (250); Msg: Queuing started
With a few domains in tow, and a few different live and backup MX exchanges attached to those, I needed a quick way to work out what was working and what wasn't.
dnspython and smtplib make for a very quick script which tells me everything I need to know.
With a few quick code adjustments I can dissect the failures or view the complete SMTP transcript - particularly handy if I'm discussing issues with up-stream providers.
Here's the code:
#!/usr/bin/env python

import smtplib
import dns.resolver

domains = [ 'mydomain.id.au'
          , 'myotherdomain.org'
          , 'bjdean.id.au'
          ]

def test_domain(domain):
    print "Testing", domain

    for server in dns.resolver.query(domain, 'MX'):
        test_smtp(domain, str(server.exchange).strip('.'))

def test_smtp(domain, exchange):
    print "Sending test message via exchange", exchange
    fromaddr = "test_smtp_servers-FROM@%s" % (domain)
    toaddr   = "test_smtp_servers-TO@%s" % (domain)
    subject  = "Test via %s for %s" % (exchange, domain)
    msg = "From: " + fromaddr + "\r\n" \
        + "To: " + toaddr + "\r\n" \
        + "Subject: " + subject + "\r\n" \
        + "\r\n\r\n" \
        + subject

    server = smtplib.SMTP(exchange)
    #server.set_debuglevel(1)
    try:
        server.sendmail(fromaddr, toaddr, msg)
    except Exception, e:
        print "EXCHANGE FAILED:", e
        #import pdb; pdb.set_trace()
    server.quit()

if __name__ == '__main__':
    for domain in domains:
        test_domain(domain)
And here's what I see:
Testing mydomain.id.au
Sending test message via exchange mx1.mydomain.id.au
Sending test message via exchange mx2.mydomain.id.au
Testing myotherdomain.org
Sending test message via exchange myotherdomain.org
Testing bjdean.id.au
Sending test message via exchange mail.bjdean.id.au
Sending test message via exchange mx2.zoneedit.com
EXCHANGE FAILED: {'test_smtp_servers-TO@bjdean.id.au': (554, '5.7.1: Relay access denied')}
There are a couple of cheat/reference sheets out there for perl already:
Here's a perlpod quick reference:
SECTIONS      INDENTATION           START/STOP POD
=head1        =over indentlevel     =pod
=head2        =item bullet          =cut
=head3        =back

FORMATTING CODES             FORMATTING CODES (cont.)
I<italic text>               F<filename>
B<bold text>                 S<text with non-breaking spaces>
C<code text>                 X<index entry>
L<hyperlink>                 Z<> null
E<character escape>          B<< >> More than one delimiter is ok

COMMON MAIN SECTIONS
NAME                         BUGS/CAVEATS
SYNOPSIS                     DIAGNOSTICS
DESCRIPTION                  DEPENDENCIES
COPYRIGHT LICENSE            SEE ALSO
AUTHOR
The first time you come across EBCDIC data in an ASCII based environment (or vice versa) things can become a tad confusing - fortunately there's an easy way to convert the data back and forth.
EBCDIC and ASCII are foundational character encodings - almost all current operating systems use one or the other of these (most using ASCII).
Wikipedia on ASCII:
ASCII (American Standard Code for Information Interchange), generally pronounced [æski], is a character encoding based on the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that work with text. Most modern character encodings have a historical basis in ASCII.
Wikipedia on EBCDIC:
EBCDIC (Extended Binary Coded Decimal Interchange Code) is an 8-bit character encoding (code page) used on IBM mainframe operating systems, like z/OS, OS/390, VM and VSE, as well as IBM minicomputer operating systems like OS/400 and i5/OS. It is also employed on various non-IBM platforms such as Fujitsu-Siemens' BS2000/OSD, HP MPE/iX, and Unisys MCP. It descended from punched cards and the corresponding six bit binary-coded decimal code that most of IBM's computer peripherals of the late 1950s and early 1960s used.
To transform data between EBCDIC and ASCII the UNIX utility dd comes to the rescue - better yet, it comes to the rescue in a very simple way.
$ cat ascii_data_file | dd conv=ebcdic

$ cat ebcdic_data_file | dd conv=ascii
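The same conversion can be checked from Python using the standard codecs - cp037 is one common EBCDIC code page (note this is an assumption on my part: dd's built-in 'ebcdic' table is one particular mapping, so a few punctuation characters may differ from cp037):

```python
text = 'HELLO, EBCDIC WORLD'

# Encode ASCII text into EBCDIC (code page 037) and decode it back again
ebcdic_bytes = text.encode('cp037')
round_trip = ebcdic_bytes.decode('cp037')

print(ebcdic_bytes != text.encode('ascii'))  # True - the byte values differ
print(round_trip == text)                    # True - the conversion is lossless
```

This makes for a quick sanity check when you're not sure whether a mystery file really is EBCDIC: decode a sample with cp037 and see whether readable text falls out.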
Until I've come up with something to add, for the moment the best way I can think of to describe testing with perl is to reference several excellent sources of documentation on the subject:
Those provide an excellent basis for constructing and running tests in perl.
Once the processes in those documents and modules have become familiar there are a number of excellent tools that can be used to make testing even more powerful:
Early experiences with using the Catalyst Web Framework tend to suggest that there are a few common starting points when deploying a new Catalyst application.
These steps take the 'complete' application harness (which is essentially a completely working but completely empty web application) and add a few features to get a little content and functionality into the application.
To begin with it's worth noting that most of the documentation for Catalyst is in the POD - so you can do a lot worse than point your web browser at Catalyst when you're looking for help. The next two resources I've found most useful are the mailing list and the Developer Community Site.
Another invaluable starting point is the Catalyst Tutorial.
Once an application has been created it's time to start adding functionality, so without further ado:
catalyst.pl TestApplication
Often the Template Toolkit will be all you need to manage the View output of an application.
The easiest way to start using Template Toolkit as the default view is to create a view using the Template Toolkit View helper:
./script/testapplication_create.pl view TT TT
Once this is done, the view is most easily accessed by using the DefaultEnd plugin, which directs all responses without content in the body to the first View. This is done by adding DefaultEnd to the use Catalyst part of lib/TestApplication.pm:
use Catalyst qw/ -Debug ConfigLoader Static::Simple DefaultEnd /;
A few notes on use of the Template Toolkit view. An alternative helper, TTSite, creates the view along with a starter set of templates:

./script/testapplication_create.pl view TT TTSite

The DefaultEnd plugin also has a view configuration directive to select which view renders the response:
# In the YAML configuration file: testapplication.yml
---
name: TestApplication
view: TT

# OR

# In the application module: lib/TestApplication.pm
__PACKAGE__->config(
    name => 'TestApplication',
    view => 'TT',
);
In lib/TestApplication/Controller/Root.pm:

#
# Output a friendly welcome message
#
sub default : Private {
    my ( $self, $c ) = @_;

    # If we reach here, the correct response is a 404
    $c->response->status(404);

    # Hello World
    $c->response->body( $c->welcome_message );
}

Of course removing the default Catalyst welcome page is also a good idea, but that can be done later when you get around to putting content into the site.
The Template Toolkit view is configured in the application module (lib/TestApplication.pm):

__PACKAGE__->config(
    name => 'TestApplication',
    view => 'TT',
    'View::TT' => {
        INCLUDE_PATH => [
            __PACKAGE__->path_to('templates'),
        ],
        WRAPPER => 'site/wrapper.tt',
    }
);

This sets up a template include path in a templates directory in the root application directory. In addition a wrapper is defined. A Template Toolkit wrapper is a template which is wrapped around all content rendered by the view. When writing a wrapper the content of the page will be inserted where the [% content %] directive is placed.
An example wrapper is as follows:
[% DEFAULT title = c.config.name %]
<html>
  <head>
    <title>[% title %]</title>
  </head>
  <body>
    <h1>[% title %]</h1>
    [% content %]
  </body>
</html>
A simple index action in the Root controller can then stash a title, some text and a template name:

sub index : Private {
    my ( $self, $c ) = @_;

    # Simple index
    $c->{stash}->{title} = "Index";
    $c->{stash}->{text} = qq{Welcome to the application - nothing to see here yet};
    $c->{stash}->{template} = "text.tt";
}

In this case the text.tt template is very simple:

[% text %]

For further information on the automagically called private methods of controllers see Catalyst::Manual::Intro.
The next step is to add a Model, in this case using Class::DBI.

In order to properly use data modelling it should not be a requirement that accessing the model is done through the Catalyst application. This can be done by defining a Class::DBI library independent of the Catalyst application and then referring to that library. If no library is defined for the database yet you may be able to use the automatic database interrogation done for you by Class::Loader.
I'm using SQLite (http://www.sqlite.org/) because it's very simple to do so - if you haven't seen SQLite before go have a look. :)

To start with, create an SQLite database (in this case in a db subdirectory):
BEGIN TRANSACTION;
CREATE TABLE foo (
  bar VARCHAR(132)
);
INSERT INTO "foo" VALUES('sdfasd');
INSERT INTO "foo" VALUES('sdfasd');
INSERT INTO "foo" VALUES('sdfasd');
INSERT INTO "foo" VALUES('sdfasd');
INSERT INTO "foo" VALUES('sdfasd');
COMMIT;

Then add the model:
$ ./script/testapplication_create.pl model AppDB CDBI dbi:SQLite:/path/to/TestApplication/db/db.sqlite

Once this is done automagically it's helpful for portability to kill that hardwired path to the database - modify the dsn in the created AppDB.pm file from:

dsn => 'dbi:SQLite:/path/to/db.sqlite',

to:

dsn => 'dbi:SQLite:' . TestApplication->path_to('db') . '/db.sqlite',

And that's it... No really... To access the Model inside the application use the model method:

my $foo_model = $c->model('AppDB::Foo');
my @foo_rows  = $foo_model->retrieve_all();
And now for a super-short article with a fast answer to a problem - on the other hand, it's a problem I really needed a quick answer to the other day, and an answer I couldn't find. So here's the article:
This can be very useful because it's sometimes very helpful to know which of the installed versions of a module are being used - in my case I needed to build a set of libraries to deploy onto a server on which I could not easily build libraries. The problem was finding out when my script was quietly grabbing a module from the core perl installation instead of my library bundle.
A couple of interesting discussions on PerlMonks on the matter:
There were a couple of different approaches discussed, the most complex of which involved re-walking the @INC array to try to find which library would be used. The problem with that approach is that it's a guess about what will be used rather than a report of what was used.
It turns out there's a very very simple way...
One of the Perl predefined variables is %INC (not to be confused with the library search path @INC).
As per the perldoc:
%INC    The hash %INC contains entries for each filename included via the
        "do", "require", or "use" operators. The key is the filename you
        specified (with module names converted to pathnames), and the value
        is the location of the file found. The "require" operator uses this
        hash to determine whether a particular file has already been
        included.

        If the file was loaded via a hook (e.g. a subroutine reference, see
        "require" in perlfunc for a description of these hooks), this hook
        is by default inserted into %INC in place of a filename. Note,
        however, that the hook may have set the %INC entry by itself to
        provide some more specific info.
So to get a run-time report of what modules are in use, and where the source files were, just print out %INC:
use Data::Dumper;
print Data::Dumper::Dumper(\%INC);
For example:
$ perl -MData::Dumper -MEnglish -MCGI -e 'print Data::Dumper::Dumper(\%INC)'
$VAR1 = {
          'warnings/register.pm' => '/usr/lib/perl5/5.8.6/warnings/register.pm',
          'bytes.pm' => '/usr/lib/perl5/5.8.6/bytes.pm',
          'Carp.pm' => '/usr/lib/perl5/5.8.6/Carp.pm',
          'XSLoader.pm' => '/usr/lib/perl5/5.8.6/i386-linux-thread-multi/XSLoader.pm',
          'English.pm' => '/usr/lib/perl5/5.8.6/English.pm',
          'Exporter/Heavy.pm' => '/usr/lib/perl5/5.8.6/Exporter/Heavy.pm',
          'vars.pm' => '/usr/lib/perl5/5.8.6/vars.pm',
          'strict.pm' => '/usr/lib/perl5/5.8.6/strict.pm',
          'Exporter.pm' => '/usr/lib/perl5/5.8.6/Exporter.pm',
          'constant.pm' => '/usr/lib/perl5/5.8.6/constant.pm',
          'warnings.pm' => '/usr/lib/perl5/5.8.6/warnings.pm',
          'CGI/Util.pm' => '/usr/lib/perl5/5.8.6/CGI/Util.pm',
          'overload.pm' => '/usr/lib/perl5/5.8.6/overload.pm',
          'CGI.pm' => '/usr/lib/perl5/5.8.6/CGI.pm',
          'Data/Dumper.pm' => '/usr/lib/perl5/5.8.6/i386-linux-thread-multi/Data/Dumper.pm'
        };
For the full details see perldoc perlvar.
Of course, there must be a gotcha...
Modifying %INC is a common way to trick Perl into believing that a module has already been loaded (for instance when using something like Test::MockObject) but when that happens the hash value set is usually not a path to a file. That said - this method is not going to be all that reliable if used in an environment in which munging of %INC is going on.