Tue Feb 19 00:21:51 GMT 2008

Connecting to Amazon S3 with Python

Every now and then I've looked at and discussed the various Amazon Web Services, but I've never actually got around to using any of them personally.

I still don't really need a dynamically and automatically scalable cluster cloud of virtual computers at my beck and call - however groovy that might seem.

What I do need at the moment is some extra storage space - both as a place to back up some data reliably and as a place to serve larger chunks of data, which have a tendency to fill up the brilliant but not exactly storage-heavy VPS hosting solutions available these days.

Amazon Simple Storage Service (S3) to the rescue. S3 provides cheap storage via both REST and SOAP interfaces.

Better yet, there are libraries already available in a number of languages - information and documentation is available at the Amazon S3 Community Code site.

In this case I'm using the Amazon S3 Library for REST in Python.

So what can I do with this?

Here's a rudimentary backup script (backup.py):

#!/usr/bin/env python

import os
import os.path
import sys
import time

import S3

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'
conn = S3.AWSAuthConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

# Each run goes under a time-stamped prefix in a single bucket
time_stamp = time.strftime("%Y%m%d-%H%M%S")
backup_bucket = "backup"

print "Storing in %s [%s]" % (backup_bucket, time_stamp),
resp = conn.create_bucket(backup_bucket)
print resp.message

# Walk each directory named on the command line
# (sys.argv[1:] - element 0 is the script itself)
for base_dir in sys.argv[1:]:
  print base_dir
  for root, dirs, files in os.walk(base_dir):
    print root
    for file_name in files:
      file_path = os.path.join(root, file_name)
      fh = open(file_path, 'rb')
      data = fh.read()
      fh.close()

      # Key each object by time stamp plus its original path
      backup_path = os.path.join(time_stamp, file_path.lstrip('/'))
      print " .. %s" % backup_path,
      resp = conn.put(backup_bucket, backup_path, data)
      print " [%s]" % resp.message

This will walk through the directories given on the command line (for example ./backup.py /etc /home/bjdean) and try to upload every regular file it finds. Note that there is no handling for failed uploads (I did say rudimentary) or for non-regular files like symlinks.

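If I wanted to make it slightly less rudimentary, a variant of the upload loop could skip anything that isn't a regular file and carry on past failed uploads. The following is only a sketch - it assumes the same S3 library and credentials as above, and exactly how upload errors surface (an exception versus a non-OK resp.message) depends on the library version, so check yours:

#!/usr/bin/env python
# Sketch only: skip non-regular files and keep going after failed uploads

import os
import os.path
import sys
import time

import S3

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'
conn = S3.AWSAuthConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

time_stamp = time.strftime("%Y%m%d-%H%M%S")
backup_bucket = "backup"
conn.create_bucket(backup_bucket)

for base_dir in sys.argv[1:]:
  for root, dirs, files in os.walk(base_dir):
    for file_name in files:
      file_path = os.path.join(root, file_name)

      # Skip symlinks, FIFOs, sockets and anything else that isn't a plain file
      if os.path.islink(file_path) or not os.path.isfile(file_path):
        print " .. skipping non-regular file %s" % file_path
        continue

      backup_path = os.path.join(time_stamp, file_path.lstrip('/'))
      try:
        resp = conn.put(backup_bucket, backup_path, open(file_path, 'rb').read())
        print " .. %s [%s]" % (backup_path, resp.message)
      except Exception, e:
        # Don't let one bad upload abort the whole backup run
        print " .. %s [FAILED: %s]" % (backup_path, e)
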
I suppose the easiest or most reliable way to make this work across all file types would be to just back up tarballs - on the other hand that means I need the space to store the tarball first, which somewhat defeats the purpose of cheaper storage.

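One possible middle ground - again only a sketch, and it assumes the whole archive fits in memory, which just trades disk for RAM - is to build the tarball in a string buffer with the standard tarfile module and push that straight to S3:

#!/usr/bin/env python
# Sketch: build a gzipped tarball entirely in memory and upload it as one
# object (assumes the archive fits comfortably in RAM)

import tarfile
from cStringIO import StringIO

import S3

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'
conn = S3.AWSAuthConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

buf = StringIO()
tar = tarfile.open(mode='w:gz', fileobj=buf)
tar.add('/home/bjdean/data')   # hypothetical directory - substitute your own
tar.close()

# One object per archive; the key is just a name, pick any convention
conn.put("backup", "data.tar.gz", buf.getvalue())
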
Having gone and pushed a whole lot of stuff into my S3 space I may as well delete it (if for nothing else than as an exercise in walking through the S3 contents).

So, here's a deletion script (clear_environment.py):

#!/usr/bin/env python

import S3

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'
conn = S3.AWSAuthConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

# Walk every bucket, delete every object in it, then delete the bucket
# itself (a bucket has to be empty before it can be removed)
for bucket in conn.list_all_my_buckets().entries:
  print bucket.name.encode('ascii', 'replace')
  for item in conn.list_bucket(bucket.name).entries:
    print " .. %s" % item.key.encode('ascii', 'replace'),
    conn.delete(bucket.name, item.key)
    print " [DELETED]"
  conn.delete_bucket(bucket.name)
  print "Deleted bucket"

Probably the main thing to note here is that Amazon S3 does not store objects in a hierarchy. There are a number of base level buckets (in this case named 'backup') which are then just filled up with uniquely keyed items.

A convention among various S3 file-storage/backup solutions has been to name this key using a unix-style path structure. If one of these solutions were to access the files stored by the backup script above, it would allow navigation by 'directory' even though no actual directories exist.

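To make that concrete, here's a small sketch using only the library calls already shown (keeping in mind that a single list_bucket call may not return every key in a very large bucket) which groups keys in the backup bucket by their first path component - the time-stamped 'directory' each backup run was stored under:

#!/usr/bin/env python
# Sketch: group object keys by their first '/'-separated component.
# There is no real directory tree on S3 - the structure lives only in the keys.

import S3

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'
conn = S3.AWSAuthConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

runs = {}
for item in conn.list_bucket("backup").entries:
  top_level = item.key.split('/', 1)[0]
  runs.setdefault(top_level, []).append(item.key)

for run in sorted(runs.keys()):
  print "%s (%d files)" % (run, len(runs[run]))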

Posted by Bradley Dean | Permalink | Categories: Python, Programming

Wed Feb 13 01:38:43 GMT 2008

Testing out nanoblogger

So here we go - I've imported the few blog/article-like entries I had started to store on my wiki into a new nanoblogger blog.

This is a static-html style blogging system built essentially in bash and using common unix tools to get the job done.

The nicest thing is that it's a command-line blogging system, so it's pretty easy to write articles using vi - not to mention import articles written offline.


Posted by Bradley Dean | Permalink | Categories: Programming, SysAdmin

Tuesday 12th February, 16:11:51 GMT

Sending ETRN to backup SMTP exchanges

Having played around with checking SMTP services for backup MX exchanges ("Testing SMTP exchanges") I then thought it would be useful to be able to easily trigger ETRN requests. Backup MX servers tend to poll the mail server periodically to do this automatically but being impatient...

Again using smtplib, this is even quicker and easier than the testing script:

#!/usr/bin/env python

import smtplib

# Map each backup MX server to the domains it holds queued mail for
backup_servers = { 'mx3.zoneedit.com' : [ 'bjdean.id.au'
                                        , 'orientgrove.net'
                                        ]
                 }

if __name__ == '__main__':
  for backup_mx in backup_servers.keys():
    print ">>> Connecting to", backup_mx
    server = smtplib.SMTP(backup_mx)
    #server.set_debuglevel(1)
    for domain in backup_servers[backup_mx]:
      print ">>> >>> ETRN domain", domain
      # ETRN asks the backup MX to start delivering its queue for this domain
      server.docmd('ETRN', domain)
    server.quit()

And here's what I see (with debugging turned back on):

>>> Connecting to mx3.zoneedit.com
>>> >>> ETRN domain bjdean.id.au
send: 'ETRN bjdean.id.au\r\n'
reply: '250 Queuing started\r\n'
reply: retcode (250); Msg: Queuing started
>>> >>> ETRN domain orientgrove.net
send: 'ETRN orientgrove.net\r\n'
reply: '250 Queuing started\r\n'
reply: retcode (250); Msg: Queuing started

Posted by Bradley Dean | Permalink | Categories: Python, Programming

Tuesday 12th February, 11:21:14 GMT

Testing SMTP exchanges

With a few domains in tow, and a few different live and backup MX exchanges attached to those, I needed a quick way to work out what was working and what wasn't.

dnspython and smtplib make for a very quick script which tells me everything I need to know.

With a few quick code adjustments I can dissect the failures or view the complete SMTP transcript - particularly handy if I'm discussing issues with up-stream providers.

Here's the code:

#!/usr/bin/env python

import smtplib
import dns.resolver

# Domains to check - every MX exchange for each domain gets a test message
domains = [ 'mydomain.id.au'
          , 'myotherdomain.org'
          , 'bjdean.id.au'
          ]

def test_domain(domain):
  print "Testing", domain

  # Look up the MX records and test each exchange in turn
  for server in dns.resolver.query(domain, 'MX'):
    test_smtp(domain, str(server.exchange).strip('.'))

def test_smtp(domain, exchange):
  print "Sending test message via exchange", exchange
  fromaddr = "test_smtp_servers-FROM@%s" % (domain)
  toaddr   = "test_smtp_servers-TO@%s" % (domain)
  subject  = "Test via %s for %s" % (exchange, domain)
  msg = "From: "    + fromaddr + "\r\n" \
      + "To: "      + toaddr   + "\r\n" \
      + "Subject: " + subject  + "\r\n" \
      + "\r\n\r\n"                      \
      + subject

  server = smtplib.SMTP(exchange)
  #server.set_debuglevel(1)
  try:
    server.sendmail(fromaddr, toaddr, msg)
  except Exception, e:
    # A failure here usually means the exchange rejected or refused to relay
    print "EXCHANGE FAILED:", e
    #import pdb; pdb.set_trace()
  server.quit()

if __name__ == '__main__':
  for domain in domains:
    test_domain(domain)

And here's what I see:

Testing mydomain.id.au
Sending test message via exchange mx1.mydomain.id.au
Sending test message via exchange mx2.mydomain.id.au
Testing myotherdomain.org
Sending test message via exchange myotherdomain.org
Testing bjdean.id.au
Sending test message via exchange mail.bjdean.id.au
Sending test message via exchange mx2.zoneedit.com
EXCHANGE FAILED: {'test_smtp_servers-TO@bjdean.id.au': (554, '5.7.1 : Relay access denied')}

Posted by Bradley Dean | Permalink | Categories: Python, Programming

Thursday 14th September 15:49:36 AEST

PlainOldDocumentation Cheat-Sheet

There are a couple of cheat/reference sheets out there for Perl already.

Here's a perlpod quick reference:

SECTIONS    INDENTATION         START/STOP POD
=head1      =over indentlevel   =pod
=head2      =item bullet        =cut
=head3      =back

FORMATTING CODES                  FORMATTING CODES (cont.)
I<text>   italic text             F<file>   filename
B<text>   bold text               S<text>   text with non-breaking spaces
C<code>   code text               X<topic>  index entry
L<name>   hyperlink               Z<>       null
E<esc>    character escape        B<< >>    more than one delimiter is ok

COMMON MAIN SECTIONS               
NAME        BUGS/CAVEATS
SYNOPSIS    DIAGNOSTICS
DESCRIPTION DEPENDENCIES
COPYRIGHT   LICENSE
SEE ALSO    AUTHOR

Posted by Bradley Dean | Permalink | Categories: Perl, Programming

Monday 22nd May, 2006 00:10:41 AEST

Transforming between ASCII and EBCDIC

The first time you come across EBCDIC data in an ASCII based environment (or vice versa) things can become a tad confusing - fortunately there's an easy way to convert the data back and forth.

EBCDIC and ASCII are base character encodings - almost all current operating systems use one or the other of these (most using ASCII).

Definitions from Wikipedia

Wikipedia on ASCII:

ASCII (American Standard Code for Information Interchange), generally pronounced [æski], is a character encoding based on the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that work with text. Most modern character encodings have a historical basis in ASCII.

Wikipedia on EBCDIC:

EBCDIC (Extended Binary Coded Decimal Interchange Code) is an 8-bit character encoding (code page) used on IBM mainframe operating systems, like z/OS, OS/390, VM and VSE, as well as IBM minicomputer operating systems like OS/400 and i5/OS. It is also employed on various non-IBM platforms such as Fujitsu-Siemens' BS2000/OSD, HP MPE/iX, and Unisys MCP. It descended from punched cards and the corresponding six bit binary-coded decimal code that most of IBM's computer peripherals of the late 1950s and early 1960s used.

To transform data between EBCDIC and ASCII the UNIX utility dd comes to the rescue - better yet, it comes to the rescue in a very simple way.

ASCII to EBCDIC

$ cat ascii_data_file | dd conv=ebcdic

EBCDIC to ASCII

$ cat ebcdic_data_file | dd conv=ascii
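
For doing the same conversion inside a program rather than at the shell, Python's built-in code pages will do the job. This is only a sketch and it assumes the EBCDIC data uses code page cp037 - dd's conversion table is close to, but not exactly, any single code page, so check which variant your data really is:

#!/usr/bin/env python
# Sketch: EBCDIC <-> ASCII via Python's standard codecs (cp037 is an assumption)
# Output file names here are arbitrary.

ebcdic_data = open('ebcdic_data_file', 'rb').read()
open('ascii_out', 'wb').write(ebcdic_data.decode('cp037').encode('ascii', 'replace'))

ascii_data = open('ascii_data_file', 'rb').read()
open('ebcdic_out', 'wb').write(ascii_data.decode('ascii').encode('cp037'))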

Posted by Bradley Dean | Permalink | Categories: Travel, Programming

Saturday 20th May, 2006 01:31:33 AEST

Perl Programming with Testing

Until I've come up with something to add, for the moment the best way I can think of to describe testing with Perl is to reference several excellent sources of documentation on the subject:

  • Test::Harness - Run Perl standard test scripts with statistics
  • Test::Simple - Basic utilities for writing tests
  • Test::More - yet another framework for writing test scripts
  • prove - A command-line tool for running tests against Test::Harness

Those provide an excellent basis for constructing and running tests in perl.

Once the processes in those documents and modules have become familiar, there are a number of excellent tools that can be used to make testing even more powerful.


Posted by Bradley Dean | Permalink | Categories: Perl, Programming

Saturday 20th May, 2006 00:53:53 AEST

Starting a Catalyst application

Overview

Early experiences with using the Catalyst Web Framework tend to suggest that there are a few common starting points when deploying a new Catalyst application.

These steps take the 'complete' application harness (which is essentially a completely working but completely empty web application) and add a few features to get a little content and functionality into the application.

To begin with it's worth noting that most of the documentation for Catalyst is in the POD - so you can do a lot worse than pointing your web browser at Catalyst when you're looking for help. The next two resources I've found most useful are the mailing list and the Developer Community Site.

Another invaluable starting point is the Catalyst Tutorial.

Once an application has been created it's time to start adding functionality, so without further ado:

catalyst.pl TestApplication

Using Template Toolkit for the primary View

Often the Template Toolkit will be all you need to manage the View output of an application.

The easiest way to start using Template Toolkit as the default view is to create a view using the Template Toolkit View helper:

./script/testapplication_create.pl view TT TT

Once this is done, the view is most easily accessed by using the DefaultEnd Plugin which will direct all responses without content in the body to the first View. This is done by adding DefaultEnd to the use Catalyst part of lib/TestApplication.pm

use Catalyst qw/ -Debug
                 ConfigLoader
                 Static::Simple
                 DefaultEnd
               /;

A few notes on use of the Template Toolkit view:

  1. A more complex template skeleton site can be automatically generated using the Catalyst::Helper::View::TTSite plugin: ./script/testapplication_create.pl view TT TTSite
  2. If another View is added to the application all is not lost - the DefaultEnd plugin has a view configuration directive:
    # In the YAML configuration file: testapplication.yml
    ---
    name: TestApplication
    view: TT
    
    # OR
    
    # In the application module: lib/TestApplication.pm
    __PACKAGE__->config( name => 'TestApplication',
                         view => 'TT',
                       );
    
    

Changing the default page to be a 404: Not Found

Despite 'looking nice', returning a 200 OK for any page requested from a website is very poor form and causes all sorts of problems - for instance when spiders start walking through your site. As the default behaviour of the Catalyst base application is to do just that, this should be fixed in lib/TestApplication/Controller/Root.pm:
#
# Output a friendly welcome message
#
sub default : Private {
    my ( $self, $c ) = @_;

    # If we reach here, the correct response is a 404
    $c->response->status(404);

    # Hello World
    $c->response->body( $c->welcome_message );
}
Of course removing the default Catalyst welcome page is also a good idea but can be done later when you get around to putting content into the site.

Adding a wrapper for the View

Because I didn't use the TTSite helper in this example I'll manually add a Template Toolkit wrapper to make building the rest of the pages a little easier. First - add some configuration to the main application (lib/TestApplication.pm):
__PACKAGE__->config( name => 'TestApplication',
                     view => 'TT',
                     'View::TT' => {
                        INCLUDE_PATH => [
                          __PACKAGE__->path_to('templates'),
                          ],
                        WRAPPER => 'site/wrapper.tt',
                        }
                   );
This sets up a template include path in a templates directory in the root application directory. In addition a wrapper is defined. A Template Toolkit wrapper is a template which is wrapped around all content rendered by the library. When writing a wrapper the content of the page will be inserted where the [% content %] directive is placed. An example wrapper is as follows:
[% DEFAULT title = c.config.name %]
<html>
  <head>
    <title>[% title %]</title>
  </head>
  <body>
    <h1>[% title %]</h1>

[% content %]

  </body>
</html>

Adding a root index page

Having set the application to respond with 404 Not Found by default, perhaps it would be nice to see at least one page that isn't an error page. To add an index/root/welcome page, add a private method called index to the Root controller:
sub index : Private {
    my ( $self, $c ) = @_;

    # Simple index
    $c->{stash}->{title} = "Index";
    $c->{stash}->{text} = qq{
Welcome to the application - nothing to see here yet
};
    $c->{stash}->{template} = "text.tt";
}
In this case the text.tt template is very simple:
[% text %]
For further information on the automagically called private methods of controllers see Catalyst::Manual::Intro.

Recap...

By this point we have a base Catalyst application with the following additions:

  • Template Toolkit is being used to manage rendering for Views
  • The default behaviour of the application has been changed to return 404 Not Found instead of 200 OK
  • A Template Toolkit wrapper has been added to facilitate templated content
  • An index page has been added

This is an excellent starting point for implementing actual functionality with a minimum of effort, so it could well be a good place to stop this article. But because the MVC framework is all about web applications with backend databases it would be remiss of me not to include a reference to a data source (or Model, as data sources are called in the MVC world).

Adding a SQLite Model

The available helper libraries for Catalyst include several ways to easily incorporate SQL databases - including Class::DBI and DBIx::Class. In this example I will use Class::DBI.

To use data modelling properly it should not be a requirement that the model can only be accessed through the Catalyst application. This can be done by defining a Class::DBI library independent of the Catalyst application and then referring to that library. If no library is defined for the database yet you may be able to use the automatic database interrogation done for you by Class::Loader.

I'm using SQLite (http://www.sqlite.org/) because it's very simple to do so - if you haven't seen SQLite before go have a look. :)

To start with - create an SQLite database (in this case in a db subdirectory):
BEGIN TRANSACTION;
CREATE TABLE foo (
    bar VARCHAR(132)
);
INSERT INTO "foo" VALUES('sdfasd');
INSERT INTO "foo" VALUES('sdfasd');
INSERT INTO "foo" VALUES('sdfasd');
INSERT INTO "foo" VALUES('sdfasd');
INSERT INTO "foo" VALUES('sdfasd');
COMMIT;
Then add the model:
$ ./script/testapplication_create.pl model AppDB CDBI dbi:SQLite:/path/to/TestApplication/db/db.sqlite 
Once this is done it's helpful for portability to get rid of the automagically hardwired path to the database - modify the dsn in the created AppDB.pm file from:
    dsn           => 'dbi:SQLite:/path/to/db.sqlite',
to:
    dsn           => 'dbi:SQLite:' . 
                     TestApplication->path_to('db') .
                     '/db.sqlite',
And that's it... No really... To access the Model inside the application use the model method:
my $foo_model = $c->model('AppDB::Foo');
my @foo_rows = $foo_model->retrieve_all();

And all is well...

And there we have it - a dully functioning web application (with no pages or functions to speak of) but incorporating a fully functioning SQL database abstracted through a data model and using a powerful templating language to minimise data rendering complexity. On top of that you have a full MVC application out-of-the-box. At this point it's probably time to start adding some useful functionality to the application - have fun with that.

Posted by Bradley Dean | Permalink | Categories: Perl, Programming

Friday 19th May, 2006 21:46:19 AEST

Which perl module files are being used?

And now for a super-short article with a fast answer to a problem - on the other hand, it's a problem I really needed a quick answer to the other day and couldn't find one. So here's the article:

Question: Which perl module files are being used during the running of a script?

This can be very useful because it's sometimes important to know which of the installed versions of a module is actually being used - in my case I needed to build a set of libraries to deploy onto a server on which I could not easily build libraries. The problem was finding out when my script was quietly grabbing a module from the core perl installation instead of my library bundle.

A couple of interesting discussions on PerlMonks on the matter:

  1. Which Library Am I Using?
  2. look which and from where modules were included

There were a couple of different approaches discussed, the most complex of which involved re-following the @INC array to try to find which library would be used. The problem with that approach is that it's a guess about what will be used rather than a report of what was used.

It turns out there's a very very simple way...

Answer

One of the Perl predefined variables is %INC (not to be confused with the library search path @INC).

As per the perldoc:

  %INC    The hash %INC contains entries for each filename included via the "do", "require",
          or "use" operators.  The key is the filename you specified (with module names con-
          verted to pathnames), and the value is the location of the file found.  The
          "require" operator uses this hash to determine whether a particular file has
          already been included.

          If the file was loaded via a hook (e.g. a subroutine reference, see "require" in
          perlfunc for a description of these hooks), this hook is by default inserted into
          %INC in place of a filename.  Note, however, that the hook may have set the %INC
          entry by itself to provide some more specific info.

So to get a run-time report of what modules are in use, and where the source files were, just print out %INC:

use Data::Dumper;
print Data::Dumper::Dumper(\%INC);

For example:

$ perl -MData::Dumper -MEnglish -MCGI -e 'print Data::Dumper::Dumper(\%INC)'
$VAR1 = {
          'warnings/register.pm' => '/usr/lib/perl5/5.8.6/warnings/register.pm',
          'bytes.pm' => '/usr/lib/perl5/5.8.6/bytes.pm',
          'Carp.pm' => '/usr/lib/perl5/5.8.6/Carp.pm',
          'XSLoader.pm' => '/usr/lib/perl5/5.8.6/i386-linux-thread-multi/XSLoader.pm',
          'English.pm' => '/usr/lib/perl5/5.8.6/English.pm',
          'Exporter/Heavy.pm' => '/usr/lib/perl5/5.8.6/Exporter/Heavy.pm',
          'vars.pm' => '/usr/lib/perl5/5.8.6/vars.pm',
          'strict.pm' => '/usr/lib/perl5/5.8.6/strict.pm',
          'Exporter.pm' => '/usr/lib/perl5/5.8.6/Exporter.pm',
          'constant.pm' => '/usr/lib/perl5/5.8.6/constant.pm',
          'warnings.pm' => '/usr/lib/perl5/5.8.6/warnings.pm',
          'CGI/Util.pm' => '/usr/lib/perl5/5.8.6/CGI/Util.pm',
          'overload.pm' => '/usr/lib/perl5/5.8.6/overload.pm',
          'CGI.pm' => '/usr/lib/perl5/5.8.6/CGI.pm',
          'Data/Dumper.pm' => '/usr/lib/perl5/5.8.6/i386-linux-thread-multi/Data/Dumper.pm'
        };

Here's the perldoc on perlvar.

A note on a gotcha

Of course, there must be a gotcha...

Modifying %INC is a common way to trick Perl into believing that a module has already been loaded (for instance when using something like Test::MockObject) but when that happens the hash value set is usually not a path to a file. That said - this method is not going to be all that reliable if used in an environment in which munging of %INC is going on.


Posted by Bradley Dean | Permalink | Categories: Perl, Programming