Brad's Blog: August 2008 Archives

Tue Aug 19 22:59:39 BST 2008

Building perl executables on windows with PAR

Quick note - I needed to package up some perl code as a windows executable so I could get it running on a client site without the perl interpreter installed - pretty much the normal reason.

I'd tried a couple of methods and have since used the commercial perl2exe, but thought there must be a better way.

The favourite seems to be PAR and PAR::Packer, and after a little fiddling I got it working with ActiveState perl.

Installing PAR::Packer requires compiling C code, so I needed a compiler, which is when I found MinGW.

So, this is what I needed to do:

Install ActiveState Perl
Install MinGW - making sure gcc is installed
Add the MinGW bin path (C:\MinGW\bin\) to the PATH
Install PAR::Packer from the CPAN shell: run `perl -MCPAN -e shell`, then `install PAR::Packer` at its prompt.
Make sure the perl site bin path (C:\Perl\site\bin) is on the PATH
Start building binaries: `pp -o test.exe test.pl`
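
If the build fails, the usual suspect is one of the PATH steps above. As a rough pre-flight check (a sketch, not part of the original recipe), something like the following looks for `perl`, `gcc` and `pp` in the directories on the PATH. The helper name `find_on_path` and the list of Windows file extensions are my own assumptions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;

# Return the first directory in @dirs that holds a file named after $tool
# (optionally with a common Windows extension), or undef if none does.
# The extension list is an assumption for illustration.
sub find_on_path {
    my ($tool, @dirs) = @_;
    my @names = ($tool, map { "$tool$_" } qw(.exe .bat .cmd));
    for my $dir (@dirs) {
        next unless defined $dir && length $dir;
        for my $name (@names) {
            return $dir if -f File::Spec->catfile($dir, $name);
        }
    }
    return;
}

# File::Spec->path() splits the PATH environment variable portably.
my @path_dirs = File::Spec->path();
for my $tool (qw(perl gcc pp)) {
    my $dir = find_on_path($tool, @path_dirs);
    printf "%-5s %s\n", $tool,
        defined $dir ? "found in $dir" : 'NOT FOUND on PATH';
}
```

If `gcc` or `pp` shows up as not found, revisit the MinGW bin path or the perl site bin path respectively.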

Magic! :)

Posted by Bradley Dean | Permalink

Sun Aug 17 17:44:34 BST 2008

Why do so many organisations break the "multipart/alternative" standard???

Time was, you could open up an email in your favourite email client and read it without too many concerns about how that email was constructed. But times change, or so it would seem...

My pain of the moment is the abuse of the "multipart/alternative" type for multi-format emails: specifically, the almost complete absence of valid text alternatives in these emails.

Just to recap - RFC 2046 section 5.1.4 (Alternative Subtype) states that:

5.1.4.  Alternative Subtype

   The "multipart/alternative" type is syntactically identical to
   "multipart/mixed", but the semantics are different.  In particular,
   each of the body parts is an "alternative" version of the same
   information.

   Systems should recognize that the content of the various parts are
   interchangeable.  Systems should choose the "best" type based on the
   local environment and references, in some cases even through user
   interaction.  As with "multipart/mixed", the order of body parts is
   significant.  In this case, the alternatives appear in an order of
   increasing faithfulness to the original content.  In general, the
   best choice is the LAST part of a type supported by the recipient

...
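
The selection rule in that last quoted sentence is simple enough to sketch. This is only an illustration of "the best choice is the LAST part of a type supported by the recipient", not how mutt or any other client actually implements it; the function name `choose_alternative` and the sample type lists are made up for the example.

```perl
use strict;
use warnings;

# Pick which part of a multipart/alternative body to display.
# $types:     array ref of content types, in the order they appear.
# $supported: hash ref whose keys are the types the local client can show.
# Returns the index of the LAST supported part, or undef if there is none.
sub choose_alternative {
    my ($types, $supported) = @_;
    for my $i (reverse 0 .. $#{$types}) {
        return $i if $supported->{ lc $types->[$i] };
    }
    return;
}

# Hypothetical message with a plain-text part followed by an HTML part.
my @parts         = ('text/plain', 'text/html');
my %text_only     = ('text/plain' => 1);
my %text_and_html = ('text/plain' => 1, 'text/html' => 1);

for my $client (\%text_only, \%text_and_html) {
    my $i = choose_alternative(\@parts, $client);
    print defined $i ? "show part $i ($parts[$i])\n" : "no supported part\n";
}
```

A text-only client ends up on the plain-text part, which is exactly why that part needs to carry the real message.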

I've long since given up configuring my email client (mutt) to indicate that text is the preferred type. The number of emails that arrive claiming to have "multipart/alternative" sections but which contain only "Your email client cannot read this email" in the text section finally meant that I had to process the HTML email sections instead.
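
For anyone who would rather keep text as the preferred type, one possible compromise is to fall back to HTML only when the text part looks like a placeholder. The following is a rough heuristic sketch under my own assumptions: the phrase list, the 500-character cut-off and the function names are all invented for illustration, and real mail will need tuning.

```perl
use strict;
use warnings;

# Phrases that suggest the text part is a placeholder rather than content.
# This list is an assumption, not an exhaustive catalogue.
my @STUB_PATTERNS = (
    qr/cannot read this email/i,
    qr/this message is in MIME format/i,
    qr/does not (?:support|understand)/i,
);

# True if the text part is empty, or is short and matches a stub phrase.
sub looks_like_stub {
    my ($text) = @_;
    return 1 unless defined $text && $text =~ /\S/;
    return 0 if length($text) > 500;    # long bodies are probably real
    for my $re (@STUB_PATTERNS) {
        return 1 if $text =~ $re;
    }
    return 0;
}

# Prefer the text part unless it looks like a stub and HTML is available.
sub preferred_part {
    my ($text, $html) = @_;
    return 'text' unless defined $html;
    return looks_like_stub($text) ? 'html' : 'text';
}

print preferred_part('Your email client cannot read this email', '<p>Hello</p>'), "\n";
print preferred_part('The meeting has moved to 3pm on Friday.', '<p>The meeting...</p>'), "\n";
```

The first call falls back to the HTML part and the second keeps the text part.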

I've almost grudgingly accepted that as just the way it is - but an email I received today tipped the scales. It read (start to finish):

[emailer_top]
[emailer_bottom]

I checked the email - nothing but that and a PDF attached. I tried viewing the email with a different client and finally realised that the entire text of the email was embedded in images!

The email was a "multipart/alternative" - but the text section read:

This message is in MIME format. Since your mail reader does not understand this
format, some or all of this message may not be legible.

Superb...

So this is just a plea into the void - if you receive such emails, contact the sender and tell them what's wrong. If you know that you or your organisation are responsible for such emails being sent - STOP DOING IT. (please!)

Posted by Bradley Dean | Permalink

Tue Aug 5 23:53:29 BST 2008

Quick XML structure summariser

I've recently been dealing with XML data with no schema or DTD available. Just 'pretty printing' the XML goes some way but what I really wanted was a skeleton-view of the XML structure to find patterns.

XML::Parser is a perl module based on the Expat XML Parser library. This gave me everything I needed for a quick script with very helpful output:

#!/usr/bin/env perl

use strict;
use warnings;
use XML::Parser;

# Call handle_start for every opening tag. ErrorContext => 3 shows
# three lines of context either side of any parse error.
my $parser = XML::Parser->new( Handlers => { Start => \&handle_start }
                             , ErrorContext => 3
                             );

# Summarise each file named on the command line.
for my $filepath ( @ARGV ) {
  print "***** Parsing: ${filepath} *****\n";
  eval { $parser->parsefile($filepath) };
  print "XML Parsing Error: $@" if $@;
  print "***** End of: ${filepath} *****\n";
}

# Print the full path of the element being opened and its attribute names.
sub handle_start {
  my ($expat, $element, %attrs) = @_;

  # context() returns the currently open ancestor elements, outermost first.
  print '/' . join('/', $expat->context(), $element);
  if ( %attrs ) {
    print ' ... ATTRS: ' . join(', ', sort keys %attrs);
  }
  else {
    print ' ... NO-ATTRS';
  }
  print "\n";
}

This produces output like:

***** Parsing: /path/to/atom.xml *****
/feed ... ATTRS: version, xmlns, xmlns:dc
/feed/title ... ATTRS: mode
/feed/link ... ATTRS: href, rel, type
/feed/modified ... NO-ATTRS
/feed/author ... NO-ATTRS
/feed/author/name ... NO-ATTRS
/feed/author/url ... NO-ATTRS
/feed/entry ... NO-ATTRS
/feed/entry/title ... ATTRS: mode
/feed/entry/author ... NO-ATTRS
/feed/entry/author/name ... NO-ATTRS
/feed/entry/link ... ATTRS: href, rel, type
/feed/entry/id ... NO-ATTRS
/feed/entry/issued ... NO-ATTRS
/feed/entry/modified ... NO-ATTRS
/feed/entry/created ... NO-ATTRS
/feed/entry/content ... ATTRS: mode, type, xml:lang, xml:space
/feed/entry ... NO-ATTRS
/feed/entry/title ... ATTRS: mode
/feed/entry/author ... NO-ATTRS
/feed/entry/author/name ... NO-ATTRS
/feed/entry/link ... ATTRS: href, rel, type
/feed/entry/id ... NO-ATTRS
/feed/entry/issued ... NO-ATTRS
/feed/entry/modified ... NO-ATTRS
/feed/entry/created ... NO-ATTRS
/feed/entry/content ... ATTRS: mode, type, xml:lang, xml:space
***** End of: /path/to/atom.xml *****
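
Repeated elements such as `/feed/entry` show up once per occurrence, which gets noisy on large files. A possible follow-up step (a sketch, not part of the original script) is to collapse identical lines and count them. The function name `collapse_lines` and the shortened sample input are assumptions for the example.

```perl
use strict;
use warnings;

# Collapse identical "path ... attrs" lines into one line each, keeping
# first-seen order and prefixing each with the number of occurrences.
sub collapse_lines {
    my @lines = @_;
    my (%count, @order);
    for my $line (@lines) {
        chomp(my $copy = $line);
        next if $copy =~ /^\*{5} /;    # skip the "*****" banner lines
        next unless length $copy;
        push @order, $copy unless $count{$copy}++;
    }
    return map { sprintf '%4d  %s', $count{$_}, $_ } @order;
}

# A shortened sample of the summariser output shown above.
my @sample = (
    '***** Parsing: /path/to/atom.xml *****',
    '/feed ... ATTRS: version, xmlns, xmlns:dc',
    '/feed/entry ... NO-ATTRS',
    '/feed/entry/title ... ATTRS: mode',
    '/feed/entry ... NO-ATTRS',
    '/feed/entry/title ... ATTRS: mode',
    '***** End of: /path/to/atom.xml *****',
);
print "$_\n" for collapse_lines(@sample);
```

To use it as a filter on real output, replace `@sample` with `<STDIN>` and pipe the summariser into it.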

Available here: xml_struct.txt

Posted by Bradley Dean | Permalink