filter_apache_log.pl

A simple script that allows specific fields from an Apache log to be viewed. Only the combined log format is implemented at this point. Other log formats can be handled by extending the code.
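Here is a minimal Perl sketch of the field extraction, which splits one combined-format line into named fields. It follows the script's regular expression, except that it accepts any non-space token in the IP position. The parse_combined helper and the sample line are hypothetical and are not part of filter_apache_log.pl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of filter_apache_log.pl): split one line in
# Apache combined log format into named fields. Returns a hash reference,
# or nothing (undef in scalar context) if the line does not match.
sub parse_combined {
    my ($line) = @_;
    my @names = qw(ip ident user date path response size referrer useragent);
    my @match = $line =~ /
        ^\s*
        (\S+)\s          # ip or hostname
        (\S+)\s          # ident
        (\S+)\s          # user
        (\[.*?\])\s+     # date
        (".*?")\s+       # request line
        (\S+)\s+         # response status
        (\S+)\s+         # size
        (".*?")\s+       # referrer
        (".*?")          # user-agent
    /x or return;
    my %field;
    @field{@names} = @match;    # hash slice: pair each name with its capture
    return \%field;
}

# Illustrative, made-up log line
my $sample = '127.0.0.1 - - [01/Jan/2013:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "monit/4.8.1"';
my $f = parse_combined($sample);
print "$f->{ip} $f->{useragent}\n";    # prints: 127.0.0.1 "monit/4.8.1"
```

The quoted fields keep their surrounding double quotes, as they do in the script's output.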

The script filters out some lines (e.g. blank lines) and passes some lines through unchanged (e.g. the file name headers from a multi-file tail). It aborts on any unknown line, so that you know to handle, skip or pass through such lines.
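The three-way line handling can be sketched as a small Perl function. This is a hypothetical helper (classify_line is not a name used in the script) that reuses the script's two patterns for tail headers and blank lines. Lines classed as "parse" are the ones the script hands to its log-format regex, and the script dies if that regex fails.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script) mirroring its per-line checks:
#   'pass'  - tail header lines such as "==> access.log <==", printed unchanged
#   'skip'  - empty or whitespace-only lines, dropped
#   'parse' - everything else; the script then applies its log-format regex
#             and dies with "Unmatched log line" if that fails
sub classify_line {
    my ($line) = @_;
    return 'pass' if $line =~ /^==>.*<==$/;
    return 'skip' if $line =~ /^\s*$/;
    return 'parse';
}

my @samples = (
    "==> access.log <==\n",
    "   \n",
    qq{127.0.0.1 - - [01/Jan/2013:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "monit/4.8.1"\n},
);
print classify_line($_), "\n" for @samples;    # prints pass, skip, parse (one per line)
```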

Also available as a gist: https://gist.github.com/bjdean/5726807#file-filter_apache_log-pl

Example usage

Tail all Apache access logs and look at IPs and User-Agents:

/var/log/apache2$ tail -f *access*log | filter_apache_log.pl -ip --usera | head -30
==> 60iv.aicsa.org.au-access.log <==
66.249.73.16 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 
66.249.73.16 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 
80.57.78.214 "Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.majestic12.co.uk/bot.php?+)" 
80.57.78.214 "Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.majestic12.co.uk/bot.php?+)" 
91.64.153.168 "Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.majestic12.co.uk/bot.php?+)" 
91.64.153.168 "Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.majestic12.co.uk/bot.php?+)" 
91.64.153.168 "Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.majestic12.co.uk/bot.php?+)" 
91.64.153.168 "Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.majestic12.co.uk/bot.php?+)" 
100.43.83.153 "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 
100.43.83.153 "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 
==> access.log <==
127.0.0.1 "monit/4.8.1" 
127.0.0.1 "monit/4.8.1" 
127.0.0.1 "libwww-perl/5.808" 
127.0.0.1 "libwww-perl/5.808" 
127.0.0.1 "libwww-perl/5.808" 
127.0.0.1 "monit/4.8.1" 
127.0.0.1 "libwww-perl/5.808" 
127.0.0.1 "libwww-perl/5.808" 
127.0.0.1 "libwww-perl/5.808" 
127.0.0.1 "monit/4.8.1" 
==> aicsa.org.au-access.log <==
216.172.141.107 "Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0.1" 
183.221.250.141 "Mozilla/5.0 (Linux; U; Android 2.2; fr-fr; Desire_A8181 Build/FRF91) App3leWebKit/53.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1" 
184.154.124.146 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; PeoplePal 6.2)" 
5.39.95.193 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1" 
5.39.95.193 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1" 
5.39.95.193 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1" 
216.152.251.37 "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.02" 

Source


#!/usr/bin/perl

# filter_apache_log.pl - quick filter of apache logs to show specific fields
# Copyright (C) 2008 Bradley Dean <bjdean@bjdean.id.au>
# 
# This program is free software: you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software
# Foundation, either version 3 of the License, or (at your option) any later
# version.
# 
# This program is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.  See the GNU General Public License for more details.
# 
# You should have received a copy of the GNU General Public License along with
# this program.  If not, see <http://www.gnu.org/licenses/>.

# A simple script that allows specific fields from an Apache log to be
# viewed. Only the combined log format is implemented at this point; other
# log formats can be handled by extending the code.
#
# The script filters out some lines (e.g. blank lines), passes some lines
# through unchanged (e.g. the file name headers from a multi-file tail) and
# aborts on any unknown line, so that you know to handle, skip or pass
# through such lines.
#
# Example usage: look at IPs and User-Agents
#
#  $ tail -f /var/log/apache/*access*log | filter_apache_log.pl -ip -useragent

use strict;
use warnings;

use Getopt::Long;
use IO::Handle;

# Disable output buffering so each filtered line appears immediately
STDOUT->autoflush(1);

# Command-line arguments
my $show_all;
my (
        $show_ip, $show_ident, $show_user, $show_date, $show_path, $show_response,
        $show_size, $show_referrer, $show_useragent
);
my $result = GetOptions(
        "ip" => \$show_ip,
        "ident" => \$show_ident,
        "user" => \$show_user,
        "date" => \$show_date,
        "path" => \$show_path,
        "response" => \$show_response,
        "size" => \$show_size,
        "referrer" => \$show_referrer,
        "useragent" => \$show_useragent,
        "all" => \$show_all,
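) or die "Usage: $0 [-all] [-ip] [-ident] [-user] [-date] [-path] [-response] [-size] [-referrer] [-useragent]\n";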

# Read and filter
LINE:
while ( my $line = <STDIN> ) {
        # Special lines - pass through
        if (
                # tail -f file names
                $line =~ /^==>.*<==$/
        ) {
                print $line;
                next LINE;
        }

        # Special lines - skip
        if (
                # empty lines
                $line =~ /^\s*$/
        ) {
                next LINE;
        }

        # Apache line formats
        my ($ip, $ident, $user, $date, $path, $response, $size, $referrer, $useragent);
        if ( my @match = $line =~ /
                                ^\s*
                                (\S+)\s # ip (or hostname)
                                (\S+)\s # ident
                                (\S+)\s # user
                                (\[.*?\])\s+ # date
                                (".*?")\s+ # path
                                (\S+)\s+ # response
                                (\S+)\s+ # size
                                (".*?")\s+ # referrer
                                (".*?") # user-agent
                                /x ) {
                ($ip, $ident, $user, $date, $path, $response, $size, $referrer, $useragent) = @match;
                # Build the output from the requested fields (named $out so it
                # does not shadow the input $line)
                my $out = "";
                $out .= fmt_val($ip) if ( $show_all || $show_ip );
                $out .= fmt_val($ident) if ( $show_all || $show_ident );
                $out .= fmt_val($user) if ( $show_all || $show_user );
                $out .= fmt_val($date) if ( $show_all || $show_date );
                $out .= fmt_val($path) if ( $show_all || $show_path );
                $out .= fmt_val($response) if ( $show_all || $show_response );
                $out .= fmt_val($size) if ( $show_all || $show_size );
                # An empty referrer ("-") is omitted entirely
                $out .= fmt_val($referrer) if ( $referrer ne '"-"' ) && ( $show_all || $show_referrer );
                $out .= fmt_val($useragent) if ( $show_all || $show_useragent );
                print "${out}\n" if ( $out =~ /\S/ );
                next LINE;
        }
        else {
                die "Unmatched log line: ${line}";
        }
}

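# Format one field for output followed by a space; a missing or empty value
# becomes a single space. (A plain truth test would also swallow a literal
# "0", such as a zero response size.)
sub fmt_val {
        my ($val) = @_;
        if ( defined $val && length $val ) {
                return "${val} ";
        }
        else {
                return " ";
        }
}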
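The script only implements the combined log format. If you want to support more than one format, one possible way to extend it is to try a list of named patterns in order. The sketch below is hypothetical (parse_any and @FORMATS are not in the script) and adds Apache's Common Log Format, which is the combined format without the referrer and user-agent fields.

```perl
use strict;
use warnings;

# Hypothetical extension: try each known format in order and return the
# format name plus a hash of named fields. Combined is listed first so the
# more specific format gets the first chance at each line.
my @FORMATS = (
    {
        name   => 'combined',
        fields => [qw(ip ident user date path response size referrer useragent)],
        re     => qr/^\s*(\S+)\s(\S+)\s(\S+)\s(\[.*?\])\s+(".*?")\s+(\S+)\s+(\S+)\s+(".*?")\s+(".*?")/,
    },
    {
        name   => 'common',
        fields => [qw(ip ident user date path response size)],
        re     => qr/^\s*(\S+)\s(\S+)\s(\S+)\s(\[.*?\])\s+(".*?")\s+(\S+)\s+(\S+)\s*$/,
    },
);

sub parse_any {
    my ($line) = @_;
    for my $fmt (@FORMATS) {
        my @match = $line =~ $fmt->{re} or next;
        my %field;
        @field{ @{ $fmt->{fields} } } = @match;
        return ( $fmt->{name}, \%field );
    }
    return;    # unknown format: the caller decides whether to die
}

# Illustrative, made-up common-format line
my ( $name, $f ) = parse_any(qq{10.0.0.1 - - [01/Jan/2013:00:00:00 +0000] "GET / HTTP/1.0" 200 512\n});
print "$name $f->{ip}\n";    # prints: common 10.0.0.1
```

With this design, the main loop would call parse_any and keep the existing die for lines that no pattern matches, so unknown lines still abort the run.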

BradsWiki: Programming Notes/PerlProgramming/ApacheLogFilter (last edited 2013-06-07 03:18:32 by BradleyDean)