Ruby: parse Apache server logs

Wed, 01. Jul 2009

Categories: en sysadmin Tags: apache logfile parse Ruby serverlogs

The class below parses an Apache access log in combined log format and yields a Hash for every line:

    require 'date'

    class Apache
      # Reads Apache access log lines (combined log format) from src
      # and yields one Hash per request.
      def self.each_request(src = $stdin)
        ip_pat   = /(?:[0-9]+\.){3}[0-9]+/
        date_pat = /\[[^\]]+\]/
        req_pat  = /"([A-Z]+)\s([^\s]+)\s([^\s]+)"/
        ref_pat  = /"([^"]+)"/
        # Apache writes a numeric UTC offset such as +0200, hence %z.
        date_fmt = '[%d/%b/%Y:%H:%M:%S %z]'
        # Built as a regexp literal: inside a double-quoted string, "\s"
        # is a plain space, not the whitespace class.
        apache_pat = /(#{ip_pat})\s([^\s]+)\s([^\s]+)\s
                      (#{date_pat})\s#{req_pat}\s([0-9]+)\s(-|[0-9]+)\s
                      #{ref_pat}\s#{ref_pat}/x
        src.each_line do |l|
          m = apache_pat.match(l)
          if m
            r = { :ip => m[1],
                  :uid => m[2],
                  :auth => m[3],
                  :date => DateTime.strptime(m[4], date_fmt),
                  :method => m[5],
                  :url => m[6],
                  :http => m[7],
                  :status => m[8],       # kept as a String
                  :bytes => m[9],        # String; "-" when no body was sent
                  :referrer => m[10],
                  :agent => m[11] }
            yield r
          else
            $stderr.puts "Unparseable line: '#{l.chomp}'"
          end
        end
      end
    end
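
To try the idea on a few lines without a real log file, something like the following sketch can help. It is a condensed stand-in for the class above, written with named captures so that it runs on its own. `LOG_PAT`, `parse_log` and `SAMPLE_LOG` are names made up for this example, and the log lines are invented sample data, not taken from a real server.

```ruby
require 'date'
require 'stringio'

# Condensed stand-in for the parser above (illustrative names, not the
# original class). Named captures become the keys of the yielded Hash.
LOG_PAT = /(?<ip>(?:\d+\.){3}\d+)\s(?<uid>\S+)\s(?<auth>\S+)\s
           (?<date>\[[^\]]+\])\s
           "(?<method>[A-Z]+)\s(?<url>\S+)\s(?<http>\S+)"\s
           (?<status>\d+)\s(?<bytes>-|\d+)\s
           "(?<referrer>[^"]+)"\s"(?<agent>[^"]+)"/x

def parse_log(src)
  src.each_line do |line|
    m = LOG_PAT.match(line)
    next unless m                       # silently skip unparseable lines
    r = m.names.map(&:to_sym).zip(m.captures).to_h
    r[:date] = DateTime.strptime(r[:date], '[%d/%b/%Y:%H:%M:%S %z]')
    yield r
  end
end

# Invented sample lines in combined log format.
SAMPLE_LOG = <<~LOG
  192.0.2.1 - - [01/Jul/2009:12:00:00 +0200] "GET /index.html HTTP/1.1" 200 1024 "-" "curl/7.19"
  192.0.2.2 - - [01/Jul/2009:12:00:05 +0200] "GET /missing HTTP/1.1" 404 - "-" "curl/7.19"
  192.0.2.1 - - [01/Jul/2009:12:00:09 +0200] "POST /form HTTP/1.1" 200 512 "http://example.com/" "curl/7.19"
LOG

# Count requests per status code (status values stay Strings).
status_counts = Hash.new(0)
parse_log(StringIO.new(SAMPLE_LOG)) { |r| status_counts[r[:status]] += 1 }
status_counts.each { |status, n| puts "#{status}: #{n}" }
# prints:
# 200: 2
# 404: 1
```

Because the parser only needs an object that responds to `each_line`, a `StringIO`, an open `File` or `$stdin` can all be passed in the same way.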

There may be faster ways, but this one is quite convenient.