A handy dandy regular expression to parse out fields from Apache’s combined log format:
/^
(\S+)\s # requestor
(\S+)\s # ?
(\S+)\s # ?
\[([^\]]*)\]\s # time
"([^"]*)"\s # URL
(\d*)\s # result
(\d*)\s # bytes
"([^"]*)"\s # referrer
"([^"]*)" # user agent
$/x
Thanks for this, very useful. I updated the code to fill in your question marks:
/^
(\S+)\s # requestor
(\S+)\s # RFC 1413 identity of the client determined by identd (highly unreliable – do not use)
(\S+)\s # http userid
\[([^\]]*)\]\s # time
"([^"]*)"\s # URL
(\d*)\s # result
(\d*)\s # bytes
"([^"]*)"\s # referrer
"([^"]*)" # user agent
$/x;
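For anyone who wants to try the commented pattern outside Perl, here is a minimal sketch of a Python port, with `re.VERBOSE` standing in for `/x`. The field names, the `parse_line` helper and the sample log line are my own illustrative choices, not part of the original post.

```python
import re

# Python port of the commented pattern; re.VERBOSE plays the role of /x.
COMBINED = re.compile(r'''
    ^
    (\S+)\s          # requestor
    (\S+)\s          # RFC 1413 identity (identd)
    (\S+)\s          # http userid
    \[([^\]]*)\]\s   # time
    "([^"]*)"\s      # URL (the whole request line)
    (\d*)\s          # result
    (\d*)\s          # bytes
    "([^"]*)"\s      # referrer
    "([^"]*)"        # user agent
    $
''', re.VERBOSE)

# Hypothetical field names, in capture-group order.
FIELDS = ("requestor", "identity", "userid", "time", "url",
          "result", "bytes", "referrer", "user_agent")

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED.match(line.rstrip("\n"))
    return dict(zip(FIELDS, m.groups())) if m else None

# Illustrative sample line in combined log format.
SAMPLE = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" '
          '"Mozilla/4.08 [en] (Win98; I ;Nav)"')

entry = parse_line(SAMPLE)
print(entry["requestor"], entry["result"], entry["bytes"])
```

Every captured value comes back as a string, so convert the result and bytes fields separately if you need numbers.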
The bytes field, (\d*)\s, can also be a "-" character when no bytes were sent, so I suggest using:
(\S*)\s
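To illustrate the point about the bytes field, here is a small Python sketch run against a log fragment whose bytes field is "-". It uses `\S*` as the looser capture, which is my assumption about the intent. The `bytes_to_int` helper and the sample fragment are hypothetical.

```python
import re

# Only the request/result/bytes part of the pattern, to keep the example short.
STRICT = re.compile(r'"([^"]*)"\s(\d*)\s(\d*)\s')   # bytes must be digits
LOOSE = re.compile(r'"([^"]*)"\s(\d*)\s(\S*)\s')    # bytes may also be "-"

# Illustrative fragment: a response that sent no body.
fragment = '"GET /missing HTTP/1.0" 304 - "-" "-"'

strict_match = STRICT.match(fragment)   # fails on the "-"
loose_match = LOOSE.match(fragment)     # captures "-" as the bytes field

def bytes_to_int(field):
    """Treat "-" (or an empty field) as zero bytes."""
    return int(field) if field.isdigit() else 0

print(strict_match)
print(loose_match.groups(), bytes_to_int(loose_match.group(3)))
```

A tighter alternative would be `(\d+|-)`, which accepts only digits or a single dash.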
The user agent may (and sometimes does) contain escaped quotes, e.g. "\"Mozilla/4.0\"", so I'd suggest "(.*)" for that field.
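A quick Python sketch of the escaped-quote case: the `[^"]*` form gives up at the first inner quote, while the greedy `(.*)` form works because the user agent is the last field and the pattern is anchored with `$`. The third pattern is my own alternative rather than something suggested above. It skips over backslash-escaped characters explicitly, which would also be safe for fields in the middle of the line.

```python
import re

# The user-agent field exactly as it would appear at the end of a log line.
ua_field = r'"\"Mozilla/4.0\""'

NO_QUOTES = re.compile(r'"([^"]*)"$')               # original form
GREEDY = re.compile(r'"(.*)"$')                     # suggested form
ESCAPE_AWARE = re.compile(r'"((?:[^"\\]|\\.)*)"$')  # alternative (assumption)

no_quotes_match = NO_QUOTES.match(ua_field)         # stops at the first \"
greedy_match = GREEDY.match(ua_field)
escape_aware_match = ESCAPE_AWARE.match(ua_field)

print(no_quotes_match)
print(greedy_match.group(1))
print(escape_aware_match.group(1))
```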