:I need to set up some new log reporting for a server which
:handles about 4 million hits a month. Logs are currently kept
:in sets of 2 days of activity.
:
:And doing DNS lookups would take forever on a routine
:basis without some sort of effective caching.
:
:Any suggestions for "big" logs? I'm getting ready to look at mkstats
:and something called analog, I think.
We have about 250 virtual hosts.
I use Apache's custom log format to include '%v' (the virtual host
of the server) and the referer in a single large log file.
I do not have Apache resolve hostnames; I post-process the logs with
logresolve.
The key turned out to be the following:
1. grep out each virtual host to its own file.
2. Run logresolve on each of these files (good locality compared with
   the single composite log). This step gave great speedups.
3. Run analog on the result.
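Step 1, done with one grep per host (as the script below does), reads the whole composite log about 250 times. A minimal single-pass alternative, which is a swapped-in technique and not what I run, lets awk write each line straight to a per-host file. It assumes '%v' is the FIRST field of each log line, which depends on your LogFormat; the function name `split_by_vhost` and the `<vhost>.log` naming are hypothetical.

```shell
# Hypothetical helper: split a composite access log into one file per
# virtual host in a single pass.
# ASSUMPTION: the virtual host (%v) is the first whitespace-separated field.
split_by_vhost() {
    log=$1      # composite access log to read
    outdir=$2   # directory that receives one <vhost>.log file per host
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        # only accept plain hostnames, so a junk first field cannot
        # produce a path outside $outdir
        $1 ~ /^[A-Za-z0-9.-]+$/ {
            f = dir "/" $1 ".log"
            print > f          # truncated on first open, appended after
        }
    ' "$log"
}
```

For example, `split_by_vhost /var/log/apache-ssl-logs/access_old /tmp/byhost`, then run logresolve over each file in /tmp/byhost. With hundreds of hosts, an awk that caps open files may need `close(f)` after each print and `>>` instead of `>` (slower, and you must clear old output first).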
Cheers, Andy!
#! /bin/sh
#
# process http logs
#
# daily script run as the www server
#
# http://www.veryComputer.com/
# For:
# Rocky Mountain Internet http://www.veryComputer.com/
#
# May safely be run many times, but not around the witching hour
# of root's rotation ..
#
GZIP=/usr/local/bin/gzip
BINDIR=/www/apache-ssl
LOGDIR=/var/log/apache-ssl-logs
#
# root turns the log over to this file - now static
#
LASTLOG=$LOGDIR/access_old
# Backups, gzipped
#
ARCHIVE=/var/log/old-apache-logs
#
# site root directories are assumed to be of the form
#
# $SITEROOT/www.wizzy.com/index.html
#
SITEROOT=/www
# Roll time back to get yesterday's month and day: TZ=GMT+19 is UTC-19,
# i.e. 12 hours behind local time at UTC-7
monthday=`TZ=GMT+19 date +%b-%d`
month=`TZ=GMT+19 date +%b`
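onevirtualsite() {
    SITE=$1
    SITEDIR=$SITEROOT/$SITE/statistics
    [ ! -d "$SITEDIR" ] && return                 # no stats dir: outta here
    [ ! -f "$SITEDIR/analog.conf" ] && return     # no analog config: outta here
    LOG=$SITEDIR/$month.html
    # pull this host's lines out of the composite log (-F so the dots in
    # the hostname match literally), resolve them, keep one gzip per day
    grep -F " $SITE " "$LASTLOG" | \
        $BINDIR/logresolve | \
        $GZIP --stdout > "$SITEDIR/access_log.$monthday.gz"
    # rebuild the monthly report from all of this month's daily files
    $GZIP --decompress --stdout "$SITEDIR"/access_log.${month}-??.gz | \
        $BINDIR/analog +g"$SITEDIR/analog.conf" \
        +C"REFEXCLUDE http://$SITE/*" - > "$LOG.new"
    mv -f "$LOG.new" "$LOG"
}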
# Backup, in case we mess up
$GZIP --stdout $LASTLOG > $ARCHIVE/access_log.${monthday}.gz
# virtual IPs
cd "$SITEROOT" || exit 1
for d in www.* ; do
    onevirtualsite "$d"
done
exit 0
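The `TZ=GMT+19` lines work because a POSIX `TZ` offset is the amount added to local time to get UTC, so `GMT+19` describes a clock 19 hours behind UTC. A quick check, assuming GNU `date` (its `-d` option is not POSIX) and using illustrative variable names, compares the trick with an explicit 19-hour rollback:

```shell
# Compare the TZ=GMT+19 trick with an explicit "19 hours ago" in UTC.
# ASSUMES GNU date for -d; LC_ALL=C keeps the %b month name in English.
trick=$(LC_ALL=C TZ=GMT+19 date +%b-%d)
explicit=$(LC_ALL=C date -u -d '19 hours ago' +%b-%d)
echo "TZ trick: $trick   explicit: $explicit"
```

Both should print the same month and day; the script then uses the month on its own to glob all of that month's daily access_log.Mon-DD.gz files.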