[syslog-ng] Playing "catchup"...
Jay Guerette
syslog-ng@lists.balabit.hu
Sun, 2 Jan 2005 00:50:08 -0500
Below is a template for a program I've used, in about 20 different
variations, for the past 4 years. I recycle the same code, and just
change the parsing function.
- It runs as a daemon.
- It follows the specified log file, and presents each new line to the
log_parse function.
- It writes an offset in a conf file after each successful line parsed.
- Whenever it starts, it will start exactly where it left off, and
catch up if necessary.
- It follows log files when they rotate (sort of: it assumes the log
file is named '/var/log/foo.$YEAR.$MONTH.$DAY').
- You can debug/test your parsing by running it with the arg '-nodaemon'.
- If you run it, it will kill any already running instances.
- Yeah sure, there should be more error checking, and this or that
should be written differently, but this WORKS and has never failed me
in 4 years. If I ever have enough free time to add error checking or
rewrite this or that, it probably means I've died.
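For illustration only, here is a minimal sketch of the offset bookkeeping described in the list above, pulled out on its own. The names save_offset and load_offset, and the write-to-temp-file-then-rename step, are assumptions of this sketch; the script below simply rewrites the conf file in place and stores two extra values next to the offset.

```perl
use strict;
use warnings;

# Hypothetical helper: record the byte offset of the last fully parsed line.
# Writes to a temp file first and then renames it, so a crash in the middle
# of the write cannot leave a truncated conf file behind.
sub save_offset {
    my ($conf, $offset) = @_;
    my $tmp = "$conf.tmp";
    open(my $fh, '>', $tmp) or die "cannot write $tmp: $!";
    print {$fh} "$offset\n";
    close($fh) or die "cannot close $tmp: $!";
    rename($tmp, $conf) or die "cannot rename $tmp to $conf: $!";
    return;
}

# Hypothetical helper: return the offset to resume from.
# No conf file yet means a first run, so start at the end of the log.
# A saved offset beyond the end of the log is clamped to the log size.
sub load_offset {
    my ($conf, $log) = @_;
    my $size = (-s $log) || 0;
    return $size unless -e $conf;
    open(my $fh, '<', $conf) or die "cannot read $conf: $!";
    my $line = <$fh>;
    close($fh);
    my $offset = 0;
    $offset = $1 if defined($line) && $line =~ /^(\d+)/;
    return $offset > $size ? $size : $offset;
}
```

Because the offset is saved only after a line has been parsed, a restart resumes right after the last line that was handled, which is what keeps a catch-up run from importing the same line twice.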
I have 8 of these running on 7 different log files on a busy
production system, currently handling about 13M lines a day.
Somewhere I have a fully commented version lying around. :-P If you
have trouble untangling it, let me know and I'll help you out.
<code>
#! /usr/bin/perl
$log = '/var/log/foo';
$args = join(' ',@ARGV);
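($name) = ($0=~/([\s\w\-_]+)$/); $0 = $name;
# define the conf and pid file paths before anything below uses them
$conf = "/var/run/$name.conf";
$pid = "/var/run/$name.pid";
if ($args!~/-nodaemon/i) { &daemonize; }
else { select(STDOUT); $|=1; }
if (open(PID,"<$pid")) {
# only signal a real pid: an empty or garbled pid file would otherwise signal pid 0
kill(15, $1) if (defined($lastpid = <PID>) && $lastpid =~ /^([1-9]\d*)/);
close(PID);
}
open PID,">$pid"; print PID "$$\n"; close PID;
foreach (`ps axo pid,cmd`) {
next unless (/$name/i);
# skip lines with no extractable pid: kill() on an undefined pid signals the whole process group
next unless (($oldpid) = (/(\d+)\s+$name/i));
next if ($oldpid == $$);
print "TERM $name [$oldpid]\n";
kill('TERM',$oldpid); sleep(2);
if (kill(0,$oldpid)) {
print "KILL $name [$oldpid]\n";
kill('KILL',$oldpid); sleep(2);
}
die "couldn't KILL existing process!\n" if kill(0,$oldpid);
}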
($hup,$term,$maxread,$maxbytes,$byteoffset)=(0,0,15,0,0);
$SIG{HUP}=sub{ $hup=1; };
$SIG{INT}=sub{ $term=1; };
$SIG{TERM}=sub{ $term=1; };
setpriority 0,$$,-5; # 0 is PRIO_PROCESS
while (!$hup && !$term) {
if (@log_buffer = check_log()) {
foreach $line (@log_buffer) {
$maxread = $length if (($length = 1 + length($line)) > $maxread);
log_parse($line);
open CONF, ">$conf"; print CONF "$maxread $maxbytes ".($byteoffset += $length); close CONF;
sysseek(LOG, $byteoffset, 0);
}
@log_buffer = ();
}
select(undef, undef, undef, 0.05);
}
if ($term || ($args=~/-nodaemon/i)) {
exit;
}
defined(my $parent=fork) or die;
exit if ($parent);
exec($name);
exit;
sub daemonize {
chdir '/' or die;
open STDIN,'</dev/null' or die "Couldn't break from STDIN?\nAborting.\n";
open STDOUT,'>/dev/null' or die "Couldn't break from STDOUT?\nAborting.\n";
defined(my $pid=fork) or die "Couldn't fork!\nAborting.\n";
exit if $pid;
use POSIX 'setsid';
POSIX::setsid or die "Couldn't set SID?\nAborting.\n";
open STDERR,'>/dev/null' or die "Couldn't break from STDERR?\nAborting.\n";
}
sub log_open {
($mday,$mon,$year) = (localtime)[3..5];
$current = sprintf("%d.%02d.%02d", 1900+$year, 1+$mon, 0+$mday);
sysopen(LOG, "$log.$current", 0);
unlink($conf) if ($args=~/-init/i);
if (-e $conf) {
open CONF,"<$conf"; ($maxread,$maxbytes,$byteoffset)=split(' ',<CONF>); close CONF;
$byteoffset = $byteeof if ($byteoffset > ($byteeof = 0 + sysseek(LOG, 0, 2)));
$byteoffset = 0 + sysseek(LOG, $byteoffset, 0);
}
else {
$byteoffset = 0 + sysseek(LOG, 0, 2);
}
open CONF,">$conf"; print CONF "$maxread $maxbytes $byteoffset"; close CONF;
print "opened: $log.$current\n";
}
sub log_close {
utime(undef, undef, $conf);
close LOG;
print "closed: $log.$current\n";
}
sub check_log {
if ($log_wait) {
return if (time() < $log_wait);
undef($log_wait);
log_open();
return;
}
if ($stat_log) {
return if (time() < $stat_log);
undef($stat_log);
($mday,$mon,$year) = (localtime)[3..5];
my $stamp = sprintf("%d.%02d.%02d", 1900+$year, 1+$mon, 0+$mday);
if ($stamp ne $current || (stat("$log.$current"))[7] < $byteoffset) {
log_close();
$log_wait = time() + 1;
return;
}
}
$bytes = sysread(LOG, $buffer, $maxread);
if (!defined($bytes)) {
$log_wait = time() + 1;
return;
}
unless ($bytes) {
$stat_log = time() + 5;
return;
}
while ($bytesread = sysread LOG, $buffer, $maxread, $bytes) {
last if (($bytes += $bytesread) > $maxbytes && $maxbytes > 0);
}
$buffer = substr($buffer, 0, $last) if (($last = rindex($buffer, "\n")) >= 0);
foreach $part (split('\n', $buffer)) {
push(@log_buffer, $part);
}
return(@log_buffer);
}
sub log_parse {
my $line = shift;
$line =~ s/[\r\n\s]+/ /g;
return if ($line =~ /^\s*$/);
print "$line\n";
# do something here
}
</code>
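The log_parse stub above only prints each line. As one possible way to fill in the "do something here" part, here is a sketch that splits a classic syslog-style line into fields. The function name parse_syslog_line, the field names, and the assumed line layout ('Mon DD HH:MM:SS host program[pid]: message') are all illustrative assumptions; adjust the pattern to whatever your own log template actually writes.

```perl
use strict;
use warnings;

# Hypothetical parser: split one syslog-style line into named fields.
# Returns a hash reference, or undef if the line does not fit the
# assumed layout. The [pid] part is optional.
sub parse_syslog_line {
    my ($line) = @_;
    return undef unless defined $line;
    if ($line =~ /^(\w{3})\s+(\d{1,2})\s+(\d\d:\d\d:\d\d)\s+(\S+)\s+([^\s\[:]+)(?:\[(\d+)\])?:\s*(.*)$/) {
        return {
            month   => $1,
            day     => $2,
            time    => $3,
            host    => $4,
            program => $5,
            pid     => $6,    # undef when the line has no [pid]
            message => $7,
        };
    }
    return undef;
}
```

Inside log_parse you would call it on the cleaned-up $line and return early when it gives back undef, so lines that do not match are skipped while the saved offset still moves past them.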
On Sat, 01 Jan 2005 18:32:01 -0800, Ed Walker <ewalker@surfcity.net> wrote:
> In the event that SQL injection dies on the central loghost, we've thought
> of keeping a copy in a file as well.
>
> And if, for some reason, syslog-ng on the central server dies, or
> connectivity is lost, we're already keeping a copy of the logging data on
> each remote server.
>
> So, what's the best way to play "catch up" when something dies, and making
> it as easy as possible to import the missed data, without accidentally
> introducing duplication?