apache log backups

Darxus at ChaosReigns.com
Wed Jan 10 15:32:28 GMT 2007


On 01/09, Wayne Davison wrote:
> > example, www.chaosreigns.com-access.log.196.gz on the origin is the same
> > file as www.chaosreigns.com-access.log.186.gz on the destination, so

> The --fuzzy option might help, but only if the filenames that moved
> don't already exist.  Rsync expects that an existing file is the right

No, most of the files already exist.

> file to use for old data, and never checks for a different file as a
> better source of old data.  For the case of rotated logs, you're better
> off first running an appropriate number of file rotations on the
> receiving system, and then doing a copy of the new files.

Surely an external script running on the other end trying to figure out how
to get the data on the destination to match the origin before rsyncing is
not the ideal way to handle this.

I realize handling it with checksums might be overkill.  What about this?
Any time rsync comes across a file on the origin whose name includes
.log.<number>.gz (or .bz2), and the file of that name on the destination
doesn't match, it looks on the destination for a file with the same size
and modification time but a different number.  It remembers the difference
between the two numbers for that directory, and for the rest of the
comparisons in that directory it tries the file at that offset first,
then falls back to the same file name as usual (per file).
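
To make the naming part concrete, here is a rough sketch in Python (not
rsync's C, and not anything rsync actually contains).  The regex and the
names split_rotated and renumbered are made up for illustration; it only
shows how a rotated log name could be split and a candidate name at a
given offset built.

```python
import re

# Hypothetical pattern for rotated logs: <prefix>.log.<number>.gz or .bz2
ROTATED = re.compile(r"^(?P<prefix>.+\.log)\.(?P<num>\d+)\.(?P<ext>gz|bz2)$")

def split_rotated(name):
    """Return (prefix, number, ext) for a rotated log name, else None."""
    m = ROTATED.match(name)
    if m is None:
        return None
    return m.group("prefix"), int(m.group("num")), m.group("ext")

def renumbered(name, offset):
    """Return the name with its rotation number shifted by offset.

    Returns None if the name is not a rotated log or the shifted
    number would be negative.
    """
    parts = split_rotated(name)
    if parts is None:
        return None
    prefix, num, ext = parts
    if num + offset < 0:
        return None
    return f"{prefix}.{num + offset}.{ext}"
```

The destination candidate for a file on the origin would then be
renumbered(name, offset), compared by size and modification time.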

For example, it comes across www.chaosreigns.com-access.log.196.gz on the
origin, and notices that that file on the destination doesn't match time
and filesize, so it looks at the files in this sequence:

www.chaosreigns.com-access.log.197.gz
www.chaosreigns.com-access.log.195.gz
www.chaosreigns.com-access.log.198.gz
www.chaosreigns.com-access.log.194.gz
www.chaosreigns.com-access.log.199.gz
www.chaosreigns.com-access.log.193.gz
www.chaosreigns.com-access.log.200.gz
www.chaosreigns.com-access.log.192.gz
www.chaosreigns.com-access.log.201.gz
www.chaosreigns.com-access.log.191.gz
www.chaosreigns.com-access.log.202.gz
...
www.chaosreigns.com-access.log.187.gz
www.chaosreigns.com-access.log.206.gz
www.chaosreigns.com-access.log.186.gz

Until it gets to www.chaosreigns.com-access.log.186.gz, notices that it does
match in timestamp and file size, and records (only while it's in this
directory) that the offset is -10.  So when it looks at the file named,
say, www.chaosreigns.com-access.log.39.gz, it first checks
www.chaosreigns.com-access.log.29.gz on the destination and, if that
matches, renames it to www.chaosreigns.com-access.log.39.gz.
If it doesn't match, it behaves as it normally would, transferring
www.chaosreigns.com-access.log.39.gz from the origin.
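
The search and the remembered offset could look roughly like the Python
sketch below.  This is only an illustration of the idea, not rsync code:
probe_offsets, find_offset, plan and the limit parameter are invented
names, and each directory is modeled as a dict mapping rotation number
to a (size, mtime) tuple.

```python
def probe_offsets(limit):
    """Yield +1, -1, +2, -2, ... up to +/-limit (limit is an assumed cap)."""
    for d in range(1, limit + 1):
        yield d
        yield -d

def find_offset(origin, dest, number, limit=50):
    """Return the offset at which dest holds origin[number], or None.

    origin and dest map rotation number -> (size, mtime).
    """
    want = origin[number]
    for off in probe_offsets(limit):
        if dest.get(number + off) == want:
            return off
    return None

def plan(origin, dest, limit=50):
    """Return (renames, transfers) for one directory.

    renames is a list of (dest_number, new_number) pairs; transfers is
    a list of origin numbers that still have to be copied.
    """
    offset = None  # remembered only for this directory
    renames, transfers = [], []
    # Highest number first, so that with a negative offset a destination
    # file is used as a rename source before its name is overwritten.
    # A positive offset would need the opposite order (not handled here).
    for n in sorted(origin, reverse=True):
        if dest.get(n) == origin[n]:
            continue  # same name already matches on size and mtime
        if offset is None:
            offset = find_offset(origin, dest, n, limit)
        if offset is not None and dest.get(n + offset) == origin[n]:
            renames.append((n + offset, n))
        else:
            transfers.append(n)  # fall back to a normal transfer
    return renames, transfers
```

Only size and modification time are compared here, as in the proposal,
so two different files that happen to share both would be treated as
the same file.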

Any chance of that feature being added?

These logs being unnecessarily transferred accounted for about 55% of the
data in my 26-hour rsync... about 14 unnecessary hours.

-- 
"Blessed are they who, in the face of death, think only about the
front sight."
http://www.ChaosReigns.com

