rsync problem and question about using rsync with Maildir

jw schultz jw at pegasys.ws
Fri Aug 22 08:29:16 EST 2003


On Thu, Aug 21, 2003 at 06:42:17AM -0700, Zachary Denison wrote:
> 
> Hi,
> 
> I have a Maildir store (about 500 GB) on a linux
> (redhat 8) server which I am trying to mirror to
> another identical server.  I have 4 GB of ram on both
> machines.  I am using rsync 2.5.5.  At present the
> machines are on the same lan (100Mbit).
> 
> Every time I run the rsync, it runs for a short amount
> of time, (anywhere from 5-20 minutes) and then one of
> the machines (either the source or the destination
> machine) crashes and requires reboot.

That the machines crash is a kernel bug.  There may be
changes to how you run rsync that avoid it, but any crash is
the kernel's fault.  Since you are running Red Hat, check
their errata kernels.  With 4GB of RAM you may be running
out of ZONE_NORMAL (low memory) and hitting a bounce-buffer
failure.  It may pay to boot with only 900M enabled (for
example with the mem=900M kernel parameter) to test that.

> The actual command I am using is:
> 
> /usr/bin/rsync -qaz --rsh=ssh --stats --progress
> --rsync-path=/usr/bin/rsync --delete --force /users/
> 10.10.12.161:/users/
> 
> Examination of the system's "free" command output during
> rsync execution shows that rsync rapidly consumes the
> system's memory.

You describe this area as a Maildir store.  If it really is
in Maildir or MH format (as opposed to mbox), that means one
file per message.  Rsync's memory requirement grows linearly
with file count.  This memory consumption occurs during the
"building file list" period prior to syncing files.  If the
crash is happening during that interval then reducing the
size of the file list will probably avoid the problem.
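
Before splitting the job up, it helps to see where the files
are.  A minimal sketch, assuming the per-user layout
/users/<name>/ implied by the command above; the function
name file_counts is hypothetical.  A commonly quoted rule of
thumb (from the rsync FAQ) is on the order of 100 bytes of
memory per file-list entry on each end, so treat any estimate
built from these counts as approximate, not measured.

```shell
#!/bin/sh
# file_counts ROOT  (hypothetical helper)
# Print "<file count><TAB><directory>" for each top-level directory
# under ROOT, largest first.  Each count approximates how many
# entries that user contributes to rsync's file list.
file_counts() {
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue            # glob matched nothing
    n=$(find "$dir" -type f | wc -l)
    printf '%d\t%s\n' "$n" "$dir"
  done | sort -rn
}

# Example: file_counts /users | head
```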

Reducing the file list size should be fairly easy to do.
Just doing a separate rsync invocation for each user should
break it up sufficiently.
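
A minimal sketch of the per-user approach, reusing the options
from your command (minus -q and --progress, which contradict
each other).  The function name sync_users is hypothetical, and
it assumes every user lives in a directory directly under
/users/.

```shell
#!/bin/sh
# sync_users SRC_ROOT DEST_SPEC  (hypothetical helper)
# Runs one rsync per top-level user directory, so each invocation
# builds a file list for a single user instead of the whole tree.
# Example: sync_users /users 10.10.12.161:/users
sync_users() {
  src_root=$1
  dest=$2
  status=0
  for dir in "$src_root"/*/; do
    [ -d "$dir" ] || continue            # glob matched nothing
    user=$(basename "$dir")              # "/users/alice/" -> "alice"
    if ! rsync -az --delete --force --rsh=ssh \
         --rsync-path=/usr/bin/rsync "$dir" "$dest/$user/"; then
      echo "rsync failed for $user" >&2  # keep going with other users
      status=1
    fi
  done
  return $status
}
```

One caveat of this design: --delete now only applies inside each
user's directory, so a user removed from the source entirely is
not removed from the mirror and has to be handled separately.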

> Is this an appropriate use for rsync?  My goal is to

yes.

> be able to first synchronize the maildirs on a 100mb
> lan.  Then ship the destination machine to a remote
> location and then run periodic rsync backups over WAN
> (each site has regular 1.5mb connection to internet)
> to the remote destination server as a backup.  

Perfectly sensible.

> Does this scenario sound feasible given that the users
> directories will contain ONLY Maildirs.

yes.

> Also since Maildirs contain a large number of files
> does it make sense to tar and/or gzip each users
> Maildir and rsync the tar files?

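Not really, and least of all the gzip.  Once a file is
compressed, a small change near the start alters everything
after it, which defeats rsync's delta transfer.  The -z you
are already using compresses the data on the wire anyway.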

> Also is rsync over ssh contributing to my problem,

No.

> does it make sense to run an rsync server instead?

Not if there is confidential information: the rsync daemon
protocol itself is not encrypted, whereas ssh is.

> Thank you very much in advance for any hints you can
> give me.

One thing I should mention.  Maildir files are seldom, if
ever, modified.  In fact, unless you have a user interface
that allows users to edit a message, I don't think (I could
be wrong) the files will ever be modified, only created,
renamed and deleted.  Even user-level message editing
normally creates a new message and deletes the old one.
Instead of modifying the files, maildir uses file name and
location to indicate status.  New messages sit in the new/
directory; once seen they are moved to cur/, and flags such
as "seen" or "replied" are encoded in the file name.
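
To make the renaming concrete, here is a small sketch that plays
out one message's life in a scratch directory.  It assumes the
usual Maildir conventions: delivery writes into tmp/ and renames
into new/, and a mail reader moves a seen message into cur/ with
an info suffix such as ":2,S" (S meaning Seen).  The message name
is made up.

```shell
#!/bin/sh
# Simulate one Maildir message's life cycle in a scratch directory.
md=$(mktemp -d)/Maildir
mkdir -p "$md/tmp" "$md/new" "$md/cur"

# Delivery: write into tmp/, then rename into new/ (made-up name).
name=1061470000.M1P2.example
echo "Subject: test" > "$md/tmp/$name"
mv "$md/tmp/$name" "$md/new/$name"

# Reading: the client renames the file into cur/ and appends the
# info suffix; ":2,S" marks it Seen.  The contents never change.
mv "$md/new/$name" "$md/cur/$name:2,S"

# rsync compares by path, so it sees new/$name deleted and
# cur/$name:2,S created, and sends the whole message again.
find "$md" -type f
```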

What this means is that rsync isn't going to be the most
efficient way to synchronise maildirs.  When status changes
occur rsync will only see that there is a new file and an
old one is gone.  It won't know that the file was renamed.
A utility that examines filenames will probably be able to
identify these changes and rename files instead of
retransmitting them.  It would not surprise me if such a
utility already existed.  With 500GB of mail, such a utility
sounds worth writing if someone has not already done so.
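
Such a utility could work roughly like this: reduce every path to
a key made of the folder plus the unique part of the file name
(everything before the ":2," info suffix), then pair up paths on
the two sides that share a key but differ in location.  The sketch
is hypothetical (maildir_renames is not an existing tool) and
assumes both listings are paths relative to one user's Maildir,
one per line, e.g. from (cd Maildir && find . -type f | sed
's|^\./||').  It only prints old/new pairs, so they can be
reviewed and applied with mv on the destination before rsync runs.

```shell
#!/bin/sh
# maildir_renames SRC_LIST DST_LIST  (hypothetical helper)
# Both lists hold paths relative to the same Maildir root, e.g.
#   new/1061470000.M1P2.example
#   .Sent/cur/1061470001.M2P2.example:2,S
# For each destination path whose message now lives elsewhere on
# the source, print "<old path><TAB><new path>".  Nothing is renamed.
maildir_renames() {
  awk '
    # Key = folder path + unique name, ignoring new/ vs cur/ and flags.
    function key(p,    a, n, i, k) {
      n = split(p, a, "/")
      k = a[n]
      sub(/:2,.*$/, "", k)            # drop the ":2,FLAGS" info suffix
      for (i = n - 2; i >= 1; i--)    # skip a[n-1], the new or cur part
        k = a[i] "/" k
      return k
    }
    FILENAME == ARGV[1] { src[key($0)] = $0; next }
    {
      k = key($0)
      if ((k in src) && src[k] != $0)
        printf "%s\t%s\n", $0, src[k]
    }
  ' "$1" "$2"
}
```

Messages moved between folders are not detected, since the folder
is part of the key; rsync would still transfer those in full.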

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


