matt at mattmccutchen.net
Mon Aug 18 16:53:25 GMT 2008
On Mon, 2008-08-18 at 10:36 -0500, lists at trcintl.com wrote:
> I have an identical set of directories at two locations. When a file
> is added to one location, I'll call it the source side, I want to run
> a script that picks up that file and copies it to the other location,
> say the destination side. Simple enough.
> However, I want to schedule the script to run, say every 15 minutes.
> That way if a file is put on the source side, the script will pick it
> up and begin copying it. However, if the file is a few hundred MB, it
> might take longer than 15 minutes to copy it. So the next time the
> script runs, I need rsync to skip that file since it is still being
> copied from the first run and move to the next file. That same thing
> might be repeated during the next run.
> In other words, I can't wait until the first run has completed the
> large copy to begin copying additional files. I want to start a
> second, third, fourth, etc. copy that begins working on any additional
> files that may have been placed on the source side.
Fixing the problem with locking is trickier than it might appear.
Suppose two large files A and B are added to the source. The script
runs and starts copying A; the rsync generator works ahead and tells the
sender that B also needs a transfer. The generator shouldn't lock B at
this point, because that would force B to wait for A, defeating the
purpose of using multiple concurrent rsyncs. 15 minutes later, a second
instance of the script runs, skips A, and starts copying B. When the
first rsync sender finishes A, it needs to know to skip B even though
the generator has requested a transfer, and even if the second instance
has exited (and released any lock?).
I'm thinking that it may be easier to use one rsync run per file and
have the script keep track of which instance is working on which file.
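Here is a minimal sketch of the one-rsync-per-file idea in Python. It assumes that all instances of the script run on the same machine and share one lock directory. The lock directory path (LOCK_DIR) and the helper names (try_claim, source_files, sync_one) are hypothetical. Each file gets its own flock, taken non-blocking and held only while that file's rsync runs. A later instance therefore skips any file an earlier instance is still copying, and the lock goes away on its own when that copy, or the whole process, ends.

```python
import fcntl
import hashlib
import os
import subprocess
import sys

# Hypothetical lock directory; every instance of the script must use the same one.
LOCK_DIR = "/var/tmp/sync-locks"


def try_claim(lock_dir, relpath):
    """Return an fd holding an exclusive lock for relpath, or None if it is taken."""
    os.makedirs(lock_dir, exist_ok=True)
    # Hash the relative path so nested names map to one flat lock file each.
    name = hashlib.sha1(os.fsencode(relpath)).hexdigest() + ".lock"
    fd = os.open(os.path.join(lock_dir, name), os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking: if another instance is copying this file, give up at once.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def source_files(src):
    """Yield the paths of regular files under src, relative to src."""
    for root, _dirs, files in os.walk(src):
        for f in files:
            yield os.path.relpath(os.path.join(root, f), src)


def sync_one(src, dest, relpath):
    """Copy a single file; the "/./" marker makes --relative recreate its subdirectories."""
    subprocess.run(["rsync", "-a", "--relative",
                    os.path.join(src, ".", relpath), dest], check=False)


def main(src, dest):
    for relpath in source_files(src):
        fd = try_claim(LOCK_DIR, relpath)
        if fd is None:
            continue  # another instance is already working on this file
        try:
            sync_one(src, dest, relpath)
        finally:
            os.close(fd)  # closing the fd releases the flock


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: sync_new_files.py SRC DEST", file=sys.stderr)
```

Cron would run it every 15 minutes, e.g. as sync_new_files.py /data/src otherhost:/data/dst/ (paths hypothetical). One cost of this design is that each file pays rsync's startup overhead even when it is already up to date, which adds up in large trees.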
If the goal is primarily to avoid having small files wait behind large
files, another approach would be to have multiple periodic rsync jobs,
each of which deals with files in a different size range (using
--min-size and --max-size). At most one instance of each job would run
at a time.
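The size-band approach could be driven by a small wrapper like the following Python sketch. The band boundaries (BOUNDARIES), the lock file names and the function names are hypothetical. The sketch assumes that a bare number passed to --min-size/--max-size is a byte count. Each band takes its own non-blocking flock, which enforces "at most one instance of each job". Both options include a file whose size equals the limit, so the upper limit is written as hi - 1 to keep such a file from landing in two bands.

```python
import fcntl
import os
import subprocess
import sys

# Hypothetical boundaries in bytes: < 10 MiB, 10 MiB up to < 500 MiB, >= 500 MiB.
BOUNDARIES = [10 * 1024**2, 500 * 1024**2]


def band_args(boundaries):
    """Return one list of rsync size-filter options per band.

    --max-size=N skips files larger than N and --min-size=N skips files
    smaller than N, so using hi - 1 as the upper limit keeps bands disjoint.
    """
    edges = [None] + sorted(boundaries) + [None]
    bands = []
    for lo, hi in zip(edges, edges[1:]):
        args = []
        if lo is not None:
            args.append(f"--min-size={lo}")
        if hi is not None:
            args.append(f"--max-size={hi - 1}")
        bands.append(args)
    return bands


def run_band(index, src, dest, lock_dir="/var/tmp"):
    """Run one band's rsync unless the previous run of that band is still going."""
    args = band_args(BOUNDARIES)[index]
    lock_path = os.path.join(lock_dir, f"sync-band-{index}.lock")
    with open(lock_path, "w") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # this band's previous job has not finished yet
        subprocess.run(["rsync", "-a", *args, src, dest], check=False)
        return True  # the lock is released when the file is closed


if __name__ == "__main__":
    if len(sys.argv) == 4:
        run_band(int(sys.argv[1]), sys.argv[2], sys.argv[3])
    else:
        print("usage: band_sync.py BAND_INDEX SRC DEST", file=sys.stderr)
```

Each band would get its own cron entry (band indexes 0, 1 and 2 with the boundaries above). The small-file band then keeps running every 15 minutes even while a large-file run is still busy.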