improved man page (was why does --size-only not detect change only
is size (but also time)?)
C Sights
csights at fastmail.fm
Sat Apr 21 00:01:45 GMT 2007
> I proposed essentially the same thing here:
>
> http://lists.samba.org/archive/rsync/2007-February/017221.html
>
> Let's work on an improved man page together!
Ok! You certainly put a lot of thought into organizing those sections.
>
> I like letting the user choose whether to tweak or rewrite the file,
> but IMNSHO, --tweak should remain the default behavior. The only
> times you would want --no-tweak are (1) when you never want a process
> to see an intermediate state of a file with some but not all tweaks
> performed and (2) when you don't want to affect other hard links, in
> which case you could use a milder option --no-tweak-hlinked that I
> proposed here:
>
> http://lists.samba.org/archive/rsync/2006-September/016207.html
Yeah, I can see the use of --no-tweak-hlinked. My interest also stems from
an incremental backup with hard links to older files. In my case, though, I
want all hard-linked files to be tweaked rather than broken. (I am backing
up the system, not user data, so I don't care if the modtimes change.)
I think --tweak is the default behavior only from a certain perspective: it
is the default if you think in terms of how rsync *decides* whether the data
portion of the file is different, not in terms of whether the data portion
actually *is* different.
An example assuming Untitled.pdf and Untitled-2.pdf have the same data part
but different modtimes:
#rsync -ani --size-only Untitled.pdf Untitled-2.pdf
.f..t...... Untitled.pdf
#rsync -ani Untitled.pdf Untitled-2.pdf
>f..t...... Untitled.pdf
From the "how rsync decides the data is different" perspective, in the first
update the metadata was tweaked because rsync decided the data part was the
same, and in the second update the file was replaced by a Copy-Delete-Move
(CDM) (is there an official phrase for this?).
From the "is the data different" perspective, tweaking is not consistently
the default: the first file was tweaked and the second was CDM'd, even though
the data was the same in both cases.
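To make the two itemize codes above easier to read, here is a small Python helper. describe_itemized is a hypothetical name of mine, not part of rsync, and it decodes only the columns this example uses, based on my reading of the --itemize-changes section of rsync(1): the first character is the update type ('.' means no data was transferred, '>' means the data was received), the second is the file type, and the fifth shows 't' when the modification time is being updated.

```python
# Hypothetical helper: decodes only the columns of an rsync -i
# (--itemize-changes) string that the example above relies on.

UPDATE_TYPES = {
    "<": "sent",
    ">": "received (data transferred)",
    "c": "local change or creation",
    "h": "hard link to another item",
    ".": "not updated (attributes may still be tweaked)",
    "*": "message follows",
}

FILE_TYPES = {"f": "file", "d": "directory", "L": "symlink",
              "D": "device", "S": "special file"}


def describe_itemized(code: str) -> dict:
    """Return the update type, file type, and time-change flag of an itemize code."""
    return {
        "update": UPDATE_TYPES.get(code[0], "unknown"),
        "type": FILE_TYPES.get(code[1], "unknown"),
        # Index 4 is the 't' column: the mtime is being set to the sender's value.
        "time_changed": code[4] == "t",
        # Only '<' and '>' mean the file's data was actually transferred.
        "data_transferred": code[0] in "<>",
    }


for code in (".f..t......", ">f..t......"):
    print(code, describe_itemized(code))
```

Both lines show the same 't' column; only the first character tells you whether rsync tweaked the existing file or rewrote it.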
If there is ever a "--tweak" and "--no-tweak", they would be easiest to
understand if they were implemented from an "is the data different"
perspective, i.e., if the file's data *is* the same, just tweak the
metadata. This seems equivalent to requiring --checksum. (Though if I
understand --checksum correctly, it could be implemented a little more
efficiently if a --size-only comparison were done first and the receiver then
asked the sender for checksums of only those files whose sizes match. Right
now I think the sender checksums every file.)
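Roughly what I have in mind for that two-pass --checksum, as a Python sketch over in-memory dicts. files_to_transfer is a hypothetical name, this is not how rsync is structured internally, and MD5 stands in for rsync's MD4 because hashlib may not offer MD4 on every build.

```python
import hashlib


def checksum(data: bytes) -> str:
    # MD5 is a stand-in for rsync's MD4, which hashlib may not provide.
    return hashlib.md5(data).hexdigest()


def files_to_transfer(sender: dict, receiver: dict) -> set:
    """Hypothetical two-pass --checksum: compare sizes first, then checksum
    only the files whose sizes match on both sides."""
    need = set()
    size_matched = []
    for name, data in sender.items():
        theirs = receiver.get(name)
        if theirs is None or len(theirs) != len(data):
            need.add(name)  # missing or different size: no checksum needed
        else:
            size_matched.append(name)
    # Second pass: the sender only checksums what the receiver asked about.
    for name in size_matched:
        if checksum(sender[name]) != checksum(receiver[name]):
            need.add(name)
    return need


sender = {"a.txt": b"hello", "b.txt": b"world!", "c.txt": b"same"}
receiver = {"a.txt": b"hellO", "b.txt": b"world", "c.txt": b"same"}
print(sorted(files_to_transfer(sender, receiver)))  # ['a.txt', 'b.txt']
```

Here b.txt is settled by its size alone, so only a.txt and c.txt ever get checksummed.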
Of course, the man page can be updated to help people understand how rsync
currently decides whether a file's data is different.
> That is not rsync's current default behavior. It would be rsync's
> default behavior if --no-tweak were made the default, except for one
> technicality. Rewriting due to a preserved metadata difference would
> be considered a local creation rather than a transfer and itemized
> with first character "c" rather than "<" or ">", just as with
> --link-dest.
I didn't catch the c thing... :(
>
> I would explain it like this:
[I would add a preamble defining "file", "metadata", and "data"; see below]
> By default, rsync considers a file's data to be unchanged if its size
> and last modification time match. If rsync considers the data
> changed, it transfers the file. If rsync considers the data unchanged
> but preserved metadata differs, rsync applies the new metadata to the
> existing file ("tweaks" the file)
[here redirect to a larger section explaining these things more fully, see
below]
> unless directed not to do so by
> --no-tweak or --no-tweak-hlinked. In this case, the receiver locally
> copies the file, applies the new metadata to the copy, and moves the
> copy over the original.
Here is another version with the addition of those suggestions:
"Up-to-date criterion:"
A file has two parts: metadata and data. Metadata includes such things as the
modtime, ownership, permissions, etc. The data part of a file is the
contents of the file.
By default, rsync considers a file's data to be unchanged if its size
and last modification time match. If rsync considers the data changed, it
transfers the file by creating a temporary file, deleting the original, then
moving the temporary file into the original file's position. If rsync
considers the data unchanged but preserved[?] metadata differs, rsync applies
the sender's metadata to the existing file. (See "On-the-disk file transfer
mechanics" for options that change how files are updated.)
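A minimal Python sketch of the default decision described above. FileInfo and default_action are hypothetical names, and "preserved" metadata is simply whichever attributes the caller lists (in real rsync that depends on -t, -p, -o, -g and friends).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    # Hypothetical per-file record; rsync's real file list carries more.
    size: int
    mtime: int
    mode: int
    uid: int
    gid: int


def default_action(src: FileInfo, dst: Optional[FileInfo],
                   preserved=("mtime", "mode", "uid", "gid")) -> str:
    """Return 'transfer', 'tweak', or 'skip' under the default quick check."""
    if dst is None:
        return "transfer"  # nothing on the receiver yet
    # Quick check: data is considered unchanged if size and mtime both match.
    if src.size != dst.size or src.mtime != dst.mtime:
        return "transfer"
    # Data considered unchanged; apply any differing preserved metadata in place.
    if any(getattr(src, attr) != getattr(dst, attr) for attr in preserved):
        return "tweak"
    return "skip"


src = FileInfo(size=10, mtime=100, mode=0o644, uid=0, gid=0)
print(default_action(src, FileInfo(10, 100, 0o600, 0, 0)))  # tweak
print(default_action(src, FileInfo(10, 99, 0o644, 0, 0)))   # transfer
```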
--size-only
Instead of the default behavior, rsync considers a file's data to be
unchanged if its size matches, regardless of modification time.
-c, --checksum
Instead of the default behavior, rsync considers a file's data to be
unchanged if its size and checksum match. The sender generates 128-bit MD4
checksums for the data of every file, while the receiver generates checksums
only for files whose sizes match. This can slow things down significantly
relative to the default or --size-only behavior.
Note that rsync always verifies that each transferred file was correctly
reconstructed on the receiving side by checking a whole-file checksum that is
generated as the file is transferred, but that automatic
after-the-transfer verification has nothing to do with this option's
before-the-transfer "Does this file need to be updated?" check.
-I, --ignore-times
Instead of the default behavior, rsync skips the up-to-date check entirely
and updates all files, even those whose size and modification time match.
--modify-window
[...etc....]
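The criteria above can be compared side by side in a small Python sketch. Entry and data_unchanged are hypothetical names, MD5 stands in for MD4, and modify_window only models the tolerance that --modify-window adds to the mtime comparison.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical record: size, mtime in seconds, and the file's data.
    size: int
    mtime: int
    data: bytes


def data_unchanged(src: Entry, dst: Entry, mode: str = "default",
                   modify_window: int = 0) -> bool:
    """Would rsync treat the data as unchanged under the given criterion?"""
    if mode == "ignore-times":
        return False  # -I: no check, every file is updated
    if src.size != dst.size:
        return False  # all remaining criteria require equal sizes
    if mode == "size-only":
        return True
    if mode == "checksum":
        # MD5 stands in for the MD4 checksum rsync uses.
        return hashlib.md5(src.data).digest() == hashlib.md5(dst.data).digest()
    if mode == "default":
        return abs(src.mtime - dst.mtime) <= modify_window
    raise ValueError(f"unknown mode: {mode}")


# Same data, different mtimes, as with Untitled.pdf and Untitled-2.pdf.
src, dst = Entry(5, 200, b"hello"), Entry(5, 100, b"hello")
for mode in ("default", "size-only", "checksum", "ignore-times"):
    print(mode, data_unchanged(src, dst, mode))
```

The default criterion reports the data as changed and --size-only reports it as unchanged, consistent with the >f and .f itemize output in my earlier example.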
"On-the-disk file transfer mechanics:"
By default, when rsync detects a difference in a file's data, it creates a
temporary file in the same directory, deletes the original, then moves the
temporary file into the original file's position. If rsync considers the
data unchanged but metadata differs, rsync applies the sender's metadata to
the existing file. (See "Up-to-date criterion" for how rsync detects changes
in the file's data.)
[
--tweak
--no-tweak
--no-tweak-hlinked
]
--inplace
--append
--sparse (implies --no-tweak)
--delay-updates (conflicts with [?] --tweak, --inplace, --append)
--partial
--partial-dir
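To illustrate why tweaking versus replacing matters for hard-linked backups, here is a Python sketch assuming a POSIX filesystem. On POSIX the "delete the original, then move" steps are a single rename(2), which os.replace performs. replace_via_temp and tweak are hypothetical helper names, not rsync internals.

```python
import os
import stat
import tempfile


def replace_via_temp(path: str, data: bytes) -> None:
    """Default update: write a temp file in the same directory, then rename it
    over the original so the name points at a new inode."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)  # rename(2): replaces the old name in one step


def tweak(path: str, mode: int) -> None:
    """Metadata-only update in place; every hard link shares the change."""
    os.chmod(path, mode)


with tempfile.TemporaryDirectory() as d:
    current = os.path.join(d, "file")
    older = os.path.join(d, "older-backup")  # hard link from a previous backup
    with open(current, "wb") as f:
        f.write(b"old")
    os.link(current, older)

    tweak(current, 0o600)
    older_mode = stat.S_IMODE(os.stat(older).st_mode)  # 0o600: tweak is shared

    replace_via_temp(current, b"new")
    with open(current, "rb") as f:
        current_data = f.read()  # b"new"
    with open(older, "rb") as f:
        older_data = f.read()  # b"old": the link kept the old inode

print(oct(older_mode), current_data, older_data)
```

The older link keeps its old data after the replace but shares the tweaked permissions, which is the situation the proposed --no-tweak-hlinked option is about.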
"Over-the-wire file transfer mechanics:" [I think this should be a separate
section]
By default, rsync uses the rsync algorithm(!) .....
--whole-file
--block-size
--checksum-seed
--compress
--compress-level
[...etc....]
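For anyone reading this section who hasn't seen the core trick behind the rsync algorithm, here is a Python sketch of the weak rolling checksum as described in the Tridgell/Mackerras technical report. rsync's own source differs in details, and the strong per-block checksum is omitted; weak_checksum and roll are hypothetical names.

```python
M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple:
    """a = plain byte sum; b = weighted sum (first byte counts n times, last once)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple:
    """Slide an n-byte window forward by one byte in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


data = b"the quick brown fox jumps"
n = 8
a, b = weak_checksum(data[:n])
for k in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
    assert (a, b) == weak_checksum(data[k:k + n])  # rolling matches recomputing
print("combined checksum of last window:", a + (b << 16))
```

Because each shift costs constant time, the receiver's block checksums can be searched for at every byte offset of the sender's file.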
> I think you still mean something different by "transfer" than the rest
> of us (and the existing man page) do. "Transferring" refers only to a
> regular file's *data*, not permissions, times, or anything like that.
> (The "transfer" also sometimes refers to the entire file list or rsync
> run, but Wayne has been moving away from this usage.) Please refer to
> these options as options for changing what is "preserved".
Why not just define "file", "data", and "metadata" (as above), then
say "transfer file", "transfer data" (or more explicitly "transfer file's
data"), and "transfer preserved metadata"?
Why not get the man page source, first rearrange the options, then start
clarifying them (add the default behavior, define file, metadata, and data)? I
wouldn't want to rewrite it from scratch, because there is a lot of knowledge
in there already. I think that would be a big improvement!
Bye,
C.