improved man page (was why does --size-only not detect change only is size (but also time)?)

C Sights csights at fastmail.fm
Sat Apr 21 00:01:45 GMT 2007


> I proposed essentially the same thing here:
>
> http://lists.samba.org/archive/rsync/2007-February/017221.html
>
> Let's work on an improved man page together!

	Ok!  You certainly did a lot of thinking organizing those sections.
>
> I like letting the user choose whether to tweak or rewrite the file,
> but IMNSHO, --tweak should remain the default behavior.  The only
> times you would want --no-tweak are (1) when you never want a process
> to see an intermediate state of a file with some but not all tweaks
> performed and (2) when you don't want to affect other hard links, in
> which case you could use a milder option --no-tweak-hlinked that I
> proposed here:
>
> http://lists.samba.org/archive/rsync/2006-September/016207.html

	Yeah, I can see the use of --no-tweak-hlinked.  My interest also stems from
an incremental backup with hard-links to older files.  In my case though I 
want all hard-linked files to be tweaked rather than broken.  (I am backing 
up the system, not user data, so I don't care if the modtimes change.)
	I think --tweak is a default behavior only from a certain perspective:  It is 
the default if you understand how rsync *decides* the data portion of the 
file is different, not from the perspective of whether the data portion *is* 
different.

An example assuming Untitled.pdf and Untitled-2.pdf have the same data part 
but different modtimes:

#rsync -ani --size-only Untitled.pdf Untitled-2.pdf
.f..t...... Untitled.pdf

#rsync -ani Untitled.pdf Untitled-2.pdf
>f..t...... Untitled.pdf

	From the "how rsync decides the data is different" perspective, in the first 
update the metadata is tweaked because rsync decided the data part was the 
same and the second update was a Copy-Delete-Move (CDM) (is there an official 
phrase for this?).
	From the "is the data different" perspective, tweaking is not the default:
the first was tweaked, but the second was CDM, even though the data was the
same in both cases.
	If there ever is a "--tweak" and "--no-tweak", they would be easiest to
understand if they were implemented from the "is the data different"
perspective, i.e. if the file's data *is* the same, just tweak the
metadata.  This seems equivalent to requiring --checksum.  (Though if I
understand --checksum correctly, it could be implemented a little more
efficiently if the size comparison were done first and the receiver then
asked the sender for checksums of only those files whose sizes match.  Right
now I think the sender checksums every file.)
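
	To make the size-first idea concrete, here is a rough Python sketch of the
filtering step only.  The function name and the two dicts (file name -> size in
bytes) are made up for illustration; this is not how rsync is implemented.

```python
def names_to_checksum(sender_sizes, receiver_sizes):
    """Return the names whose sizes match on both sides, sorted.

    Both arguments are hypothetical dicts mapping file name -> size in bytes.
    A file that is missing on the receiver, or whose size differs, is already
    known to need a transfer, so no checksum would be requested for it.
    """
    return sorted(
        name
        for name, size in sender_sizes.items()
        if receiver_sizes.get(name) == size
    )


sender = {"a.pdf": 100, "b.pdf": 200, "c.pdf": 300}
receiver = {"a.pdf": 100, "b.pdf": 250}
print(names_to_checksum(sender, receiver))
```

Only a.pdf has the same size on both sides, so it is the only file the sender
would have to checksum in this scheme.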

	Of course, the man page can be updated to help people understand how rsync 
decides the file's data is different now.

> That is not rsync's current default behavior.  It would be rsync's
> default behavior if --no-tweak were made the default, except for one
> technicality.  Rewriting due to a preserved metadata difference would
> be considered a local creation rather than a transfer and itemized
> with first character "c" rather than "<" or ">", just as with
> --link-dest.

I didn't catch the c thing... :(

>
> I would explain it like this:

[I would add a preamble defining "file", "metadata", and "data", see below]

> By default, rsync considers a file's data to be unchanged if its size
> and last modification time match.  If rsync considers the data
> changed, it transfers the file.  If rsync considers the data unchanged
> but preserved metadata differs, rsync applies the new metadata to the
> existing file ("tweaks" the file) 

[here redirect to a larger section explaining these things more fully, see 
below]

> unless directed not to do so by 
> --no-tweak or --no-tweak-hlinked.  In this case, the receiver locally
> copies the file, applies the new metadata to the copy, and moves the
> copy over the original.

	Here is another version with the addition of those suggestions:

"Up-to-date criterion:"

A file has two parts: metadata and data.  Metadata includes such things as the 
modtime, ownership, permissions, etc.  The data part of a file is the 
contents of the file.

By default, rsync considers a file's data to be unchanged if its size
and last modification time match.  If rsync considers the data changed, it 
transfers the file by creating a temporary file, deleting the original, then 
moving the temporary file into the original file's position.  If rsync 
considers the data unchanged but preserved[?] metadata differs, rsync applies 
the sender's metadata to the existing file.  (see "On-the-disk file transfer 
mechanics" for options changing how files are updated.)

--size-only
	Instead of the default behavior, rsync considers a file's data to be
unchanged if its size alone matches.

-c, --checksum
	Instead of the default behavior, rsync considers a file's data to be
unchanged if its size and checksum match.  The sender generates 128-bit MD4
checksums for every file's data, while the receiver generates checksums
only for files whose sizes match.  This can slow things down significantly
relative to the default or --size-only behavior.

	Note that rsync always verifies that each transferred file was correctly
reconstructed on the receiving side by checking a whole-file checksum that is
generated as the file is transferred, but that automatic
after-the-transfer verification has nothing to do with this option's
before-the-transfer "Does this file need to be updated?" check.

-I, --ignore-times
	Instead of the default behavior, rsync does no checks and updates all files.

--modify-window
[...etc....]
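
The criteria in this section can be summarized in a small Python sketch.
It only models the decision described above: FileInfo, data_unchanged and the
mode strings are names invented for the illustration, SHA-256 stands in for the
MD4 checksum, and modify_window is assumed to mean "times that differ by no
more than this many seconds count as equal".

```python
import hashlib
from collections import namedtuple

# Hypothetical snapshot of one side's view of a file; not an rsync structure.
FileInfo = namedtuple("FileInfo", "size mtime data")

MODES = ("default", "size-only", "checksum", "ignore-times")


def data_unchanged(src, dst, mode="default", modify_window=0):
    """Return True if the data would be considered unchanged under `mode`."""
    if mode not in MODES:
        raise ValueError("unknown mode: %r" % (mode,))
    if mode == "ignore-times":
        return False  # no check at all: every file is updated
    if src.size != dst.size:
        return False  # a size mismatch means "changed" in all other modes
    if mode == "size-only":
        return True
    if mode == "checksum":
        # SHA-256 is a stand-in for the MD4 checksum named in the text.
        return hashlib.sha256(src.data).digest() == hashlib.sha256(dst.data).digest()
    # default: sizes matched above, so compare modification times
    return abs(src.mtime - dst.mtime) <= modify_window


# Same data, different modtimes, as in the Untitled.pdf example.
old = FileInfo(size=4, mtime=1000, data=b"abcd")
new = FileInfo(size=4, mtime=2000, data=b"abcd")
for mode in MODES:
    print(mode, data_unchanged(new, old, mode))
```

For two files with the same data but different modtimes, the sketch reports
"unchanged" only under size-only and checksum, which is the gap between "how
rsync decides" and "is the data different".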


"On-the-disk file transfer mechanics:"

By default, when rsync detects a difference in the file's data, it creates a
temporary file in the same directory, deletes the original, then moves
the temporary file into the original file's position.  If rsync considers the
data unchanged but metadata differs, rsync applies the sender's metadata to 
the existing file. (See "Up-to-date criterion" for how rsync detects changes 
in the file's data.)
[
--tweak
--no-tweak
--no-tweak-hlinked
]
--inplace
--append
--sparse (implies --no-tweak)
--delay-updates (conflicts with [?] --tweak, --inplace, --append)
--partial
--partial-dir
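
The two default update paths described at the top of this section (apply
metadata to the existing file versus build a temporary file and move it into
place) can be sketched in Python as follows.  This is an illustration under
assumptions, not rsync's code: tweak, rewrite and demo are invented names,
shutil.copystat covers permissions and times but not ownership, and os.replace
renames the temporary file over the original in one step instead of deleting
the original first.

```python
import os
import shutil
import tempfile


def tweak(src, dst):
    """Apply src's permissions and times to the existing dst file."""
    shutil.copystat(src, dst)


def rewrite(src, dst):
    """Build a temporary copy in dst's directory, then move it over dst."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp = tempfile.mkstemp(dir=dst_dir)
    os.close(fd)
    try:
        shutil.copyfile(src, tmp)  # data
        shutil.copystat(src, tmp)  # permissions and times (not ownership)
        os.replace(tmp, dst)       # rename over the original
    except BaseException:
        os.unlink(tmp)             # do not leave the temporary file behind
        raise


def demo():
    """Return (inode kept after tweak, inode kept after rewrite)."""
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
        for path in (src, dst):
            with open(path, "wb") as f:
                f.write(b"same data")
        os.utime(src, (1000000000, 1000000000))
        inode = os.stat(dst).st_ino
        tweak(src, dst)
        kept_after_tweak = os.stat(dst).st_ino == inode
        rewrite(src, dst)
        kept_after_rewrite = os.stat(dst).st_ino == inode
        return kept_after_tweak, kept_after_rewrite


print(demo())
```

On a POSIX filesystem the tweak keeps the same inode, so other hard links see
the new metadata, while the rewrite puts a new inode under the name and leaves
any other hard links pointing at the old file.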


"Over-the-wire file transfer mechanics:" [I think this should be a separate 
section]

By default, rsync uses the rsync algorithm(!) .....
--whole-file
--block-size
--checksum-seed
--compress
--compress-level

[...etc....]


> I think you still mean something different by "transfer" than the rest
> of us (and the existing man page) do.  "Transferring" refers only to a
> regular file's *data*, not permissions, times, or anything like that.
> (The "transfer" also sometimes refers to the entire file list or rsync
> run, but Wayne has been moving away from this usage.)  Please refer to
> these options as options for changing what is "preserved".

Why not just define "file", "data", and "metadata" (as above), then
say "transfer file", "transfer data" (or more explicitly "transfer file's
data"), "transfer preserved metadata"?

Why not get the man page source and first rearrange the options, then start 
clarifying them (add default behavior, define file, metadata, data)?   I 
wouldn't want to rewrite from scratch b/c there is a lot of knowledge in there
already.  I think that would be a big improvement!

Bye,
	C.

