[Bug 14109] New: Support Custom Fuzzy Basis Selection Algorithm
samba-bugs at samba.org
Sun Sep 1 22:55:39 UTC 2019
https://bugzilla.samba.org/show_bug.cgi?id=14109
Bug ID: 14109
Summary: Support Custom Fuzzy Basis Selection Algorithm
Product: rsync
Version: 3.1.3
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5
Component: core
Assignee: wayne at opencoder.net
Reporter: lonniebiz at yahoo.com
QA Contact: rsync-qa at samba.org
The --fuzzy argument does an incredible job at syncing large files when it
chooses the correct fuzzy basis.
However, the default "fuzzy-basis-destination-file-selection algorithm" is not
correct for every situation, so I propose the ability to pass an argument to
the fuzzy parameter that specifies which
"fuzzy-basis-destination-file-selection algorithm" to use.
I've posted a question detailing my needs here:
https://unix.stackexchange.com/questions/538548/
In short, some of the files in my source-folder are 200GB in size. When rsync
chooses the correct existing-destination-file for its "fuzzy basis", my
synchronization (of these files) seems magical in terms of the data that gets
transferred over the wire.
However, when it chooses the wrong existing-destination-file as the source
file's fuzzy basis, the data transfer can take days.
Look at the filenames in both my source-folder and destination-folder (below):
# Source Folder's new files (from today's on-site backup):
file100-2019_09-01_12am.log
file100-2019_09-01_12am.lzo
file101-2019_09-01_12am.log
file101-2019_09-01_12am.lzo
file102-2019_09-01_12am.log
file102-2019_09-01_12am.lzo
# Destination-Folder's old files (from yesterday's off-site backup):
file100-2019_08-31_12am.log
file100-2019_08-31_12am.lzo
file101-2019_08-31_12am.log
file101-2019_08-31_12am.lzo
file102-2019_08-31_12am.log
file102-2019_08-31_12am.lzo
In my case, the fuzzy-basis-selection-algorithm needs to select the existing
destination-file that:
1) Has the same file extension as the source file
2) Shares the longest run of identical leading characters with the source file's name
The default algorithm does not meet these requirements.
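
To make the two requirements above concrete, here is a minimal Python sketch of the selection rule I have in mind. It is only an illustration of the requested behaviour, not rsync's code: the function names (common_prefix_len, pick_fuzzy_basis) are hypothetical, and it assumes the candidates are plain filenames from a single destination directory.

```python
import os


def common_prefix_len(a, b):
    """Count how many leading characters two names share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_fuzzy_basis(source_name, dest_names):
    """Pick a basis file for source_name from dest_names (hypothetical helper).

    Rule 1: only consider destination files with the same extension.
    Rule 2: among those, prefer the longest shared leading prefix.
    Returns None when no destination file has a matching extension.
    """
    src_ext = os.path.splitext(source_name)[1]
    candidates = [d for d in dest_names if os.path.splitext(d)[1] == src_ext]
    if not candidates:
        return None
    # max() keeps the first candidate on ties, so list order breaks ties.
    return max(candidates, key=lambda d: common_prefix_len(source_name, d))


if __name__ == "__main__":
    dest = [
        "file100-2019_08-31_12am.log",
        "file100-2019_08-31_12am.lzo",
        "file101-2019_08-31_12am.log",
        "file101-2019_08-31_12am.lzo",
        "file102-2019_08-31_12am.log",
        "file102-2019_08-31_12am.lzo",
    ]
    print(pick_fuzzy_basis("file101-2019_09-01_12am.lzo", dest))
    # prints: file101-2019_08-31_12am.lzo
```

With the filenames listed above, file101-2019_09-01_12am.lzo shares 14 leading characters with file101-2019_08-31_12am.lzo but only 6 with the file100 and file102 archives, so yesterday's copy of the same file is chosen as the basis.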
Therefore, I propose the ability to pass an argument that allows the user to
specify non-default fuzzy basis selection algorithms.
There should probably be a few common, baked-in ones (as time goes on) that you
can choose from by name, and it would be even more flexible if rsync also
permitted the user to pass a file into the command that specifies a
custom "fuzzy-basis-destination-file-selection algorithm".
Naturally, if these features are granted, the documentation would also need to
be updated to give guidance on specifying these things.
If these things are already implemented, and I have somehow overlooked them,
would you kindly post an answer to my question here:
https://unix.stackexchange.com/questions/538548/