Hi,<div><br></div><div>Thanks for that!</div><div><br><div class="gmail_quote">On 12 August 2012 18:41, Wayne Davison <span dir="ltr"><<a href="mailto:wayned@samba.org" target="_blank">wayned@samba.org</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="gmail_quote"><div class="im">I have imagined making the code pretend that the partial file and any destination file are concatenated together for the purpose of generating checksums.  That would allow content references to both files, but rsync would need to be enhanced to open both files in both the generator and the receiver and be able to figure out what read goes where (which shouldn't be too hard).  I'd suggest that the code read the partial file first, padding out the end of its data to an even checksum-sized unit so that the destination file starts on a even checksum boundary (so that the code never needs to combine data from two files in a single checksum or copy reference).</div>


</div></blockquote><div><br></div><div>So if I'm inspired and somehow magically find the time, it's at least feasible.</div><div><br></div><div>I'm not seeing why the generator would need to be different, though; the receiver would be doing the see-through magic (treating the partial as though it were overlaid on the beginning of the target).</div>


<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="gmail_quote"><div class="im"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div class="gmail_quote">If so, it would appear that this means a large amount of unnecessary data may end up being transferred in the second sync of a large file if you interrupt the first sync.</div></blockquote><div><br>


</div></div><div>It all depends on where you interrupt it and how much data matches in the remaining portion of the destination file.  It does give you the option of discarding the partial data if it is too short to be useful, or possibly doing your own concatenation of the whole (or trailing portion) of the destination file onto the partial file, should you want to tweak things before resuming the transfer.</div>


<span class="HOEnZb"><font color="#888888"><div></div></font></span></div></blockquote></div></div><div><br></div><div>Ah, yes, I _nearly_ got there, didn't I, with my "boxing clever" workaround. If one knows one's in this situation, just append data from the target file to the partial file to fill in the missing bits (e.g., if the target is 100K and the partial is 20K, append the _last_ 80K of target to partial), and when rsync runs it'll only send what it has to. A C program to recursively walk a tree and do that on the selected partials where it makes sense (e.g., my VM HDD files) and not to others (which might have deletions or insertions) is probably 20-30 lines of code.</div>
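<div><br></div><div>A rough sketch of just the per-file step of that workaround, in Python rather than C to keep it short. The function name <code>pad_partial_from_target</code> and its arguments are made up for illustration; nothing here is part of rsync. It assumes the file is block-for-block comparable (no insertions or deletions), as with the VM disk images mentioned above, and it leaves out the tree walk and the choice of which partials to touch.</div>

```python
import os
import shutil


def pad_partial_from_target(partial_path: str, target_path: str) -> int:
    """Append the tail of the target onto the partial; return bytes appended.

    Hypothetical helper: with a 100K target and a 20K partial, the last
    80K of the target is appended so both files have the same length.
    """
    partial_size = os.path.getsize(partial_path)
    target_size = os.path.getsize(target_path)
    missing = target_size - partial_size
    if missing <= 0:
        return 0  # the partial already covers the target's full length

    with open(target_path, "rb") as src, open(partial_path, "ab") as dst:
        src.seek(partial_size)  # skip the region the partial already covers
        shutil.copyfileobj(src, dst)  # copy everything from there to EOF
    return missing
```

<div>Run it on a partial before resuming the transfer; a second run on the same pair appends nothing, because the sizes already match.</div>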


<div><br></div><div>On 12 August 2012 19:08, Wayne Davison <span dir="ltr"><<a href="mailto:wayned@samba.org" target="_blank">wayned@samba.org</a>></span> wrote:</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="gmail_quote"><div class="im">On Sun, Aug 12, 2012 at 10:41 AM, Wayne Davison <span dir="ltr"><<a href="mailto:wayned@samba.org" target="_blank">wayned@samba.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="gmail_quote"><div>I have imagined making the code pretend that the partial file and any destination file are concatenated together for the purpose of generating checksums.</div></div></blockquote><div>

<br></div></div><div>Actually, that could be bad if the destination and partial file are both huge.  What would be better would be to send just the size of the destination file in checksums, but overlay the start of the destination's data with the partial-file's data (and just ignore any partial-block from the end of the partial file).</div>


</div></blockquote><div><br></div><div>Yes, I wasn't thinking concatenation, but more like what LVM and similar do with snapshots: The partial file is a bunch of snapshot blocks with the curious property of only being at the beginning of the file. So given a file with 50K blocks, and a partial with 20K blocks, the code would view the combined result as the first 20K blocks of the partial followed by the subsequent 30K blocks from the target. (Hence my "see through" terminology above.)</div>


</div><div><br></div><div>E.g., resorting to ASCII art, the receiver code sees a virtual file:</div><div><br></div><div><font face="courier new, monospace">                       </font><span style="font-family:'courier new',monospace">+--------------+</span></div>


<div><font face="courier new, monospace">                       | </font><span style="font-family:'courier new',monospace">partial file |</span></div><div><font face="courier new, monospace">+--------------+       </font><span style="font-family:'courier new',monospace">+--------------+</span><font face="courier new, monospace"> </font><span style="font-family:'courier new',monospace">+--------------+</span></div>


<div><font face="courier new, monospace">| virtual file |  +--->| Blks 0-9K   </font><span style="font-family:'courier new',monospace"> </span><span style="font-family:'courier new',monospace">|</span><span style="font-family:'courier new',monospace"> </span><span style="font-family:'courier new',monospace">|  target file |</span></div>


<div><font face="courier new, monospace">+--------------+  | +->| Blks 10K-19K |</font><font face="courier new, monospace"> </font><span style="font-family:'courier new',monospace">+--------------+</span></div>


<div><font face="courier new, monospace">| Blks 0-9K    |--+ |  +--------------+ |</font><font face="courier new, monospace"> Blks 0-9K   </font><span style="font-family:'courier new',monospace"> </span><span style="font-family:'courier new',monospace">|</span></div>


<div><font face="courier new, monospace">| Blks 10K-19K |----+                   </font><span style="font-family:'courier new',monospace">| Blks 10K-19K |</span></div><div><div><font face="courier new, monospace">| Blks 20K-29K |----------------------->| Blks 20K-29K |</font></div>


</div><div><div><font face="courier new, monospace">| Blks 30K-39K |</font><span style="font-family:'courier new',monospace">----------------------->| Blks 30K-39K |</span></div><div><font face="courier new, monospace">| Blks 40K-49K |</font><span style="font-family:'courier new',monospace">----------------------->| Blks 40K-49K |</span></div>


</div><div><div><div><font face="courier new, monospace">+--------------+                        </font><span style="font-family:'courier new',monospace">+--------------+</span></div></div></div><div><font face="courier new, monospace"><br>


</font></div><div>The receiver would compute checksums against that virtual file. When it is time to copy a block, it would transfer the block if it needs to be sent; if not, it would grab it from the virtual file (that is, from the partial or the target, whichever backs that block).</div><div><br></div><div>


Again, this all really only applies in the simple case of files that are nice, discrete blocks of data. Not knowing the delta algorithm, I have no idea what would happen if the above were applied to a file that had (say) 5K blocks deleted at the beginning, followed by 1K blocks of inserted data. The virtual file would appear to have duplicated data in that case, which the delta algorithm would then have to get rid of or cope with. I wouldn't be too surprised to find that it led to inefficiency with other types of files.</div>
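<div><br></div><div>For what it's worth, the "see-through" read described above might look roughly like this in Python. It is only a sketch of the idea, not rsync's code: <code>read_virtual</code>, its parameters, and the block size are all hypothetical. Following the suggestion quoted earlier, it ignores any short block at the end of the partial, so the switch from partial to target always falls on a block boundary.</div>

```python
import io
from typing import BinaryIO


def read_virtual(partial: BinaryIO, target: BinaryIO, partial_size: int,
                 offset: int, length: int, block_size: int) -> bytes:
    """Read up to `length` bytes at `offset` from the partial overlaid on the target.

    Hypothetical helper: offsets below the last whole block of the partial
    are served from the partial, everything after that from the target.
    """
    # Only whole blocks of the partial count; a trailing short block is ignored.
    boundary = (partial_size // block_size) * block_size

    out = bytearray()
    if offset < boundary:
        partial.seek(offset)
        out += partial.read(min(length, boundary - offset))

    remaining = length - len(out)
    if remaining > 0:
        # "See through" to the target for anything past the partial's blocks.
        target.seek(offset + len(out))
        out += target.read(remaining)
    return bytes(out)


# Example: a 10-byte partial with 4-byte blocks covers only bytes 0-7.
partial = io.BytesIO(b"P" * 10)
target = io.BytesIO(b"T" * 20)
first_block = read_virtual(partial, target, 10, 0, 4, 4)   # from the partial
third_block = read_virtual(partial, target, 10, 8, 4, 4)   # from the target
```

<div>As long as the caller reads whole, block-aligned chunks, no single read ever mixes data from the two files.</div>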


<div><br></div><div>Thanks again,</div><div><br></div><div>-- T.J.</div>