Estimating backup usage with dir-merge filter

Henri Shustak henri.shustak at gmail.com
Thu Oct 6 16:49:15 MDT 2011


>> It sounds like you missed the point of Kevin's message (in the other fork of this thread).  The point wasn't to use
>> `du`, it was that you can run your stats against the backed-up files, not the source.  Then you're only running stats
>> against the results of running the backup using the filters, so you don't need to filter them again.
> 
> I got that but neglected to respond to the whole group.  My mistake.
> The backups are performed with BackupPC to a central server where
> compression and de-duplication are done.  While it's true that the
> actual storage each user consumes on the backup server is less
> because of these, I have no problem hiding that from them and instead
> telling them what their usage would be without compression and
> de-duplication.  It has more of an effect that way, if you know what
> I mean.
> 
>> If that doesn't make sense or isn't possible (backups are on some remote server), then just use your rsync command
>> with '--list-only', and post-process that list.
> 
> I've been tinkering with using --verbose and --dry-run, then parsing
> the total size out of the last line of the output, and I think I'm
> close.  Curiously, when I don't include the --filter option as a
> baseline, I'm not getting the same results as "du".
> 
> $ du -sb . | awk '{print $1}'
> 508625653
> 
> $ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
> 506037893
> 
> The difference is minimal and probably negligible for this purpose but
> I'm still curious where it's coming from.  Maybe there are some sparse
> files in there somewhere.
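
One quick way to test the sparse-file theory, assuming GNU find (the %S
sparseness format specifier is GNU-specific), might be:

# list files whose allocated blocks cover less than their apparent size
$ find . -type f -printf '%S %s %p\n' | awk '$1 < 1'

Block rounding means the ratio is rarely exactly 1 for ordinary files,
so treat anything well below 1 as a candidate; zero-length files can
show odd values and can be ignored.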

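As for the --list-only suggestion above, a rough version of that might
look like the following (the dir-merge rule is only a placeholder for
whatever filter rules the real backup uses):

# sum the size column of the listing; some rsync versions insert digit
# separators into that column, so strip commas first
$ rsync -a --list-only --filter='dir-merge /.rsync-filter' . \
      | awk '{gsub(",", "", $2); total += $2} END {print total}'

The total will include directory and symlink entries, so treat it as an
estimate rather than an exact byte count.
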
Do you have the same discrepancy if you use the --stats option?
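
For reference, that check might look something like this:

# the "Total file size" line in the --stats summary is the sum of the
# file sizes in the transfer list
$ rsync --dry-run --stats -a . /tmp/does_not_exist | grep 'Total file size'

Adding the same --filter options to that command should then give the
filtered total directly.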


------------------------------------
 This email is protected by LBackup
 http://www.lbackup.org



