How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)

Tony Abernethy tony at servacorp.com
Fri Jul 10 07:33:36 MDT 2009


Silly question, but are you doing something like compacting, 
removing air, or "optimizing" the database in any form?

If the blobs keep moving around, that makes finding common stuff 
much much harder.

rsync will still have to read both sides in full, even if almost everything is the same.
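
Since xdelta refused the files in the message quoted below ("Value too large for defined data type" is the EOVERFLOW error text, which usually points at a tool built without large-file support rather than at memory), here is a rough way to check how much two consecutive backups have in common. It is only a sketch, not what rsync does: it compares fixed-size blocks at aligned offsets, while rsync uses a rolling checksum and can match a block at any byte offset, so this will underestimate what rsync can find. The names block_digests and shared_block_ratio and the 64 KB block size are my own choices for illustration. It reads both files as streams, so file size should not matter.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumed block size for illustration; rsync chooses its own


def block_digests(path, block_size=BLOCK_SIZE):
    """Yield one digest per fixed-size block, reading the file as a stream."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield hashlib.sha256(block).digest()


def shared_block_ratio(old_path, new_path, block_size=BLOCK_SIZE):
    """Fraction of aligned blocks in new_path that also occur somewhere in old_path."""
    # Only the digests are kept in memory, never the file contents.
    old = set(block_digests(old_path, block_size))
    total = matched = 0
    for digest in block_digests(new_path, block_size):
        total += 1
        if digest in old:
            matched += 1
    return matched / total if total else 1.0
```

Run it as shared_block_ratio(yesterday, today) on two copies of the same .MYD file. A ratio near 1 means the data mostly stays where it was. A ratio near 0 on a table that barely changed suggests the contents are being shifted or rewritten between backups, which is the case I was asking about.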
 

> -----Original Message-----
> From: rsync-bounces+tony=servacorp.com at lists.samba.org 
> [mailto:rsync-bounces+tony=servacorp.com at lists.samba.org] On 
> Behalf Of Krzysztof Nosek
> Sent: Friday, July 10, 2009 7:48 AM
> To: malayter at gmail.com; rsync at lists.samba.org
> Subject: Re: How to make big MySQL database more 
> diffable/rsyncable? (aka rsyncing big files)
> 
> Hello,
> > So I do not think the basic data structure is the problem,
> > unless mysql hotcopy does something really strange like inserting a
> > timestamp or other changing data info every few KB in the output stream.
> >
> No, really, mysqlhotcopy performs just a raw file system copy of
> /var/lib/mysql taken from the locked database. If nothing particular 
> happens meanwhile in the running database, the copy is 1:1 with the 
> original. Easy to check with any smaller database.
> > I would suggest trying a tool like xdelta (on the same machine) against two
> > consecutive backup files, just to see if it can extract similarities. If
> > xdelta can find significant matched data, rsync should be able to as well.
> >
> I'd love to do that, but I can't actually get it working:
> xdelta: open 
> ../mantis_game_20090707/mantis_bug_file_table.MYD failed: 
> Value too large for defined data type
> 
> Same for dumps. I think it's running out of memory just like 
> diff does 
> with files that large, isn't it?
> 
> > Also, is the transfer CPU bound or network bound? Can you send the output
> > of rsync with the --stats and -v options?
> >
> I'm pretty sure it's the network. The rsync jobs on both 
> machines use no 
> more than 30-50% of the CPU. I may be wrong - please find the 
> log attached.
> Perhaps I am memory bound, could it be?
> 
> Regards,
> nosek
> 


