--fuzzy question

Wed May 20 19:29:09 GMT 2009

On Wed, May 20, 2009 at 2:26 AM, Julian Pace Ross <linux at prisma.com.mt> wrote:
> Thanks Ryan!
> In fact I found it's a combination of factors you mentioned... i.e. a
> compressed SQL .bak file, so contrary to what I thought, the fuzzy file was
> indeed being found but no matches were being found in the file... thanks
> again for the info.

If you have the disk space at both ends, I would suggest doing what I
do for SQL backup synchronization.

1) Write *uncompressed* .bak files for your databases (with timestamps
in the file name, such as those produced by the database maintenance
plan engine). This enables the use of --fuzzy, as you have discovered.
2) Use rsync to transfer the uncompressed files, but with the -z
option enabled. This compresses the data over the wire, but
decompresses it at the receiving end.
3) Adjust the rsync block size to something smaller if necessary to
find more matches. I basically went down to 32KB rsync blocks for one
15 GB database file (rsync would by default use something like 129KB
on a file this big). This eats up a lot more CPU, but if rsync can
still output data faster than your network connection can handle, it
is the most time-efficient way to go. Use multiples of 8KB, as that is
the internal page size inherent in MS SQL Server databases. Trial and
error is your friend here. Run rsync with low priority (START /LOW
rsync.exe) so the CPU usage doesn't impact SQL Server.
4) Minimize any jobs you have to automatically rebuild indexes. Use
UPDATE STATISTICS instead on a daily basis, and rebuild only when
index fragmentation gets heavy. There are lots of scripts out there on
the net which will automate that for you.
5) Minimize the rebuilds of denormalized "reporting" tables or other
non-essential data. Move these off into other databases that you don't
replicate if possible.
6) Watch out for non-sequential clustered indexes. We use GUIDs for
primary keys on many tables, and this causes updates and inserts to be
spread randomly throughout the table as it is physically stored. Even
changing just 5% of the data can result in a change to every database
page in such a scenario. Hot tables which use emails or other VARCHAR
fields as clustered index keys also result in similar behavior.
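
To put a rough number on point 6, here is a back-of-the-envelope
sketch in Python. It assumes changed rows land uniformly at random
across pages (roughly what random GUID keys give you); the
rows-per-page figures are made-up illustrations, not measurements
from any real database.

```python
# Back-of-the-envelope estimate, not a measurement.
# Assumption: each row is changed independently with the same
# probability, and rows are spread evenly across pages.

def fraction_of_pages_touched(changed_fraction, rows_per_page):
    """Expected fraction of pages holding at least one changed row."""
    # A page is untouched only if every row on it is unchanged.
    untouched = (1.0 - changed_fraction) ** rows_per_page
    return 1.0 - untouched


if __name__ == "__main__":
    # Hypothetical row densities for an 8KB page.
    for rows_per_page in (10, 40, 80):
        touched = fraction_of_pages_touched(0.05, rows_per_page)
        print(f"{rows_per_page} rows/page: {touched:.1%} of pages change")
```

Under those assumptions a 5% change touches roughly 40%, 87% and 98%
of pages at 10, 40 and 80 rows per page. A 32KB rsync block spans
four 8KB pages, so the share of rsync blocks that no longer match
would be higher still.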

Most of these suggestions would apply for rsyncing any sort of
database backup file... Exchange, PostgreSQL, Oracle, or even
(horror!) MySQL.
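
For what it's worth, points 1 to 3 boil down to an rsync invocation
along these lines. This is only a sketch in Python that assembles the
command line; the function name, paths and host are hypothetical
placeholders, and the only rsync options used are --fuzzy, -z and
--block-size. It does not run anything by itself.

```python
# Sketch only: builds an rsync command line for uncompressed backups.
# build_rsync_command, the paths and the host name are hypothetical.

SQL_PAGE_KB = 8  # MS SQL Server page size, per point 3


def build_rsync_command(source, destination, block_kb=32):
    """Return the rsync argument list as a list of strings."""
    if block_kb <= 0 or block_kb % SQL_PAGE_KB != 0:
        raise ValueError("block size should be a multiple of 8KB")
    return [
        "rsync",
        "-z",                                # compress over the wire
        "--fuzzy",                           # use a similarly named basis file
        f"--block-size={block_kb * 1024}",   # block size in bytes
        source,
        destination,
    ]


if __name__ == "__main__":
    cmd = build_rsync_command(
        "/backups/sql/",                          # placeholder source
        "backup@remote.example:/backups/sql/",    # placeholder destination
    )
    # Print rather than execute; on Windows, prefix with START /LOW.
    print(" ".join(cmd))
```

Try a few block_kb values (16, 32, 64) against a real backup pair and
compare the transfer statistics before settling on one.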

-- 
RPM