[Samba] cleaning up duplicate files on the file server

Aaron Kincer kincera at gmail.com
Mon Feb 5 20:31:59 GMT 2007


Dealing with data duplication is not always particularly easy. What I 
would suggest is the following:

1) Identify the duplicates with the oldest modification date
2) Notify your users that you are making changes and ask them to be on
the lookout for any problems
3) Change the file permissions so that they can't be accessed by anyone 
other than you
4) If, after some predetermined length of time (preferably measured in
months), nobody has complained, delete the duplicates
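Step 1 could be sketched roughly as follows. This is a minimal sketch under several assumptions. It expects fdupes-style output on stdin: each group of identical files is listed one path per line, and groups are separated by a blank line. In each group it keeps the most recently modified copy and prints the older ones. The function name older_copies is hypothetical. It relies on GNU stat -c %Y, and file names containing newlines will confuse it.

```shell
#!/bin/sh
# Hypothetical helper: read fdupes-style groups (one path per line, groups
# separated by blank lines) and print every copy except the newest in each group.
# Assumes GNU stat and file names without embedded newlines.
older_copies() {
    group=""
    # Append one blank line so the final group is flushed even if the input
    # does not end with a blank line.
    { cat; echo; } | while IFS= read -r line; do
        if [ -n "$line" ]; then
            # Accumulate the paths of the current group, newline-terminated.
            group="$group$line
"
        elif [ -n "$group" ]; then
            # Emit "mtime<TAB>path", sort newest first, drop the newest,
            # then strip the mtime column again.
            printf '%s' "$group" | while IFS= read -r f; do
                printf '%s\t%s\n' "$(stat -c %Y -- "$f")" "$f"
            done | sort -k1,1nr | tail -n +2 | cut -f2-
            group=""
        fi
    done
}
```

Usage would be along the lines of `fdupes -r /share | older_copies > older-dupes.txt` (the output file name is only an example). Keeping the newest copy is an assumption. If you would rather keep the oldest, change `sort -k1,1nr` to `sort -k1,1n`.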

Changing the permissions offers you an easy way to simulate deleting
without actually deleting. You could dump the ACLs for each file into a
log using a modified form of the command I've posted in the past for
setting the archive bit on files that have been modified. Here it is
for your convenience:

/usr/bin/find /share/ -name '*' -mtime 0 -exec setfattr \
    --name=user.DOSATTRIB --value=0x30783230 {} \;
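The --value in that command is less cryptic than it looks. The bytes 0x30 0x78 0x32 0x30 spell out the ASCII text 0x20, and 0x20 is the DOS archive attribute bit. A quick way to confirm the encoding with standard tools is shown below. It is only an illustration and changes nothing on disk.

```shell
# Hex-dump the ASCII string "0x20"; the result should match the hex digits
# passed to setfattr --value in the command above.
hex=$(printf '0x20' | od -An -tx1 | tr -d ' \n')
echo "0x$hex"
```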

You could change the find command to work on your list of duplicates
instead, and replace setfattr with getfacl. With some fancy footwork,
you should be able to do all of that and redirect the output into a
text file in case you have to restore the permissions to their previous
state. Of course, you could also use the same approach with setfacl to
set the permissions on all of those files.
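One possible version of that footwork is sketched below. It assumes the duplicates are listed one path per line in a text file, such as the list you built in step 1. It also assumes the acl tools (getfacl/setfacl) are installed and the filesystem supports ACLs. The function names backup_and_lock and restore_acls are hypothetical. getfacl --absolute-names keeps the leading slash so the backup can be replayed from any directory, and setfacl --restore reads the getfacl output back in.

```shell
#!/bin/sh
# Hypothetical helpers; file names with embedded newlines are not handled.

# backup_and_lock LIST BACKUP
# Save each file's ACL to BACKUP, then make the file inaccessible to
# ordinary users (root can still read it).
backup_and_lock() {
    list=$1 backup=$2
    : > "$backup"    # start with an empty backup file
    while IFS= read -r f; do
        # Record the current ACL, keeping the absolute path in "# file:".
        getfacl --absolute-names -- "$f" >> "$backup" || return 1
        # Drop any extended ACL entries, then clear all permission bits.
        setfacl -b -- "$f" && chmod 0000 "$f" || return 1
    done < "$list"
}

# restore_acls BACKUP
# Replay the saved ACLs if somebody turns out to need one of the files.
restore_acls() {
    setfacl --restore="$1"
}
```

For example, `backup_and_lock older-dupes.txt /root/dupes-acls.txt` to lock the files down, and `restore_acls /root/dupes-acls.txt` if someone complains (both file names are only examples). Keep the backup file outside the share so it is not swept up with the duplicates.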

Just a suggestion. Any shell gurus out there who can offer better or
clearer advice, please do so.

James A. Dinkel wrote:
> I imagine we can save some space on our file server by cleaning up all
> the files that are saved multiple times by different people.  There is
> already the fdupes command in linux that will scan a directory tree and
> report what files have duplicates.  This could be easily scripted to
> turn those duplicate files into symlinks to one file.
>
> The problem I see, then, is what would happen if someone tries to
> change a duplicate file that they think is their own copy.  Of course,
> everyone with a symlink to that file would get the changes, which is not
> what I would want.  What it would need is some sort of copy-on-edit
> mechanism, so when the file is changed, instead of changing the original
> file, the symlink is replaced with the edited version of the file.
>
> Does this make sense?  Has anyone else thought about this, or found an
> elegant solution to this?
>
> James Dinkel
> Network Engineer
> Butler County of Kansas
>
> There are 10 types of people in the world:  those who understand binary,
> and those who don't.


