[Samba] Large numbers of files in a directory - take #2 :-)

John H Terpstra jht at samba.org
Thu Feb 3 20:01:28 GMT 2005


Folks,

This will go into the docs as soon as 3.0.11 is out.

- John T.

On Thursday 03 February 2005 12:22, Jeremy Allison wrote:
> Ok, second attempt, now that I'm sure the code is working :-).
>
> JohnT - if you want to turn this into a HOWTO or part of
> the book, be my guest. Remember it'll be in 3.0.12, not
> 3.0.11 or below.
>
> -----------------------------------------------------------
> I've been working (inspired by James Peach of SGI) on the
> problem of using Samba3 with applications that need large
> numbers of files (100,000 or more) per directory.
>
> I think the current code in SVN in the SAMBA_3_0 branch
> may hold the fix for this problem, so I'd like to request
> people who need this functionality to give it a try.
>
> The key was fixing the directory handling to read only
> the entries currently requested, instead of the old (up to
> 3.0.11) behaviour of reading the entire directory into memory
> before doling out names. Normally this would have broken OS/2
> applications which have *very* strange delete semantics :-),
> but by stealing logic from Samba4 (thanks tridge) I think
> the current code in SVN handles this correctly.
>
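To illustrate the difference between the two directory-reading strategies described above, here is a minimal Python sketch. It is not Samba's code (smbd is written in C); the function names and the batch size are made up for illustration only.

```python
import os
from itertools import islice
from typing import Iterator, List


def read_whole_directory(path: str) -> List[str]:
    """Old-style approach: load every name into memory before returning any."""
    return os.listdir(path)


def read_directory_in_batches(path: str, batch_size: int = 128) -> Iterator[List[str]]:
    """Yield names a batch at a time, holding only one batch in memory.

    batch_size is an arbitrary illustrative value, not a Samba setting.
    """
    with os.scandir(path) as entries:
        while True:
            # Pull at most batch_size entries from the open directory handle.
            batch = [entry.name for entry in islice(entries, batch_size)]
            if not batch:
                break
            yield batch
```

With 100,000 entries the first function builds one 100,000-element list up front, while the second never holds more than batch_size names at once.
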
> So here's how to set up an application that needs a large
> number of files per directory in a way that doesn't damage
> performance.
>
> Firstly, you need to canonicalize all the file names in the
> directory to a single case, upper or lower - take your
> pick (I chose upper as all my files already had upper
> case names). Then set up a new custom share for the
> application as follows:
>
> [bigshare]
>         path = /home/jeremy/tmp/manyfilesdir
>         read only = no
>         case sensitive = True
>         default case = upper
>         preserve case = no
>         short preserve case = no
>
> Of course, use your own path and settings, but set the
> case options to match the case of all the files in your
> directory. The path should point at the large directory
> needed for the application - any new files created in
> there and in any paths under it will be forced by smbd
> into upper case - but smbd will no longer have to scan
> the directory for names - it knows that if a file doesn't
> exist in upper case then it doesn't exist at all.
>
> The secret to this is really in the "case sensitive = True"
> line - it tells smbd never to scan for case-insensitive
> versions of names. So if an application asks for a file
> called "FOO", and it can't be found by a simple stat call,
> then smbd will return file not found immediately without
> scanning the containing directory for a version of a different
> case. The other "xxx case xxx" lines make this work by forcing
> a consistent case on all files created by smbd.
>
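The lookup behaviour described above can be sketched roughly as follows. This is a simplified Python model, not smbd's actual implementation; both function names are hypothetical. The point is that the case-sensitive path costs one stat-style check, while the case-insensitive path may have to walk every entry in the directory before it can say "not found".

```python
import os
from typing import Optional


def lookup_case_sensitive(directory: str, name: str) -> Optional[str]:
    """One stat-style check: the name exists exactly as given, or not at all."""
    candidate = os.path.join(directory, name)
    return candidate if os.path.lexists(candidate) else None


def lookup_case_insensitive(directory: str, name: str) -> Optional[str]:
    """Try the exact name first, then fall back to scanning every entry."""
    exact = lookup_case_sensitive(directory, name)
    if exact is not None:
        return exact
    wanted = name.lower()
    # Worst case (name absent): every entry in the directory is examined.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.lower() == wanted:
                return entry.path
    return None
```

For a missing name in a directory of 100,000 files, the second function compares against all 100,000 entries; the first returns after a single check.
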
> Remember, all files and directories under the "path" directory
> must be in upper case with this smb.conf stanza as smbd won't
> be able to find lower case filenames with these settings. Also
> note this is done on a per-share basis, allowing this to be set
> only for a share servicing an application with this problematic
> behaviour (using large numbers of entries in a directory) - the
> rest of your smbd shares don't need to be affected.
>
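One possible way to get an existing tree into upper case before switching the share over is sketched below in Python. This is an illustration under stated assumptions, not a Samba tool: the function names are hypothetical, and Python's str.upper() may not match smbd's upper-casing for non-ASCII names, so check the plan before applying it.

```python
import os
from typing import Dict, List, Tuple


def plan_uppercase_renames(root: str) -> List[Tuple[str, str]]:
    """Return (old, new) path pairs needed to upper-case every name under root.

    Walks bottom-up so a directory's contents are listed before the
    directory itself. Raises ValueError if two names in one directory
    would collide once upper-cased (for example "foo" and "FOO").
    """
    renames: List[Tuple[str, str]] = []
    for parent, dirnames, filenames in os.walk(root, topdown=False):
        names = dirnames + filenames
        taken: Dict[str, str] = {}
        for name in names:
            upper = name.upper()
            if upper in taken:
                raise ValueError(f"{parent}: {taken[upper]!r} and {name!r} collide")
            taken[upper] = name
        for name in names:
            if name != name.upper():
                renames.append(
                    (os.path.join(parent, name), os.path.join(parent, name.upper()))
                )
    return renames


def apply_renames(renames: List[Tuple[str, str]]) -> None:
    """Apply the planned renames in order (children before their parents)."""
    for old, new in renames:
        os.rename(old, new)
```

Calling plan_uppercase_renames() on the share path and inspecting the result before passing it to apply_renames() gives a chance to spot collisions or unexpected names. The root directory itself is left untouched.
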
> This makes smbd *much* faster when dealing with large directories.
> My test case has over 100,000 files and smbd now deals with this
> very efficiently.
>
> So please give this a test if you have problems with
> Samba and large directories. Remember this is in SVN code
> only, it isn't in the 3.0.11 pre-releases or release
> candidates, as we need to ensure this new code is correct.
> If you can help me test it, it'll be in 3.0.12 (security
> problems notwithstanding :-).
>
> Cheers,
>
> 	Jeremy.

-- 
John H Terpstra
Samba-Team Member
Phone: +1 (650) 580-8668

Author:
The Official Samba-3 HOWTO & Reference Guide, ISBN: 0131453556
Samba-3 by Example, ISBN: 0131472216
Hardening Linux, ISBN: 0072254971
Other books in production.

