Samba - CPU and memory usage - Proposed solution(?)

Nikos Balkanas nbalk at hol.gr
Sat Jan 22 05:53:27 GMT 2005


Hello,

This solution was developed against samba 2.2.22. I did not, and still do
not, have the opportunity to test it against samba 3.0.0.

At the time I was working as a technical architect for Tellas, the 2nd
largest telecom in Greece. We used large billing and CRM systems (Geneva,
Siebel). Filesystem and database were hosted on Solaris SF68000 servers (4-6
CPUs/domain). Therefore, we used samba on the Unix servers.

These systems generate lots of data, and they use the proper interface
between database and filesystem. That is, bulk data (bills, contracts) is
kept as files, and only the path is stored in the database. Depending on
company size and traffic, this produces single directories with millions of
files each. Samba can handle up to ~20,000 files per directory without
significant server or service degradation.

At 70,000 files/directory (10 directories), Siebel would take ~20 seconds to
display a customer's contracts, making it very difficult for CRM to work. At
the same time Geneva, with ~1,000,000 files/directory, would take ~20
minutes to display a particular bill. All this time the Geneva smbd
processes were at ~150 MB RAM and 100% CPU. 4 simultaneous such requests by
CRM and support could stall the domain. 10 simultaneous requests would crash
the server (easy to do when a single request lasts ~20 minutes). No browsing
or wildcard matching of files was needed, only exact file requests through
the database.

Putting samba through the debugger, I noticed that on every file request it
would scan all the files in the large directory, converting them to Unix
filenames and building up the filename cache until it reached 150 MB. I
developed a configurable parameter "many files" which, when set, disables
file browsing (who needs a listing of ~1,000,000 files?) and performs a
"stat" to get the file.
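
To make the difference concrete, here is a minimal sketch in Python (not
samba's actual C code) contrasting the two lookup strategies. The function
names, the file naming scheme and the lowercase-keyed cache are all
assumptions made for illustration; the cache only stands in for the filename
conversion described above.

```python
import os
import tempfile


def lookup_by_scan(directory, wanted):
    """Resolve `wanted` by listing the whole directory into a name cache.

    Work and memory grow with the number of entries, even though the
    caller asked for one exact name.
    """
    cache = {}
    for entry in os.scandir(directory):
        # Hypothetical stand-in for the client-to-Unix name conversion.
        cache[entry.name.lower()] = entry.path
    return cache.get(wanted.lower()), len(cache)


def lookup_by_stat(directory, wanted):
    """Resolve `wanted` with a single stat call on the exact path."""
    path = os.path.join(directory, wanted)
    try:
        os.stat(path)
    except FileNotFoundError:
        return None
    return path


def demo(n_files=2000):
    """Create n_files empty files and resolve the last one both ways."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n_files):
            open(os.path.join(d, f"bill_{i:07d}.pdf"), "w").close()
        target = f"bill_{n_files - 1:07d}.pdf"
        scanned, cached = lookup_by_scan(d, target)
        direct = lookup_by_stat(d, target)
        # Both strategies find the same file; only the scan touched
        # every directory entry to do so.
        return scanned == direct, cached
```

The stat-based lookup only works when the caller already knows the exact
name, which is the case here because the application gets the path from the
database.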

The improvement was huge and manifold. Response time went down to < 1
second, CPU to ~0.1% and RAM to ~2.5 MB/process. More importantly, these
results are independent of how many files are in a directory (as long as the
filesystem doesn't run out of inodes!). Security is also better, since CRM
agents cannot view, modify or delete files from the mapped filesystem, but
instead go only through the application as intended. Since this is a
per-directory configurable parameter, other samba directories with fewer
files can have full browsing/listing at the same time.
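
As a rough illustration of the per-directory setting, a patched smb.conf
might look something like the fragment below. Only the parameter name "many
files" comes from the patch described here; the share names, the paths and
the "yes" value are assumptions for illustration, and the parameter is not
part of stock samba.

```
# Hypothetical share holding ~1,000,000 bills: exact-name access only.
[bills]
    path = /export/geneva/bills
    # Parameter added by the patch; the value syntax is assumed.
    many files = yes

# Hypothetical small share: normal browsing and listing stay available.
[docs]
    path = /export/docs
```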

The solution was tested against Windows XP. Windows XP must use a similar
"stat" mechanism, since it was very fast with ~1,000,000 files/directory.
Directory listing is slow (as expected), and comes in batches of 200 or so
files at a time. However, you cannot disable browsing, and therefore it is
an inferior solution: security is more lax, and each time a bill is about to
be saved, the full browsing window is opened, with all the side effects on
the server. It does, however, use fewer packets than samba for file
requests.

As mentioned, I have no idea about 3.0 and am not able to test it. My
apologies if you have already corrected this. If not, and there is interest
in the patch, let me know - but it will be against 2.2.22. The patch has
been tested successfully in Tellas' production environment for ~1 year
without any complaints. With this patch, samba can be the top choice for
large, serious professional production systems. Otherwise, directories
should be kept below 20,000 files.

Cheers,
Nikos Balkanas




More information about the samba-technical mailing list