[jcifs] listFiles on a large directory runs out of memory

Michael B Allen mba2000 at ioplex.com
Mon Jun 11 15:31:34 GMT 2007


On Mon, 11 Jun 2007 10:02:30 -0400
Jake Goulding <goulding at vivisimo.com> wrote:

> I have a directory with 350k files, and doing a listFiles() on it causes 
> my VM to run out of memory (128 MB). Dumping the heap and running jhat 
> yields:
> 
> String - 682810 - 13656200 bytes
> URL - 169916 - 14952608 bytes
> SmbFile - 169767 - 24955749 bytes
> Character[] - 513042 - 50242964 bytes
> 
> SmbFile has references to 3 Strings, plus the URL, and I'm sure that URL 
> has a few Strings, and the Strings all resolve down to the Character[].
> 
> Is there some way of processing directories in batches? Or some way of 
> having a callback, where each file listed is created, then passed back, 
> and could be destroyed after it is done processing. Thanks!

Hi Jake,

You can use the SmbFile.listFiles(SmbFileFilter filter) or, better still,
the SmbFile.listFiles(DosFileFilter) method. Just process each file in the
filter's accept() method and always return false so that the file is never
added to the SmbFile[] result (just ignore the empty array returned by
listFiles). Each SmbFile object should then be eligible for garbage
collection once the filter's accept() method returns (provided you do not
save a reference to it).
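The control flow of this "do the work in accept(), keep nothing" pattern can be sketched as follows. This is only a stand-in, not jcifs code: it uses java.io.File and java.io.FileFilter so that it runs without a server or the jcifs jar, on the assumption that SmbFileFilter.accept(SmbFile) is driven the same way by SmbFile.listFiles. The class and method names (FilterAsCallback, visitAll, demoDir) are made up for illustration, and the sketch shows nothing about jcifs memory behaviour.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in sketch: java.io.File is used only so this runs locally.
// With jcifs, the same body would go in an SmbFileFilter's accept(SmbFile).
class FilterAsCallback {

    // Visits every entry of dir from inside the filter and keeps none.
    // Returns {entries visited, entries kept in the returned array}.
    static int[] visitAll(File dir) {
        final AtomicInteger visited = new AtomicInteger();
        FileFilter callback = new FileFilter() {
            @Override
            public boolean accept(File f) {
                // Per-file work goes here; do not store a reference to f.
                visited.incrementAndGet();
                return false; // never add the entry to the result array
            }
        };
        File[] kept = dir.listFiles(callback);
        // kept is null when dir is not a readable directory.
        int keptCount = (kept == null) ? 0 : kept.length;
        return new int[] { visited.get(), keptCount };
    }

    // Hypothetical helper: creates a temporary directory holding n empty files.
    static File demoDir(int n) {
        try {
            File dir = Files.createTempDirectory("filter-demo").toFile();
            for (int i = 0; i < n; i++) {
                Files.createFile(dir.toPath().resolve("f" + i + ".txt"));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = visitAll(demoDir(5));
        System.out.println("visited=" + result[0] + " kept=" + result[1]);
    }
}
```

Every entry is seen by accept(), while the array handed back by listFiles stays empty and can be discarded.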

Also, as a bonus, if you know you're only looking for files (as opposed
to directories), or only for files with a certain extension, then using
DosFileFilter (as opposed to the simpler SmbFileFilter) gives you
*server side* wildcard (e.g. '*.doc') and attribute (e.g.
SmbFile.ATTR_DIRECTORY) filtering, which requires less processing on both
ends. If you do not use a wildcard other than '*' and do not filter on
attributes, there is no performance benefit to using DosFileFilter over
SmbFileFilter.

There is an example that does this. Look at examples/LargeListFiles.java.

There is one concern however ...

The SmbFile.listFiles() method employs a series of requests that return
buffers of at most 65535 bytes or 200 files (whichever limit is reached
first). So to read 350,000 files would require a minimum of 1750 requests
by JCIFS.
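The 1750 figure is a ceiling division of the file count by the 200-file batch limit. A minimal sketch of that arithmetic (the class and method names are illustrative only):

```java
// Sketch of the request-count arithmetic; names are illustrative.
class ListRequestCount {

    // Smallest number of requests needed when each response carries
    // at most perRequest entries (ceiling division).
    static long minRequests(long files, long perRequest) {
        if (perRequest <= 0) {
            throw new IllegalArgumentException("perRequest must be positive");
        }
        return (files + perRequest - 1) / perRequest;
    }

    public static void main(String[] args) {
        // 350,000 files at up to 200 files per response.
        System.out.println(minRequests(350_000L, 200L));
    }
}
```

This is a lower bound: when entries are large enough that the 65535-byte limit is reached before 200 files, a response carries fewer entries and more requests are needed.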

Without using FileFilters these 1750 requests would be performed as fast
as the program could transmit requests and decode the responses. If you
now use a FileFilter, the next request is not sent until the FileFilter
has completely processed one buffer full of files (e.g. 200 files).

For example, let's say your FileFilter.accept() method takes 10 seconds
to process a single file. Then 10 seconds x 200 files is 2000 seconds, or
about 33 minutes. That means the time between the first request and the
second will be about 33 minutes. The jcifs.smb.client.soTimeout value is
35 seconds, so that would definitely be a problem. You would have to
process 200 files in under 35 seconds, increase the soTimeout value, or
decrease the jcifs.smb.client.listSize value from 200.
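The timing constraint above can be checked with a small calculation before settling on a listSize or soTimeout. The sketch below is only a rough model: it assumes a constant per-file processing time, ignores network and decoding time, and takes all durations in milliseconds as its own convention. The class and method names are illustrative.

```java
// Rough model of the gap between list requests when accept() does the work.
// Assumes constant per-file time; ignores network and decoding time.
class BatchTimeoutCheck {

    // Milliseconds spent in accept() for one full batch, i.e. the gap
    // between two consecutive list requests.
    static long gapMillis(long perFileMillis, int listSize) {
        return perFileMillis * listSize;
    }

    // Largest batch size whose processing time stays strictly under the
    // timeout; 0 means even a single file is too slow.
    static long maxListSize(long perFileMillis, long soTimeoutMillis) {
        if (perFileMillis <= 0 || soTimeoutMillis <= 0) {
            throw new IllegalArgumentException("durations must be positive");
        }
        return (soTimeoutMillis - 1) / perFileMillis;
    }

    public static void main(String[] args) {
        long perFile = 10_000L;   // 10 seconds per file, as in the example
        long soTimeout = 35_000L; // 35 seconds
        long gap = gapMillis(perFile, 200);
        System.out.println("gap between requests: " + gap / 1000 + " s");
        System.out.println("fits in timeout: " + (gap < soTimeout));
        System.out.println("largest safe batch: " + maxListSize(perFile, soTimeout));
    }
}
```

With the numbers from the example, a batch of 200 takes 2000 seconds, and only 3 files per batch would fit under a 35-second timeout, so raising soTimeout is likely the more practical option when per-file work is that slow.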

Peace, Love and Granola,
Mike

-- 
Michael B Allen
PHP Active Directory Kerberos SSO
http://www.ioplex.com/

