[jcifs] JCIFS pagination

Sridhar Jonnalagadda jonnalagadda.sridhar at gmail.com
Wed Oct 29 09:54:22 MDT 2014


Hi Mike,

First, I appreciate you taking the time to explain what might go wrong
with my approach.

I need to give more background on my requirement to make you more
comfortable with the change. You have a valid point from the typical
pagination point of view, where the user clicks a next button and each
page is served in a whole different session. Keeping the connection from
the server to the SMB share alive for something that might or might not
happen is a poor design by all counts.

1. Let's say we have 10,000 objects on the share drive.
2. A mobile device requests to view the contents of the share drive.
3. The mobile device sends an HTTP request to the server, which uses the
JCIFS library to connect to the SMB drive.
4. Approach #1: Use the listFiles API, convert most of the info from
SmbFile to JSON, serialize it, and send it. Since the server has more
memory, it might be able to serialize an array of 10,000 objects, but
consuming and deserializing the whole payload on the device is going to
blow the device's memory.
5. Approach #2: Use new pagination APIs to take 5-10 entries at a time,
jsonify the entries, and send each chunk as part of a multi-part HTTP
response to the device. In this case neither the device nor the server
will have memory issues. Access to the drive happens with a 5-10 second
delay per page, and the same session is used until the server finishes
iterating through the list.
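
The chunking in approach #2 can be sketched as follows. This is only an
illustration of the paging logic: a plain List<String> stands in for the
SmbFile[] results, and the JSON serialization and multi-part HTTP layers
are omitted. The class and method names here are made up for the sketch
and are not part of JCIFS.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedListing {

    // Splits a full listing into fixed-size pages. In the real server,
    // each page would be converted to JSON and written as one part of
    // the multi-part HTTP response while the SMB session stays open.
    public static List<List<String>> paginate(List<String> names, int pageSize) {
        List<List<String>> pages = new ArrayList<>();
        for (int i = 0; i < names.size(); i += pageSize) {
            int end = Math.min(i + pageSize, names.size());
            pages.add(new ArrayList<>(names.subList(i, end)));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            names.add("file" + i);
        }
        // 7 entries with a page size of 3 yields pages of 3, 3 and 1 entries.
        for (List<String> page : paginate(names, 3)) {
            System.out.println(page);
        }
    }
}
```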

You might ask why we are not doing proper pagination. The server
integrates with different repositories such as SharePoint, WebDAV, etc.
Like SMB, the other repositories may not support pagination, so we are
going with the lowest common denominator.

Given the constraints described above, we are using approach #2.

With that further background, do you think the new APIs are going to
cause any harm on the server? Your suggestion is to consume the first 10
and return false from the filter for the remaining 9,990 objects. That
still fetches the entries from the server and discards them at the
client, so it will most likely waste processing time.
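
For comparison, the filter-based approach can be sketched as a counting
filter. This is a stand-alone illustration: the NameFilter interface
below stands in for jcifs.smb.SmbFilenameFilter, and the loop mimics
what a listing call does with a filter internally; the actual JCIFS
calls and network I/O are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for jcifs.smb.SmbFilenameFilter: accept()
// decides whether an entry is kept in the listing result.
interface NameFilter {
    boolean accept(String name);
}

// Counting filter: rejects entries before the page start, accepts
// pageSize entries, then rejects everything after the page.
class PagingFilter implements NameFilter {
    private final int start;     // index of the first entry in the page
    private final int pageSize;  // number of entries per page
    private int seen = 0;        // entries examined so far

    PagingFilter(int start, int pageSize) {
        this.start = start;
        this.pageSize = pageSize;
    }

    public boolean accept(String name) {
        int index = seen++;
        return index >= start && index < start + pageSize;
    }
}

public class PagingDemo {
    // Applies the filter to a full listing, mimicking what
    // SmbFile.listFiles(filter) would do with each directory entry.
    public static List<String> page(List<String> entries, int start, int pageSize) {
        PagingFilter filter = new PagingFilter(start, pageSize);
        List<String> out = new ArrayList<>();
        for (String e : entries) {
            if (filter.accept(e)) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            entries.add("file" + i);
        }
        // Page of size 3 starting at offset 4.
        System.out.println(page(entries, 4, 3)); // prints [file4, file5, file6]
    }
}
```

Note that every entry still crosses the wire; the filter only avoids
building an SmbFile object for the rejected ones.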

I'll post the pagination changes on GitHub once I conclude functional
and load testing. It will be a while before I post the change.


Thanks & Regards,
Sridhar Jonnalagadda.

On Tue, Oct 28, 2014 at 9:52 PM, Michael B Allen <ioplex at gmail.com> wrote:

> On Mon, Oct 27, 2014 at 10:37 AM, Sridhar Jonnalagadda
> <jonnalagadda.sridhar at gmail.com> wrote:
> > Hi Mike & Chris,
> >
> > Appreciate your response.
> >
> > I'm trying to consolidate both of your responses and, based on your
> > feedback, provide more input to make my requirement clearer.
> >
> > Clarifications:
> >
> > 1. The listFiles API will not be changed.
> >
> > 2. There will be new API(s) on SmbFile that take a page size as input
> > and return an array of that size. This means the page size is not
> > specific to my needs. The APIs can be called multiple times to
> > retrieve a page's worth of entries until all elements are retrieved.
> >
> > 3. Yes, supporting pagination is specific to my application's needs.
> > The server handles requests from mobile devices, which are requesting
> > to view the contents of an SMB share. Since SMB share volumes keep
> > growing and the server needs to handle multiple clients concurrently,
> > there should be a way to serve thousands of clients while limiting
> > memory usage. The only way to do this is to support pagination.
> >
> > 4. Since JCIFS is a blocking API, I took care in my application to
> > make it look asynchronous. I was thinking about other users who might
> > benefit from making it an async library.
> >
> > 5. Sorting is not a concern for my application. Does the CIFS/SMB
> > protocol support sorting?
> >
> > 6. My concern is that if, say, the share has 1,000 objects and 1,000
> > devices are using the API, then all of them together might use a
> > significant amount of memory.
> >
> >
> > I changed SmbFile to add three new public APIs. These APIs help me
> > provide iteration behavior and keep memory usage to a minimum.
> >
> > 1. public SmbFile[] getFirstPageContents(final SmbFileFilter fileFilter,
> >    final SmbFilenameFilter fileNameFilter, int pageSize) throws SmbException
> > 2. public SmbFile[] getNextPageContents() throws SmbException
> > 3. public void endPaginationRequest()
> >
> >
> > There are a couple of additional constructors.
> >
> > 1. Trans2FindFirst2( String filename, String wildcard,
> >    int searchAttributes, int pageSize )
> > 2. Trans2FindNext2( final int sid, final int pageSize,
> >    final int resumeKey, final String filename )
>
> Hi Sridhar,
>
> I don't think that would work. The Trans2Find{First,Next}2 commands
> might look like they can be used for "paging" as you describe but that
> is not the intent. The intent of those commands is simply to buffer
> chunks of directory entries efficiently (because the disk is slow
> compared to the client code). You cannot just leave the list handle
> open for a long time and then expect to retrieve another "page" 10
> minutes later. The connection would get closed, the list handle would
> be invalidated because objects changed, the server would hang as it
> struggles to resurrect the old list handle, etc. And you cannot use
> the resumeKey to initiate a new request with an offset > 0. That
> resumeKey has to be the resumeKey supplied in the previous command.
>
> The only vaguely practical way to implement what you describe is to
> just list everything and collect the desired segment of files using
> FileFilter.accept(). And the implementation would be trivial. You
> would just count and return false until you reach the start of the
> "page", then return true until you reach the end of the page and then
> return false for everything else. Then SmbFile.list() will return the
> desired "page" of SmbFile[]. It's so mind numbingly elegant I honestly
> don't know why you would want to do it any other way.
>
> You should also realize that the network and the server's disk are
> going to be a lot slower than anything JCIFS is doing. Meaning JCIFS
> is going to spend a lot of time just waiting for
> Trans2Find{First,Next}2 responses. So your optimization would have
> almost no practical effect.
>
> But returning false from FileFilter.accept() would save memory because
> JCIFS will not convert the Trans2Find{First,Next} response data to an
> SmbFile when accept() returns false. Creating objects in Java is
> expensive. So if you want to save memory because many mobile devices
> are "paging" through objects, using FileFilter.accept() would be
> worthwhile.
>
> Unlike a lot of Java code, JCIFS is very efficient. Share a drive with
> lots of files and run the multi-threaded crawler example against it.
> Run it a few times so that the server cache is hot. I bet you could
> list every object on the whole machine in under 30 seconds. When I
> wrote that threaded crawler example (10 years ago), I listed every
> object on my NT4 workstation in 10 seconds.
>
> > At present I'm testing the changes and would like to contribute this
> change.
> > Please let me know the process.
>
> Create a new package like jcifs-paging-1.3.17.zip and put it on github
> and then post a link here. Linux style development where everyone just
> pushes their own complete stand-alone package that is ready-to-run is
> vastly superior to mucking about with patches and version control
> systems. If what you have done is really good, people will use your
> package. Then maybe we'll have to consider what you have done.
>
> Mike
>
