stat (AGAIN!)

Sun Mar 16 01:06:18 GMT 2003

I hope we're not all sick of this one yet.

I've been going through the code and the message traffic on this subject in
the archive (especially that related to Carl's findings in
http://lists.samba.org/pipermail/samba-vms/2002-December/002003.html). From
my tests, I have found that significant improvements (about half the I/O) can
be made if we drag around the FID along with the filename, so that we can use
RMS calls to get file attributes by doing an open by NAM block (by FID).
This can be done without changing the essential logic of the code and
without going into undocumented territory.

To test things out, I put together and ran three small programs and did a
DIR/SIZ/TOTAL as a control. All the programs had the same basic structure
as that implemented in SAMBA; that is, read in the whole directory, then
grab the file information in a separate loop. I tested the programs with
printf and the debugger to see that I was processing the directory properly.
I then commented out the printf statements and re-compiled with /NODEBUG for
the test runs.

EXP1 used opendir(), readdir() and closedir() to read the filenames and then
ran a second loop to call stat().
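
For reference, the EXP1 structure can be sketched in portable C roughly as
below. This is not the actual test program: the helper names read_names,
total_size and free_names are made up for illustration, and summing the sizes
of regular files is only an assumed stand-in for "grab the file information".

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Free the array built by read_names(). (Hypothetical helper.) */
void free_names(char **names, int count)
{
    for (int i = 0; i < count; i++)
        free(names[i]);
    free(names);
}

/* Pass 1: read every name in the directory into a malloc'd array.
   Returns the number of names, or -1 on error. (Hypothetical helper.) */
int read_names(const char *dir, char ***names_out)
{
    assert(dir != NULL && names_out != NULL);

    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    char **names = NULL;
    int count = 0, cap = 0, failed = 0;
    struct dirent *e;

    while ((e = readdir(d)) != NULL) {
        if (count == cap) {
            int new_cap = cap ? cap * 2 : 64;
            char **tmp = realloc(names, (size_t)new_cap * sizeof *tmp);
            if (tmp == NULL) { failed = 1; break; }
            names = tmp;
            cap = new_cap;
        }
        names[count] = malloc(strlen(e->d_name) + 1);
        if (names[count] == NULL) { failed = 1; break; }
        strcpy(names[count], e->d_name);
        count++;
    }
    closedir(d);

    if (failed) {
        free_names(names, count);
        return -1;
    }
    *names_out = names;
    return count;
}

/* Pass 2: a separate loop that calls stat() on each name and adds up
   the sizes of the regular files. (Hypothetical helper.) */
long long total_size(const char *dir, char **names, int count)
{
    long long total = 0;
    char path[4096];
    struct stat st;

    for (int i = 0; i < count; i++) {
        int n = snprintf(path, sizeof path, "%s/%s", dir, names[i]);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;               /* path too long, skip it */
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
            total += (long long)st.st_size;
    }
    return total;
}
```

The point of the sketch is only the shape: every stat() in the second loop
has to look the file up by name again, which is the work the FID-based
variants avoid.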

EXP2 used $PARSE and $SEARCH in the first loop to get filenames and FIDs. I
kept a copy of the DVI and DID from one of these calls as additional input
to the second loop. Here, I used $OPEN and $CLOSE to grab the attributes I
needed. I set up a NAM block linked to an XAB to get file size and date
information.

EXP3 was the same as EXP2 for the first loop. In the second loop, I used the
ACP-QIO interface to get the information out of a FIB linked to an SBK to
give me size and date information.

I used DIR/SIZ/TOTAL as a control, since I figured this would be the best
possible performance (almost true).

I was concerned with I/O counts and somewhat with CPU time. For what we are
trying to do, I believe that I/O bandwidth is a far more relevant limiting
factor than CPU. Here's what I measured for a fairly hefty directory (1426
files yielding a .DIR of 100 blocks):

      DIR/SIZ   EXP1   EXP2   EXP3
I/O    2916     4309   2190   4292
CPU    4.39    25.02   7.30   9.54

The clear winner is EXP2, which uses the RMS interface and opens by FID. I was
surprised that the ACP-QIO interface did so poorly in the test from an I/O
standpoint. I suppose this is a FID-CACHE issue. The other surprise was that
EXP2 beat the DIRECTORY command from an I/O standpoint! However, DIR still
wins in the CPU department (looping through the files twice is costly in CPU).

I am taking a look at what is needed to hack in the extra info to make the
RMS calls work. Also, if the checksumming idea I had for detecting
directory changes is viable, I'm hoping to integrate it with the $PARSE and
$SEARCH calls in vms_opendir (checksum the FIDs and filenames returned).
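
A minimal sketch of what such a checksum might look like, in plain C. The
struct dir_entry layout and the function names are assumptions for
illustration, not anything in the SAMBA source, and FNV-1a is simply one
convenient 32-bit hash I picked for the sketch; any reasonable checksum would
do. The FID is treated as three 16-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry: a filename plus the three 16-bit words of its FID. */
struct dir_entry {
    const char *name;
    uint16_t fid[3];
};

/* Fold len bytes into a running 32-bit FNV-1a hash. */
static uint32_t fnv1a_bytes(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;             /* FNV prime */
    }
    return h;
}

/* Checksum the names and FIDs in the order they were returned.
   An empty list yields the FNV offset basis. */
uint32_t dir_checksum(const struct dir_entry *entries, size_t count)
{
    assert(entries != NULL || count == 0);

    uint32_t h = 2166136261u;       /* FNV offset basis */
    for (size_t i = 0; i < count; i++) {
        /* Include the trailing NUL so adjacent names cannot run together. */
        h = fnv1a_bytes(h, entries[i].name, strlen(entries[i].name) + 1);
        for (int w = 0; w < 3; w++) {
            /* Hash each FID word low byte first, independent of host order. */
            unsigned char b[2] = {
                (unsigned char)(entries[i].fid[w] & 0xFF),
                (unsigned char)(entries[i].fid[w] >> 8)
            };
            h = fnv1a_bytes(h, b, sizeof b);
        }
    }
    return h;
}
```

Comparing the value from the previous scan with the current one would flag a
change in names or FIDs. It would not, by itself, notice a file whose size or
dates changed while its name and FID stayed the same.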

The only other significant improvement I can think of is to support partial
rebuilds of the SAMBA directory cache. Even with a change in space
allocation on the disk, or a change somewhere in the filenames/FIDs for a
given directory, most of the files will still be the same 99.99% of the time
for a directory of any size. Implementing this one seems to be a lot more
complicated.
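
To give a feel for the simplest piece of a partial rebuild, here is a rough
sketch in C that walks the old cached list and a fresh listing side by side
and flags only the entries whose name or FID is new. It assumes both lists
are sorted by name, and the struct and function names are hypothetical. It
deliberately ignores the harder part, namely files whose attributes changed
while name and FID stayed the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cache entry: filename plus the three FID words. */
struct cache_entry {
    const char *name;
    uint16_t fid[3];
};

static int same_fid(const struct cache_entry *a, const struct cache_entry *b)
{
    return memcmp(a->fid, b->fid, sizeof a->fid) == 0;
}

/* Both arrays must be sorted ascending by name (strcmp order).
   Sets needs_refresh[j] to 1 when fresh[j] has no entry in cached with
   the same name and FID, else 0. Returns the number of entries flagged. */
size_t mark_stale(const struct cache_entry *cached, size_t ncached,
                  const struct cache_entry *fresh, size_t nfresh,
                  unsigned char *needs_refresh)
{
    assert(needs_refresh != NULL || nfresh == 0);

    size_t i = 0, flagged = 0;
    for (size_t j = 0; j < nfresh; j++) {
        /* Skip cached names that sort before this one: they are gone. */
        while (i < ncached && strcmp(cached[i].name, fresh[j].name) < 0)
            i++;

        int hit = i < ncached
               && strcmp(cached[i].name, fresh[j].name) == 0
               && same_fid(&cached[i], &fresh[j]);

        needs_refresh[j] = (unsigned char)!hit;
        flagged += (size_t)!hit;
    }
    return flagged;
}
```

Only the flagged entries would need a fresh attribute lookup; everything else
could be carried over from the old cache, subject to the caveat above.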

Peter
*****************
Peter Smode
Kitsilano Network Research
psmode at kitsnet.vancouver.bc.ca