Performance problem (2.2.4) [long]
Carl Perkins
carl at gergx.gerg.tamu.edu
Thu Dec 5 05:01:09 GMT 2002
"John E. Malmberg" <wb8tyw at qsl.net> wrote:
>"Michael D. Ober" wrote:
>> I have been working with this issue now for a while and it appears
>> that the problem isn't the directory listing itself, but the file
>> size calculation that takes forever. Any ideas on how to speed this
>> up.
>
>The culprit is the call to stat(), which is expensive on OpenVMS and is
>required for this and several other operations in the directory listing.
Having noticed that the stat() function is very slow, I have recently
been experimenting with getting the same information that stat()
returns, but faster. My tests have focused on gathering information
for a whole directory of files rather than just one.
What I have tried is this:
1) Use opendir(), loop over readdir(), closedir() to read in the entire
contents of the directory, allocating and loading a list of data structures
with the filenames (space for these also allocated on the fly) and FID values.
 - this list is in the filenames' alphabetical order; using it directly
   resulted in excessive amounts of null time, so I now also create an
   array of pointers that is sorted by the file number from the FID
   (using the FID's file number plus the extension bits from the NMX byte
   of the 3rd word to form the true file number, which directly determines
   which virtual block of the INDEXF.SYS file holds the file header
   for that file; see the sketch after this list). The resulting
   improvement was considerable.
2) Directly open the INDEXF.SYS file (by assigning a channel to the
disk device and doing an IO$_ACCESS on that file's FID, which is
a known set of values).
3) Read the home block (VBN 2 in INDEXF.SYS) and parse the information
to get the offset to the VBN in the file where the file headers start.
4) Flush the FID cache to ensure that any recent changes are accounted for.
5) Loop through the list of files sorted by file number, doing an
IO$_READVBLK QIO on the calculated virtual block number for that file's
file header. Copy the information out of the file header to fill a
stat data structure (the same structure definition used for the stat()
function's data) for the file. At the moment this only works on ODS-2
disks - I have not put in a test to check for ODS-5 and use the
appropriate file information data structure (the dates are at different
offsets, so it won't work as-is) - this one data area in the file header
is, I think, the only difference in the file headers between the two. Some
of the data is now processed outside of this loop in an attempt to reduce
the latencies involved in reading the INDEXF.SYS file, with minimal success
so far: the creation and modification date-time values are saved in VMS
format and converted to Unix times in a separate loop (each has to be
converted to the Unix format and adjusted for timezone and daylight saving
time; the conversion itself is sketched after this list), and in that same
loop the st_dev field is filled in and the st_gid value is calculated from
the st_uid value (which is the user's full UIC, including the group).
Moving this out of the read loop reduced the average total time by at most
a couple of milliseconds (the real improvement came from switching from
alphabetical order to FID order).
6) Close INDEXF.SYS.
7) There is also a routine to clean up all the memory allocated for the
list structures (which are actually allocated 10 at a time, which requires
a bit of fiddling to free them properly) and filename strings.
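
In case it helps, here is a stripped-down sketch of the FID arithmetic
behind steps 1, 3, and 5. The dir_entry structure is simplified for
illustration (the real list handling is messier), and first_header_vbn
stands for the value parsed out of the home block in step 3 (the index
file bitmap VBN plus its size):

    #include <stdlib.h>

    /* One entry per file, filled in from the readdir() loop in step 1.
       The FID is three 16-bit words: the file number, the file sequence
       number, and the RVN (low byte) plus NMX (high byte), where NMX
       extends the file number past 16 bits. */
    struct dir_entry {
        char *name;
        unsigned short fid[3];
    };

    /* True file number: 16 bits from the first word plus 8 from NMX. */
    static unsigned long true_filenum(const unsigned short fid[3])
    {
        return (unsigned long)fid[0] | ((unsigned long)(fid[2] >> 8) << 16);
    }

    /* VBN of this file's header in INDEXF.SYS; first_header_vbn comes
       from the home block read in step 3. */
    static unsigned long header_vbn(unsigned long first_header_vbn,
                                    const unsigned short fid[3])
    {
        return first_header_vbn + true_filenum(fid) - 1;
    }

    /* qsort() comparison for the pointer array sorted by file number. */
    static int by_filenum(const void *a, const void *b)
    {
        unsigned long fa = true_filenum((*(struct dir_entry *const *)a)->fid);
        unsigned long fb = true_filenum((*(struct dir_entry *const *)b)->fid);
        return (fa > fb) - (fa < fb);
    }
    /* e.g.: qsort(ptrs, nfiles, sizeof *ptrs, by_filenum); */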
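
The date-time conversion mentioned in step 5 is simple arithmetic; this
is roughly what the separate conversion loop boils down to, minus the
timezone and daylight saving adjustment:

    #include <time.h>

    /* Seconds from the VMS epoch (17-NOV-1858) to the Unix epoch
       (1-JAN-1970): 40587 days * 86400 seconds/day. */
    #define VMS_TO_UNIX_EPOCH 3506716800UL

    /* A VMS date-time is a 64-bit count of 100 ns units since the VMS
       epoch; divide down to seconds and rebase to the Unix epoch.
       The timezone/DST adjustment still has to be applied on top. */
    static time_t vms_to_unix_time(unsigned long long vmstime)
    {
        return (time_t)(vmstime / 10000000 - VMS_TO_UNIX_EPOCH);
    }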
So how well does this work compared to using stat()? In terms of
CPU time it is better by about a factor of 6. In terms of elapsed
time it takes slightly over half as long.
Some data...
First: "with stat in lib$find_file loop" - this is just what it says.
A loop using lib$find_file() to go through all the files in a directory
and a stat() performed on each.
Second: "without stat in lib$find_file loop" - this is just the
lib$find_file() loop without the stat() to determine how much
overhead this has.
Third: "with sys$parse and sys$search" - this is using these two RMS
routines directly. I have not strung any XABs onto the FAB to try to
get the data we are after - the date-time values and most of the others
are fairly easy but I'm not sure how to get the file mode data (in
particular the "this is a directory" value) using these. Since this
isn't returning all the data the times returned are lower than they
would be if the various XABs to retrieve the data were added (and
if the VMS to Unix time manipulations were also added).
Fourth: "using Scan_Dir routine" - this is my routine as described above.
After the list is filled, the list is walked to print out the filenames
(all of the above printf() them as they are produced by lib$find_file or
sys$search). This version is from loading the data for the files in
alphabetical order, the same as they are loaded from the directory file
into the list.
Fifth: "using Scan_Dir routine (sorted by FID version)" - this
is the version that reads the file headers in the file sequence
number order, i.e. the order that they appear in the INDEXF.SYS file.
This also has a few other optimizations, such as moving the time
conversion calculations out of the file header reading loop.
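
For anyone unfamiliar with these routines, the skeleton of the
sys$parse/sys$search loop in the third test is roughly the following
(error handling stripped; the names come back in the NAM block's
resultant-spec buffer; retrieving dates and protection would mean
hanging XABs off the FAB and opening or displaying each file):

    #include <rms.h>
    #include <stdio.h>
    #include <string.h>

    static void list_files(char *wildspec)   /* e.g. "*.*;*" */
    {
        struct FAB fab = cc$rms_fab;
        struct NAM nam = cc$rms_nam;
        static char espec[NAM$C_MAXRSS], rspec[NAM$C_MAXRSS];

        fab.fab$l_fna = wildspec;
        fab.fab$b_fns = strlen(wildspec);
        fab.fab$l_nam = &nam;
        nam.nam$l_esa = espec;          /* expanded-spec buffer */
        nam.nam$b_ess = NAM$C_MAXRSS;
        nam.nam$l_rsa = rspec;          /* resultant-spec buffer */
        nam.nam$b_rss = NAM$C_MAXRSS;

        if (!(sys$parse(&fab) & 1))
            return;
        while (sys$search(&fab) & 1)    /* loops until SS$_NMF */
            printf("%.*s\n", nam.nam$b_rsl, rspec);
    }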
Notes:
A) The stat() loop and "without stat()" loop data was gathered earlier
than the others, when there were probably two fewer files in the directory
(I remember it as being 290, whereas the others were done more recently
with 292, and some of the 290 are not the same versions of the
files). The directory for which the file information is being gathered
is my login directory.
B) These times are in "ticks" as reported by the difference between
two calls to a "get time in modes" routine which examines the system's
SMP$GL_CPU_DATA data structures. There are 1024 of these ticks per
second on an Alpha, as far as I can tell (it was 100 on a VAX with
VMS V5.5-2, and probably still is with more recent versions). These
values are therefore not process specific; they are the CPU time in the
various modes consumed by all processes on the system during the interval.
C) All testing was done on an XP900 with very little going on other than
the tests. It may not be relevant, but the system's quantum parameter
is set to 10 ms and the process running the test was operating at
normal user priority (i.e. a base of 4).
D) In the test program the names of each file are printed to the screen
in order to simulate doing something with the data - there is a brief
wait at the start of the program during which I minimize the terminal
window to reduce the display rate impact. I should probably do something
better than this, but it seemed good enough at the time.
E) The data is for three runs. The first column of percentages is for all
of the time averages including null, the second excluding null.
F) The scan_dir test's times do include calling the cleanup routine which
does what it needs to do to call free() for each piece of memory that
was allocated.
* With stat in lib$find_file loop:
  run         1     2     3    Averages
             ---   ---   ---   -----------------------
  kernel =   276   272   261   269.67  (31.7%)  (37.6%)
  exec   =   134   120   115   123     (14.4%)  (17.1%)
  super  =     4     9     9     7.33  ( 0.9%)  ( 1.0%)
  user   =   327   276   216   273     (32.1%)  (38.0%)
  inter  =    45    33    56    44.67  ( 5.2%)  ( 6.2%)
  spin   =     0     0     0     0
  null   =    72   137   186   133.67  (15.7%)   XXXXX
  Total, excluding null = 717.67
  Total, including null = 851.33
* Without stat in lib$find_file loop:
  run         1     2     3    Averages
             ---   ---   ---   -----------------------
  kernel =    55    17    33    35     (21.3%)  (21.8%)
  exec   =    16    14    11    13.67  ( 8.3%)  ( 8.5%)
  super  =     0     0     0     0     ( 0.0%)  ( 0.0%)
  user   =   156    82    73   103.67  (63.0%)  (64.6%)
  inter  =    10     8     7     8.33  ( 5.1%)  ( 5.2%)
  spin   =     0     0     0     0
  null   =     5     2     5     4     ( 2.4%)   XXXXX
  Total, excluding null = 160.67 (22.4% of stat case)
  Total, including null = 164.67 (19.3% of stat case)
* With sys$parse and sys$search instead of stat, no lib$find_file:
  run         1     2     3    Averages
             ---   ---   ---   -----------------------
  kernel =   149   150   167   155.33  (34.2%)  (49.6%)
  exec   =    50    53    47    50     (11.0%)  (16.0%)
  super  =     0     0     0     0     ( 0.0%)  ( 0.0%)
  user   =    94    84    75    84.33  (18.6%)  (26.9%)
  inter  =    30    24    17    23.67  ( 5.2%)  ( 7.6%)
  spin   =     0     0     0     0
  null   =   135   141   145   140.33  (30.9%)   XXXXX
  Total, excluding null = 313.33 (43.7% of stat case)
  Total, including null = 453.67 (53.3% of stat case)
* Using Scan_Dir routine:
  run         1     2     3    Averages
             ---   ---   ---   -----------------------
  kernel =    79    45    55    59.67  ( 7.4%)  (30.9%)
  exec   =     9     8    10     9     ( 1.1%)  ( 4.7%)
  super  =     1     2     3     2     ( 0.2%)  ( 1.0%)
  user   =   170    84    80   111.33  (13.8%)  (57.7%)
  inter  =    13    11     9    11     ( 1.4%)  ( 5.7%)
  spin   =     0     0     0     0
  null   =   612   622   602   612     (76.0%)   XXXXX
  Total, excluding null = 193 (26.9% of stat case)
  Total, including null = 805 (94.6% of stat case)
* Using Scan_Dir routine (FID-sorted version):
  run         1     2     3    Averages
             ---   ---   ---   -----------------------
  kernel =    30    31    36    32.33  ( 8.5%)  (30.8%)
  exec   =     6    15     6     9     ( 2.7%)  ( 9.7%)
  super  =     0     2     2     1.33  ( 0.6%)  ( 2.3%)
  user   =    71    61    60    64     (13.5%)  (49.1%)
  inter  =     6     8    11     8.33  ( 2.2%)  ( 8.1%)
  spin   =     0     0     0     0
  null   =   354   354   353   353.67  (72.5%)   XXXXX
  Total, excluding null = 114.67 (16.0% of stat case)
  Total, including null = 468.67 (55.1% of stat case)
-----------------
I am not certain where all that null time is coming from. At least
some of it is due to disk seek and rotational latency, which
is why switching to walking the list in the order the headers appear
in the INDEXF.SYS file and moving some operations out of the read loop
just about cut it in half. I'm not sure it will be possible to reduce
it any further, as the only processing still going on inside the
read loop is copying data, some bitwise operations ("and"ing, "or"ing,
negating, and shifting), and a couple of adds. I may try to move
everything except the data copying out of the loop, but there isn't
much left to move, and moving the things I did move made only a
very small improvement. It might also help to only
copy the data that Samba actually uses - there are fields in the
stat structure that it may not need, and I'm filling them all.
There are a variety of improvements that could be made to the scan_dir
routines, but they are increasingly difficult to implement (especially
correctly). For example, at the moment the disk I/O is done synchronously:
a sys$qiow reads one block, the data is processed, and then the next read
is issued. Some sort of multiple-buffer scheme with asynchronous I/O,
along the lines of the sketch below, could probably reduce that null
time considerably.
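
For what it's worth, a double-buffered version of the read loop might
look something like this. I haven't actually written it, so treat it
as the shape of the idea rather than working code: two buffers, two
event flags, start the read for header i+1, then process header i
while that read is in flight. chan is the channel assigned to the disk
in step 2, vbns[] holds the header VBNs in sorted order, and all error
checking is omitted:

    #include <starlet.h>
    #include <iodef.h>

    #define BLKSIZE 512

    struct iosb { unsigned short status, count; unsigned int dev; };

    static void read_headers(unsigned short chan, unsigned int *vbns,
                             int count, void (*process)(const char *hdr))
    {
        static char buf[2][BLKSIZE];
        struct iosb iosb[2];
        int i;

        if (count <= 0)
            return;

        /* Prime the pipeline: first read on event flag 1, buffer 0. */
        sys$qio(1, chan, IO$_READVBLK, (struct _iosb *)&iosb[0], 0, 0,
                buf[0], BLKSIZE, vbns[0], 0, 0, 0);

        for (i = 0; i < count; i++) {
            int cur = i & 1, nxt = (i + 1) & 1;

            /* Start the next read before touching the current buffer. */
            if (i + 1 < count)
                sys$qio(1 + nxt, chan, IO$_READVBLK,
                        (struct _iosb *)&iosb[nxt], 0, 0,
                        buf[nxt], BLKSIZE, vbns[i + 1], 0, 0, 0);

            /* Wait for the current read to finish, then process it. */
            sys$synch(1 + cur, (struct _iosb *)&iosb[cur]);
            process(buf[cur]);
        }
    }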
If Samba's attempts to list all the files in a directory could be
switched over to this method - doing them all at once, then pulling
the data out of the list as it is needed and cleaning up after it is
all sent - this could reduce the time it takes by maybe up to 45% if
done using the current version of the routines (it wouldn't reduce file
name translation or network transmission times or any other processing,
so it would probably be a bit less). It would need to be modified to
only attempt this for full directory listings, not when looking
for individual files (as the PC gets the listing for a directory, it
also sends a request for each subdirectory it sees, checking it for a
file called desktop.ini, for example).
If all the necessary data is obtainable with them, it may also be better
to use the RMS sys$parse and sys$search routines directly. I'm not sure
why stat() is so much slower, since I had originally assumed that this
was how stat() was getting its data, but it could be doing something a
lot like what I am doing, just for one file at a time. RMS does do a
variety of caching that might be helping its times. It is hard to say
what its times would be like if all the necessary XABs were added to the
FAB, but it will probably be slower than the scan_dir method, as it is
close now and already uses nearly a factor of 3 more CPU time.
>Potential solutions mostly involve recoding SAMBA to reduce the number
>of times that it calls stat().
Caching some of it may help a little - it does stat() some filespecs
repeatedly. (It turns out that Samba does have a thing called a "stat
cache", but after looking at it I think it isn't caching the results
of stat() calls. It appears to be used for caching translated
filenames, so that it can stat() a filename without having to
retranslate it - just look it up.)
>This also affects UNIX, but not to the degree that it hurts OpenVMS.
>
>But the true fix may require significant optimizations to the UNIX code
>base.
I suspect that doing something like what I have tested could be in the
realm of "significant" - but it is possible that this would only
involve changing things in a rather small number of places in the
code. I haven't checked.
--- Carl Perkins
Computer Systems Manager
Geochemical & Environmental Research Group
Texas A&M University
carl at gerg.tamu.edu