Performance problem (2.2.4) [long]
Carl Perkins
carl at gergx.gerg.tamu.edu
Thu Dec 5 05:01:09 GMT 2002
"John E. Malmberg" <wb8tyw at qsl.net> wrote:
>"Michael D. Ober" wrote:
>> I have been working with this issue now for a while and it appears
>> that the problem isn't the directory listing itself, but the file
>> size calculation that takes forever. Any ideas on how to speed this
>> up.
>
>The culprit is the call to stat(), which is expensive on OpenVMS and is
>required for this and several other operations in the directory listing.
Having noticed that the stat() function is very slow, I have recently
been experimenting with getting the same information that stat()
returns, but faster. My tests have focused on gathering information
for a whole directory of files rather than just one.
What I have tried is this:
1) Use opendir(), loop over readdir(), closedir() to read in the entire
contents of the directory, allocating and loading a list of data structures
with the filenames (space for these also allocated on the fly) and FID values.
 - this list is in the filenames' alphabetical order; using it directly
   resulted in excessive amounts of null time, so I now also create an
   array of pointers that is sorted by the file number from the FID
   (using the FID's file number plus the extension bits from the NMX byte
   of the 3rd word to form the true file number, which directly determines
   which virtual block of the INDEXF.SYS file holds the file header
   for that file; see the sketch after this list). The resulting
   improvement was considerable.
2) Directly open the INDEXF.SYS file (by assigning a channel to the
disk device and doing an IO$_ACCESS on that file's FID, which is
a known set of values).
3) Read the home block (VBN 2 in INDEXF.SYS) and parse the information
to get the offset to the VBN in the file where the file headers start.
4) Flush the FID cache to ensure that any recent changes are accounted for.
5) Loop through the list of files sorted by file number, doing an
IO$_READVBLK QIO on the calculated virtual block number for that file's
file header. Copy the information out of the file header to fill a
stat data structure (the same structure definition used for the stat()
function's data) for the file. At the moment this only works on ODS-2
disks - I have not put in a test to check for ODS-5 and use the
appropriate file information data structure (the dates are at different
offsets, so it won't work as-is) - this one data area in the file header
is, I think, the only difference in the file headers between the two. Some
of the data is now processed outside of this loop in an attempt to reduce
the latencies involved in reading the INDEXF.SYS file, with minimal success
so far: the creation and modification date-time values are saved in VMS
format and converted to Unix times in a separate loop (each has to be
converted to the Unix format and adjusted for timezone and daylight saving
time; the conversion itself is sketched after this list), and in that same
loop the st_dev field is filled in and the st_gid value is calculated from
the st_uid value (which is the user's full UIC, including the group).
Moving this out of the read loop reduced the average total time by at most
a couple of milliseconds (the real improvement came from switching from
alphabetical order to FID order).
6) Close INDEXF.SYS.
7) There is also a routine to clean up all the memory allocated for the
list structures (which are actually allocated 10 at a time, which requires
a bit of fiddling to free them properly) and filename strings.
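
In case it helps, here is a stripped-down sketch of the FID arithmetic
behind steps 1, 3, and 5. The dir_entry structure is simplified for
illustration (the real list handling is messier), and first_header_vbn
stands for the value parsed out of the home block in step 3 (the index
file bitmap VBN plus its size):

    #include <stdlib.h>

    /* One entry per file, filled in from the readdir() loop in step 1.
       The FID is three 16-bit words: the file number, the file sequence
       number, and the RVN (low byte) plus NMX (high byte), where NMX
       extends the file number past 16 bits. */
    struct dir_entry {
        char *name;
        unsigned short fid[3];
    };

    /* True file number: 16 bits from the first word plus 8 from NMX. */
    static unsigned long true_filenum(const unsigned short fid[3])
    {
        return (unsigned long)fid[0] | ((unsigned long)(fid[2] >> 8) << 16);
    }

    /* VBN of this file's header in INDEXF.SYS; first_header_vbn comes
       from the home block read in step 3. */
    static unsigned long header_vbn(unsigned long first_header_vbn,
                                    const unsigned short fid[3])
    {
        return first_header_vbn + true_filenum(fid) - 1;
    }

    /* qsort() comparison for the pointer array sorted by file number. */
    static int by_filenum(const void *a, const void *b)
    {
        unsigned long fa = true_filenum((*(struct dir_entry *const *)a)->fid);
        unsigned long fb = true_filenum((*(struct dir_entry *const *)b)->fid);
        return (fa > fb) - (fa < fb);
    }
    /* e.g.: qsort(ptrs, nfiles, sizeof *ptrs, by_filenum); */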
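
The date-time conversion mentioned in step 5 is simple arithmetic; this
is roughly what the separate conversion loop boils down to, minus the
timezone and daylight saving adjustment:

    #include <time.h>

    /* Seconds from the VMS epoch (17-NOV-1858) to the Unix epoch
       (1-JAN-1970): 40587 days * 86400 seconds/day. */
    #define VMS_TO_UNIX_EPOCH 3506716800UL

    /* A VMS date-time is a 64-bit count of 100 ns units since the VMS
       epoch; divide down to seconds and rebase to the Unix epoch.
       The timezone/DST adjustment still has to be applied on top. */
    static time_t vms_to_unix_time(unsigned long long vmstime)
    {
        return (time_t)(vmstime / 10000000 - VMS_TO_UNIX_EPOCH);
    }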
So how well does this work compared to using stat()? In terms of
CPU time it is better by about a factor of 6. In terms of elapsed
time it takes slightly over half as long.
Some data...
First: "with stat in lib$find_file loop" - this is just what it says.
A loop using lib$find_file() to go through all the files in a directory
and a stat() performed on each.
Second: "without stat in lib$find_file loop" - this is just the
lib$find_file() loop without the stat() to determine how much
overhead this has.
Third: "with sys$parse and sys$search" - this is using these two RMS
routines directly. I have not strung any XABs onto the FAB to try to
get the data we are after - the date-time values and most of the others
are fairly easy but I'm not sure how to get the file mode data (in
particular the "this is a directory" value) using these. Since this
isn't returning all the data the times returned are lower than they
would be if the various XABs to retrieve the data were added (and
if the VMS to Unix time manipulations were also added).
Fourth: "using Scan_Dir routine" - this is my routine as described above.
After the list is filled, the list is walked to print out the filenames
(all of the above printf() them as they are produced by lib$find_file or
sys$search). This version is from loading the data for the files in
alphabetical order, the same as they are loaded from the directory file
into the list.
Fifth: "using Scan_Dir routine (sorted by FID version)" - this
is the version that reads the file headers in the file sequence
number order, i.e. the order that they appear in the INDEXF.SYS file.
This also has a few other optimizations, such as moving the time
conversion calculations out of the file header reading loop.
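
For anyone unfamiliar with these routines, the skeleton of the
sys$parse/sys$search loop in the third test is roughly the following
(error handling stripped; the names come back in the NAM block's
resultant-spec buffer; retrieving dates and protection would mean
hanging XABs off the FAB and opening or displaying each file):

    #include <rms.h>
    #include <stdio.h>
    #include <string.h>

    static void list_files(char *wildspec)   /* e.g. "*.*;*" */
    {
        struct FAB fab = cc$rms_fab;
        struct NAM nam = cc$rms_nam;
        static char espec[NAM$C_MAXRSS], rspec[NAM$C_MAXRSS];

        fab.fab$l_fna = wildspec;
        fab.fab$b_fns = strlen(wildspec);
        fab.fab$l_nam = &nam;
        nam.nam$l_esa = espec;          /* expanded-spec buffer */
        nam.nam$b_ess = NAM$C_MAXRSS;
        nam.nam$l_rsa = rspec;          /* resultant-spec buffer */
        nam.nam$b_rss = NAM$C_MAXRSS;

        if (!(sys$parse(&fab) & 1))
            return;
        while (sys$search(&fab) & 1)    /* loops until SS$_NMF */
            printf("%.*s\n", nam.nam$b_rsl, rspec);
    }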
Notes:
A) The stat() loop and "without stat()" loop data was gathered earlier
than the others, when there were probably two fewer files in the directory
(I remember it as being 290, whereas the others were done more recently
with 292, and some of the 290 are not the same versions of the
files). The directory for which the file information is being gathered
is my login directory.
B) These times are in "ticks" as reported by the difference between
two calls to a "get time in modes" routine which examines the system's
SMP$GL_CPU_DATA data structures. There are 1024 of these ticks per
second on an Alpha, as far as I can tell (it was 100 on a VAX with
VMS V5.5-2, and probably still is with more recent versions). These
values are therefore not process specific; they are the CPU time in the
various modes consumed by all processes on the system during the interval.
C) All testing was done on an XP900 with very little going on other than
the tests. It may not be relevant, but the system's quantum parameter
is set to 10 ms and the process running the test was operating at
normal user priority (i.e. a base of 4).
D) In the test program the names of each file are printed to the screen
in order to simulate doing something with the data - there is a brief
wait at the start of the program during which I minimize the terminal
window to reduce the display rate impact. I should probably do something
better than this, but it seemed good enough at the time.
E) The data is for three runs. The first column of percentages is for all
of the time averages including null, the second excluding null.
F) The scan_dir test's times do include calling the cleanup routine which
does what it needs to do to call free() for each piece of memory that
was allocated.
* With stat in lib$find_file loop:
  run         1     2     3    Averages
             ---   ---   ---   -----------------------
  kernel =   276   272   261   269.67  (31.7%)  (37.6%)
  exec   =   134   120   115   123     (14.4%)  (17.1%)
  super  =     4     9     9     7.33  ( 0.9%)  ( 1.0%)
  user   =   327   276   216   273     (32.1%)  (38.0%)
  inter  =    45    33    56    44.67  ( 5.2%)  ( 6.2%)
  spin   =     0     0     0     0
  null   =    72   137   186   133.67  (15.7%)   XXXXX
  Total, excluding null = 717.67
  Total, including null = 851.33
* Without stat in lib$find_file loop:
  run         1     2     3    Averages
             ---   ---   ---   -----------------------
  kernel =    55    17    33    35     (21.3%)  (21.8%)
  exec   =    16    14    11    13.67  ( 8.3%)  ( 8.5%)
  super  =     0     0     0     0     ( 0.0%)  ( 0.0%)
  user   =   156    82    73   103.67  (63.0%)  (64.6%)
  inter  =    10     8     7     8.33  ( 5.1%)  ( 5.2%)
  spin   =     0     0     0     0
  null   =     5     2     5     4     ( 2.4%)   XXXXX
  Total, excluding null = 160.67 (22.4% of stat case)
  Total, including null = 164.67 (19.3% of stat case)
* With sys$parse and sys$search instead of stat, no lib$find_file:
  run         1     2     3    Averages
             ---   ---   ---   -----------------------
  kernel =   149   150   167   155.33  (34.2%)  (49.6%)
  exec   =    50    53    47    50     (11.0%)  (16.0%)
  super  =     0     0     0     0     ( 0.0%)  ( 0.0%)
  user   =    94    84    75    84.33  (18.6%)  (26.9%)
  inter  =    30    24    17    23.67  ( 5.2%)  ( 7.6%)
  spin   =     0     0     0     0
  null   =   135   141   145   140.33  (30.9%)   XXXXX
  Total, excluding null = 313.33 (43.7% of stat case)
  Total, including null = 453.67 (53.3% of stat case)
* Using Scan_Dir routine:
  run         1     2     3    Averages
             ---   ---   ---   -----------------------
  kernel =    79    45    55    59.67  ( 7.4%)  (30.9%)
  exec   =     9     8    10     9     ( 1.1%)  ( 4.7%)
  super  =     1     2     3     2     ( 0.2%)  ( 1.0%)
  user   =   170    84    80   111.33  (13.8%)  (57.7%)
  inter  =    13    11     9    11     ( 1.4%)  ( 5.7%)
  spin   =     0     0     0     0
  null   =   612   622   602   612     (76.0%)   XXXXX
  Total, excluding null = 193 (26.9% of stat case)
  Total, including null = 805 (94.6% of stat case)
* Using Scan_Dir routine (FID-sorted version):
  run         1     2     3    Averages
             ---   ---   ---   -----------------------
  kernel =    30    31    36    32.33  ( 8.5%)  (30.8%)
  exec   =     6    15     6     9     ( 2.7%)  ( 9.7%)
  super  =     0     2     2     1.33  ( 0.6%)  ( 2.3%)
  user   =    71    61    60    64     (13.5%)  (49.1%)
  inter  =     6     8    11     8.33  ( 2.2%)  ( 8.1%)
  spin   =     0     0     0     0
  null   =   354   354   353   353.67  (72.5%)   XXXXX
  Total, excluding null = 114.67 (16.0% of stat case)
  Total, including null = 468.67 (55.1% of stat case)
-----------------
I am not certain where all that null time is coming from. At least
some of it is due to disk seek and rotational latency, which
is why switching to walking the list in the order the headers appear
in the INDEXF.SYS file and moving some operations out of the read loop
just about cut it in half. I'm not sure it will be possible to reduce
it any further, as the only processing still going on inside the
read loop is copying data, some bitwise operations ("and"ing, "or"ing,
negating, and shifting), and a couple of adds. I may try to move
everything except the data copying out of the loop, but there isn't
much left to move, and moving the things I did move made only a
very small improvement. It might also help to only
copy the data that Samba actually uses - there are fields in the
stat structure that it may not need, and I'm filling them all.
There are a variety of improvements that could be made to the scan_dir
routines, but they are increasingly difficult to implement (especially
correctly). For example, at the moment the disk I/O is done synchronously:
a sys$qiow reads one block, the data is processed, and then the next read
is issued. Some sort of multiple-buffer scheme with asynchronous I/O,
along the lines of the sketch below, could probably reduce that null
time considerably.
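
For what it's worth, a double-buffered version of the read loop might
look something like this. I haven't actually written it, so treat it
as the shape of the idea rather than working code: two buffers, two
event flags, start the read for header i+1, then process header i
while that read is in flight. chan is the channel assigned to the disk
in step 2, vbns[] holds the header VBNs in sorted order, and all error
checking is omitted:

    #include <starlet.h>
    #include <iodef.h>

    #define BLKSIZE 512

    struct iosb { unsigned short status, count; unsigned int dev; };

    static void read_headers(unsigned short chan, unsigned int *vbns,
                             int count, void (*process)(const char *hdr))
    {
        static char buf[2][BLKSIZE];
        struct iosb iosb[2];
        int i;

        if (count <= 0)
            return;

        /* Prime the pipeline: first read on event flag 1, buffer 0. */
        sys$qio(1, chan, IO$_READVBLK, (struct _iosb *)&iosb[0], 0, 0,
                buf[0], BLKSIZE, vbns[0], 0, 0, 0);

        for (i = 0; i < count; i++) {
            int cur = i & 1, nxt = (i + 1) & 1;

            /* Start the next read before touching the current buffer. */
            if (i + 1 < count)
                sys$qio(1 + nxt, chan, IO$_READVBLK,
                        (struct _iosb *)&iosb[nxt], 0, 0,
                        buf[nxt], BLKSIZE, vbns[i + 1], 0, 0, 0);

            /* Wait for the current read to finish, then process it. */
            sys$synch(1 + cur, (struct _iosb *)&iosb[cur]);
            process(buf[cur]);
        }
    }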
If Samba's attempts to list all the files in a directory could be
switched over to this method - doing them all at once, then pulling
the data out of the list as it is needed and cleaning up after it is
all sent - this could reduce the time it takes by maybe up to 45% if
done using the current version of the routines (it wouldn't reduce file
name translation or network transmission times or any other processing,
so it would probably be a bit less). It would need to be modified to
only attempt this for full directory listings, not when looking
for individual files (as the PC gets the listing for a directory, it
also sends a request for each subdirectory it sees, checking it for a
file called desktop.ini, for example).
If all the necessary data is obtainable with them, it may also be better
to use the RMS sys$parse and sys$search routines directly. I'm not sure
why stat() is so much slower, since I had originally assumed that this
was how stat() was getting its data, but it could be doing something a
lot like what I am doing, just for one file at a time. RMS does do a
variety of caching that might be helping its times. It is hard to say
what its times would be like if all the necessary XABs were added to the
FAB, but it will probably be slower than the scan_dir method, as it is
close now and already uses nearly a factor of 3 more CPU time.
>Potential solutions mostly involve recoding SAMBA to reduce the number
>of times that it calls stat().
Caching some of it may help a little - it does stat() some filespecs
repeatedly. (It turns out that Samba does have a thing called a "stat
cache", but after looking at it I think it isn't caching the results
of stat() calls. It appears to be used for caching translated
filenames, so that it can stat() a filename without having to
retranslate it - just look it up.)
>This also affects UNIX, but not to the degree that it hurts OpenVMS.
>
>But the true fix may require significant optimizations to the UNIX code
>base.
I suspect that doing something like what I have tested could be in the
realm of "significant" - but it is possible that this would only
involve changing things in a rather small number of places in the
code. I haven't checked.
--- Carl Perkins
Computer Systems Manager
Geochemical & Environmental Research Group
Texas A&M University
carl at gerg.tamu.edu