[Samba] Friendly Reminder: Would you please comment on my findings?
awl1
awl1 at mnet-online.de
Fri Aug 18 20:54:29 UTC 2017
Hello Andrew,
many thanks for joining this discussion! :-)
Am 18.08.2017 um 21:46 schrieb Andrew Bartlett:
> I do realise you are in between a rock and a hard place. You have
> identified an interesting issue, triggered by a massive protocol change
> (so not able to be bisected down to a regression) that requires
> significant work to understand and may or not be possible to resolve.
Note that I have tracked down the issue to what I believe to be the root
cause, and the root cause is NOT an issue in Samba, but an issue with
Microsoft's SMB2/SMB3 client, which uses very inefficient
SMB2_FIND_ID_BOTH_DIRECTORY_INFO requests in SMB2/3 as opposed to
efficient FIND_FIRST2 requests in SMB1.
The main parts of my analysis of the issue are contained here:
https://lists.samba.org/archive/samba/2017-July/209749.html
https://lists.samba.org/archive/samba/2017-July/209750.html
https://lists.samba.org/archive/samba/2017-July/209751.html
Just citing key findings for your reference:
In SMB1, the Windows client executes one FIND_FIRST2 Request for each
file to be copied (i.e. in my scenario, ~ 1000 requests) returning
STATUS_NO_SUCH_FILE every time before actually creating/writing to the file.
When looking at the same file in the SMB2 (3.1.1) trace, the Windows client
issues a different Find operation (SMB2_FIND_ID_BOTH_DIRECTORY_INFO with
Pattern "*") that does not look for the particular file name that is
about to be written, but seems to try and list the whole
current directory's content with a pattern of "*". Note that, looping
through the ~ 1000 files to be written in my scenario, the length of
Samba's Find Response increases with every file successfully copied:
When copying file number 1000, the Find Response sends back a list of
all 999 files that have been successfully copied to this directory
before, and this list of 999 file names is not needed for any meaningful
purpose, as the goal only is to check whether file number 1000 already
exists in this list of 999 files (which it of course never
does!) or not. The last such call to SMB2_FIND_ID_BOTH_DIRECTORY_INFO
contained in the traces has a response length of about 64kB (containing
filenames that have already been written to the target directory but are
not needed/helpful in any way) and interestingly does not return
"STATUS_NO_MORE_FILES", but "STATUS_INFO_LENGTH_MISMATCH", maybe because
the buffer size for the result of the pattern lookup is only 64kB!?
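
As a rough illustration of the growth described above, here is a minimal
Python sketch. It is only a model under stated assumptions, not taken from
the traces: it assumes exactly one wildcard listing per file, ignores the
"." and ".." entries, and ignores the ~64kB response buffer limit. The
function names are hypothetical.

```python
def entries_returned_exact_lookup(n_files: int) -> int:
    """SMB1-style model: one lookup per file for the exact name.

    Each lookup answers STATUS_NO_SUCH_FILE, so no directory entries
    are sent back at all.
    """
    return 0


def entries_returned_wildcard_listing(n_files: int) -> int:
    """SMB2-style model: one "*" listing before each file is written.

    Before file number k (1-based) is written, the directory already
    holds k - 1 files, and all of them are returned in the listing.
    """
    return sum(k - 1 for k in range(1, n_files + 1))


if __name__ == "__main__":
    n = 1000  # approximate number of files in the scenario
    print("exact lookups:   ", entries_returned_exact_lookup(n))
    print("wildcard listing:", entries_returned_wildcard_listing(n))
```

Under these assumptions the number of useless directory entries grows
quadratically with the number of files copied, while the exact-name
lookup stays at zero.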
Looking at the exact same scenario from a Linux mount.cifs
vers=3.0 client unveils only four (!!!) SMB2 Find requests for the whole
scenario, where Windows Explorer sends no less than 2140 SMB2 Find
requests to copy ~ 1000 files to the share (1036 times
"SMB2_FIND_ID_BOTH_DIRECTORY_INFO, Pattern: *" plus 1104 times
SMB2_FIND_NAME_INFO Pattern: <file name>), and Windows command line
"xcopy" is even worse (3741 find requests in order to copy ~ 1000 files).
While even the Linux SMB2 client is still slower than the Windows SMB1
client, I tend to think that the remaining difference from 25 seconds
with SMB 1.5 in Win10 to 36 seconds with SMB2 3.0 in Linux (44%) might
be tolerable...
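
The figures quoted above can be cross-checked with a few lines of Python.
This is only an arithmetic sketch: the file count of 1000 is the
approximate value from the scenario, and the variable names are
illustrative.

```python
# Approximate number of files copied in the scenario.
files = 1000

# SMB2 Find requests per client, as counted in the traces quoted above.
find_requests = {
    "mount.cifs vers=3.0": 4,
    "Windows Explorer": 1036 + 1104,  # ID_BOTH_DIRECTORY_INFO + NAME_INFO
    "xcopy": 3741,
}

for client, count in find_requests.items():
    print(f"{client}: {count} Find requests, {count / files:.2f} per file")

# Relative slowdown from 25 seconds (Win10, SMB1) to 36 seconds (Linux, SMB2).
slowdown = (36 - 25) / 25
print(f"slowdown: {slowdown:.0%}")
```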
So IMHO I have already uncovered that it is the implementation of the
Windows SMB2/SMB3 client that is faulty, and what I'd ask the Samba team
is to
a) verify that my assessment is correct and
b) engage in raising this huge performance regression with Microsoft
(because this will definitely end up nowhere when I am trying to raise
this with MS as a private individual customer based on a single Win10
Pro license)...
> Have you tried to engage with Thecus on this? I know it seems odd, and
> getting to speak with an engineer who actually understands what you are
> trying to warn them might be very difficult, but they will be upgrading
> at some point and then it is a regression to them, and they may have
> the incentive to look into it. It seems like a long shot, but similar
> long shots include getting the attention of another NAS Vendor already
Engaging with Thecus on this will be rejected, as my NAS (a 2008
N4200PRO) is an EOL product. I have compiled my own version of Samba
4.6.5 and deployed it onto my NAS as an installable module, replacing
default Samba 3.5.16.
> using Samba 4.x, like NETGEAR, or as an enterprise linux customer?
As the performance regression bug is in the Windows client, even using a
very recent NAS with Samba 4.x will most definitely show the exact same
behaviour.
> Does this just happen on your NAS, or can you reproduce on stock Samba
> locally on a PC? Are you sure it always happened with SMB2? If you
> can find any SMB2-supporting release (early support was in 3.5 I think,
> and 3.6 had it off by default) that is not slow then bisect your way
> between that and master, it might undercover a regression (for example,
> due to our symlinks security fix).
As I have tracked it down to be a Windows SMB2 client-side issue, this
will most definitely show with every Windows SMB2 client and any Samba
server that speaks SMB2 or higher (i.e. versions 3.6 onwards).
I had already tried to "bisect" this very early in the process and
analyze other Samba versions, as laid out here:
https://lists.samba.org/archive/samba/2017-July/209731.html
http://home.mnet-online.de/awl1/Performance%20Regression.xls
http://home.mnet-online.de/awl1/Performance%20Regression.pdf
The results for different Samba server versions were consistent, but
only after that (i.e. after my bisect attempts) did it become apparent that
the Windows client is to blame, and only if the protocol is SMB2...
> I hope this helps,
It would be most helpful if somebody from the Samba team (whether
Jeremy, you or somebody else) could spend some time in order to try and
understand/reproduce/assess my analysis. If you agree with my findings,
the "real" work afterwards will be to raise the issue with Microsoft
and make them aware of the detrimental effects that their client-side
implementation of SMB2 has with regards to performance, when comparing
to the exact same scenario in SMB1.
As stated before, I am very convinced that I am not the only one
affected by this issue. The sad truth rather seems to be that everybody
else besides me seems to have silently accepted the poor performance in
a "huge number of small files" scenario, even though it became only as
poor as it is with SMB2 and was perfectly fine before with SMB1... :-(:-(:-(
Did I succeed in making myself clear enough? (Not that easy for a
non-native English speaker, as the issue is rather complex...)
Many thanks for considering my request for help with this & best regards
Andreas