[Samba] debugging really high network usage with no apparent cause
chris (fool) mccraw
gently at gmail.com
Thu May 5 14:31:35 MDT 2011
i'm not a heavy samba user in general (works without any serious
tweaking on my modest home network), but one of my clients is and we
are seeing some strange behavior. perhaps it makes sense to someone
with more clue. if not, i could use some advice in how to figure out
what is going on.
the setup: a couple dozen windows clients (xp pro 64bit & vista
ultimate 64bit) taking fileservice, netbios, and WINS from an
up-to-date centos 5.5 64bit x86 machine running samba 3.3x
(samba3x-common-3.3.8-0.52.el5_5.2 and friends) on a switched gigabit
network. the client machines all have local authentication, but do
use samba to do hostname resolution of short hostnames (e.g., goliath).
the issue: "network seems slow". quantifiably, some of the batch
(rendering, in this case) jobs that are running on the windows
machines are going much more slowly than expected--factor of 2-5x
slower than usual is a real killer when "usual" is 12 hours.
after checking on the machines and the server, i see nothing untoward
chewing up the CPU on either side--on the server, there are as many
smbds as clients and they are splitting the CPUs evenly (server is a
16-core machine with 32G of ram and a big hardware raid hanging off 2
high-end areca cards). the server sits at a load average of about 2
when all the nodes are cranking, and interactive use (on the server) is
fine then. fileservice itself can reach wirespeed: when i run tests on
a quiet network, aggregate throughput spread across all the clients
maxes out the gigabit link. single clients only manage around
350Mbit/sec, but i attribute that to windows, not the server, since the
server can talk scp at closer to 900Mbit/sec to another linux machine
and i can run more than one ~300Mbit/sec smb stream at a time to
different clients.
so i checked the smbd/nmbd/winbindd logs and see nothing strange. i
fire up wireshark on the clients and holy crap there is a lot of
traffic when there should be none! i filtered out broadcast,
multicast, and indeed all traffic not destined for the host i was
monitoring and found that 2 separate clients (win xp and vista) were
each chugging along at approximately 1GByte/minute of traffic (in
approx 700k packets/min) *from* the server in a single tcp
conversation, when the workload should have been more like 0--these
guys were crunching numbers, not reading or writing files, not even
doing anything that should need nameservice (no interactive use or
background programs running). i don't know the tools they are using
well, but they are supposed to be reading in a small source file once
at the beginning of the job and dumping out, all at once at the end of
the process, a video frame in the neighborhood of 4MByte.
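
to put those capture numbers in more familiar units, here is a rough back-of-envelope sketch in python. it assumes "GByte" means 10**9 bytes and that both figures cover the same minute; the function names are just made up for illustration.

```python
# back-of-envelope numbers from the capture.
# assumption: 1 GByte = 10**9 bytes, 1 Mbit = 10**6 bits.
BYTES_PER_MIN = 1_000_000_000   # ~1 GByte/minute seen per client
PACKETS_PER_MIN = 700_000       # ~700k packets/minute seen per client


def mbit_per_sec(bytes_per_min):
    """convert bytes/minute to megabits/second."""
    return bytes_per_min * 8 / 60 / 1_000_000


def avg_packet_bytes(bytes_per_min, packets_per_min):
    """mean bytes per packet over the same minute."""
    return bytes_per_min / packets_per_min


rate = mbit_per_sec(BYTES_PER_MIN)
size = avg_packet_bytes(BYTES_PER_MIN, PACKETS_PER_MIN)
pps = PACKETS_PER_MIN / 60

print(f"{rate:.0f} Mbit/sec per client")
print(f"{size:.0f} bytes/packet on average")
print(f"{pps:.0f} packets/sec")
```

that works out to roughly 133 Mbit/sec per client at about 1429 bytes per packet, which is in the same ballpark as full-size ethernet frames (assuming a standard 1500-byte MTU, which i have not verified). so it looks like sustained bulk data rather than small chatty packets.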
thinking something really untoward must be going on under the hood, i
cranked up debugging on smbd, nmbd, and winbindd to 3 with smbcontrol
and the things samba is logging did not change in any way (though
smbcontrol did report that debugging was set to level 3 across the
board for all 3 daemons).
all i understand from the wireshark dump is that all of this traffic
is between client and samba server, on port 445 for vista, and on port
139 for xp. and nearly all of it is server->client. wireshark's
"info" column labels the suspicious traffic as follows:
~80% as TCP, "[TCP segment of a reassembled PDU]"
~10% as TCP, "60579 > microsoft-ds [ACK] Seq=<integer> Ack=<other
integer> Win=65535, Len=0"
~5% as SMB, "Read AndX Request, FID 0x2aee, 16384 bytes at offset <integer>"
~5% as SMB, "Read AndX Response, 16384 bytes"
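
one way to dig further into that breakdown is to tally the "info" strings and add up the read requests per FID. below is a minimal python sketch, assuming the info column has been exported from wireshark as plain text, one string per packet. the sample rows are invented (only the FID and the 16384-byte size come from the capture), and `classify` and `summarize` are hypothetical helper names. if one FID shows many requests but few distinct offsets, the client may be re-reading the same part of a file over and over, though that is only a guess until checked against a real dump.

```python
import re
from collections import Counter, defaultdict

# matches e.g. "Read AndX Request, FID 0x2aee, 16384 bytes at offset 0"
# (an optional colon after FID is tolerated, since the label may vary)
READ_REQ = re.compile(
    r"Read AndX Request, FID:?\s*(0x[0-9a-f]+), (\d+) bytes at offset (\d+)",
    re.IGNORECASE,
)


def classify(info):
    """bucket one info string into the categories listed above."""
    if "TCP segment of a reassembled PDU" in info:
        return "tcp segment"
    if READ_REQ.search(info):
        return "read request"
    if re.search(r"Read AndX Response", info, re.IGNORECASE):
        return "read response"
    if "[ACK]" in info and "Len=0" in info:
        return "bare ack"
    return "other"


def summarize(infos):
    """return (category counts, per-FID read stats) for a list of info strings."""
    counts = Counter()
    reads = defaultdict(lambda: {"requests": 0, "bytes": 0, "offsets": set()})
    for info in infos:
        counts[classify(info)] += 1
        m = READ_REQ.search(info)
        if m:
            fid = m.group(1).lower()
            reads[fid]["requests"] += 1
            reads[fid]["bytes"] += int(m.group(2))
            reads[fid]["offsets"].add(int(m.group(3)))
    return counts, dict(reads)


# invented sample rows shaped like the ones quoted above
sample = [
    "Read AndX Request, FID 0x2aee, 16384 bytes at offset 0",
    "[TCP segment of a reassembled PDU]",
    "Read AndX Response, 16384 bytes",
    "60579 > microsoft-ds [ACK] Seq=1 Ack=2 Win=65535, Len=0",
    "Read andX Request, FID 0x2aee, 16384 bytes at offset 0",
]

counts, reads = summarize(sample)
for fid, r in reads.items():
    print(fid, r["requests"], "requests,", r["bytes"], "bytes,",
          len(r["offsets"]), "distinct offsets")
```

in the invented sample, FID 0x2aee is asked for the same offset twice, which is the kind of pattern that would suggest repeated reads of the same data.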
if you already see what is going on, great, please let me know! if
not (and i'm in the "not" boat), could you recommend what i can look
at to determine the cause of so much traffic? the traffic is not
network-wide; it shows up only on clients that are running
these batch processes--idle clients have a modest and normal number of
network packets (less than a hundred a second even without filtering
broadcast/etc). it just seems to me like there shouldn't be that much
traffic to a loaded (cpu-bound, effectively i/o-less) client that is
working on a 400k source file and is hours away from writing byte 1 of
an output file. i'm happy to crank up debugging further, run tests on
the clients or server, or post packet dumps if these things would help
in the diagnosis. please let me know what further information would
be useful.
thanks in advance for your help.