Fwd: backing up to a Drobo CIFS VFS: Error -11 sending data on socket to server, Bufferbloat?

Richard Sharpe realrichardsharpe at gmail.com
Fri May 12 16:44:26 UTC 2017


On Fri, May 12, 2017 at 7:11 AM, Richard Sharpe
<realrichardsharpe at gmail.com> wrote:
> On Fri, May 12, 2017 at 6:45 AM, Robert Kudyba <rkudyba at fordham.edu> wrote:
>>> Given your kernel log message I would expect seeing the reconnecting happen.
>>
>> I scheduled a tshark -c over night and was able to get it to run
>> within minutes of the reconnect attempts.It seems pretty consistent,
>> about 2 hours after the start of the cron the Error -11's start to
>> happen. Last night it also happened about 30 minutes into the job:
>>
>> May 11 23:47:51 localworkstation kernel: CIFS VFS: Server drobo-rh has
>> not responded in 120 seconds. Reconnecting...
>> May 11 23:49:59 localworkstation kernel: CIFS VFS: Server drobo-rh has
>> not responded in 120 seconds. Reconnecting...
>> May 11 23:52:57 localworkstation kernel: CIFS VFS: Server drobo-rh has
>> not responded in 120 seconds. Reconnecting...
>>
>> May 12 01:19:44 localworkstation cupsd: REQUEST localhost - - "POST /
>> HTTP/1.1" 200 181 Renew-Subscription successful-ok
>> May 12 01:21:20 localworkstation kernel: CIFS VFS: sends on sock
>> ffff9f2817bede00 stuck for 15 seconds
>> May 12 01:21:20 localworkstation kernel: CIFS VFS: Error -11 sending
>> data on socket to server
>>
>> Here's the tshark -r tshark.pcap -Y 'smb||smb2' results from 2 files:
>>
>>   593 0.273687063 drobo → localserver SMB 117 [TCP ACKed unseen
>> segment] Write AndX Response, 65536 bytes
>>   666 0.282264835 drobo → localserver SMB 117 Write AndX Response, 65536 bytes
>>     8 0.136979937 drobo → localserver SMB 117 [TCP ACKed unseen
>> segment] [TCP Previous segment not captured] Write AndX Response,
>> 65536 bytes
>>    36 0.142752037 drobo → localserver SMB 117 Write AndX Response, 65536 bytes
>>    63 0.149017707 drobo → localserver SMB 117 Write AndX Response, 65536 bytes
>>
>>> I would wrap your backup script like so to get the right capture:
>>>
>>>     capfile="$(date -Ihours)"
>>>     tcpdump -w "$capfile" port 445 &
>>>     cappid=$!
>>>     sleep 2
>>>     .....<your script>....
>>>     sleep 2
>>>     kill -6 $cappid
>>
>> OK just did that and started running the backup manually so far I've
>> seen variations/multiples of these (cut for easier reading):
>> 38735  18.521926 localserver → drobo SMB 10202 Write AndX Request,
>> FID: 0x1fca, 65536 bytes at offset 87949312[TCP segment of a
>> reassembled PDU]
>> 38760  18.527189 localserver → drobo SMB 7306 Write AndX Request, FID:
>> 0x1fca, 65536 bytes at offset 88014848[TCP segment of a reassembled
>> PDU]
>> 38771  18.529817 drobo → localserver SMB 117 Write AndX Response, FID:
>> 0x1fca, 65536 bytes
>> 44079  19.615349 drobo → localserver SMB 117 [TCP ACKed unseen
>> segment] Write AndX Response, 65536 bytes
>> 44102  19.626642 drobo → localserver SMB 117 [TCP ACKed unseen
>> segment] Write AndX Response, 65536 bytes
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
>> the body of a message to majordomo at vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> You are seeing some packet drops which might make it hard to figure
> out the issue, but those are simply the writes.
>
> What I would look for is places where the writes stop.
>
> You will likely see it better in wireshark by using Statistics->I/O
> Graph as it will show you where the throughput rate drops to zero.
>
> Then you can click on the graph at the point where it drops to zero to
> get the main window to go to that packet and then you can relate it to
> the tshark output.
>
> The earliest message you posted suggests that the client (Linux) was
> complaining of no activity for 120 seconds, which suggests that the
> Drobo stopped responding at some point. Thus I would expect to see one
> or more writes that did not get responses, followed by a connection
> drop and reconnect.

BTW, I should mention that if the capture is large you will need lots
of memory to deal with it. I have a Lenovo P50 with 64GB of ram and a
1TB SSD and I can deal with 5GB captures reasonably comfortably, but
with only 8 or 16GB, forget it.

-- 
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)



More information about the samba-technical mailing list