Bad write "optimization" in Samba 2.2.8

Jeremy Allison jra at samba.org
Wed Feb 4 01:42:26 GMT 2004


On Wed, Feb 04, 2004 at 02:32:30AM +0100, Dummbaz wrote:
> 
> Recently, I found that on my SuSE Linux 9.0 system (running Samba 
> 2.2.8), I got much less I/O throughput than on SuSE 8.1 (running Samba 
> 2.2.5).
> 
> I tested with IOZONE 2.01, a simple program I used for writing and 
> reading a 512 MByte file in 8192-byte blocks.
> 
> The read performance was O.K. with both versions, but the write 
> performance suffered an 80% loss with 2.2.8 (i.e. less than 2 MByte/s 
> instead of 11 MByte/s).
> 
> Upon further investigation with strace, I found the following access 
> pattern with 2.2.8:
> 
> 18720 23:56:09.013708 _llseek(24, 1073151, [1073151], SEEK_SET) = 0
> 18720 23:56:09.013771 write(24, "\0", 1) = 1
> 18720 23:56:09.015442 send(12, 
> "\0\0\0/\377SMB/\0\0\0\0\210\1\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 51, 0) = 51
> 18720 23:56:09.015616 select(21, [5 12 20], NULL, NULL, {60, 0}) = 1 (in 
> [12], left {60, 0})
> 18720 23:56:09.015808 read(12, "\0\0\0P", 4) = 4
> 18720 23:56:09.015879 read(12, 
> "\377SMB2\0\0\0\0\30\7H\0\0\0\0\0\0\0\0\0\0\0\0\1\0\377"..., 80) = 80
> 18720 23:56:09.015971 gettimeofday({1075848969, 15995}, NULL) = 0
> 18720 23:56:09.016047 fstat64(24, {st_mode=S_IFREG|0664, 
> st_size=1073152, ...}) = 0
> 18720 23:56:09.016141 _llseek(24, 0, [1073152], SEEK_CUR) = 0
> 18720 23:56:09.016270 send(12, 
> "\0\0\0T\377SMB2\0\0\0\0\210A\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 88, 0) = 88
> 18720 23:56:09.016399 select(21, [5 12 20], NULL, NULL, {60, 0}) = 1 (in 
> [12], left {60, 0})
> 18720 23:56:09.016632 read(12, "\0\0\0A", 4) = 4
> 18720 23:56:09.016697 read(12, 
> "\377SMB/\0\0\0\0\30\7h\0\0\0\0\0\0\0\0\0\0\0\0\1\0\377"..., 65) = 65
> 18720 23:56:09.016789 gettimeofday({1075848969, 16812}, NULL) = 0
> 18720 23:56:09.016858 _llseek(24, 1081343, [1081343], SEEK_SET) = 0
> 18720 23:56:09.016921 write(24, "\0", 1) = 1
> 
> With 2.2.5, this read:
> 
> 596   00:02:48.986598 _llseek(5, 262144, [262144], SEEK_SET) = 0
> 596   00:02:48.986657 write(5, 
> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> 596   00:02:48.986822 send(12, 
> "\0\0\0/\377SMB/\0\0\0\0\210\1\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 51, 0) = 51
> 596   00:02:48.986960 select(21, [12 19 20], NULL, NULL, {60, 0}) = 1 
> (in [12], left {60, 0})
> 596   00:02:48.987402 read(12, "\0\0 @", 4) = 4
> 596   00:02:48.987496 read(12, 
> "\377SMB/\0\0\0\0\30\7H\0\0\0\0\0\0\0\0\0\0\0\0\1\0\377"..., 8256) = 8256
> 596   00:02:48.987627 gettimeofday({1075849368, 987648}, NULL) = 0
> 596   00:02:48.987693 _llseek(5, 270336, [270336], SEEK_SET) = 0
> 596   00:02:48.987752 write(5, 
> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> 596   00:02:48.987914 send(12, 
> "\0\0\0/\377SMB/\0\0\0\0\210\1\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 51, 0) = 51
> 596   00:02:48.988054 select(21, [12 19 20], NULL, NULL, {60, 0}) = 1 
> (in [12], left {59, 980000})
> 596   00:02:49.002349 read(12, "\0\0 @", 4) = 4
> 596   00:02:49.002450 read(12, 
> "\377SMB/\0\0\0\0\30\7H\0\0\0\0\0\0\0\0\0\0\0\0\1\0\377"..., 8256) = 8256
> 596   00:02:49.002586 gettimeofday({1075849369, 2644}, NULL) = 0
> 596   00:02:49.002694 _llseek(5, 278528, [278528], SEEK_SET) = 0
> 
> Note the write() calls with only 1 byte instead of 8192.
> 
> Maybe I am misinterpreting this, but it seems as if there is an 
> "optimization" in 2.2.8 which uses _llseek() and writes just one byte 
> every once in a while in the special case of consecutive zeros being 
> written. Indeed, when I changed IOZONE to write blocks of 0xff, 
> performance rose to the old level.
> 
> Although I have not tested it, I assume this approach is used in Samba 
> 3.x as well. I think this is a typical over-optimization. As a matter 
> of fact, the assumption that skipping over a portion of a file and 
> writing just parts of it always yields zeroes in the skipped parts is 
> plain wrong IMHO. Besides, on Linux this is actually much slower than 
> just writing the data as 2.2.5 does.
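
The point about skipped regions can be checked locally with a small
sketch. It mimics the traced pattern (seek to the last byte of an
8192-byte block, write a single zero) on a fresh file and on a file that
already holds data. The helper names and file names are hypothetical, and
this only illustrates ordinary file semantics on a local filesystem; it
says nothing about what smbd actually does internally.

```python
import os
import tempfile

BLOCK = 8192  # block size used in the benchmark described above


def seek_and_write_last_byte(path, offset, length):
    """Mimic the traced pattern: seek to the block's last byte, write one zero."""
    with open(path, "r+b") as f:
        f.seek(offset + length - 1)
        f.write(b"\0")


def read_block(path, offset, length):
    """Read back `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    # Case 1: the block lies beyond end-of-file, so the skipped bytes form a hole.
    fresh = os.path.join(tmp, "fresh.bin")
    open(fresh, "wb").close()
    seek_and_write_last_byte(fresh, 0, BLOCK)
    fresh_is_zero = read_block(fresh, 0, BLOCK) == b"\0" * BLOCK

    # Case 2: the block overlaps existing data, so the skipped bytes keep
    # whatever was there before.
    existing = os.path.join(tmp, "existing.bin")
    with open(existing, "wb") as f:
        f.write(b"\xff" * BLOCK)
    seek_and_write_last_byte(existing, 0, BLOCK)
    existing_is_zero = read_block(existing, 0, BLOCK) == b"\0" * BLOCK

print("fresh file reads back as zeros:", fresh_is_zero)
print("overwritten file reads back as zeros:", existing_is_zero)
```

The shortcut is only equivalent to writing the zeros when the target
range lies past the current end of the file. In the 2.2.8 trace the
fstat64() size grows with each one-byte write, which is consistent with
that case, but the trace alone does not show how the server decides.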

Can you post your benchmark tester (I'm assuming this is a Windows
program)? This will help me track down the different access patterns.
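
If the original program cannot be shared, a stand-in along the following
lines might reproduce the pattern. This is a hypothetical sketch, not the
IOZONE program mentioned above: the function name, the 1 MByte demo size
and the temporary directory are assumptions. To exercise Samba, `path`
would have to point at a file on the mounted share, and the total size
would be raised to 512 MByte.

```python
import os
import tempfile
import time


def write_throughput(path, fill, total_bytes, block_size=8192):
    """Write total_bytes to path in block_size chunks of `fill`; return MByte/s."""
    block = bytes([fill]) * block_size
    start = time.perf_counter()
    # buffering=0 so that each f.write() maps to one write() system call.
    with open(path, "wb", buffering=0) as f:
        for _ in range(total_bytes // block_size):
            f.write(block)
        os.fsync(f.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / (1024 * 1024) / elapsed


with tempfile.TemporaryDirectory() as tmp:
    size = 1024 * 1024  # small demo size; the post used 512 MByte
    zero_path = os.path.join(tmp, "zeros.bin")
    ff_path = os.path.join(tmp, "ff.bin")
    zero_rate = write_throughput(zero_path, 0x00, size)
    ff_rate = write_throughput(ff_path, 0xFF, size)
    zero_size = os.path.getsize(zero_path)
    ff_size = os.path.getsize(ff_path)

print(f"0x00 blocks: {zero_rate:.1f} MByte/s")
print(f"0xff blocks: {ff_rate:.1f} MByte/s")
```

Running it once with 0x00 and once with 0xff against the same share
would show whether the slowdown depends on the data being written, as
the report above suggests.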

Jeremy.

