Bad write "optimization" in Samba 2.2.8
Jeremy Allison
jra at samba.org
Wed Feb 4 01:42:26 GMT 2004
On Wed, Feb 04, 2004 at 02:32:30AM +0100, Dummbaz wrote:
>
> Recently, I found that on my SuSE Linux 9.0 system (employing Samba
> 2.2.8), I got much less I/O throughput than on SuSE 8.1 (employing Samba
> 2.2.5).
>
> I tested with IOZONE 2.01, a simple program I used for writing and
> reading a 512 MByte file, consisting of 8192 byte blocks.
>
> The read performance was O.K. with both versions, but the write
> performance suffered an 80% loss with 2.2.8 (i.e. less than 2 MByte/s
> instead of 11 MByte/s).
>
> Upon further investigation with strace, I found the following access
> pattern with 2.2.8:
>
> 18720 23:56:09.013708 _llseek(24, 1073151, [1073151], SEEK_SET) = 0
> 18720 23:56:09.013771 write(24, "\0", 1) = 1
> 18720 23:56:09.015442 send(12,
> "\0\0\0/\377SMB/\0\0\0\0\210\1\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 51, 0) = 51
> 18720 23:56:09.015616 select(21, [5 12 20], NULL, NULL, {60, 0}) = 1 (in
> [12], left {60, 0})
> 18720 23:56:09.015808 read(12, "\0\0\0P", 4) = 4
> 18720 23:56:09.015879 read(12,
> "\377SMB2\0\0\0\0\30\7H\0\0\0\0\0\0\0\0\0\0\0\0\1\0\377"..., 80) = 80
> 18720 23:56:09.015971 gettimeofday({1075848969, 15995}, NULL) = 0
> 18720 23:56:09.016047 fstat64(24, {st_mode=S_IFREG|0664,
> st_size=1073152, ...}) = 0
> 18720 23:56:09.016141 _llseek(24, 0, [1073152], SEEK_CUR) = 0
> 18720 23:56:09.016270 send(12,
> "\0\0\0T\377SMB2\0\0\0\0\210A\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 88, 0) = 88
> 18720 23:56:09.016399 select(21, [5 12 20], NULL, NULL, {60, 0}) = 1 (in
> [12], left {60, 0})
> 18720 23:56:09.016632 read(12, "\0\0\0A", 4) = 4
> 18720 23:56:09.016697 read(12,
> "\377SMB/\0\0\0\0\30\7h\0\0\0\0\0\0\0\0\0\0\0\0\1\0\377"..., 65) = 65
> 18720 23:56:09.016789 gettimeofday({1075848969, 16812}, NULL) = 0
> 18720 23:56:09.016858 _llseek(24, 1081343, [1081343], SEEK_SET) = 0
> 18720 23:56:09.016921 write(24, "\0", 1) = 1
>
> With 2.2.5, this read:
>
> 596 00:02:48.986598 _llseek(5, 262144, [262144], SEEK_SET) = 0
> 596 00:02:48.986657 write(5,
> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> 596 00:02:48.986822 send(12,
> "\0\0\0/\377SMB/\0\0\0\0\210\1\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 51, 0) = 51
> 596 00:02:48.986960 select(21, [12 19 20], NULL, NULL, {60, 0}) = 1
> (in [12], left {60, 0})
> 596 00:02:48.987402 read(12, "\0\0 @", 4) = 4
> 596 00:02:48.987496 read(12,
> "\377SMB/\0\0\0\0\30\7H\0\0\0\0\0\0\0\0\0\0\0\0\1\0\377"..., 8256) = 8256
> 596 00:02:48.987627 gettimeofday({1075849368, 987648}, NULL) = 0
> 596 00:02:48.987693 _llseek(5, 270336, [270336], SEEK_SET) = 0
> 596 00:02:48.987752 write(5,
> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> 596 00:02:48.987914 send(12,
> "\0\0\0/\377SMB/\0\0\0\0\210\1\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 51, 0) = 51
> 596 00:02:48.988054 select(21, [12 19 20], NULL, NULL, {60, 0}) = 1
> (in [12], left {59, 980000})
> 596 00:02:49.002349 read(12, "\0\0 @", 4) = 4
> 596 00:02:49.002450 read(12,
> "\377SMB/\0\0\0\0\30\7H\0\0\0\0\0\0\0\0\0\0\0\0\1\0\377"..., 8256) = 8256
> 596 00:02:49.002586 gettimeofday({1075849369, 2644}, NULL) = 0
> 596 00:02:49.002694 _llseek(5, 278528, [278528], SEEK_SET) = 0
>
> Note the write() calls with only 1 byte instead of 8192.
>
> Maybe I am interpreting this wrong, but it seems as if there is an
> "optimization" in 2.2.8 which, in the special case of consecutive zeros
> being written, uses _llseek() to skip to the last byte of each block and
> writes just that one byte. Indeed, when I changed IOZONE to write blocks
> of 0xff, performance rose to the old level.
>
> Although I have not tested it, I assume this approach is used in Samba
> 3.x as well. I think this is a typical over-optimization. As a matter
> of fact, the assumption that skipping over a portion of a file and
> writing just parts of it always yields zeroes in the skipped parts is
> plain wrong IMHO, apart from the fact that on Linux this is actually
> much slower than just writing the data as in 2.2.5.
Can you post your benchmark tester (I'm assuming this is a Windows
program)? This will help me track down the different access patterns.
Jeremy.
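
For anyone who wants to reproduce the two access patterns from the quoted
strace output without a Windows client, here is a minimal local sketch. It
is not Samba code and not the IOZONE test; the function names
(write_full_blocks, write_last_byte_only, demo) are made up for
illustration, and it assumes a POSIX filesystem. It issues the same
seek/write sequence as each trace and then checks what ends up in the
file, both for a freshly created file and for a file that already holds
non-zero data.

```python
import os
import tempfile

BLOCK = 8192  # block size used in the report


def write_full_blocks(path, blocks):
    """Pattern from the 2.2.5 trace: seek to the block start, write 8192 zero bytes."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o664)
    try:
        for i in range(blocks):
            os.lseek(fd, i * BLOCK, os.SEEK_SET)
            os.write(fd, b"\0" * BLOCK)
    finally:
        os.close(fd)


def write_last_byte_only(path, blocks, truncate=True):
    """Pattern from the 2.2.8 trace: seek to the last byte of the block, write one zero byte."""
    flags = os.O_WRONLY | os.O_CREAT | (os.O_TRUNC if truncate else 0)
    fd = os.open(path, flags, 0o664)
    try:
        for i in range(blocks):
            os.lseek(fd, (i + 1) * BLOCK - 1, os.SEEK_SET)
            os.write(fd, b"\0")
    finally:
        os.close(fd)


def read_all(path):
    with open(path, "rb") as f:
        return f.read()


def demo(blocks=4):
    """Return (fresh_equal, existing_zeroed) for the two scenarios."""
    with tempfile.TemporaryDirectory() as d:
        full = os.path.join(d, "full.bin")
        sparse = os.path.join(d, "sparse.bin")
        write_full_blocks(full, blocks)
        write_last_byte_only(sparse, blocks)
        # Fresh file: the skipped ranges are a gap past EOF and read back as zeros.
        fresh_equal = read_all(full) == read_all(sparse)

        existing = os.path.join(d, "existing.bin")
        with open(existing, "wb") as f:
            f.write(b"\xff" * (blocks * BLOCK))
        # Existing data: skipping leaves the old bytes in place.
        write_last_byte_only(existing, blocks, truncate=False)
        existing_zeroed = read_all(existing) == b"\0" * (blocks * BLOCK)
        return fresh_equal, existing_zeroed


if __name__ == "__main__":
    print(demo())
```

On a newly created file the two patterns leave identical contents, because
bytes skipped past end-of-file read back as zeros. When the file already
contains data, the one-byte pattern leaves the old bytes in the skipped
ranges untouched. The trace alone does not show which of these cases smbd
distinguishes, so this only illustrates the concern raised in the report.
Wrapping each call in a timer would give a rough local comparison of the
two patterns, though it says nothing about the SMB round trips seen in the
trace.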