[Samba] File corruption

Warren Odom wodom at stenocall.com
Mon Jun 14 01:05:02 GMT 2004


We have moved the database files of our 3rd-party multiuser billing system
from Novell NetWare to Samba (first 2.2.8a, then upgraded to 3.0.2a just
last Friday evening) on Mandrake Linux 9.2 (kernel version 2.4).  Since the
move, about once a week we get a corrupted file, and last week (even after
upgrading some NICs) was worse, with 5 or 6 incidents.

Because we had no problems before changing servers, I think hardware errors
are probably not to blame, even though I've seen them implicated in Samba
discussions.  And because it only occurs when multiple users are in a file
(never otherwise, even after many, many index rebuilds and other file repair
operations done by a single user), my guess is that it stems from some sort
of locking or other synchronization problem.

Also, so far there does not seem to be a pattern as to which workstations
have errors, except generally the most-used ones.

We have a mix of Win 98 and Win 2K clients (mostly the former).  We used to
have two Win 95 workstations, but upgraded them to 98 to try to solve the
problems.

No Unix programs access these files (except for nightly backups), only the
billing software using Samba.

The workstations still log in to NetWare as the primary network login, then
use Windows networking to map the drive to Samba.  Our Samba configuration
file is very simple, with only one share.

I've tried various combinations of these three changes:

1.  I turned off all oplocks, and that didn't fix it.
2.  I set sync always = yes and strict sync = yes, and that didn't fix it
either.  (I have turned these off & on several times to see if there's any
effect.)
3.  Most recently I have set strict locking = yes.
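
For reference, the three items above map onto smb.conf parameters roughly
as in the fragment embedded below.  This is only a sketch: the share name
[billing], its path, and the workgroup are made up for illustration, and
reading "all oplocks" as both oplocks and level2 oplocks is an assumption.
The small Python helper parses the fragment and reports which value is
written for each option, with a share-level value taking precedence over
[global].  It shows only what is in the file on disk, not what a running
smbd is actually using.

```python
import configparser

# Hypothetical share, path and workgroup; the option names are standard
# smb.conf parameters.
SMB_CONF = """
[global]
workgroup = OFFICE

[billing]
path = /srv/billing
# 1. all oplocks off (assumed to mean both kinds)
oplocks = no
level2 oplocks = no
# 2. push writes to disk
sync always = yes
strict sync = yes
# 3. check byte-range locks on every read and write
strict locking = yes
"""

WATCHED = ["oplocks", "level2 oplocks", "sync always",
           "strict sync", "strict locking"]


def effective_settings(conf_text, share):
    """Return the value written in the file for each watched option.

    A share-level value wins, then the [global] value; None means the
    option is not set in the file, so Samba's built-in default applies.
    """
    # interpolation=None because smb.conf values may contain % macros.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    result = {}
    for option in WATCHED:
        value = None
        for section in (share, "global"):
            if parser.has_section(section) and parser.has_option(section, option):
                value = parser.get(section, option)
                break
        result[option] = value
    return result


if __name__ == "__main__":
    for name, value in effective_settings(SMB_CONF, "billing").items():
        print(f"{name} = {value}")
```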

The week before last we had 3 corruptions in 2 days.  After the first two I
finally turned on #3 above, and within a few hours we had the third
corruption.  The boss is really getting upset that I have to kick everyone
off the system to rebuild the problem file; some of these files are larger
than 300MB and take 2 hours or more to rebuild.  He says that with one more
problem, Samba goes into the trash and we revert to the Novell server.

I know it's hard to track down things like this, but here are some specific
questions:

1.  Are there any other options anyone can suggest trying?  Also, apart from
a server crash, would you expect #2 to be actually relevant to the problem
or not?

2.  I know Samba is supposed to re-read the config file periodically, and
I'm counting on that when I change the various options.  But how can I
really tell whether or not Samba has changed the option--and more to the
point, changed its behavior?  Do any of the above options have inherent
delays before Samba can change?  Some of the corruptions came shortly after
I changed a setting that should make the files MORE safe, not less, which
has me wondering whether Samba is really applying the new settings.  I can
use smbstatus to confirm there are no oplocks, but what about the other
settings?

In other words, must I stop & restart Samba after changes such as these
(thereby temporarily kicking everyone off the system, a real hassle)?

3.  What debugging level would be required for a developer to investigate
this?  Would a combined log be preferable, or would separate logs for each
workstation be usable?  Is there a way to make the Samba logs contain only
the most recent activity leading up to an incident like this, which cannot
be reproduced on demand, without filling them with hours or days of clutter?

4.  Does anyone know of some software I could run to actually test Samba for
problems?  Something that would really exercise multi-user access?
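
To make concrete the kind of test meant here, below is a minimal sketch in
Python, not an existing Samba tool.  Several processes each take a
byte-range lock on the same file, read a counter, add one, write it back,
and unlock.  If locking and write ordering behave, the final counter equals
workers times iterations; a lower number means updates were lost.  The file
layout (a 16-byte ASCII counter), the function name run_stress, and the
defaults are all invented for illustration, and the billing software's real
access pattern is unknown, so a clean run proves little.  The Windows
branch uses msvcrt.locking and has not been tried against a Samba share.

```python
import os
import subprocess
import sys
import tempfile

WIDTH = 16  # fixed-width ASCII counter stored at offset 0 (illustrative)

# Worker source kept as a string so each worker is a separate OS process
# on any platform (byte-range locks are held per process).
WORKER_SOURCE = r'''
import os, sys

path, iterations, width = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])

if os.name == "nt":
    import msvcrt

    def lock(fd):
        # LK_LOCK gives up after about 10 seconds, so retry until it works.
        while True:
            os.lseek(fd, 0, os.SEEK_SET)
            try:
                msvcrt.locking(fd, msvcrt.LK_LOCK, width)
                return
            except OSError:
                pass

    def unlock(fd):
        os.lseek(fd, 0, os.SEEK_SET)
        msvcrt.locking(fd, msvcrt.LK_UNLCK, width)
else:
    import fcntl

    def lock(fd):
        fcntl.lockf(fd, fcntl.LOCK_EX, width, 0, os.SEEK_SET)

    def unlock(fd):
        fcntl.lockf(fd, fcntl.LOCK_UN, width, 0, os.SEEK_SET)

fd = os.open(path, os.O_RDWR | getattr(os, "O_BINARY", 0))
for _ in range(iterations):
    lock(fd)
    try:
        # Read-modify-write of the counter while holding the lock.
        os.lseek(fd, 0, os.SEEK_SET)
        value = int(os.read(fd, width))
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, str(value + 1).zfill(width).encode("ascii"))
    finally:
        unlock(fd)
os.close(fd)
'''


def run_stress(path, workers=4, iterations=200):
    """Run the workers against path; return (final, expected, exit_codes)."""
    with open(path, "wb") as f:
        f.write(b"0".zfill(WIDTH))
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER_SOURCE,
                          path, str(iterations), str(WIDTH)])
        for _ in range(workers)
    ]
    codes = [p.wait() for p in procs]
    with open(path, "rb") as f:
        final = int(f.read(WIDTH))
    return final, workers * iterations, codes


if __name__ == "__main__":
    # Defaults to a local temp file; pass a path on the Samba share instead.
    target = sys.argv[1] if len(sys.argv) > 1 else os.path.join(
        tempfile.mkdtemp(), "counter.dat")
    final, expected, codes = run_stress(target)
    print("final", final, "expected", expected, "exit codes", codes)
```

Run on one machine, this exercises only local locking.  To involve Samba,
the file would have to live on the share and the script be started from
several workstations at once, each pointing at the same path.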

Any help would be MUCH appreciated.  I'm running out of time.

      Thanks -- Warren


