[Samba] Debugging Samba4 - application sometimes fails because files are invisible/gone

Jeremy Allison jra at samba.org
Fri Apr 8 00:01:01 UTC 2016


On Tue, Apr 05, 2016 at 09:28:12AM +0200, Johannes Amorosa | Celluloid VFX wrote:
> Hello Samba list,
> our proprietary application sometimes can't find files on our
> Samba share. I'm hoping for some help on this list.
> 
> Our setup is two replicated AD domain controllers (Ubuntu
> 12.04.5 LTS, Version 4.1.17-SerNet-Ubuntu-10.precise),
> several domain members as file servers, and mixed clients (~40 x
> Win7, Ubuntu and OSX). The DCs use internal DNS.
> 
> We have proprietary software that runs as a cluster and needs a
> common shared network volume. This volume is
> on a domain member (Ubuntu 12.04.5 LTS, Version
> 4.1.17-SerNet-Ubuntu-10.precise) backed by a ZFS RAID (ZFS 0.6.3).
> 
> Authentication is done via PAM and works fine. All tests described
> [1]here succeed, and we have been using this setup in production
> for over a year.
> 
> Problem: Sometimes (1-2 times a month) our application fails with
> an error message like:
> \\cell-dead-01\deadlinerepo\jobs\56fe4a61b9baa917e4169c31\DraftCreateMovie.py
> (System.IO.FileNotFoundException)"
> 
> The file exists, however, and has the same ACL as everything else:
> /silo/deadlinerepo/jobs/56fe4a61b9baa917e4169c31/DraftCreateMovie.py
> 
> We know that ZFS may not be production ready and needs to be
> upgraded to at least 0.6.5.6.
> We should also upgrade Samba to at least 4.2.X. This will hopefully
> be done in May. It's possible
> we hit a bug in the application itself. Meanwhile I'm trying to make
> sense of the Samba log files and
> basically failing because they are so noisy. I configured vfs_audit
> to get to the bottom of these issues and see what is
> responsible. I'm seeing a lot of errors and want to know what to
> make of them. In one day
> audit.log grew to 35 MB.
> 
> Here are some snippets:
> 
> deadlinerepo|is_offline|fail (Operation not supported)|scripts/Submission/HServerSubmission.py
> deadlinerepo|translate_name|fail (Operation not supported)
> deadlinerepo|sys_acl_get_file|fail (Operation not supported)|scripts/Submission
> deadlinerepo|open|ok|r|custom/scripts/Submission
> deadlinerepo|realpath|fail (No such file or directory)|custom/events/Draft
> 
> Interestingly enough, the app runs perfectly most of the time, but
> when this happens it ruins a day of computation, and
> deadlines are always super tight, meaning overtime for some of us.
> Can someone shed some
> light on this? Thank you for your time.
> Joe

Sorry, but there's not enough info for us to
determine what might be the problem. Getting
it repeatable will be the first step.
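
While working toward a repeatable case, it may help to cut the audit
log down first. Below is a minimal Python sketch, not a Samba tool. It
assumes each record has already been reduced to the
share|operation|result|arguments form shown in the snippets above (any
syslog prefix stripped, wrapped lines rejoined). The function names and
the choice to ignore "Operation not supported" failures are assumptions
made for illustration; such failures often just mean the underlying
filesystem does not implement the call, but that is worth confirming
for this setup.

```python
from collections import Counter


def parse_audit_line(line):
    """Split 'share|operation|result|args...' into its fields.

    Returns None for lines that do not have at least three fields.
    """
    parts = line.strip().split("|")
    if len(parts) < 3:
        return None
    share, op, result = parts[0], parts[1], parts[2]
    return share, op, result, parts[3:]


def summarize(lines, ignore_reasons=("Operation not supported",)):
    """Count (operation, result) pairs and collect failures worth reading.

    A failure is kept unless its result text contains one of
    ignore_reasons (an assumed noise filter, adjust as needed).
    """
    counts = Counter()
    interesting = []
    for line in lines:
        parsed = parse_audit_line(line)
        if parsed is None:
            continue
        _share, op, result, args = parsed
        counts[(op, result)] += 1
        if result.startswith("fail") and not any(
            reason in result for reason in ignore_reasons
        ):
            interesting.append((op, result, args))
    return counts, interesting


# Sample records taken from the snippets quoted above.
sample = [
    "deadlinerepo|is_offline|fail (Operation not supported)|scripts/Submission/HServerSubmission.py",
    "deadlinerepo|translate_name|fail (Operation not supported)",
    "deadlinerepo|sys_acl_get_file|fail (Operation not supported)|scripts/Submission",
    "deadlinerepo|open|ok|r|custom/scripts/Submission",
    "deadlinerepo|realpath|fail (No such file or directory)|custom/events/Draft",
]

counts, interesting = summarize(sample)
for op, result, args in interesting:
    print(op, result, args)
```

On the sample records only the realpath failure survives the filter,
which is the kind of entry to line up against the timestamp of the
next application error.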


