[Samba] Debugging Samba4 - application sometimes fails because files are invisible/gone

Johannes Amorosa | Celluloid VFX johannesa at celluloid-vfx.com
Fri Apr 8 09:17:44 UTC 2016



On 04/08/2016 02:01 AM, Jeremy Allison wrote:
> On Tue, Apr 05, 2016 at 09:28:12AM +0200, Johannes Amorosa | Celluloid VFX wrote:
>> Hello Samba list,
>> we have a problem: our proprietary application sometimes can't find
>> files on our Samba share. I'm hoping for some help on this list.
>>
>> Our setup is two AD servers as replicated domain controllers (Ubuntu
>> 12.04.5 LTS, Version 4.1.17-SerNet-Ubuntu-10.precise), several
>> domain members as file servers, and mixed clients (~40 x Win7,
>> Ubuntu and OSX). The AD servers use internal DNS.
>>
>> We have proprietary software that runs as a cluster and needs a
>> common shared network volume. This volume is on a domain member
>> running Ubuntu 12.04.5 LTS, Version
>> 4.1.17-SerNet-Ubuntu-10.precise, with a ZFS RAID (ZFS 0.6.3).
>>
>> Authentication is done via PAM and works fine. All tests described
>> [1]here succeed, and we have been using this setup in production
>> for over a year.
>>
>> Problem: Sometimes (1-2 times a month) our application fails with an
>> error message like:
>> \\cell-dead-01\deadlinerepo\jobs\56fe4a61b9baa917e4169c31\DraftCreateMovie.py
>> (System.IO.FileNotFoundException)"
>>
>> Although the file exists and has the same ACL as everything else:
>> /silo/deadlinerepo/jobs/56fe4a61b9baa917e4169c31/DraftCreateMovie.py
>>
>> We know that ZFS may not be production ready and needs to be
>> upgraded to at least 0.6.5.6. We should also upgrade Samba to at
>> least 4.2.X. This will hopefully be done in May. It's also possible
>> that we hit a bug in the application itself. Meanwhile I'm trying to
>> make sense of the Samba log files and basically failing because they
>> are so noisy. I configured vfs_audit to get to the bottom of these
>> issues and see what is responsible. I'm seeing a lot of errors and
>> want to know what to make of them. In one day audit.log grew to
>> 35 MB.
>>
>> Here are some snippets:
>>
>> deadlinerepo|is_offline|fail (Operation not supported)|scripts/Submission/HServerSubmission.py
>> deadlinerepo|translate_name|fail (Operation not supported)
>> deadlinerepo|sys_acl_get_file|fail (Operation not supported)|scripts/Submission
>> deadlinerepo|open|ok|r|custom/scripts/Submission
>> deadlinerepo|realpath|fail (No such file or directory)|custom/events/Draft
>>
>> Interestingly enough, the app runs perfectly most of the time, but
>> when this happens it ruins a day of computation, and deadlines are
>> always super tight, meaning overtime for some of us. Can someone
>> shed some light on this? Thank you for your time.
>> Joe
> Sorry, but there's not enough info for us to
> determine what might be the problem. Getting
> it repeatable will be the first step.
>
Thank you Jeremy for answering my post. I have upgraded all our DCs and
fileservers to 4.2 in the hope of not hitting that bug again. The zfs
upgrade requires a reboot; we have a window next week.
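
Since the first step is getting the failure repeatable, one option is a small probe that runs on a client and repeatedly checks files that are known to exist on the share. The following is only a minimal sketch under assumptions: the function names probe and watch are hypothetical, and the paths to check would have to be chosen by hand. It checks each path two ways (a direct stat and a listing of the parent directory) and records every miss with a UTC timestamp, so a miss can later be matched against the server logs.

```python
import os
import time
from datetime import datetime, timezone


def probe(path):
    """Check one path two ways: a direct stat and a listing of its parent."""
    parent, name = os.path.split(path)
    try:
        os.stat(path)
        stat_ok = True
    except OSError:
        stat_ok = False
    try:
        listed = name in os.listdir(parent)
    except OSError:
        listed = False
    return stat_ok, listed


def watch(paths, rounds, interval=1.0, log=print):
    """Probe every path `rounds` times and collect each miss with a UTC timestamp."""
    misses = []
    for i in range(rounds):
        for path in paths:
            stat_ok, listed = probe(path)
            if not (stat_ok and listed):
                stamp = datetime.now(timezone.utc).isoformat()
                misses.append((stamp, path, stat_ok, listed))
                log(f"{stamp} MISS {path} stat={stat_ok} listed={listed}")
        if i + 1 < rounds:
            time.sleep(interval)  # pause between rounds, not after the last one
    return misses
```

A miss where the stat fails but the name is still listed (or the other way round) would be worth noting separately, since it would point away from the file simply being absent.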

Unfortunately, after the upgrade my audit log stays empty.
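
Once the audit log is being written again, the noise in it could be reduced before reading it. The sketch below assumes each entry has the layout share|operation|result|arguments, as in the snippets quoted above, and that any syslog prefix has already been stripped. Treating "Operation not supported" as routine noise is also an assumption, based only on how often it appears in those snippets. The function names are hypothetical.

```python
from collections import Counter

# Errors assumed to be routine on this share (an assumption, see above).
NOISE = {"Operation not supported"}

SAMPLE = [
    "deadlinerepo|is_offline|fail (Operation not supported)|scripts/Submission/HServerSubmission.py",
    "deadlinerepo|translate_name|fail (Operation not supported)",
    "deadlinerepo|sys_acl_get_file|fail (Operation not supported)|scripts/Submission",
    "deadlinerepo|open|ok|r|custom/scripts/Submission",
    "deadlinerepo|realpath|fail (No such file or directory)|custom/events/Draft",
]


def parse_line(line):
    """Split 'share|operation|result|args...' into a dict, or return None."""
    parts = line.strip().split("|")
    if len(parts) < 3:
        return None
    share, op, result = parts[0], parts[1], parts[2]
    ok = result == "ok"
    error = None
    if not ok and result.startswith("fail (") and result.endswith(")"):
        error = result[len("fail ("):-1]  # text between the parentheses
    return {"share": share, "op": op, "ok": ok, "error": error, "args": parts[3:]}


def interesting_failures(lines, noise=NOISE):
    """Keep only failed operations whose error is not in the noise set."""
    records = []
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec["ok"] or rec["error"] in noise:
            continue
        records.append(rec)
    return records


def summarize(records):
    """Count the remaining failures per (operation, error) pair."""
    return Counter((r["op"], r["error"]) for r in records)
```

Run over the five quoted lines, interesting_failures(SAMPLE) keeps only the realpath entry for custom/events/Draft. Counting per operation and error with summarize may make a 35 MB log easier to scan than reading it line by line.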

-- 
Johannes Amorosa | Celluloid VFX

Celluloid Visual Effects GmbH & Co. KG
Paul-Lincke-Ufer 39/40, 10999 Berlin
phone +49 (0)30 / 54 735 220
fax   +49 (0)30 / 54 735 221
