[Samba] Debugging Samba4 - application sometimes fails because files are invisible/gone

Johannes Amorosa | Celluloid VFX johannesa at celluloid-vfx.com
Tue May 10 08:29:48 UTC 2016



On 04/08/2016 11:17 AM, Johannes Amorosa | Celluloid VFX wrote:
>
>
> On 04/08/2016 02:01 AM, Jeremy Allison wrote:
>> On Tue, Apr 05, 2016 at 09:28:12AM +0200, Johannes Amorosa | 
>> Celluloid VFX wrote:
>>> Hello Samba list,
>>> we have a problem that our proprietary application sometimes can't
>>> find files on our samba share. I'm hoping
>>> for some help on this list.
>>>
>>> Our setup is two ADs as replicated domain controllers (Ubuntu
>>> 12.04.5 LTS, Version 4.1.17-SerNet-Ubuntu-10.precise)
>>> and several domain members as file servers and mixed clients (~40 x
>>> Win7, Ubuntu and OSX). The ADs use internal DNS.
>>>
>>> We have a proprietary software that runs as a cluster and needs a
>>> common shared network volume. This volume is
>>> on a domain member running (Ubuntu 12.04.5 LTS, Version
>>> 4.1.17-SerNet-Ubuntu-10.precise) with a zfs Raid 0.6.3.
>>>
>>> Authentication is done via pam and works fine. All tests described
>>> [1]here succeed and we have been using this setup in production for
>>> over a year.
>>>
>>> Problem: Sometimes (1-2 times a month) our application fails with
>>> an error message like:
>>> \\cell-dead-01\deadlinerepo\jobs\56fe4a61b9baa917e4169c31\DraftCreateMovie.py 
>>>
>>> (System.IO.FileNotFoundException)"
>>>
>>> Although the file exists and has the same ACL as everything else:
>>> /silo/deadlinerepo/jobs/56fe4a61b9baa917e4169c31/DraftCreateMovie.py
>>>
>>> We know that zfs may not be production ready and needs to be
>>> upgraded to at least 0.6.5.6. We should upgrade samba as well, to
>>> at least 4.2.X. This will hopefully be done in May. It's possible
>>> that we hit a bug in the application itself. Meanwhile I'm trying
>>> to make sense of the samba log files and basically failing because
>>> they are so spammy. I configured vfs_audit to get to the bottom of
>>> these issues and see who is responsible. I'm seeing a lot of errors
>>> and want to know what to make of them. In one day audit.log grew
>>> to 35 MB.
>>>
>>> Here are some snippets:
>>>
>>> deadlinerepo|is_offline|fail (Operation not supported)|scripts/Submission/HServerSubmission.py
>>> deadlinerepo|translate_name|fail (Operation not supported)
>>> deadlinerepo|sys_acl_get_file|fail (Operation not supported)|scripts/Submission
>>> deadlinerepo|open|ok|r|custom/scripts/Submission
>>> deadlinerepo|realpath|fail (No such file or directory)|custom/events/Draft
>>>
>>> Interestingly, the app runs perfectly most of the time, but when
>>> this happens it ruins a day of computation, and deadlines are
>>> always super tight, meaning overtime for some of us.
>>> Can someone shed some light on this? Thank you for your time.
>>> Joe
>> Sorry, but there's not enough info for us to
>> determine what might be the problem. Getting
>> it repeatable will be the first step.
>>
> Thank you, Jeremy, for answering my post. I have upgraded all our DCs
> and fileservers to 4.2 in the hope of not hitting that bug again.
> The zfs upgrade requires a reboot; we have a window next week.

FYI: we now suspect client-side caching (broken by design?). We are
testing this week and will keep you posted.

https://technet.microsoft.com/en-us/library/ff686200%28v=ws.10%29.aspx
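
A minimal sketch of how the caching theory could be tested on a single
Win7 client. It assumes the link above refers to the SMB2 client
redirector caches and that the registry key and value names below match
that article (please verify before applying anything). The helper name
reg_commands is hypothetical, and a lifetime of 0 is assumed to disable
the respective cache. The script only prints the commands, it does not
touch the registry:

```python
# Registry location and value names are assumptions taken from the
# SMB2 client redirector cache documentation; verify before use.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters"
CACHE_VALUES = (
    "DirectoryCacheLifetime",
    "FileInfoCacheLifetime",
    "FileNotFoundCacheLifetime",
)


def reg_commands(lifetime_seconds=0):
    """Build one 'reg add' command per cache value (nothing is executed)."""
    return [
        f'reg add "{KEY}" /v {name} /t REG_DWORD /d {lifetime_seconds} /f'
        for name in CACHE_VALUES
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed and run by hand
    # in an elevated prompt on a test client.
    for cmd in reg_commands():
        print(cmd)
```

If the FileNotFoundException stops appearing on the test client with the
caches disabled, that would point at the client rather than at samba or
zfs.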


>
> Unfortunately after the upgrade my audit log stays empty.
>
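
For anyone facing a similarly noisy audit log: a small filter along these
lines can hide the "Operation not supported" entries and show only the
remaining failures. This is a sketch, assuming one record per line in the
share|operation|result|... layout of the snippets quoted above; real log
lines may carry a syslog prefix that would need stripping first. The
function names and the choice of what counts as noise are my own
assumptions:

```python
# Failures with this text are treated as noise (assumption: they only
# mean the backend lacks an optional feature).
NOISE = "Operation not supported"


def parse_line(line):
    """Split a 'share|operation|result|detail' record.

    Everything after the third field is kept as one 'detail' string,
    because some operations (e.g. open) log extra fields before the path.
    Returns None for lines that do not have at least three fields.
    """
    parts = line.strip().split("|")
    if len(parts) < 3:
        return None
    share, operation, result = parts[0], parts[1], parts[2]
    detail = "|".join(parts[3:])
    return share, operation, result, detail


def interesting_failures(lines):
    """Yield parsed records that failed for a reason other than NOISE."""
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        result = record[2]
        if result.startswith("fail") and NOISE not in result:
            yield record


if __name__ == "__main__":
    import sys

    # Usage: python filter_audit.py < audit.log
    for share, operation, result, detail in interesting_failures(sys.stdin):
        print(f"{share}\t{operation}\t{result}\t{detail}")
```

Run against the five snippet lines above, this would leave only the
realpath failure on custom/events/Draft.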

-- 
Johannes Amorosa | Celluloid VFX

Celluloid Visual Effects GmbH & Co. KG
Paul-Lincke-Ufer 39/40, 10999 Berlin
phone +49 (0)30 / 54 735 220
fax   +49 (0)30 / 54 735 221



