[Samba] Bulk smbcacls calls

Wed Dec 11 05:01:51 MST 2013

Hi Noel

Is there any documentation on the protocols smbcacls uses to get the raw
ACLs and look up the user/group ids? E.g. how to make the raw requests
(I'm not great with C)?

On 11/12/2013 11:15, Noel Power wrote:
> Hi Peter
> On 11/12/13 10:34, Peter Flood wrote:
>> Hi Noel
>>
>> That sounds like a great addition to smbcacls.
>>
>> We played around with the source of smbcacls and found that a
>> lot of the time is spent converting the numeric user/group ids to
>> their human equivalents. E.g. if 1 file has 3 users and 5 groups there are
>> 8 requests to resolve the numeric user/group ids (1 request per
>> conversion if I recall correctly). We realised that by repeatedly
>> calling smbcacls we were effectively looking up the same groups
>> multiple times, so we needed to cache the lookup results.
> very interesting, I didn't yet find the need to do any in-depth
> performance analysis, caching those values in the context of recursive
> operations seems indeed to be a good idea.
>> Then we found that we could get the acls with numeric output, similar
>> output to smbcacls but without the conversion to human form, with
>> the latest version of pysmbc from
>> https://git.fedorahosted.org/cgit/pysmbc.git/ (the latest commits
>> added the functionality we wanted which wasn't in the version from
>> pypi). To get the human user/group representation we use smbcacls and
>> parse the output and store in a numeric -> human map so we only make
>> max 1 request per new user/group encountered (a bit hackish but it
>> works for us). It would be good to be able to make the same lookup
>> request that smbcacls makes to resolve a user/group id in python,
> yup, as mentioned above, I think smbcacls would benefit from caching
> that info
>> it would be a useful addition to pysmbc if the data is available from
>> libsmbclient.
> libsmbclient is somewhat outside my experience so far (I am new to samba,
> smbcacls is the only thing I have looked at in any depth).
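For reference, a minimal sketch of the numeric -> human map described above. The resolver is a stand-in (here a hypothetical `fake_resolve` with made-up SIDs and names), not a real pysmbc or libsmbclient call; in our case it would wrap whatever does the actual lookup.

```python
from typing import Callable, Dict


class SidNameCache:
    """Remember SID -> name results so each id is resolved at most once."""

    def __init__(self, resolve: Callable[[str], str]) -> None:
        self._resolve = resolve            # hypothetical resolver supplied by the caller
        self._names: Dict[str, str] = {}   # numeric id -> human-readable name
        self.lookups = 0                   # count of real (uncached) lookups

    def name_for(self, sid: str) -> str:
        if sid not in self._names:
            self.lookups += 1
            self._names[sid] = self._resolve(sid)
        return self._names[sid]


def fake_resolve(sid: str) -> str:
    # Illustrative data only; a real resolver would query the server.
    table = {
        "S-1-5-21-1-2-3-1001": "EXAMPLE\\alice",
        "S-1-5-21-1-2-3-513": "EXAMPLE\\Domain Users",
    }
    return table.get(sid, sid)  # fall back to the raw id if unknown
```

With this, the number of real lookups is bounded by the number of distinct users/groups seen, not by the number of files.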
>> By doing it this way we've found that we can process 200-300 files
>> per second in our setup (approx 13,000 files, not sure how many
>> directories).
> so, if I run 'smbcacls --get -r --numeric' on the same test directory
> (20,069 files in 2,842 directories) it finishes in ~30 seconds
That's fast, I'd like to be able to do it at that speed.
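Pulling the distinct ids out of numeric output looks simple enough. A rough sketch, assuming the usual OWNER/GROUP/ACL line layout; the sample text and SIDs below are made up for illustration, not captured from a real run:

```python
def sids_in_acl_output(text):
    """Collect the distinct owner, group and ACE trustees from smbcacls-style output."""
    sids = set()
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a KEY:value line
        if key in ("OWNER", "GROUP"):
            sids.add(rest.strip())
        elif key == "ACL":
            # Assumed layout: ACL:<trustee>:<type>/<flags>/<mask>.
            # Split from the right so only the trailing field is dropped.
            trustee, sep2, _ace = rest.rpartition(":")
            if sep2:
                sids.add(trustee.strip())
    return sids


# Illustrative sample only.
SAMPLE = """\
REVISION:1
OWNER:S-1-5-21-1-2-3-1001
GROUP:S-1-5-21-1-2-3-513
ACL:S-1-5-21-1-2-3-1001:0/0/0x001f01ff
ACL:S-1-5-21-1-2-3-513:0/0/0x001200a9
ACL:S-1-1-0:0/0/0x001200a9
"""
```

Only the ids in the returned set would then need resolving to names.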
>> We scan to get all the individual file objects into our database, then
>> make 1 request per file to get the ACLs. Using a recursive version of
>> smbcacls and matching files in the output back to those in our db
>> would be awkward in our situation, especially if files are added or
>> removed in the period between the scan and the recursive smbcacls call.
> I'm not entirely sure what you mean about the awkwardness of "recursive
> version of smbcacls and matching files in the output back to those in
> our db". Surely a recursive smbcacls means you don't need this two-step
> process; you should just get the info you need.
> Regarding files being added and removed, isn't stale data going to be a
> problem no matter what approach you take (unless you can somehow lock
> the directory being processed for the duration of the operation(s))?
Yes, replacing our current scan with a recursive smbcacls call warrants
further investigation.
>> I welcome any comments regarding our approach.
>>
>> I'll give your new version of smbc a go this afternoon if I get a chance.
> please do!
>
> thanks,
>
> Noel

Thanks
Peter