[Samba] Bulk smbcacls calls

Wed Dec 11 04:15:34 MST 2013

Hi Peter
On 11/12/13 10:34, Peter Flood wrote:
> Hi Noel
>
> That sounds like a great addition to smbcacls.
>
> We played around with the source of smbcacls and found that a
> lot of the time is spent converting the numeric user/group ids to
> their human-readable equivalents, e.g. if 1 file has 3 users and 5 groups
> there are 8 requests to resolve the numeric user/group ids (1 request per
> conversion if I recall correctly). So we realised that by repeatedly
> calling smbcacls we were effectively looking up the same groups
> multiple times, and that we needed to cache the lookup results.
very interesting; I haven't yet needed to do any in-depth
performance analysis, but caching those values in the context of recursive
operations does indeed seem to be a good idea.
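A minimal Python sketch of that caching idea, for illustration only: `resolve_sid_uncached()` is a hypothetical stand-in for the per-SID lookup smbcacls performs (it is not a real smbcacls or pysmbc function), and the SIDs are made up. Memoising it with `functools.lru_cache` means each distinct SID costs one lookup, however many files reference it.

```python
from functools import lru_cache

lookup_calls = 0  # counts how many "expensive" lookups actually happen


def resolve_sid_uncached(sid: str) -> str:
    """Hypothetical stand-in for the per-SID lookup (one server round trip each)."""
    global lookup_calls
    lookup_calls += 1
    return f"name-for-{sid}"


@lru_cache(maxsize=None)
def resolve_sid(sid: str) -> str:
    # Each distinct SID reaches the server at most once; repeats are cache hits.
    return resolve_sid_uncached(sid)


# Made-up ACL trustees for three files that share users and groups.
files = {
    "a.txt": ["S-1-5-21-1111-2222-3333-1001", "S-1-5-21-1111-2222-3333-513"],
    "b.txt": ["S-1-5-21-1111-2222-3333-1001", "S-1-5-21-1111-2222-3333-513"],
    "c.txt": ["S-1-5-21-1111-2222-3333-1002", "S-1-5-21-1111-2222-3333-513"],
}
for path, sids in files.items():
    print(path, [resolve_sid(s) for s in sids])

print("lookups:", lookup_calls)
```

With the three sample files, six SID resolutions collapse to three lookups, one per distinct SID.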
>
> Then we found that we could get the acls in numeric form (similar
> output to smbcacls but without the conversion to human form) with
> the latest version of pysmbc from
> https://git.fedorahosted.org/cgit/pysmbc.git/ (the latest commits
> added the functionality we wanted, which wasn't in the version on
> PyPI). To get the human user/group representation we run smbcacls,
> parse the output and store it in a numeric -> human map, so we make at
> most 1 request per new user/group encountered (a bit hackish but it
> works for us). It would be good to be able to make the same lookup
> request that smbcacls makes to resolve a user/group id in python,
yup, as mentioned above, I think smbcacls would benefit from caching
that info
> it would be a useful addition to pysmbc if the data is available from
> libsmbclient.
libsmbclient is somewhat outside my experience so far (I am new to Samba;
smbcacls is the only thing I have looked at in any depth).
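The numeric -> human map described above could be built along these lines. This is a sketch under stated assumptions: both inputs are taken to use smbcacls' one-entry-per-line `OWNER:`/`GROUP:`/`ACL:` layout and to list entries in the same order, so SIDs and names can be paired by position. The sample outputs and the helper names `parse_trustees()` and `merge_sid_names()` are hypothetical, and pysmbc's own numeric output may be formatted differently and need its own parser.

```python
def parse_trustees(text):
    """Return the trustee (SID or name) of each OWNER, GROUP and ACL line, in order.

    Assumes smbcacls-style lines such as "OWNER:<who>" and
    "ACL:<who>:<type>/<flags>/<mask>"; all other lines are skipped.
    """
    trustees = []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("OWNER", "GROUP"):
            trustees.append(value)
        elif key == "ACL":
            # The ACE details follow the last colon; the trustee precedes it.
            who, _, _ = value.rpartition(":")
            trustees.append(who)
    return trustees


def merge_sid_names(numeric_text, named_text, sid_map):
    """Pair SIDs with names by position and add any new ones to sid_map."""
    sids = parse_trustees(numeric_text)
    names = parse_trustees(named_text)
    if len(sids) != len(names):
        raise ValueError("numeric and named outputs do not line up")
    for sid, name in zip(sids, names):
        sid_map.setdefault(sid, name)  # the first mapping seen wins
    return sid_map


# Made-up sample outputs for one file, numeric and human-readable.
NUMERIC = """REVISION:1
OWNER:S-1-5-21-1111-2222-3333-1001
GROUP:S-1-5-21-1111-2222-3333-513
ACL:S-1-5-21-1111-2222-3333-1001:0/0x0/0x001f01ff
ACL:S-1-5-21-1111-2222-3333-513:0/0x0/0x001200a9"""

NAMED = r"""REVISION:1
OWNER:EXAMPLE\peter
GROUP:EXAMPLE\Domain Users
ACL:EXAMPLE\peter:ALLOWED/0x0/FULL
ACL:EXAMPLE\Domain Users:ALLOWED/0x0/READ"""

sid_map = merge_sid_names(NUMERIC, NAMED, {})
for sid, name in sid_map.items():
    print(sid, "->", name)
```

Reusing the same dict across files means the map only grows when a previously unseen SID appears, which matches the "at most 1 request per new user/group" idea.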
> By doing it this way we've found that we can process 200-300 files
> per second in our setup (approx. 13,000 files, not sure how many
> directories).
for comparison, if I run 'smbcacls --get -r --numeric' on the same test directory
(20,069 files in 2,842 directories), it finishes in ~30 seconds (roughly 670 files per second)
>
> We scan to get all the individual file objects into our database, then
> make 1 request per file to get the acls. Using a recursive version of
> smbcacls and matching files in the output back to those in our db
> would be awkward in our situation, especially if files are added or
> removed in the period between the scan and the recursive smbcacls call.
not entirely sure what you mean about the awkwardness of "recursive
version of smbcacls and matching files in the output back to those in
our db"; surely smbcacls (a recursive version) should mean you don't
need to do this 2-step process, you should just get the info you need.
Regarding files being added and removed, isn't that going to be a
problem (stale data) no matter what approach you take (unless you can
somehow lock the directory being processed for the duration of the
operation(s))?
>
> I welcome any comments regarding our approach.
>
> I'll give your new version of smbc a go this afternoon if I get a chance.
please do!

thanks,

Noel