[Samba] Searching Samba share file contents

Noel Power nopower at suse.de
Thu Jun 15 14:07:59 UTC 2023


On 15/06/2023 14:26, Nick Couchman via samba wrote:
>>> Hey, Noel,
>>> Is this ready to be tested out? Is the process simply to check out the
>>> npower_wsp_norecurse_client branch, build Samba from that, and then
>>> give it a go?
>> Yes, simply add --enable-wsp to your configure line and you should be
>> good to go.
>>
>> Please note: this is only the client part, and the client will only work
>> against Windows machines where the WSP service is enabled and configured.
>>
> Ah, bummer - I don't think I have any Windows file servers with search
> enabled, or enough data to make a difference - my need is the
> opposite, I need the WSP service on Samba so that my Windows Clients
> can search more quickly, especially over higher-latency (VPN) links.

I understood from your message that you were interested in the client
side, since you mentioned the merge request associated with
npower_wsp_norecurse_client. The good news is that I also have
experimental server-side code if you would like to try it out.

To try out the server code, it is best to clone it from
https://git.samba.org/npower/samba.git

The branch is current_wsp_417_wip.

As the name suggests, it is very much a work in progress.

Building
--------------
1. git clone https://git.samba.org/npower/samba.git && cd samba
2. git checkout current_wsp_417_wip
3. ./configure.developer ${YOUR_OWN_CONF_SWITCHES} --enable-wsp && make -j

Indexing a share
---------------------------

1. install fscrawler (I used version fscrawler-es7-2.7-SNAPSHOT) [1]:
   https://fscrawler.readthedocs.io/en/latest/installation.html
2. install and start a version of Elasticsearch (I used
   elasticsearch-7.11.2) [2]
3. configure fscrawler (I attach my config file, '_settings.yaml', here)
4. start fscrawler: 'bin/fscrawler job_name'

configure samba
---------------------------

[global]
     wsp backend = elasticsearch
     elasticsearch:wsp_mappings = ${PATH_TO}/elasticsearch_mappings.json

[share_to_search]
     wsp = true

an initial version of elasticsearch_mappings.json is available from 
'source3/rpc_server/wsp/elasticsearch_mappings.json' in the build tree

Other global smb.conf settings:

   'elasticsearch:address' IP address of the Elasticsearch server
(defaults to localhost)

   'elasticsearch:port' port number to connect to

   'elasticsearch:wspindex' name of the index to search (defaults to
'_all')

   'elasticsearch:acl_filtering' enables ACL filtering of results based
on the authenticated user. By default acl_filtering is turned off and
all results from Elasticsearch are used; enable it if the normal
Elasticsearch security features are not enough for you.

   Note: it is possible to set up Elasticsearch document and index
security based on user/gid, but there are cases where this might not be
appropriate or enough to satisfy a particular use case. At this time,
access to Elasticsearch is anonymous, and even once we support accessing
Elasticsearch as the current user, there may still be reasons why
Elasticsearch document or index security is not enough; in that case it
is best to set acl_filtering.

   The downside of setting acl_filtering is that only a limited number
of results is available, because the server needs to cache the results
that it ACL-filters itself. With acl_filtering enabled, the default
number of results returned is 200 (this can be modified with the global
parameter 'wsp results limit').

There are also some settings for encrypting the Elasticsearch connection
with TLS. I don't mention them here for now, as it's best to use
clear-text communication while setting things up.
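For reference, the optional global settings above could be gathered into
a single [global] section roughly as follows. This is only a sketch: the
values are illustrative assumptions, not recommendations. Port 9200 is
simply Elasticsearch's usual default HTTP port, 'job_name' is a
placeholder for whatever fscrawler job you created (fscrawler normally
names its index after the job), and 'yes' is assumed to be an accepted
value for the acl_filtering boolean.

```ini
[global]
    # The 'wsp backend' and 'elasticsearch:wsp_mappings' lines shown
    # earlier also belong in this section.

    # Elasticsearch server to query (defaults to localhost if unset)
    elasticsearch:address = 127.0.0.1

    # Assumption: 9200 is Elasticsearch's usual default HTTP port
    elasticsearch:port = 9200

    # Placeholder: fscrawler's index is normally named after the job;
    # if this line is left out, '_all' is searched
    elasticsearch:wspindex = job_name

    # Filter results against the authenticated user's ACLs
    # ('yes' assumed to be accepted as a boolean value)
    elasticsearch:acl_filtering = yes

    # Results cap used when acl_filtering is on (200 is the stated default)
    wsp results limit = 200
```

Note that smb.conf only treats '#' or ';' at the start of a line as a
comment, which is why every comment above sits on its own line.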

start samba
-------------------

Depending on how you installed Samba, start it as appropriate 😄

test search
------------------

a) with cmdline tool
     wspsearch -U${USER}%${PASS} --kind documents //${SERVER}/${SHARE}

where 'kind' is one of 
"Calendar|Communication|Contact|Document|Email|Feed|Folder|Game|InstantMessage|Journal|Link|Movie|Music|Note|Picture|Program|RecordedTV|SearchFolder|Task|Video|WebHistory"

See 'wspsearch --help' for more details.

Note: when searching against a Samba share, only a subset of the
categories above is supported (search is based on the mimetype
associations set up in elasticsearch_mappings.json; please have a look
in there to see which types are supported). Supported kinds include the
obvious ones: Music, Video, Picture.

b) with a Windows client

      In Windows Explorer, navigate to the share contents and click on
the 'Search' tab, which should give you access to the search ribbon.
From there you can select the various 'kinds' from a dropdown and
refine the searches, for example by size, date, etc.

I've probably left out vital details; I should probably create something
on the wiki at some stage. If you want to try it out, please feel free
to mail with problems/questions etc.

Noel

[1] probably quite old now; I downloaded it quite some time ago and
didn't update it (I include the version just for information about the
setup that is currently working for me)
[2] again, this is probably an 'old' version by now; I don't recall when
I last (re)downloaded the rpm. Again, the version info is just for
completeness, as this is what I am using for testing