[Samba] Fwd: Wsearch

npower npower at samba.org
Wed Feb 22 11:35:14 UTC 2023


Hi Kees, All

So, I have a version of samba with WIP Windows Search Service (using 
elasticsearch) backported against 4.17

you can clone it from
https://git.samba.org/npower/samba.git

branch is current_wsp_417_wip

As the name suggests it is very much work in progress :-) but since I 
have already a merge request for the wsp client, I have started (again) 
to try and prepare patches for the server side

Quick start Guide :-)

Building
--------------
1. clone https://git.samba.org/npower/samba.git
2. checkout current_wsp_417_wip
3. ./configure.developer ${YOUR_OWN_CONF_SWITCHES} --enable-wsp && make -j
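
As a rough sketch, the three steps above can be strung together in one shell function. The URL, branch name and --enable-wsp switch are the ones given above; the checkout directory name (SRC_DIR) and the empty default for EXTRA_CONF_SWITCHES are assumptions made only for illustration.

```shell
#!/bin/sh
# Sketch only: wraps the three build steps in one function.
REPO_URL="https://git.samba.org/npower/samba.git"
BRANCH="current_wsp_417_wip"
# Assumption: directory name for the checkout, pick whatever you like.
SRC_DIR="${SRC_DIR:-samba-wsp}"
# Assumption: any extra configure switches you need; empty by default.
EXTRA_CONF_SWITCHES="${EXTRA_CONF_SWITCHES:-}"

build_wsp_samba() {
    git clone "$REPO_URL" "$SRC_DIR" || return 1
    cd "$SRC_DIR" || return 1
    git checkout "$BRANCH" || return 1
    # EXTRA_CONF_SWITCHES is left unquoted on purpose so that several
    # switches are split into separate arguments.
    ./configure.developer $EXTRA_CONF_SWITCHES --enable-wsp || return 1
    make -j
}
```

Call build_wsp_samba from the directory where the checkout should live; note that it leaves the calling shell inside the source tree.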

Indexing a share
---------------------------

1. install fscrawler (I used version fscrawler-es7-2.7-SNAPSHOT) [1]
https://fscrawler.readthedocs.io/en/latest/installation.html
2. install and start a version of elasticsearch (I used 
elasticsearch-7.11.2) [2]
3. configure fscrawler (I attach my conf '_setting.yaml' file here) and
4. start fscrawler 'bin/fscrawler job_name'
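
Before pointing samba at elasticsearch it can help to check that fscrawler really created an index. The following is a small sketch using curl against the standard elasticsearch REST endpoints '_cat/indices' and '<index>/_count'. It assumes elasticsearch listens on http://localhost:9200 without authentication, and that the index is named after the fscrawler job ('job_name', as in step 4), which as far as I know is the fscrawler default; adjust both if your setup differs.

```shell
#!/bin/sh
# Assumptions: local, unauthenticated elasticsearch; index named after the job.
ES_URL="${ES_URL:-http://localhost:9200}"
JOB_NAME="${JOB_NAME:-job_name}"

# Print the URL that returns the number of documents in an index.
es_count_url() {
    printf '%s/%s/_count\n' "$ES_URL" "$1"
}

# List all indices, then show how many documents the job's index holds.
check_index() {
    curl -s "$ES_URL/_cat/indices?v" || return 1
    curl -s "$(es_count_url "$JOB_NAME")"
}
```

Running check_index after the crawl has finished should list the job's index and report a non-zero document count if the share contained any files.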

configure samba
---------------------------

[global]
     wsp backend = elasticsearch
     elasticsearch:wsp_mappings=${PATH_TO}/elasticsearch_mappings.json


an initial version of elasticsearch_mappings.json is available from 
'source3/rpc_server/wsp/elasticsearch_mappings.json' in the build tree

start samba
-------------------

depending on how you installed samba start as appropriate :-)

test search
------------------

a) with cmdline tool
     wspsearch -U${USER}%${PASS} --kind documents //${SERVER}/${SHARE}

where 'kind' is one of 
"Calendar|Communication|Contact|Document|Email|Feed|Folder|Game|InstantMessage|Journal|Link|Movie|Music|Note|Picture|Program|RecordedTV|SearchFolder|Task|Video|WebHistory"

see wspsearch --help for some more details
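
To try several kinds in one go, a loop like the one below might be handy. It only uses the wspsearch invocation shown above; SERVER, SHARE, WSP_USER and WSP_PASS are placeholder names (with made-up defaults) to be replaced with your own values.

```shell
#!/bin/sh
# Placeholders (assumptions): adjust to your own setup.
SERVER="${SERVER:-localhost}"
SHARE="${SHARE:-test}"
WSP_USER="${WSP_USER:-user}"
WSP_PASS="${WSP_PASS:-secret}"

# Print the wspsearch command line for one kind (does not run it).
wsp_cmd() {
    printf 'wspsearch -U%s%%%s --kind %s //%s/%s\n' \
        "$WSP_USER" "$WSP_PASS" "$1" "$SERVER" "$SHARE"
}

# Run a search for each kind given as an argument.
search_kinds() {
    for kind in "$@"; do
        echo "== $kind =="
        wspsearch -U"${WSP_USER}%${WSP_PASS}" --kind "$kind" "//${SERVER}/${SHARE}"
    done
}
```

For example, 'search_kinds Music Video' runs two searches in a row, while 'wsp_cmd Music' just prints the command that would be run.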

Note: when searching against a samba share only a subset of the 
categories above is supported. Search is based on the mimetype 
associations set up in elasticsearch_mappings.json, so please have a look 
in there to see which types are supported. Supported kinds include the 
obvious ones: Music, Video, Pictures

b) with windows client
      a) with windows explorer navigate to share contents,  click on the 
'Search' tab which should give you access to the search ribbon, from 
there you can select the various 'kinds' from a dropdown, you can refine 
the searches for example by size, date etc.

By default the server will return unlimited results. To limit the number 
of results you can use the global configuration parameter
     'wsp result limit'

Noel

[1] probably quite old now, I downloaded quite some time ago and didn't 
update it (I include the version just for information as to the setup 
that currently is working for me)
[2] again this is probably now an 'old' version, I don't recall when I 
last (re)downloaded the rpm, again the version info is just for 
completeness as this is what I am using for testing

On 17/02/2023 16:30, npower via samba wrote:
>
> need to remember to mail from my samba.org address :-)
>
> -------- Forwarded Message --------
> Subject:     Re: [Samba] Wsearch
> Date:     Fri, 17 Feb 2023 16:28:29 +0000
> From:     Noel Power <nopower at suse.de>
> To:     Kees van Vloten <keesvanvloten at gmail.com>, npower via samba 
> <samba at lists.samba.org>
>
>
>
> Hi Kees
>
> On 17/02/2023 10:56, Kees van Vloten via samba wrote:
>> Hi Noel,
>>
>>
>> As we discussed on the list, I am busy getting the bits and pieces in 
>> place to be able to test your windows search work.
>>
>> I am running all my stuff on Debian Bullseye, with Samba code in 3 
>> lxc-containers. Two are for the DCs, one is the fileserver. 
>> Everything is managed by Ansible code, I try to avoid manual changes 
>> to my environment(s) as much as possible. Generally manual changes 
>> are to test something and when it works I put it in code.
>>
>> The fileserver-container uses a mounted host directory to store the 
>> file shares.
>>
>> In the meantime I have installed Opensearch on the host. Opensearch 
>> is binary compatible with Elasticsearch 7.x and that is what 
>> FSCrawler requires, so should just work.
> as mentioned before, haven't used opensearch so I have no idea what, 
> or if there are any differences with elasticsearch or indeed how any 
> such differences might affect the WSP server implementation.
>>
>> Now I was looking at FSCrawler and I noticed the last release with 
>> compiled code is 2.7, is that version alright? I guess we do not want 
>> to run into issues in FSCrawler while working on Samba, hence 2.10 
>> snapshots feel like a bad idea.
>> /Did you find 2.9 binaries somewhere or how do you deal with this?/
> so, the version of fscrawler I last tested with is 
> fscrawler-es7-2.7-SNAPSHOT
>> For communication between Samba and Opensearch I will apply the code 
>> patch from Awen Saunders (Authorization header in smb.conf). Although 
>> not secure it is good enough for this testing.
> sure, I just use anonymous :-) (but you need to do some configuration 
> steps that I don't remember to get that to work with tls)
>>
>> As for Samba I have setup build code to create debian packages from 
>> https://salsa.debian.org/samba-team/samba. That delivers me .deb 
>> packages which can be installed with my automation on the fileserver.
> hmmm, I am currently trying to get the server to work/build 
> against master, I'd be willing to backport it to 4.17 or 4.18 but 
> don't see much value in backporting it further back just for testing
>>
>> I guess that is not what we want for this project. My latest idea is 
>> to deploy a second fileserver container, specifically for this work.
>> /How do you build and install samba for dev and test work?/
> I just build  and run directly from the source tree, if you know the 
> correct options (e.g. for debian? ) to pass to configure, then in the 
> container you could probably easily just do 'make install' and it 
> should overwrite whatever version is there. The distro config options 
> would probably need to be changed a little so that you build and use 
> the correct versions of ldb/talloc/tevent etc. instead of using the 
> system ones
>>
>> Since FSCrawler will index the shares, I was thinking of installing 
>> it on the host and not in the fileserver-container. That helps when 
>> there are 2 fileserver-containers that work on the same underlying 
>> host storage (one for this work and the regular one).  That reduces 
>> duplication and keeps the containers lean.
>> /Would there be any objections with this approach?/
>
> whenever I tested this, it was with a very simple setup, I just run 
> elasticsearch on my local dev machine, I point fscrawler at a 
> particular share I use for testing to create the index and that's it. 
> After that I use my own 'wspsearch' client or connect a windows client 
> and perform searches from the file explorer/browser
>
>  I think whatever way you want to configure your system is entirely up 
> to you :-), at the end of the day the wsp server only needs to be able 
> to talk to the elasticsearch instance to query it. In fact I think I 
> only ever ran fscrawler once :-)
>
>>
>>
>> A lot of progress on my side but not yet ready :-)
>>
>> - Kees.
>>
> I hope to have a working (well working for simple searches at least) 
> wsp early next week
>
> Noel


