[Samba] Spotlight indexing with fscrawler for multiple shares

Kees van Vloten keesvanvloten at gmail.com
Thu Aug 10 15:15:39 UTC 2023

On 10-08-2023 at 15:38, Matthias Kühne | Ellerhold 
Aktiengesellschaft via samba wrote:
> Hey Kees,
> fs2es-indexer is designed to be a lightweight alternative to FSCrawler.
> So no ... it doesn't do any content indexing or save much of the metadata.
> As far as I understand it, the OCR and other stuff is what makes FSCrawler
> that big. And we don't need any of that - we just want to search for file
> names. BUT I'm open for merge requests ;-)

Hopefully there is someone on the mailing list who wants to help.

As I said: I would love to move away from FSCrawler, but things like mp3 
metadata and exif data etc. are really handy when you want to search. At 
the moment I spend my time on the Windows-search solution Noel Power is 
building in Samba, so I won't be sending merge requests any time soon :-)

For indexing there is no difference between Spotlight and 
Windows-search, so that is why I am setting everything up for both.

> I'm currently getting away with a lot less complexity because I don't need
> to watch for changes in files, because that's not something I'm indexing.
> If I added more metadata (even only size!) I would have to verify that
> it stays correct and start listening to "file X has changed" events
> somehow...
> fanotify seems like a sweet framework for that, but sadly ZFS is
> incompatible with it...
> Samba does not let me get this data efficiently either, so I'm forced to
> do regular scans of the whole fs.... which might take a while depending on
> the number of files.
I am not an expert, but I guess you would need something like a Samba 
VFS module to get notified of file changes.
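
To illustrate the regular-scan approach discussed above, here is a minimal sketch (not taken from fs2es-indexer) of how a full scan could record size and mtime per file and compare two scans to see what changed. The names `scan` and `diff`, and the choice of (size, mtime) as the change signal, are assumptions for illustration only.

```python
import os
from typing import Dict, Set, Tuple

# path -> (size in bytes, modification time in nanoseconds)
Snapshot = Dict[str, Tuple[int, int]]


def scan(root: str) -> Snapshot:
    """Walk the whole tree under root and record size and mtime per file."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full, follow_symlinks=False)
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
            snapshot[full] = (st.st_size, st.st_mtime_ns)
    return snapshot


def diff(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, changed) paths between two scans."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    changed = {p for p in new.keys() & old.keys() if new[p] != old[p]}
    return set(added), set(removed), changed
```

Only the paths in `added` and `changed` would need to be re-indexed and the ones in `removed` deleted from the index, but every run still has to stat every file, which is the cost of the full scan mentioned above.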
> Adding support for OpenSearch though shouldn't be that hard, right? I've
> already got a version switch for ES v7 and v8, adding OS to it should be
> easy enough!
OpenSearch is fully compatible with ES 7 (<= 7.10.2), so that is not an 
> Have a nice day,
> Matthias.
> On 10.08.23 at 15:01, Kees van Vloten via samba wrote:
>> Hi Matthias,
>> On 10-08-2023 at 14:46, Matthias Kühne | Ellerhold
>> Aktiengesellschaft via samba wrote:
>>> Hey Kees,
>>> disclaimer: shameless self-plug!!
>>> If you don't need content indexing you can use my indexer:
>>> https://github.com/Ellerhold/fs2es-indexer
>> I have looked at it because of troubles with FSCrawler, and I love your
>> solution because it does not need heavyweight Java.
>> But there is one thing FSCrawler is good at: it indexes all kinds of
>> file metadata (like exif data in photos etc.), and it can even do OCR.
>> This is what fs2es-indexer does not seem to do (to my understanding).
>> That is the reason why I am stuck with FSCrawler for now.
>> Hopefully I am wrong and you are going to tell me that fs2es-indexer
>> has all the functionality of FSCrawler but not the issues :-)
>> The other thing is that I am pushing data to OpenSearch, which requires
>> me to patch and compile FSCrawler, another complexity I don't like
>> very much.
>> - Kees
>>> I've created it because I couldn't get FSCrawler to work correctly.
>>> You can add as many directories as you like in the config; it'll crawl
>>> them all through one daemon service.
>>> I'm planning on adding smb.conf parsing, so you don't even have to add
>>> these directories to the yaml file and can just use samba as you would.
>>> Let me know if you need some help setting it up or otherwise.
>>> Have a nice day,
>>> Matthias.
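
As a rough idea of what the smb.conf parsing mentioned above could look like, here is a minimal sketch that collects the `path` of every share section from smb.conf text. It is not fs2es-indexer's implementation: the function name `share_paths_from_smb_conf` and the example shares are hypothetical, and the sketch deliberately ignores `include` directives, line continuations and variable substitutions such as `%U`.

```python
from typing import Dict, Optional


def share_paths_from_smb_conf(text: str) -> Dict[str, str]:
    """Return {share_name: path} for every non-global section that sets "path"."""
    shares: Dict[str, str] = {}
    section: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments; smb.conf accepts both '#' and ';'.
        if not line or line[0] in "#;":
            continue
        # A section header such as [projects] starts a new share.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if section is None or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Parameter names are case-insensitive; [global] is not a share.
        if key.strip().lower() == "path" and section.lower() != "global":
            shares[section] = value.strip()
    return shares


# Hypothetical smb.conf content, for illustration only.
EXAMPLE = """\
[global]
   workgroup = EXAMPLE

[projects]
   path = /srv/samba/projects
   read only = no

; a commented-out share is ignored
;[old]
;   path = /srv/samba/old

[photos]
   path = /srv/samba/photos
"""

if __name__ == "__main__":
    for name, path in share_paths_from_smb_conf(EXAMPLE).items():
        print(f"{name}: {path}")
```

The resulting directories could then be fed to a single crawler service instead of being listed by hand in the yaml file.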
>>> On 04.08.23 at 19:56, Kees van Vloten via samba wrote:
>>>> Hi Team,
>>>> Did anybody solve the issue of FSCrawler crawling over multiple
>>>> shares, preferably from a single job or from a single service?
>>>> Setting up an FSCrawler service per share does not scale very
>>>> well...
>>>> - Kees.
