[Samba] Spotlight indexing with fscrawler for multiple shares

Matthias Kühne | Ellerhold Aktiengesellschaft matthias.kuehne at ellerhold.de
Thu Aug 10 13:38:34 UTC 2023


Hey Kees,

fs2es-indexer is designed to be a lightweight alternative to FSCrawler. 
So no ... it doesn't do any content indexing, nor does it save much of 
the metadata.

As far as I understand it, the OCR and other features are what make 
FSCrawler that big. And we don't need any of that: we just want to 
search for file names.

BUT I'm open to merge requests ;-)

I'm currently getting away with a lot less complexity because I don't 
need to watch for changes in files, since file content and attributes 
are not something I'm indexing. If I added more metadata (even just the 
size!), I would have to verify that it stays correct and start listening 
for "file X has changed" events somehow...

fanotify seems like a sweet framework for that, but sadly ZFS is 
incompatible with it...

Samba does not let me get this data efficiently either, so I'm forced to 
do regular scans of the whole fs, which might take a while depending on 
the number of files.
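
A minimal sketch of what such a periodic full scan could look like, 
assuming only file names and paths are collected. The function name and 
the document layout are made up for illustration; this is not the 
actual fs2es-indexer code:

```python
import os
from typing import Iterable, Iterator


def scan_file_names(roots: Iterable[str]) -> Iterator[dict]:
    """Walk every root directory and yield one small document per file.

    Hypothetical helper for illustration only.
    """
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # Only the name and path are recorded, so a change to a
                # file's content or size cannot make the entry stale.
                yield {"name": name, "path": os.path.join(dirpath, name)}
```

Passing several share directories as `roots` covers them all in one 
pass. The cost grows with the total number of files, which is why a 
full scan can take a while.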

Adding support for OpenSearch shouldn't be that hard though, right? I've 
already got a version switch for ES v7 and v8, so adding OS to it should 
be easy enough!
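
A rough sketch of how such a backend switch might be laid out, assuming 
the official Python client packages (`elasticsearch` for ES, 
`opensearchpy` from opensearch-py for OpenSearch). The backend keys and 
function names are hypothetical and not taken from fs2es-indexer:

```python
import importlib

# Hypothetical backend keys mapped to the importable client module.
# ES v7 and v8 share a module name; the installed package version differs.
CLIENT_MODULES = {
    "elasticsearch7": "elasticsearch",
    "elasticsearch8": "elasticsearch",
    "opensearch": "opensearchpy",
}


def client_module_for(backend: str) -> str:
    """Return the module name of the client library for a backend key."""
    try:
        return CLIENT_MODULES[backend]
    except KeyError:
        raise ValueError(f"unsupported search backend: {backend!r}") from None


def load_client_module(backend: str):
    """Import the client library lazily, so only the one in use is needed."""
    return importlib.import_module(client_module_for(backend))
```

With a layout like this, supporting OpenSearch is one more entry in the 
table, plus whatever API differences show up in the actual calls.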

Have a nice day,
Matthias.

On 10.08.23 at 15:01, Kees van Vloten via samba wrote:
> Hi Matthias,
>
> On 10-08-2023 at 14:46, Matthias Kühne | Ellerhold 
> Aktiengesellschaft via samba wrote:
>> Hey Kees,
>>
>> disclaimer: shameless self-plug!!
>>
>> If you don't need content indexing you can use my indexer:
>> https://github.com/Ellerhold/fs2es-indexer
>
> I have looked at it because of troubles with FScrawler and I love your 
> solution because it does not need heavyweight Java.
>
> But there is one thing FScrawler is good at: it indexes all kinds of 
> metadata of files (like exif data in photos etc), it can even do OCR. 
> This is what the fs2es-indexer does not seem to do (to my understanding).
>
> That is the reason why I am stuck with FScrawler for now.
>
> Hopefully I am wrong and you are going to tell me that fs2es-indexer 
> has all the functionality of FScrawler but not the issues :-)
>
> The other thing is that I am pushing data to OpenSearch, which requires 
> me to patch and compile FScrawler, another complexity I don't like 
> very much.
>
> - Kees
>
>>
>> I've created it because I couldn't get FScrawler to work correctly.
>>
>> You can add as many directories as you like in the config; it'll crawl
>> them all through one daemon service.
>>
>> I'm planning on adding smb.conf parsing, so you don't even have to add
>> these directories to the yaml file and can just use Samba as you would.
>>
>> Let me know if you need some help setting it up or otherwise.
>>
>> Have a nice day,
>>
>> Matthias.
>>
>> On 04.08.23 at 19:56, Kees van Vloten via samba wrote:
>>> Hi Team,
>>>
>>>
>>> Did anybody solve the issue of FScrawler crawling over multiple
>>> shares, preferably from a single job or from a single service?
>>>
>>> Setting up a service for FScrawler per share does not scale very 
>>> nicely...
>>>
>>>
>>> - Kees.
>>>
>>>
>
-- 
Senior Web Developer
Data Protection Officer

Ellerhold Aktiengesellschaft
Friedrich-List-Str. 4
01445 Radebeul

Phone: +49 (0) 351 83933-61
Web: www.ellerhold.de
Facebook: www.facebook.com/ellerhold.gruppe
Instagram: www.instagram.com/ellerhold.gruppe
Twitter: https://twitter.com/EllerholdGruppe

Dresden District Court / HRB 23769
Executive Board: Stephan Ellerhold, Maximilian Ellerhold
Chairman of the Supervisory Board: Frank Ellerhold


