[Xapian-discuss] using xapian for indexing mails [SOLVED]

djcb djcb.bulk at gmail.com
Sun Aug 31 10:36:29 BST 2008


On Sat, 30 Aug 2008, djcb wrote:

> Dear Xapian,
> 
> I am writing a little tool for indexing/searching email messages in
> maildirs.

<snip (16 lines)>

> Or is there some easier way to simply provide blobs of text, and being
> able to search for them later? I have the feeling I am misunderstanding
> something....

Thanks all for the quick replies! 

Matthew Somerville <matthew at mysociety.org> wrote:

> You want XapianTermGenerator, which takes a blob of text and adds all
> the words in it to Xapian. e.g. (snippet of the written-in-PHP
> http://sandwich.ukcod.org.uk/~matthew/subtitles/?source=1#indexer ):

Ah, that did the trick, great! I have now integrated Xapian with my
code, and it seems to work nicely. I'll take a look at some of the other
indexers that were mentioned.

I noticed that the stemming is language-specific (understandably); is
there some recommended way to guess the language of a blob of text? For
me, speed is more important than 100% accuracy (which would be hard to
achieve anyway, especially with multi-language text).
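
To make the question concrete, here is a rough sketch of the kind of cheap
guess I have in mind: count hits against a few short stopword lists and pick
the language with the most matches. This is only an assumption about what
might be good enough, not something Xapian provides; the word lists, the
guess_language name and the fallback default are all made up for
illustration. The language names are chosen to match stemmer names such as
"english" and "dutch".

```python
import re

# Tiny hand-picked stopword lists. Illustrative only: they are not taken
# from Xapian, and a real tool would want longer lists.
STOPWORDS = {
    "english": {"the", "and", "of", "to", "is", "in", "that", "it", "for", "with"},
    "dutch": {"de", "het", "een", "en", "van", "is", "dat", "niet", "op", "met"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "mit", "auf", "zu"},
    "french": {"le", "la", "les", "et", "est", "un", "une", "des", "que", "pas"},
}


def guess_language(text, default="english"):
    """Return the language whose stopwords occur most often in text.

    Falls back to `default` (a hypothetical choice) when nothing matches.
    """
    # Split into lowercase runs of letters; digits and punctuation are dropped.
    words = re.findall(r"[^\W\d_]+", text.lower())

    counts = {lang: 0 for lang in STOPWORDS}
    for word in words:
        for lang, stopwords in STOPWORDS.items():
            if word in stopwords:
                counts[lang] += 1

    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else default


if __name__ == "__main__":
    print(guess_language("De kat zit op het dak en is niet van mij"))
```

This is a single pass over the words, so it should be fast, but for
multi-language text it simply returns whichever language dominates.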

BTW, my little maildir indexer/searcher 'mu':
http://www.djcbsoftware.nl/code/mu/

Version 0.1 does not have Xapian-search yet, but 0.2 will :-)

Best wishes,
Dirk.


