[Xapian-discuss] using xapian for indexing mails [SOLVED]
djcb
djcb.bulk at gmail.com
Sun Aug 31 10:36:29 BST 2008
On Sat, 30 Aug 2008, djcb wrote:
> Dear Xapian,
>
> I am writing a little tool for indexing/searching email messages in
> maildirs.
<snip (16 lines)>
> Or is there some easier way to simply provide blobs of text, and being
> able to search for them later? I have the feeling I am misunderstanding
> something....
Thanks all for the quick replies!
Matthew Somerville <matthew at mysociety.org> wrote:
> You want XapianTermGenerator, which takes a blob of text and adds all
> the words in it to Xapian. e.g. (snippet of the written-in-PHP
> http://sandwich.ukcod.org.uk/~matthew/subtitles/?source=1#indexer ):
Ah, that did the trick, great! I have now integrated Xapian with my code, and
it seems to work nicely. I'll take a look at some of the other indexers
that were mentioned.
I noticed that the stemming is language-specific (understandably); is
there some recommended way to guess the language of a blob of text? For
me, speed is more important than 100% accuracy (which would be hard
to achieve anyway, especially with multi-language text).
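
To make the question concrete, here is a minimal sketch of one cheap heuristic: count how many words of the text appear in a small stopword list for each candidate language and pick the language with the most hits. Nobody in this thread suggested this approach. The function name guess_language, the max_words parameter, and the tiny stopword lists are all assumptions made for illustration; real lists would be much longer, and the guess will be unreliable for short or mixed-language text.

```python
import re

# Tiny illustrative stopword lists (assumption: real lists would be far longer).
STOPWORDS = {
    "english": {"the", "and", "of", "to", "is", "in", "that", "it", "for", "with"},
    "dutch": {"de", "het", "een", "en", "van", "is", "dat", "niet", "op", "voor"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu", "mit", "von"},
    "french": {"le", "la", "les", "et", "est", "un", "une", "des", "pas", "pour"},
}


def guess_language(text, default="none", max_words=200):
    """Guess the language of `text` by counting stopword hits.

    Only the first `max_words` words are inspected, trading accuracy
    for speed. Returns `default` when no stopword matches at all.
    """
    # Extract runs of letters (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())[:max_words]
    # Score each language by the number of words found in its stopword list.
    scores = {
        lang: sum(1 for w in words if w in stop)
        for lang, stop in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

The returned name is intended to be handed to the stemmer constructor (for example xapian.Stem(name) in the Python bindings); as far as I know "none" selects a stemmer that leaves terms unchanged, but check that against your Xapian version.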
BTW, my little maildir indexer/searcher 'mu':
http://www.djcbsoftware.nl/code/mu/
Version 0.1 does not have Xapian-search yet, but 0.2 will :-)
Best wishes,
Dirk.