[Xapian-discuss] making my db leaner and meaner
Ben Campbell
ben at scumways.com
Tue Mar 31 10:58:44 BST 2009
Olly Betts wrote:
> On Thu, Mar 26, 2009 at 04:30:09PM +0000, Ben Campbell wrote:
> It's worth taking a look at the terms indexed for each document (the
> delve tool in xapian-core/examples is good for this) and seeing if
> you can get rid of any junk. It depends on the nature of the data,
> but things like ASCII art, OCRed documents, files with the wrong
> extensions, etc. can result in terms which aren't useful for searches.
Ahh good point - there is probably a lot of cruft in there.
Is it actually possible to block terms entirely when using
TermGenerator::index_text()?
TermGenerator seems to add even stopped terms, albeit only in their
non-stemmed form.
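
One way to keep such terms out of the index, independent of what TermGenerator does with stopped words, is to filter the text before it ever reaches index_text(). The following is only a rough sketch under that assumption: strip_junk_tokens(), normalize_token(), the blocked-word set and the max_len cutoff are hypothetical names made up for illustration and are not part of Xapian. The only Xapian call involved is index_text(), and it appears in a comment.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper, not part of Xapian.
// Bytes >= 0x80 are treated as word characters so UTF-8 text is not discarded.
static bool is_word_byte(unsigned char c) {
    return c >= 0x80 || std::isalnum(c) != 0;
}

// Hypothetical helper: trim non-word bytes from both ends and lowercase ASCII.
std::string normalize_token(const std::string& token) {
    std::size_t begin = 0;
    std::size_t end = token.size();
    while (begin < end && !is_word_byte(static_cast<unsigned char>(token[begin]))) ++begin;
    while (end > begin && !is_word_byte(static_cast<unsigned char>(token[end - 1]))) --end;
    std::string out = token.substr(begin, end - begin);
    for (char& c : out) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return out;
}

// Hypothetical pre-filter: returns the text with unwanted whitespace-separated
// tokens removed. The surviving tokens are kept verbatim and joined by single spaces.
std::string strip_junk_tokens(const std::string& text,
                              const std::unordered_set<std::string>& blocked,
                              std::size_t max_len) {
    std::istringstream in(text);
    std::string token;
    std::string result;
    while (in >> token) {
        const std::string norm = normalize_token(token);
        if (norm.empty()) continue;           // no word characters, e.g. ASCII art
        if (norm.size() > max_len) continue;  // implausibly long, e.g. OCR noise
        if (blocked.count(norm) != 0) continue;  // explicitly blocked word
        if (!result.empty()) result += ' ';
        result += token;
    }
    return result;
}

// Intended use (assumes an already configured Xapian::TermGenerator named termgen):
//   termgen.index_text(strip_junk_tokens(raw_text, blocked, 64));
```

Note that removing words before indexing shifts the positions of the words that remain, so phrase and proximity searches across a removed word will behave differently than they would against the unfiltered text.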
Thanks,
Ben.