[Xapian-discuss] Tuning Phrase Searching

Olly Betts olly at survex.com
Mon Nov 7 13:03:30 GMT 2005


On Sat, Nov 05, 2005 at 10:39:43PM +0100, Arjen van der Meijden wrote:
> It may be faster to combine such word-pairs with normal phrase
> searching: build a query that checks for both the correct word-pairs
> and the phrase.
> The drawback is of course that you'll increase the size of your postlist 
> quite a bit (you don't need it in the position table however).

Ages ago Richard and I came up with the idea of using word pairs to
reduce the set of candidates like this, but instead of indexing the
actual word pairs, you index a hash of each word pair.  By choosing how
many distinct hash values there are, you can trade off the number of
extra terms against the reduction in the number of candidates.

It turns out other people have used much the same idea, which is handy
as it means it should actually work!

Anyway, this is on my todo list as an optional extra for speeding up
phrase searching.
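
To make the hashed word-pair idea concrete, here is a rough sketch in
plain Python.  It is not Xapian code and not how any future
implementation would necessarily look: the "XP" term prefix, the
function names, the use of CRC32 and the `buckets` parameter are all
assumptions made for illustration.  Each adjacent word pair is hashed
into one of `buckets` terms, the query's pair terms are intersected to
get candidates, and only those candidates get the real phrase check.

```python
import zlib


def pair_terms(words, buckets):
    """Hash each adjacent word pair into one of `buckets` extra terms.

    The "XP" prefix is a made-up convention for this sketch.
    """
    terms = set()
    for first, second in zip(words, words[1:]):
        key = f"{first}\x00{second}".encode("utf-8")
        terms.add(f"XP{zlib.crc32(key) % buckets}")
    return terms


def build_index(docs, buckets):
    """Map each hashed pair term to the set of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in pair_terms(text.lower().split(), buckets):
            index.setdefault(term, set()).add(doc_id)
    return index


def candidates(index, phrase, buckets, all_ids):
    """Documents containing every hashed pair of the phrase.

    Hash collisions can let in false positives, but a document that
    really contains the phrase is never excluded.
    """
    result = set(all_ids)
    for term in pair_terms(phrase.lower().split(), buckets):
        result &= index.get(term, set())
    return result


def contains_phrase(text, phrase):
    """Exact check, standing in for the positional phrase match."""
    words, target = text.lower().split(), phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def phrase_search(docs, index, phrase, buckets):
    """Filter by hashed pairs first, then verify the survivors."""
    cands = candidates(index, phrase, buckets, docs.keys())
    return sorted(d for d in cands if contains_phrase(docs[d], phrase))
```

A small `buckets` value adds few extra terms but filters out few
documents; a large value adds more terms and filters more sharply.
Either way the final result is the same, since every candidate is still
verified against the actual phrase.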

> Olly already mentioned using Flint, using xapian-compact to further 
> decrease the size of the database may help a lot for searches. You may 
> want to keep two versions of your database, the non-compacted for 
> updating and the fully compacted for searches.

Oh yes, I should have mentioned compacting databases!

> For Flint the compaction is a bit less dramatic than for Quartz, with 
> Flint our 14G non-compacted database decreases to 12G compacted (which 
> uses zlib-compression as well).

It's not just the size reduction - the compact database has a structure
which allows for faster searching.

Incidentally, the reason Flint databases don't shrink as much is that
Flint generally does a better job of keeping databases compact in
typical use.

Cheers,
    Olly


