[Xapian-discuss] high update-frequency strategy

Olly Betts olly at survex.com
Thu Aug 13 12:35:44 BST 2009


On Thu, Aug 13, 2009 at 09:18:40AM +0200, Jan wrote:
> Is there any way to make get_document "lazier", i.e. not do lookups in
> the persistent index, and do the meta-data replace "dirty", i.e. simply
> write the new value in the cache and not make it persistent until
> flush()?

This patch helps in many cases (for apt-xapian-index, it cut a test case
which updates only values from about 40 seconds to under one second):

http://oligarchy.co.uk/xapian/patches/xapian-flint-lazy-update-backport-for-1.0.patch

It's quite likely to be in 1.0.15 (and more success stories would make
that more likely).

It's already in the 1.1.x development releases.
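
As a rough picture of the "dirty write until flush()" behaviour asked about
above, here is a toy write-back cache in plain Python.  It is not Xapian
code and it is not a description of what the patch does internally; the
class and method names (WriteBackValues, set_value, backing_writes) are
invented for this sketch, and a dict stands in for the on-disk tables.

```python
class WriteBackValues:
    """Toy write-back cache for per-document values (not Xapian code)."""

    def __init__(self, backing):
        self.backing = backing    # dict standing in for the persistent store
        self.dirty = {}           # (docid, slot) -> pending value
        self.backing_writes = 0   # how many times the store was written

    def set_value(self, docid, slot, value):
        # Only memory is touched here; nothing is persisted yet.
        self.dirty[(docid, slot)] = value

    def get_value(self, docid, slot):
        # A pending write takes precedence over the persistent copy.
        key = (docid, slot)
        if key in self.dirty:
            return self.dirty[key]
        return self.backing.get(key, "")

    def flush(self):
        # One write per changed key, however often it was overwritten.
        for key, value in self.dirty.items():
            self.backing[key] = value
            self.backing_writes += 1
        self.dirty.clear()


disk = {(1, 0): "old"}
cache = WriteBackValues(disk)
for i in range(1000):
    cache.set_value(1, 0, "rev%d" % i)
```

Until cache.flush() is called, reads through the cache see the newest value
while the backing dict is untouched; the flush then costs one write for the
one changed key rather than one per update.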

> What are the performance advantages and disadvantages of modeling
> meta-data as prefixed terms vs. document values?

It really depends on how you want to use it.  If you want to select one
or a few of the possible values, a prefixed boolean term is good.  But
if you want to select potentially large ranges, or perform more complex
tests than "is a member of" (e.g. geographical distance filtering) then
values are more flexible.
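
The difference can be sketched with a toy model in plain Python (no Xapian
involved).  The sample data, the "XCAT" prefix and the function names are
all made up for illustration: a term-style filter looks up a few posting
lists, while a value-style filter evaluates an arbitrary test against the
value stored for each document.

```python
from collections import defaultdict

# Hypothetical sample data: docid -> (category, price).
DOCS = {1: ("book", 12.0), 2: ("dvd", 30.0), 3: ("book", 45.0), 4: ("cd", 8.5)}

# Term-style index: one posting list per distinct value.
postings = defaultdict(list)
for docid, (category, _price) in sorted(DOCS.items()):
    postings["XCAT" + category].append(docid)  # "XCAT" is an invented prefix

# Value-style storage: one slot per document, read at match time.
price_slot = {docid: price for docid, (_cat, price) in DOCS.items()}


def filter_by_term(*categories):
    """Membership test: union of a few posting lists."""
    hits = set()
    for category in categories:
        hits.update(postings.get("XCAT" + category, []))
    return sorted(hits)


def filter_by_value(predicate):
    """Arbitrary test: apply a predicate to each document's stored value."""
    return sorted(d for d, v in price_slot.items() if predicate(v))


books = filter_by_term("book")
mid_priced = filter_by_value(lambda p: 10.0 <= p <= 40.0)
```

Selecting "book" only touches one posting list, whereas the price range
would need a term per distinct price if done with terms; the value filter
handles it with a single predicate, at the cost of reading a value for each
candidate document.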

With 1.1.x, you can also use externally stored meta-data and
Xapian::PostingSource.
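
To give a feel for the shape of that approach, here is a toy iterator in
plain Python.  It borrows a few method names from Xapian::PostingSource
(next, at_end, get_docid, get_weight) but does not subclass it, and the
real class requires more methods than are shown; the RATINGS dict, the
"live" and "score" fields and the run() helper are invented for this sketch.

```python
class ExternalSource:
    """Toy iterator shaped like a posting source (not Xapian's class)."""

    def __init__(self, external, predicate):
        # external: docid -> metadata kept outside the index (hypothetical).
        self.external = external
        self.docids = sorted(d for d, meta in external.items() if predicate(meta))
        self.pos = -1  # positioned before the first entry

    def next(self):
        self.pos += 1

    def at_end(self):
        return self.pos >= len(self.docids)

    def get_docid(self):
        return self.docids[self.pos]

    def get_weight(self):
        # The weight comes straight from the external metadata.
        return self.external[self.get_docid()]["score"]


def run(source):
    """Drive the source the way a matcher might: next() until at_end()."""
    out = []
    source.next()
    while not source.at_end():
        out.append((source.get_docid(), source.get_weight()))
        source.next()
    return out


RATINGS = {
    1: {"score": 0.5, "live": True},
    2: {"score": 2.0, "live": False},
    3: {"score": 1.5, "live": True},
}
matches = run(ExternalSource(RATINGS, lambda m: m["live"]))
```

Because the metadata lives outside the index in this model, changing a
score or the "live" flag needs no index update at all, which is the appeal
for frequently changing data.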

Cheers,
    Olly


