xapian 1.4 performance issue

Jean-Francois Dockes jf at dockes.org
Thu Dec 7 09:29:09 GMT 2017


Hi,

I have had reports that Recoll has become unbearably slow in some
instances.

After investigation, this happens with Xapian 1.4 only, and the part which
no longer works acceptably is snippet extraction.

Recoll builds snippets by partially reconstructing documents out of index
contents.

For this, after determining a set of document term positions to be
displayed (around the hopefully interesting hits), it walks the document
term list, and, for each term, walks its position list looking for matches
with the target positions (there is no other way that I know of to
determine the term at a given position).
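
A minimal sketch of the walk described above, using an in-memory stand-in
for the index rather than the real Xapian calls. The names DocTerms,
PositionList and termsAtPositions are hypothetical and only serve the
illustration. In Recoll the outer loop would presumably use the document
term list iterator (Xapian::Database::termlist_begin()) and the inner one
a position list opened per term (Xapian::Database::positionlist_begin()):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one document's index data:
// each term maps to the list of positions where it occurs.
using PositionList = std::vector<unsigned>;
using DocTerms = std::map<std::string, PositionList>;

// For each wanted position, find the term stored at that position by
// walking every term of the document and scanning its position list.
std::map<unsigned, std::string> termsAtPositions(
    const DocTerms& doc, const std::set<unsigned>& wanted)
{
    std::map<unsigned, std::string> found;
    for (const auto& entry : doc) {            // walk the document term list
        const std::string& term = entry.first;
        for (unsigned pos : entry.second) {    // walk this term's positions
            if (wanted.count(pos) != 0) {
                found[pos] = term;             // term found at a target position
            }
        }
    }
    return found;
}
```

The point of the sketch is the access pattern: one position list is opened
per term of the document, however few positions are actually wanted.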

This was always very fast with Xapian 1.2. I do understand that this
is a very intensive operation, but performance was never an issue at
all for displaying typical screens of 8-15 document abstracts.

This operation has become unbearably slow in some cases with Xapian 1.4,
especially when a document has many terms.

The specific operation which has become slow is opening many term position
lists, each quite short.

In a quite typical example, the abstract generation time has gone from
100 ms to 90 s (SECONDS) on a cold index. Users don't like it: they think
that the application is dead, which is what triggered the user reports.

The TL;DR is that Recoll is unusable with Xapian 1.4.

I don't know why I did not see it earlier, probably because I always work
with warm indexes: this is an I/O issue.

Any idea how I can work around this?

J.F. Dockes


