[Xapian-discuss] Theoretical question

Chris shef31 at yahoo.com
Tue Jan 17 04:50:11 GMT 2006


I've been reading the docs on the internal construction of Xapian. There's
discussion of autopruning and operator decay in the Matching section.

Elsewhere, though, the docs say that posting lists are stored in doc_id
order rather than wdf order, which suggests that high-ranking documents
could sit at the very end of a posting list.

How can autopruning and operator decay have much effect, then? It seems
you would almost always have to read to the end of every list.

Example: let's say we have a posting list of 1000 documents, and we need to
return the top 10 for a single-word query. If weight is unrelated to doc_id,
the top 10 will be scattered uniformly across a posting list which is sorted
in doc_id order, which means that at least one of them will commonly be
found 90% or 95% of the way into the list.
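
To put a rough number on that, here is a minimal Python sketch. It assumes
the top k documents occupy k distinct positions chosen uniformly at random
in a list of n entries (my simplification, not anything taken from the
Xapian docs), and the function names are made up for illustration.

```python
from math import comb


def expected_deepest(n_docs, k):
    """Expected 1-based position of the deepest of k items placed at
    distinct, uniformly random positions in a list of n_docs entries."""
    return k * (n_docs + 1) / (k + 1)


def prob_deepest_beyond(n_docs, k, cutoff):
    """Probability that at least one of the k items lies after `cutoff`.

    All k items fit in the first `cutoff` slots with probability
    C(cutoff, k) / C(n_docs, k); this returns the complement.
    """
    return 1 - comb(cutoff, k) / comb(n_docs, k)


if __name__ == "__main__":
    n, k = 1000, 10
    print("expected deepest position:", expected_deepest(n, k))
    for cutoff in (900, 950):
        p = prob_deepest_beyond(n, k, cutoff)
        print(f"P(some top-{k} doc after position {cutoff}) = {p:.3f}")
```

Under that assumption the deepest of the top 10 is expected at position
910 of 1000, i.e. about 91% of the way in, which is what I mean by
"commonly".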





