[Xapian-discuss] Theoretical question

Chris shef31 at yahoo.com
Tue Jan 17 00:53:05 GMT 2006


I've been reading the docs on the internal construction of Xapian. There's 
discussion of autopruning and operator decay in the Matching section.

Elsewhere, though, the docs say that posting lists are stored in doc_id order 
rather than wdf order, which suggests that high-ranking documents could sit 
at the very end of a posting list.

How can autoprune and operator decay really have much effect, then? You 
would almost always have to go to the end of every list.

Example: let's say we have 1000 documents, and we need to return the top 10 
for a single-word query. If the top 10 are scattered uniformly across a 
posting list sorted in doc_id order, then the last of them will typically be 
found about 90% of the way into the list, so the matcher has to read nearly 
the whole list before it has seen all ten.
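
To put a rough number on that, here is a minimal sketch in Python. It assumes 
the top-ranked documents are placed uniformly at random among the list 
positions, which is my simplification for the sake of argument and not 
something Xapian guarantees. The function names are made up for this 
illustration and are not part of any Xapian API.

```python
import random


def expected_last_position(n_docs: int, top_k: int) -> float:
    """Exact expected 1-based position of the deepest of top_k entries
    placed uniformly at random (without replacement) among n_docs slots."""
    return top_k * (n_docs + 1) / (top_k + 1)


def simulated_last_position(n_docs: int, top_k: int,
                            trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Pick top_k distinct positions and record the deepest one.
        total += max(rng.sample(range(1, n_docs + 1), top_k))
    return total / trials


if __name__ == "__main__":
    exact = expected_last_position(1000, 10)
    approx = simulated_last_position(1000, 10)
    print(f"exact: {exact:.1f}  simulated: {approx:.1f}")
```

Under that assumption the deepest of the top 10 sits at position 
10 * 1001 / 11 = 910 on average, i.e. roughly 91% of the way into a 
1000-entry list.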





