[Xapian-discuss] timing behaviour for creating large queries

Markus Wörle mrks at mrks.de
Wed Feb 7 16:59:58 GMT 2007


Hello

I am using Xapian with perl-bindings, and a self-written queryparser,  
which is mandatory for my application.

My queryparser builds a query object by connecting two objects by an  
operator using the Search::Xapian::Query constructor, e.g.

Search::Xapian::Query->new(OP_OR, $left, $right)

and so on. The queries it'll get, are mostly probabilistic and could  
become very large - about 200 up to 2400 OR'ed terms.  To estimate  
the amount of time for a growing amount of terms inside a query, i  
did some benchmarks. A result can be found here:

http://5nord.org/~mrks/query-performance.gif
(this is tested on a _very_ slow machine, so do not be concerned  
about the concrete amount of time mentioned there)

The yellow area is the amount of time which is needed by xapian to  
process the query. The red area is the time needed by my queryparser  
to parse the query and build up the query object. You see, its  
runtime behaves quadratic. I did some research on this (since my  
parsing algorithm has a linear complexity), and found out that the  
Search::Xapian::Query constructor, which joins the subtrees behaves  
quadratic. I think this is, because each joined three will be   
validated by libxapian, independent of if its subtrees got already  
validated or not. To get more internal: Every call of the constrator  
gets the query validated by Xapian::Query::Internal::prevalidate_query 
(), and prevalidate_query() calls validate_query() which calls  
prevalidate_query() recursively on each subtree.

Do you have any suggestions for my problem?

As a final solution, and if the recursive prevalidate_query() is   
really the reason to my weak performance (it badly looks so), I would  
have to patch my copy and remove those checks.

Regards,
mrks



More information about the Xapian-discuss mailing list