[Xapian-discuss] timing behaviour for creating large queries

Kevin Duraj kevin.softdev at gmail.com
Thu Feb 8 19:23:06 GMT 2007


Markus,

I am doing similar research with Xapian Perl bindings on Linux using Boolean
OR and AND queries and my numbers are below. I will do same measurement with
same data on same machine with Lucene .NET on Windows. I would like to know
if Xapian performs better then Lucene and if yes, how much better?

 - 100 Boolean (AND / OR) queries of 500 terms against 20 million docs =
avg: 139 ms per query.
   real    0m13.915s
   user    0m9.513s
   sys     0m4.397s

 - 100 Boolean (AND / OR) queries of 1,000 terms against 20 million docs =
avg: 275 ms per query.
   real    0m27.503s
   user    0m20.302s
   sys     0m7.193s

-Kevin Duraj


On 2/7/07, Markus Wörle <mrks at mrks.de> wrote:
>
> Hello
>
> I am using Xapian with perl-bindings, and a self-written queryparser,
> which is mandatory for my application.
>
> My queryparser builds a query object by connecting two objects by an
> operator using the Search::Xapian::Query constructor, e.g.
>
> Search::Xapian::Query->new(OP_OR, $left, $right)
>
> and so on. The queries it'll get, are mostly probabilistic and could
> become very large - about 200 up to 2400 OR'ed terms.  To estimate
> the amount of time for a growing amount of terms inside a query, i
> did some benchmarks. A result can be found here:
>
> http://5nord.org/~mrks/query-performance.gif
> (this is tested on a _very_ slow machine, so do not be concerned
> about the concrete amount of time mentioned there)
>
> The yellow area is the amount of time which is needed by xapian to
> process the query. The red area is the time needed by my queryparser
> to parse the query and build up the query object. You see, its
> runtime behaves quadratic. I did some research on this (since my
> parsing algorithm has a linear complexity), and found out that the
> Search::Xapian::Query constructor, which joins the subtrees behaves
> quadratic. I think this is, because each joined three will be
> validated by libxapian, independent of if its subtrees got already
> validated or not. To get more internal: Every call of the constrator
> gets the query validated by Xapian::Query::Internal::prevalidate_query
> (), and prevalidate_query() calls validate_query() which calls
> prevalidate_query() recursively on each subtree.
>
> Do you have any suggestions for my problem?
>
> As a final solution, and if the recursive prevalidate_query() is
> really the reason to my weak performance (it badly looks so), I would
> have to patch my copy and remove those checks.
>
> Regards,
> mrks
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>


More information about the Xapian-discuss mailing list