[Xapian-discuss] Xapian performance testing

Arjen van der Meijden acmmailing at tweakers.net
Wed Mar 19 19:31:14 GMT 2008


Well, the slowness is to be expected given the way Xapian (or rather 
omega and omindex/scriptindex) processes these kinds of terms: it splits 
such a term and indexes the elements as separate terms. When such a term 
is encountered in a query, it is rewritten as a phrase query (i.e. 
d-link, "d link" and d.link all yield the same results).
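
To illustrate the rewriting, here is a rough sketch in plain Python. It 
is not Xapian's actual parser; split_term and to_query are hypothetical 
names, and the real query parser has many more rules than "split on 
anything that isn't a letter or digit".

```python
import re

def split_term(raw):
    """Split a raw term on every run of non-alphanumeric characters."""
    return [part for part in re.split(r"[^0-9a-z]+", raw.lower()) if part]

def to_query(raw):
    """Return a toy query: a single term, or a phrase over the subterms."""
    parts = split_term(raw)
    if len(parts) == 1:
        return ("TERM", parts[0])
    # More than one subterm: the order matters, so this becomes a phrase.
    return ("PHRASE", tuple(parts))

# All three spellings end up as the same phrase query.
for raw in ("d-link", '"d link"', "d.link"):
    print(raw, "->", to_query(raw))
```

All three inputs produce the same ("PHRASE", ("d", "link")) tuple, which 
is why they yield the same results and the same cost.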

Phrase queries are (relatively) slow because they use positional 
information, and these kinds of examples are extra heavy because terms 
like 'd', 'link', 's' and 'video' are very common words. For subterms 
that aren't so frequent, or (since 1.0.4) when ANDed with some other 
terms or filters, it's less of a problem.
It would probably pay off to index any "term + single-character term c" 
as "term", "c" and "term + c" rather than only "term" and "c".
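
A toy positional index in plain Python may make the cost, and the 
possible gain from the extra joined term, more concrete. Everything here 
is an assumption for illustration only: the function names, the sample 
documents and the choice to store the joined term as "dlink" are mine, 
and none of this is how Xapian stores or matches anything internally.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Yield one list of subterms per whitespace-separated word."""
    for word in text.lower().split():
        parts = [p for p in re.split(r"[^0-9a-z]+", word) if p]
        if parts:
            yield parts

def build_index(docs, add_joined=False):
    """Build term -> {doc_id: [positions]}, optionally with joined terms."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        pos = 0
        for parts in tokenize(text):
            for part in parts:
                pos += 1
                index[part].setdefault(doc_id, []).append(pos)
            # Hypothetical extra term: "d" + "link" is also stored as "dlink"
            # when one of the two subterms is a single character.
            if add_joined and len(parts) == 2 and 1 in (len(parts[0]), len(parts[1])):
                index["".join(parts)].setdefault(doc_id, []).append(pos)
    return index

def phrase_match(index, terms):
    """Return (matching docs, number of start positions examined)."""
    # Step 1: plain AND over the terms, no positions needed yet.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    # Step 2: the expensive part, checking positions in every candidate.
    hits, checks = set(), 0
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            checks += 1
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits, checks

docs = {
    1: "d-link router with s-video link",
    2: "a link to section d of the video",
    3: "link d s video",
}

plain = build_index(docs)
hits, checks = phrase_match(plain, ["d", "link"])
print("phrase:", hits, "position checks:", checks)

joined = build_index(docs, add_joined=True)
print("joined term lookup:", set(joined["dlink"]))
```

With common subterms nearly every document is a candidate, so the 
position check runs almost everywhere even though few documents really 
contain the phrase. The joined term turns that into a single posting 
list lookup, at the price of extra terms in the index.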

AFAIK Olly already had some ideas for making this much more efficient at 
the cost of a bit more (disk) space, but I have no idea whether he has 
decided on anything and/or started implementing them.
He did already optimise the "version number" special cases, which used 
to be very slow too (1, 0 etc. are very common terms).

Best regards,

Arjen

On 19-3-2008 20:05 Kevin Duraj wrote:
> I have encountered the same issue having slow queries when term
> contain dash characters.
> 
> I do not have solution except that I replaced the '-' with 'dash'
> inside of the index.
> Example: 'd-link', 's-video' as temporary solution: ' ddashlink', ' sdashvideo'
> 
> Perhaps we could address this slowness issue of dash characters in the
> near future.
> 
> Kevin Duraj
> http://myhealthcare.com
> 
> 
> On Tue, Mar 18, 2008 at 1:26 AM, Arjen van der Meijden
> <acmmailing at tweakers.net> wrote:
>> On 5-11-2007 14:40, Olly Betts wrote:
>>  > BTW, I have implemented the hoisting of the positional information
>>  > checking part of NEAR and PHRASE, so that the "AND" inside can be
>>  > merged with other AND and FILTER operations.  This gave a big
>>  > performance boost to the slow queries (~50% saving in time just
>>  > from this one change) and a good boost to the other queries (~25%
>>  > saving from just this change).
>>  >
>>  > This optimisation and all the earlier ones are in 1.0.4, so once you
>>  > upgrade to that, it would be interesting to see what the slow query log
>>  > looks like with these new optimisations in place.
>>
>>  We finally found time to upgrade our 0.9.8 to 1.0.5 and reindex the
>>  whole database. The results so far are quite good.
>>  When taking the daily average of our forum search result page it went
>>  down from about 0.55 seconds to about 0.38 seconds.
>>  Looking at the log-files, the slow query log file (queries taking more
>>  than 2 seconds) dramatically reduced in size. Prior to the update we had
>>  3190 and 3112 lines in a week and now in the latest week it had only 863
>>  lines.
>>
>>  The slowest queries seem to be the single-term phrases with a single
>>  character attached to a common word like 'd-link', 's-video' and
>>  variants on that with only a few additional terms. As expected, I don't
>>  see any version numbers anymore in the slow query log.
>>
>>  Best regards,
>>
>>  Arjen van der Meijden
>>  Tweakers.net
>>
>>  _______________________________________________
>>  Xapian-discuss mailing list
>>  Xapian-discuss at lists.xapian.org
>>  http://lists.xapian.org/mailman/listinfo/xapian-discuss
>>
> 


