[Xapian-tickets] [Xapian] #326: Searches with chert are slow, due to slow doclen access

Xapian nobody at xapian.org
Thu Mar 12 23:41:43 GMT 2009


#326: Searches with chert are slow, due to slow doclen access
---------------------------+------------------------------------------------
 Reporter:  richard        |        Owner:  richard  
     Type:  defect         |       Status:  assigned 
 Priority:  normal         |    Milestone:  1.1.1    
Component:  Backend-Chert  |      Version:  SVN trunk
 Severity:  normal         |   Resolution:           
 Keywords:                 |    Blockedby:           
 Platform:  All            |     Blocking:           
---------------------------+------------------------------------------------

Comment(by richard):

 I've just attached a patch which tries another approach - it makes a
 separate chunk type for the case where a continuous list of docids are in
 the list.  This should probably just be an option for the doclen lists,
 since it's rare to get continuous lists in other situations, but it was
 easier to code it generally, to start with.

 This patch increases the performance quite nicely - from the original
 12.58 seconds it goes to 6.8 seconds.  get_doclength() is now taking 75%
 of the time spent under get_mset() (down from 85% unpatched), and of this
 only 66% is spent in move_forward_in_chunk_to_at_least() (down from 90% of
 time spent in get_doclength()).  Only 5% of the time is spent in
 unpack_uint() - the majority of the time in
 move_forward_in_chunk_to_at_least() (86%) is spent in the tight loop
 iterating through the wdf values looking for the right one to decode.

 If we had a datastructure which removed the need to spend any time in this
 tight loop, but didn't affect anything else (unrealistic, obviously, but
 an upper bound on what could happen), we could therefore expect the time
 spent in get mset to decrease by a proportion of: .75 * .66 * .86 = .42.
 This would result in a total time for the search of 3.9 seconds, rather
 than 6.8, which would be about 2500 queries per second.  This is still
 half the speed of flint, so we're going to have to think of more than that
 if we want to equal flint's speed in chert.

 (Admittedly, these single-term queries are a worst-case for chert versus
 flint, since we have two lists to look in rather than 1 - for a multi-term
 query, the lookup in the extra doclen list will be a lesser proportion of
 the total work.  However, they're probably an important case, and
 certainly a good place to try and optimise as much as possible.)

-- 
Ticket URL: <http://trac.xapian.org/ticket/326#comment:10>
Xapian <http://xapian.org/>
Xapian



More information about the Xapian-tickets mailing list