[Xapian-tickets] [Xapian] #326: Searches with chert are slow, due to slow doclen access
Xapian
nobody at xapian.org
Thu Mar 12 23:41:43 GMT 2009
#326: Searches with chert are slow, due to slow doclen access
---------------------------+------------------------------------------------
Reporter: richard | Owner: richard
Type: defect | Status: assigned
Priority: normal | Milestone: 1.1.1
Component: Backend-Chert | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blockedby:
Platform: All | Blocking:
---------------------------+------------------------------------------------
Comment(by richard):
I've just attached a patch which tries another approach - it makes a
separate chunk type for the case where a continuous list of docids are in
the list. This should probably just be an option for the doclen lists,
since it's rare to get continuous lists in other situations, but it was
easier to code it generally, to start with.
This patch increases the performance quite nicely - from the original
12.58 seconds it goes to 6.8 seconds. get_doclength() is now taking 75%
of the time spent under get_mset() (down from 85% unpatched), and of this
only 66% is spent in move_forward_in_chunk_to_at_least() (down from 90% of
time spent in get_doclength()). Only 5% of the time is spent in
unpack_uint() - the majority of the time in
move_forward_in_chunk_to_at_least() (86%) is spent in the tight loop
iterating through the wdf values looking for the right one to decode.
If we had a datastructure which removed the need to spend any time in this
tight loop, but didn't affect anything else (unrealistic, obviously, but
an upper bound on what could happen), we could therefore expect the time
spent in get mset to decrease by a proportion of: .75 * .66 * .86 = .42.
This would result in a total time for the search of 3.9 seconds, rather
than 6.8, which would be about 2500 queries per second. This is still
half the speed of flint, so we're going to have to think of more than that
if we want to equal flint's speed in chert.
(Admittedly, these single-term queries are a worst-case for chert versus
flint, since we have two lists to look in rather than 1 - for a multi-term
query, the lookup in the extra doclen list will be a lesser proportion of
the total work. However, they're probably an important case, and
certainly a good place to try and optimise as much as possible.)
--
Ticket URL: <http://trac.xapian.org/ticket/326#comment:10>
Xapian <http://xapian.org/>
Xapian
More information about the Xapian-tickets
mailing list