[Xapian-tickets] [Xapian] #394: Speed up phrase queries with a "settling pond"
Xapian
nobody at xapian.org
Sat May 28 22:17:19 BST 2022
#394: Speed up phrase queries with a "settling pond"
-------------------------+-------------------------------
Reporter: Olly Betts | Owner: Olly Betts
Type: enhancement | Status: assigned
Priority: normal | Milestone: 1.5.0
Component: Matcher | Version: git master
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
-------------------------+-------------------------------
Comment (by Olly Betts):
I found a somewhat newer version of the patch from 2013 on my dev box so
attaching in case it's useful. I think this is probably the version of
the patch Gustavo tested.
It would be good to put this ticket to bed, either by concluding that the
idea is no longer useful in the light of the numerous improvements to
phrase search performance we've made in the meantime, or by merging an
updated implementation of the idea.
Looking over the 2013 patch, some things come to mind.
* It's a shame that it only helps when the positional check is "and-like
with the root" - e.g. `"some phrase" OR "other words"` isn't helped at
all. I can see that a subquery could read ahead a bit and have its own
pond, but it gets more complicated with things like
`count_matching_subqs()` needing to work for the "current" entry and I'm
not sure how feasible it would actually be. Probably best to retry the
idea at the top level and if it's helps in that case think harder about
whether it's doable for a subquery.
* It could probably still use `read_position_list()` instead of creating
new `PositionList` objects every time, with a little rejigging so that the
docid to open for is specifiable. That would reduce memory allocation
overhead and mean that we might not have to deal in the term names.
* The data structure for the pond could be improved. Comments in the
patch suggest a min/max heap. Essentially we need to be able to
efficiently add items, find and remove the entry with the highest weight,
and remove all entries below a specified weight; the data structure has a
fixed max number of entries which is known in advance
--
Ticket URL: <https://trac.xapian.org/ticket/394#comment:18>
Xapian <https://xapian.org/>
Xapian
More information about the Xapian-tickets
mailing list