[Xapian-tickets] [Xapian] #700: Support Enquire::matching_terms_begin() without termlist table?
Xapian
nobody at xapian.org
Wed Oct 23 04:35:39 BST 2019
#700: Support Enquire::matching_terms_begin() without termlist table?
---------------------------+--------------------------
 Reporter:  olly           |             Owner:  olly
     Type:  defect         |            Status:  new
 Priority:  normal         |         Milestone:  1.4.x
Component:  Backend-Glass  |           Version:
 Severity:  normal         |        Resolution:
 Keywords:                 |        Blocked By:
 Blocking:                 |  Operating System:  All
---------------------------+--------------------------
Description changed by olly:
Old description:
> ''(Split out of #181)''
>
> Currently `Enquire::matching_terms_begin()` uses the termlist of the
> document, comparing it terms in the query. This means it doesn't work if
> the database has no termlist. It's also another item to lookup for each
> result, and comparing the two lists of terms isn't free.
>
> It's also arguably not quite correct in some cases, for example for this
> query:
>
> {{{
> A OR (B AND NOT C)
> }}}
>
> It'll report `A` and `B` as matching terms in a document containing all
> three terms, but perhaps only `A` should be reported in such a case since
> `B AND NOT C` wouldn't say `B` matched this document.
>
> We could record the information about matching terms for each candidate
> entry in the proto-`MSet`, which would solve both of these issues. The
> tricky part is doing this in a way which doesn't incur a significant
> space or time overhead during the match. E.g. a bitmap of matching terms
> is fairly space efficient.
>
> If we don't care about the corner cases of which terms match like the one
> above, we could also skip through the posting lists a second time to get
> this information. More data to decode, but it's likely to already be in
> cache.
>
> Probably doesn't need API or ABI changes, so suitable for 1.4.x.
New description:
''(Split out of #181)''

Currently `Enquire::matching_terms_begin()` uses the termlist of the
document, comparing it with terms in the query. This means it doesn't
work if the database has no termlist. It's also another item to look up
for each result, and comparing the two lists of terms isn't free.

It's also arguably not quite correct in some cases, for example for this
query:

{{{
A OR (B AND NOT C)
}}}

It'll report `A` and `B` as matching terms in a document containing all
three terms, but perhaps only `A` should be reported in such a case, since
the `B AND NOT C` subquery doesn't match this document and so `B` played
no part in the match.
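
To make the corner case concrete, here's a minimal self-contained sketch in plain C++ (it doesn't use the Xapian API). The `Node` type and the helpers `termlist_intersection()` and `match()` are hypothetical names invented for illustration, and the sketch assumes the negated side of `AND NOT` is left out of the list of query terms, which is what gives the `A` and `B` result described above.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical miniature query tree; not Xapian's internal representation.
struct Node {
    enum Op { TERM, OR, AND_NOT } op;
    std::string term;                   // Only meaningful when op == TERM.
    std::shared_ptr<Node> left, right;  // Only used for OR and AND_NOT.
};

using NodePtr = std::shared_ptr<Node>;
using TermSet = std::set<std::string>;

NodePtr term(const std::string& t) {
    return std::make_shared<Node>(Node{Node::TERM, t, nullptr, nullptr});
}

NodePtr make_op(Node::Op o, NodePtr l, NodePtr r) {
    assert(l && r);
    return std::make_shared<Node>(Node{o, "", std::move(l), std::move(r)});
}

// Collect the query's terms. Assumption for this sketch: the right-hand
// (negated) side of AND_NOT is not included.
void query_terms(const Node& n, TermSet& out) {
    if (n.op == Node::TERM) { out.insert(n.term); return; }
    query_terms(*n.left, out);
    if (n.op != Node::AND_NOT) query_terms(*n.right, out);
}

// Termlist-style approach: every query term which the document contains.
TermSet termlist_intersection(const Node& q, const TermSet& doc) {
    TermSet all, result;
    query_terms(q, all);
    for (const auto& t : all)
        if (doc.count(t)) result.insert(t);
    return result;
}

// Query-aware approach: returns true if the subtree matches the document,
// adding to `out` only terms from subqueries which actually matched.
bool match(const Node& n, const TermSet& doc, TermSet& out) {
    switch (n.op) {
    case Node::TERM:
        if (!doc.count(n.term)) return false;
        out.insert(n.term);
        return true;
    case Node::OR: {
        bool l = match(*n.left, doc, out);
        bool r = match(*n.right, doc, out);
        return l || r;
    }
    case Node::AND_NOT: {
        TermSet lhs, rhs;
        if (!match(*n.left, doc, lhs)) return false;
        if (match(*n.right, doc, rhs)) return false;  // Negated side hit.
        out.insert(lhs.begin(), lhs.end());
        return true;
    }
    }
    return false;
}
```

For a document containing `A`, `B` and `C`, `termlist_intersection()` yields `A` and `B`, while `match()` records only `A`.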
We could record the information about matching terms for each candidate
entry in the proto-`MSet`, which would solve both of these issues. The
tricky part is doing this in a way which doesn't incur a significant space
or time overhead during the match. E.g. a bitmap of matching terms is
fairly space efficient.
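
As a rough illustration of the bitmap idea, here's a sketch in which each query term is given an index and each candidate carries one bit per term. The `Candidate` struct and the functions `mark_matched()` and `matching_terms()` are hypothetical, not existing Xapian types, and the single 64-bit word is an assumption which limits this version to queries of at most 64 terms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical candidate entry; a real proto-MSet entry would differ.
struct Candidate {
    unsigned docid;
    double weight;
    std::uint64_t matched;  // Bit i set => query term i matched this doc.
};

// Called during the match when the term with this index contributes.
void mark_matched(Candidate& c, std::size_t term_index) {
    assert(term_index < 64);  // Sketch limit: one 64-bit word per entry.
    c.matched |= std::uint64_t(1) << term_index;
}

// Turn the bitmap back into term names, in query term order.
std::vector<std::string> matching_terms(
        const Candidate& c, const std::vector<std::string>& query_terms) {
    assert(query_terms.size() <= 64);
    std::vector<std::string> result;
    for (std::size_t i = 0; i < query_terms.size(); ++i) {
        if (c.matched & (std::uint64_t(1) << i))
            result.push_back(query_terms[i]);
    }
    return result;
}
```

The per-candidate cost here is 8 bytes plus one OR operation per contributing term. A query with more terms would need a wider or variable-length bitmap, which is where the space overhead question comes in.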
If we don't care about the corner cases of which terms match like the one
above, we could also skip through the posting lists a second time to get
this information. More data to decode, but it's likely to already be in
cache.
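
The second-pass alternative might look roughly like the sketch below. Posting lists are modelled as sorted vectors of document ids, with `std::lower_bound()` standing in for a skip-to operation; `PostingList` and `second_pass()` are hypothetical names, and the sketch assumes the result docids are processed in ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

using DocId = unsigned;
// Hypothetical stand-in for a posting list: sorted docids for one term.
using PostingList = std::vector<DocId>;

// For each result docid, list the query terms whose posting list
// contains it. `results` must be sorted in ascending order.
std::map<DocId, std::vector<std::string>> second_pass(
        const std::map<std::string, PostingList>& postings,
        const std::vector<DocId>& results) {
    assert(std::is_sorted(results.begin(), results.end()));
    std::map<DocId, std::vector<std::string>> out;
    for (const auto& entry : postings) {
        const PostingList& plist = entry.second;
        auto pos = plist.begin();
        for (DocId did : results) {
            // Skip forward to the first posting >= did.
            pos = std::lower_bound(pos, plist.end(), did);
            if (pos == plist.end()) break;
            if (*pos == did) out[did].push_back(entry.first);
        }
    }
    return out;
}
```

Because this only checks whether each term indexes the document, it reports both `A` and `B` for a document containing all three terms, so it keeps the corner-case behaviour described above.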
Probably doesn't need API or ABI changes, so suitable for 1.4.x.
--
Ticket URL: <https://trac.xapian.org/ticket/700#comment:1>
Xapian <https://xapian.org/>