[Xapian-tickets] [Xapian] #700: Support Enquire::matching_terms_begin() without termlist table?

Xapian nobody at xapian.org
Wed Oct 23 04:35:39 BST 2019


#700: Support Enquire::matching_terms_begin() without termlist table?
---------------------------+--------------------------
 Reporter:  olly           |             Owner:  olly
     Type:  defect         |            Status:  new
 Priority:  normal         |         Milestone:  1.4.x
Component:  Backend-Glass  |           Version:
 Severity:  normal         |        Resolution:
 Keywords:                 |        Blocked By:
 Blocking:                 |  Operating System:  All
---------------------------+--------------------------
Description changed by olly:

Old description:

> ''(Split out of #181)''
>
> Currently `Enquire::matching_terms_begin()` uses the termlist of the
> document, comparing it terms in the query.  This means it doesn't work if
> the database has no termlist.  It's also another item to lookup for each
> result, and comparing the two lists of terms isn't free.
>
> It's also arguably not quite correct in some cases, for example for this
> query:
>
> {{{
> A OR (B AND NOT C)
> }}}
>
> It'll report `A` and `B` as matching terms in a document containing all
> three terms, but perhaps only `A` should be reported in such a case since
> `B AND NOT C` wouldn't say `B` matched this document.
>
> We could record the information about matching terms for each candidate
> entry in the proto-`MSet`, which would solve both of these issues.  The
> tricky part is doing this in a way which doesn't incur a significant
> space or time overhead during the match.  E.g. a bitmap of matching terms
> is fairly space efficient.
>
> If we don't care about the corner cases of which terms match like the one
> above, we could also skip through the posting lists a second time to get
> this information.  More data to decode, but it's likely to already be in
> cache.
>
> Probably doesn't need API or ABI changes, so suitable for 1.4.x.

New description:

 ''(Split out of #181)''

 Currently `Enquire::matching_terms_begin()` uses the termlist of the
 document, comparing it with terms in the query.  This means it doesn't
 work if the database has no termlist.  It's also another item to look up
 for each result, and comparing the two lists of terms isn't free.
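
 As a rough illustration of the approach just described (not the actual
 Xapian code), the comparison amounts to intersecting two sorted term
 lists.  `matching_terms_by_termlist` is a hypothetical name, and plain
 `std::vector`s stand in for the document's termlist and the query's terms:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for the termlist-based approach: the matching
// terms are taken to be those query terms which also appear in the
// document's termlist.  Both inputs must be sorted.
std::vector<std::string>
matching_terms_by_termlist(const std::vector<std::string>& doc_termlist,
                           const std::vector<std::string>& query_terms)
{
    assert(std::is_sorted(doc_termlist.begin(), doc_termlist.end()));
    assert(std::is_sorted(query_terms.begin(), query_terms.end()));

    std::vector<std::string> result;
    // One linear merge-style pass over both lists.  This is the per-result
    // cost that "isn't free", and it needs the termlist to exist at all.
    std::set_intersection(doc_termlist.begin(), doc_termlist.end(),
                          query_terms.begin(), query_terms.end(),
                          std::back_inserter(result));
    return result;
}
```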

 It's also arguably not quite correct in some cases, for example for this
 query:

 {{{
 A OR (B AND NOT C)
 }}}

 It'll report `A` and `B` as matching terms in a document containing all
 three terms, but perhaps only `A` should be reported in such a case since
 `B AND NOT C` wouldn't say `B` matched this document.
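
 To make the corner case concrete, here is a minimal sketch of what
 "only report terms which actually contributed to the match" could mean.
 The `Node` type and the helper functions are hypothetical and only model
 the boolean structure of a query; they aren't Xapian classes:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy query tree (hypothetical; models only the boolean structure).
struct Node {
    enum Op { TERM, OR, AND, AND_NOT } op;
    std::string term;        // used when op == TERM
    std::vector<Node> kids;  // used for the other operators
};

Node make_term(const std::string& t) { return Node{Node::TERM, t, {}}; }
Node make_branch(Node::Op op, const Node& l, const Node& r) {
    return Node{op, "", {l, r}};
}

// Returns true if q matches doc.  Terms are added to out only for
// subqueries which matched, so nothing is added when false is returned.
bool eval(const Node& q, const std::set<std::string>& doc,
          std::set<std::string>& out)
{
    switch (q.op) {
    case Node::TERM:
        if (!doc.count(q.term)) return false;
        out.insert(q.term);
        return true;
    case Node::OR: {
        bool matched = false;
        for (const Node& k : q.kids)
            if (eval(k, doc, out)) matched = true;  // visit every branch
        return matched;
    }
    case Node::AND: {
        std::set<std::string> tmp;
        for (const Node& k : q.kids)
            if (!eval(k, doc, tmp)) return false;
        out.insert(tmp.begin(), tmp.end());
        return true;
    }
    case Node::AND_NOT: {
        assert(q.kids.size() == 2);
        std::set<std::string> tmp, ignored;
        if (!eval(q.kids[0], doc, tmp)) return false;
        if (eval(q.kids[1], doc, ignored)) return false;  // excluded
        out.insert(tmp.begin(), tmp.end());
        return true;
    }
    }
    return false;
}

std::set<std::string> matching_terms(const Node& q,
                                     const std::set<std::string>& doc)
{
    std::set<std::string> out;
    eval(q, doc, out);
    return out;
}

// The query from the ticket: A OR (B AND NOT C)
Node example_query() {
    return make_branch(Node::OR, make_term("A"),
                       make_branch(Node::AND_NOT, make_term("B"),
                                   make_term("C")));
}
```

 Under this model a document containing `A`, `B` and `C` reports only `A`,
 while a document containing just `A` and `B` reports both.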

 We could record the information about matching terms for each candidate
 entry in the proto-`MSet`, which would solve both of these issues.  The
 tricky part is doing this in a way which doesn't incur a significant space
 or time overhead during the match.  E.g. a bitmap of matching terms is
 fairly space efficient.
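
 A minimal sketch of the bitmap idea, assuming each query term is given a
 bit position and that a query has at most 64 distinct terms (that limit,
 and the names `note_match` and `decode_matches`, are assumptions made for
 the example, not anything decided in this ticket):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One 64-bit word per candidate: bit i is set if query term i matched.
using TermBitmap = std::uint64_t;

// Record that query term number term_index matched this candidate.
TermBitmap note_match(TermBitmap bitmap, unsigned term_index)
{
    assert(term_index < 64);  // assumed limit for this sketch
    return bitmap | (TermBitmap(1) << term_index);
}

// Turn a bitmap back into the list of matching terms, in query order.
std::vector<std::string>
decode_matches(TermBitmap bitmap, const std::vector<std::string>& query_terms)
{
    assert(query_terms.size() <= 64);
    std::vector<std::string> result;
    for (std::size_t i = 0; i < query_terms.size(); ++i) {
        if ((bitmap >> i) & 1) result.push_back(query_terms[i]);
    }
    return result;
}
```

 With this layout the extra storage is 8 bytes per candidate, and setting
 a bit during the match is a single OR.  Queries with more terms would
 need a wider or variable-length bitmap.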

 If we don't care about the corner cases of which terms match like the one
 above, we could also skip through the posting lists a second time to get
 this information.  More data to decode, but it's likely to already be in
 cache.
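
 The second-pass idea could look roughly like the sketch below.  Posting
 lists are modelled as sorted vectors of document ids, and
 `std::lower_bound` stands in for a `skip_to()`-style operation; the
 function name and the types are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

using DocId = unsigned;
using PostingList = std::vector<DocId>;  // sorted ascending

// For each result document, work out which terms index it by skipping
// through each term's posting list once.
std::map<DocId, std::vector<std::string>>
second_pass(const std::map<std::string, PostingList>& postings,
            std::vector<DocId> results)
{
    std::sort(results.begin(), results.end());

    std::map<DocId, std::vector<std::string>> matched;
    for (DocId did : results) matched[did];  // an entry for every result

    for (const auto& entry : postings) {
        const PostingList& pl = entry.second;
        assert(std::is_sorted(pl.begin(), pl.end()));
        auto pos = pl.begin();
        for (DocId did : results) {
            // Skip forwards to the first posting >= did.
            pos = std::lower_bound(pos, pl.end(), did);
            if (pos == pl.end()) break;  // this list is exhausted
            if (*pos == did) matched[did].push_back(entry.first);
        }
    }
    return matched;
}
```

 Because this only checks whether each term indexes the document, it has
 the same corner case as the termlist comparison: for the query above it
 would still report `B` for a document which contains `C`.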

 Probably doesn't need API or ABI changes, so suitable for 1.4.x.

--
Ticket URL: <https://trac.xapian.org/ticket/700#comment:1>
Xapian <https://xapian.org/>