[Xapian-tickets] [Xapian] #763: Track unique term bounds for documents in the collection

Xapian nobody at xapian.org
Sat Jul 28 02:06:24 BST 2018


#763: Track unique term bounds for documents in the collection
-------------------------+---------------------------
 Reporter:  gp1308       |             Owner:  gp1308
     Type:  enhancement  |            Status:  new
 Priority:  normal       |         Milestone:
Component:  Library API  |           Version:
 Severity:  normal       |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+---------------------------

Comment (by olly):

 Can you show a patch of the changes you're talking about in comment:7?

 Replying to [comment:8 gp1308]:
 > Also Instead of using `current_wdf` of each term, `termlist_size` can be
 > used to update bounds for the number of unique terms?

 The correct count of unique terms ought to exclude those for which `wdf ==
 0`, so to get that we'd need to actually look at `current_wdf` -
 `termlist_size` will often be more than the correct value.
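
 To make the difference concrete, here is a minimal sketch. The `TermList`
 alias and both helper functions are hypothetical stand-ins for illustration,
 not Xapian's actual classes: a term list is modelled as plain (term, wdf)
 pairs.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a document's term list: (term, wdf) pairs.
// This is NOT Xapian's GlassTermList, just an illustration.
using TermList = std::vector<std::pair<std::string, unsigned>>;

// Number of entries in the term list, including terms with wdf == 0.
unsigned termlist_size(const TermList& terms) {
    return static_cast<unsigned>(terms.size());
}

// Exact count of unique terms: only entries with wdf > 0 are counted,
// which requires looking at the wdf of every entry.
unsigned exact_unique_terms(const TermList& terms) {
    unsigned count = 0;
    for (const auto& entry : terms) {
        if (entry.second > 0) ++count;
    }
    return count;
}
```

 For a term list holding ("apple", 2), ("Kfruit", 0) and ("pear", 1),
 `termlist_size()` gives 3 while `exact_unique_terms()` gives 2, because the
 entry with `wdf == 0` is skipped.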

 However, currently for efficiency we approximate like this:

 {{{
 Xapian::termcount
 GlassTermList::get_unique_terms() const
 {
     LOGCALL(DB, Xapian::termcount, "GlassTermList::get_unique_terms", NO_ARGS);
     // get_unique_terms() really ought to only count terms with wdf > 0, but
     // that's expensive to calculate on demand, so for now let's just ensure
     // unique_terms <= doclen.
     RETURN(min(termlist_size, doclen));
 }
 }}}

 So the bound here needs to be based on the same thing, so that it's actually
 a bound on the value that `get_unique_terms()` can return.
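
 As a rough sketch of what that could look like (the `DocStats` and
 `UniqueTermBounds` names are hypothetical and not taken from the patch under
 discussion), the bounds would be updated from `min(termlist_size, doclen)`
 for each document, rather than from an exact count:

```cpp
#include <algorithm>
#include <limits>

// Hypothetical per-document statistics; not a real Xapian type.
struct DocStats {
    unsigned termlist_size;  // entries in the term list (may include wdf == 0)
    unsigned doclen;         // document length (sum of wdf)
};

// Same approximation as get_unique_terms(): min(termlist_size, doclen).
unsigned approx_unique_terms(const DocStats& d) {
    return std::min(d.termlist_size, d.doclen);
}

// Collection-wide bounds on the value approx_unique_terms() can return.
// Assumption: every document is fed through add_document(); handling of
// empty documents and of deletions is deliberately left out of this sketch.
struct UniqueTermBounds {
    unsigned lower = std::numeric_limits<unsigned>::max();
    unsigned upper = 0;

    void add_document(const DocStats& d) {
        const unsigned u = approx_unique_terms(d);
        lower = std::min(lower, u);
        upper = std::max(upper, u);
    }
};
```

 Because the bounds and the per-document value come from the same function,
 the value reported for any document added this way cannot fall outside
 `[lower, upper]`.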

 At some point it's likely we'll start storing the number of unique terms
 in a similar way to how we store the document length.  That's probably not
 going to happen for glass now though, as it would be hard to start doing
 so compatibly.

--
Ticket URL: <https://trac.xapian.org/ticket/763#comment:9>
Xapian <https://xapian.org/>