[Xapian-tickets] [Xapian] #763: Track unique term bounds for documents in the collection
Xapian
nobody at xapian.org
Sat Jul 28 02:06:24 BST 2018
#763: Track unique term bounds for documents in the collection
-------------------------+---------------------------
Reporter: gp1308 | Owner: gp1308
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Library API | Version:
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
-------------------------+---------------------------
Comment (by olly):
Can you show a patch of the changes you're talking about in comment:7?
Replying to [comment:8 gp1308]:
> Also Instead of using `current_wdf` of each term, `termlist_size` can be
> used to update bounds for the number of unique terms?
The correct count of unique terms ought to exclude those for which
`wdf == 0`, so to get that we'd need to actually look at `current_wdf`;
`termlist_size` will often be more than the correct value.
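To make the difference concrete, here is a minimal sketch in standalone C++ (not Xapian's actual code). It assumes a termlist is simply a list of (term, wdf) pairs; `TermEntry` and `exact_unique_terms` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one termlist entry; not a Xapian type.
struct TermEntry {
    std::string term;
    unsigned wdf;  // within-document frequency; 0 for e.g. boolean filter terms
};

// Exact count: only terms with wdf > 0, so every entry has to be inspected.
std::size_t exact_unique_terms(const std::vector<TermEntry>& termlist) {
    std::size_t count = 0;
    for (const TermEntry& e : termlist) {
        if (e.wdf > 0) ++count;
    }
    // The exact count can never exceed the number of termlist entries.
    assert(count <= termlist.size());
    return count;
}
```

For a document with one `wdf == 0` filter term and two ordinary terms, the termlist size is 3 but the exact count is 2.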
However, currently for efficiency we approximate like this:
{{{
Xapian::termcount
GlassTermList::get_unique_terms() const
{
    LOGCALL(DB, Xapian::termcount, "GlassTermList::get_unique_terms", NO_ARGS);
    // get_unique_terms() really ought to only count terms with wdf > 0, but
    // that's expensive to calculate on demand, so for now let's just ensure
    // unique_terms <= doclen.
    RETURN(min(termlist_size, doclen));
}
}}}
So the bound here needs to be based on the same thing, so that it's actually
a bound on the value that `get_unique_terms()` can return.
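As a rough illustration of that point, the sketch below tracks lower and upper bounds using the same `min(termlist_size, doclen)` approximation. It is standalone C++ rather than a patch against glass: `UniqueTermBounds`, `approx_unique_terms` and `add_document` are hypothetical names, `termcount` is a local stand-in for `Xapian::termcount`, and document deletion and replacement are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

using termcount = unsigned;  // stand-in for Xapian::termcount in this sketch

// Mirrors the approximation used by GlassTermList::get_unique_terms().
termcount approx_unique_terms(termcount termlist_size, termcount doclen) {
    return std::min(termlist_size, doclen);
}

// Hypothetical tracker for collection-wide bounds on the approximated value.
struct UniqueTermBounds {
    termcount lower = std::numeric_limits<termcount>::max();
    termcount upper = 0;

    // Update both bounds when a document is added.
    void add_document(termcount termlist_size, termcount doclen) {
        termcount u = approx_unique_terms(termlist_size, doclen);
        lower = std::min(lower, u);
        upper = std::max(upper, u);
        assert(lower <= upper);
    }
};
```

Because the bounds come from the same expression, every value the approximation yields for a tracked document lies within `[lower, upper]`. Bounds built from the exact `wdf > 0` count could end up below the values actually returned.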
At some point it's likely we'll start storing the number of unique terms
in a similar way to how we store the document length. That's probably not
going to happen for glass now though, as it would be hard to start doing
so compatibly.
--
Ticket URL: <https://trac.xapian.org/ticket/763#comment:9>
Xapian <https://xapian.org/>