[Xapian-devel] a confusion within chert_postlist.cc

Olly Betts olly at survex.com
Sat Feb 15 08:54:21 GMT 2014


On Sat, Feb 15, 2014 at 10:01:05AM +0800, Hurricane Tong wrote:
> I'm trying to get familiar with Xapian by working on
> http://trac.xapian.org/ticket/326, but I'm confused by a function.
> 
> ChertPostlistTable::get_doclength 
> ( in chert_postlist.cc , line 63 )

You really want to look at the brass backend, since that's where you'd
be implementing this (the format of chert is fixed at this point).
But the corresponding code in brass_postlist.cc is pretty much the same.

> According to its comment, this function is designed to return the
> length of the doc with a given did, but why does it return
> doclen_pl->get_wdf()?

The chert and brass backends currently store the document lengths in a
postlist with an empty termname (which isn't a valid normal term).  The
wdf values in this postlist hold the document lengths rather than
actual wdfs, so reading the wdf at a given did gives you that
document's length.
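
As a rough illustration of that layout, here is a toy model in C++.  It
is only a sketch: ToyPostListTable, ToyPostList and add_document() are
made-up names, std::map stands in for the on-disk B-tree, and I'm
assuming the document length is just the sum of the wdfs.  None of this
is the actual brass code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-ins for illustration only; these are NOT Xapian's classes.
using docid = std::uint32_t;
using termcount = std::uint32_t;

// One posting list: docid -> the value kept in the "wdf" slot.
using ToyPostList = std::map<docid, termcount>;

struct ToyPostListTable {
    // Keyed by term name.  The empty name is reserved for doclengths.
    std::map<std::string, ToyPostList> postlists;

    // Index one document from its term -> wdf mapping.
    void add_document(docid did,
                      const std::map<std::string, termcount>& terms) {
        termcount doclen = 0;
        for (const auto& entry : terms) {
            assert(!entry.first.empty());  // "" is not a valid normal term
            postlists[entry.first][did] = entry.second;
            doclen += entry.second;  // assumption: doclen = sum of wdfs
        }
        // The "wdf" slot of the empty-term postlist holds the doclength.
        postlists[""][did] = doclen;
    }

    // Fetch the doclength by reading the "wdf" of the empty-term postlist.
    // Throws std::out_of_range if did was never indexed.
    termcount get_doclength(docid did) const {
        const ToyPostList& doclen_pl = postlists.at("");
        return doclen_pl.at(did);
    }
};
```

In this model, adding a document with terms "cat" (wdf 2) and "dog"
(wdf 3) leaves the entry for that did in the empty-term postlist at 5,
and that is the value get_doclength() hands back.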

> doclen_pl is an autoptr pointing to a postlist of a term, its current
> doc is the doc with desired did, so I think this function should
> return doclen_pl->get_doclength() .

That would just go round in circles, since doclen_pl->get_doclength()
would call this_db->get_doclength(did), which would call
postlist_table.get_doclength(did, ptrtothis), which is the
BrassPostListTable::get_doclength() function you're looking at.  One
of these three methods needs to actually get the doclength from
somewhere and return it.
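
To make the call chain concrete, here is a stripped-down sketch of the
three methods.  The Toy* classes are hypothetical stand-ins, not the
real brass classes, and the signatures are simplified (for instance the
table method here takes only the did).  The point is just that the
table's method is the one which stops delegating and reads a stored
value.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy stand-ins for illustration only; these are NOT Xapian's classes.
using docid = std::uint32_t;
using termcount = std::uint32_t;

struct ToyDatabase;

struct ToyPostList {
    const ToyDatabase* this_db;        // database this list belongs to
    std::map<docid, termcount> wdfs;   // docid -> value in the "wdf" slot
    docid current;                     // docid the list is positioned on

    termcount get_wdf() const { return wdfs.at(current); }

    // Step 1: the postlist holds no doclength itself; it asks the database.
    termcount get_doclength() const;
};

struct ToyPostListTable {
    ToyPostList* doclen_pl;  // the empty-term postlist; must be set before use

    // Step 3: the chain ends here by reading a stored value.  Calling
    // doclen_pl->get_doclength() instead would recurse back to step 1.
    termcount get_doclength(docid did) const {
        doclen_pl->current = did;
        return doclen_pl->get_wdf();
    }
};

struct ToyDatabase {
    ToyPostListTable postlist_table;

    // Step 2: the database forwards the request to the postlist table.
    termcount get_doclength(docid did) const {
        return postlist_table.get_doclength(did);
    }
};

inline termcount ToyPostList::get_doclength() const {
    return this_db->get_doclength(current);
}
```

With a doclength list holding {1: 5, 2: 7} and a term's postlist
positioned on did 2, calling get_doclength() on the term's postlist
goes postlist -> database -> table and comes back with 7, while
get_wdf() on the same postlist still returns the term's real wdf.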

As an aside, we can probably get rid of PostList::get_doclength() on
trunk.  We used to store the doclength in the posting lists in flint
(the backend before chert), but that means you end up storing the
doclength once for every term in the document, which is rather space
inefficient, so we no longer do.  I think I did have a quick look at
removing it a while ago and there was some obstacle.

Cheers,
    Olly


