[Xapian-devel] docid type redefine

Olly Betts olly at survex.com
Fri Aug 19 12:15:39 BST 2005


On Wed, Jul 20, 2005 at 12:11:06PM +0400, Pronichev Alexander wrote:
> I need to redefine a docid type (and all dependent types) like this:
> typedef unsigned long long docid;

Just curious as to why.  Are you really indexing more than 4 billion
documents?

> I think it would be enough to edit "include/xapian/types.h", but it isn't so.
> 1) I've added :
> 
> string
> om_tostring(unsigned long long val)
> {
>     CONVERT_TO_STRING("%llu")
> }

That should work.
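
For anyone following along, here is a minimal standalone sketch of what
such an overload boils down to.  The CONVERT_TO_STRING macro isn't shown
in this thread, so the function name ull_to_string and the buffer size are
assumptions for illustration, not the code in common/utils.cc:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the om_tostring overload quoted above.  The
// real CONVERT_TO_STRING macro in common/utils.cc is not shown in this
// thread, so this only mirrors the idea.
std::string ull_to_string(unsigned long long val)
{
    // 2^64 - 1 has 20 decimal digits, so 32 bytes leaves room for the
    // terminating NUL with space to spare.
    char buf[32];
    int len = std::snprintf(buf, sizeof(buf), "%llu", val);
    // With a 32 byte buffer the conversion cannot fail or truncate.
    assert(len > 0 && static_cast<std::size_t>(len) < sizeof(buf));
    return std::string(buf, static_cast<std::size_t>(len));
}
```

The point to check is that the format is "%llu" rather than "%lu" or
"%u", since a mismatched format would be undefined behaviour for values
passed as unsigned long long.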

> in common/utils.{h,cc}
> 
> 2) In include/enquire.h (line 438) I've found the following declaration:
> ESetIterator operator[](Xapian::doccount i) const;
> but I think it would be 
> ESetIterator operator[](Xapian::termcount i) const;
> isn't it?

Yes, that's wrong (though it is harmless when doccount and termcount are
actually the same type, as they always currently are).  I've noticed (and
fixed) a few cases like this in the past where doccount and termcount were
confused, but I expect there are more.

The documentation comment is wrong too (it says "document" instead of "term").

I've fixed both.  Thanks for reporting this.

> 3)
> Then I have the following errors while compiling backends:
> 
> quartz_postlist.cc: In constructor `
>    QuartzPostList::QuartzPostList(Xapian::Internal::RefCntPtr<const 
>    Xapian::Database::Internal>, const Btree*, const Btree*, const std::string&)
>    ':
> quartz_postlist.cc:673: error: cannot convert `doccount*' to `termcount*' for 
>    argument `3' to `docid read_start_of_first_chunk(const char**, const char*, 
>    termcount*, termcount*)'
> 
> so I have a question: what is function static Xapian::docid
> read_start_of_first_chunk(...) for? (actually I don't understand what
> is "posting list"). Must the 3rd parameter be Xapian::doccount instead
> of Xapian::termcount? Or maybe QuartzPostList::number_of_entries class
> property must be Xapian::termcount instead of Xapian::doccount?

The 3rd parameter should be doccount.  I've fixed this too.
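
To illustrate why this only shows up once the types differ, here is a
small compile-time sketch.  The namespaces and typedefs below are
hypothetical stand-ins, not the real declarations in
include/xapian/types.h:

```cpp
#include <type_traits>

// Hypothetical stand-ins: both counts share one underlying type, as the
// reply says is currently always the case.
namespace same_width {
    typedef unsigned doccount;
    typedef unsigned termcount;
}

// Hypothetical stand-ins: doccount widened, termcount left alone.
namespace widened {
    typedef unsigned long long doccount;
    typedef unsigned termcount;
}

// When both typedefs name the same type, a doccount* is a termcount*, so
// passing one where the other is expected compiles silently.
static_assert(std::is_convertible<same_width::doccount*,
                                  same_width::termcount*>::value,
              "identical typedefs hide the mix-up");

// Once the types differ there is no implicit pointer conversion, which is
// the kind of error reported for quartz_postlist.cc above.
static_assert(!std::is_convertible<widened::doccount*,
                                   widened::termcount*>::value,
              "distinct typedefs expose the mix-up");
```

Note that a mix-up in a by-value parameter (like the operator[] case in
point 2) still compiles when the types differ, since integer types convert
implicitly, so the compiler won't find all of these for you.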

However, I think you'll need to change both termcount and doccount
anyway since otherwise the collection frequency (which is returned as
a termcount) could overflow.  The problem is that the collection
frequency is effectively doccount * termcount so needs to be the larger
of the two types (or ideally a type large enough to hold the product of
the two) - currently it's returned as termcount.
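
As a worked example of the overflow, here is a sketch assuming 32 bit
counts.  The typedef and function names are made up for illustration and
are not Xapian code:

```cpp
#include <cstdint>

// Hypothetical 32 bit stand-ins for Xapian::doccount and Xapian::termcount.
typedef std::uint32_t doccount32;
typedef std::uint32_t termcount32;

// Upper bound on a term's collection frequency (documents times
// occurrences per document), truncated to 32 bits to mimic what a total
// kept in a 32 bit type would hold.
constexpr termcount32 collfreq_bound_narrow(doccount32 docs, termcount32 per_doc)
{
    return static_cast<termcount32>(static_cast<std::uint64_t>(docs) * per_doc);
}

// The same bound held in a type wide enough for the product of the two.
constexpr std::uint64_t collfreq_bound_wide(doccount32 docs, termcount32 per_doc)
{
    return static_cast<std::uint64_t>(docs) * per_doc;
}

// 3,000,000,000 documents each containing a term twice: the true total is
// 6,000,000,000, but a 32 bit total wraps modulo 2^32 to 1,705,032,704.
static_assert(collfreq_bound_wide(3000000000u, 2u) == 6000000000ULL,
              "64 bits holds the product");
static_assert(collfreq_bound_narrow(3000000000u, 2u) == 1705032704u,
              "32 bits wraps around");
```

So even with a document count that still fits in 32 bits, a common enough
term can push the collection frequency past what a 32 bit termcount can
represent.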

Cheers,
    Olly



