[Xapian-devel] docid type redefine
Olly Betts
olly at survex.com
Fri Aug 19 12:15:39 BST 2005
On Wed, Jul 20, 2005 at 12:11:06PM +0400, Pronichev Alexander wrote:
> I need to redefine a docid type (and all dependent types) like this:
> typedef unsigned long long docid;
Just curious as to why. Are you really indexing more than 4 billion
documents?
> I think it would be enough to edit "include/xapian/types.h", but it isn't so.
> 1) I've added :
>
> string
> om_tostring(unsigned long long val)
> {
> CONVERT_TO_STRING("%llu")
> }
That should work.
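For anyone following along, here is a minimal standalone sketch of what such an overload amounts to. It assumes the conversion is an snprintf-style format into a fixed buffer; it does not reproduce the CONVERT_TO_STRING macro itself, and to_string_ull is a made-up name rather than anything in the Xapian sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for om_tostring(unsigned long long).
// Not the Xapian implementation, just the same idea written out longhand.
std::string to_string_ull(unsigned long long val)
{
    // A 64-bit value has at most 20 decimal digits, so 32 bytes is ample.
    char buf[32];
    int len = std::snprintf(buf, sizeof(buf), "%llu", val);
    // snprintf returns the number of characters written (excluding the NUL).
    assert(len > 0 && static_cast<std::size_t>(len) < sizeof(buf));
    return std::string(buf, static_cast<std::size_t>(len));
}
```

The point to watch is the format specifier: "%llu" has to match unsigned long long, otherwise values above 32 bits get mangled on platforms where unsigned long is narrower.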
> in common/utils.{h,cc}
>
> 2) In include/enquire.h (line 438) I've found the following declaration:
> ESetIterator operator[](Xapian::doccount i) const;
> but I think it should be
> ESetIterator operator[](Xapian::termcount i) const;
> shouldn't it?
Yes, that's wrong (though it's harmless when doccount and termcount are
actually the same type, as they currently always are). I've noticed (and
fixed) a few cases like this where doccount and termcount are confused
in the past, but I expect there are more.
The documentation comment is wrong too (it says "document" instead of
"term"). I've fixed both. Thanks for reporting this.
> 3)
> Then I have the following errors while compiling backends:
>
> quartz_postlist.cc: In constructor `
> QuartzPostList::QuartzPostList(Xapian::Internal::RefCntPtr<const
> Xapian::Database::Internal>, const Btree*, const Btree*, const std::string&)
> ':
> quartz_postlist.cc:673: error: cannot convert `doccount*' to `termcount*' for
> argument `3' to `docid read_start_of_first_chunk(const char**, const char*,
> termcount*, termcount*)'
>
> so I have a question: what is function static Xapian::docid
> read_start_of_first_chunk(...) for? (actually I don't understand what
> is "posting list"). Must the 3rd parameter be Xapian::doccount instead
> of Xapian::termcount? Or maybe QuartzPostList::number_of_entries class
> property must be Xapian::termcount instead of Xapian::doccount?
The 3rd parameter should be doccount. I've fixed this too.
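To illustrate why this sort of mix-up only shows up once the types differ, here is a small sketch. The namespaces, typedefs and read_entry_count below are hypothetical stand-ins, not the real declarations from include/xapian/types.h or quartz_postlist.cc.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the stock definitions: both names alias one type.
namespace stock {
    typedef unsigned termcount;
    typedef unsigned doccount;
}

// Hypothetical stand-ins after widening the docid-dependent types.
namespace widened {
    typedef unsigned termcount;
    typedef unsigned long long doccount;
}

// While the typedefs name the same type, passing a doccount* where a
// termcount* is expected compiles silently, so the confusion goes unnoticed.
static_assert(std::is_same<stock::doccount*, stock::termcount*>::value,
              "identical typedefs are interchangeable");

// Once they differ, there is no implicit conversion between the pointer
// types, which is the "cannot convert" error quoted above.
static_assert(!std::is_convertible<widened::doccount*,
                                   widened::termcount*>::value,
              "distinct integer types do not convert through pointers");

// Made-up out-parameter function with the corrected parameter type.
inline void read_entry_count(widened::doccount* number_of_entries)
{
    *number_of_entries = 5000000000ULL;  // more than 32 bits can hold
}
```

So compiling with the two typedefs deliberately set to different types is a cheap way to flush out the remaining cases.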
However, I think you'll need to change both termcount and doccount
anyway, since otherwise the collection frequency (which is currently
returned as a termcount) could overflow. The problem is that the
collection frequency counts every occurrence of a term across all
documents, so it can be as large as doccount * termcount. Its type
therefore needs to be the larger of the two (or ideally a type large
enough to hold the product of the two).
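A rough sketch of the arithmetic, assuming a 32-bit termcount and a 64-bit doccount. The type and function names here are invented for illustration and are not Xapian's.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical fixed-width stand-ins for the two count types.
typedef std::uint32_t termcount32;
typedef std::uint64_t doccount64;

// Upper bound on a term's collection frequency: it appears in `docs`
// documents, at most `max_wdf` times in each.
// (The product itself could exceed 64 bits for extreme inputs; this
// sketch does not guard against that.)
inline std::uint64_t collection_freq_bound(doccount64 docs, termcount32 max_wdf)
{
    return docs * static_cast<std::uint64_t>(max_wdf);
}

// True if the value survives being returned as a 32-bit termcount.
inline bool fits_in_termcount32(std::uint64_t value)
{
    return value <= std::numeric_limits<termcount32>::max();
}
```

With these definitions, a term that occurs just once in each of 5 billion documents already fails fits_in_termcount32, which is why widening doccount alone isn't enough.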
Cheers,
Olly