[Xapian-devel] docid type redefine

Pronichev Alexander dyker at agava.com
Fri Aug 19 15:15:35 BST 2005


Sorry for my English; I'm not a native speaker. ;-)

On Fri, 19 Aug 2005 12:15:39 +0100
Olly Betts <olly at survex.com> wrote:

> On Wed, Jul 20, 2005 at 12:11:06PM +0400, Pronichev Alexander wrote:
> > I need to redefine a docid type (and all dependent types) like this:
> > typedef unsigned long long docid;
> 
> Just curious as to why.  Are you really indexing more than 4 billion
> documents?
I am indexing objects from a relational SQL database (by object id).
The point is that I have several daemons that hand out object ids from
different ranges (6 bytes in total) at the same time, and I can ask any
of them at random. That's why I can end up with objects that have such
large ids. But there are certainly far fewer than 4 billion documents in
one Xapian database at any one time.
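
To make the range problem concrete, here is a minimal sketch of what
happens when a 6-byte id is squeezed into a 32-bit docid. The names
(kMaxObjectId, truncate_to_32, fits_in_32) are made up for illustration
and are not part of Xapian:

```cpp
#include <cstdint>

// Hypothetical: the largest id a 6-byte (48-bit) object id can take.
constexpr std::uint64_t kMaxObjectId = (std::uint64_t{1} << 48) - 1;

// Storing a wide id in a 32-bit unsigned type keeps only the low 32 bits.
constexpr std::uint32_t truncate_to_32(std::uint64_t id) {
    return static_cast<std::uint32_t>(id);
}

// True if the id can be stored in a 32-bit docid without losing bits.
constexpr bool fits_in_32(std::uint64_t id) {
    return id <= UINT32_MAX;
}

// A 48-bit id does not fit, and two distinct ids can collide after truncation.
static_assert(!fits_in_32(kMaxObjectId), "48-bit ids exceed 32 bits");
static_assert(truncate_to_32((std::uint64_t{1} << 32) + 5) == 5,
              "id 2^32 + 5 collides with id 5");
```

So even with far fewer than 4 billion documents, the ids themselves need
a type wider than 32 bits.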

> 
> > I think it would be enough to edit "include/xapian/types.h", but it isn't so.
> > 1) I've added :
> > 
> > string
> > om_tostring(unsigned long long val)
> > {
> >     CONVERT_TO_STRING("%llu")
> > }
> 
> That should work.
> 
> > in common/utils.{h,cc}
> > 
> > 2) In include/enquire.h (line 438) I've found the following declaration:
> > ESetIterator operator[](Xapian::doccount i) const;
> > but I think it would be 
> > ESetIterator operator[](Xapian::termcount i) const;
> > isn't it?
> 
> Yes, that's wrong (though is harmless when doccount and termcount are
> actually the same type as they always currently are).  I've noticed (and
> fixed) a few cases like this where doccount and termcount are confused
> in the past, but I expect there are more.
> 
> The documentation comment is wrong too (says "document" instead of "term"),
> 
> I've fixed both.  Thanks for reporting this.
> 
> > 3)
> > Then I have the following errors while compiling backends:
> > 
> > quartz_postlist.cc: In constructor `
> >    QuartzPostList::QuartzPostList(Xapian::Internal::RefCntPtr<const 
> >    Xapian::Database::Internal>, const Btree*, const Btree*, const std::string&)
> >    ':
> > quartz_postlist.cc:673: error: cannot convert `doccount*' to `termcount*' for 
> >    argument `3' to `docid read_start_of_first_chunk(const char**, const char*, 
> >    termcount*, termcount*)'
> > 
> > so I have a question: what is function static Xapian::docid
> > read_start_of_first_chunk(...) for? (actually I don't understand what
> > is "posting list"). Must the 3rd parameter be Xapian::doccount instead
> > of Xapian::termcount? Or maybe QuartzPostList::number_of_entries class
> > property must be Xapian::termcount instead of Xapian::doccount?
> 
> The 3rd parameter should be doccount.  I've fixed this too.
> 
> However, I think you'll need to change both termcount and doccount
> anyway since otherwise the collection frequency (which is returned as
> a termcount) could overflow.  The problem is that the collection
> frequency is effectively doccount * termcount so needs to be the larger
> of the two types (or ideally a type large enough to hold the product of
> the two) - currently it's returned as termcount.
Yes, that is how I finally patched it, but it still doesn't work.
Documents are added to the database successfully (at least, the
doccount() method returns the correct value), but when I search for
these documents the mset comes back with size 0. When I index documents
whose ids fit in an unsigned int, it works correctly. I'm using the
Perl API.
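
For reference, the collection frequency overflow described above can be
sketched like this. The function names and the counts are invented for
illustration, and this is not Xapian's actual code; it only shows why
the product of a document count and a per-document term count needs a
wider type than either factor:

```cpp
#include <cstdint>

// Product computed in 64 bits, then stored in a 32-bit result: the value
// wraps modulo 2^32, as it would if termcount were a 32-bit type.
constexpr std::uint32_t collection_freq_32(std::uint32_t docs,
                                           std::uint32_t occurrences_per_doc) {
    return static_cast<std::uint32_t>(std::uint64_t{docs} * occurrences_per_doc);
}

// The same product kept in a 64-bit result, which cannot overflow for
// 32-bit inputs.
constexpr std::uint64_t collection_freq_64(std::uint32_t docs,
                                           std::uint32_t occurrences_per_doc) {
    return std::uint64_t{docs} * occurrences_per_doc;
}

// 100 million documents, each containing the term 50 times.
static_assert(collection_freq_64(100000000u, 50u) == 5000000000ULL,
              "true total is 5 billion");
static_assert(collection_freq_32(100000000u, 50u) == 705032704u,
              "32-bit result wraps to 5e9 - 2^32");
```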

> 
> Cheers,
>     Olly


-- 
WBR dyker
Agava Software



