[Xapian-discuss] Empty records & unique key...

Olly Betts olly at survex.com
Tue May 10 22:35:52 BST 2005


On Tue, May 10, 2005 at 01:31:26PM -0700, arjan holscher wrote:
> Most of them are pretty easy, although the internal
> field needs a little explanation. I use a section ID
> for each section. I multiply this by 1 million and
> add the actual database row id to it. This way my
> article always has the same unique ID.

It might be cleaner to make the internal field <section id>:<row id> so
you aren't assuming fewer than a million rows per section, but there's
nothing actually wrong with your current scheme.
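To make the trade-off concrete, here is a small sketch contrasting the two ID schemes. The function names and the range check are hypothetical, for illustration only; neither comes from Omega or from Arjan's code.

```python
# Hypothetical helpers contrasting the two ID schemes discussed above.
SECTION_MULTIPLIER = 1_000_000


def numeric_id(section_id: int, row_id: int) -> int:
    """section * 1,000,000 + row id; unique only while row_id < 1,000,000."""
    if not 0 <= row_id < SECTION_MULTIPLIER:
        raise ValueError(f"row id {row_id} would spill into the next section")
    return section_id * SECTION_MULTIPLIER + row_id


def string_id(section_id: int, row_id: int) -> str:
    """'<section id>:<row id>'; makes no assumption about row counts."""
    return f"{section_id}:{row_id}"


# Without the range check, these two articles would share an ID:
assert 1 * SECTION_MULTIPLIER + 1_000_000 == 2 * SECTION_MULTIPLIER + 0

print(numeric_id(3, 42))        # 3000042
print(string_id(3, 42))         # 3:42
print(string_id(1, 1_000_000))  # 1:1000000 (distinct from "2:0")
```

The numeric form stays correct as long as no section ever reaches a million rows; the string form simply removes that assumption.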

> - Some of the records in the omega database simply do
> not return data. They do not contain document data
> although I'm absolutely sure I do not deliver empty
> documents to scriptindex. Under what conditions can
> scriptindex 'discard' a document?

I can't really see how it can.

If a term is too long, the implicit flush can fail, but that will exit
scriptindex with an error.  The "index" command doesn't allow this to
happen, but "boolean" doesn't check (mostly because it's not clear what
it should do if a term is too big - dropping the term doesn't really
seem correct and dropping the document doesn't seem ideal either).

But that would mean the document failed to be added, not that it would
be added with no document data.

Hmm, if the input has newlines in fields, are you escaping them as
scriptindex expects?
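A minimal sketch of writing records out safely, assuming the input convention described in the Omega scriptindex documentation: one `name=value` per line, continuation lines of a multi-line value prefixed with `=`, and a blank line ending the record. The helper `format_record` is hypothetical. The point is that an unescaped blank line inside a field would be read as the end of the record, which is one way a record could end up split or truncated.

```python
def format_record(fields):
    """Serialise one record (a list of (name, value) pairs) as scriptindex input.

    Assumed format: name=value per line, continuation lines of a multi-line
    value start with '=', and a blank line ends the record.
    """
    lines = []
    for name, value in fields:
        # Normalise Windows line endings before splitting on newlines.
        first, *rest = value.replace("\r\n", "\n").split("\n")
        lines.append(f"{name}={first}")
        # Every continuation line gets a leading '=', so even an empty line
        # inside the value is written as "=" rather than as a blank line.
        lines.extend(f"={line}" for line in rest)
    return "\n".join(lines) + "\n\n"


record = [("internal", "3000042"), ("body", "line one\n\nline three")]
print(format_record(record), end="")
```

Here the empty line inside `body` is emitted as a lone `=`, so the only blank line in the output is the one terminating the record.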

> - The second issue is indexing. The first time I index
> all the documents I get 14222 added documents. This is
> the correct number since it's a new database.

And there are 14222 documents in the input?

Might be worth checking (with delve from xapian-examples) how many
documents are in the database now.
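For the input side of that comparison, a rough sketch that counts records in a scriptindex-style dump, assuming (as above) that records are separated by empty lines. The function name is hypothetical.

```python
def count_records(text: str) -> int:
    """Count records in scriptindex-style input (records end at a blank line)."""
    count = 0
    in_record = False
    for line in text.splitlines():
        if not line:
            in_record = False      # a blank line closes the current record
        elif not in_record:
            in_record = True       # first non-blank line opens a new record
            count += 1
    return count


sample = "internal=1\ntitle=a\n\ninternal=2\n=more text\n\n\ninternal=3\n"
print(count_records(sample))  # 3
```

Running it over the real dump (e.g. `count_records(open(path, encoding="utf-8").read())`) and comparing the result with the document count delve reports, or with `xapian.Database(path).get_doccount()` if the Python bindings are installed, would show whether any records go missing between input and database.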

> When I want to re-index the database I just throw all
> the documents at the database again, and I expected
> to get 14222 updated documents (assuming no documents
> are added during the indexing period). However
> scriptindex reports 2/3 of the total documents as
> added and 1/3 of the documents as updated.

The same checks apply here.

> However I want ALL the documents to be updated ... not
> added. I thought adding the unique field would solve
> it. However this is not the case.

As far as I can see, what you have should work...
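To illustrate what should happen, here is an in-memory simulation of the behaviour expected from the `unique` action: a record whose unique term already exists replaces that document, otherwise a new one is added. This is not Xapian code; the dict stands in for the database, and the `Q` prefix and field name `internal` are assumptions. The last run shows one hypothetical way the counts could drift: if the key text differs between runs (here a trailing space), it no longer matches and is counted as an add.

```python
from collections import Counter


def index_run(db, records, prefix="Q"):
    """Simulate one scriptindex run that uses the 'internal' field as unique key.

    db maps unique term (prefix + key) -> document.  If the term already
    exists the document is replaced ("updated"), otherwise it is "added".
    """
    stats = Counter()
    for record in records:
        term = prefix + record["internal"]
        stats["updated" if term in db else "added"] += 1
        db[term] = record
    return stats


records = [{"internal": "1000001"}, {"internal": "1000002"}, {"internal": "2000001"}]
db = {}
print(dict(index_run(db, records)))  # {'added': 3}
print(dict(index_run(db, records)))  # {'updated': 3}

# If a key differs between runs (here a trailing space), it no longer matches:
changed = [{"internal": "1000001 "}] + records[1:]
print(dict(index_run(db, changed)))  # {'added': 1, 'updated': 2}
```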

Cheers,
    Olly
