[Xapian-discuss] Empty records & unique key...

Wed May 11 08:04:59 BST 2005

--- Olly Betts <olly at survex.com> wrote:
> On Tue, May 10, 2005 at 01:31:26PM -0700, arjan holscher wrote:
> > Most of them are pretty easy. Although the internal
> > field needs a little explanation. I use a section ID
> > for each section. I multiply this by 1 million and
> > add the actual database row id to this. This way my
> > article always has the same unique ID.
>
> It might be cleaner to make internal <section id>:<row id> so you aren't
> assuming less than a million rows, but there's nothing actually wrong
> with your current scheme.
> 

Actually I tried this first. However, it seems that
this is not the case after all.
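
For reference, the two ID schemes discussed above could be sketched as
follows. This is only an illustration: the function names and the
ROWS_PER_SECTION constant are hypothetical, and the range check is an
addition that makes the "less than a million rows" assumption explicit.

```python
# Hypothetical constant: the thread assumes fewer than a million rows per section.
ROWS_PER_SECTION = 1_000_000


def numeric_id(section_id: int, row_id: int) -> int:
    """Current scheme: section ID times one million plus the row ID."""
    if not 0 <= row_id < ROWS_PER_SECTION:
        # Without this check, section 1 / row 1000005 and
        # section 2 / row 5 would both map to 2000005.
        raise ValueError("row_id does not fit below ROWS_PER_SECTION")
    return section_id * ROWS_PER_SECTION + row_id


def composite_id(section_id: int, row_id: int) -> str:
    """Suggested scheme: '<section id>:<row id>', with no limit on row IDs."""
    return f"{section_id}:{row_id}"
```

Both functions return the same value every time for the same section and
row, which is what the unique field needs.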

> > - Some of the records in the omega database simply do
> > not return data. They do not contain document data
> > although I'm absolutely sure I do not deliver empty
> > documents to scriptindex. Under what conditions is it
> > possible for scriptindex to 'discard' a document?
> 
> I can't really see how it can.
> 
> If a term is too long, the implicit flush can fail, but that will exit
> scriptindex with an error.  The "index" command doesn't allow this to
> happen, but "boolean" doesn't check (mostly because it's not clear what
> it should do if a term is too big - dropping the term doesn't really
> seem correct and dropping the document doesn't seem ideal either).
>
> But that would mean the document failed to be added, not that it would
> be added with no document data.
>
> Hmm, if the input has newlines in fields, are you escaping them as
> scriptindex expects?
> 

I escape newlines as expected, since the documents
already in the database do contain texts with spaces.
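
To be concrete about the escaping: as far as I understand the scriptindex
input format, each field is a name=value line, a newline inside a value is
followed by an extra '=', and a blank line ends the record. A minimal
sketch of that, where format_record is a hypothetical helper and not part
of Omega:

```python
def format_record(fields: dict) -> str:
    """Render one record as name=value lines, escaping embedded newlines.

    Assumes the scriptindex convention that a continuation line starts
    with '=' and that a blank line terminates the record.
    """
    lines = []
    for name, value in fields.items():
        value = value.replace("\r\n", "\n")  # normalise Windows line endings
        lines.append(f"{name}=" + value.replace("\n", "\n="))
    return "\n".join(lines) + "\n\n"
```

Note that an empty line inside a value comes out as a line containing only
'=', so it is never mistaken for the blank line between records.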

Could it have anything to do with the fact that I pipe
the whole buffer to scriptindex at once? I believe my
buffer is several MB in size. Would it help if I split
the buffer into pieces? Or is there no chance that
this will solve my problem?
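
If splitting turns out to be worth trying, the buffer would have to be cut
on record boundaries only. A rough sketch, assuming records are separated
by blank lines; chunk_records and the batch size are hypothetical, and
each chunk would then be piped to its own scriptindex run:

```python
def chunk_records(dump: str, records_per_chunk: int = 1000):
    """Yield pieces of the dump, each holding whole records only."""
    records, current = [], []
    for line in dump.split("\n"):
        if line == "":
            # A blank line closes the record collected so far.
            if current:
                records.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        records.append("\n".join(current))

    for start in range(0, len(records), records_per_chunk):
        batch = records[start:start + records_per_chunk]
        # Re-terminate every record with a blank line.
        yield "\n\n".join(batch) + "\n\n"
```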

> > - The second issue is indexing. The first time I index
> > all the documents I get 14222 added documents. This is
> > the correct number since it's a new database.
>
> And there are 14222 documents in the input?
>
> Might be worth checking (with delve from xapian-examples) how many
> documents are in the database now.
> 

Delve isn't installed on the server omega is running
on. However I'll try to install it ;)
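
In the meantime the input side can be checked without delve. A small
sketch that counts the records in the dump and looks for missing or
repeated values of the unique field; it assumes blank-line separated
records and that the field is called "internal", and summarize_input is
a hypothetical helper:

```python
from collections import Counter


def summarize_input(dump: str, unique_field: str = "internal"):
    """Return (record count, records without the field, repeated IDs)."""
    total = 0
    missing = 0
    seen = Counter()
    in_record = False
    has_id = False
    prefix = unique_field + "="
    # The extra "" makes sure the last record is closed.
    for line in dump.split("\n") + [""]:
        if line == "":
            if in_record:
                total += 1
                if not has_id:
                    missing += 1
            in_record = False
            has_id = False
            continue
        in_record = True
        if line.startswith(prefix):
            seen[line[len(prefix):]] += 1
            has_id = True
    duplicates = {value: n for value, n in seen.items() if n > 1}
    return total, missing, duplicates
```

The record count can then be compared with the 14222 expected and with
the added/updated numbers that scriptindex reports.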

> > When I want to re-index the database I just throw all
> > the documents again at the database and I'd expected
> > to get 14222 updated documents (assuming no documents
> > are added during the index periods). However
> > scriptindex returns 2/3 of the total documents as
> > added and 1/3 of the documents as updated.
>
> And again here.
>
> > However I want ALL the documents to be updated ... not
> > added. I thought adding the unique field would solve
> > it. However this is not the case.
>
> As far as I can see, what you have should work...
> 
So far it doesn't work as expected, and I hope that
somebody here is able to work out a solution.

Thx in advance,

Arjan Holscher
