[Xapian-discuss] incremental indexing

Wed Jul 20 10:46:15 BST 2005

On Tue, Jul 19, 2005 at 06:18:12PM -0400, Arshavir Grigorian wrote:

> I just tried using the -d flag (-d ignore) and it worked just fine,
> though I am not sure what it is considering duplicates since all my 
> filenames are unique across all subdirectories. Any ideas? Thanks.

The only thing I can think of is that your URLs (the unique terms
omindex uses for deduping) aren't being generated properly, although
it looks to me like they should be. Could you provide a dump of the
database after the first command? The only part I'm really interested
in is the document data. A second dump, taken after running only the
second command against a fresh database, would probably be useful
too.

Something like the following Python script may help if you have the
bindings installed:

----------------------------------------------------------------------
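import sys
import xapian

# Open the database read-only; its path is the first command-line argument.
db = xapian.Database(sys.argv[1])

# Document IDs start at 1 and may have gaps (deleted documents), so walk
# every ID up to the highest one ever used and skip the missing ones.
for docid in range(1, db.get_lastdocid() + 1):
    try:
        doc = db.get_document(docid)
    except xapian.DocNotFoundError:
        continue
    print("-------")
    # The Python 3 bindings return the document data as bytes.
    print(doc.get_data().decode("utf-8", "replace"))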
----------------------------------------------------------------------

Run as:

$ python docdata.py path/to/database
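
Once you have a dump, a quick way to check the URLs is to count how
often each one appears. The following is only a rough sketch: it
assumes each document's data contains a line starting with "url=",
which is how I'd expect omindex to store it, and the function name
find_duplicate_urls is made up for illustration.

```python
from collections import Counter


def find_duplicate_urls(dump_text):
    """Return {url: count} for every url= value seen more than once.

    Assumption: each record in the dump has a line of the form
    "url=<value>". Lines that don't start with "url=" are ignored.
    """
    urls = Counter()
    for line in dump_text.splitlines():
        if line.startswith("url="):
            urls[line[len("url="):]] += 1
    # Keep only the URLs that occur in more than one document.
    return {url: n for url, n in urls.items() if n > 1}
```

For example, redirect the dump to a file (say dump.txt, again just an
illustrative name), read it in, and pass the text to
find_duplicate_urls(). An empty result would suggest the URLs really
are unique and the problem lies elsewhere.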

> >>omindex --db /var/lib/omega/data/default --url /top /top_dir sub_dir[i]
> >>omindex --db /var/lib/omega/data/default --url /top /top_dir sub_dir[j]

Cheers,
James

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org