[Xapian-discuss] incremental indexing
James Aylett
james-xapian at tartarus.org
Wed Jul 20 10:46:15 BST 2005
On Tue, Jul 19, 2005 at 06:18:12PM -0400, Arshavir Grigorian wrote:
> I just tried using the -d flag (-d ignore) and it worked just fine,
> though I am not sure what it is considering duplicates since all my
> filenames are unique across all subdirectories. Any ideas? Thanks.
The only thing I can think of is that your URLs (the unique terms
omindex uses for deduping) aren't being generated properly, although
it looks to me like they should be. Could you provide a dump of the
database after the first command? (Actually, the only thing I'm
interested in is the document data.) A second dump, taken after running
only the second command against a fresh database, would probably be
useful too.
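To make the deduping idea concrete, here is a toy model in plain Python. It is not omindex's code: it assumes each file's URL is the --url base joined with the file's path relative to the base directory, and that the unique term is "U" followed by that URL. The names make_unique_term, index_files, sub_a and sub_b are made up for illustration.

```python
# Toy model of URL-based deduplication; NOT omindex's actual code.
# Assumption: a file's URL is the --url base joined with its path relative
# to the base directory, and the unique term is "U" + URL.
import posixpath


def make_unique_term(base_url, base_dir, file_path):
    """Build the (hypothetical) unique term for one file."""
    rel = posixpath.relpath(file_path, base_dir)
    return "U" + posixpath.join(base_url, rel)


def index_files(index, base_url, base_dir, file_paths, duplicates="replace"):
    """Add files to `index` (a dict of term -> path); return skipped paths."""
    skipped = []
    for path in file_paths:
        term = make_unique_term(base_url, base_dir, path)
        if term in index and duplicates == "ignore":
            skipped.append(path)  # already indexed under this URL
            continue
        index[term] = path        # new entry, or replaces the old one
    return skipped


index = {}
# Two runs over different subdirectories, as in the quoted commands.
index_files(index, "/top", "/top_dir", ["/top_dir/sub_a/one.txt"])
index_files(index, "/top", "/top_dir", ["/top_dir/sub_b/one.txt"])
# Different subdirectories give different URLs, so both entries survive.
for term in sorted(index):
    print(term)
```

Under these assumptions, two files only collide when they end up with the same URL, which is why the document data (where the URL is recorded) is the interesting part of the dump.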
Something like the following Python script may help if you have the
Python bindings installed:
----------------------------------------------------------------------
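import sys
import xapian

db = xapian.Database(sys.argv[1])
found = 0   # documents printed so far
docid = 1   # Xapian document ids start at 1
# Document ids may have gaps, so probe each id until every document is seen.
while found < db.get_doccount():
    try:
        doc = db.get_document(docid)
        print("-------")
        print(doc.get_data())
        found += 1
    except xapian.DocNotFoundError:
        # No document with this id (e.g. it was deleted); try the next one.
        pass
    docid += 1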
----------------------------------------------------------------------
Run as:
$ python docdata.py path/to/database
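If the bindings are reasonably recent, the same dump can probably be written without probing unused ids. The sketch below assumes that Database.postlist("") (the empty-term posting list) yields one item per existing document, each with a docid attribute; iter_document_data is a made-up helper name.

```python
def iter_document_data(db):
    """Yield (docid, data) for every document in `db`."""
    # The empty term's posting list has one entry per existing document,
    # so gaps left by deleted documents are skipped automatically.
    for posting in db.postlist(""):
        yield posting.docid, db.get_document(posting.docid).get_data()
```

It would be used with db = xapian.Database(sys.argv[1]), printing each data value as the script above does.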
> >>omindex --db /var/lib/omega/data/default --url /top /top_dir sub_dir[i]
> >>omindex --db /var/lib/omega/data/default --url /top /top_dir sub_dir[j]
Cheers,
James
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org