[Xapian-discuss] Re: Evaluating Xapian
Olly Betts
olly at survex.com
Thu Feb 10 12:43:52 GMT 2005
On Mon, Jan 31, 2005 at 01:44:42PM +0000, Richard Boulton wrote:
> On Fri, 2005-01-28 at 20:56 +0100, Arne Georg Gleditsch wrote:
> > Well, I'm fiddling with using Xapian for a source-code indexing system
> > where I want to index several releases of the same source code base
> > (the Linux kernel, primarily).
>
> As a side point - you might want to take a look at the "cvssearch"
> application in "xapian-applications/cvssearch", which is aiming at a
> somewhat similar task. I'm not sure exactly what state it is in - Olly
> has been gradually bringing it up to scratch as a Xapian application.
It still needs work, mostly ensuring all CGI input is sanitised and
sorting out some better documentation.
> > Where the same file exists in several
> > releases in an identical revision (which is true for a lot of files,
> > especially in a stable branch), I'd like to index this [file,revision]
> > only once. So I'm tagging the indexed documents with the releases
> > they occur in, incrementally adding tags as I index new releases.
If this means you often end up calling replace_document for the same
documents, the implicit flushes are probably what's making indexing slow.
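
One way to avoid touching the same document repeatedly, assuming several
releases are indexed in a single run, is to collect the release tags in
memory first and write each [file,revision] once. A minimal sketch follows;
the function name and the sample data are hypothetical, and the actual
Xapian write step is deliberately left out.

```python
from collections import defaultdict


def collect_release_tags(releases):
    """Map each (path, revision) pair to the sorted releases containing it."""
    tags = defaultdict(set)
    for release, files in releases.items():
        for path, revision in files:
            tags[(path, revision)].add(release)
    return {key: sorted(value) for key, value in tags.items()}


# Hypothetical sample data: release name -> list of (path, revision) pairs.
releases = {
    "rel-a": [("kernel/sched.c", "1.10"), ("mm/slab.c", "1.4")],
    "rel-b": [("kernel/sched.c", "1.11"), ("mm/slab.c", "1.4")],
}

merged = collect_release_tags(releases)

# Each key in `merged` would now be written to the index exactly once,
# with all of its release tags attached, instead of being replaced once
# per release.
for (path, revision), tagged_releases in sorted(merged.items()):
    print(path, revision, tagged_releases)
```

This only helps within one indexing run. Adding a new release later still
means modifying existing documents, but each one is modified once per run.
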
> replace_document can cause an implicit flush of the database (but won't
> always). Specifically, if the document being modified was added or
> modified in the currently buffered batch, the database is flushed. This
> is because it's fiddly to handle this case, and for most usage patterns
> it's a fairly uncommon operation.
> [...]
> In the longer term, perhaps it would be worthwhile for us to try and
> remove this constraint.
We probably should. I suspect it's not actually especially hard to
handle - I added the forced flush because my original code didn't handle
this case properly and I wanted to concentrate on getting the improved
buffering working for the common cases.
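
To make the cost of the forced flush concrete, here is a toy model of the
rule described above: replacing a document that is already in the buffered
batch flushes first. This is an illustration only, not Xapian's code; the
class name, the threshold value and the counting are all assumptions made
for the example.

```python
class ToyBufferedWriter:
    """Toy model of a buffered writer with a forced-flush rule (not Xapian)."""

    def __init__(self, flush_threshold=1000):
        # Illustrative batch size; assumed for this sketch.
        self.flush_threshold = flush_threshold
        self.pending = set()   # document ids changed in the current batch
        self.flush_count = 0

    def flush(self):
        self.pending.clear()
        self.flush_count += 1

    def replace_document(self, docid):
        # Forced flush: the document was already changed in this batch.
        if docid in self.pending:
            self.flush()
        self.pending.add(docid)
        # Ordinary flush once the batch is full.
        if len(self.pending) >= self.flush_threshold:
            self.flush()


def count_flushes(docids):
    writer = ToyBufferedWriter()
    for docid in docids:
        writer.replace_document(docid)
    return writer.flush_count


# Five documents, each replaced three times in a row (once per release).
repeated = count_flushes([d for d in range(5) for _ in range(3)])
# The same five documents, each written once.
once = count_flushes(range(5))
print(repeated, once)
```

In this model the repeated pattern flushes on every second and third
replacement of a document, while the write-once pattern never flushes
before the batch fills.
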
Cheers,
Olly