xapian 1.4 performance issue
Jean-Francois Dockes
jf at dockes.org
Fri Dec 15 16:25:20 GMT 2017
Olly Betts writes:
> On Fri, Dec 08, 2017 at 11:08:00AM +0100, Jean-Francois Dockes wrote:
> > This is the only really short-term solution: any other is weeks or months
> > away. Is the "stub database" feature the appropriate way to create Chert
> > databases with Xapian 1.4?
>
> With 1.4 you can pass Xapian::DB_BACKEND_CHERT in the flags when
> constructing the WritableDatabase object.
>
> I noticed recently that this doesn't quite work as advertised in the
> case when the database already exists but is not of the specified type.
> It's meant to just open the database in that case (and ignore the
> backend hint), but it actually seems to create a new database with the
> specified backend in the same directory. I'll fix that, but obviously
> that won't help with existing releases. You can try with
> Xapian::DB_OPEN first, then Xapian::DB_BACKEND_CHERT if that fails,
> though that's slightly racy. Not sure there's a better workaround
> though.
I hadn't noticed that Xapian now had these db creation flags, so I am using
a stub file for creating a new index and it seems to work fine.
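
For illustration, a stub file of this kind could be generated as sketched below. This assumes the one-line stub format of a backend name followed by a space and the database path; the helper name write_chert_stub and the paths in the usage note are hypothetical.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: write a one-line stub file that points at a Chert
// database directory. Assumed format: "<backend> <path>".
void write_chert_stub(const std::string& stub_path, const std::string& db_dir) {
    // The stub format is line-based, so reject paths that would break it.
    if (db_dir.empty() || db_dir.find('\n') != std::string::npos)
        throw std::invalid_argument("db_dir must be a non-empty single line");

    std::ofstream out(stub_path, std::ios::trunc);
    if (!out)
        throw std::runtime_error("cannot open stub file: " + stub_path);

    out << "chert " << db_dir << '\n';
    if (!out.flush())
        throw std::runtime_error("cannot write stub file: " + stub_path);
}
```

After write_chert_stub("index.stub", "/path/to/xapiandb"), the stub path would be passed to Xapian::WritableDatabase in place of the database directory.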
> > Another possibility for me would be to decide that Chert is good enough and
> > appropriate for Recoll, and bundle it together with the appropriate Xapian
> > parts.
>
> That wouldn't be popular with distros packaging recoll - they'll want to
> use their existing Xapian packages instead of a bundled code copy, e.g.
> see:
>
> https://wiki.debian.org/UpstreamGuide#No_inclusion_of_third_party_code
> https://fedoraproject.org/wiki/Bundled_Libraries
> https://wiki.gentoo.org/wiki/Why_not_bundle_dependencies#When_code_is_bundled.3F
>
> It also means you wouldn't benefit from improvements in new Xapian
> releases, and would end up having to maintain the old version you picked
> yourself.
Thanks for the links about bundling!
Yes, distribution policies and maintenance are definitely problems for this
approach, which remains a possibility if it proves too hard or too onerous
(in terms of index size or query times) to do things otherwise.
Recoll already has a Fedora exemption for bundling code from an old imap
server package, and I think that I can make a reasonable case for another
exemption. I'd strip down Xapian source to keep only the backend and
associated code (no need for the query parser, Unicode etc.), and probably
link statically. The static link part will be a return to what I did when
Xapian itself was not universally packaged :) At worst, not being in the
distributions is not the end of the world.
I am also not too worried about maintenance: the old index format has
worked largely flawlessly for quite some time, so it will be mainly a
question of fixing compiler compatibility issues from time to time. The
fact that I can consider doing this is a tribute to Xapian code quality, by
the way.
Also, it might put me in a position to do something about my old wish for
Xapian query interruptibility...
Still, it is a last-resort option, no doubt. The priority is exploring the
impact of storing the document texts.
Cheers,
jf