[Xapian-discuss] Re: Xapian and research in IR: a few suggestions from experience

Sun Sep 16 20:02:58 BST 2007

On Thu, Sep 13, 2007 at 03:08:03PM +0200, Emmanuel Eckard wrote:
> I understand that fitting Xapian with a generic interface for research goes 
> somewhat against its optimisations for retrieval speed.

I think it only starts to get difficult if you want to try to optimise
the generic interface a lot.  For research work, speed often isn't a key
issue (it isn't for TREC-style evaluation, but it might be if you're
building a prototype to see how users interact).

> I don't know in which measure it could be possible to offer these
> features as supplementary packages or as configure options.

The problem with compile-time options is that adding lots of them
rapidly increases the number of combinations, and it becomes infeasible
to regularly test all combinations.  Plus people building binary
packages have to choose a set of options, and if the one you need isn't
in that set then you can't use the packages.

I've committed a reworked version of Richard's user metadata idea (see
http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=143 for the
original patches) which essentially allows arbitrary tag data (in the
form of a std::string) to be associated with short key strings.  This
data is committed at the same time as other database changes, so it's
easy to ensure it stays in step with the rest of the Xapian database.

By using a suitable scheme for generating key names, this could be used
to store extra data associated with termnames, docids, etc as you
originally suggested - you just need to serialise it to a string.  I'd
be interested to hear how well this works if anybody tries it.
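
As a rough sketch of what such a key scheme might look like (the "T:"
and "D:" prefixes, the helper names, and the text serialisation are all
just illustrative assumptions, not anything Xapian defines), here a
std::map stands in for the metadata store so the example runs without
Xapian.  With a real database, the same key and value strings would be
handed to the metadata setter and getter instead (set_metadata() and
get_metadata() in releases which include this feature, as far as I'm
aware).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical key scheme: a short prefix says what kind of object the
// data belongs to, followed by that object's identifier.
std::string term_key(const std::string& term) {
    return "T:" + term;
}

std::string doc_key(std::uint32_t docid) {
    return "D:" + std::to_string(docid);
}

// Serialise a list of doubles as space-separated text.  17 significant
// digits is enough for an IEEE double to survive the round trip.
std::string serialise(const std::vector<double>& values) {
    std::ostringstream out;
    out.precision(17);
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) out << ' ';
        out << values[i];
    }
    return out.str();
}

// Inverse of serialise(): parse space-separated doubles.
std::vector<double> unserialise(const std::string& data) {
    std::istringstream in(data);
    std::vector<double> values;
    double v;
    while (in >> v) values.push_back(v);
    return values;
}

// Stand-in for the database's key -> string metadata store.
using MetadataStore = std::map<std::string, std::string>;
```

Storing per-term data is then something like
store[term_key("retrieval")] = serialise({0.5, 2.25}); and reading it
back is unserialise(store[term_key("retrieval")]).  A text encoding
keeps the sketch easy to inspect; a binary encoding would be more
compact if the amount of data is large.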

Cheers,
    Olly