[Xapian-discuss] add_database for WritableDatabases

Olly Betts olly at survex.com
Wed Nov 28 15:10:43 GMT 2007


On Mon, Nov 26, 2007 at 09:43:08PM +0100, Thomas Viehmann wrote:
> Does a WritableDatabase with several DBs added have defined/stable
> semantics as to where documents are stored upon replace_document?

WritableDatabase only supports a single sub-database at present.
Nothing actually prevents you calling add_database() to add more, but
the results are entirely undefined.

It would be nice to make this work in a sensible way though.  Then
you could split indexing load with a WritableDatabase which round-robins
updates across several remote servers.

> I would want to be able to re-index parts of the archive and to replace
> messages.
> Or is it preferable to have some sort of external partition of what is
> in which DB?

Currently that is what you need to do.  If you commonly want to search
over subsets of the data, this approach is probably better anyway.
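
A minimal sketch of such an external partition, in Python.  This is only
an illustration, not part of Xapian: the naming scheme and both function
names are hypothetical, and it assumes each partition name would be the
path of a separate database opened as its own WritableDatabase.  Because
the partition is derived from the message's own metadata, a replaced or
re-indexed message always lands in the same database.

```python
def partition_for(list_name, year):
    """Return the (hypothetical) database name holding one list-year."""
    return f"{list_name}-{year}"


def partitions_for_search(list_name, years):
    """Return the database names to search for a subset of the archive."""
    return [partition_for(list_name, year) for year in years]


# Re-indexing one year of one list touches exactly one database.
target = partition_for("xapian-discuss", 2007)

# Searching a subset means opening only the matching databases.
subset = partitions_for_search("xapian-discuss", [2006, 2007])
```

Re-indexing part of the archive then amounts to rebuilding the affected
databases, and a subset search only opens the databases it needs.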

> Also, is there a way (short of patching but configuration rather than
> passing parameters) to make omega search through multiple databases?

Currently it's only supported by passing multiple DB parameters, or a
single DB parameter with a list of database names separated by "/".
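
For illustration, the single-parameter form could be built like this in
Python.  The base URL and database names are made up, and the sketch
assumes the query text goes in Omega's P parameter; only the DB parameter
and the "/" separator come from the description above.

```python
from urllib.parse import urlencode


def omega_url(base, databases, query):
    """Build an Omega URL that searches several databases at once."""
    params = [
        ("DB", "/".join(databases)),  # database names joined with "/"
        ("P", query),                 # assumed parameter for the query text
    ]
    # Keep "/" unescaped so the separator appears literally in the URL.
    return base + "?" + urlencode(params, safe="/")


url = omega_url(
    "http://localhost/cgi-bin/omega",   # hypothetical Omega location
    ["lists2006", "lists2007"],         # hypothetical database names
    "writable database",
)
```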

> Finally, is there a simple, good way of searching a database with
> documents stemmed in different languages? The two naive ideas I could
> come up with are to split the index into databases by language, or to
> search with something like
>   OR_{lang in languages} (query_stemmed_for_lang AND LANG=lang)...

This sort of multi-language search is a problem I've seen come up a
number of times over the years I've been involved in search, and I've
yet to see a totally satisfactory solution.

You can determine the language of a document pretty reliably (e.g. look
at the textcat library), but a query string is often too short to make
a reliable determination.  Some queries are ambiguous as they make sense
in multiple languages.

If you can, I think it's best to sidestep these problems and set up your
UI so that the user actually specifies (explicitly or implicitly) what
language their query is in.  Then search a database of documents in just
that language (since you can identify these reliably enough).
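
A rough Python sketch of that setup: the language chosen in the UI
selects both the database to search and the stemmer to apply to the
query.  The database names and the function name are hypothetical; the
stemmer names are meant to be ones a Xapian stemmer would accept, which
should be checked against the version in use.

```python
# Hypothetical mapping: UI language code -> (database name, stemmer name).
DATABASES = {
    "en": ("archive-en", "english"),
    "de": ("archive-de", "german"),
    "fr": ("archive-fr", "french"),
}


def search_target(lang):
    """Return the database and stemmer for the language the user chose."""
    try:
        return DATABASES[lang]
    except KeyError:
        raise ValueError(f"no database for language {lang!r}") from None


database, stemmer = search_target("de")
```

Documents are routed to these databases at index time using the detected
document language, so no guess about the query's language is needed.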

Cheers,
    Olly


