[Xapian-discuss] Spreading a database across multiple machines

Olly Betts olly at survex.com
Thu Mar 30 13:15:40 BST 2006


On Sat, Mar 25, 2006 at 06:01:59PM -0800, Philip Neustrom wrote:
> It seems like the logical thing to do would be to create a Database
> object and then add_database() for each database.  However, I'm
> looking at a situation in which there could be thousands of
> independent databases, and doing add_database() for each possible site
> seems like it could be inefficient in this case.

You'll eventually hit the per-process file descriptor limit too.
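
For the record, that approach looks roughly like this in C++ (the
paths are invented, and this is only a sketch of the add_database()
route you describe, not tested code):

    #include <xapian.h>

    #include <string>
    #include <vector>

    int main() {
        // One path per site; in your case there could be thousands.
        std::vector<std::string> site_db_paths;
        site_db_paths.push_back("/srv/search/site1/db");
        site_db_paths.push_back("/srv/search/site2/db");
        // ... and so on ...

        Xapian::Database combined;
        for (size_t i = 0; i != site_db_paths.size(); ++i) {
            // Each subdatabase keeps its table files open, which is
            // where the file descriptor limit starts to bite.
            combined.add_database(Xapian::Database(site_db_paths[i]));
        }

        // Searches then run over the union of all the subdatabases.
        Xapian::Enquire enquire(combined);
        enquire.set_query(Xapian::Query("example"));
        Xapian::MSet matches = enquire.get_mset(0, 10);
    }

It also means opening every database up front for every search, which
is a lot of work if only one site's results are actually wanted.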

> Is there a way to maintain a single database that can be queried on a
> site-specific basis and act like it's site-specific -- e.g. the
> probabilities/results are weighted according to some site-specific tag?

No.  The problem is that you can't efficiently calculate the
site-specific statistics (the per-site term frequencies and document
counts the weighting needs) from the information stored.
Precalculating them as content is added might be possible, but would
be a big change.

Are the statistics from a combined database different enough to matter?

If so, I'd suggest building a merged database for the global search,
but keeping the individual databases if you want the stats to be exact
for each subcollection.  If you're using flint, then xapian-compact
has a "--multipass" option which will cope with merging thousands of
databases.
I suspect quartzcompact won't cope, but you can always merge in several
goes by hand.
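
For example (paths invented), the merge step would look something
like:

    xapian-compact --multipass /srv/search/site-*/db /srv/search/merged

You'd then open the merged database for the global search, and still
open an individual site's database when you want results restricted
to (and weighted by the exact stats of) that one site.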

Cheers,
    Olly


