[Xapian-discuss] Separating results from multiple databases

Olly Betts olly at survex.com
Mon Jun 13 17:55:48 BST 2005


On Sun, Jun 12, 2005 at 10:55:27PM -0400, Marco Tabini wrote:
> I'm playing around with Xapian and I'm wondering whether it's possible to
> retrieve the estimated number of documents returned by each database that is
> part of a query.

No.  Currently statistics for each term are merged, then the estimates
calculated.

This is likely to change though.  I'm planning to change to storing the
first and last document id which each term indexes and use the query's
structure to apply intersections, unions, etc to these ranges.  This
should improve the estimate statistics, but it is probably best done per
database, and then summed.

It would be pretty easy to make per-database statistics available then.
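
To make the range idea concrete, here is a rough Python sketch.  It only
illustrates the approach described above and is not Xapian code:
`TermStats`, `op_and`, `op_or` and the exact way the bounds are combined
are assumptions made up for this example.  Each term carries the first and
last document id it indexes plus its term frequency; AND intersects the
ranges, OR takes their union, and the bound is worked out per database and
then summed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermStats:
    first: int     # lowest docid the term (or subquery) can match
    last: int      # highest docid the term (or subquery) can match
    max_docs: int  # upper bound on matches (the term frequency for a leaf)


# Hypothetical marker for "cannot match anything in this database".
EMPTY = TermStats(first=1, last=0, max_docs=0)


def op_and(a, b):
    """Both sides must match, so only the overlap of the ranges counts."""
    first, last = max(a.first, b.first), min(a.last, b.last)
    if first > last:
        return EMPTY
    return TermStats(first, last, min(a.max_docs, b.max_docs, last - first + 1))


def op_or(a, b):
    """Either side may match, so the range spans both inputs."""
    if a.max_docs == 0:
        return b
    if b.max_docs == 0:
        return a
    first, last = min(a.first, b.first), max(a.last, b.last)
    return TermStats(first, last, min(a.max_docs + b.max_docs, last - first + 1))


# Made-up statistics for two databases.
db1 = {"apple": TermStats(1, 50, 20), "pear": TermStats(60, 90, 10)}
db2 = {"apple": TermStats(1, 100, 40), "pear": TermStats(95, 120, 15)}

# Bound for "apple AND pear", calculated per database and then summed.
per_db = [op_and(db["apple"], db["pear"]).max_docs for db in (db1, db2)]
total = sum(per_db)
```

With these invented numbers the ranges do not overlap at all in the first
database, so it contributes nothing, and the per-database figures fall out
of the calculation for free.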

> 1. Is this possible without running the query again against either db?

No, although this probably won't be very expensive to do as most of the
database blocks you'll need will be cached from the first query.
Generally it's the I/O which takes the time (unless the database is
small, in which case it's quick anyway!)
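
For now, re-running the query against each database on its own could look
something like the sketch below, using the Python bindings.  The function
name `per_database_estimates` and the `sample_size` parameter are invented
for this example; `query` is assumed to be a `xapian.Query` you have
already built.

```python
def per_database_estimates(db_paths, query, sample_size=10):
    """Return {path: estimated number of matches}, one search per database."""
    import xapian  # imported here; needs the Xapian Python bindings installed

    estimates = {}
    for path in db_paths:
        db = xapian.Database(path)       # open this database on its own
        enquire = xapian.Enquire(db)
        enquire.set_query(query)
        # Ask for the first few matches; the MSet also carries the estimate.
        mset = enquire.get_mset(0, sample_size)
        estimates[path] = mset.get_matches_estimated()
    return estimates
```

The per-database figures are estimates in their own right, so they need
not add up exactly to the estimate from the combined search.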

> 2. As a side question, is there a significant performance hit in combining
> multiple databases as opposed to using a single db?

Shouldn't be much.  The main difference is that each separate database
will usually be smaller, so searching just one of them needs less I/O.

Cheers,
    Olly


