[Xapian-discuss] Re: Re: get_docid over multi-database search

Olly Betts olly at survex.com
Tue Dec 18 11:49:24 GMT 2007


On Fri, Dec 14, 2007 at 11:18:12AM -0800, Andrey wrote:
> from my own experience, breaking up into dbs will not cause a big 
> performance loss, like from 1 sec to 2 secs; it just works like querying 1 db 
> after it's cached

I would be surprised if there was a large overhead - there's a bit of
extra work from opening the databases, and a small amount from having
a "MultiPostList".  The combined size of the split databases is usually
a little larger than the size of the equivalent single database, which
may increase VM pressure a bit.
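
To give a feel for where the "MultiPostList" cost comes from, here is a toy model in plain Python. It is not Xapian's actual code. Each sub-database yields its own sorted docids, which are mapped into the combined docid space and merged. It assumes the interleaved numbering used for combined databases, where local docid d in sub-database i (0-based) of n becomes (d - 1) * n + i + 1. In this model the extra work per posting is one arithmetic mapping plus a heap step.

```python
import heapq
from typing import Iterable, Iterator, List


def to_combined_docid(local_docid: int, db_index: int, n_dbs: int) -> int:
    """Map a sub-database docid into the combined (interleaved) docid space."""
    return (local_docid - 1) * n_dbs + db_index + 1


def _mapped(postlist: Iterable[int], db_index: int, n_dbs: int) -> Iterator[int]:
    """Yield one sub-database's docids translated to combined docids."""
    for docid in postlist:
        yield to_combined_docid(docid, db_index, n_dbs)


def merged_postlist(sub_postlists: List[Iterable[int]]) -> Iterator[int]:
    """Merge sorted per-database posting lists into one sorted combined stream."""
    n = len(sub_postlists)
    # The mapping is monotonic, so each translated stream stays sorted and a
    # k-way heap merge yields the combined postings in docid order.
    return heapq.merge(*(_mapped(pl, i, n) for i, pl in enumerate(sub_postlists)))
```

For example, a term indexing docids 1 and 3 in the first database and docid 2 in the second gives `list(merged_postlist([[1, 3], [2]]))`, i.e. combined docids 1, 4 and 5.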

If you do profile and find there's a significant difference, it would
be interesting to see comparable profiles for the two cases to see where
the extra time is spent.
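
Before collecting full profiles, a quick wall-clock check can show whether there is a difference worth profiling at all. The sketch below assumes you wrap each setup in a callable taking a query string (hypothetical names such as `search_single` and `search_split`, e.g. each running an Enquire over the respective database). Taking the best of several repeats mostly measures warm-cache behaviour. Timings only say *whether* one setup is slower; finding *where* the time goes still needs a real profiler, since most of the work happens inside the C++ library.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def time_queries(search: Callable[[str], object], queries: Sequence[str],
                 repeats: int = 3) -> List[float]:
    """Return the best-of-`repeats` wall-clock time (seconds) for each query."""
    best = []
    for q in queries:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            search(q)
            samples.append(time.perf_counter() - start)
        best.append(min(samples))
    return best


def compare(setups: Dict[str, Callable[[str], object]],
            queries: Sequence[str]) -> Dict[str, float]:
    """Median per-query time for each named setup, over the same queries."""
    return {name: statistics.median(time_queries(fn, queries))
            for name, fn in setups.items()}
```

Usage would look like `compare({"single": search_single, "split": search_split}, queries)`, with the same query list for both so the numbers are comparable.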

> maybe you can try to duplicate another copy of your db and search over them 
> together; it's very easy with just 1 extra line 
> db.add_database(xapian.Database("db"))

You'd also need to generate the equivalent combined database (e.g. by
using xapian-compact with the same input twice).
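
A minimal sketch of setting up both sides of that comparison from Python. It assumes the Xapian Python bindings and the `xapian-compact` tool are installed, and that `xapian-compact` accepts one or more source databases followed by a destination. The paths and the helper names (`compact_command`, `build_combined`, `open_doubled`) are hypothetical. Note that `add_database()` modifies the database object in place, so its return value should not be assigned back to `db`.

```python
import subprocess
from typing import List


def compact_command(src: str, dest: str, copies: int = 2) -> List[str]:
    """Build an xapian-compact command that merges `src` with itself."""
    # Source databases come first, the destination database last.
    return ["xapian-compact"] + [src] * copies + [dest]


def build_combined(src: str, dest: str) -> None:
    """Create the single database equivalent to searching `src` twice."""
    subprocess.run(compact_command(src, dest), check=True)


def open_doubled(src: str):
    """Open `src` twice as one logical database (the 'split' setup)."""
    import xapian  # assumes the Xapian Python bindings are installed
    db = xapian.Database(src)
    # add_database() changes db in place and returns None.
    db.add_database(xapian.Database(src))
    return db
```

After `build_combined("db", "db-doubled")`, the same queries can be run against `open_doubled("db")` and `xapian.Database("db-doubled")`.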

But just duplicating the data isn't an accurate recreation of searching
a real database split in two.  I don't know if it actually would
make a difference, but it might.

Cheers,
    Olly


