[Xapian-discuss] Multiple databases vs Single large database
Felix Antonius Wilhelm Ostmann
ostmann at websuche.de
Mon Nov 24 05:23:51 GMT 2008
One question about this:
If C is a symlink to B, then "mv C.tmp C" moves C.tmp into directory B
instead of replacing C :-/
Perhaps I misunderstood :-)
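
To make the concern concrete, here is a small sketch in Python. It is only
an illustration under assumptions: shutil.move stands in for a plain "mv",
the names B, C and C.tmp follow the quoted message below, and everything
happens in a throwaway temporary directory on a POSIX system.

```python
import os
import shutil
import tempfile


def demo_mv_onto_symlink():
    """Return (file_landed_in_B, C_is_still_a_symlink) after an mv-style move."""
    with tempfile.TemporaryDirectory() as root:
        b = os.path.join(root, "B")
        c = os.path.join(root, "C")
        tmp = os.path.join(root, "C.tmp")

        os.mkdir(b)
        os.symlink(b, c)  # C -> B, as in the ping-pong scheme

        # The new file we would like to install as C.
        with open(tmp, "w") as f:
            f.write("placeholder\n")

        # Like a plain "mv", shutil.move sees that C resolves to a
        # directory and moves the source inside it.
        shutil.move(tmp, c)

        return os.path.exists(os.path.join(b, "C.tmp")), os.path.islink(c)
```

With C being a symlink to a directory, the file ends up as B/C.tmp and C
itself is left untouched, which is the behaviour described above.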
Olly Betts wrote:
> On Fri, Nov 21, 2008 at 05:51:30AM -0500, Jim wrote:
>
>> Consider
>> 1. Are the searches fast enough (of multiple DBs)?
>>
>
> There's not been much profiling of searching multiple databases, and I
> don't have figures for how performance compares. The split databases
> will generally be bigger in total size than a merged database would be
> so you'll need a bit more disk space, but if you need to rebuild while
> searching the old database, you won't need as much scratch space if you
> can rebuild one database at a time.
>
> There's clearly an overhead for opening each one - we try to minimise
> this, so it's not much for each one, but if we're talking hundreds or
> thousands of databases, it might start to add up. In some applications
> you can keep them open between searches, but that's not always viable.
>
> If you find slow cases, profiling them often reveals a bottleneck that
> can be addressed fairly easily. There are some tips on profiling on
> the wiki.
>
>
>> 2. How often are multiple DBs searched?
>>
>
> Also, the term statistics will be different for a search over a single
> user's database and a search over a combined database filtered to show
> a single user's data. If users are very different (e.g. different
> languages) that might lead to worse results from a merged database.
> If they're broadly similar, the averaging of statistics might actually
> lead to better results from a merged database.
>
>
>> 2. Consider ping ponging two Xapian DBs when updating. I use the
>> following logic.
>> I have two directories with Xapian DBs, A and B.
>>   If A is older than B
>>     copy contents of B into A
>>   else
>>     copy contents of A into B
>>   add new entries to the copy
>>   if the copy is A
>>     rm C
>>     ln -s A C
>>   if the copy is B
>>     rm C
>>     ln -s B C
>>
>> where C is the database that I am using to search.
>>
>
> This leaves a time interval where there's no valid database at C though,
> which is problematic if search processes could be trying to open the
> database while you're switching the new database live.
>
> A better approach is to use a stub database file for C. You can write a
> new file as "C.tmp" and then atomically switch with "mv C.tmp C" (at
> least on POSIX platforms).
>
> Cheers,
> Olly
>
>
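
For reference, the stub-file switch suggested in the quoted message could
be sketched roughly as follows. This is a minimal sketch, not a definitive
implementation: the helper names switch_stub and read_stub are hypothetical,
the example paths are made up, and the single-line "auto <path>" stub
content is an assumption about the stub file format that should be checked
against the Xapian documentation for the version in use. The point is only
that C is a regular file here, so renaming C.tmp over it replaces it in one
step on POSIX.

```python
import os
import tempfile


def switch_stub(stub_path, db_path):
    """Point the stub file at db_path by writing a temp file and renaming it."""
    tmp_path = stub_path + ".tmp"
    with open(tmp_path, "w") as f:
        # Assumed stub content: a single "auto <path>" line.
        f.write("auto %s\n" % db_path)
        f.flush()
        os.fsync(f.fileno())  # make sure the new content is on disk first
    # os.replace maps to rename(2) on POSIX: readers see either the old
    # stub or the new one, never a missing file.
    os.replace(tmp_path, stub_path)


def read_stub(stub_path):
    """Return the current content of the stub file."""
    with open(stub_path) as f:
        return f.read()
```

Usage would be something like switch_stub("/srv/search/C", "/srv/search/A")
after A has been updated, and the same call with B on the next round.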
--
Kind regards
Felix Antonius Wilhelm Ostmann
--------------------------------------------------
Websuche Search Technology GmbH & Co. KG
Martinistraße 3 - D-49080 Osnabrück - Germany
Tel.: +49 541 40666-0 - Fax: +49 541 40666-22
Email: info at websuche.de - Website: www.websuche.de
--------------------------------------------------
Osnabrück District Court - HRA 200252 - VAT ID: DE814737310
General partner: Websuche Search Technology
Verwaltungs GmbH - Osnabrück District Court - HRB 200359
Managing director: Diplom Kaufmann Martin Steinkamp
--------------------------------------------------