[Xapian-discuss] Multiple databases vs Single large database

Felix Antonius Wilhelm Ostmann ostmann at websuche.de
Mon Nov 24 05:23:51 GMT 2008


One question about this:

"mv C.tmp C" moves C.tmp into directory B :-/ (because C is a symlink to B)

Perhaps I misunderstood :-)
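
To make the case concrete, here is a small sketch that reproduces it. It assumes a POSIX system with Python 3 and an "mv" on the PATH; the names A, B, C and C.tmp follow the thread, while the function name and the temporary directory are only made up for illustration. As far as I can tell, mv treats an existing directory (or a symlink to one) as a place to move into, while the underlying rename() call replaces the symlink itself (GNU mv has -T / --no-target-directory to ask for the latter).

```python
import os
import subprocess
import tempfile


def demo_symlink_switch(root):
    """Compare 'mv C.tmp C' with rename() when C is a symlink to a directory."""
    a = os.path.join(root, "A")
    b = os.path.join(root, "B")
    c = os.path.join(root, "C")
    tmp = os.path.join(root, "C.tmp")
    os.mkdir(a)
    os.mkdir(b)
    os.symlink(b, c)  # C -> B is the live database

    # Case 1: plain mv sees that C resolves to a directory and moves C.tmp into B.
    os.symlink(a, tmp)
    subprocess.run(["mv", tmp, c], check=True)
    moved_into_b = os.path.islink(os.path.join(b, "C.tmp"))

    # Case 2: rename() does not follow C, it replaces the symlink in one step.
    os.symlink(a, tmp)
    os.replace(tmp, c)
    c_points_to_a = os.path.realpath(c) == os.path.realpath(a)

    return moved_into_b, c_points_to_a


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(demo_symlink_switch(root))
```

So if C has to stay a symlink, the switch needs rename() semantics rather than a plain mv.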

Olly Betts wrote:
> On Fri, Nov 21, 2008 at 05:51:30AM -0500, Jim wrote:
>   
>> Consider
>> 1.  Are the searches fast enough (of multiple DBs)?
>>     
>
> There's not been much profiling of searching multiple databases, and I
> don't have figures for how performance compares.  The split databases
> will generally be bigger in total size than a merged database would be
> so you'll need a bit more disk space, but then if you need to rebuild
> while searching the old database, you won't need as much scratch space
> if you can rebuild one database at a time.
>
> There's clearly an overhead for opening each one - we try to minimise
> this, so it's not much for each one, but if we're talking hundreds or
> thousands of databases, it might start to add up.  In some applications
> you can keep them open between searches, but that's not always viable.
>
> If you find slow cases, profiling them often reveals a bottleneck that
> can be addressed fairly easily.  There are some tips on profiling on
> the wiki.
>
>   
>> 2.  How often are multiple DBs searched?
>>     
>
> Also, the term statistics will be different for a search over a single
> user's database and a search over a combined database filtered to show
> a single user's data.  If users are very different (e.g. different
> languages) that might lead to worse results from a merged database.
> If they're broadly similar, the averaging of statistics might actually
> lead to better results from a merged database.
>
>   
>> 3.  Consider ping-ponging two Xapian DBs when updating.  I use the 
>> following logic.
>> I have two directories with Xapian DBs.  A  and B.
>> If A is older than B
>>   copy contents of B into A
>> else
>>   copy contents of A into B
>> add new entries to the copy
>> if the copy is A
>>     rm C
>>     ln -s A C
>> if the copy is B
>>     rm C
>>     ln -s B C
>>
>> where C is the database that I am using to search.
>>     
>
> This leaves a time interval where there's no valid database at C though,
> which is problematic if search processes could be trying to open the
> database while you're switching the new database live.
>
> A better approach is to use a stub database file for C.  You can write a
> new file as "C.tmp" and then atomically switch with "mv C.tmp C" (at
> least on POSIX platforms).
>
> Cheers,
>     Olly

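If I read the suggestion correctly, C would then be a regular stub file rather than a symlink, so the rename replaces the file itself. A minimal sketch of that switch, assuming Python 3 on a POSIX platform: the function name switch_stub and the ".tmp" suffix are hypothetical, and the one-line "auto <path>" stub format is my assumption about what Xapian expects, so please check it against the Xapian documentation.

```python
import os


def switch_stub(stub_path, db_path):
    """Point the stub database file at db_path via write-then-rename."""
    tmp_path = stub_path + ".tmp"
    with open(tmp_path, "w") as f:
        # Assumed stub format: a single "<backend> <path>" line, where
        # "auto" leaves the choice of backend to Xapian.
        f.write("auto %s\n" % db_path)
        f.flush()
        os.fsync(f.fileno())  # make sure the new contents are on disk first
    # rename() swaps the file in one step, so a searcher opening C
    # sees either the old stub or the new one, never a missing file.
    os.replace(tmp_path, stub_path)
```

With the ping-pong scheme from the quoted mail, this would be called as switch_stub("C", "/path/to/A") or switch_stub("C", "/path/to/B") after the copy has been updated (the paths are placeholders).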

-- 
Kind regards

Felix Antonius Wilhelm Ostmann
--------------------------------------------------
Websuche   Search   Technology   GmbH   &   Co. KG
Martinistraße 3  -  D-49080  Osnabrück  -  Germany
Tel.:   +49 541 40666-0 - Fax:    +49 541 40666-22
Email: info at websuche.de - Website: www.websuche.de
--------------------------------------------------
Osnabrück Local Court - HRA 200252 - VAT ID: DE814737310
General partner:    Websuche   Search   Technology
Verwaltungs GmbH - Osnabrück Local Court - HRB 200359
Managing director: Diplom-Kaufmann Martin Steinkamp
--------------------------------------------------



