[Xapian-discuss] Multiple databases vs Single large database

Jim jim at fayettedigital.com
Fri Nov 21 10:51:30 GMT 2008


Bradley wrote:
> Hi
> I've decided to use Xapian because the files table in my MySQL database is going
> to grow very large, and MySQL doesn't seem to be good at full-text searching. I'm
> doing this with the PHP wrapper, by the way.
>
> The way my system is set up, each user has their own set of files, and a
> search will always be over one specific user's files (based on file
> name, title and description). At some point, though, we may want to
> search the files of a list of users, or of all users.
>
> I was planning on having a Xapian database for each user's files. Would it be
> better this way (multiple databases), or to have one large database for all
> users' files, as I'm doing with MySQL? I'm mainly thinking about
> performance, but feel free to add other thoughts.
>
> Thanks
> Bradley
If I were doing it, I'd do it your way. Searching a single per-user DB will
most likely be faster. Once you allow your users to search multiple DBs,
you can evaluate performance and see whether merging them makes sense.

Consider:
1. Are searches across multiple DBs fast enough?
2. How often are multiple DBs searched?

If you need to merge them, there is a utility, xapian-compact
(http://xapian.org/docs/admin_notes.html#merging-databases), that will do
it for you with minimal effort.
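Xapian can also search several databases as if they were one (the bindings let you add each database to a single `Database` object with `add_database()`), so per-user databases don't rule out searching a list of users. The toy below is a pure-Python stand-in, not Xapian's API or ranking: one tiny word index per user, and a search that queries only the users asked for and merges the hits. All names in it are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class UserIndex:
    """Toy stand-in for one user's database: maps each word to file ids."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add_file(self, file_id: int, text: str) -> None:
        # Index every lower-cased word of the name/title/description text.
        for word in text.lower().split():
            self.postings[word].add(file_id)

    def search(self, word: str) -> Set[int]:
        # .get() does not create empty entries in the defaultdict.
        return set(self.postings.get(word.lower(), set()))


def search_users(indexes: Dict[str, UserIndex], users: Iterable[str],
                 word: str) -> List[Tuple[str, int]]:
    """Query each selected user's index and merge the hits into one sorted list."""
    hits = []
    for user in users:
        for file_id in indexes[user].search(word):
            hits.append((user, file_id))
    return sorted(hits)
```

Searching one user touches one small index; searching many touches one index per user, which is the cost worth measuring before deciding to merge.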

You didn't ask, but here are a few things to consider.

1. Xapian searches will not see real-time data. New rows only become
searchable once they have been indexed into the Xapian DB, and that takes
a finite amount of time. The larger the database, the longer adding new
entries can take.
1.1. Be sure to keep something in the MySQL table that either says "this
row has been added to Xapian" (e.g. an "is_added" field) or records a
last-changed timestamp. Then periodically add new entries to the Xapian DB
by selecting rows changed since the last run, or rows where "is_added" is
not yet set.
2. Consider ping-ponging between two Xapian DBs when updating. I use the
following logic:
I have two directories with Xapian DBs, A and B.
if A is older than B
  copy contents of B into A
else
  copy contents of A into B
add new entries to the copy
if the copy is A
  rm C
  ln -s A C
if the copy is B
  rm C
  ln -s B C

where C is a symbolic link to the database that searches use.
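Points 1.1 and 2 combine into one update cycle. The following is a minimal Python sketch under stated assumptions, not Jim's actual script: SQLite stands in for MySQL, the `files` table is assumed to have a hypothetical `last_changed` column (Unix time), and the Xapian indexing step is a hypothetical caller-supplied `add_entries` function. Instead of comparing the ages of A and B, it decides which copy is live by reading where C points, and it swaps C by renaming a new link over it, so on POSIX systems there is no moment where C is missing (plain `rm C` followed by `ln -s` leaves a brief gap).

```python
import os
import shutil
import sqlite3
from typing import Callable, Iterable, List, Tuple

# (id, name, title, description): the fields being searched on.
Row = Tuple[int, str, str, str]


def changed_rows(conn: sqlite3.Connection, since: float) -> List[Row]:
    """Return rows changed after `since` (hypothetical `last_changed` column)."""
    cur = conn.execute(
        "SELECT id, name, title, description FROM files "
        "WHERE last_changed > ? ORDER BY id",
        (since,),
    )
    return cur.fetchall()


def ping_pong_update(
    base: str,
    rows: Iterable[Row],
    add_entries: Callable[[str, Iterable[Row]], None],
) -> str:
    """Refresh the idle copy (A or B), index `rows` into it, then point C at it.

    `base` must contain directories A and B plus a symlink C pointing at one
    of them. `add_entries` is a hypothetical callback that opens the database
    at the given path and indexes the rows. Returns the copy that is now live.
    """
    link = os.path.join(base, "C")
    live = os.path.basename(os.path.normpath(os.readlink(link)))
    idle = "A" if live == "B" else "B"
    idle_path = os.path.join(base, idle)

    # Overwrite the idle copy with the live one; searchers keep using C meanwhile.
    shutil.rmtree(idle_path, ignore_errors=True)
    shutil.copytree(os.path.join(base, live), idle_path)

    add_entries(idle_path, rows)

    # Create the new link under a temporary name, then rename it over C.
    # rename() replaces the link itself atomically on POSIX.
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(idle, tmp_link)  # relative target, resolved inside `base`
    os.replace(tmp_link, link)
    return idle
```

Record the time of each successful run and pass it as `since` next time. Long-running searchers should reopen C after an update rather than keep the old copy open, since the next cycle deletes and recopies it.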

Jim.


