[Xapian-discuss] Suitability of Xapian for my application?

Eric Parusel eparusel at creativens.com
Fri Oct 15 19:08:18 BST 2004


Olly Betts wrote:
> On Thu, Oct 14, 2004 at 10:24:37PM -0700, Eric Parusel wrote:
> 
>>Will Xapian reasonably be able to handle a corpus of, let's say, triple 
>>that, 600K documents or more?
> 
> Yes.  The largest I've personally worked with is 18 million documents,
> but I think people have done bigger systems.  Webtop peaked at around
> 500 million, but used the old muscat 3.6 backend rather than quartz so
> it's hard to compare directly.  But I think quartz now comfortably
> surpasses muscat 3.6 (equivalent quartz databases are smaller so less
> I/O should be needed).

Ok, that's much larger... :)  thanks.


>>How much RAM would Xapian take up while adding keywords, or searching 
>>typically?
> 
> It depends what you set the autoflush threshold to, but for 50000 I get
> around 250MB process size.  You need more RAM than that as you want the
> OS to cache database blocks.

Is "autoflush" how often it fsyncs?  Or is it more of an internal
Xapian thing?
Apart from the initial import, can I set it so that it fsyncs after 
each document is inserted (since I process documents as they come in, 
rather than in batches -- and I do want the updates to be atomic *and* 
consistent with the inserts to the PostgreSQL db)?
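
To make the trade-off behind this question concrete, here is a minimal
Python sketch comparing a batch flush threshold with flushing after every
document. BufferedIndex and index_all are hypothetical stand-ins, not
Xapian's API; the sketch only counts flushes and says nothing about real
I/O cost or memory use.

```python
class BufferedIndex:
    """Hypothetical stand-in for a writable index; this is NOT Xapian's API."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold
        self.pending = []      # documents buffered in memory
        self.committed = []    # documents considered written out
        self.flush_count = 0   # number of flushes actually performed

    def add_document(self, doc):
        self.pending.append(doc)
        # Flush automatically once the buffer reaches the threshold.
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.pending:
            return  # nothing buffered, so nothing to write
        self.committed.extend(self.pending)
        self.pending.clear()
        self.flush_count += 1


def index_all(docs, flush_threshold):
    """Add every document, then flush whatever is still buffered."""
    index = BufferedIndex(flush_threshold)
    for doc in docs:
        index.add_document(doc)
    index.flush()
    return index
```

With a threshold of 1 every insert is flushed on its own, which is the
per-document behaviour asked about; a larger threshold trades that
immediacy for fewer flushes and a bigger in-memory buffer.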


I have another question -- it concerns the ability to access/store 
Xapian info over our network...
Since our "import" server and our db server are different boxes,
obviously Xapian doesn't communicate over any network...
I suppose I could possibly create a pgsql function of some sort that 
would call Xapian and insert the keywords, as a way of calling it from a 
remote box?
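
Another way to call it from a remote box would be a small network wrapper
running next to the index. The following is only a sketch under stated
assumptions: the tab-separated line protocol, index_keywords, start_server
and send_document are all made up for illustration, and index_keywords
just records into a dict where a real version would call into Xapian.

```python
import socket
import socketserver
import threading

INDEXED = {}  # stand-in for the real index, keyed by document id


def index_keywords(doc_id, keywords):
    """Hypothetical hook: a real version would call into Xapian here."""
    INDEXED[doc_id] = keywords


class IndexHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Assumed protocol: one "doc_id<TAB>word word ..." line per document.
        for raw in self.rfile:
            line = raw.decode("utf-8").rstrip("\n")
            if not line:
                continue
            doc_id, _, words = line.partition("\t")
            index_keywords(doc_id, words.split())
            self.wfile.write(b"OK\n")  # acknowledge after indexing


def start_server(host="127.0.0.1", port=0):
    """Start the wrapper in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), IndexHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_document(address, doc_id, keywords):
    """Client side: send one document and return the server's reply."""
    with socket.create_connection(address) as conn:
        conn.sendall(f"{doc_id}\t{' '.join(keywords)}\n".encode("utf-8"))
        return conn.makefile("rb").readline().decode("utf-8").strip()
```

The import server would call send_document with the db server's address
after each PostgreSQL insert; the sketch has no authentication, retries,
or error handling.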

Thanks,
Eric




