[Xapian-discuss] Suitability of Xapian for my application?
Eric Parusel
eparusel at creativens.com
Fri Oct 15 19:08:18 BST 2004
Olly Betts wrote:
> On Thu, Oct 14, 2004 at 10:24:37PM -0700, Eric Parusel wrote:
>
>>Xapian will reasonably be able to handle a corpus of let's say triple
>>that, 600K documents or more?
>
> Yes. The largest I've personally worked with is 18 million documents,
> but I think people have done bigger systems. Webtop peaked at around
> 500 million, but used the old muscat 3.6 backend rather than quartz so
> it's hard to compare directly. But I think quartz now comfortably
> surpasses muscat 3.6 (equivalent quartz databases are smaller so less
> I/O should be needed).
Ok, that's much larger... :) thanks.
>>How much RAM would Xapian take up while adding keywords, or searching
>>typically?
>
> It depends what you set the autoflush threshold to, but for 50000 I get
> around 250MB process size. You need more RAM than that as you want the
> OS to cache database blocks.
Is "autoflush" the frequency at which it fsyncs? Or is it more of an
internal Xapian thing?
Apart from the initial import, can I set it so that it fsyncs after
each document is inserted? I process documents as they come in, rather
than in batches, and I do want the updates to be atomic *and* consistent
with the inserts to the PostgreSQL db.
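To make the per-document case concrete, here is a rough sketch of the behaviour I am after, not a tested implementation. `FakeIndex` and `index_stream` are hypothetical names, and `FakeIndex` is only a stand-in so the snippet runs without Xapian installed. I am assuming a real `xapian.WritableDatabase` from the Python bindings could take its place, with `add_document()` followed by an explicit `commit()` (I believe older releases call this `flush()`).

```python
class FakeIndex:
    """Hypothetical stand-in for a writable index; it only records calls."""

    def __init__(self):
        self.pending = []    # documents added but not yet committed
        self.committed = []  # documents made durable by commit()
        self.commits = 0     # number of commit() calls so far

    def add_document(self, doc):
        self.pending.append(doc)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()
        self.commits += 1


def index_stream(index, docs, commit_every=1):
    """Add each document, committing after every `commit_every` additions."""
    if commit_every < 1:
        raise ValueError("commit_every must be at least 1")
    since_commit = 0
    for doc in docs:
        index.add_document(doc)
        since_commit += 1
        if since_commit >= commit_every:
            index.commit()
            since_commit = 0
    if since_commit:
        index.commit()  # commit any trailing partial batch
```

With `commit_every=1` every insert is committed on its own, which is what I want for live updates; a large value would be closer to what the bulk import needs.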
I have another question -- it concerns the ability to access/store
Xapian info over our network...
Since our "import" server and our db server are different boxes,
obviously Xapian doesn't communicate over any network...
I suppose I could possibly create a pgsql function of some sort that
would call Xapian and insert the keywords, as a way of calling it from a
remote box?
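An alternative to the pgsql function would be a small RPC service on the db server that the import server calls. The sketch below uses only Python's standard `xmlrpc` modules and is an illustration of the idea, not a recommendation. `add_keywords`, `start_server`, `remote_add` and the `indexed` dict are all hypothetical; the dict stands in for wherever the real indexing call would go.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

indexed = {}  # hypothetical stand-in for the index on the db server


def add_keywords(doc_id, keywords):
    """Placeholder for the real indexing call; stores keywords in a dict."""
    indexed[str(doc_id)] = list(keywords)
    return len(keywords)


def start_server(host="127.0.0.1", port=0):
    """Start the RPC server in a background thread; return (server, port)."""
    # port=0 asks the OS for any free port, which is then read back below.
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(add_keywords)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def remote_add(port, doc_id, keywords, host="127.0.0.1"):
    """Client side: what the import server would call."""
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        return proxy.add_keywords(doc_id, keywords)
```

The import server would call `remote_add()` right after its PostgreSQL insert. Keeping the two writes consistent with each other would still have to be handled separately.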
Thanks,
Eric