[Xapian-discuss] Suitability of Xapian for my application?

Eric Parusel eparusel at creativens.com
Fri Oct 15 06:24:37 BST 2004


Thanks for the thorough reply, sounds very good.
Just a few points to reply on:

Olly Betts wrote:
> So if there's about 150 keywords per document and 30 million or so rows,
> then the corpus is of the order of 200K documents?

Correct, that's actually how I came up with the "150 keywords per" figure. :)
Will Xapian reasonably be able to handle a corpus of, let's say, triple
that: 600K documents or more?
I've seen the graphs at http://www.survex.com/~olly/gmaneindexrate.html
so judging by the curve I would assume that's a yes?
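A quick back-of-envelope check of the corpus figures above: 30 million rows at about 150 keywords per document gives the ~200K documents, and tripling it gives the 600K case. The row and keyword counts are the rough figures from this thread, not measurements.

```python
# Rough corpus-size arithmetic using the approximate figures from this thread.
ROWS = 30_000_000          # "30 million or so rows" in the current keyword table
KEYWORDS_PER_DOC = 150     # average keywords per document

docs = ROWS // KEYWORDS_PER_DOC            # documents implied by the row count
docs_tripled = 3 * docs                    # the "triple that" scenario
postings_tripled = docs_tripled * KEYWORDS_PER_DOC  # keyword entries at 600K docs

print(docs)                                # 200000
print(docs_tripled, postings_tripled)      # 600000 90000000
```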


> It's hard to say how fast a system will be without a reference point.
> Indexing speed depends a lot on the hardware.  CPU speed isn't too
> important.  You want lots of RAM and fast disks.
> 
> The gmane index has an average doc length of 186 terms.  It takes about
> 15 minutes to index 200K documents from scratch.  That's got 3G of RAM
> and SATA disks.

That sounds quite good to me...!
Both servers have 1GB of RAM, most of which the OS uses as buffer cache.
The DB server uses a 4-disk 10K SCSI RAID10 array, and the "import"
server runs a 5-disk SATA RAID5; both have battery-backed write caches.
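As a very rough guess at indexing time, one could scale the gmane reference point (200K documents, 186 terms each on average, about 15 minutes) linearly by total term count. Linear scaling is an assumption, and the gmane box has 3GB of RAM versus 1GB here, so real times on this hardware may well be longer; treat the numbers as optimistic.

```python
def estimate_index_minutes(docs, terms_per_doc,
                           ref_docs=200_000, ref_terms_per_doc=186,
                           ref_minutes=15.0):
    """Scale the gmane reference timing linearly by total number of terms.

    Assumes indexing time is proportional to terms indexed, which ignores
    RAM, disk and cache effects (a simplifying assumption).
    """
    ref_terms = ref_docs * ref_terms_per_doc
    return ref_minutes * (docs * terms_per_doc) / ref_terms

for n in (200_000, 600_000):
    # ~12.1 minutes for 200K docs, ~36.3 minutes for 600K docs
    print(n, round(estimate_index_minutes(n, 150), 1))
```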

How much RAM would Xapian take up while adding keywords, or searching 
typically?


>>3) keywords "database" size -- any rough estimates for what I'm working 
>>with?
> 
> I'd guess something like 500MB for 200K documents.
> 
> There are plans in the pipeline to improve the packing and compression
> (which should improve both index and search speed too).

You don't even want to know how much disk space the keyword search is
currently taking; this would be a major improvement!
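Extrapolating the ~500MB-for-200K-documents guess to a larger corpus is simple arithmetic if database size is assumed to grow roughly linearly with document count (an assumption; the planned packing and compression improvements would shrink these figures). MB is taken as 10^6 bytes here.

```python
EST_BYTES = 500 * 10**6   # Olly's guess: ~500MB ...
EST_DOCS = 200_000        # ... for 200K documents

bytes_per_doc = EST_BYTES / EST_DOCS   # ~2.5KB per document

def estimate_db_bytes(docs):
    """Linear extrapolation of database size from the per-document figure."""
    return docs * bytes_per_doc

print(bytes_per_doc)                        # 2500.0
print(estimate_db_bytes(600_000) / 10**9)   # 1.5 (GB for 600K documents)
```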



>>   Can I use filesystem snapshots, then back up the xapian db file 
>>snapshot?
> 
> That's a good way to do it.  Make sure that there's no updates happening
> and snapshot the filesystem.  Then you can restart updates and back up
> from the snapshot to tape at your leisure.

Hmm, sounds viable...  I can pause updates for a short period of time to 
take the snapshot.


> Cheers,
>     Olly

Thanks again,
Eric


