[Xapian-discuss] Suitability of Xapian for my application?
Eric Parusel
eparusel at creativens.com
Fri Oct 15 06:24:37 BST 2004
Thanks for the thorough reply, sounds very good.
Just a few points to reply on:
Olly Betts wrote:
> So if there's about 150 keywords per document and 30 million or so rows,
> then the corpus is of the order of 200K documents?
Correct, that's how I came up with the "150 keywords per" figure
actually. :)
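As a quick sanity check on that figure (a minimal sketch; the variable names are illustrative, and the inputs are the rounded numbers from this thread):

```python
# Rough corpus size from the numbers discussed above.
keyword_rows = 30_000_000      # "30 million or so rows"
keywords_per_document = 150    # "about 150 keywords per document"

documents = keyword_rows // keywords_per_document
print(documents)
```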
Would Xapian reasonably be able to handle a corpus of, let's say, triple
that: 600K documents or more?
I've seen the graphs at http://www.survex.com/~olly/gmaneindexrate.html
so judging by the curve I would assume that's a yes?
> It's hard to say how fast a system will be without a reference point.
> Indexing speed depends a lot on the hardware. CPU speed isn't too
> important. You want lots of RAM and fast disks.
>
> The gmane index has an average doc length of 186 terms. It takes about
> 15 minutes to index 200K documents from scratch. That's got 3G of RAM
> and SATA disks.
That sounds quite good to me...!
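For a rough sense of scale, the quoted gmane figure can be turned into a throughput number. This is only a back-of-the-envelope sketch: it assumes the indexing rate stays constant as the database grows and that the hardware is comparable, and neither assumption is guaranteed to hold.

```python
# Figures quoted above for the gmane index: 200K documents in about 15 minutes.
docs_indexed = 200_000
minutes_taken = 15

docs_per_second = docs_indexed / (minutes_taken * 60)

# Assumption: constant rate, which is optimistic since indexing
# usually slows down as the database gets larger.
target_docs = 600_000
estimated_minutes = target_docs / docs_per_second / 60

print(f"{docs_per_second:.0f} docs/s, ~{estimated_minutes:.0f} min for {target_docs} docs")
```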
Both servers have 1GB of RAM, most of which the OS uses as buffer cache.
The DB server is using a 4-disk 10K RPM SCSI RAID 10 array, and the
"import" server is running a 5-disk SATA RAID 5 array; both have
battery-backed write caches.
How much RAM would Xapian typically take up while adding keywords or
searching?
>>3) keywords "database" size -- any rough estimates for what I'm working
>>with?
>
> I'd guess something like 500MB for 200K documents.
>
> There are plans in the pipeline to improve the packing and compression
> (which should improve both index and search speed too).
You don't even want to know how much disk space the keyword search is
currently taking; this would be a major improvement!
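To put the guessed size in per-document terms, and to scale it to the larger corpus mentioned earlier, here is a small sketch. It assumes the database size grows linearly with the number of documents, which is a simplification, and the 500MB input is itself only a guess.

```python
# "something like 500MB for 200K documents" (an estimate, not a measurement)
estimated_mb = 500
reference_docs = 200_000

# Per-document footprint, taking 1MB = 1024KB.
kb_per_doc = estimated_mb * 1024 / reference_docs

# Assumption: size scales linearly with document count.
target_docs = 600_000
projected_mb = estimated_mb * target_docs / reference_docs

print(f"{kb_per_doc:.2f} KB per document, ~{projected_mb:.0f} MB for {target_docs} documents")
```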
>> Can I use filesystem snapshots, then back up the xapian db file
>>snapshot?
>
> That's a good way to do it. Make sure that there's no updates happening
> and snapshot the filesystem. Then you can restart updates and back up
> from the snapshot to tape at your leisure.
Hmm, sounds viable... I can pause updates for a short period of time to
take the snapshot.
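The procedure described above could be scripted roughly as follows. This is only a sketch of the ordering: the four callables are hypothetical placeholders for site-specific steps (stopping the indexer, creating a filesystem snapshot with whatever volume manager is in use, and so on). They are not Xapian APIs.

```python
def backup_via_snapshot(pause_updates, take_snapshot, resume_updates, copy_snapshot):
    """Pause writers, snapshot the filesystem, resume writers, then back up.

    All four arguments are caller-supplied functions (hypothetical hooks).
    """
    pause_updates()
    try:
        # The database must be quiescent only while the snapshot is created.
        snapshot = take_snapshot()
    finally:
        # Resume updates even if the snapshot step failed.
        resume_updates()
    # The slow copy (e.g. to tape) runs from the snapshot, not the live files.
    copy_snapshot(snapshot)
    return snapshot
```

The point of the ordering is that updates are paused only for the short snapshot step, while the slow copy happens afterwards against the frozen snapshot.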
> Cheers,
> Olly
Thanks again,
Eric