[Xapian-discuss] Suitability of Xapian for my application?

Eric Parusel eparusel at creativens.com
Fri Oct 15 04:43:33 BST 2004


Hello,

    I'm currently using PostgreSQL to store keywords for documents
in an indexed table, one row per keyword per document.
I'm also using a Perl import script that extracts keywords from
documents as they arrive and stores them (without positional data)
in PostgreSQL.

The two components (the import script and PostgreSQL) run on different servers.

My problem is that some databases have keyword tables with around
30 million rows. A standard index on the varchar column over 30 million
rows ends up very large and inefficient.

Table columns:
  keyword (varchar, average length 8 characters)
  idnum (int4)
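To make the data model concrete, here is a minimal Python sketch of how a one-row-per-keyword-per-document table regroups into an inverted index: a map from each keyword to the list of document ids that contain it, which is the structure full-text engines are built around. This is only an illustration under stated assumptions, not Xapian's API or its on-disk format; the sample rows and the function names `build_inverted_index` and `search_all` are hypothetical. It also works through the rough figures from this post (30 million rows, about 150 keywords per document, 8-character keywords, 4-byte int4 ids).

```python
from collections import defaultdict

# Hypothetical sample rows mirroring the keywords table: (keyword, idnum),
# one row per keyword per document, no positional data.
rows = [
    ("xapian", 1), ("search", 1), ("index", 1),
    ("search", 2), ("postgres", 2),
    ("index", 3), ("search", 3),
]

def build_inverted_index(rows):
    """Group (keyword, idnum) rows into keyword -> sorted list of idnums."""
    postings = defaultdict(set)
    for keyword, idnum in rows:
        postings[keyword].add(idnum)
    return {kw: sorted(ids) for kw, ids in postings.items()}

def search_all(index, *keywords):
    """Return idnums of documents that contain every given keyword (boolean AND)."""
    sets = [set(index.get(kw, ())) for kw in keywords]
    if not sets:
        return []
    return sorted(set.intersection(*sets))

index = build_inverted_index(rows)
print(index["search"])                       # [1, 2, 3]
print(search_all(index, "search", "index"))  # [1, 3]

# Back-of-envelope figures from the numbers quoted in the post.
rows_total = 30_000_000
keywords_per_doc = 150
print(rows_total // keywords_per_doc)        # 200000 documents (approx.)
# Raw payload only: 8-byte keyword + 4-byte int4, ignoring row and index overhead.
print(rows_total * (8 + 4) / 1e6, "MB")      # 360.0 MB
```

Seen this way, each table row is one entry in some keyword's document list. The size figure counts only the raw keyword and idnum bytes, so the real on-disk size in either PostgreSQL or Xapian will differ.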

For now I would feed Xapian just a list of keywords per document, with no
positional data.

How efficient would Xapian be if I converted my keyword search over to it?

What's important to me, in no particular order:
1) Import speed as the tables grow (about 150 keywords per document on
average)
2) Search speed (from what I've heard, I don't think this will be a problem)
3) Size of the keyword "database": any rough estimates for what I'm
working with?
4) Stability: it won't corrupt, or crap out and die on me, will it? :)
5) Backups: is there a backup/dump utility of some sort?
    Can I take backups of the live system?
    Can I use filesystem snapshots, then back up the Xapian database
from the snapshot?

Anything else I should be concerned about?

As you can see, I have a lot of questions, since I'm quite new to Xapian.
Hopefully none of them are out of line :)

Thanks for any help you can offer,
Eric
