[Xapian-discuss] Xapian support for huge data sets?
Charlie Hull
charlie at juggler.net
Fri May 13 09:57:59 BST 2011
On 12/05/2011 19:18, Bill Hendrickson wrote:
> Hello,
>
> I’m currently using another open source search engine/indexer and am
> having performance issues, which brought me to learn about Xapian. We
> have approximately 350 million docs/10TB data that doubles every 3
> years. The data mostly consists of Oracle DB records, webpage-ish
> files (HTML/XML, etc.) and office-type docs (doc, pdf, etc.). There
> are anywhere from 2 to 4 dozen users on the system at any one time.
> The indexing server has upwards of 28GB of memory, but even then it gets
> extremely taxed, and the situation will only get worse.
>
> In the opinion of this list, would Xapian be able to handle this kind
> of load, or should I evaluate more “enterprise”-like solutions (GSA,
> etc.)?
Xapian was originally written to power the Webtop web search engine,
which indexed around 500 million pages on a farm of around 30 servers,
back in 1999 or so. We've built 100-million-page indexes for clients.
Given sufficient hardware, arranged in the right way, you shouldn't have
any trouble indexing your content - a single server is probably not
enough, though!
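"Arranged in the right way" here generally means splitting the index across machines rather than scaling up one box. A minimal sketch of the usual approach, routing each document to a shard by hashing its unique ID (the shard count, the routing scheme, and the document IDs are my own illustrative assumptions, not anything Xapian itself prescribes):

```python
# Hypothetical sketch: partition documents across several index shards
# by hashing their IDs, so no single server holds the whole 350M-document
# index. NUM_SHARDS and the example IDs are assumptions for illustration.
import hashlib

NUM_SHARDS = 8  # e.g. one Xapian database per indexing server

def shard_for(doc_id: str) -> int:
    """Deterministically map a document ID to a shard number."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

# At index time each shard would be a separate xapian.WritableDatabase;
# at search time Xapian can treat the shards as one logical index by
# combining them with xapian.Database.add_database().
print(shard_for("oracle-row-12345"))
print(shard_for("report-2011.pdf"))
```

Because the routing is deterministic, updates and deletes for a given document always go to the same shard, and searches can fan out across all shards and merge results.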
Cheers
Charlie
www.flax.co.uk