[Xapian-discuss] Design-question/problem

Carsten Reimer carsten.reimer at galileo-press.de
Fri Aug 28 12:59:21 BST 2009


Dear list,

We are using the Xapian Python bindings to build a full-text search 
engine for 400+ books of 300+ pages each.

We need to be able to restrict a search to a single book or to several 
selected books, as well as to search all of them.

To do so, we decided to create one Xapian database per book and to 
dynamically build the combined database we need to search for each of 
the use cases described above.

The flow is as follows.

We pass the paths of the per-book Xapian databases to a function that 
builds our search database. From this we create the QueryParser 
objects, which in turn we use to create the Query objects, from which 
the Enquire objects are finally built.
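
The flow described above could be sketched roughly as follows. This is 
an illustration only, not the actual application code: the helper names 
(select_book_paths, build_search_database, search) and the mapping of 
book ids to paths are hypothetical. The Xapian calls used are 
Database.add_database, QueryParser.parse_query and Enquire.get_mset.

```python
def select_book_paths(book_paths, selected=None):
    """Return the database paths for the selected book ids.

    book_paths is a hypothetical dict mapping book id -> database path.
    With selected=None, the paths of all books are returned.
    """
    if selected is None:
        return [book_paths[book_id] for book_id in sorted(book_paths)]
    return [book_paths[book_id] for book_id in selected]


def build_search_database(paths):
    """Combine the per-book databases into one searchable database."""
    # Imported here so the path helper above works without the bindings.
    import xapian

    combined = xapian.Database()
    for path in paths:
        # Every sub-database is opened at this point; with 400+ books
        # this is where the time and the file descriptors are spent.
        combined.add_database(xapian.Database(path))
    return combined


def search(paths, query_string, offset=0, limit=10):
    """Parse query_string and run it against the combined database."""
    import xapian

    database = build_search_database(paths)
    parser = xapian.QueryParser()
    parser.set_database(database)
    query = parser.parse_query(query_string)

    enquire = xapian.Enquire(database)
    enquire.set_query(query)
    return enquire.get_mset(offset, limit)
```

Searching a single book, several books or all books then differs only 
in the list of paths handed to search().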

With 400+ individual Xapian databases, we found that when searching all 
of the books it took a lot of time (and file descriptors) to open all 
the database files for each book while building the search database 
(and apparently, at least with the Python bindings, the file 
descriptors are kept open for the lifetime of the database object).

This behaviour increases the response times of our application (a 
Django-based web app) dramatically. So we decided to keep the large 
search database for all books in memory by creating it when Django is 
started. This works well, but unfortunately it hurts scalability 
badly. Stress testing the app with ApacheBench, even with really low 
numbers of requests and concurrency (100/4), leads to erroneous 
responses, because not enough backend processes/threads (Django behind 
lighttpd) can be provided in time due to the in-memory search database 
(with the in-memory search database disabled, 5000/200 was no 
problem).
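
Keeping the combined database around for the lifetime of a process 
could be sketched as below. This is a minimal illustration under 
stated assumptions, not the actual setup: get_combined_database and 
the build callable are hypothetical names, and the database is built 
on first use rather than strictly at Django start-up.

```python
import threading

_lock = threading.Lock()
_combined_db = None


def get_combined_database(build):
    """Return the process-wide search database, building it only once.

    build is a hypothetical zero-argument callable that opens all
    per-book databases and returns the combined database object.
    """
    global _combined_db
    if _combined_db is None:
        with _lock:
            # Re-check inside the lock so that two threads arriving at
            # the same time do not both run the expensive build.
            if _combined_db is None:
                _combined_db = build()
    return _combined_db
```

In such a scheme every backend process still has to run the expensive 
build once before it can answer its first search request. Whether a 
single database object may be used by several threads at the same time 
is a separate question that this sketch does not address.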

So we are a little unsure whether we are using Xapian the wrong way or 
whether Xapian may not be suitable for our needs.

Any ideas or hints are very welcome.

Thanks in advance

with best regards

Carsten Reimer


-- 
Carsten Reimer
Web Developer
carsten.reimer at galileo-press.de
Phone +49.228.42150.73

Galileo Press GmbH
Rheinwerkallee 4 - 53227 Bonn - Germany
Phone +49.228.42150.0 (Zentrale) .77 (Fax)
http://www.galileo-press.de/

Managing Directors: Tomas Wehren, Ralf Kaulisch, Rainer Kaltenecker
HRB 8363 Amtsgericht Bonn



