[Xapian-discuss] Design-question/problem
Carsten Reimer
carsten.reimer at galileo-press.de
Fri Aug 28 12:59:21 BST 2009
Dear list,
we are using the Xapian Python bindings to build a full-text search
engine for 400+ books of 300+ pages each.
We need to be able to restrict a search to one book or to several
selected books, as well as to search all of them.
To do so, we decided to create one Xapian database per book and to
dynamically build the combined database needed for each of the use
cases described above.
The flow is as follows:
We pass the paths of the per-book Xapian databases to a function
that builds our search database. From this we build the QueryParser
objects, which in turn we use to create the Query objects, from which
the Enquire objects are finally built.
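To make the flow concrete, here is a minimal sketch of it. The function
names, the paths and the default paging values are made up for this mail
and are not our actual code; only the xapian calls themselves
(Database, add_database, QueryParser, Enquire) come from the bindings.

```python
def build_search_database(db_paths):
    """Combine the per-book databases into one searchable database."""
    # Imported lazily so the module can be loaded without the bindings.
    import xapian

    combined = xapian.Database()  # empty database used as a container
    for path in db_paths:
        # Every per-book database is opened here, one after another.
        combined.add_database(xapian.Database(path))
    return combined


def search(combined, query_string, offset=0, limit=10):
    """Run one query against the combined database, return the MSet."""
    import xapian

    parser = xapian.QueryParser()
    parser.set_database(combined)
    query = parser.parse_query(query_string)

    enquire = xapian.Enquire(combined)
    enquire.set_query(query)
    return enquire.get_mset(offset, limit)
```

A search over two books would then look like
search(build_search_database(["/data/book1", "/data/book2"]), "python"),
where both paths are placeholders.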
With 400+ individual Xapian databases, we found that searching all of
the books takes a lot of time (and file descriptors), because the
database files of every book are opened while the search database is
built (and apparently, at least with the Python bindings, the file
descriptors are kept open for the lifetime of the database object).
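The file descriptor usage can be checked with a small helper like the
one below. This is only a sketch: the function names are made up for
this mail, and it assumes a Unix-like system that exposes the open
descriptors of a process under /proc/self/fd or /dev/fd.

```python
import os
import resource


def open_fd_count():
    """Return the number of file descriptors open in this process."""
    # Linux exposes them under /proc/self/fd, other Unixes under /dev/fd.
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    # The listing itself briefly uses one descriptor, so only the
    # difference between two calls is meaningful.
    return len(os.listdir(fd_dir))


def fd_soft_limit():
    """Return the soft limit on open file descriptors for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

Calling open_fd_count() before and after the search database is built,
and comparing the difference with fd_soft_limit(), shows how close one
backend process gets to its limit.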
This behaviour increases the response times of our application (a
Django-based web app) dramatically. So we decided to keep the large
search database for all books in memory by creating it when Django
starts. This works well, but unfortunately it hurts scalability badly.
Stress testing the app with ApacheBench, even with really low numbers
of requests and concurrency (100/4), leads to erroneous responses,
because not enough backend processes/threads (Django behind lighttpd)
can be provided in time owing to the in-memory search database (with
the in-memory search database disabled, 5000/200 was no problem).
So we are a little unsure whether we are using Xapian the wrong way or
whether Xapian is simply not suitable for our needs.
Any ideas or hints are very welcome.
Thanks in advance
with best regards
Carsten Reimer
--
Carsten Reimer
Web Developer
carsten.reimer at galileo-press.de
Phone +49.228.42150.73
Galileo Press GmbH
Rheinwerkallee 4 - 53227 Bonn - Germany
Phone +49.228.42150.0 (Zentrale) .77 (Fax)
http://www.galileo-press.de/
Managing Directors: Tomas Wehren, Ralf Kaulisch, Rainer Kaltenecker
HRB 8363 Amtsgericht Bonn