[Xapian-discuss] Xapian as documentfilter?

Arjen van der Meijden acmmailing at tweakers.net
Mon Oct 31 09:49:56 GMT 2005


Hi list,

I'm currently working on an application that needs both to search through 
a set of documents and to send alerts when a newly added document matches 
a predefined set of "rules". Those rules may simply be a stored search 
query, but they can be anything I need.
For both the searches and the alerts I'd like to have the same set of 
parameters, so it'd be nice to have the same engine handling everything.

Documents contain a few short indexable texts and a list of boolean 
terms. Those boolean terms are mostly n:m, i.e. a document can have 
multiple boolean terms with the same prefix.
Most searches will be conducted on those boolean terms, sometimes 
expanded with keyword searches (and rarely with explicit operators).

For searching through the set of documents, Xapian/Omega works very 
well. For alerting on new documents, I'm not sure how to approach it.
The naive approach is of course to just store a list of search queries 
that users have asked to be alerted on.
But that list will likely run into the hundreds of queries, maybe even a 
few thousand. Each added batch of documents would then be "searched" by 
every stored query, and even though each search can be done quite fast 
(prepend B=Q$newId1 B=Q$newId2 etc. to the query), it may (will?) still 
be too much overhead.
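To make the cost of the naive approach concrete, here is a minimal pure-Python sketch (not Xapian code). It assumes, purely for illustration, that a stored query is a plain conjunction of boolean terms; the class names and the "XCAT"/"XTAG" prefixes are hypothetical. Every stored query is tested against every new document, so the work grows with (number of queries) x (number of new documents).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    # Boolean terms; a prefix such as the hypothetical "XTAG" may occur
    # several times on one document (the n:m case).
    terms: frozenset


@dataclass(frozen=True)
class StoredQuery:
    # Hypothetical simplification: a stored query is an AND of boolean terms.
    query_id: str
    required: frozenset

    def matches(self, doc: Document) -> bool:
        # The document matches if it carries every required term.
        return self.required <= doc.terms


def naive_alerts(queries, new_docs):
    """Test every stored query against every new document.

    Cost is O(len(queries) * len(new_docs)) query evaluations.
    """
    hits = []
    for q in queries:
        for d in new_docs:
            if q.matches(d):
                hits.append((q.query_id, d.doc_id))
    return hits


queries = [
    StoredQuery("q1", frozenset({"XCATnews", "XTAGlinux"})),
    StoredQuery("q2", frozenset({"XTAGwindows"})),
]
docs = [Document("d1", frozenset({"XCATnews", "XTAGlinux", "XTAGamd"}))]
print(naive_alerts(queries, docs))  # [('q1', 'd1')]
```

With a few thousand stored queries, every batch of new documents triggers a few thousand evaluations, even when almost none of them can possibly match.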

Reversing the process might be nicer, but how would I do that? The 
queries would be stored as documents, and the new document would become 
"the query". But then you lose the boolean logic and phrase operators of 
the original query.
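One way to sketch the reversed idea without giving up the original query logic is to use an inverted index over the stored queries only as a candidate filter, and then re-check each candidate against the full query. This is a minimal pure-Python illustration under the same hypothetical assumption as before (each stored query is an AND of boolean terms); it is not a Xapian feature, and all names here are made up. Because a matching document must contain every required term, indexing each query under just one of those terms is enough to find it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    query_id: str
    required: frozenset  # hypothetical: an AND of boolean terms


class QueryIndex:
    """Inverted index from one boolean term to the stored queries requiring it."""

    def __init__(self):
        self._by_term = defaultdict(list)

    def add(self, query: StoredQuery) -> None:
        if not query.required:
            raise ValueError("query needs at least one required term")
        # Index under a single required term (smallest, for determinism).
        # Every matching document must contain it, so no match is missed.
        self._by_term[min(query.required)].append(query)

    def alerts_for(self, doc_terms: frozenset) -> list:
        # Step 1: collect only the queries indexed under a term the document has.
        candidates = []
        for term in doc_terms:
            candidates.extend(self._by_term.get(term, ()))
        # Step 2: verify each candidate against its full query, so the
        # original (here: AND) logic is preserved.
        return sorted(q.query_id for q in candidates if q.required <= doc_terms)


idx = QueryIndex()
idx.add(StoredQuery("q1", frozenset({"XCATnews", "XTAGlinux"})))
idx.add(StoredQuery("q2", frozenset({"XTAGwindows"})))
idx.add(StoredQuery("q3", frozenset({"XCATnews"})))
print(idx.alerts_for(frozenset({"XCATnews", "XTAGlinux", "XTAGamd"})))  # ['q1', 'q3']
```

Indexing under the rarest required term, rather than the smallest, would keep candidate lists shorter. Queries with OR or phrase operators would need more than this sketch shows, for example indexing under each OR branch and verifying phrases against the document text.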

Any ideas?

Best regards,

Arjen


