[Xapian-discuss] Boolean terms,
Olly Betts
olly at survex.com
Wed Mar 9 01:21:13 GMT 2005
On Tue, Mar 08, 2005 at 03:29:45PM -0500, Sig Lange wrote:
> I decided to write my own indexer and search. I initially started
> learning about xapian by using omega and basing ideas off its
> documentation (or lack thereof).
If you see gaps in any of the documentation, please point them out.
It's hard for me to know what needs improving, since I know how it all
works!
> scriptindex and friends in omega seem to make things more complex than
> they really are. But enough babble.
Omega, scriptindex, etc. are built to be fairly general-purpose tools for
implementing a search system with a web front end. Their code doesn't
really make a good Xapian tutorial.
If you want simple working examples, look at the programs in
xapian-examples. Some of these are a bit "toy", but others are useful
real-world utilities. Either way, they should demonstrate how to use
parts of the API in real code.
> One feature I'd like is to generate like terms, or search like terms..
> for instance if one types in "Mike More" and in fact his name is
> indexed as "mike mour". Is there currently this sort of soundex
> feature available?
Given a soundex algorithm, you could compute the soundex code of each
word you want to be searchable that way and index it with a prefix (e.g.
XSOUNDEX), then generate XSOUNDEX terms for the search terms in the same
way and search for those.
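
As a rough illustration of the idea above, here is a minimal sketch of one
common soundex variant (letter plus three digits, with h and w not
separating equal codes). The function names soundex() and soundex_term()
are made up for this example and are not part of Xapian; the choice of
variant is an assumption too.

```cpp
#include <cctype>
#include <string>

// Map a letter to its soundex digit; '0' marks letters that carry no code
// (vowels, h, w, y and anything else).
static char soundex_digit(char ch) {
    switch (std::tolower(static_cast<unsigned char>(ch))) {
        case 'b': case 'f': case 'p': case 'v': return '1';
        case 'c': case 'g': case 'j': case 'k':
        case 'q': case 's': case 'x': case 'z': return '2';
        case 'd': case 't': return '3';
        case 'l': return '4';
        case 'm': case 'n': return '5';
        case 'r': return '6';
        default: return '0';
    }
}

// Hypothetical helper: returns a 4-character code such as "M600", or an
// empty string if the word contains no letters.
std::string soundex(const std::string& word) {
    std::string code;
    char prev = '0';
    for (char ch : word) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (!std::isalpha(c)) continue;          // ignore non-letters
        char d = soundex_digit(ch);
        if (code.empty()) {                      // keep the first letter
            code += static_cast<char>(std::toupper(c));
            prev = d;
            continue;
        }
        char lower = static_cast<char>(std::tolower(c));
        if (lower == 'h' || lower == 'w') continue;  // h/w don't break a run
        if (d != '0' && d != prev) {
            code += d;
            if (code.size() == 4) break;
        }
        prev = d;                                // vowels reset the run
    }
    if (!code.empty()) code.resize(4, '0');      // pad with zeros
    return code;
}

// Hypothetical helper: build the prefixed term to index or search for.
std::string soundex_term(const std::string& word) {
    std::string code = soundex(word);
    return code.empty() ? code : "XSOUNDEX" + code;
}
```

With this sketch, "More" and "mour" both map to XSOUNDEXM600, so adding
soundex_term(word) to each document at index time and searching for
soundex_term(query_word) would match the two spellings.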
A few years ago, I wondered about adding "soundex" as a stemming
language. The way it conflates terms is not so different to how
a stemming algorithm does.
But I found that there seemed to be a number of variants on the soundex
algorithm, and I wasn't sure which to use so I didn't pursue the idea.
But looking again, the obvious choice is soundex as Knuth describes in
"The Art of Computer Programming" vol 3.
> Now onto boolean terms, I would like to generate a boolean genre for
> each record. What would some code be to generate boolean terms?
Add them with Xapian::Document::add_term():
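    // db is an already-open Xapian::WritableDatabase;
    // bool_term is a std::string such as "XGENREpunk".
    Xapian::Document doc;
    doc.add_term(bool_term);
    // And add any other terms, set the document data, etc...
    db.add_document(doc);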
> Is it just a user defined prefix?
Yes, just pick a prefix for genre as you would for a probabilistic
field.
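
For instance, the term string itself could be built with a small helper
like the one below. This is only a sketch: genre_term() is a made-up name,
and lower-casing the genre is an assumption made here so that the term
generated at index time and the one generated at search time agree.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: turn a genre name into a prefixed boolean term,
// e.g. "Punk" -> "XGENREpunk". XGENRE is the user-chosen prefix.
std::string genre_term(const std::string& genre) {
    std::string term = "XGENRE";
    for (char ch : genre) {
        // Lower-case each character (an assumption, see above).
        term += static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    return term;
}
```

The result is what you would pass to Xapian::Document::add_term() when
indexing each record.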
Note there's not a definite distinction between "probabilistic term"
and "boolean term" at the API level. Typically probabilistic
terms will have positional information (and so will be added with
add_posting) while boolean terms won't (and so will be added with
add_term). But if you don't need phrase searching, you can add
probabilistic terms with add_term, and you can use OP_FILTER on
a term indexed with add_posting.
> How would I let the QueryParser() know? So
> let's say I'm looking for "some song" in genre "punk",
>
> I'd have a custom prefix of XGENRE, so my term would be XGENREpunk,
> but how do I let the query parser know to look for "some song" in the
> punk genre.
The user query string can't specify boolean filters to QueryParser in
the released version.
With the new QueryParser implementation in CVS (which will be released
as 0.9.0 fairly soon), you tell the QueryParser that genre is a boolean
prefix:
    Xapian::QueryParser qp;
    qp.add_boolean_prefix("genre", "XGENRE");
    Xapian::Query q = qp.parse_query("some song genre:punk");
> Then if I wanted to search multiple genres, What would I do?
At the moment, with the CVS code you'll get multiple boolean filters
combined with AND. That usually makes sense if they're on different
categories, but almost certainly doesn't if they're on the same
category. This ought to be addressed, but I don't know if it will be
before 0.9.0 - the new QueryParser is a big improvement, so it would
be bad to delay releasing it too much longer.
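
To make the distinction concrete, here is a sketch of the grouping you
would probably want instead: OR between filters that share a prefix, AND
between different prefixes. It only builds a readable string to show the
logic and does not use the Xapian API; combine_filters() and the XYEAR
prefix are invented for the example.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: takes (prefix, value) pairs and describes the
// combined filter, OR-ing terms with the same prefix and AND-ing the groups.
std::string combine_filters(
        const std::vector<std::pair<std::string, std::string>>& filters) {
    // Group the full terms (prefix + value) by their prefix.
    std::map<std::string, std::vector<std::string>> by_prefix;
    for (const auto& f : filters) {
        by_prefix[f.first].push_back(f.first + f.second);
    }

    std::string expr;
    for (const auto& group : by_prefix) {
        std::string part;
        for (const auto& term : group.second) {
            if (!part.empty()) part += " OR ";
            part += term;
        }
        // Parenthesise only groups with more than one alternative.
        if (group.second.size() > 1) part = "(" + part + ")";
        if (!expr.empty()) expr += " AND ";
        expr += part;
    }
    return expr;
}
```

Filters for genres punk and ska plus a year of 1999 would come out as
"(XGENREpunk OR XGENREska) AND XYEAR1999", whereas AND-ing everything
would require a record to be in both genres at once. If you build the
query by hand rather than through the QueryParser, the same shape can be
expressed with Xapian::Query objects combined using OP_OR and OP_AND.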
Cheers,
Olly