[Xapian-discuss] omega and searching specific fields
James Aylett
james-xapian at tartarus.org
Mon Mar 7 10:43:01 GMT 2005
On Sun, Mar 06, 2005 at 09:12:36PM -0500, Sig Lange wrote:
> I'm still quite new to xapian and omega. Olly was a great help in
> putting together some docs on boolean search and I seem to still be
> having a bit of trouble. Talking about it is the best way to (perhaps
> by myself) solve my problem.
>
> I am indexing a music collection and want to be able to perform
> searches in certain fields. I'm a little confused on the difference
> between "boolean filters" and "Probabilistic Fields" .. I am using
> boolean filters but perhaps use the other? I would like to use the B=
> CGI parameter to search in omega.
Scriptindex isn't really my thing, but I'll try to deal with that as
well as what you're trying to do. I'm also not convinced that Omega
can manage this perfectly unaltered; I may well be wrong, and it's
possible that this approach (while it makes sense in terms of Xapian)
isn't the right solution when using Omega.
To summarise: you want to, for instance (given the sample data you
gave) search for "Circle" as "artist" and get back the "Sleeping
Beauty" document. This is /not/ a boolean search; you'd use boolean
search if you, say, wanted to search for "Sleeping" in document
titles, but wanted to restrict to a particular genre (say "pop").
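To make the distinction concrete, here is a minimal sketch in plain
Python. It is not Xapian or Omega code: the documents are made up, and
the "score" is just a count of matching words standing in for real
probabilistic weighting. The point is only that the boolean filter
decides whether a document is allowed in, while the probabilistic part
decides how well it ranks.

```python
# Toy documents: invented data, not the poster's music collection.
docs = [
    {"title": "sleeping beauty", "genre": "pop"},
    {"title": "sleeping in", "genre": "rock"},
    {"title": "beauty school", "genre": "pop"},
]


def search(docs, query_words, genre=None):
    """Rank docs by matching title words; optionally filter by genre."""
    results = []
    for doc in docs:
        # Boolean filter: a yes/no test that never affects the score.
        if genre is not None and doc["genre"] != genre:
            continue
        # Stand-in for probabilistic weighting: count matching words.
        title_words = doc["title"].split()
        score = sum(1 for word in query_words if word in title_words)
        if score > 0:
            results.append((score, doc["title"]))
    # Highest score first; ties keep their original order.
    results.sort(key=lambda item: item[0], reverse=True)
    return [title for _, title in results]


# "Sleeping" in titles, restricted to the genre "pop".
pop_hits = search(docs, ["sleeping"], genre="pop")
# Same kind of search with no filter: every match is ranked.
all_hits = search(docs, ["sleeping", "beauty"])
```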
What I think you'll need to do is to index terms in the 'artist' field
with a prefix, and then do a probabilistic search.
> So, I came up with a index file like this:
> -- begin music.index --
> id: index boolean=Q unique=Q
> artist: lower field=artist boolean=XART index
> path: field=url
> info: lower index
> -- end music.index --
I'm pretty sure you don't want or need boolean=Q and unique=Q on the
same line. unique=Q should be enough.
For artist, you want something like:
----------------------------------------------------------------------
artist: lower field=artist index=<PREFIX>
----------------------------------------------------------------------
You might want to do it again with a non-prefixed index, as well (so
it can be matched in a 'general' search).
Then you want to do a probabilistic search with a term constructed
from the artist, which will start with <PREFIX>.
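As a rough sketch of what "a term constructed from the artist" means,
in plain Python rather than anything Omega actually runs: lower-case
the artist, split it into words, and glue the prefix onto each one.
The function name is made up for illustration, and "XART" is only a
placeholder borrowed from the index file quoted above, not a
recommendation for <PREFIX>.

```python
def artist_terms(artist, prefix):
    """Build prefixed terms from an artist name (illustrative only)."""
    # Mirror the 'lower' action in the index file, then prefix each word.
    return [prefix + word for word in artist.lower().split()]


# "XART" is a placeholder prefix, not a vetted choice.
terms = artist_terms("Circle", "XART")
```

The same function has to be used, in effect, on both sides: the terms
generated at index time and the terms generated from the query need to
come out identical, or nothing will match.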
There's going to be a wrinkle to do with stemming, which is why I
haven't specified <PREFIX> - someone else is going to have to chime in
here (or come up with a better method entirely). Omega will do nasty
things to your P input, trying to turn it into useful terms to search
over. You actually want this (stemming, for instance), but you may
need to choose the prefix very carefully to avoid weird effects in
stemming. For instance, if your prefix was 'da' and one of the words
in the artist was 'the' then gluing them together you get 'dathe',
which stems to 'dath'. ('the' won't normally be stemmed).
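The 'dathe' example can be reproduced with a toy version of a single
rule from Porter's stemming algorithm, the one that drops a final 'e'.
This is a sketch of that one rule only, written from the published
algorithm; it is not Xapian's stemmer and it ignores every other step,
so treat it as an illustration of why gluing a prefix on changes the
result.

```python
def is_consonant(word, i):
    """Porter's definition: 'y' is a consonant unless it follows one."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions (Porter's 'm')."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def ends_cvc(stem):
    """True if stem ends consonant-vowel-consonant, last not w, x or y."""
    n = len(stem)
    if n < 3:
        return False
    return (is_consonant(stem, n - 3)
            and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1)
            and stem[-1] not in "wxy")


def strip_final_e(word):
    """Apply only the final-'e' rule; all other stemming steps omitted."""
    if not word.endswith("e"):
        return word
    stem = word[:-1]
    m = measure(stem)
    if m > 1 or (m == 1 and not ends_cvc(stem)):
        return stem
    return word


plain = strip_final_e("the")            # too short to lose its 'e'
prefixed = strip_final_e("da" + "the")  # the glued term is long enough
```

On its own 'the' has no vowel before the final 'e', so the rule leaves
it alone; with 'da' in front there is one, and the 'e' is dropped.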
It's possible there's an easy solution to this by choosing the prefix
correctly; unfortunately I don't understand the query parser used by
Omega well enough to be able to advise here. It may be that there's a
different approach that is better, but hopefully at least this
describes boolean searching better for you.
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org