[Xapian-discuss] indexing for phrase searching and constructing queries
Richard Jolly
richardjolly at mac.com
Sun Jan 21 11:36:44 GMT 2007
Hi,
I'm new to Xapian, and to search engines in general. I'm using the Perl
bindings with version 0.9.9. In general it works excellently, but I've
got some questions.
1. phrase searching
I'm having no luck getting phrase searching to work. I expect it's
because I've not indexed the content correctly. The content is XML. I'm
basically taking the text content of that, splitting it into words,
lowercasing, stemming, and stripping punctuation. The term position
passed to add_posting is simply incremented for each word, but I'm using
the same position for both the stemmed and the unstemmed form of a word.
# made up
add_posting( 'office', 3 )
add_posting( 'offic', 3 )
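To make the indexing steps above concrete, here is a minimal sketch as a
toy Python model. It is not the actual Perl code and does not call
Xapian: `toy_stem` is a crude, hypothetical stand-in for a real stemmer,
and `index_text` is a made-up name. It only shows the stemmed and
unstemmed forms sharing one position.

```python
import re
from collections import defaultdict


def toy_stem(word):
    """Crude stand-in for a real stemmer (assumption): strip a few suffixes."""
    for suffix in ("es", "e", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def index_text(text):
    """Return {term: [positions]}; stemmed and unstemmed forms share a position."""
    postings = defaultdict(list)
    position = 0
    for raw in text.split():
        # Lowercase, then strip anything that is not a word character.
        word = re.sub(r"[^\w]", "", raw.lower())
        if not word:
            continue
        position += 1  # one position per word, simply incremented
        postings[word].append(position)
        stemmed = toy_stem(word)
        if stemmed != word:
            postings[stemmed].append(position)  # same position as the unstemmed form
    return dict(postings)


postings = index_text("Impose time limits, office!")
```

Here `postings["office"]` and `postings["offic"]` both hold the same
position, which mirrors the two add_posting calls above.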
My hand-wavy understanding of phrase searching is that it looks for
consecutive matching terms, which is why I've put the stemmed and
unstemmed words at the same position. But when I run a query, I get no
results. The debug output for the query looks sane to me:
Xapian::Query((impose:(pos=1) PHRASE 3 time:(pos=2) PHRASE 3 limits:(pos=3)))
How can I tell why this isn't matching? Can I find those three postings
in the index and compare the positions?
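To spell out the understanding I'm working from, here is a toy Python
check. It is an assumption about the semantics (terms in order, all
within a window of N positions), not Xapian's actual matcher, and
`phrase_match` and the sample positions are made up for illustration.

```python
def phrase_match(postings, terms, window=None):
    """True if the terms occur in order, all within `window` positions."""
    if not terms:
        return False
    if window is None:
        window = len(terms)

    def extend(index, previous, start):
        # All terms placed: the phrase matched.
        if index == len(terms):
            return True
        for pos in postings.get(terms[index], []):
            # The next term must come later, but still inside the window.
            if previous < pos < start + window and extend(index + 1, pos, start):
                return True
        return False

    return any(extend(1, start, start) for start in postings.get(terms[0], []))


# Made-up positions for one document.
good = {"impose": [1], "time": [2], "limits": [3]}
bad = {"impose": [1], "time": [2], "limits": [7]}

matches_good = phrase_match(good, ["impose", "time", "limits"], window=3)
matches_bad = phrase_match(bad, ["impose", "time", "limits"], window=3)
```

With positions like those in `good` I would expect a match, and with
positions like those in `bad` I would not, which is why I'd like to see
the positions actually stored in the index.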
Secondly, a user-entered search containing an apostrophe ends up as a
phrase search, which is not right at all:
Xapian::Query(((mike:(pos=1) PHRASE 2 s:(pos=2)) OR tail:(pos=3)))
2. user interfaces
My next question is about the practicalities of user-facing search
interfaces. I've got a web form with a big text input, and also a
couple of additional controls that correspond to indexed terms. I've
then got code that combines the term controls with the text input into
something like:
( name:foo AND name:bar ) AND text from text box
And I hand this off to QueryParser. But punctuation seems to mess it
up. Should I be stripping out punctuation and stop words? Is it a bad
approach altogether?
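For reference, the combining step amounts to roughly the following,
sketched here in Python rather than my actual Perl. The function name
`build_query_string` and the `name` prefix handling are illustrative
only. Note that the text box contents are passed through untouched, so
any punctuation the user types reaches QueryParser as is.

```python
def build_query_string(names, text):
    """Combine prefixed filter terms with free text into one query string."""
    filters = " AND ".join(f"name:{name}" for name in names)
    text = text.strip()  # user text is otherwise passed through unchanged
    if filters and text:
        return f"( {filters} ) AND {text}"
    if filters:
        return f"( {filters} )"
    return text


query_string = build_query_string(["foo", "bar"], "text from text box")
```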
Thanks,
Richard