[Xapian-discuss] indexing for phrase searching and constructing queries

Richard Jolly richardjolly at mac.com
Sun Jan 21 11:36:44 GMT 2007


Hi,

I'm new to Xapian, and to search engines in general. I'm using the Perl 
bindings and 0.9.9. In general it works excellently, but I've got some 
questions.


1. phrase searching

I'm having no luck getting phrase searching to work. I expect it's 
because I've not indexed the content correctly. The content is XML. I'm 
basically taking the text content of that, splitting it into words, 
lowercasing, stemming and stripping punctuation. The term position 
passed to add_posting is just incremented for each word, but I'm keeping 
the same position for both the stemmed and the unstemmed form of a word.

   # made up
   add_posting( 'office', 3 )
   add_posting( 'offic', 3  )

My hand-wavy understanding of phrase searching is that it's looking for 
consecutive matching terms, which is why I've done the stemmed and 
unstemmed words at the same position. But when I do a query, I get no 
results. The debug output for the query looks sane to me:

   Xapian::Query((impose:(pos=1) PHRASE 3 time:(pos=2) PHRASE 3 limits:(pos=3)))

How can I tell why this isn't matching? Can I find the postings for 
those three terms in the index and compare their positions?
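
To make the question concrete, here is a toy sketch in plain Python of 
the model described above. It is not the Xapian API: build_index, 
positions, phrase_match and toy_stem are all hypothetical names, and the 
window rule (terms in order, all within a span of N positions) is only 
an assumption about what "PHRASE 3" means.

```python
from collections import defaultdict


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: just drops a trailing "s".
    return word[:-1] if word.endswith("s") else word


def build_index(docs, stem=toy_stem):
    """Map term -> {doc_id: [positions]}.

    The stemmed and unstemmed form of each word share one position,
    mirroring the two add_posting calls in the made-up example.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split(), start=1):
            for term in {word, stem(word)}:
                index[term][doc_id].append(pos)
    return index


def positions(index, term, doc_id):
    """Return the positions recorded for a term in one document."""
    return index.get(term, {}).get(doc_id, [])


def phrase_match(index, terms, doc_id, window=None):
    """True if the terms occur in order within `window` positions."""
    window = window or len(terms)

    def extend(i, first, prev):
        if i == len(terms):
            return True
        for p in positions(index, terms[i], doc_id):
            if p > prev and p - first < window and extend(i + 1, first, p):
                return True
        return False

    return any(extend(1, p, p) for p in positions(index, terms[0], doc_id))


docs = {1: "the court may impose time limits on speeches"}
index = build_index(docs)
for term in ("impose", "time", "limits"):
    print(term, positions(index, term, 1))
print(phrase_match(index, ["impose", "time", "limits"], 1))
```

Printing the positions per term like this is the kind of comparison I'd 
like to be able to do against the real index.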

Secondly, a user-entered search containing an apostrophe ends up as a 
phrase search, which is not right at all:

   Xapian::Query(((mike:(pos=1) PHRASE 2 s:(pos=2)) OR tail:(pos=3)))


2. user interfaces

My next question is about the practicalities of user-facing search 
interfaces. I've got a web form with a big text input, and also a 
couple of additional controls that correspond to indexed terms. I've 
then got code that combines the term controls with the text input into 
something like:

   ( name:foo AND name:bar ) AND text from text box

And I hand this off to QueryParser. But punctuation seems to mess it 
up. Should I be stripping out punctuation and stop words? Is it a bad 
approach altogether?
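
For illustration, this is roughly what stripping punctuation before 
building that string could look like, sketched in plain Python rather 
than with the Xapian bindings. clean_text and build_query_string are 
hypothetical helpers, and joining "mike's" into "mikes" is just one 
possible choice, not something I know to be correct. How the parser 
groups the trailing free text is also an open question here.

```python
import re


def clean_text(text):
    """Strip punctuation from free text typed by the user."""
    # Drop apostrophes inside words so "mike's" stays one token ("mikes").
    text = re.sub(r"(?<=\w)['\u2019](?=\w)", "", text)
    # Replace any remaining punctuation with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())


def build_query_string(filters, text):
    """Combine (prefix, value) filter pairs with the cleaned free text."""
    parts = [f"{prefix}:{value}" for prefix, value in filters]
    filter_part = "( " + " AND ".join(parts) + " )" if parts else ""
    cleaned = clean_text(text)
    if filter_part and cleaned:
        return f"{filter_part} AND {cleaned}"
    return filter_part or cleaned


print(build_query_string([("name", "foo"), ("name", "bar")], "mike's tail!"))
```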


Thanks,

Richard
