[Xapian-tickets] [Xapian] #113: QueryParser limitation/inconsistency

Xapian nobody at xapian.org
Fri Feb 20 14:20:31 GMT 2009


#113: QueryParser limitation/inconsistency
-------------------------------+--------------------------------------------
 Reporter:  federico.schwindt  |        Owner:  olly     
     Type:  enhancement        |       Status:  assigned 
 Priority:  normal             |    Milestone:  1.1.1    
Component:  QueryParser        |      Version:  SVN trunk
 Severity:  minor              |   Resolution:           
 Keywords:                     |    Blockedby:           
 Platform:  All                |     Blocking:           
-------------------------------+--------------------------------------------
Changes (by olly):

  * milestone:  1.1.0 => 1.1.1


Old description:

> Hi,
>
>   I've been using xapian (0.9.9 and now 0.9.10) recently at work and I've
> found
> that the exquisite QueryParser (no irony intended) imposes some serious
> limitations for certain queries, as it does treat some characters
> specially,
> even when flags does not contain FLAG_PHRASE.
>   I'm talking about the method is_phrase_generator(). In the organization
> I work
> for we have a mixed setup of html documents and code. This includes
> several
> references to text in the word_word format. Unfortunately the QueryParser
> treats
> underscore as phrase generator, making impossible to search for terms
> indexed
> using whitespace separators, even when allterms() shows the term exists
> on the
> database.
>   I believe this is an inconsistency and also a limitation in the
> QueryParser,
> as it does not matter what flags are used, in such cases where the query
> string
> contains any of the characters defined in is_phrase_generator(), the
> query will
> be automatically converted to a phrase search (note that these characters
> can't
> be changed).
>   In an ideal world (mine at least), I'd expect the user to define a
> phrase
> (using " or any other previously defined character) and if this is not
> the case
> the QueryParser should not try to convert the query to anything else
> (except for
> the defined operations, OR, AND, etc).
>   ITOH, I could change the indexing to strip the underscores (and the
> other
> characters) and treat every part of the word_word as a separate term, but
> that
> would also mean that "word word" would match as well, when it's not what
> you wanted.
>   I hope you have this into consideration. Feel free to contact me if you
> need
> further details or I can clarify anything else.
>   Many thanks,
>
>   f.-

New description:

 Hi,

 I've been using Xapian (0.9.9 and now 0.9.10) recently at work, and I've
 found that the exquisite !QueryParser (no irony intended) imposes some
 serious limitations on certain queries: it treats some characters
 specially even when the flags do not contain FLAG_PHRASE.

 I'm talking about the method is_phrase_generator(). In the organization I
 work for we have a mixed setup of HTML documents and code, which includes
 several references to text in the word_word format. Unfortunately the
 !QueryParser treats the underscore as a phrase generator, making it
 impossible to search for terms that were indexed using whitespace
 separators, even when allterms() shows that the term exists in the
 database.

 I believe this is both an inconsistency and a limitation in the
 !QueryParser: no matter which flags are used, whenever the query string
 contains any of the characters defined in is_phrase_generator(), the query
 is automatically converted to a phrase search (note that these characters
 can't be changed).
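
 To make the complaint concrete, here is a toy model of the behaviour
 described above. It is not Xapian's code: parse_word(), matches() and the
 PHRASE_GENERATORS set are hypothetical names made up for this sketch, and
 the only phrase-generator character taken from the report is the
 underscore.

```python
# Toy model only, NOT Xapian's implementation. The ticket confirms only the
# underscore; the real character set lives in is_phrase_generator().
PHRASE_GENERATORS = set("_")  # assumption for illustration


def parse_word(word, phrase_flag_enabled):
    """Model the reported behaviour: a word containing a phrase-generator
    character becomes a phrase whether or not the phrase flag is set."""
    parts = [word]
    for ch in PHRASE_GENERATORS:
        parts = [piece for part in parts for piece in part.split(ch)]
    parts = [p for p in parts if p]
    if len(parts) > 1:
        # phrase_flag_enabled is deliberately ignored: that is the complaint.
        return ("PHRASE", parts)
    return ("TERM", word)


def matches(query, indexed_terms):
    """A TERM matches if it was indexed; a PHRASE needs every part indexed
    (positions are ignored to keep the sketch short)."""
    kind, value = query
    if kind == "TERM":
        return value in indexed_terms
    return all(part in indexed_terms for part in value)


# Terms produced by an indexer that splits on whitespace only.
indexed_terms = {"word_word", "other"}

query = parse_word("word_word", phrase_flag_enabled=False)
found = matches(query, indexed_terms)
```

 In this model the term word_word is in the index, yet the query is turned
 into a phrase of "word" and "word", neither of which was indexed, so the
 search finds nothing.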

 In an ideal world (mine at least), I'd expect the user to define a phrase
 explicitly (using " or any other previously defined character); when that
 is not the case, the !QueryParser should not try to convert the query to
 anything else (except for the defined operations: OR, AND, etc).
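
 As a rough sketch of the behaviour asked for here (again a toy model, with
 parse_query() being a hypothetical name, and boolean operators left out
 for brevity), only double-quoted text would become a phrase and anything
 else separated by whitespace would stay a single term:

```python
import re


def parse_query(text):
    """Hypothetical desired behaviour: quoted text is a phrase, every other
    whitespace-separated token is a plain term, underscores included."""
    nodes = []
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', text):
        if bare:
            nodes.append(("TERM", bare))
        elif quoted.split():
            nodes.append(("PHRASE", quoted.split()))
    return nodes


parsed = parse_query('word_word "foo bar"')
```

 With this rule word_word is looked up as the single term that the
 whitespace-based indexer stored, while "foo bar" is still available as an
 explicit phrase.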

 OTOH, I could change the indexing to strip the underscores (and the other
 characters) and treat every part of word_word as a separate term, but
 that would also mean that "word word" would match as well, which is not
 what you wanted.
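
 The false positive from that workaround can be shown with another small
 model. index_terms() and phrase_matches() are hypothetical helpers, and
 the two sample documents are made up for the illustration:

```python
import re


def index_terms(text):
    """Hypothetical workaround indexer: split on whitespace AND underscores."""
    return [t for t in re.split(r"[\s_]+", text) if t]


def phrase_matches(doc_terms, phrase):
    """True if the phrase terms occur consecutively in the document."""
    n = len(phrase)
    return any(doc_terms[i:i + n] == phrase
               for i in range(len(doc_terms) - n + 1))


code_doc = index_terms("call word_word here")   # contains the identifier
prose_doc = index_terms("a word word typo")     # only contains two words

query = index_terms("word_word")                # becomes ["word", "word"]

hit_code = phrase_matches(code_doc, query)
hit_prose = phrase_matches(prose_doc, query)
```

 Both documents match, because once the underscore is stripped at index
 time the identifier and the two separate words can no longer be told
 apart.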

 I hope you will take this into consideration. Feel free to contact me if
 you need further details or if I can clarify anything else.

 Many thanks,

   f.-

--

Comment:

 Bumping to milestone:1.1.1

 (and fix description wiki formatting)

-- 
Ticket URL: <http://trac.xapian.org/ticket/113#comment:10>
Xapian <http://xapian.org/>