[Xapian-tickets] [Xapian] #355: non-spacing chars are not term splitters

Xapian nobody at xapian.org
Wed Apr 1 04:29:25 BST 2009


#355: non-spacing chars are not term splitters
-------------------------+--------------------------------------------------
 Reporter:  alsadi       |        Owner:  olly     
     Type:  defect       |       Status:  assigned 
 Priority:  normal       |    Milestone:  1.1.0    
Component:  QueryParser  |      Version:  SVN trunk
 Severity:  normal       |   Resolution:           
 Keywords:               |    Blockedby:           
 Platform:  All          |     Blocking:           
-------------------------+--------------------------------------------------
Changes (by olly):

  * status:  new => assigned
  * version:  => SVN trunk
  * component:  Other => QueryParser
  * milestone:  => 1.1.0


Comment:

 We should fix this for 1.1.0, as the patch changes which terms are
 generated, so indexes built with and without it will have incompatible
 terms, at least for anyone indexing or searching data containing such
 characters.
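
 To make the incompatibility concrete, here is a minimal Python sketch. It
 assumes the patch treats every Unicode non-spacing mark (general category
 Mn) as a term splitter. It is not the patch itself or Xapian's
 TermGenerator code, and the tokenize() function and its split_on_marks flag
 are hypothetical. The same Arabic word containing a SHADDA yields one term
 in the old mode and two terms in the new one, so terms indexed one way
 won't match terms generated the other way.

```python
import unicodedata


def tokenize(text: str, split_on_marks: bool) -> list[str]:
    """Hypothetical tokenizer, not Xapian's TermGenerator.

    Letters, digits and marks (categories L*, N*, M*) count as word
    characters.  If split_on_marks is True, non-spacing marks (Mn) are
    instead treated like spaces, i.e. as term splitters.
    """
    terms: list[str] = []
    current: list[str] = []
    for ch in text:
        cat = unicodedata.category(ch)
        if split_on_marks and cat == "Mn":
            is_word_char = False
        else:
            is_word_char = cat[0] in ("L", "N", "M")
        if is_word_char:
            current.append(ch)
        elif current:
            # A non-word character ends the current term.
            terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


if __name__ == "__main__":
    # MEEM, HAH, MEEM, SHADDA (U+0651, category Mn), DAL
    word = "\u0645\u062d\u0645\u0651\u062f"
    print(tokenize(word, split_on_marks=False))  # one term, SHADDA kept inside it
    print(tokenize(word, split_on_marks=True))   # two terms, split at the SHADDA
```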

 Issues:

   * This patch is simple and works pretty well, but ideally a space followed
     by a non-spacing mark shouldn't count as a space.  In reality, we can
     probably ignore this for now - this approach is a definite improvement
     over the current handling.

   * We should really be putting decomposed and decomposable characters into
     some canonical form so that the representation doesn't matter for
     matching.  But we've already punted on that for this release series.

   * We can't really make this change for 1.0.x, but we could make non-spacing
     marks phrase generators in the QueryParser so that
     <first part of word><SHADDA><second part of word> becomes a phrase search
     for "<first part of word> <second part of word>".  That will work with
     existing databases, though it's not as good as the solution in this
     patch - e.g. all non-spacing marks will be equivalent.
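
 The 1.0.x workaround in the last point could look roughly like the
 following Python sketch. It assumes the rewrite is applied per word before
 parsing and emits QueryParser's quoted-phrase syntax; marks_to_phrase() is
 a hypothetical helper, not part of Xapian's API, and a real fix would live
 inside the QueryParser itself. It also shows the drawback noted in that
 point: a word with a SHADDA and the same word with a FATHA produce the
 identical phrase.

```python
import unicodedata


def marks_to_phrase(word: str) -> str:
    """Hypothetical query-side rewrite for 1.0.x, not a QueryParser API.

    Splits a word at non-spacing marks (category Mn), dropping the marks,
    and returns the parts as a quoted phrase in query-string syntax.
    A word with no marks is returned unchanged.
    """
    parts: list[str] = []
    current: list[str] = []
    for ch in word:
        if unicodedata.category(ch) == "Mn":
            # The mark acts as a phrase generator: close off the current part.
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    if len(parts) <= 1:
        return "".join(parts)
    return '"' + " ".join(parts) + '"'


if __name__ == "__main__":
    with_shadda = "\u0645\u062d\u0645\u0651\u062f"  # SHADDA is U+0651
    with_fatha = "\u0645\u062d\u0645\u064e\u062f"   # FATHA is U+064E
    print(marks_to_phrase(with_shadda))
    # Both marks give the same phrase: the mark itself is lost.
    print(marks_to_phrase(with_shadda) == marks_to_phrase(with_fatha))  # True
```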

-- 
Ticket URL: <http://trac.xapian.org/ticket/355#comment:1>
Xapian <http://xapian.org/>