[Xapian-tickets] [Xapian] #355: non-spacing chars are not term splitters
Xapian
nobody at xapian.org
Wed Apr 1 04:29:25 BST 2009
#355: non-spacing chars are not term splitters
-------------------------+--------------------------------------------------
Reporter: alsadi | Owner: olly
Type: defect | Status: assigned
Priority: normal | Milestone: 1.1.0
Component: QueryParser | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blockedby:
Platform: All | Blocking:
-------------------------+--------------------------------------------------
Changes (by olly):
* status: new => assigned
* version: => SVN trunk
* component: Other => QueryParser
* milestone: => 1.1.0
Comment:
We should fix this for 1.1.0, because indexes built with and without the
change will contain incompatible terms, at least for anyone indexing or
searching data that contains such characters.
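
To illustrate the incompatibility, here is a minimal Python sketch (not
Xapian's code; is_wordchar and split_terms are hypothetical helpers, and
the word-character rule is simplified to letters and digits). It assumes
the current handling ends a term at a non-spacing mark, and that the patch
keeps the mark inside the term:

```python
import unicodedata

def is_wordchar(ch, marks_are_wordchars):
    """Simplified word-character test: letters and digits only.

    Non-spacing marks (Unicode category Mn) count as word characters
    only when marks_are_wordchars is True.
    """
    cat = unicodedata.category(ch)
    if cat == "Mn":
        return marks_are_wordchars
    return cat[0] in ("L", "N")

def split_terms(text, marks_are_wordchars):
    """Split text into terms at every non-word character."""
    terms, current = [], []
    for ch in text:
        if is_wordchar(ch, marks_are_wordchars):
            current.append(ch)
        elif current:
            terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms

# MEEM, HAH, MEEM, SHADDA (U+0651, category Mn), DAL
word = "\u0645\u062d\u0645\u0651\u062f"

old_terms = split_terms(word, marks_are_wordchars=False)  # two terms
new_terms = split_terms(word, marks_are_wordchars=True)   # one term
```

Under these assumptions the same word produces two terms with the old
rule and one term with the new rule, so a query parsed one way cannot
match a database indexed the other way.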
Issues:
* This patch is simple and works pretty well, but ideally a space followed
  by a non-spacing mark shouldn't count as a space. In reality, we can
  probably ignore this for now - this approach is a definite improvement
  over the current handling.
* We should really be putting decomposed and decomposable characters into
  some canonical form so that the representation doesn't matter for
  matching. But we've already punted on that for this release series.
* We can't really make this change for 1.0.x, but we could make
  non-spacing marks phrase generators in the QueryParser so that
  <first part of word><SHADDA><second part of word> becomes a phrase
  search for "<first part of word> <second part of word>". That will work
  with existing databases, though it's not as good as the solution in
  this patch - e.g. all non-spacing marks will be equivalent.
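
The second and third points can be sketched in Python as follows. This is
an illustration only, not Xapian's implementation: canonical and
phrase_parts are hypothetical helpers, and NFC is just one possible choice
of canonical form.

```python
import unicodedata

def canonical(text):
    """Map composed and decomposed spellings to one form (NFC here)."""
    return unicodedata.normalize("NFC", text)

composed = "caf\u00e9"      # e-acute as a single code point
decomposed = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# The raw strings differ, but their canonical forms are identical.
same_after_normalizing = canonical(composed) == canonical(decomposed)

def phrase_parts(word):
    """Split a query word at non-spacing marks (category Mn).

    The parts would then be searched as a phrase. The mark itself is
    dropped, so every non-spacing mark behaves the same way.
    """
    parts, current = [], []
    for ch in word:
        if unicodedata.category(ch) == "Mn":
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts

# The same letters with SHADDA (U+0651) or FATHA (U+064E) in the middle
with_shadda = phrase_parts("\u0645\u062d\u0645\u0651\u062f")
with_fatha = phrase_parts("\u0645\u062d\u0645\u064e\u062f")
```

Both calls return the same two parts, which shows the limitation noted
above: the phrase approach cannot tell one non-spacing mark from another.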
--
Ticket URL: <http://trac.xapian.org/ticket/355#comment:1>
Xapian <http://xapian.org/>