[Xapian-discuss] indexing and queryparsing: UTF-8 and PHP

Peter Karman peter at peknet.com
Sat Feb 25 19:18:26 GMT 2006


These are good questions.

tata 668 scribbled on 2/25/06 10:54 AM:

> 1) Am I correct when I say that Xapian doesn't provide an indexer 
> function?

The Omega project provides a couple of different indexers. Omega is a separate 
project from the Xapian library, but the two are available together, as are the 
bindings for using Xapian from other languages (like PHP).

Your questions about how "words" are defined touch on one reason I prefer Swish-e 
(http://swish-e.org) for smaller projects. Swish-e lets you define which 
characters constitute a "word" and the indexer splits text strings accordingly. 
Also, the indexer is "smart" about word context in HTML and XML and lets you 
weight some words more heavily than others (those in titles or headings, for 
example).
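
To make the "define which characters constitute a word" idea concrete, here is a 
minimal Python sketch. It is only an illustration of the general technique, not 
Swish-e's or Xapian's actual code; make_tokenizer and its word_chars parameter 
are hypothetical names invented for this example.

```python
import re

def make_tokenizer(word_chars):
    """Return a function that splits text into runs of the allowed characters.

    word_chars is a hypothetical setting: every character in it counts as
    part of a word, and everything else acts as a separator.
    """
    # Build a character class from the allowed characters, escaping any
    # that are special inside a regular expression (such as "-" or ".").
    pattern = re.compile("[" + re.escape(word_chars) + "]+")

    def tokenize(text):
        # Lowercase first so that matching is case-insensitive.
        return pattern.findall(text.lower())

    return tokenize

letters = "abcdefghijklmnopqrstuvwxyz"
plain = make_tokenizer(letters)                       # letters only
with_punct = make_tokenizer(letters + "0123456789.-")  # also digits, "." and "-"

sample = "Swish-e 2.4 indexes UTF-8 text"
print(plain(sample))       # ['swish', 'e', 'indexes', 'utf', 'text']
print(with_punct(sample))  # ['swish-e', '2.4', 'indexes', 'utf-8', 'text']
```

The same input yields different index terms depending on the character set, 
which is why the word definition used at indexing time has to match the one 
used when parsing queries.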

Since this is the Xapian list and not the Swish-e list, I will say that Xapian 
offers some key features Swish-e does not, which is why I am on this list. :) I 
am currently working on the next version of Swish-e, which will offer the Xapian 
library as a backend, thus combining the best of both worlds: the ease and power 
of Swish-e's indexer with the scalability and ranking features of Xapian.


-- 
Peter Karman  .  http://peknet.com/  .  peter at peknet.com


