[Xapian-discuss] indexing and queryparsing: UTF-8 and PHP

tata 668 tata668 at gmail.com
Sat Feb 25 21:34:35 GMT 2006


I don't really know Swish-e, but it seems more HTML- and XML-oriented. I NEVER
index HTML, XML, or any other kind of file. I only need to index information
like "member description", which would require a slow MySQL plain-text search
without a dedicated library like Xapian.

I would definitely like to see a function in Xapian that takes a text,
splits it into words, and indexes them into the associated Document.

Document::index_text(textToIndex, encoding)

This function would use the same splitting algorithm as the queryparser, and
it would accept UTF-8 text...
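
To make the idea concrete, here is a rough sketch in Python of what I have in
mind. It is NOT Xapian's API: SketchDocument, tokenize, index_text and
query_terms are all hypothetical names made up for illustration, and the
word-splitting rule (runs of Unicode letters, digits and underscores,
lowercased) is only an assumption about what a shared splitter could look
like. The point is that indexing and query parsing both go through the same
tokenize() function.

```python
import re
import unicodedata

# Assumed word rule: runs of Unicode word characters (letters, digits, "_").
WORD_RE = re.compile(r"\w+")


class SketchDocument:
    """Hypothetical stand-in for a document: maps term -> list of positions."""

    def __init__(self):
        self.postings = {}

    def add_posting(self, term, position):
        self.postings.setdefault(term, []).append(position)


def tokenize(text_bytes, encoding="utf-8"):
    """Decode the bytes, normalize to NFC, and return lowercased words."""
    text = unicodedata.normalize("NFC", text_bytes.decode(encoding))
    return [match.group().lower() for match in WORD_RE.finditer(text)]


def index_text(doc, text_bytes, encoding="utf-8"):
    """Split the text into words and add each one to doc with its position."""
    words = tokenize(text_bytes, encoding)
    for position, word in enumerate(words, start=1):
        doc.add_posting(word, position)
    return len(words)


def query_terms(query_bytes, encoding="utf-8"):
    """Split a query with the very same rule used at indexing time."""
    return tokenize(query_bytes, encoding)


doc = SketchDocument()
index_text(doc, "Café owner, café lover".encode("utf-8"))
print(doc.postings)
print(query_terms("CAFÉ".encode("utf-8")))
```

Because both sides share one splitter, a query term can only differ from an
indexed term if the input text itself differs, not because of how it was cut
into words.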

That's my wish! ;-)

----- Original Message ----- 
From: "Peter Karman" <peter at peknet.com>
To: "tata 668" <tata668 at gmail.com>
Cc: <xapian-discuss at lists.xapian.org>
Sent: Saturday, February 25, 2006 2:18 PM
Subject: Re: [Xapian-discuss] indexing and queryparsing: UTF-8 and PHP


> These are good questions.
>
> tata 668 scribbled on 2/25/06 10:54 AM:
>
>> 1) Am I correct when I say that Xapian doesn't provide an indexer 
>> function?
>
> The Omega project provides a couple of different indexers. That's a separate
> project from the Xapian library, but they're available together, as are
> the bindings for using other languages (like PHP).
>
> Your questions about how "words" are defined are one reason I prefer
> Swish-e (http://swish-e.org) for smaller projects. Swish-e lets you define 
> which characters constitute a "word" and the indexer splits text strings 
> accordingly. Also, the indexer is "smart" about word context in HTML and 
> XML and lets you bias some words more than others (like titles or 
> headings, for example).
>
> Since this is the Xapian list and not the Swish-e list, I will say that 
> Xapian offers some key features Swish-e does not, which is why I am on 
> this list. :) I am currently working on the next version of Swish-e, which 
> will offer the Xapian library as a backend, thus combining the best of 
> both worlds: the ease and power of Swish-e's indexer with the scalability 
> and ranking features of Xapian.
>
>
> -- 
> Peter Karman  .  http://peknet.com/  .  peter at peknet.com 



