[Xapian-discuss] indexing and queryparsing: UTF-8 and PHP

Thomas Deniau thomas at deniau.org
Mon Feb 27 08:04:40 GMT 2006


Olly Betts <olly at survex.com> wrote:


> It just has 10 hex values hardcoded (the same 10 which the old Gmane
> search had, which seem to be similar to what Google uses).  It wouldn't
> be hard to use CSS classes instead - instead of:
>
> <b style="color:black;background-color:#ffff66"> ... </b>
>
> Produce:
>
> <b class="omegahl1"> ... </b>
>
> If we use something like <b> ... </b> rather than <span> ... </span>
> then we get graceful degradation if there's no CSS support, or if the
> omegahl<n> classes aren't defined in the stylesheet.

I currently ask Omega to highlight with just <strong> and </strong>, and then
I use a regex with a callback that remembers which colors it has already
assigned, so that different words get different colors. I end up with <strong
class="term0"> ... <strong class="term9">, which is more meaningful
semantically than <b class="..."> and still has the same graceful fallback.
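
A minimal sketch of that post-processing step, written in Python rather than
PHP purely for illustration. The function name colorize, the case-insensitive
matching, and the wrap-around after ten classes are assumptions, not the code
actually in use:

```python
import re

# Matches each <strong>...</strong> span emitted by the highlighter.
STRONG_RE = re.compile(r"<strong>(.*?)</strong>", re.DOTALL)


def colorize(html, palette_size=10):
    """Give every distinct highlighted word its own term<n> class.

    palette_size is a hypothetical parameter: term0..term9 by default.
    """
    seen = {}  # lowercased word -> class index already assigned

    def replace(match):
        word = match.group(1)
        key = word.lower()
        if key not in seen:
            # Next unused index, wrapping around when the palette runs out.
            seen[key] = len(seen) % palette_size
        return '<strong class="term%d">%s</strong>' % (seen[key], word)

    return STRONG_RE.sub(replace, html)
```

Repeated occurrences of a word reuse its class, so the stylesheet only has to
define term0 to term9; if it defines none of them, the plain <strong>
rendering still applies.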

Before devising this solution, I wanted to do this from OmegaScript, but I
haven't found any call that returns the list of unstemmed forms of a term as
they appear in the document sample, rather than in the query (which would be
useful if you want to colorize all the words with the same stem in the same
color).


PS. I have abandoned my delusions of using the Xapian bindings with Cygwin;
instead, I transform in PHP the XML that Omega generates when run as a CGI.
The only issue left is that the date fields are empty. Cygwin might store the
modification dates in a weird way. I'll look into it.

-- 
Thomas Deniau

Beyond the shadow of the ship, / I watched the water-snakes:
They moved in tracks of shining white / And when they reared, the elfish light
Fell off in hoary flakes. (Coleridge)


