[Xapian-discuss] Unicode troubles

Olly Betts olly at survex.com
Mon May 29 11:30:30 BST 2006


On Fri, May 26, 2006 at 01:48:14PM +0200, Örjan Persson wrote:
> I'm indexing the pages from an htdig database using htdig2omega. I've
> tried to parse the db.docs file as generated by htdump, and also after
> it's been converted to utf-8 by iconv. I've also tried to replace the
> p_* functions in scriptindex.cc with U_ ones -- just like the first
> patch does -- but I'm unable to get it to work.

Those functions in scriptindex are only used to parse the field and
action names in the index script.  Look at indextext.cc instead.

But htdig (version 3) doesn't support Unicode as far as I'm aware,
so I'm not sure what converting everything to UTF-8 gains you.

Cheers,
    Olly