[Xapian-discuss] Re: Japanese / UTF-8 support

Reini Urban rurban at x-ray.at
Sun Aug 13 08:50:53 BST 2006


2006/8/13, Jeff Breidenbach <breidenbach at gmail.com>:
> This is looking promising. Running down my Omega checklist:
>
>   * The patch is still too crude to submit, but I've beaten htmlparse.cc
>     into respecting <!--htdig_noindex--><!--/htdig_noindex-->
>
>   * I've located the 300 character limit on sample size in omindex.cc,
>     but am leaving that alone for the time being. Will keep in mind for
>     improving summary results later. [1]
>
>   * Getting filesize and last modification date in summary results is
>     nice to have, but not critical. Putting on backburner.
>
>   * I'm now building some flint indices for testing. This will probably
>     take about a week to complete. When finished, this may provide
>     some interesting benchmarks.
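
To illustrate the htdig_noindex item above: a minimal sketch of the idea, written in Python rather than the C++ of htmlparse.cc. It is not the patch itself. The helper name strip_noindex is hypothetical, and it assumes the markers appear exactly as written above and are not nested.

```python
import re

# Matches one marked region, including both markers.
# Non-greedy, so separate regions are removed independently;
# DOTALL lets a region span several lines.
_NOINDEX = re.compile(
    r"<!--htdig_noindex-->.*?<!--/htdig_noindex-->",
    re.DOTALL,
)


def strip_noindex(html: str) -> str:
    """Return html with every htdig_noindex region removed.

    Hypothetical helper for illustration; assumes exact, non-nested markers.
    """
    return _NOINDEX.sub("", html)
```

Running the page text through such a filter before term extraction would keep navigation bars and similar boilerplate out of the index.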

For first tests and benchmarks, smaller databases work much better.
I run a make check in a small test subdir containing only the major
document types, for both flint and quartz.

I do the bench runs with 30,000 docs. On cygwin these take ~20 min
without the last_mod check, and about 2 min with the last_mod check and
cached extracted "virtual dirs" (zip, msg, ...).
cygwin, pdftotext and xls2csv are all slow.
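
The last_mod check is what saves most of the time: a file whose modification time has not changed since the previous run is skipped, so the slow converters never start for it. A rough sketch of that idea in Python follows. It is not the omindex code; the function name needs_reindex and the plain dict standing in for the stored timestamps are assumptions for illustration.

```python
import os


def needs_reindex(path, last_indexed):
    """Return True if path changed since the previous indexing run.

    last_indexed is a dict mapping path -> mtime seen at the last run
    (a hypothetical stand-in for whatever the indexer really stores).
    The dict is updated when a file is reported as changed.
    """
    mtime = os.path.getmtime(path)
    if last_indexed.get(path) == mtime:
        # Unchanged: skip the expensive text extraction entirely.
        return False
    last_indexed[path] = mtime
    return True
```

On a second run over an unchanged tree every call returns False, which roughly matches the drop from ~20 min to about 2 min seen here.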
-- 
Reini Urban
Racing Simu and Support
AVL List GesmbH Graz


