[Xapian-discuss] Re: Japanese / UTF-8 support
Reini Urban
rurban at x-ray.at
Sun Aug 13 08:50:53 BST 2006
2006/8/13, Jeff Breidenbach <breidenbach at gmail.com>:
> This is looking promising. Running down my Omega checklist:
>
> * The patch is still too crude to submit, but I've beaten htmlparse.cc
> into respecting <!--htdig_noindex--><!--/htdig_noindex-->
>
> * I've located the 300 character limit on sample size in omindex.cc,
> but am leaving that alone for the time being. Will keep in mind for
> improving summary results later. [1]
>
> * Getting filesize and last modification date in summary results is
> nice to have, but not critical. Putting on backburner.
>
> * I'm now building some flint indices for testing. This will probably
> take about a week to complete. When finished, this may provide
> some interesting benchmarks.
For first tests and benchmarks, smaller databases work much better.
I run a make check in a small test subdir that contains only the major
document types, for both flint and quartz.
The bench runs I do with 30,000 docs. On cygwin they need ~20 min
without the last_mod check, and about 2 min with the last_mod check and
cached extracted "virtual dirs" (zip, msg, ...).
cygwin, pdftotext and xls2csv are slow.
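
To illustrate why the last_mod check saves so much time, here is a rough
sketch of the idea in Python. It is not the omindex code: the function name
files_to_reindex and the last_indexed dict are hypothetical, and a real
indexer would keep the timestamps in the database rather than in memory.

```python
import os


def files_to_reindex(paths, last_indexed):
    """Return the paths that changed since the previous run.

    last_indexed maps path -> mtime recorded at the last indexing run.
    Both the name and the in-memory dict are assumptions for this sketch.
    The dict is updated in place for every path that is returned.
    """
    changed = []
    for path in paths:
        mtime = os.path.getmtime(path)
        # Unknown files get -1.0, so they always count as changed.
        if mtime > last_indexed.get(path, -1.0):
            changed.append(path)
            last_indexed[path] = mtime
    return changed
```

Only the files returned here would go through the slow external filters
(pdftotext, xls2csv, ...); everything else is skipped.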
--
Reini Urban
Racing Simu and Support
AVL List GesmbH Graz