[Xapian-discuss] Xapian with djvu files?

Olly Betts olly at survex.com
Thu Jan 17 03:22:12 GMT 2008


On Tue, Jan 15, 2008 at 10:39:49AM +1100, John Pye wrote:
> Olly Betts wrote:
> > I did actually write a patch a while back for djvu.  I think I didn't
> > apply it because I only actually found a single example file with a text
> > layer, and that only had 20 words of ASCII text.  I like to have a
> > few decent test files (including some with non-ASCII characters) to give
> > me some confidence that a filter program actually works well.  It
> > doesn't seem to be a popular format (John is the first person to ask
> > about support for it), so I just left the matter.
> 
> You can use a free online OCR tool to generate DJVU files that include
> text in them:
> 
> http://any2djvu.djvuzone.org/

But I don't have scans of documents containing non-ASCII characters, so
this isn't going to help much.

I'm happy to add support for djvu (or any other format with a suitable
filter program), but I feel uneasy about doing so when I have little or
no sample data to test with.

If you want support for a format, you presumably have documents in that
format, so feel free to supply a few for testing.  Ideally
redistributable ones (it would be great to be able to put together some
automated tests of omindex), but it's not a big problem if they can't be
made public.

Cheers,
    Olly


