[Xapian-discuss] omindex => Unknown extension

Eric Voisard eric.voisard at atisuher.ch
Mon Apr 6 11:45:40 BST 2009


Hi all,

I'm having a recurring problem with Omega's indexing.
When I run omindex, it sometimes fails to recognize the extension of
some files (.doc, .pdf) and skips them. In the same run, omindex
indexes other files with the same extensions without trouble. The
reason is not clear, but the failure must occur before a content
converter is selected: for example, if I manually run antiword on a
.doc file that was skipped, it works...

Running omindex:
Unknown extension: "/srv/xapian/targets/dir/subdir/file name.doc" - skipping

Manual conversion:
host:/srv # antiword "/srv/xapian/targets/dir/subdir/file name.doc"
<..plain text content of the file...>
host:/srv #

Note that the target directory is a CIFS mount of a remote Windows
shared directory. Charset is UTF-8.
I don't think the whitespace in the file name is the cause, since
other .doc file names containing spaces are indexed fine.
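
One way to narrow this down is to check whether the skipped files have an extension that only looks like .doc or .pdf, for instance one that differs in letter case or carries trailing whitespace. This can happen on a share exported from Windows. The sketch below is not omindex's own logic, and whether omindex treats such extensions differently is an assumption to verify. The helper name suspicious_extensions is hypothetical. It only walks a directory tree and prints the exact extension of each such file:

```python
import os


# Hypothetical helper, not part of omindex: it only reports what the
# extension of each file name looks like, character for character.
def suspicious_extensions(root, expected=("doc", "pdf")):
    """Yield (path, ext) for files whose extension equals an expected one
    only after stripping whitespace and lowercasing."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            _base, dot, ext = name.rpartition(".")
            if not dot:
                continue  # file name has no extension at all
            normalized = ext.strip().lower()
            if normalized in expected and ext != normalized:
                yield os.path.join(dirpath, name), ext


if __name__ == "__main__":
    # Top-level target directory assumed from the paths shown above.
    for path, ext in suspicious_extensions("/srv/xapian/targets"):
        # repr() makes trailing spaces and unusual characters visible.
        print(repr(path), repr(ext))
```

If the skipped files show up in this list and the indexed ones do not, the extension spelling is a likely suspect. If nothing is listed, the cause lies elsewhere.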

Any ideas?

Thanks in advance, Eric
ATIS Uher S.A. 
CH 2046 Fontaines


