[Xapian-discuss] omindex => Unknown extension

Eric Voisard eric.voisard at atisuher.ch
Mon Apr 6 15:04:51 BST 2009


Thanks for your help!

Olly, I doubt it's related to the MIME types: my omindex configuration
is minimal and I didn't alter any of the MIME type mappings. Everything
is at its defaults, and .doc is included.
The proof is that it works with most of the .doc files in the directory
structure being indexed and fails only with some of them (during the
same omindex run).

It never seems to fail with HTML and plain text files (formats that are
built into omindex)...

I tried to narrow down the target directory structure to just one
subdirectory containing MS Word documents that failed previously and I
ran omindex again: this time it worked...


Here is an excerpt of the output from a failed run and from a working
one. The file is the same, at the same location (I had to rename some
things, sorry).

Omindex as called in my script (output below):
omindex --db /srv/xapian/my_index \
	--follow \
	--url /dont_mind/ \
	/srv/xapian/targets/dir/sub1


[Entering directory /sub dir 2/sub dir 3/sub dir 4]
Unknown extension: "/srv/xapian/targets/dir/sub1/sub dir 2/sub dir 3/sub dir 4/File Name.doc" - skipping


Omindex with narrowed directory structure and run from shell (output below):
# omindex --db /srv/xapian/test --follow --url /dont_mind/ \
	"/srv/xapian/targets/dir/sub1/sub dir 2/"


[Entering directory /sub dir 3/sub dir 4]
Indexing "/sub dir 3/sub dir 4/File Name.doc" as application/msword ... added.
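
To compare a full run with a narrowed one, it may help to capture the
whole omindex output and pull out only the skipped paths. This is a
minimal sketch, assuming the messages have the form shown above;
skipped_files and omindex.log are hypothetical names, not part of
omindex:

```shell
# Hypothetical helper: read a captured omindex log on stdin and print the
# path of every file that was skipped with "Unknown extension".
skipped_files() {
    # Keep only the quoted path from lines such as:
    #   Unknown extension: "/path/File Name.doc" - skipping
    sed -n 's/^Unknown extension: "\(.*\)" - skipping$/\1/p'
}

# Usage (paths taken from the failing run above; omindex.log is assumed):
#   omindex --db /srv/xapian/my_index --follow --url /dont_mind/ \
#       /srv/xapian/targets/dir/sub1 > omindex.log 2>&1
#   skipped_files < omindex.log | grep -i '\.doc$'
```

If the same .doc paths show up in the list on one run and index fine on
another, that would support the idea that the skip is not caused by the
file itself.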



It seems it has something to do with the size of the target directory
structure, or the length of the indexing job, and server resources, as
Cedric said...

I ran my indexing job while watching top's output. It is difficult to
say for sure, but starting from about 300MB free out of 1GB total, free
RAM gradually dropped during the run and stabilized at around 10MB.
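
Since watching top by eye is hard to interpret, a timestamped log of
free memory during the run might be easier to line up with the point
where files start being skipped. This is a rough sketch, assuming a
Linux system with /proc/meminfo; mem_free_mib is a hypothetical helper.
Note that MemFree does not count memory the kernel uses for page cache,
so a low value on its own does not prove that RAM is exhausted.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the MemFree value converted from kB to whole MiB.
mem_free_mib() {
    awk '$1 == "MemFree:" { print int($2 / 1024) }'
}

# Usage (assumes omindex was started in the background from this shell,
# so that $! holds its PID; the 5 second interval is arbitrary):
#   omindex --db /srv/xapian/my_index --follow --url /dont_mind/ \
#       /srv/xapian/targets/dir/sub1 > omindex.log 2>&1 &
#   pid=$!
#   while kill -0 "$pid" 2>/dev/null; do
#       printf '%s %s MiB free\n' "$(date +%T)" "$(mem_free_mib < /proc/meminfo)"
#       sleep 5
#   done
```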

Thanks again, Eric


Olly Betts wrote:
> On Mon, Apr 06, 2009 at 01:42:47PM +0200, Cedric Jeanneret wrote:
>> I'm having the same problem here. Solved by adding some RAM to my server.
>> Maybe external calls can't be done properly, and omindex crashes when
>> launching programs such as antiword, pdftotext and so on...
> 
> I find it hard to tell exactly what the problem you were seeing was.
> 
> If there's not enough memory to run an external filter, it will fail to
> run (and omindex sets resource limits to prevent an external filter
> making excessive resource demands).  So you will see some files fail to
> index if you don't have enough RAM.
> 
> But we should *NOT* be removing the mime mapping in this case (which is
> what Eric seems to be describing).  If we are, that's a bug.  I looked
> at the code for this and it appears to be correct, so I'll need some
> output from omindex if you think you are seeing this.
> 
> Cheers,
>     Olly
> 
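
One way to check whether an external filter fails under memory pressure
is to run it by hand on a failing file with a cap on its memory. This is
only a sketch: run_limited is a hypothetical helper, the 65536 KiB cap
is an arbitrary value and not the limit omindex itself applies, and
"ulimit -v" is a shell extension (bash, dash, ksh) rather than POSIX.

```shell
# Hypothetical helper: run a command with a cap on virtual memory (KiB).
# The subshell keeps the limit from affecting the calling shell.
run_limited() {
    limit_kib=$1
    shift
    ( ulimit -v "$limit_kib" && "$@" )
}

# Usage (assumed values; antiword is one of the filters mentioned above):
#   run_limited 65536 antiword \
#       "/srv/xapian/targets/dir/sub1/sub dir 2/sub dir 3/sub dir 4/File Name.doc" \
#       > /dev/null
#   echo "exit status: $?"
```

A non-zero exit status under the cap, with success when run without it,
would suggest the filter itself is running out of memory; it would not
by itself explain an "Unknown extension" message.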


ATIS Uher S.A. 
CH 2046 Fontaines


