[Xapian-discuss] omindex one file at a time?

Olly Betts olly at survex.com
Fri Dec 14 00:35:48 GMT 2012


On Thu, Dec 13, 2012 at 08:05:38AM -0500, Will Partain wrote:
> Hi, all -- I want to do Plain Old Omindex'ing *but* the mapping
> between my documents' filenames and the URLs where I hope search
> users will find them is, uh..., strange.  The simplest thing (to
> me) would be to run omindex for each document, e.g.
> 
>   omindex --no-delete -U /cool-url-1 /funky/doc/file-blah.pdf
>   omindex --no-delete -U /cool-url-7 /doc/funky/ohmy/blah-file.txt
>   ... and so on...
> 
> Of course, this doesn't work because the pathnames don't signify
> directories.  I'm guessing the same thing can be done with
> 'scriptindex' -- but I really want what just plain old omindex
> does.

Running omindex once for each document will be slow.  If you have a lot
of documents, you really want to batch updates for good indexing
performance.

> A horrible? way might be to copy each document into a temp
> directory and run omindex -- but I'm guessing the URLs would come
> out wrong (it would append the filename onto the end).

I'd just symlink them all into a temporary directory structure and use
-f so omindex will follow the symlinks - e.g.:

$ mkdir tmp
$ ln -s /home/olly/git/survex/doc/manual.pdf tmp/cool-url-1
$ ln -s /home/olly/tmp.txt tmp/cool-url-7
$ ./omindex --db cool-url.db -f tmp
$ delve cool-url.db -1a|grep U
U/cool-url-1
U/cool-url-7

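The symlink names have no file extension, so omindex can't pick the MIME
type from the name and has to fall back to libmagic.  So this will work
so long as your omindex was built with libmagic (which is optional in
1.2.x, but a hard requirement on trunk) and libmagic can detect the
filetype from the contents of the file.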
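
For a larger set of documents, the symlink tree can be generated from a
mapping file and then indexed in a single omindex run, which keeps the
updates batched.  The following is only a minimal sketch under stated
assumptions: the function name build_link_tree, the file url-map.tsv (one
tab-separated "URL path, source file" pair per line) and the directory
name links are all hypothetical.  Only the omindex invocation in the
trailing comment is taken from the example above.

```shell
#!/bin/sh
# Sketch: create a tree of symlinks whose relative paths are the desired
# URL paths, so that one omindex run can index every document.

# build_link_tree MAP_FILE LINK_DIR
#   MAP_FILE (hypothetical format): one "URL-path<TAB>source-file" per line.
#   LINK_DIR: directory to populate; it is created if missing.
build_link_tree() {
    map_file=$1
    link_dir=$2
    mkdir -p "$link_dir"
    # Split each line on the first tab only; the rest goes into $src.
    while IFS="$(printf '\t')" read -r url src; do
        # Skip blank lines.
        [ -n "$url" ] || continue
        # Drop one leading slash so the link lands inside $link_dir.
        url=${url#/}
        # Create intermediate directories for nested URL paths.
        mkdir -p "$(dirname "$link_dir/$url")"
        # -f replaces a stale link left over from an earlier run.
        ln -sf "$src" "$link_dir/$url"
    done < "$map_file"
}

# Example use (assumes url-map.tsv exists and omindex is on the PATH):
#   build_link_tree url-map.tsv links
#   omindex --db cool-url.db -f links
```

With this layout a nested URL path such as /docs/cool-url-7 becomes a
subdirectory of links, so the indexed URL should follow the directory
structure in the same way as in the two-file example.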

Cheers,
    Olly


