[Xapian-devel] Omega changes

Olly Betts olly at survex.com
Fri Dec 17 16:12:37 GMT 2004


On Fri, Dec 17, 2004 at 02:15:34PM +0000, Richard Boulton wrote:
> I propose changing the configuration file search to read an environment
> variable "OMEGA_CONFIG_FILE".  If this is set, the configuration will be
> read from the file whose path is in the environment variable.  If this
> is not set, the configuration will be read from $sysconfdir/omega.conf
> (where $sysconfdir defaults to /etc, but can be set by parameters
> to ./configure).  If the configuration file specified cannot be read,
> default values will be used.

I'm not totally sold on this.

Setting an environment variable (at least for Apache) requires admin
access to the webserver configuration, or (assuming it's configured to
allow you to) the creation of a .htaccess file, or some sort of wrapper
around the CGI (e.g. a shell script which exports the variable and execs
omega).  If .htaccess exists, the server has to read it for anything
served from that directory, which is potentially quite an overhead.
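
To make the wrapper option concrete, here is a minimal sketch of what such
a wrapper has to do, written in Python rather than shell.  The function
names and the paths passed in are hypothetical; only the variable name
OMEGA_CONFIG_FILE comes from the proposal.

```python
import os


def omega_wrapper_env(base_env, config_path):
    """Return a copy of base_env with OMEGA_CONFIG_FILE set.

    OMEGA_CONFIG_FILE is the variable name from the proposal; this helper
    itself is hypothetical and not part of omega.
    """
    env = dict(base_env)
    env["OMEGA_CONFIG_FILE"] = config_path
    return env


def run_omega(omega_path, config_path):
    """Replace the wrapper process with omega, like "exec" in a shell script.

    The CGI variables the web server set (QUERY_STRING etc.) are passed
    through untouched because we start from os.environ.
    """
    os.execve(omega_path, [omega_path],
              omega_wrapper_env(os.environ, config_path))
```

The web server would then be pointed at the wrapper instead of at omega
itself, which costs an extra process start-up on every request.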

If the only other option is to build from source and set sysconfdir in
configure, then if I want to use omega on a server where it's *already
installed*, I'm forced to use a wrapper, .htaccess (if I'm able to), or
to compile my own separate version, which then means I need to worry
about any security patches.  It also wastes my disk quota (or shared
disk space).  Heck, I may not even have access to a compiler on a box
intended for hosting!

I'm not convinced that looking for omega.conf where omega was run from
is worse than this situation.
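
As a rough illustration only, the alternatives discussed here could be
combined into a single lookup order like the one below.  The ordering,
the function name, and the choice to fall through to the next candidate
when a file is missing are all assumptions made for the sketch; none of
this has been agreed in this thread, and it is not omega's actual code.

```python
import os
import posixpath


def find_omega_config(environ, cwd, sysconfdir="/etc", exists=os.path.isfile):
    """Return the first omega.conf candidate that exists, or None.

    Assumed order (hypothetical):
      1. the file named by OMEGA_CONFIG_FILE, if that variable is set
      2. omega.conf in the directory omega was run from
      3. omega.conf in sysconfdir
    """
    candidates = []
    env_path = environ.get("OMEGA_CONFIG_FILE")
    if env_path:
        candidates.append(env_path)
    candidates.append(posixpath.join(cwd, "omega.conf"))
    candidates.append(posixpath.join(sysconfdir, "omega.conf"))

    for path in candidates:
        if exists(path):
            return path
    # Nothing found: the caller would fall back to built-in defaults.
    return None
```

The `exists` parameter is there only so the lookup can be exercised
without touching the real filesystem.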

> 4) Finally, I propose changing the way in which omega and omindex map
> file locations to urls.  Currently, the URL at which a document is
> displayed is stored in each document in the Xapian database.  This has
> the obvious drawback that the index needs to be regenerated if a server
> is reconfigured (for example, change of hostname, or change of path
> within the server).

Although omindex doesn't build the hostname in unless you tell it to,
by specifying it on the command line.

On the flip-side, with the current scheme, I can move files around on
disk (changing the pathnames) and the index will continue to work
provided I reconfigure the http server to serve them with the same
paths, with alias or using mod_rewrite.  If the pathnames are built
into the index, I have to rebuild in this situation.

Also, with the current scheme the work of translating paths is done
once, at index time, rather than at search time.  Usually it's minor,
but if you have a lot of mappings it may not be.  And the pathnames
will almost inevitably be longer than the URLs, which means a bigger
index.

I'm also not quite sure how this would work with content from
scriptindex which came from a database and provides URLs through
a CGI gateway or similar - there are no pathnames to specify.
Similarly for crawled content.  Similarly for indexing a newsfeed
to produce nntp: or news: URLs.  Would omega's default template
look for both a url field and pathnames?

In fact, as I think about this more - pathnames are just an artifact of
how omindex gets the data to index (i.e. reading it from separate files
on local disk), so it feels kind of wrong that omega would need to care
about them...

> Instead, omindex would store the local path of the document in the
> database, and would store no information about the URLs at which
> documents are available externally.  Omega would be provided with a
> translation table in each database from local file prefix to external
> file prefix, and would use this to generate the external URLs.  I've
> used this scheme with other systems, so I know it can be made to work,
> but it would require some changes to applications currently using
> omindex.

Both schemes can be made to work, but it's not really clear to me that
either scheme is inherently better than the other.  There are minor
benefits either way.  And the current scheme has the enormous advantage
that it's already implemented and debugged!
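
For concreteness, the translation table in the proposal (local file
prefix to external URL prefix) might behave roughly like the sketch
below.  The function name, the example table, and the longest-prefix-wins
rule are assumptions for illustration; the proposal doesn't specify how
overlapping prefixes would be resolved, and URL escaping is ignored here.

```python
def path_to_url(path, prefix_map):
    """Map a local pathname to a URL using a prefix translation table.

    prefix_map maps local path prefixes to URL prefixes.  When several
    prefixes match, the longest one wins (an assumption of this sketch).
    Returns None when no prefix matches.
    """
    best = None
    for local_prefix in prefix_map:
        if path.startswith(local_prefix):
            if best is None or len(local_prefix) > len(best):
                best = local_prefix
    if best is None:
        return None
    # Swap the matched local prefix for its URL prefix, keep the rest.
    return prefix_map[best] + path[len(best):]


# Hypothetical table of the kind omega would read for each database.
EXAMPLE_MAP = {
    "/var/www/": "http://example.org/",
    "/var/www/docs/": "http://docs.example.org/",
}
```

Note that content with no local pathname (scriptindex output, crawled
pages, news articles) would return None here, which is the gap raised
above.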

> Finally, is there a problem with making any of these changes whilst
> we're within the 0.8.x version cycle, or is the expectation that the
> workings of omega and related tools will be reasonably stable within
> this cycle, as the API of libxapian is.

I think we should try to constrain incompatible changes to x.x.0
versions across the board.  Ditto for major reworkings which have
an increased risk of introducing bugs.  But our resources are
limited so we have to be reasonably pragmatic, and at least try to fix
breakage quickly.

I'd suggest seeing where we are with releases when this stuff is
ready to go in.

Cheers,
    Olly



