[Xapian-devel] Omega changes

Richard Boulton richard at tartarus.org
Fri Dec 17 14:15:34 GMT 2004


I propose making a few changes to the way omega (and omindex) operate.
I'm posting these to the list before doing so to check if they'll cause
obvious problems for anyone.

1) Configuration handling for omega.  Omega has a configuration file,
which specifies where databases, templates and logfiles are to be found.
It currently looks for this configuration file in its current working
directory (which will usually be the directory in which the binary is
located).  If the configuration file is not found there (or is
unreadable), it looks at /etc/omega.conf, and if this doesn't exist, it
uses default values.

Reading a configuration file from the current working directory seems
bad practice to me, and is a potential, albeit small, security risk:
care needs to be taken to avoid serving the file to clients.

I propose changing the configuration file search to read an environment
variable "OMEGA_CONFIG_FILE".  If this is set, the configuration will be
read from the file whose path is in the environment variable.  If this
is not set, the configuration will be read from $sysconfdir/omega.conf
(where $sysconfdir defaults to /etc, but can be set by parameters
to ./configure).  If the configuration file specified cannot be read,
default values will be used.
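
To make the proposed search order concrete, here is a minimal sketch in
Python (omega itself is C++, so this is illustration only).  The
function names, the SYSCONFDIR constant and the "key value" line format
are assumptions made for the example, not a description of omega's
actual code.

```python
import os

# Assumption: stands in for $sysconfdir, which would be fixed at ./configure time.
SYSCONFDIR = "/etc"


def find_config_path(environ=os.environ, sysconfdir=SYSCONFDIR):
    """Return the one path that would be consulted under the proposal."""
    override = environ.get("OMEGA_CONFIG_FILE")
    if override is not None:
        # The environment variable is set: use it and nothing else.
        return override
    return os.path.join(sysconfdir, "omega.conf")


def load_config(environ=os.environ, sysconfdir=SYSCONFDIR, defaults=None):
    """Start from the defaults and overlay whatever the file provides.

    Assumes a simple "key value" line format with "#" comments; the
    real file format is not described in this message.
    """
    config = dict(defaults or {})
    path = find_config_path(environ, sysconfdir)
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                parts = line.split(None, 1)
                if len(parts) == 2:
                    config[parts[0]] = parts[1]
    except OSError:
        # Missing or unreadable file: fall back to the default values.
        pass
    return config
```

Note that in this sketch a set-but-unreadable OMEGA_CONFIG_FILE goes
straight to the defaults rather than falling through to
$sysconfdir/omega.conf, which is how I read the proposal above.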

2) Updating of omindex databases.  Currently, when run with the
"replace" duplicates option, omindex will index from scratch each
document found, even if it is already in the index.  In most situations,
it would be more desirable to reindex only those files whose
modification time has changed.  I propose to implement this as a new
duplicates option (call it "timestamp"), and make it the default
duplicates option.

The omega templates already support documents containing a field named
"modtime", holding a time_t timestamp, but omindex doesn't produce such
a field.  The only change to the data stored in the database would be to
add this field to the document contents.  With the default templates,
this would cause the last-modified time of the document to be displayed
in search results, but this could easily be suppressed if desired.
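
The decision the "timestamp" option would make for each file can be
sketched roughly as follows (Python, for illustration only).  The
function names are hypothetical, and the stored_modtimes dictionary is
an assumed stand-in for reading the "modtime" field back from the
database.

```python
import os


def needs_reindex(current_mtime, stored_modtime):
    """Decide whether a file should be reindexed.

    stored_modtime is the time_t value from the document's "modtime"
    field, or None if the document is not in the index yet.
    """
    if stored_modtime is None:
        return True
    # Compare whole seconds, since the field holds a time_t.
    return int(current_mtime) != int(stored_modtime)


def files_to_reindex(paths, stored_modtimes):
    """Return the subset of paths whose modification time has changed.

    stored_modtimes maps path -> stored modtime (hypothetical stand-in
    for a database lookup).
    """
    return [
        p for p in paths
        if needs_reindex(os.stat(p).st_mtime, stored_modtimes.get(p))
    ]
```

The comparison is for inequality rather than "newer than", so a file
restored from backup with an older timestamp would also be reindexed.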

Actually, Olly suggested that it might be sensible to remove the
duplicates options entirely, and simply default to the behaviour
specified above.  Does anyone actually use omindex with a --duplicates
option other than "replace"?

3) Add database-specific configuration files to omindex, which are used
to specify how a database has been indexed.  These configuration files
could consist simply of the command line options used, or possibly
equivalent information in an easy-to-parse format.  The configuration
file could be used by omega to configure the query parser, and other
search options, appropriately for the database being searched.

In addition to current options, these configuration files could specify
which information to store in the 

4) Finally, I propose changing the way in which omega and omindex map
file locations to URLs.  Currently, the URL at which a document is
displayed is stored in each document in the Xapian database.  This has
the obvious drawback that the index needs to be regenerated if a server
is reconfigured (for example, change of hostname, or change of path
within the server).

Instead, omindex would store the local path of the document in the
database, and would store no information about the URLs at which
documents are available externally.  Omega would be provided with a
translation table in each database from local file prefix to external
file prefix, and would use this to generate the external URLs.  I've
used this scheme with other systems, so I know it can be made to work,
but it would require some changes to applications currently using
omindex.
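
As a rough illustration of the translation step, a minimal sketch in
Python follows.  The function name, the dictionary representation of
the table and the example.org URLs are all assumptions made for the
example; how the table would actually be stored in each database is
not settled here.

```python
from urllib.parse import quote


def path_to_url(local_path, prefix_map):
    """Translate a local file path into an external URL.

    prefix_map maps local path prefixes to external URL prefixes.
    Prefixes are assumed to end with "/" so that "/var/www/" does not
    accidentally match "/var/www2/...".  The longest matching prefix
    wins, so a more specific mapping overrides a general one.
    """
    for local_prefix in sorted(prefix_map, key=len, reverse=True):
        if local_path.startswith(local_prefix):
            remainder = local_path[len(local_prefix):]
            # Percent-encode characters that are not safe in a URL path.
            return prefix_map[local_prefix] + quote(remainder)
    # No mapping covers this path, so no external URL can be generated.
    return None


# Hypothetical translation table for one database.
table = {
    "/var/www/": "http://example.org/",
    "/var/www/manual/": "http://docs.example.org/",
}
print(path_to_url("/var/www/manual/intro.html", table))
```

With a scheme like this, a change of hostname or server path needs only
an edit to the table, with no reindexing.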


Finally, is there a problem with making any of these changes whilst
we're within the 0.8.x version cycle, or is the expectation that the
workings of omega and related tools will be reasonably stable within
this cycle, as the API of libxapian is?

-- 
Richard Boulton <richard at tartarus.org>




