[Xapian-devel] Omega changes
james at tartarus.org
Fri Dec 17 17:16:34 GMT 2004
On Fri, Dec 17, 2004 at 04:12:37PM +0000, Olly Betts wrote:
> > I propose changing the configuration file search to read an environment
> > variable "OMEGA_CONFIG_FILE".
> Setting an environment variable (at least for apache) requires admin
> access to the webserver configuration, or (assuming it's configured to
> allow you to) the creation of a .htaccess file, or some sort of wrapper
> around the CGI (e.g. a shell script which exports the variable and execs
> omega). If .htaccess exists, the server has to read it for anything
> served from that directory, which is potentially quite an overhead.
If you can't affect your VHOST config, and you can't use .htaccess,
and you can't write a simple wrapper (we could supply one) then you
should move to a different hosting provider. Seriously. I can't stress
enough how much of a security hole I see this as being, because the
alternatives are (a) to write a Redirect/RedirectMatch directive in
your .htaccess/VHOST/directory or server config, or (b) expose
implementation details. And the default is (b).
Life is bad enough with people using PHP in shared hosting
environments without us supplying a CGI binary in C (which most people
can't audit themselves) that relies on a configuration mechanism that
is insecure. Many people are going to drop their databases into their
HTTP serving root and rely on obscurity to conceal them; making it
easy to find out where they are is even worse.
This is only a problem in shared hosting environments, but that's the
most common type. And I doubt there are many major ISPs that don't
have a poorly-written PHP script lying around that someone can use to
download or - worse - alter arbitrary files on the system, given the
filename. (This is mitigated if something like suexec is being used,
which isn't always the case, and if permissions on the database and
config file are set carefully, which isn't always the case either.)
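A minimal sketch of the mechanism being argued over, written in Python purely for illustration (omega itself is a compiled CGI): read the config path from OMEGA_CONFIG_FILE with a fallback, and build the environment that a wrapper of the kind offered above would pass to omega before exec'ing it. DEFAULT_CONFIG, find_config, wrapper_env and run_wrapper are hypothetical names, not part of Omega.

```python
import os

# Hypothetical fallback location; not Omega's actual default.
DEFAULT_CONFIG = "omega.conf"


def find_config(environ):
    """Return the config path named by OMEGA_CONFIG_FILE, else the default."""
    path = environ.get("OMEGA_CONFIG_FILE")
    # Treat an empty value the same as an unset one.
    return path if path else DEFAULT_CONFIG


def wrapper_env(environ, config_path):
    """Copy the CGI environment and set OMEGA_CONFIG_FILE, as a
    wrapper script would export it before exec'ing omega."""
    env = dict(environ)
    env["OMEGA_CONFIG_FILE"] = config_path
    return env


def run_wrapper(omega_binary, config_path):
    """Replace the current process with omega, like a shell
    wrapper's `exec omega`."""
    env = wrapper_env(os.environ, config_path)
    os.execve(omega_binary, [omega_binary], env)
```

In this sketch the wrapper gets the config path from its own arguments, never from the HTTP request, so nothing about the file's location has to appear in a URL.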
> > 4) Finally, I propose changing the way in which omega and omindex map
> > file locations to urls. Currently, the URL at which a document is
> > displayed is stored in each document in the Xapian database. This has
> > the obvious drawback that the index needs to be regenerated if a server
> > is reconfigured (for example, change of hostname, or change of path
> > within the server).
> On the flip-side, with the current scheme, I can move files around on
> disk (changing the pathnames) and the index will continue to work
> provided I reconfigure the http server to serve them with the same
> paths, with alias or using mod_rewrite. If the pathnames are built
> into the index, I have to rebuild in this situation.
I think Richard is proposing having the URL for a document built out
of fields, /not/ being able to reconstruct the underlying filename
(which I agree isn't a good plan).
(Actually, I've just re-read Richard's email, and you're right about
what he means. But I think we can get what Richard wants and still
work with URLs in the database.)
> Also the work of translating paths is done at index time. Usually
> it's minor, but if you have a lot of mappings it may not be. And
> the pathnames will almost inevitably be longer than the URLs, which
> means a bigger index.
Again, I don't think this is a problem. If the document knows that it
is a resource at URL url.join(base, local) then it's not much more
complex than now, and does provide the ability to move the base.
Of course, this may not actually be that useful. I don't think I've
ever done something like that, to be honest.
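A minimal sketch of the `url.join(base, local)` idea, assuming it means Python-style `urllib.parse.urljoin` semantics: each document stores only its relative part, and the base URL is supplied when results are displayed, so moving the base needs no reindex. `document_url` and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def document_url(base, local):
    """Resolve a document's stored relative part against a base URL
    configured at search time."""
    # urljoin replaces the last path segment of a base without a
    # trailing slash, so treat the base as a directory explicitly.
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, local)


print(document_url("http://example.org/docs", "manual/intro.html"))
print(document_url("https://new.example.com/archive/", "manual/intro.html"))
```

The stored `local` part should not start with "/": a leading slash makes urljoin discard the base's path entirely.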
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org