[Xapian-discuss] Using Xapian for webserver logs...?

John Pye john.pye at student.unsw.edu.au
Wed May 17 08:56:00 BST 2006


Hi all,

I recently tried out Xapian and used it to create an index of about 2000
PDF files. Indexing took a while, but it served my needs very nicely
and was simple to set up.

I have another idea for an application of the Xapian indexing system. I
think that it's probably not exactly what Xapian is all about, but
nevertheless, I wonder if you have any comments or alternative suggestions.

The aim is to provide a system that indexes Apache web server logs for a
news-style website content management system. We index the articles,
issues and sections of a set of monthly or weekly publications. Articles
have topic tags, and we also have information about who (username) is
visiting our site, when, and from where.

What we want to be able to do is to index the webserver's accesses so
that we can do full drill-down and find all hits from people in a
particular country on a particular day, or all hits on a particular
article, etc.

I thought that Xapian, particularly using its boolean mode of operation,
might be suitable for this type of indexing and querying in a way that
a conventional RDBMS perhaps is not. Each 'hit' would become a 'document'
in Xapian, so there would soon be millions of 'documents', each with
relatively few 'keywords' such as username, date, article title, etc.
Would you agree with that thought? If not, would you suggest a different
approach, perhaps some more suitable software? I was thinking of Splunk
and wondering how they might have implemented their system. Would such
indexing and search be feasible on a single shared server?
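
As a rough sketch of the "one hit, one document" idea, the snippet below
turns a single Apache log line into a handful of boolean-style keywords.
It is plain Python with no Xapian calls; the function name hit_to_terms
and the X-prefixed term names (XHOST, XDATE, XPATH, XSTATUS, XUSER) are
made up for illustration, and it assumes the Apache common log format.
Country would need a separate IP-to-country lookup, which is left out.

```python
import re
from datetime import datetime

# Apache common log format: host ident user [time] "request" status bytes
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def hit_to_terms(line):
    """Map one log line to a list of prefixed keywords, or None if unparseable.

    The prefixes are hypothetical names chosen for this sketch.
    """
    m = LOG_RE.match(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    terms = [
        "XHOST" + m.group("host"),
        "XDATE" + when.strftime("%Y%m%d"),
        "XPATH" + m.group("path"),
        "XSTATUS" + m.group("status"),
    ]
    # Apache writes "-" when the request was not authenticated.
    if m.group("user") != "-":
        terms.append("XUSER" + m.group("user"))
    return terms
```

Each returned list would be the full set of keywords for one 'document',
which is why the documents stay small even when there are millions of them.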

Is it possible to output aggregate and time-series data from Xapian, or
is it only possible to get ranked search results? My experience so far
is just with Omega, so I'm not sure what the possibilities with the API
might be here.
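
To make "aggregate and time-series data" concrete, here is a toy
in-memory version of the output meant above: a boolean AND over
keyword sets, followed by a per-day count of the matching hits. This is
plain Python and says nothing about what the Xapian API offers; the
function names and the XDATE/XPATH/XUSER prefixes are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Build a toy inverted index: keyword -> set of document ids."""
    index = defaultdict(set)
    for doc_id, terms in enumerate(docs):
        for term in terms:
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Return the ids of documents that contain every one of the terms."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def hits_per_day(docs, doc_ids):
    """Count the matching documents per XDATE keyword, sorted by date."""
    counts = Counter()
    for doc_id in doc_ids:
        for term in docs[doc_id]:
            if term.startswith("XDATE"):
                counts[term[len("XDATE"):]] += 1
    return dict(sorted(counts.items()))
```

For example, boolean_and(index, ["XPATH/articles/42"]) followed by
hits_per_day would give the daily hit counts for one article, which is
the drill-down plus time-series combination described earlier.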

Has anyone used Xapian in this kind of way?

Any suggestions much appreciated,

Cheers
JP

-- 
http://www.curioussymbols.com/
