[Xapian-discuss] Using Xapian for webserver logs...?

Michael Schlenker schlenk at uni-oldenburg.de
Wed May 17 09:22:16 BST 2006


John Pye wrote:
> I have another idea for an application of the Xapian indexing system. I
> think that it's probably not exactly what Xapian is all about, but
> nevertheless, I wonder if you have any comments or alternative suggestions.
> 
> The aim is to provide a system that indexes Apache web server logs for a
> news-style website content management system. We index articles, issues,
> sections of a set of monthly or weekly publications. Articles have topic
> tags and we also have information about who (username) is visiting our
> site, and when and from where.
> 
> What we want to be able to do is to index the webserver's accesses so
> that we can do full drill-down and find all hits from people in a
> particular country on a particular day, or all hits on a particular
> article, etc.

Sounds more like you want an RDBMS, and to do data warehousing/decision
support type work with it.

> 
> I thought that Xapian, particularly using its boolean mode of operation,
> might be suitable for this type of indexing and querying in a way that
> perhaps conventional RDBMS are not. Each 'hit' would become a 'document'
> in Xapian, so there would soon be millions of 'documents' but with
> relatively few 'keywords' such as username, date, article title, etc. 
> Would you agree with that thought? If not, would you suggest a different
> approach, perhaps some more suitable software? I was thinking of Splunk
> and wondering how they might have implemented their system. Would such
> indexing and search be feasible with a single shared server?

You could do things like that with Xapian's API; the main question is
'why?'. You don't seem to need any meaningful fulltext search.

I would simply parse the logfiles, store the 'dimensions' you're
interested in in a suitable RDBMS, and then use that to drill down. An
RDBMS is probably more suitable for this task, but you have to invest
some time to design proper table structures for the type of questions
you want answered.
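
As a rough sketch of what I mean, assuming Apache's common/combined log
format and using SQLite only as a stand-in for whatever RDBMS you pick.
The table name 'hits', its columns, and the sample log lines are all made
up for illustration; a 'country' column would need a separate IP-to-country
lookup, which is not shown here.

```python
import re
import sqlite3
from datetime import datetime

# Prefix shared by Apache's common and combined formats:
# host ident user [time] "method path protocol" status bytes
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return (host, username, day, path, status), or None if unparseable."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return (m.group("host"), m.group("user"), ts.date().isoformat(),
            m.group("path"), int(m.group("status")))


def load(conn, lines):
    """Create the (hypothetical) hits table and insert all parseable lines."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(host TEXT, username TEXT, day TEXT, path TEXT, status INTEGER)"
    )
    # Index the dimensions you expect to drill down on.
    conn.execute("CREATE INDEX IF NOT EXISTS hits_day ON hits (day)")
    conn.execute("CREATE INDEX IF NOT EXISTS hits_path ON hits (path)")
    rows = [r for r in map(parse_line, lines) if r is not None]
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Invented sample data; the last entry is deliberately malformed.
SAMPLE = [
    '192.0.2.10 - alice [17/May/2006:09:22:16 +0100] "GET /articles/42 HTTP/1.1" 200 5120',
    '192.0.2.11 - bob [17/May/2006:10:01:03 +0100] "GET /articles/42 HTTP/1.1" 200 5120',
    '192.0.2.10 - alice [18/May/2006:08:00:00 +0100] "GET /articles/7 HTTP/1.1" 200 2048',
    'not a log line',
]

conn = sqlite3.connect(":memory:")
loaded = load(conn, SAMPLE)

# Drill-down: hits per article on one particular day.
hits_by_article = conn.execute(
    "SELECT path, COUNT(*) FROM hits WHERE day = ? GROUP BY path ORDER BY path",
    ("2006-05-17",),
).fetchall()
```

Further questions (all hits by one user, all hits on one article) are then
just different WHERE/GROUP BY clauses on the same table.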

What can be useful is combining Xapian with an RDBMS: Xapian indexes the
documents for fulltext search, as an alternative access path to metadata
retrieved from the RDBMS. It depends on your application. For web server
log files I don't see it.

Michael


