MediaWiki to Omega
aldennisa15
xapian5485 at aldennis.me.uk
Sat Feb 24 09:59:29 GMT 2018
I use Omega to index and search an archive of magazine and ebook PDFs,
among other things. I also have a wiki (running MediaWiki) that I wanted
to include in that index too.
If it's any use to anybody, I've adapted dbi2omega to export the pages
from MediaWiki and shared it on GitHub - search for mediawiki2omega.
It doesn't do anything very clever, but it might save someone time
figuring out the MediaWiki database and the scriptindex fields. Feel
free to correct me if I've not understood the xapian fields properly!
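For anyone who hasn't met the scriptindex input format, here is a rough
Python sketch of the kind of record a converter needs to emit. It is not
the code from mediawiki2omega (which is adapted from dbi2omega), and the
field names url, title and text are only my assumptions for illustration.
Each record is a run of name=value lines, extra lines of a multi-line
value start with "=", and a blank line ends the record.

```python
def to_scriptindex_record(fields):
    """Format one document as a scriptindex input record.

    `fields` is a list of (name, value) pairs. The names used below are
    hypothetical and must match whatever the index script defines.
    """
    lines = []
    for name, value in fields:
        # Normalise line endings, then split a multi-line value.
        parts = str(value).replace("\r\n", "\n").split("\n")
        lines.append(f"{name}={parts[0]}")
        # Continuation lines start with "=" so that an empty line inside
        # the page text is not mistaken for the end of the record.
        lines.extend(f"={part}" for part in parts[1:])
    # A blank line terminates the record.
    return "\n".join(lines) + "\n\n"


# Hypothetical page data standing in for a row read from MediaWiki.
page = [
    ("url", "/wiki/Main_Page"),
    ("title", "Main Page"),
    ("text", "Welcome to the wiki.\nSecond line of wikitext."),
]
record = to_scriptindex_record(page)
print(record, end="")
```

A matching index script might contain lines such as
"url : field boolean=Q unique=Q", "title : field weight=3 index" and
"text : index", but treat those as an example rather than what
mediawiki2omega actually ships.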
I'm sure it could be improved, for example by doing something with
categories, exporting the Talk:, User: and other namespaces, or
removing/converting wiki markup in some way. But what it does now works
just great for me. I wasn't looking to replace the (awful) MediaWiki
search (I use SphinxSearch for that, which is vastly better than the
built-in search).
I just wanted a single search point for finding those nuggets I knew I
have hidden away somewhere.
As an aside, why do the scriptindex field definitions have to live in a
separate file? Couldn't they go in-line before the data (much like the
column names in a CSV header line)? When parsing the data stream, lines
like caption:xxx would be read as field definitions, and lines like
caption=xxx would be text for indexing - not difficult, surely? It would
mean a converter such as mediawiki2omega could generate a single stream
that could just be piped into scriptindex, without needing a separate
script file for the field definitions.
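To show what I mean, here is a small Python sketch of the parsing rule.
This is not something scriptindex does today; the combined format and
the function name split_combined_stream are purely hypothetical. It
assumes the definitions come first, like a CSV header, and that a
definition line is one where field names are followed by a colon before
any "=" appears.

```python
import re

# A definition line starts with one or more field names followed by ":".
# A data line always has "=" directly after the field name (or starts
# with "=" for a continuation), so it can never match this pattern.
DEFINITION = re.compile(r"^[A-Za-z0-9_ ]+:")


def split_combined_stream(lines):
    """Split a hypothetical combined stream into (script, data) lines."""
    script, data = [], []
    in_header = True
    for line in lines:
        if in_header:
            if DEFINITION.match(line):
                script.append(line)
                continue
            if not line.strip():
                # Skip blank lines between the definitions and the data.
                continue
            in_header = False
        data.append(line)
    return script, data


stream = [
    "url : field boolean=Q unique=Q",
    "title : field weight=3 index",
    "",
    "url=/wiki/Main_Page",
    "title=Note: a colon in the data is fine",
    "",
]
script, data = split_combined_stream(stream)
print(script)
print(data)
```

The two halves are exactly what has to be supplied separately at the
moment, the first as the index script file and the second as the input
data.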