[Xapian-discuss] Indexing database to work with Omega
Olly Betts
olly at survex.com
Fri Dec 16 17:14:52 GMT 2005
On Fri, Dec 16, 2005 at 02:50:31PM -0200, Rafael Jorge wrote:
> 1 - Read 1000 rows from table
> 2 - Replace all non-printable characters (like ".:></[]" etc) with ' '
> (space)
> 3 - Replace '  ' (double space) with ' ' (single space)
> 4 - Open xapian database, and create a variable of type Xapian::Document
> 5 - For each row split the text into words using ' ' (space) as delimiter
> 6 - For each word insert into document (doc.add_posting(word, index))
> 7 - Clean up the document, and go back to 5 until all rows are processed
> 8 - Close database
Assuming step 7 includes database.add_document(document) then you're not
far off, but you also need to set the document data so Omega can display
results. Omega expects this to contain a series of fields, one per
line, of the form:
fieldname=fieldvalue
See the section "Document data construction" near the end of overview.txt
in the Omega documentation.
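As a rough illustration of the steps above, here is a minimal sketch in plain C++ (no Xapian headers needed to compile it). The helper names split_words and build_document_data are hypothetical, not part of Xapian or Omega, and the field names used in the usage note are only examples. Lowercasing the words is an assumption on my part, as is handling ASCII text only.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split text into words, treating every
// non-alphanumeric character as a separator. This folds steps 2, 3 and 5
// of the list into one pass. Lowercasing is an assumption, and only
// ASCII is handled (bytes above 127 act as separators in the "C" locale).
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

using Field = std::pair<std::string, std::string>;

// Hypothetical helper: build the document data as one
// "fieldname=fieldvalue" entry per line.
std::string build_document_data(const std::vector<Field>& fields) {
    std::string data;
    for (const Field& field : fields) {
        // A name containing '=' or a newline could not be read back.
        assert(field.first.find_first_of("=\n") == std::string::npos);
        std::string value = field.second;
        // Assumption: keep each value on one line by flattening line breaks.
        for (char& ch : value) {
            if (ch == '\n' || ch == '\r') ch = ' ';
        }
        if (!data.empty()) data += '\n';
        data += field.first + "=" + value;
    }
    return data;
}
```

In a real indexer, each word returned by split_words would go to doc.add_posting(word, position), the string from build_document_data (for example with fields such as caption and sample, if your Omega templates use those names) would go to doc.set_data(), and the document would then be passed to database.add_document(doc).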
Incidentally, you might also want to consider using scriptindex instead
of coding up your own indexer. It was written to allow easy indexing of
information extracted from SQL databases and the like.
Cheers,
Olly