[Xapian-discuss] Practical example/explanation using an existing database

Jim jim at fayettedigital.com
Tue Jul 24 12:52:18 BST 2007


Edwin Smulders wrote:
> Hi,
>
> I'm reading up on the usage of Xapian to find out if we can use it for
> Wine's Application Database, and I'm having a bit of trouble seeing
> the general picture. I could use some practical information (through
> words or code) on how to search an existing (MySQL) database.
>
> As far as I can tell it can be used with a MySQL db, and I read that
> Xapian first makes an index (in its own database/tables?) and then
> searches through that index. Now a few questions come to mind and I
> couldn't find the answers in the documentation.
As Alexander said, Xapian is just a library that allows you to build and 
search indexes.  Scriptindex is a program using Xapian that reads two 
inputs: a data file containing records as sets of text fields, and an 
index file telling it what to do with each field.  From these it builds a 
Xapian database.  Each set of fields in the data file would (presumably) 
represent a row in your database, and one of the fields should contain a 
unique value (like an ID) that can be used to fetch the row after a search.

For example, if we had a MySQL db with the following schema:

id bigint
lastname varchar(40)
firstname varchar(40)
address varchar(80)

You could write a simple script/program to extract the data from the db 
and create an input file for scriptindex that looks like:

id=0
lastname=Brown-White
firstname=John
address=123 Main Street
=anywhere,
=VA
=22222
=USA

id=1
lastname=Johnson
firstname=Jack
=aroo
address=234 Story Lane
=somewhere,
=WA
=09876-0988
=USA

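Etc.  Each block represents a record fetched from the db.  Blocks are 
separated by a blank line, and a line starting with "=" continues the 
previous field on a new line.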
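The extraction script mentioned above could be as small as the following Python sketch. It is only an illustration under assumptions: sqlite3 stands in for MySQL so that it runs without a server (with MySQL you would open the connection through a DB-API driver such as MySQLdb instead), the table name `people` is hypothetical, and NULL columns are simply left out of the record.

```python
import sqlite3

# Columns from the example schema, in the order they are written out.
FIELDS = ("id", "lastname", "firstname", "address")

# Hypothetical table; sqlite3 stands in for MySQL so the sketch runs anywhere.
SCHEMA = """CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    lastname TEXT,
    firstname TEXT,
    address TEXT
)"""


def format_record(row):
    """Turn one row (a tuple in FIELDS order) into a scriptindex input block."""
    lines = []
    for name, value in zip(FIELDS, row):
        if value is None:
            continue  # leave NULL columns out of the record
        # The first line of a multi-line value goes after "name=", every
        # further line after a bare "=" (a continuation line).
        first, *rest = str(value).split("\n")
        lines.append(f"{name}={first}")
        lines.extend(f"={part}" for part in rest)
    return "\n".join(lines)


def dump(conn, table="people"):
    """Return scriptindex input for every row, one blank-line-separated block each."""
    cur = conn.execute(f"SELECT {', '.join(FIELDS)} FROM {table} ORDER BY id")
    return "\n\n".join(format_record(row) for row in cur) + "\n"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO people VALUES (?, ?, ?, ?)",
        (0, "Brown-White", "John", "123 Main Street\nanywhere,\nVA\n22222\nUSA"),
    )
    print(dump(conn), end="")  # prints the first block of the example above
```

Writing multi-line values as continuation lines also means an embedded blank line in a column becomes a bare "=" line, so it cannot accidentally end the record early.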

The index file might look like:

id : field boolean=Q unique=Q
lastname : index
firstname : index
address : index

Scriptindex would then read both files (the data file and the index file) 
and create a searchable database that Omega can read.

Omega would then be called via something like:
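To make the data-file side of this concrete, here is a rough Python model of how such a file splits into records. It is not scriptindex's own parser (scriptindex is a C++ program and handles more cases); it assumes, as the address lines in the example suggest, that a blank line ends a record and that a line starting with "=" continues the previous field, joined with a newline. The function name is made up for this sketch.

```python
def parse_scriptindex_input(text):
    """Split scriptindex-style input into a list of records.

    Each record is a list of (fieldname, value) pairs, because the same
    field name may appear more than once in a record.
    """
    records = []
    current = []
    for line in text.splitlines():
        if line == "":
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = []
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line without '=': {line!r}")
        if name == "":
            # "=text" continues the previous field on a new line.
            if not current:
                raise ValueError("continuation line with no field to continue")
            prev_name, prev_value = current[-1]
            current[-1] = (prev_name, prev_value + "\n" + value)
        else:
            current.append((name, value))
    if current:
        records.append(current)
    return records
```

Under these assumptions, the second example record parses with `firstname` set to "Jack\naroo" and `address` holding all four address lines.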

omega?P=lastname:Johnson%20AND%20firstname:Roger

or

omega?P=address:23123  (to find all the people in zip 23123)
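When a program builds these URLs, it is safer to let a library do the percent-encoding than to type "%20" by hand, since a dropped space (for instance between "AND" and the next field) can change what the query means. A minimal Python sketch, assuming Omega is reachable at whatever base URL you pass in; only the `P` (query) parameter from the examples is set, and `omega_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def omega_url(base, query):
    """Build an Omega search URL whose P parameter holds the query.

    quote() turns spaces into %20 and escapes other unsafe characters
    such as '&'; ':' is kept as-is so "field:" prefixes stay readable.
    """
    return f"{base}?P={quote(query, safe=':')}"


print(omega_url("omega", "lastname:Johnson AND firstname:Roger"))
# prints omega?P=lastname:Johnson%20AND%20firstname:Roger
```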

What Omega returns will probably have to be interpreted by a program 
that goes to the MySQL db, fetches the matching rows, and formats them 
the way you want the data presented.
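That last step might look something like this in Python: given the ids of the matching records, in the order the search ranked them, fetch the full rows. This is a sketch under assumptions: sqlite3 stands in for MySQL (a MySQL driver such as MySQLdb uses %s placeholders instead of ?), the table name `people` is hypothetical, and it assumes you have already pulled the ids out of Omega's output, which depends on your Omega template and is left out here.

```python
import sqlite3


def fetch_rows(conn, ids):
    """Fetch the rows for the given ids, keeping the search's ranking order."""
    if not ids:
        return []
    # One "?" placeholder per id, so the values are passed safely as parameters.
    placeholders = ", ".join("?" for _ in ids)
    cur = conn.execute(
        "SELECT id, lastname, firstname, address FROM people "
        f"WHERE id IN ({placeholders})",
        list(ids),
    )
    by_id = {row[0]: row for row in cur}
    # IN (...) does not preserve the order of the list, so re-sort by rank
    # and silently skip ids whose rows have since been deleted.
    return [by_id[i] for i in ids if i in by_id]
```

Skipping missing ids matters if the Xapian database is rebuilt less often than the MySQL table changes, because a search can then return ids for rows that no longer exist.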

This is not the best way to index the data; I left out a lot of detail 
because you asked for the concept rather than the specifics.  I also used 
the same field names in the scriptindex files as in the database, but 
that is not necessary.

Jim.
>
> Firstly, how exactly does the indexing work in regard to telling
> Xapian what to search through? Do we write an SQL query returning all
> the data we want indexed? Or do we tell it which tables/columns
> to index (i.e., does it generate queries?)
> And how is the index updated, a regular rescan or an update whenever
> data in our system updates?
>
> The other question that came to mind is, once everything is indexed,
> how is the data returned on a search? This is best explained in an
> example:
> If a user enters a search term and I (the programmer)
> want to search the database, can I specifically tell Xapian to search,
> for example, the application names, or the descriptions, or both?
By using the "field:" syntax, you can search any field(s) you want.
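As a rough mental model of why "field:" searches work, think of the index as remembering which field each word came from (Xapian itself does this by putting a prefix on terms, like the Q prefix on the id in the index file). The toy Python below is not Xapian's implementation, which also ranks results, handles phrases, and much more; it just maps (field, word) pairs to record ids, ANDs the terms together, and returns the matching ids, which you would then use to fetch the rows from MySQL. All names are made up for the illustration.

```python
import re
from collections import defaultdict


def build_index(records):
    """Toy index: map (field, word) -> set of record ids.

    records is a dict of record id -> {field name: text}.
    """
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for field, text in fields.items():
            # Lower-case and split on non-word characters, so "Brown-White"
            # is indexed as the two words "brown" and "white".
            for word in re.findall(r"\w+", text.lower()):
                index[(field, word)].add(rec_id)
    return index


def search(index, query):
    """AND together terms such as "lastname:johnson"; a bare word matches any field."""
    result = None
    for token in query.lower().split():
        if token == "and":
            continue  # every term is ANDed anyway in this toy
        field, sep, word = token.partition(":")
        if sep:
            ids = index.get((field, word), set())
        else:
            # No "field:" given: collect the word from every field.
            ids = set()
            for (_, w), rec_ids in index.items():
                if w == token:
                    ids |= rec_ids
        result = set(ids) if result is None else result & ids
    return result if result is not None else set()
```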
>
> I hope somebody can clarify this for me, right now it all looks quite
> difficult to implement.
Not at all. 
>
>
> Edwin Smulders
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>



