[Xapian-discuss] indexing for newbies?

John Wards jwards at whiteoctober.co.uk
Tue Oct 14 15:53:30 BST 2008


Hi Bobo,

Do you really need to use Xapian for this?

Xapian is really great for keyword searching of large data sets (100,000+
rows), possibly with some simple boolean filters added. But complicated
queries really are a job for MySQL.

I use Xapian on two largish databases.

1) http://www.guidesforbrides.co.uk (quick keyword search)
I index as much text as I can get out of the adverts, giving premium
adverts extra weight to bump them up the list slightly, and I store a
lump of XML as the document data. Then I just do a bit of XSLT
transformation on the results.
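
A rough sketch of that idea in plain Python, for illustration only. It
does not import Xapian; the advert fields, the PREMIUM_WDF boost and the
prepare_advert helper are all hypothetical, and boosting the
within-document frequency (wdf) is just one way to get the "premium"
effect, not necessarily what the site does. The comments name the Xapian
calls the results would be fed to.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical boost: each word of a premium advert counts this many times.
PREMIUM_WDF = 3
NORMAL_WDF = 1


def prepare_advert(advert):
    """Return (terms, data) for one advert.

    terms maps each lowercased word to its within-document frequency (wdf);
    data is the XML string to store as the document data.
    """
    wdf_inc = PREMIUM_WDF if advert.get("premium") else NORMAL_WDF
    terms = {}
    text = " ".join([advert["title"], advert["body"]])
    for word in re.findall(r"\w+", text.lower()):
        # With Xapian this would be doc.add_term(word, wdf_inc).
        terms[word] = terms.get(word, 0) + wdf_inc

    root = ET.Element("advert")
    ET.SubElement(root, "title").text = advert["title"]
    ET.SubElement(root, "body").text = advert["body"]
    # With Xapian this string would be passed to doc.set_data(...).
    data = ET.tostring(root, encoding="unicode")
    return terms, data


terms, data = prepare_advert(
    {"title": "Wedding flowers", "body": "Flowers for every wedding", "premium": True}
)
```

Because a higher wdf generally raises a term's weight, a premium advert
prepared this way should rank somewhat above an otherwise identical
normal one.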

2) http://www.yourpropertyfinder.co.uk
This searches only the address of each property, but uses boolean flags
to filter the results.
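
The same boolean-flag approach could plausibly carry over to the
competition record quoted below. The following plain-Python sketch (no
Xapian import) builds prefixed boolean terms for the classes and a
sortable date string per class. XCOMPETITION is the grouping term
suggested in the quoted mail; the XCLASS prefix, the slot numbering and
the helper names are assumptions made up for illustration.

```python
from datetime import date

# Hypothetical mapping from class name to a Xapian value slot number.
CLASS_SLOTS = {"P12": 0, "P14": 1, "PD14": 2, "F12": 3, "F14": 4, "FD14": 5}


def date_value(d):
    """Encode a date as YYYYMMDD so string order matches date order."""
    return d.strftime("%Y%m%d")


def prepare_competition(class_dates):
    """Return (boolean_terms, values) for one competition.

    class_dates maps a class name to its date.
    """
    # With Xapian: doc.add_boolean_term(term) for each of these.
    boolean_terms = ["XCOMPETITION"]
    values = {}
    for name, d in sorted(class_dates.items()):
        boolean_terms.append("XCLASS" + name.lower())
        # With Xapian: doc.add_value(slot, value).
        values[CLASS_SLOTS[name]] = date_value(d)
    return boolean_terms, values


def matches(boolean_terms, values, wanted_class, today):
    """Emulate the filter: has the class, and its date is before today."""
    if "XCLASS" + wanted_class.lower() not in boolean_terms:
        return False
    return values[CLASS_SLOTS[wanted_class]] < date_value(today)


terms, values = prepare_competition(
    {"F12": date(2008, 9, 14), "FD14": date(2008, 9, 22)}
)
```

In a real query the class term would presumably go in as a boolean
filter and the date comparison as a value range on that class's slot;
the matches function only imitates that logic in memory.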

These are simple queries that could be done in SQL, but on cheapo
hardware with 100,000s of rows they would take 1-10 seconds to return
results. Xapian is much faster, of course.

Or have I misunderstood your issue?

Cheers
John

On Tue, 2008-10-14 at 16:18 +0200, Bobo Wieland wrote:
> Hi again.
> 
> I _know_ that this is asking a lot, but I have no idea how to do
> this. As I said in my previous mail (see below), I think xapian.org
> lacks a good guide on how to do things.
> 
> (By the way, I use the PHP bindings.)
> 
> 
> Anyway, this is an example of one piece of data I want to index, but
> I don't know how to do it. The capitalized prefix on each row is just
> my way of telling you what the data is; it has nothing to do with
> Xapian:
> 
> NAME_OF_COMPETITION Upsala Elit Jr Cup
> START_DATE 2008-09-19
> END_DATE 2008-09-28
> GEOGRAFIC_REGION Uppsala
> GEOGRAFIC_REGION Uppland
> GEOGRAFIC_REGION Region Mitt
> ORGANIZER usif
> CLASS P12
> CLASS_P12_DATE 2008-09-14
> CLASS P14
> CLASS_P14_DATE 2008-09-14
> CLASS PD14
> CLASS_PD14_DATE 2008-09-22
> CLASS F12
> CLASS_F12_DATE 2008-09-14
> CLASS F14
> CLASS_F14_DATE 2008-09-14
> CLASS FD14
> CLASS_FD14_DATE 2008-09-22
> 
> 
> 
> If I want to register for a competition and I'm in the class F12, I
> will search for competitions that have an F12 class and that are open
> for registration (that is, all competitions that have CLASS_F12_DATE <
> today). I will do this through an advanced search form.
> I don't need any help with the search form, only with how to index
> data like the above so that searches like the one I described are
> possible.
> 
> 
> Best regards
> 
> Bobo Wieland - Kodstationen AB
> -------------------------------------------------
> Kodstationen AB
> Järnvägsstationen
> 262 52 Ängelholm
> -------------------------------------------------
> Tel: 0735 - 880 100
> E-mail: bobo at kodlabbet.se
> -------------------------------------------------
> This e-mail was sent via SilverCRM
> http://www.silvercrm.se
> 
> Bobo Wieland <bobo at kodlabbet.se> wrote:
> >
> >I find it hard to find suitable documentation or articles about
> >indexing. Either it's too simple, or it's just the generated
> >documentation of the xapian-classes.
> >
> >I would like a push in the right direction here. As I've stated in
> >previous posts, I'm new to Xapian. I will explain the project I'm
> >about to start working on; any help is appreciated.
> >
> >I have three tables in mysql that should be indexed for searching.
> >You should get hits from all three tables with a single search (but
> >grouped by table). The tables are "players","clubs","competitions".
> >
> >The searchable fields for each table should be:
> >The players table:
> >Licensnumber (unique),
> >Firstname,
> >Lastname,
> >ClubName,
> >Geographical region
> >
> >The clubs table:
> >ClubName,
> >Geographical region
> >
> >The competitions table:
> >CompetitionName,
> >StartDate,
> >EndDate,
> >Organizer (a ClubName),
> >Geographical region
> >
> >
> >I thought of using terms XPLAYER, XCLUB and XCOMPETITION for keeping
> >the results grouped. And I need some range search for StartDate and
> >EndDate, right?
> >
> >Thing is I have no idea how to do this. How do I add the rows to the
> >xapian database and how do I define the terms?
> >
> >
> >I use the PHP bindings and access Xapian through a web server.
> 


