[Xapian-discuss] best practices - combining sql database and xapian, size of database?

Peter Karman peter at peknet.com
Fri Apr 16 14:05:52 BST 2010


Per Jessen wrote on 04/16/2010 04:18 AM:
> Newbie-alert:  I'm just getting started on a new project involving a
> full text search requirement, and my initial investigation points to
> xapian being the way to go. 
> 
> Two questions:
> 
> - eventually I'll most likely be indexing towards 50 million 
> documents - is this reasonable to expect or attempt with xapian?
> 

yes.

> - each of my documents comes with a set of attributes. These are easily
> stored and indexed in a sql database, but I'm not quite sure how I
> would combine a sql database lookup with a xapian query?  AFAICT,
> xapian also has a mechanism for associating attributes with a document,
> might that be the right approach?

I typically store attributes I want to be able to sort on or collapse on
as a Xapian value[0]. Values are not what you search for, but are
attributes associated with a document that you can sort by, fetch, etc.

I usually store my db primary key as a term[1] because I know it is
unique and I want to be able to search for it.
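
A minimal sketch of the two ideas above, assuming the Xapian Python
bindings and a SQL table named docs(id, title, created). The table, the
slot number, and the function names are hypothetical, chosen only for
illustration; "Q" is the conventional Xapian prefix for a unique-ID term.
The primary key goes in as a term, the date goes in as a value, and
search hits are mapped back to SQL rows by primary key.

```python
import sqlite3

try:
    import xapian  # Xapian Python bindings, installed separately
except ImportError:
    xapian = None  # the SQL helpers below still work without it

SLOT_CREATED = 0  # hypothetical value slot holding a "YYYYMMDD" date


def pk_term(pk):
    """Build the unique term for a primary key ("Q" is the usual ID prefix)."""
    return "Q%d" % pk


def index_row(xdb, pk, text, created):
    """Add or replace the Xapian document for one SQL row."""
    doc = xapian.Document()
    indexer = xapian.TermGenerator()
    indexer.set_document(doc)
    indexer.index_text(text)                # searchable free-text terms
    doc.add_boolean_term(pk_term(pk))       # primary key stored as a term
    doc.add_value(SLOT_CREATED, created)    # value: for sorting, not searching
    doc.set_data(str(pk))                   # lets a hit be mapped back to SQL
    xdb.replace_document(pk_term(pk), doc)  # keyed on the unique term


def search_pks(xdb, query_string, limit=10):
    """Return primary keys of matching documents, sorted on SLOT_CREATED."""
    parser = xapian.QueryParser()
    parser.set_database(xdb)
    enquire = xapian.Enquire(xdb)
    enquire.set_query(parser.parse_query(query_string))
    enquire.set_sort_by_value(SLOT_CREATED, True)  # True reverses the default order
    return [int(m.document.get_data()) for m in enquire.get_mset(0, limit)]


def fetch_rows(conn, pks):
    """Load the SQL rows for the given primary keys, keeping the search order."""
    if not pks:
        return []
    marks = ",".join("?" * len(pks))
    sql = "SELECT id, title, created FROM docs WHERE id IN (%s)" % marks
    by_id = {row[0]: row for row in conn.execute(sql, list(pks))}
    return [by_id[pk] for pk in pks if pk in by_id]
```

With a database opened as xapian.WritableDatabase(path,
xapian.DB_CREATE_OR_OPEN), one possible flow is index_row() for each SQL
row, then fetch_rows(conn, search_pks(xdb, "some words")) at query time.
Storing the key in the document data is one choice among several; a
value slot would work as well.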

If you want one example of prior art that implements the above, you can
look at the swish_xapian code[2] (part of Swish3). That code assumes you
have serialized each db record into an XML doc (which allows for joins,
etc.) and created a config file that declares each field/column as a
MetaName and/or a PropertyName. MetaNames are terms scoped to a field,
so you can limit a search to that field. PropertyNames are stored
values. A field can be both (a date, for example).
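
The serialization step could look roughly like the sketch below, which
flattens one record plus its joined rows into a single XML doc. The
tables (docs, tags, doc_tags) and the element names are hypothetical,
and the layout is not taken from Swish3; it only illustrates how a join
can be folded into one document per record.

```python
import sqlite3
import xml.etree.ElementTree as ET

FIELDS = ("id", "title", "created")  # hypothetical columns of the docs table


def record_to_xml(conn, doc_id):
    """Serialize one docs row, plus its joined tags, into an XML string."""
    row = conn.execute(
        "SELECT id, title, created FROM docs WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)

    root = ET.Element("doc")
    for name, value in zip(FIELDS, row):
        ET.SubElement(root, name).text = str(value)  # one element per column

    # The join is resolved here, so the indexer sees a single flat record.
    tags = conn.execute(
        "SELECT t.name FROM tags t "
        "JOIN doc_tags dt ON dt.tag_id = t.id "
        "WHERE dt.doc_id = ? ORDER BY t.name",
        (doc_id,),
    )
    for (tag,) in tags:
        ET.SubElement(root, "tag").text = tag

    return ET.tostring(root, encoding="unicode")
```

Each element name would then be the field name that the config file
maps to a MetaName, a PropertyName, or both.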

[0]
http://xapian.org/docs/apidoc/html/classXapian_1_1Document.html#f7babb1a6368b95dd327f60b433016ac

[1]
http://xapian.org/docs/apidoc/html/classXapian_1_1Document.html#28eb5f092a2efc25969f5c64b019c79c

[2]
http://dev.swish-e.org/browser/libswish3/trunk/src/xapian/swish_xapian.cpp

-- 
Peter Karman  .  http://peknet.com/  .  peter at peknet.com


