[Xapian-discuss] best practices - combining sql database and xapian, size of database?
Peter Karman
peter at peknet.com
Fri Apr 16 14:05:52 BST 2010
Per Jessen wrote on 04/16/2010 04:18 AM:
> Newbie-alert: I'm just getting started on a new project involving a
> full text search requirement, and my initial investigation points to
> xapian being the way to go.
>
> Two questions:
>
> - eventually I'll most likely be indexing towards 50 million
> documents - is this reasonable to expect or attempt with xapian?
>
Yes.
> - each of my documents come with a set of attributes. These are easily
> stored and indexed in a sql database, but I'm not quite sure how I
> would combine a sql database lookup with a xapian query? AFAICT,
> xapian also has mechanism for associating attributes with a document,
> might that be the right approach?
I typically store attributes I want to be able to sort on or collapse on
as a Xapian value[0]. Values are not what you search for, but are
attributes associated with a document that you can sort by, fetch, etc.
I usually store my db primary key as a term[1] because I know it is
unique and I want to be able to search for it.
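
As a rough illustration of the value/term split described above, here is a minimal Python sketch. The row layout (`id`, `created`, `author`), the slot numbers, and the helper names are all made up for this example; the "Q" prefix on the id term is the convention Omega uses for unique ids, not something Xapian requires. The `doc` argument is expected to behave like a `xapian.Document`, of which only `add_boolean_term()` and `add_value()` are used.

```python
# Value slot numbers are arbitrary, but must be the same at index and search time.
SLOT_DATE = 0    # assumption: slot used for sorting by date
SLOT_AUTHOR = 1  # assumption: slot used for collapsing on author


def id_term(pk):
    """Build the unique term for a database primary key (hypothetical helper)."""
    return "Q" + str(pk)


def fill_document(doc, row):
    """Copy one SQL row into a Xapian-style document.

    `row` is a hypothetical mapping with the keys 'id', 'created'
    (an ISO date such as '2010-04-16') and 'author'.
    """
    # The primary key becomes a term, so the record can be searched for by id.
    doc.add_boolean_term(id_term(row["id"]))
    # Attributes to sort or collapse on become values. YYYYMMDD strings
    # sort in date order when compared as plain strings.
    doc.add_value(SLOT_DATE, row["created"].replace("-", ""))
    doc.add_value(SLOT_AUTHOR, row["author"])
    return doc
```

With the real bindings this would be called with a fresh `xapian.Document()`, and the result handed to `WritableDatabase.replace_document(id_term(pk), doc)`, so that re-indexing a row replaces its old document instead of adding a duplicate.
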
If you want one example of prior art that implements the above, you can
look at the swish_xapian code[2] (part of Swish3). The assumption in
that code is that you have serialized each db record into an XML doc
(which allows for joins, etc), and created a config file that calls out
each field/column as a MetaName and/or PropertyName. MetaNames are terms
in a context (in a field) so you can limit a search to a specific field.
PropertyNames are stored values. A field can be both (as with a date,
for example).
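
To make the MetaName/PropertyName distinction a bit more concrete, here is a small Python sketch of the idea. It is not taken from swish_xapian: the config layout, the field names, the slot numbers, and the `X` + uppercased-field prefix scheme are assumptions for illustration only. In real code the term splitting would normally be left to `TermGenerator.index_text()` with a prefix, and the same prefix registered at search time with `QueryParser.add_prefix()`.

```python
# Hypothetical config: "meta" marks a field as searchable in its own context,
# "prop_slot" gives the value slot to store it in (None = not stored).
FIELD_CONFIG = {
    "title": {"meta": True, "prop_slot": None},
    "author": {"meta": True, "prop_slot": 1},
    "date": {"meta": True, "prop_slot": 0},  # both searchable and stored
}


def field_prefix(name):
    """Term prefix for a field, e.g. 'title' -> 'XTITLE' (assumed scheme)."""
    return "X" + name.upper()


def plan_record(record, config=FIELD_CONFIG):
    """Return (terms, values) that one record would contribute to a document."""
    terms = []
    values = {}
    for name, text in record.items():
        rule = config.get(name)
        if rule is None:
            continue  # field not mentioned in the config: ignored
        if rule["meta"]:
            # MetaName-like: prefixed terms, so a search can be limited to this field.
            prefix = field_prefix(name)
            terms.extend(prefix + word for word in text.lower().split())
        if rule["prop_slot"] is not None:
            # PropertyName-like: stored whole, for sorting or fetching later.
            values[rule["prop_slot"]] = text
    return terms, values
```

A date field then shows up twice, once as a prefixed term you can search on and once as a value you can sort by.
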
[0] http://xapian.org/docs/apidoc/html/classXapian_1_1Document.html#f7babb1a6368b95dd327f60b433016ac
[1] http://xapian.org/docs/apidoc/html/classXapian_1_1Document.html#28eb5f092a2efc25969f5c64b019c79c
[2] http://dev.swish-e.org/browser/libswish3/trunk/src/xapian/swish_xapian.cpp
--
Peter Karman . http://peknet.com/ . peter at peknet.com