[Xapian-discuss] best practices - combining sql database and xapian, size of database?

Michel Pelletier pelletier.michel at gmail.com
Fri Apr 16 21:03:22 BST 2010


I think there might be a few approaches to this.

In my experience, all terms and values at some point come from
attributes.  Whether or not this can be considered "storing" them for
use as attributes again depends on the way you are mapping xapian
results to objects.  There are also some libraries out there that can
help you do that without much thought to the raw term/document level
details.

Technically, I find that terms are used for querying, and values are
used for sorting and counting.  I try to keep my terms and values
restricted to those uses and avoid being tempted to use them as
containers for data I intend to treat as "attributes".
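
A rough sketch of that split, in plain Python so it runs without the
Xapian bindings.  The record layout, the "Q" prefix for the unique id
term, and the slot number are assumptions made for illustration, not
something your schema has to follow.

```python
# Hypothetical prefix and slot choices for this sketch; pick your own scheme.
ID_PREFIX = "Q"   # prefix commonly used for a unique-id term
SLOT_DATE = 0     # value slot reserved here for sorting by date


def plan_document(record):
    """Split a record into terms (for querying) and values (for sorting)."""
    # The primary key becomes a unique term so the document can be looked up.
    terms = [ID_PREFIX + str(record["id"])]
    # Lowercased words from the title become plain search terms.
    terms += [word.lower() for word in record["title"].split()]
    # A YYYYMMDD string sorts correctly when compared as a byte string.
    values = {SLOT_DATE: record["date"].replace("-", "")}
    return terms, values


record = {"id": 42, "title": "Combining SQL and Xapian", "date": "2010-04-16"}
terms, values = plan_document(record)
```

With the real bindings you would then call doc.add_term() for each
entry in terms and doc.add_value(slot, value) for each entry in values.
Nothing else about the record goes into terms or values.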

As for the attributes themselves, I find it's best to serialize the
object into the data blob that every document can contain (set with
document.set_data(), read back with document.get_data()).  This way
the object in question can be easily reconstituted.  This does not
work as well for large graphs of objects.  I have used both the Python
"pickle" serialization (which does support most reasonable object
graphs), and currently a JSON-based serialization.  Depending on your
language, there is likely a very simple serialization library you can
use.
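
A minimal sketch of the JSON round trip, using only the standard
library.  The Article class and its fields are hypothetical; the point
is only that the whole object goes into one string that you hand to
set_data() and later rebuild from get_data().

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Article:
    # Hypothetical attribute set, for illustration only.
    id: int
    title: str
    tags: list


def to_blob(article):
    """Serialize the object to a JSON string suitable for set_data()."""
    return json.dumps(asdict(article), sort_keys=True)


def from_blob(blob):
    """Rebuild the object from what get_data() returned (str or bytes)."""
    return Article(**json.loads(blob))


original = Article(id=42, title="Combining SQL and Xapian", tags=["sql", "xapian"])
blob = to_blob(original)    # would be passed to doc.set_data(blob)
restored = from_blob(blob)  # would be fed from doc.get_data()
```

json.loads() accepts bytes as well as str, which matters if your
bindings hand the blob back as bytes.  This flat approach is fine for
simple objects; as noted above, it gets awkward for large object graphs.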

All of these points are debatable; that's just my take on the subject.

-Mike

On Fri, Apr 16, 2010 at 11:12 AM, Per Jessen <per at computer.org> wrote:
> Peter Karman wrote:
>
>>> - each of my documents comes with a set of attributes. These are
>>> easily stored and indexed in a sql database, but I'm not quite sure
>>> how I would combine a sql database lookup with a xapian query?
>>> AFAICT, xapian also has a mechanism for associating attributes with
>>> a document, might that be the right approach?
>>
>> I typically store attributes I want to be able to sort on or collapse
>> on as a Xapian value[0]. Values are not what you search for, but are
>> attributes associated with a document that you can sort by, fetch,
>> etc.
>>
>> I usually store my db primary key as a term[1] because I know it is
>> unique and I want to be able to search for it.
>
> Time to read the manual.  Sounds like I should be storing my attributes
> as terms.
>
>> If you want one example of prior art that implements the above, you
>> can look at the swish_xapian code[2] (part of Swish3). The assumption
>> in that code is that you have serialized each db record into an XML doc
>> (which allows for joins, etc), and created a config file that calls
>> out each field/column as a MetaName and/or PropertyName. MetaNames are
>> terms in a context (in a field) so you can limit a search to a
>> specific field. PropertyNames are stored values. A field can be both
>> (as with a date, for example).
>
> Okay, thanks.
>
>
> /Per Jessen, Zürich
>