[Xapian-discuss] What is the best way to represent a category hierarchy using term prefixes in Xapian?

Justin Finkelstein justin at redwiredesign.com
Sun Nov 6 22:04:24 GMT 2011


Oddly enough, I have a solution for that which has been in use for a few
years and works quite well. Our setup is similar, although each of our
categories has a GUID as its ID, so we don't store the category names.

However, it depends on how you want to get results. In my case, I have
categories with products in them and, as I move up the hierarchy, I want to
retrieve everything below that point; so using your data, US would get me
everything in the US, and Michigan everything in Michigan.

To achieve this, we simply store all of the relevant categories (the
document's own plus every ancestor) for each document with the prefix
CATEGORY: so in your example, the 'Grand Rapids' doc would contain the
following terms:

CATEGORY:grand rapids
CATEGORY:michigan
CATEGORY:us

Therefore, searching for any of these terms will find the Grand Rapids
document. What would help you, I think, is to know that you can have
multiple terms with the same prefix per document - this is what we do and
it works very well in this case.
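
As a rough illustration of the scheme described above, the sketch below
builds one CATEGORY:-prefixed term per level of a document's category path.
The function name category_terms and the lower-casing of names are
assumptions made for this example; it is plain Python and does not touch
Xapian itself.

```python
# Hypothetical helper: names here are illustrative, not part of Xapian.
CATEGORY_PREFIX = "CATEGORY:"


def category_terms(path):
    """Return one prefixed term per level of the path, leaf first.

    `path` lists the categories from the root down to the document's own
    category, e.g. ["US", "Michigan", "Grand Rapids"].
    """
    return [CATEGORY_PREFIX + name.lower() for name in reversed(path)]


if __name__ == "__main__":
    for term in category_terms(["US", "Michigan", "Grand Rapids"]):
        print(term)
    # prints:
    # CATEGORY:grand rapids
    # CATEGORY:michigan
    # CATEGORY:us
```

Because every ancestor is stored on the document, a search for the
Michigan term alone matches this document along with anything else
indexed under Michigan.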

Knowing which type of term to use can be helpful too; in our case, our
terms are GUIDs and we don't want any partial matching - so
add_boolean_term() is the one to use (boolean terms don't affect the
weight of the document; they just match 'yes' or 'no').
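
A minimal sketch of the boolean-term idea follows. It assumes the document
object exposes add_boolean_term(term), as xapian.Document does in the
Python bindings. The FakeDocument class, the index_categories helper and
the ID strings are all hypothetical stand-ins so that the example runs
without Xapian installed.

```python
class FakeDocument:
    """Stand-in for xapian.Document that only records boolean terms."""

    def __init__(self):
        self.boolean_terms = []

    def add_boolean_term(self, term):
        # Same method name as xapian.Document; here we just remember it.
        self.boolean_terms.append(term)


def index_categories(doc, category_ids, prefix="CATEGORY:"):
    """Add one boolean term per category ID (own category plus ancestors)."""
    for category_id in category_ids:
        doc.add_boolean_term(prefix + category_id)


if __name__ == "__main__":
    doc = FakeDocument()
    # Made-up IDs standing in for real GUIDs.
    index_categories(doc, ["id-grand-rapids", "id-michigan", "id-us"])
    print(doc.boolean_terms)
```

With the real bindings, a xapian.Document() would be passed in place of
FakeDocument, and the category term would normally be used to filter a
query rather than to contribute to its ranking.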

How does this sound?

justin

On 6 November 2011 19:48, Jim Razmus II <bonetruck at gmail.com> wrote:

> Assume I have the following example hierarchy:
>
> US
> >Michigan
> >>Detroit
> >>Grand Rapids
> >>Lansing
> >Minnesota
> >>Grand Rapids
> >>Minneapolis
> >>St Paul
> >Ohio
> >>Columbus
> >>Grand Rapids
> >>Sandusky
>
> I see two ways that I could index a “Grand Rapids, Michigan” document with
> prefixed terms:
>
> XFIRSTLEVELus
> XSECONDLEVELmichigan
> XTHIRDLEVELgrandrapids
>
> or
>
> XFIRSTLEVELus
> XSECONDLEVELus_michigan
> XTHIRDLEVELus_michigan_grandrapids
>
> I’m inclined to use the second approach thinking that it will return more
> intuitive results. That is, a search that includes Grand Rapids, Michigan
> search criteria is less likely to include documents from Minnesota and Ohio.
>
> However, two aspects of this approach bother me. First, the creation and
> maintenance of term prefixes for each level of the hierarchy feels wrong.
> Second, the concatenation of values seems like a surrogate for using
> weights.
>
> So, what is the best way to represent a hierarchy with term prefixes?
>
> Note, I posted this question to stackoverflow here:
>
>
> http://stackoverflow.com/questions/7585948/what-is-the-best-way-to-represent-a-category-hierarchy-using-term-prefixes-in-xa
>
> I didn't get any responses so I thought I'd try here next.
>
> Best regards,
> Jim
>



-- 

Redwire Design Limited
54 Maltings Place
169 Tower Bridge Road
London SE1 3LJ
www.redwiredesign.com
[ 020 7403 1444 ] - voice
[ 020 7378 8711 ] - fax
[ 07968 180 720 ] - mobile

