[Xapian-discuss] MatchSpy:ing on a large recordset

alexander lind malte at webstay.org
Thu May 29 23:08:20 BST 2008


On May 22, 2008, at 12:24 PM, alexander lind wrote:

>
> On May 22, 2008, at 2:55 AM, Olly Betts wrote:
>
>> On Wed, May 21, 2008 at 11:03:04PM -0700, alexander lind wrote:
>>> I have a project in the works that will have 10-15M records, with a
>>> set of arbitrary attributes on each record.
>>>
>>> I need to build a system where a user can filter the recordset by
>>> selecting attribute values and/or negating them, and for each
>>> attribute value, the number of matching records needs to be
>>> calculated in real time; a lookup time of 1-2 seconds is acceptable.
>>
>> For the filtering options you're describing, making each attribute a
>> term prefix and filtering on those terms would be the most efficient
>> approach, I think.
>
> For attributes that can be applied as values, would it be faster to
> put them in values instead? For example, the attribute age, which
> could be a value between 1 and 100.
>
>>
>>
>>> Can this be achieved with Xapian and the MatchSpy functionality?
>>
>> You certainly could do it this way.
>
> Do you think there is a better way to do it with Xapian?
>
>> If there's enough RAM to cache
>> all the value data, you'll probably at least be near the performance
>> target, but without trying it I couldn't say for sure.
>
> Would it help significantly if I had enough RAM to put the entire
> Xapian index in a RAM partition?
>
>> Using C++ here
>> is likely to help: calling from C++ to a scripting language and back
>> tens of millions of times will probably be a measurable overhead.
>
> You mean when updating the recordset here, right?
>
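Olly's suggestion (each attribute stored as a term prefix, then the per-value counts gathered over the matching documents, the kind of tally a match spy computes) can be illustrated with a small standalone C++ sketch. It is a toy in-memory model, not Xapian's API: the prefixes `XCOLOR` and `XSIZE`, and the names `ToyIndex`, `filter` and `count_values`, are hypothetical. Selected values are intersected (like an AND), negated values are subtracted (like an AND_NOT), and the surviving documents are tallied per value of one attribute.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <numeric>
#include <string>
#include <vector>

using DocId = unsigned;
using PostingList = std::vector<DocId>;  // document ids, sorted ascending

// Toy in-memory index (NOT Xapian's API): prefixed term -> posting list.
struct ToyIndex {
    std::map<std::string, PostingList> postings;
    std::vector<std::vector<std::string>> doc_terms;  // doc id -> its distinct terms

    DocId add_document(const std::vector<std::string>& terms) {
        const DocId id = static_cast<DocId>(doc_terms.size());
        std::vector<std::string> uniq(terms);  // drop repeated terms within one document
        std::sort(uniq.begin(), uniq.end());
        uniq.erase(std::unique(uniq.begin(), uniq.end()), uniq.end());
        for (const std::string& t : uniq) {
            postings[t].push_back(id);  // ids increase, so every list stays sorted
        }
        doc_terms.push_back(std::move(uniq));
        return id;
    }

    const PostingList& list_for(const std::string& term) const {
        static const PostingList empty;
        auto it = postings.find(term);
        return it == postings.end() ? empty : it->second;
    }
};

// Keep documents having every `include` term (like AND) and none of the
// `exclude` terms (like AND_NOT).
PostingList filter(const ToyIndex& idx,
                   const std::vector<std::string>& include,
                   const std::vector<std::string>& exclude) {
    PostingList result(idx.doc_terms.size());
    std::iota(result.begin(), result.end(), DocId{0});  // start from all documents
    for (const std::string& t : include) {
        const PostingList& pl = idx.list_for(t);
        PostingList next;
        std::set_intersection(result.begin(), result.end(), pl.begin(), pl.end(),
                              std::back_inserter(next));
        result.swap(next);
    }
    for (const std::string& t : exclude) {
        const PostingList& pl = idx.list_for(t);
        PostingList next;
        std::set_difference(result.begin(), result.end(), pl.begin(), pl.end(),
                            std::back_inserter(next));
        result.swap(next);
    }
    return result;
}

// Match-spy-style tally: for each matching document, count the values of one
// attribute, i.e. the terms that start with `prefix`.
std::map<std::string, std::size_t> count_values(const ToyIndex& idx,
                                                const PostingList& matches,
                                                const std::string& prefix) {
    std::map<std::string, std::size_t> counts;
    for (DocId id : matches) {
        assert(id < idx.doc_terms.size());
        for (const std::string& t : idx.doc_terms[id]) {
            if (t.compare(0, prefix.size(), prefix) == 0) {
                ++counts[t.substr(prefix.size())];
            }
        }
    }
    return counts;
}
```

With five documents (red/s, red/m, blue/m, red/l, red/m), `filter(idx, {"XCOLORred"}, {"XSIZEs"})` keeps documents 1, 3 and 4, and `count_values(idx, matches, "XSIZE")` then reports `m` twice and `l` once. The tally visits every matching document once, so with 10-15M records its per-document cost dominates; that is the loop where avoiding a round trip into a scripting language would matter.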


No more answers on this one? :-/

Thanks
Alec
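For the question about storing age (a number from 1 to 100) as a value rather than as terms, a second toy C++ sketch puts the two encodings side by side. Again this is a hypothetical model, not Xapian's API; `AgeIndex`, `range_by_value`, `range_by_terms` and the `XAGE` prefix are made up for illustration. It shows what work each approach does for a range filter, not which one is faster in Xapian, which would need measuring.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

using DocId = unsigned;
using PostingList = std::vector<DocId>;  // document ids, sorted ascending

// Toy model (NOT Xapian's API) storing each document's age two ways.
struct AgeIndex {
    std::vector<int> age_value;                    // value style: one number per document
    std::map<std::string, PostingList> age_terms;  // term style: "XAGE37" -> documents aged 37

    static std::string term_for(int age) { return "XAGE" + std::to_string(age); }

    DocId add_document(int age) {
        assert(age >= 0);  // the sketch only handles non-negative ages
        const DocId id = static_cast<DocId>(age_value.size());
        age_value.push_back(age);
        age_terms[term_for(age)].push_back(id);  // ids increase, so lists stay sorted
        return id;
    }
};

// Value approach: read the stored number of every candidate and test the range.
PostingList range_by_value(const AgeIndex& idx, const PostingList& candidates,
                           int lo, int hi) {
    PostingList out;
    for (DocId id : candidates) {
        const int a = idx.age_value.at(id);
        if (lo <= a && a <= hi) out.push_back(id);
    }
    return out;
}

// Term approach: OR together the posting list of every age in [lo, hi],
// then AND the union with the candidates.
PostingList range_by_terms(const AgeIndex& idx, const PostingList& candidates,
                           int lo, int hi) {
    PostingList any_age;
    for (int a = lo; a <= hi; ++a) {
        auto it = idx.age_terms.find(AgeIndex::term_for(a));
        if (it == idx.age_terms.end()) continue;
        PostingList merged;
        std::set_union(any_age.begin(), any_age.end(),
                       it->second.begin(), it->second.end(),
                       std::back_inserter(merged));
        any_age.swap(merged);
    }
    PostingList out;
    std::set_intersection(candidates.begin(), candidates.end(),
                          any_age.begin(), any_age.end(), std::back_inserter(out));
    return out;
}
```

With ages 25, 40, 33, 71 and 30 for documents 0 to 4, both functions return documents 1, 2 and 4 for the range 30-40. The value version reads one number per candidate, while the term version merges up to `hi - lo + 1` posting lists, so which is cheaper depends on how many candidates survive the other filters and how wide the range is.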


