[Xapian-discuss] MatchSpy:ing on a large recordset
alexander lind
malte at webstay.org
Thu May 22 20:24:50 BST 2008
On May 22, 2008, at 2:55 AM, Olly Betts wrote:
> On Wed, May 21, 2008 at 11:03:04PM -0700, alexander lind wrote:
>> I have a project in the works that will have 10-15M records with a
>> set of arbitrary attributes on each record.
>>
>> I need to build a system where a user can filter the recordset by
>> selecting attribute values and/or negating on them, and for each
>> attribute value given, the number of matching records needs to be
>> calculated in realtime - 1-2 seconds lookup time is acceptable.
>
> For the filtering options you're describing, making each attribute a
> term prefix and filtering on those terms would be the most efficient
> approach I think.
For attributes that can be stored as values, would it be faster to
put them in values instead? For example, the attribute age, which
could be a value between 1 and 100.
>
>
>> Can this be achieved with Xapian and the MatchSpy functionality?
>
> You certainly could do it this way.
Do you think there is a better way to do it with Xapian?
> If there's enough RAM to cache
> all the value data, you'll probably at least be near the performance
> target, but without trying it I couldn't say for sure.
Would it help significantly if I had enough RAM to put the entire
Xapian index in a RAM partition?
> Using C++ here
> is likely to help - calling from C++ to a scripting language and back
> tens of millions of times will probably be a measurable overhead.
You mean when updating the recordset here, right?
Thanks
Alec
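
For illustration, here is a minimal sketch of the term-prefix idea discussed above: each attribute value becomes a prefixed term, filters are intersections and subtractions of posting lists, and per-value counts are taken over the matched set. This is a toy in-memory model in plain Python, not Xapian's API; the FacetIndex class, its method names, and the XCOLOR:/XAGE: prefixes are hypothetical and chosen only for the example.

```python
from collections import defaultdict


class FacetIndex:
    """Toy inverted index: prefixed term -> set of record ids (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.all_ids = set()

    def add(self, rec_id, attrs):
        # Turn every attribute into a prefixed term, e.g. color=red -> "XCOLOR:red".
        self.all_ids.add(rec_id)
        for name, value in attrs.items():
            self.postings[f"X{name.upper()}:{value}"].add(rec_id)

    def filter(self, include=(), exclude=()):
        # Records must carry every included term and none of the excluded ones.
        matched = set(self.all_ids)
        for term in include:
            matched &= self.postings.get(term, set())
        for term in exclude:
            matched -= self.postings.get(term, set())
        return matched

    def counts(self, matched, prefix):
        # Count matching records for each value of one attribute.
        result = {}
        for term, ids in self.postings.items():
            if term.startswith(prefix):
                n = len(ids & matched)
                if n:
                    result[term] = n
        return result


index = FacetIndex()
index.add(1, {"color": "red", "age": 30})
index.add(2, {"color": "red", "age": 45})
index.add(3, {"color": "blue", "age": 30})
index.add(4, {"color": "green", "age": 30})

# Filter: age is 30 AND color is NOT green.
matched = index.filter(include=["XAGE:30"], exclude=["XCOLOR:green"])
color_counts = index.counts(matched, "XCOLOR:")
print(sorted(matched), color_counts)
```

With 10-15M records a real index would not hold Python sets in memory like this; the sketch only shows the shape of the filtering and counting logic.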