queries for a set of values

Eric Wong e at 80x24.org
Fri Apr 26 23:37:37 BST 2024


I probably should've used boolean terms in addition to numeric
values when indexing, but currently I only have a set of numeric
values[1] and am trying to avoid having to reindex ~250GB DBs
(and asking numerous users to do the same).

Say I have a bunch of values which I want to filter a query against.
If I had boolean terms, I could just OP_OR the whole set together.
IOW, this is what notmuch does with terms:

	std::set<std::string> terms;

	// notmuch populates terms via terms.insert(*i)...

	Query(OP_OR, terms.begin(), terms.end());

	// Disclaimer: I don't really know C++

With the set of integers I have (stored via sortable_serialise), would
the best approach be to OP_OR a bunch of OP_VALUE_RANGE queries together?

So, perhaps something like:

	Query(OP_OR,
		Query(OP_VALUE_RANGE, column, v[0], v[0]),
		Query(OP_VALUE_RANGE, column, v[1], v[1]),
		Query(OP_VALUE_RANGE, column, v[3], v[3]),
		...
		Query(OP_VALUE_RANGE, column, v[LAST], v[LAST]))

// Or (totally not even compile-tested and I don't know C++)
// something like:

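	#include <xapian.h>
	#include <string>
	#include <vector>

	std::vector<Xapian::Query> subq;

	for (size_t i = 0; i < nelem; i++) {
		// A point range [v, v] matches only documents whose
		// value in slot `column` is exactly v.
		std::string v = Xapian::sortable_serialise(int_vals[i]);

		subq.push_back(Xapian::Query(Xapian::Query::OP_VALUE_RANGE,
					     column, v, v));
	}

	Xapian::Query q(Xapian::Query::OP_OR, subq.begin(), subq.end());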
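If many of the integers happen to be consecutive, the number of
OP_VALUE_RANGE subqueries could be reduced by first merging runs of
adjacent values into a single inclusive range. A minimal sketch of that
merging step in plain C++ follows. `coalesce_runs` is a hypothetical
helper, not part of Xapian, and the approach assumes the slot only ever
holds integer values (otherwise a range like [3, 5] would also match 3.5).

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper (not Xapian API): collapse a sorted set of
// integers into inclusive [lo, hi] runs of consecutive values.
// Assumes the slot stores integers only, so a run [lo, hi] matches
// exactly the same documents as the individual point ranges.
std::vector<std::pair<std::int64_t, std::int64_t>>
coalesce_runs(const std::set<std::int64_t> &vals)
{
	std::vector<std::pair<std::int64_t, std::int64_t>> runs;
	for (std::int64_t x : vals) {
		// std::set iterates in ascending order with no duplicates,
		// so x can only extend the most recent run or start a new one.
		if (!runs.empty() && runs.back().second + 1 == x)
			runs.back().second = x; // extend the current run
		else
			runs.emplace_back(x, x); // start a new run
	}
	return runs;
}
```

Each resulting pair would then become one
`Query(OP_VALUE_RANGE, column, sortable_serialise(lo), sortable_serialise(hi))`
subquery in the OP_OR. Since sortable_serialise() takes a double, this
also assumes the integers are small enough (magnitude up to 2^53) to be
represented exactly.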

It seems what I'm really looking for is an OP_VALUE_OR or OP_VALUE_IN,
but only OP_VALUE_{GE,LE,RANGE} exist.
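The semantics of such a hypothetical OP_VALUE_IN can be modelled in
plain C++. A value range test compares the slot's value against the
bounds as strings, so a point range [v, v] accepts only a value equal
to v, and an OP_OR of point ranges accepts exactly the values a
set-membership test would. The functions below are illustrative models
only; none of them is Xapian API, and OP_VALUE_IN does not exist.

```cpp
#include <set>
#include <string>
#include <vector>

// Model of one OP_VALUE_RANGE test: lo <= value <= hi, compared as
// strings (std::string ordering, i.e. bytewise).
bool in_value_range(const std::string &value, const std::string &lo,
		    const std::string &hi)
{
	return lo <= value && value <= hi;
}

// Model of the workaround: OP_OR over point ranges [p, p].
bool or_of_point_ranges(const std::string &value,
			const std::vector<std::string> &points)
{
	for (const std::string &p : points)
		if (in_value_range(value, p, p))
			return true;
	return false;
}

// Model of the wished-for OP_VALUE_IN: plain set membership.
bool value_in(const std::string &value, const std::set<std::string> &wanted)
{
	return wanted.count(value) != 0;
}
```

The two models agree on every input; the difference is presumably only
in cost, since the OR version needs one subquery per value.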

[1] Even if I switched to terms, I would still keep the numeric
    values since I also rely on Enquire.set_collapse_key on this
    column.



More information about the Xapian-discuss mailing list