[Xapian-discuss] Re: Help on indexscript
Olly Betts
olly at survex.com
Thu Apr 27 13:34:10 BST 2006
On Thu, Apr 27, 2006 at 02:26:17PM +0200, Hermann Rokicz wrote:
> Olly Betts wrote:
> >> groups : truncate=200 boolean=G field=groups
> >
> > Presumably there can be multiple groups? Currently scriptindex doesn't
> > allow you to generate multiple booleans from one field. The best fix
> > currently is probably for your conversion script to produce one entry
> > for all the group names (to go in the field) and also one entry per
> > group (to be indexed as boolean=G).
>
> This is the 'normal' newsgroups-header, like 'Newsgroups: alt.test,
> alt.next.group'. Should my script convert it to something like 'alt.test
> alt.next.group'? What do you mean by 'one entry per group'?
Well, boolean=G takes the whole field value and creates one term from
it. So you'd get the term:
Galt.test,alt.next.group
Which I doubt you want...
So as things currently stand, your mail parsing script needs to produce
something like:
groups=alt.test,alt.next.group
group=alt.test
group=alt.next.group
With the corresponding index script lines being:
groups : truncate=200 field
group : boolean=G
But ideally scriptindex should be flexible enough to be able to handle
this case directly.
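As a rough illustration of the conversion step described above, here is a minimal sketch in Python. The language of the mail parsing script isn't stated in this thread, so Python is an assumption, and the function name newsgroups_to_fields is hypothetical. It splits the Newsgroups header value on commas and emits one groups= line for the field, plus one group= line per newsgroup for the boolean terms:

```python
def newsgroups_to_fields(header_value):
    """Turn a Newsgroups header value into scriptindex input lines.

    Hypothetical helper: the name and interface are illustrative only.
    """
    # Split on commas, trimming whitespace and dropping empty entries.
    groups = [g.strip() for g in header_value.split(",") if g.strip()]
    if not groups:
        return []
    # One combined entry (stored as the field), then one entry per group
    # (each indexed as a separate boolean=G term).
    lines = ["groups=" + ",".join(groups)]
    lines.extend("group=" + g for g in groups)
    return lines


if __name__ == "__main__":
    for line in newsgroups_to_fields("alt.test, alt.next.group"):
        print(line)
```

Run on the header value from the example, this prints the three lines shown above, which the two index script lines can then handle.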
> > If you'd find it useful, you're welcome to a copy of the indexer we use
> > for gmane (http://search.gmane.org/), which has a similar purpose.
>
> That would be nice. My e-mail-address from the header is working.
OK. It'll probably take me a day or two to sort out.
Cheers,
Olly