[Xapian-discuss] Perl example: parse terms, search, get total, get result, parse result

Kevin SoftDev kevin.softdev at gmail.com
Thu Mar 9 16:28:48 GMT 2006


Olly,

Yes, of course I am sanitizing the HTML. Actually, the HTML is stripped at the
beginning of the process and only text is placed into the database, which also
makes the database a lot smaller.

I would like to ask you a question. When I try to search for "hiking"
http://nitra.net/cgi-bin/hladaj.cgi?a=q&q=hiking&c=sk
I am not getting any results back.

However, if I search for just "hike", I do get results back, including a
document containing "hiking", which is displayed at the top of the results.
http://nitra.net/cgi-bin/hladaj.cgi?a=q&q=hike&c=sk

Is it possible at index time to have every word indexed as-is and to disable
the stemming algorithm, or is stemming part of the package whether we like it
or not?

Thanks,
Kevin
http://nitra.net


On 3/8/06, Olly Betts <olly at survex.com> wrote:
>
> On Wed, Mar 08, 2006 at 09:43:53AM -0800, Kevin SoftDev wrote:
> >   my $total  = $db->get_termfreq($terms);
>
> This looks up the frequency of a single term, so it'll be fine for a one
> term query, but will return zero for anything more complicated (unless
> you happen to have terms with spaces, etc in).
>
> As I explained just now, you want MSet::get_matches_estimated().
>
> >     $html = $doc->get_data();
> >
> >     $html    =~ m/body=(.*)/;   $body  = $1;
>
> That's kind of risky - you only want to match body at the start of a
> line, but this doesn't specify that, so it'll match wrongly if there's
> an earlier line containing "body=" anywhere in it.  I suggest:
>
>      my ($body) = $html =~ m/^body=(.*)/m;
>
> >     print "<a href=\"$url\"
> > target=_blank><b>$title</b><BR><i>$url</i></a><BR>$body";
>
> You really want to be escaping values put into HTML output, unless
> you've carefully sanitised them at indexing time.  Otherwise you're
> opening yourself to cross-site scripting type exploits.
>
> Cheers,
>    Olly
>
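
A minimal sketch of the two suggestions quoted above (anchoring the
field match with /m, and escaping values before printing them into
HTML), using core Perl only. The sample record in $data and the helper
name escape_html are made up for illustration; they are not from the
original script or from Xapian, and the real document data may be laid
out differently.

```perl
use strict;
use warnings;

# Hypothetical stored document data: one "name=value" field per line.
# The url line deliberately contains "body=" inside "nobody=".
my $data = "url=http://example.com/?q=nobody=1\n"
         . "title=Hiking <b>trails</b>\n"
         . "body=Tom & Jerry went hiking";

# Unanchored match: finds the first "body=" anywhere, here in the url line.
my ($wrong) = $data =~ m/body=(.*)/;

# Anchored match: with /m, ^ matches at the start of every line,
# so only the real "body=" field is picked up.
my ($body)  = $data =~ m/^body=(.*)/m;
my ($title) = $data =~ m/^title=(.*)/m;
my ($url)   = $data =~ m/^url=(.*)/m;

# Escape the characters that are special in HTML text and attributes.
# "&" must be replaced first so the other entities are not double-escaped.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&#39;/g;
    return $text;
}

printf "unanchored match: %s\n", $wrong;
printf "anchored match:   %s\n", $body;
printf "<a href=\"%s\"><b>%s</b></a><br>%s\n",
    escape_html($url), escape_html($title), escape_html($body);
```

Escaping at output time is a belt-and-braces choice: even if the text
was stripped of HTML at indexing time, a stray "&" or "<" in the
stored data will still print safely.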