[Xapian-discuss] Perl example: parse terms, search, get total, get result, parse result
Kevin SoftDev
kevin.softdev at gmail.com
Thu Mar 9 16:28:48 GMT 2006
Olly,
Yes, of course I am sanitizing HTML. Actually, the HTML is stripped at the
beginning of the process and only the text is placed into the database, which
makes the database a lot smaller too.
I would like to ask you a question. When I search for "hiking"
http://nitra.net/cgi-bin/hladaj.cgi?a=q&q=hiking&c=sk
I don't get any results back.
However, if I search for "hike" alone, I do get results back, including the
page containing "hiking", which is displayed at the top of the results.
http://nitra.net/cgi-bin/hladaj.cgi?a=q&q=hike&c=sk
Is it possible at index time to index every word as written and disable
the stemming algorithm, or is stemming part of the package whether we like it
or not?
Thanks,
Kevin
http://nitra.net
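The symptom above is consistent with the index holding stemmed terms (so "hiking" ends up under the same term as "hike") while the query word is looked up exactly as typed. A common remedy is to index each word both as written and in stemmed form, and to stem the query the same way when a stemmed search is wanted. The pure-Perl toy below sketches that idea with an in-memory hash standing in for the database. It does not use Xapian at all; the `toy_stem` lookup table and the `S:` prefix for stemmed terms are hypothetical choices made only for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real stemmer: a tiny lookup table mapping a
# few inflected forms to a shared stem. Unknown words are left unchanged.
my %TOY_STEM = (hiking => 'hike', hiked => 'hike', hikes => 'hike');

sub toy_stem {
    my ($word) = @_;
    return exists $TOY_STEM{$word} ? $TOY_STEM{$word} : $word;
}

# Toy inverted index: term => { doc_id => 1 }.
my %index;

# Index every word twice: once as written (lowercased) and once in
# stemmed form under a hypothetical "S:" prefix, so exact and stemmed
# lookups can coexist without colliding.
sub index_doc {
    my ($doc_id, $text) = @_;
    for my $word ($text =~ /(\w+)/g) {
        my $raw = lc $word;
        $index{$raw}{$doc_id} = 1;
        $index{'S:' . toy_stem($raw)}{$doc_id} = 1;
    }
}

# Return matching doc ids. With $stemmed false the query word must match
# exactly; with $stemmed true it is stemmed the same way the index was.
sub search {
    my ($word, $stemmed) = @_;
    my $raw  = lc $word;
    my $term = $stemmed ? 'S:' . toy_stem($raw) : $raw;
    return sort keys %{ $index{$term} || {} };
}

index_doc(1, 'Hiking in the Tatras');
index_doc(2, 'A short hike near Nitra');

for my $q (['hiking', 0], ['hiking', 1], ['hike', 1]) {
    my ($word, $stemmed) = @$q;
    printf "%-6s %-9s -> %s\n", $word, ($stemmed ? 'stemmed' : 'unstemmed'),
        join(',', search($word, $stemmed));
}
```

With this layout an unstemmed search for "hiking" finds only document 1, which literally contains that word, while stemmed searches for either "hiking" or "hike" find both documents.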
On 3/8/06, Olly Betts <olly at survex.com> wrote:
>
> On Wed, Mar 08, 2006 at 09:43:53AM -0800, Kevin SoftDev wrote:
> > my $total = $db->get_termfreq($terms);
>
> This looks up the frequency of a single term, so it'll be fine for a one
> term query, but will return zero for anything more complicated (unless
> you happen to have terms with spaces, etc. in them).
>
> As I explained just now, you want MSet::get_matches_estimated().
>
> > $html = $doc->get_data();
> >
> > $html =~ m/body=(.*)/; $body = $1;
>
> That's kind of risky - you only want to match body at the start of a
> line, but this doesn't specify that, so it'll match wrongly if there's
> an earlier line containing "body=" anywhere in it. I suggest:
>
> my ($body) = $html =~ m/^body=(.*)/m;
>
> > print "<a href=\"$url\" target=_blank><b>$title</b><BR><i>$url</i></a><BR>$body";
>
> You really want to be escaping values put into HTML output, unless
> you've carefully sanitised them at indexing time. Otherwise you're
> opening yourself to cross-site scripting type exploits.
>
> Cheers,
> Olly
>
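Putting Olly's two suggestions together, a result-display routine might anchor each field lookup to the start of a line and escape every value before printing it. The pure-Perl sketch below assumes the document data holds one `field=value` pair per line with hypothetical `url`, `title` and `body` fields. `escape_html` is a hand-rolled helper (HTML::Entities' `encode_entities` is a common alternative if that module is installed), and the sample data is made up to show the failure mode.

```perl
use strict;
use warnings;

# Escape the characters that are significant in HTML text and attribute
# values. '&' must be replaced first so later entities aren't double-escaped.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&#39;/g;
    return $s;
}

# Pull one "field=value" line out of the document data. The /m flag makes
# ^ match at the start of every line, so "caption=... body=..." can't match.
sub get_field {
    my ($data, $field) = @_;
    my ($value) = $data =~ m/^\Q$field\E=(.*)/m;
    return defined $value ? $value : '';
}

# Build the HTML for one search hit; every value is escaped before output.
sub format_hit {
    my ($data) = @_;
    my $url   = escape_html(get_field($data, 'url'));
    my $title = escape_html(get_field($data, 'title'));
    my $body  = escape_html(get_field($data, 'body'));
    return qq{<a href="$url" target="_blank"><b>$title</b><br><i>$url</i></a><br>$body};
}

# Hypothetical document data in the assumed field=value layout.
my $data = "url=http://example.com/?a=1&b=2\n"
         . "title=Hiking <trails>\n"
         . "caption=see body=below\n"
         . "body=Go <script>alert(1)</script> hiking";

print format_hit($data), "\n";
```

On this sample data the unanchored `m/body=(.*)/` would capture "below" from the `caption` line, whereas `get_field($data, 'body')` returns the real body line, and the `<script>` tag reaches the page only as harmless escaped text.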