From olly at survex.com Tue Jan 6 03:47:24 2009
From: olly at survex.com (Olly Betts)
Date: Tue, 6 Jan 2009 03:47:24 +0000
Subject: [Xapian-devel] NearPostList and get_wdf
In-Reply-To: <1db5d7c40812290509w39cd275r13e8fc27a9519a@mail.gmail.com>
References: <1db5d7c40812280701y2b522decm93d6de9091d34369@mail.gmail.com>
 <20081229125051.GA25008@meerkat>
 <1db5d7c40812290509w39cd275r13e8fc27a9519a@mail.gmail.com>
Message-ID: <20090106034724.GG15292@survex.com>

On Mon, Dec 29, 2008 at 02:09:14PM +0100, Yann ROBIN wrote:
> On Mon, Dec 29, 2008 at 1:50 PM, Richard Boulton
> wrote:
> > I'm not sure that modifying the wdf is really the way to go about this - it
> > seems to me that you might do better to use a custom weight class, which
> > factored in the frequencies of the individual terms, as well as their
> > proximity.

You have to choose a weight class for the whole query - it can't be
different for different subqueries. So I'm not sure how this would work.

A sane approach would probably be in NewNearPostList::get_weight() to
multiply the weight returned by the AND query's get_weight() method by a
non-negative factor which varies depending on how close the terms are -
largest when they're together, much smaller when they are far apart.

This will be slower to run than the current NearPostList though as it
can't stop working on a document when it finds a match within the window
size - instead it has to check all the positional data for each document
matching the AND query to find the closest match.

This factor needs to have a known upper bound, which you multiply
get_maxweight() and recalc_maxweight() from the AND query by.

> > Feel free to open a feature request ticket, describing the feature that you
> > would like to exist. OP_NEAR as it is currently implemented is behaving as
> > intended, though.
>
> The ticket was more for the get_wdf not being called, i don't think this was
> something intended.
Currently NearPostList::get_wdf() and friends are dead code - I think
whoever wrote them probably didn't realise they wouldn't be needed. It's
even possible that they were actually used in a really early version.

But once the synonym patch gets merged, I think they'll get used if you
do a synonym operation with OP_NEAR or OP_PHRASE as a subquery, so it
seems unhelpful to rip them out at this point.

Cheers,
    Olly

From david.sainty at dtsp.co.nz Thu Jan 15 01:28:34 2009
From: david.sainty at dtsp.co.nz (David Sainty)
Date: Thu, 15 Jan 2009 14:28:34 +1300
Subject: [Xapian-devel] Xapian core build failure under gcc 2.95
Message-ID: <496E9142.9020209@dtsp.co.nz>

Hi,

Under gcc 2.95 Xapian fails to build like so:

g++ -DHAVE_CONFIG_H -I. -I./common -I./include -I/home/dsainty/not-backed-up/pkgsrc/textproc/xapian/work/.buildlink/include -Wall -W -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wno-long-long -Wformat-security -fno-gnu-keywords -Wundef -O2 -c queryparser/queryparser_internal.cc -Wp,-MD,queryparser/.deps/queryparser_internal.TPlo -fPIC -DPIC -o queryparser/.libs/queryparser_internal.o
/data/home/olly/tmp/xapian-svn-snapshot/tags/1.0.10/xapian/xapian-core/queryparser/queryparser.lemony:25: queryparser_internal.h: No such file or directory
/data/home/olly/tmp/xapian-svn-snapshot/tags/1.0.10/xapian/xapian-core/queryparser/queryparser.lemony:31: queryparser_token.h: No such file or directory
*** Error code 1

The problem seems to be that the build system is relying on the compiler
to imply -Iqueryparser when the source file is
queryparser/queryparser_internal.cc. Modern gcc makes this implication,
gcc 2.95 doesn't.

queryparser/Makefile.mk has an almost-solution there already, but it's
conditionally disabled.

if VPATH_BUILD
# We need this so that generated sources can find non-generated headers in a
# VPATH build from SVN.
INCLUDES += -I$(top_srcdir)/queryparser
if MAINTAINER_MODE
# We need this because otherwise, if depcomp is being used (as it will be for a
# build with gcc-2.95), depcomp will be unable to find queryparser_token.h.
# This may be a bug in depcomp, but it certainly happens with automake-1.10.
INCLUDES += -I$(top_builddir)/queryparser
endif
endif

If instead it unconditionally set:

INCLUDES += -I$(top_builddir)/queryparser

... regardless of flags, that should fix the build on old GCC and perhaps
some other non-GCC compilers, and would make it clearer what is happening
for those people like me that still find the newer GCC behaviour a little
odd :)

(Please CC: email, I'm not on the list)

Cheers,

Dave

From david.sainty at dtsp.co.nz Thu Jan 15 03:53:52 2009
From: david.sainty at dtsp.co.nz (David Sainty)
Date: Thu, 15 Jan 2009 16:53:52 +1300
Subject: [Xapian-devel] Xapian core build failure under gcc 2.95
In-Reply-To: <496E9142.9020209@dtsp.co.nz>
References: <496E9142.9020209@dtsp.co.nz>
Message-ID: <496EB350.7050105@dtsp.co.nz>

David Sainty wrote:
> Hi,
>
> Under gcc 2.95 Xapian fails to build like so:

I can confirm that the attached patch fixes the build under gcc 2.95
(after an automake).

Cheers,

Dave

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: xapian-gcc-295
Url: http://lists.xapian.org/pipermail/xapian-devel/attachments/20090115/cbdc4c91/attachment.txt

From olly at survex.com Thu Jan 15 11:39:43 2009
From: olly at survex.com (Olly Betts)
Date: Thu, 15 Jan 2009 11:39:43 +0000
Subject: [Xapian-devel] Xapian core build failure under gcc 2.95
In-Reply-To: <496EB350.7050105@dtsp.co.nz>
References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz>
Message-ID: <20090115113943.GN15292@survex.com>

On Thu, Jan 15, 2009 at 04:53:52PM +1300, David Sainty wrote:
> David Sainty wrote:
> >Under gcc 2.95 Xapian fails to build like so:
>
> I can confirm that the attached patch fixes the build under gcc 2.95
> (after an automake).

Thanks for the patch.

But it seems there's something odd going on, as other subdirectories
also include headers from the same directory without an explicit -I.
The files here are generated, but that shouldn't make a difference as
they are shipped in the tarball and it appears you're building from the
1.0.10 source tarball.

Perhaps the issue is the "#line" directives with full paths in
queryparser_internal.cc - if GCC 2.95 resolves header includes relative
to the filename given by "#line" then that would cause this problem.

Could you try:

perl -pi -e 's/^#line.*//' queryparser/queryparser_internal.cc

And then building without your patch.

(Unfortunately I no longer have access to GCC 2.95 to test this myself).

Another question - what's the reason for using GCC 2.95?

We came quite close to dropping support for GCC < 3 a while back (but
instead we ended up requiring 2.95.3 which added ). But I'd
assumed that 2.95 was probably no longer used by now, and there are a
few minor issues we currently work around to keep support for it, so if
there are still people using it I'm interested as to why.
Cheers,
    Olly

From david.sainty at dtsp.co.nz Fri Jan 16 00:02:22 2009
From: david.sainty at dtsp.co.nz (David Sainty)
Date: Fri, 16 Jan 2009 13:02:22 +1300
Subject: [Xapian-devel] Xapian core build failure under gcc 2.95
In-Reply-To: <20090115113943.GN15292@survex.com>
References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz>
 <20090115113943.GN15292@survex.com>
Message-ID: <496FCE8E.3080108@dtsp.co.nz>

Olly Betts wrote:
> On Thu, Jan 15, 2009 at 04:53:52PM +1300, David Sainty wrote:
>
>> David Sainty wrote:
>>
>>> Under gcc 2.95 Xapian fails to build like so:
>>>
>> I can confirm that the attached patch fixes the build under gcc 2.95
>> (after an automake).
>>
>
> Thanks for the patch.
>
> But it seems there's something odd going on, as other subdirectories
> also include headers from the same directory without an explicit -I.
> The files here are generated, but that shouldn't make a difference as
> they are shipped in the tarball and it appears you're building from the
> 1.0.10 source tarball.
>
> Perhaps the issue is the "#line" directives with full paths in
> queryparser_internal.cc - if GCC 2.95 resolves header includes relative
> to the filename given by "#line" then that would cause this problem.
>
> Could you try:
>
> perl -pi -e 's/^#line.*//' queryparser/queryparser_internal.cc
>

Huh, mighty good guessing! Yeah, that also fixes the build (without the
patch).

Cleaning out the "#line"s is a good thing, but I'm not totally sure it
should replace the patch? (Since files are being included unqualified
from "queryparser/" I think it deserves being in INCLUDES?)

I don't think gcc's behaviour here is universally true of all compilers
(implying the source directory as an include path entry), but in saying
that I'm not sure of a counterexample either.

Obviously 2.95(.4) has the required behaviour in some form, but is
confused by the #line lines.

> (Unfortunately I no longer have access to GCC 2.95 to test this myself).
>

And nor should you :) I was taken aback when I noticed what version I
had to work with too :)

> Another question - what's the reason for using GCC 2.95?
>
> We came quite close to dropping support for GCC < 3 a while back (but
> instead we ended up requiring 2.95.3 which added ). But I'd
> assumed that 2.95 was probably no longer used by now, and there are a
> few minor issues we currently work around to keep support for it, so if
> there are still people using it I'm interested as to why.
>

Yeah, it's in use on some old systems that are long long overdue for
updates. A separate project is working on that, but it's not a variable
I can control. Essentially it's the usual reasons - the more important
the server the harder it is to regularly maintain it :)

Cheers,

Dave

From towel77 at gmail.com Fri Jan 16 23:34:11 2009
From: towel77 at gmail.com (towel moist)
Date: Fri, 16 Jan 2009 15:34:11 -0800
Subject: [Xapian-devel] chert vs flint vs lucene
Message-ID: <398117540901161534x1315aeb5u86b8fb0c307e50dc@mail.gmail.com>

Hi,

What's the main difference between chert and flint? What above vs lucene?

I am mainly asking about data structure (lexicon, posting list, document
data), what's in memory, what's on disk, hash vs b-tree and reasons
behind them.

Any pointer is appreciated.

Thanks!
Crystal
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.xapian.org/pipermail/xapian-devel/attachments/20090116/5863e372/attachment.htm

From olly at survex.com Mon Jan 19 08:17:49 2009
From: olly at survex.com (Olly Betts)
Date: Mon, 19 Jan 2009 08:17:49 +0000
Subject: [Xapian-devel] Xapian core build failure under gcc 2.95
In-Reply-To: <496FCE8E.3080108@dtsp.co.nz>
References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz>
 <20090115113943.GN15292@survex.com> <496FCE8E.3080108@dtsp.co.nz>
Message-ID: <20090119081749.GQ15292@survex.com>

On Fri, Jan 16, 2009 at 01:02:22PM +1300, David Sainty wrote:
> I don't think gcc's behaviour here is universally true of all compilers
> (implying the source directory as an include path entry), but in saying
> that I'm not sure of a counterexample either.

I've successfully compiled Xapian with quite a few different compilers
without encountering problems with this (at least GCC, Intel, Sun, HP,
SGI). My understanding is that #include with "" implicitly adds the
source directory to the search path (whereas #include with <> doesn't).

I'm reluctant to start coding around behaviour which compilers *might*
have, as that's a very open-ended list. But if anyone has actual
evidence of a compiler which doesn't behave this way, we probably need
to explicitly add -I options for several other subdirectories which rely
on this behaviour.

> Obviously 2.95(.4) has the required behaviour in some form, but is
> confused by the #line lines.

My guess is that this is because it uses a separate preprocessor and
relies on "#line" in the preprocessor output to tell the compiler the
filename of the source file. GCC 2.95 is adding the source directory to
the search path but is confused as to what the source directory is.
This is the patch I've actually applied to trunk, which fixes up "#line"
directives rather than nuking them (depcomp parses preprocessor output
for "#line" so we no longer need that workaround):

http://trac.xapian.org/changeset/11823/trunk/xapian-core/queryparser/Makefile.mk?format=diff&new=11823

I'd be grateful if you could try this (it should apply to 1.0.10 cleanly).
If it works I'll backport it for 1.0.11.

> Yeah, it's in use on some old systems that are long long overdue for
> updates. A separate project is working on that, but it's not a variable
> I can control. Essentially it's the usual reasons - the more important
> the server the harder it is to regularly maintain it :)

OK. I guess that's a minor argument for keeping GCC 2.95 support,
though it's problematic that we aren't regularly testing it. At some
point we're going to have to just start telling people to upgrade.

Cheers,
    Olly

From olly at survex.com Mon Jan 19 11:33:11 2009
From: olly at survex.com (Olly Betts)
Date: Mon, 19 Jan 2009 11:33:11 +0000
Subject: [Xapian-devel] chert vs flint vs lucene
In-Reply-To: <398117540901161534x1315aeb5u86b8fb0c307e50dc@mail.gmail.com>
References: <398117540901161534x1315aeb5u86b8fb0c307e50dc@mail.gmail.com>
Message-ID: <20090119113311.GW15292@survex.com>

On Fri, Jan 16, 2009 at 03:34:11PM -0800, towel moist wrote:
> What's the main difference between chert and flint? What above vs lucene?

Flint is documented here:

http://trac.xapian.org/wiki/FlintBackend

The user-visible changes in Chert are covered here (to find them,
search for "chert backend:"):

http://trac.xapian.org/browser/trunk/xapian-core/NEWS

That should be complete at present, but there are likely to be further
changes before chert is declared "finished".

I don't know of any comparisons with Lucene's low level details.
Cheers,
    Olly

From david.sainty at dtsp.co.nz Wed Jan 21 02:59:08 2009
From: david.sainty at dtsp.co.nz (David Sainty)
Date: Wed, 21 Jan 2009 15:59:08 +1300
Subject: [Xapian-devel] Xapian core build failure under gcc 2.95
In-Reply-To: <20090119081749.GQ15292@survex.com>
References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz>
 <20090115113943.GN15292@survex.com> <496FCE8E.3080108@dtsp.co.nz>
 <20090119081749.GQ15292@survex.com>
Message-ID: <49768F7C.7040304@dtsp.co.nz>

Hi Olly,

> This is the patch I've actually applied to trunk, which fixes up "#line"
> directives rather than nuking them (depcomp parses preprocessor output
> for "#line" so we no longer need that workaround):
>
> http://trac.xapian.org/changeset/11823/trunk/xapian-core/queryparser/Makefile.mk?format=diff&new=11823
>
> I'd be grateful if you could try this (it should apply to 1.0.10 cleanly).
> If it works I'll backport it for 1.0.11.
>

It took me a while to get the maintainer mode tools together to get the
patch to have an effect. I've tried a fresh build in maintainer mode
with and without this patch. The bad news is that it builds with both :)

The reason is that in maintainer mode the #line entries are valid anyway
(oddly I don't get the fully qualified paths [without the patch] that
you get when building the tarball - something different about how you
kick off the build).

I think your patch is a good fix (and gets rid of your machine's build
paths in the distribution :). But my test isn't perfect - though you do
at least know that the build works after the with-patch manipulation. If
you want to verify for sure I guess you need to build
queryparser_internal.cc as you would for a distribution tar, and then I
can do a non-maintainer build in this environment with that file and see
if it completes.
Cheers,

Dave

From olly at survex.com Wed Jan 21 04:22:09 2009
From: olly at survex.com (Olly Betts)
Date: Wed, 21 Jan 2009 04:22:09 +0000
Subject: [Xapian-devel] Xapian core build failure under gcc 2.95
In-Reply-To: <49768F7C.7040304@dtsp.co.nz>
References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz>
 <20090115113943.GN15292@survex.com> <496FCE8E.3080108@dtsp.co.nz>
 <20090119081749.GQ15292@survex.com> <49768F7C.7040304@dtsp.co.nz>
Message-ID: <20090121042209.GJ8027@survex.com>

On Wed, Jan 21, 2009 at 03:59:08PM +1300, David Sainty wrote:
> It took me a while to get the maintainer mode tools together to get the
> patch to have an effect. I've tried a fresh build in maintainer mode
> with and without this patch. The bad news is that it builds with both :)
> The reason is that in maintainer mode the #line entries are valid anyway
> (oddly I don't get the fully qualified paths [without the patch] that
> you get when building the tarball - something different about how you
> kick off the build).

Ah, sorry about this. I should have thought things through more.

The script which builds releases and snapshots does them with
builddir != srcdir (sometimes called a VPATH build) - that's the
difference you're seeing. With builddir = srcdir you used to get stuff
like this (with an extra harmless './'):

#line 1234 "./queryparser/queryparser.lemony"

Thanks for your efforts though - much appreciated, and they do give me
extra confidence in the fix.

> I think your patch is a good fix (and gets rid of your machine's build
> paths in the distribution :).

Yes, that's certainly an improvement (though there are other places
where these leak in still).

> But my test isn't perfect - though you do
> at least know that the build works after the with-patch manipulation.
If
> you want to verify for sure I guess you need to build
> queryparser_internal.cc as you would for a distribution tar, and then I
> can do a non-maintainer build in this environment with that file and see
> if it completes.

The simplest test would be to just use the version from an SVN trunk
snapshot - I've extracted one here:

http://oligarchy.co.uk/xapian/patches/queryparser_internal.cc

Cheers,
    Olly

From david.sainty at dtsp.co.nz Wed Jan 21 05:31:51 2009
From: david.sainty at dtsp.co.nz (David Sainty)
Date: Wed, 21 Jan 2009 18:31:51 +1300
Subject: [Xapian-devel] Xapian core build failure under gcc 2.95
In-Reply-To: <20090121042209.GJ8027@survex.com>
References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz>
 <20090115113943.GN15292@survex.com> <496FCE8E.3080108@dtsp.co.nz>
 <20090119081749.GQ15292@survex.com> <49768F7C.7040304@dtsp.co.nz>
 <20090121042209.GJ8027@survex.com>
Message-ID: <4976B347.4050009@dtsp.co.nz>

> The simplest test would be to just use the version from an SVN trunk
> snapshot - I've extracted one here:
>
> http://oligarchy.co.uk/xapian/patches/queryparser_internal.cc
>

It looks like things have changed since 1.0.10, I get build errors with
this for undefined QUERYPARSER and LOGLINE when I drop it in.
The diff looks "different" too :)

From olly at survex.com Wed Jan 21 08:33:11 2009
From: olly at survex.com (Olly Betts)
Date: Wed, 21 Jan 2009 08:33:11 +0000
Subject: [Xapian-devel] Xapian core build failure under gcc 2.95
In-Reply-To: <4976B347.4050009@dtsp.co.nz>
References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz>
 <20090115113943.GN15292@survex.com> <496FCE8E.3080108@dtsp.co.nz>
 <20090119081749.GQ15292@survex.com> <49768F7C.7040304@dtsp.co.nz>
 <20090121042209.GJ8027@survex.com> <4976B347.4050009@dtsp.co.nz>
Message-ID: <20090121083311.GL8027@survex.com>

On Wed, Jan 21, 2009 at 06:31:51PM +1300, David Sainty wrote:
> It looks like things have changed since 1.0.10, I get build errors with
> this for undefined QUERYPARSER and LOGLINE when I drop it in. The diff
> looks "different" too :)

Ah yes. I had done a quick diff, but only noticed a lot of changes due
to using a newer version of lemon, but the debug logging macros have
also changed.

I've backported the fix now. If you're still up for it, try snapshot
11827 or later of xapian-core from here (it hasn't yet built as I
write but should have within the hour):

http://oligarchy.co.uk/xapian/branches/1.0/

Cheers,
    Olly

From l.rieder at gmail.com Wed Jan 21 16:04:49 2009
From: l.rieder at gmail.com (Lukas Rieder)
Date: Wed, 21 Jan 2009 17:04:49 +0100
Subject: [Xapian-devel] Xapian Ruby bindings as a Ruby Gem
Message-ID: <11d847b60901210804o83f7d7dl4d2098166112ba22@mail.gmail.com>

Hello everyone,

I'm very interested in packaging the Xapian Ruby bindings as a gem for
easier installation and integration into Ruby projects.

I came to this idea, because many Ruby developers face the problem of
installing xapian under hosting service where they have no root rights.

And Xapian is THE essential part of the Rails plugin acts_as_xapian,
which brings all the Xapian features to the Rails world. This plugin is
maintained by me; Francis, the creator, has stopped development due to
the lack of time.
Could you please tell me who is the maintainer of the Xapian Ruby bindings?
Is there such a project already?
Is someone interested in working on this idea?

Thank you very much,
Lukas Rieder
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.xapian.org/pipermail/xapian-devel/attachments/20090121/59eaaa78/attachment.htm

From richard at lemurconsulting.com Wed Jan 21 16:14:20 2009
From: richard at lemurconsulting.com (Richard Boulton)
Date: Wed, 21 Jan 2009 16:14:20 +0000
Subject: [Xapian-devel] Xapian Ruby bindings as a Ruby Gem
In-Reply-To: <11d847b60901210804o83f7d7dl4d2098166112ba22@mail.gmail.com>
References: <11d847b60901210804o83f7d7dl4d2098166112ba22@mail.gmail.com>
Message-ID: <20090121161420.GA15317@meerkat>

On Wed, Jan 21, 2009 at 05:04:49PM +0100, Lukas Rieder wrote:
> Could you please tell me who is the maintainer of the Xapian Ruby bindings?

Currently, there isn't an "official Ruby maintainer" - that is, someone
dedicated to improving the Ruby bindings and fixing problems with them.

Olly is probably the closest thing to such a maintainer. The Ruby bindings
are produced in xapian-bindings using the same SWIG framework that the
python, C#, php, etc bindings are produced from. Olly occasionally applies
patches to xapian-bindings to fix problems with these, but as far as I
know, he's not using Ruby in earnest.

> Is there such a project already?
> Is someone interested in working on this idea?

Not that I'm aware of.

Assuming a ruby gem is somewhat like a java jar, or a python egg, in
which case this sounds like a useful project.
--
Richard

From james-xapian at tartarus.org Wed Jan 21 16:16:41 2009
From: james-xapian at tartarus.org (James Aylett)
Date: Wed, 21 Jan 2009 16:16:41 +0000
Subject: [Xapian-devel] Xapian Ruby bindings as a Ruby Gem
In-Reply-To: <20090121161420.GA15317@meerkat>
References: <11d847b60901210804o83f7d7dl4d2098166112ba22@mail.gmail.com>
 <20090121161420.GA15317@meerkat>
Message-ID: <20090121161641.GF2354@tartarus.org>

On Wed, Jan 21, 2009 at 04:14:20PM +0000, Richard Boulton wrote:
> Assuming a ruby gem is somewhat like a java jar, or a python egg, in
> which case this sounds like a useful project.

It's a bit more like a ports package, I believe, in that it will build
from source (including bundled libraries). So a gem of the Xapian
bindings should be able to work with an installed Xapian, or build its
own, depending on what it finds. (Certainly that's how things like the
ferret gem worked last time I used it.)

J

--
James Aylett
talktorex.co.uk - xapian.org - uncertaintydivision.org

From olly at survex.com Wed Jan 21 16:23:14 2009
From: olly at survex.com (Olly Betts)
Date: Wed, 21 Jan 2009 16:23:14 +0000
Subject: [Xapian-devel] Xapian Ruby bindings as a Ruby Gem
In-Reply-To: <20090121161420.GA15317@meerkat>
References: <11d847b60901210804o83f7d7dl4d2098166112ba22@mail.gmail.com>
 <20090121161420.GA15317@meerkat>
Message-ID: <20090121162313.GJ15292@survex.com>

On Wed, Jan 21, 2009 at 04:14:20PM +0000, Richard Boulton wrote:
> On Wed, Jan 21, 2009 at 05:04:49PM +0100, Lukas Rieder wrote:
> > Could you please tell me who is the maintainer of the Xapian Ruby bindings?
>
> Currently, there isn't an "official Ruby maintainer" - that is, someone
> dedicated to improving the Ruby bindings and fixing problems with them.
>
> Olly is probably the closest thing to such a maintainer. The Ruby bindings
> are produced in xapian-bindings using the same SWIG framework that the
> python, C#, php, etc bindings are produced from.
Olly occasionally applies
> patches to xapian-bindings to fix problems with these, but as far as I
> know, he's not using Ruby in earnest.

I don't use Ruby except for the work I've done on the bindings. I'd be
delighted if someone better versed in the language wanted to take over.

There was discussion of a Ruby Gem in the past - see here, and the
earlier thread linked to from there:

http://thread.gmane.org/gmane.comp.search.xapian.general/6278

Cheers,
    Olly

From towel77 at gmail.com Wed Jan 21 20:24:44 2009
From: towel77 at gmail.com (towel moist)
Date: Wed, 21 Jan 2009 12:24:44 -0800
Subject: [Xapian-devel] chert vs flint vs lucene
In-Reply-To: <20090119113311.GW15292@survex.com>
References: <398117540901161534x1315aeb5u86b8fb0c307e50dc@mail.gmail.com>
 <20090119113311.GW15292@survex.com>
Message-ID: <398117540901211224l2be0c523va601a1506c30fc96@mail.gmail.com>

Thanks for the pointers.

I tried to look for runtime query serving performance numbers to no avail.
What is the latency range for random queries at a given rate (say 500 QPS)
to an index built at a certain (say 5M docs) size, assuming not too many
concurrent connections?

I am expecting sub-second but just curious whether it's more 500ms or
100-200ms range.

Thanks!
Crystal

On Mon, Jan 19, 2009 at 3:33 AM, Olly Betts wrote:

> On Fri, Jan 16, 2009 at 03:34:11PM -0800, towel moist wrote:
> > What's the main difference between chert and flint? What above vs
> lucene?
>
> Flint is documented here:
>
> http://trac.xapian.org/wiki/FlintBackend
>
> The user-visible changes in Chert are covered here (to find them,
> search for "chert backend:"):
>
> http://trac.xapian.org/browser/trunk/xapian-core/NEWS
>
> That should be complete at present, but there are likely to be further
> changes before chert is declared "finished".
>
> I don't know of any comparisons with Lucene's low level details.
>
> Cheers,
> Olly
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.xapian.org/pipermail/xapian-devel/attachments/20090121/bd62811b/attachment.htm

From l.rieder at gmail.com Wed Jan 21 21:13:17 2009
From: l.rieder at gmail.com (Lukas Rieder)
Date: Wed, 21 Jan 2009 22:13:17 +0100
Subject: [Xapian-devel] Xapian Ruby bindings as a Ruby Gem
In-Reply-To: <20090121162313.GJ15292@survex.com>
References: <11d847b60901210804o83f7d7dl4d2098166112ba22@mail.gmail.com>
 <20090121161420.GA15317@meerkat> <20090121162313.GJ15292@survex.com>
Message-ID: <11d847b60901211313l36c28d3cr3cb832241efaf534@mail.gmail.com>

Hi everybody,

thank you for your reply.

This is really kind of funny, because Francis (the guy who asked last
year) was the maintainer of the plugin I am leading at the moment. Hehe,
that's how things go, and the idea isn't a new one.

Ok, I've never written a Ruby Gem but I'll ask some people and maybe
someone is interested plus has the knowledge to do so. Maybe (and
hopefully) I can organize something to build and maintain a Gem version
of Xapian and its bindings.

But it is very good to know who is in for what, and I'll definitely
contact Olly again for that reason.

I shall come back here soon ;)
In the meantime, have fun and keep up your good work!

Cheers,
Lukas

2009/1/21 Olly Betts

> On Wed, Jan 21, 2009 at 04:14:20PM +0000, Richard Boulton wrote:
> > On Wed, Jan 21, 2009 at 05:04:49PM +0100, Lukas Rieder wrote:
> > > Could you please tell me who is the maintainer of the Xapian Ruby
> bindings?
> >
> > Currently, there isn't an "official Ruby maintainer" - that is, someone
> > dedicated to improving the Ruby bindings and fixing problems with them.
> >
> > Olly is probably the closest thing to such a maintainer. The Ruby
> bindings
> > are produced in xapian-bindings using the same SWIG framework that the
> > python, C#, php, etc bindings are produced from. Olly occasionally
> applies
> > patches to xapian-bindings to fix problems with these, but as far as I
> > know, he's not using Ruby in earnest.
>
> I don't use Ruby except for the work I've done on the bindings. I'd be
> delighted if someone better versed in the language wanted to take over.
>
> There was discussion of a Ruby Gem in the past - see here, and the
> earlier thread linked to from there:
>
> http://thread.gmane.org/gmane.comp.search.xapian.general/6278
>
> Cheers,
> Olly
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.xapian.org/pipermail/xapian-devel/attachments/20090121/f692d6a4/attachment.htm

From olly at survex.com Thu Jan 22 00:29:14 2009
From: olly at survex.com (Olly Betts)
Date: Thu, 22 Jan 2009 00:29:14 +0000
Subject: [Xapian-devel] chert vs flint vs lucene
In-Reply-To: <398117540901211224l2be0c523va601a1506c30fc96@mail.gmail.com>
References: <398117540901161534x1315aeb5u86b8fb0c307e50dc@mail.gmail.com>
 <20090119113311.GW15292@survex.com>
 <398117540901211224l2be0c523va601a1506c30fc96@mail.gmail.com>
Message-ID: <20090122002914.GO6366@survex.com>

On Wed, Jan 21, 2009 at 12:24:44PM -0800, towel moist wrote:
> I tried to look for runtime query serving performance numbers to no avail.
> What is the latency range for random queries at a given rate (say 500 QPS)
> to a index built at a certain (say 5M docs) size, assuming not too many
> concurrent connections?
>
> I am expecting sub-second but just curious whether it's more 500ms or
> 100-200ms range.

I'm not sure I can usefully answer a question like this - it will depend
significantly on the nature of the data, the nature of the queries, and
the hardware specs. The best way to get a feel for it is to build a
prototype with realistic data and queries and see.
BTW, be careful of benchmarking with "random" queries - here's an
example where the picture is very different when you look at queries
using words from the documents vs "nonsense" queries, few of which
actually match any documents:

http://tag1consulting.com/Comparing_Xapian_and_Drupal5_Core_Search

That's a fairly extreme case, but even pulling random words from the
vocabulary won't produce representative queries and may skew results.
Ideally you want to replay query logs from actual users searching over
the same database, but unfortunately you rarely have those when you
start developing a system.

Cheers,
    Olly

From david.sainty at dtsp.co.nz Thu Jan 22 01:36:58 2009
From: david.sainty at dtsp.co.nz (David Sainty)
Date: Thu, 22 Jan 2009 14:36:58 +1300
Subject: [Xapian-devel] Xapian core build failure under gcc 2.95
In-Reply-To: <20090121083311.GL8027@survex.com>
References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz>
 <20090115113943.GN15292@survex.com> <496FCE8E.3080108@dtsp.co.nz>
 <20090119081749.GQ15292@survex.com> <49768F7C.7040304@dtsp.co.nz>
 <20090121042209.GJ8027@survex.com> <4976B347.4050009@dtsp.co.nz>
 <20090121083311.GL8027@survex.com>
Message-ID: <4977CDBA.6060600@dtsp.co.nz>

Hi Olly,

> I've backported the fix now. If you're still up for it, try snapshot
> 11827 or later of xapian-core from here (it hasn't yet built as I
> write but should have within the hour):
>
> http://oligarchy.co.uk/xapian/branches/1.0/
>

I tested this, it builds fine with no patching required.

Thanks :)

Dave

From me.show at gmail.com Tue Jan 27 14:49:38 2009
From: me.show at gmail.com (Yann ROBIN)
Date: Tue, 27 Jan 2009 15:49:38 +0100
Subject: [Xapian-devel] Segmentation fault in MSetIterator get_weight
Message-ID: <1db5d7c40901270649u13efc302u122dd3831832106d@mail.gmail.com>

Hi,

I'm using xapian with c# and mono and i'm having a segfault in get_weight.
When i print the index variable, the value is clearly too high.
I think something writes over it.
Do you have any idea on how i could trace the beginning of the
segmentation fault ?

Thanks,
--
Yann

From me.show at gmail.com Tue Jan 27 16:52:24 2009
From: me.show at gmail.com (Yann ROBIN)
Date: Tue, 27 Jan 2009 17:52:24 +0100
Subject: [Xapian-devel] Segmentation fault in MSetIterator get_weight
In-Reply-To: <1db5d7c40901270649u13efc302u122dd3831832106d@mail.gmail.com>
References: <1db5d7c40901270649u13efc302u122dd3831832106d@mail.gmail.com>
Message-ID: <1db5d7c40901270852o398f756ckfc2ae48b20597c73@mail.gmail.com>

So i found that it is due to the .Net Garbage Collector.
For more information see :
http://www.swig.org/Doc1.3/CSharp.html#csharp_memory_management_member_variables

I attach a new "util.i" for c# bindings with the needed correction.

--
Yann
-------------- next part --------------
%{
/* csharp/util.i: custom C# typemaps for xapian-bindings
 *
 * Copyright (c) 2005,2006,2008 Olly Betts
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation; either version 2 of the
 * License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
 * USA
 */

#include

// In C#, we don't get SWIG_exception in the generated C++ wrapper sources.
#define XapianException(TYPE, MSG) SWIG_CSharpException(TYPE, (MSG).c_str())

%}

// Use SWIG directors for C# wrappers.
#define XAPIAN_SWIG_DIRECTORS

// Rename function and method names to match C# conventions (e.g. from
// get_description() to GetDescription()).
%rename("%(camelcase)s",%$isfunction) "";

// Fix up API methods which aren't split by '_' on word boundaries.
%rename("GetTermPos") get_termpos;
%rename("GetTermFreq") get_termfreq;
%rename("GetTermWeight") get_termweight;
%rename("GetDocCount") get_doccount;
%rename("GetDocId") get_docid;
%rename("GetDocLength") get_doclength;
%rename("GetDocumentId") get_document_id;
%rename("PositionListBegin") positionlist_begin;
%rename("PositionListEnd") positionlist_end;
%rename("GetValueNo") get_valueno;
%rename("TermListCount") termlist_count;
%rename("TermListBegin") termlist_begin;
%rename("TermListEnd") termlist_end;
%rename("GetFirstItem") get_firstitem;
%rename("GetSumPart") get_sumpart;
%rename("GetMaxPart") get_maxpart;
%rename("GetSumExtra") get_sumextra;
%rename("GetMaxExtra") get_maxextra;
%rename("GetSumPartNeedsDocLength") get_sumpart_needs_doclength;
%rename("PostListBegin") postlist_begin;
%rename("PostListEnd") postlist_end;
%rename("AllTermsBegin") allterms_begin;
%rename("AllTermsEnd") allterms_end;
%rename("GetLastDocId") get_lastdocid;
%rename("GetAvLength") get_avlength;
%rename("StopListBegin") stoplist_begin;
%rename("StopListEnd") stoplist_end;
%rename("GetMSet") get_mset;
%rename("GetESet") get_eset;

%ignore ValueRangeProcessor::operator();

%inline {
namespace Xapian {

// Wrap Xapian::version_string as Xapian.Version.String() as C# can't have
// functions outside a class and we don't want Xapian.Xapian.VersionString()!
class Version {
  private:
    Version();
    ~Version();
  public:
    static const char * String() { return Xapian::version_string(); }
    static int Major() { return Xapian::major_version(); }
    static int Minor() { return Xapian::minor_version(); }
    static int Revision() { return Xapian::revision(); }
};

}
}

namespace Xapian {

%ignore version_string;
%ignore major_version;
%ignore minor_version;
%ignore revision;

%typemap(cscode) class MSetIterator %{
    private MSet msetRef;
    internal void addReference(MSet mset) {
        msetRef = mset;
    }

    public static MSetIterator operator++(MSetIterator it) {
        return it.Next();
    }

    public static MSetIterator operator--(MSetIterator it) {
        return it.Prev();
    }

    public override bool Equals(object o) {
        return o is MSetIterator && Equals((MSetIterator)o);
    }

    public static bool operator==(MSetIterator a, MSetIterator b) {
        if ((object)a == (object)b) return true;
        if ((object)a == null || (object)b == null) return false;
        return a.Equals(b);
    }

    public static bool operator!=(MSetIterator a, MSetIterator b) {
        if ((object)a == (object)b) return false;
        if ((object)a == null || (object)b == null) return true;
        return !a.Equals(b);
    }

    // Implementing GetHashCode() to always return 0 is rather lame, but
    // using iterators as keys in a hash table would be rather strange.
    public override int GetHashCode() {
        return 0;
    }
%}

%typemap(cscode) ESetIterator %{
    public static ESetIterator operator++(ESetIterator it) {
        return it.Next();
    }

    public static ESetIterator operator--(ESetIterator it) {
        return it.Prev();
    }

    public override bool Equals(object o) {
        return o is ESetIterator && Equals((ESetIterator)o);
    }

    public static bool operator==(ESetIterator a, ESetIterator b) {
        if ((object)a == (object)b) return true;
        if ((object)a == null || (object)b == null) return false;
        return a.Equals(b);
    }

    public static bool operator!=(ESetIterator a, ESetIterator b) {
        if ((object)a == (object)b) return false;
        if ((object)a == null || (object)b == null) return true;
        return !a.Equals(b);
    }

    // Implementing GetHashCode() to always return 0 is rather lame, but
    // using iterators as keys in a hash table would be rather strange.
    public override int GetHashCode() {
        return 0;
    }
%}

%typemap(cscode) TermIterator %{
    public static TermIterator operator++(TermIterator it) {
        return it.Next();
    }

    public override bool Equals(object o) {
        return o is TermIterator && Equals((TermIterator)o);
    }

    public static bool operator==(TermIterator a, TermIterator b) {
        if ((object)a == (object)b) return true;
        if ((object)a == null || (object)b == null) return false;
        return a.Equals(b);
    }

    public static bool operator!=(TermIterator a, TermIterator b) {
        if ((object)a == (object)b) return false;
        if ((object)a == null || (object)b == null) return true;
        return !a.Equals(b);
    }

    // Implementing GetHashCode() to always return 0 is rather lame, but
    // using iterators as keys in a hash table would be rather strange.
    public override int GetHashCode() {
        return 0;
    }
%}

%typemap(cscode) ValueIterator %{
    public static ValueIterator operator++(ValueIterator it) {
        return it.Next();
    }

    public override bool Equals(object o) {
        return o is ValueIterator && Equals((ValueIterator)o);
    }

    public static bool operator==(ValueIterator a, ValueIterator b) {
        if ((object)a == (object)b) return true;
        if ((object)a == null || (object)b == null) return false;
        return a.Equals(b);
    }

    public static bool operator!=(ValueIterator a, ValueIterator b) {
        if ((object)a == (object)b) return false;
        if ((object)a == null || (object)b == null) return true;
        return !a.Equals(b);
    }

    // Implementing GetHashCode() to always return 0 is rather lame, but
    // using iterators as keys in a hash table would be rather strange.
    public override int GetHashCode() {
        return 0;
    }
%}

%typemap(cscode) PostingIterator %{
    public static PostingIterator operator++(PostingIterator it) {
        return it.Next();
    }

    public override bool Equals(object o) {
        return o is PostingIterator && Equals((PostingIterator)o);
    }

    public static bool operator==(PostingIterator a, PostingIterator b) {
        if ((object)a == (object)b) return true;
        if ((object)a == null || (object)b == null) return false;
        return a.Equals(b);
    }

    public static bool operator!=(PostingIterator a, PostingIterator b) {
        if ((object)a == (object)b) return false;
        if ((object)a == null || (object)b == null) return true;
        return !a.Equals(b);
    }

    // Implementing GetHashCode() to always return 0 is rather lame, but
    // using iterators as keys in a hash table would be rather strange.
    public override int GetHashCode() {
        return 0;
    }
%}

%typemap(cscode) PositionIterator %{
    public static PositionIterator operator++(PositionIterator it) {
        return it.Next();
    }

    public override bool Equals(object o) {
        return o is PositionIterator && Equals((PositionIterator)o);
    }

    public static bool operator==(PositionIterator a, PositionIterator b) {
        if ((object)a == (object)b) return true;
        if ((object)a == null || (object)b == null) return false;
        return a.Equals(b);
    }

    public static bool operator!=(PositionIterator a, PositionIterator b) {
        if ((object)a == (object)b) return false;
        if ((object)a == null || (object)b == null) return true;
        return !a.Equals(b);
    }

    // Implementing GetHashCode() to always return 0 is rather lame, but
    // using iterators as keys in a hash table would be rather strange.
    public override int GetHashCode() {
        return 0;
    }
%}

%typemap(cscode) MSet %{
    // Ensure that the GC doesn't collect any MSet instance set from C#
    private Enquire enquireRef;
    internal void addReference(Enquire enquire) {
        enquireRef = enquire;
    }
%}

// Add a C# reference to prevent premature garbage collection and resulting use
// of dangling C++ pointer. Intended for methods that return pointers or
// references to a member variable.
%typemap(csout, excode=SWIGEXCODE) MSet get_mset(doccount first, doccount maxitems, const RSet *omrset, const MatchDecider *mdecider = 0) {
    IntPtr cPtr = $imcall;$excode
    $csclassname ret = null;
    if (cPtr != IntPtr.Zero) {
        ret = new $csclassname(cPtr, $owner);
        ret.addReference(this);
    }
    return ret;
}

%typemap(csout, excode=SWIGEXCODE) MSetIterator begin() {
    IntPtr cPtr = $imcall;$excode
    $csclassname ret = null;
    if (cPtr != IntPtr.Zero) {
        ret = new $csclassname(cPtr, $owner);
        ret.addReference(this);
    }
    return ret;
}

}

/* vim:set syntax=cpp:set noexpandtab: */

From l.rieder at gmail.com Thu Jan 29 08:11:49 2009
From: l.rieder at gmail.com (Lukas Rieder)
Date: Thu, 29 Jan 2009 09:11:49 +0100
Subject: [Xapian-devel] Xapian Ruby bindings do not implement full multi-value-sorting functionality?
Message-ID: <11d847b60901290011k774ff8f2u641fe0e317f9b7c6@mail.gmail.com>

Hello,

this is a question that could be answered by collaborators of the Ruby
bindings.

Today I've played around with the Xapian::MultiValueSorter class.
I've set everything up and then I tried the following on an instance of
Xapian::Enquire:

:
enquire = Xapian::Enquire.new(database)
enquire.query = options[:query]
:
sorter = Xapian::MultiValueSorter.new
sorter.add(0, true)
sorter.add(1, true)
:
enquire.sort_by_key_then_relevance(sorter)
:

And it seems that there is no 'sort_by_key_then_relevance' method
implemented in Xapian::Enquire.
The documentation tells me:
http://www.xapian.org/docs/apidoc/html/classXapian_1_1Enquire.html#7c6c0c1f66bdeefbd09a0575584ba9b9

Is there a reason for this? How could it be implemented into the Ruby
bindings?
I've read the HACKING document that comes with the xapian-bindings but I've
never used SWIG, so I wasn't able to help myself.

Any help is highly appreciated.

Lukas Rieder
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.xapian.org/pipermail/xapian-devel/attachments/20090129/07343bbc/attachment.htm

From richard at lemurconsulting.com Thu Jan 29 09:35:33 2009
From: richard at lemurconsulting.com (Richard Boulton)
Date: Thu, 29 Jan 2009 09:35:33 +0000
Subject: [Xapian-devel] Xapian Ruby bindings do not implement full multi-value-sorting functionality?
In-Reply-To: <11d847b60901290011k774ff8f2u641fe0e317f9b7c6@mail.gmail.com>
References: <11d847b60901290011k774ff8f2u641fe0e317f9b7c6@mail.gmail.com>
Message-ID: <20090129093533.GA3861@meerkat>

On Thu, Jan 29, 2009 at 09:11:49AM +0100, Lukas Rieder wrote:
> this is a question that could be answered by collaborators of the Ruby
> bindings.
...
> enquire.sort_by_key_then_relevance(sorter)
>
> And it seems that there is no 'sort_by_key_then_relevance' method implemented
> in Xapian::Enquire.
> Is there a reason for this? How could it be implemented into the Ruby
> bindings?

Which version of Xapian are you using?
Enquire::set_sort_by_key_then_relevance() was added to the SWIG bindings on
Nov 29th 2007, and included in releases of Xapian from 1.0.5 onwards. I've
not checked that it works in the Ruby bindings, but it's certainly present
in the generated xapian_wrap.cc file for Ruby.
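
A quick way to see which sort-related methods a class exposes from Ruby is
to list its instance methods. The sketch below is only illustrative:
sort_methods is a hypothetical helper name and FakeEnquire is a stand-in
class (neither is part of xapian-bindings), used so that the snippet runs
without Xapian installed. With the bindings loaded, passing Xapian::Enquire
instead should show whether a sort-by-key method is present and under what
name.

```ruby
# Hypothetical helper (not part of xapian-bindings): list the public
# instance methods of a class whose names mention "sort_by".
def sort_methods(klass)
  klass.public_instance_methods.map(&:to_s).grep(/sort_by/).sort
end

# Stand-in class so that this sketch runs without Xapian installed.
# The method names here are for illustration only.
class FakeEnquire
  def set_sort_by_key_then_relevance(sorter); end
  def set_sort_by_relevance; end
  def query=(q); end
end

puts sort_methods(FakeEnquire)

# With the real bindings, the equivalent check would be:
#   require 'xapian'
#   puts sort_methods(Xapian::Enquire)
```

If nothing matching is listed for the real class, the installed bindings
probably predate the method.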
-- Richard From olly at survex.com Tue Jan 6 03:47:24 2009 From: olly at survex.com (Olly Betts) Date: Tue, 6 Jan 2009 03:47:24 +0000 Subject: [Xapian-devel] NearPostList and get_wdf In-Reply-To: <1db5d7c40812290509w39cd275r13e8fc27a9519a@mail.gmail.com> References: <1db5d7c40812280701y2b522decm93d6de9091d34369@mail.gmail.com> <20081229125051.GA25008@meerkat> <1db5d7c40812290509w39cd275r13e8fc27a9519a@mail.gmail.com> Message-ID: <20090106034724.GG15292@survex.com> On Mon, Dec 29, 2008 at 02:09:14PM +0100, Yann ROBIN wrote: > On Mon, Dec 29, 2008 at 1:50 PM, Richard Boulton > wrote: > > I'm not sure that modifying the wdf is really the way to go about this - it > > seems to me that you might do better to use a custom weight class, which > > factored in the frequencies of the individual terms, as well as their > > proximity. You have to choose a weight class for the whole query - it can't be different for different subqueries. So I'm not sure how this would work. A sane approach would probably be in NewNearPostList::get_weight() to multiply the weight returned by the AND query's get_weight() method by a non-negative factor which varies depending how close the terms are - largest when they're together, much smaller when they are far apart. This will be slower to run than the current NearPostList though as it can't stop working on a document when it finds a match within the window size - instead it has to check all the positional data for each document matching the AND query to find the closest match. This factor needs to have a known upper bound, which you multiply get_maxweight() and recalc_maxweight() from the AND query by. > > Feel free to open a feature request ticket, describing the feature that you > > would like to exist. OP_NEAR as it is currently implemented is behaving as > > intended, though. > > The ticket was more for the get_wdf not being called, i don't think this was > something intended. 
Currently NearPostList::get_wdf() and friends are dead code - I think whoever wrote them probably didn't realise they wouldn't be needed. It's even possible that they were actually used in a really early version. But once the synonym patch gets merged, I think they'll get used if you do a synonym operation with OP_NEAR or OP_PHRASE as a subquery, so it seems unhelpful to rip them out at this point. Cheers, Olly From david.sainty at dtsp.co.nz Thu Jan 15 01:28:34 2009 From: david.sainty at dtsp.co.nz (David Sainty) Date: Thu, 15 Jan 2009 14:28:34 +1300 Subject: [Xapian-devel] Xapian core build failure under gcc 2.95 Message-ID: <496E9142.9020209@dtsp.co.nz> Hi, Under gcc 2.95 Xapian fails to build like so: g++ -DHAVE_CONFIG_H -I. -I./common -I./include -I/home/dsainty/not-backed-up/pkgsrc/textproc/xapian/work/.buildlink/include -Wall -W -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wno-long-long -Wformat-security -fno-gnu-keywords -Wundef -O2 -c queryparser/queryparser_internal.cc -Wp,-MD,queryparser/.deps/queryparser_internal.TPlo -fPIC -DPIC -o queryparser/.libs/queryparser_internal.o /data/home/olly/tmp/xapian-svn-snapshot/tags/1.0.10/xapian/xapian-core/queryparser/queryparser.lemony:25: queryparser_internal.h: No such file or directory /data/home/olly/tmp/xapian-svn-snapshot/tags/1.0.10/xapian/xapian-core/queryparser/queryparser.lemony:31: queryparser_token.h: No such file or directory *** Error code 1 The problem seems to be that the build system is relying on the compiler to imply -Iqueryparser when the source file is queryparser/queryparser_internal.cc. Modern gcc makes this implication, gcc 2.95 doesn't. queryparser/Makefile.mk has an almost-solution there already, but it's conditionally disabled. if VPATH_BUILD # We need this so that generated sources can find non-generated headers in a # VPATH build from SVN. 
INCLUDES += -I$(top_srcdir)/queryparser if MAINTAINER_MODE # We need this because otherwise, if depcomp is being used (as it will be for a # build with gcc-2.95), depcomp will be unable to find queryparser_token.h. # This may be a bug in depcomp, but it certainly happens with automake-1.10. INCLUDES += -I$(top_builddir)/queryparser endif endif If instead it unconditionally set: INCLUDES += -I$(top_builddir)/queryparser ... regardless of flags, that should fix the build on old GCC and perhaps some other non-GCC compilers, and would make it clearer what is happening for those people like me that still find the newer GCC behaviour a little odd :) (Please CC: email, I'm not on the list) Cheers, Dave From david.sainty at dtsp.co.nz Thu Jan 15 03:53:52 2009 From: david.sainty at dtsp.co.nz (David Sainty) Date: Thu, 15 Jan 2009 16:53:52 +1300 Subject: [Xapian-devel] Xapian core build failure under gcc 2.95 In-Reply-To: <496E9142.9020209@dtsp.co.nz> References: <496E9142.9020209@dtsp.co.nz> Message-ID: <496EB350.7050105@dtsp.co.nz> David Sainty wrote: > Hi, > > Under gcc 2.95 Xapian fails to build like so: I can confirm that the attached patch fixes the build under gcc 2.95 (after an automake). Cheers, Dave -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: xapian-gcc-295 URL: From olly at survex.com Thu Jan 15 11:39:43 2009 From: olly at survex.com (Olly Betts) Date: Thu, 15 Jan 2009 11:39:43 +0000 Subject: [Xapian-devel] Xapian core build failure under gcc 2.95 In-Reply-To: <496EB350.7050105@dtsp.co.nz> References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz> Message-ID: <20090115113943.GN15292@survex.com> On Thu, Jan 15, 2009 at 04:53:52PM +1300, David Sainty wrote: > David Sainty wrote: > >Under gcc 2.95 Xapian fails to build like so: > > I can confirm that the attached patch fixes the build under gcc 2.95 > (after an automake). Thanks for the patch. 
But it seems there's something odd going on, as other subdirectories also include headers from the same directory without an explicit -I. The files here are generated, but that shouldn't make a difference as they are shipped in the tarball and it appears you're building from the 1.0.10 source tarball. Perhaps the issue is the "#line" directives with full paths in queryparser_internal.cc - if GCC 2.95 resolves header includes relative to the filename given by "#line" then that would cause this problem. Could you try: perl -pi -e 's/^#line.*//' queryparser/queryparser_internal.cc And then building without your patch. (Unfortunately I no longer have access to GCC 2.95 to test this myself). Another question - what's the reason for using GCC 2.95? We came quite closing to dropping support for GCC < 3 a while back (but instead we ended up requiring 2.95.3 which added ). But I'd assumed that 2.95 was probably no longer used by now, and there are a few minor issues we currently work around to keep support for it, so if there are still people using it I'm interested as to why. Cheers, Olly From david.sainty at dtsp.co.nz Fri Jan 16 00:02:22 2009 From: david.sainty at dtsp.co.nz (David Sainty) Date: Fri, 16 Jan 2009 13:02:22 +1300 Subject: [Xapian-devel] Xapian core build failure under gcc 2.95 In-Reply-To: <20090115113943.GN15292@survex.com> References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz> <20090115113943.GN15292@survex.com> Message-ID: <496FCE8E.3080108@dtsp.co.nz> Olly Betts wrote: > On Thu, Jan 15, 2009 at 04:53:52PM +1300, David Sainty wrote: > >> David Sainty wrote: >> >>> Under gcc 2.95 Xapian fails to build like so: >>> >> I can confirm that the attached patch fixes the build under gcc 2.95 >> (after an automake). >> > > Thanks for the patch. > > But it seems there's something odd going on, as other subdirectories > also include headers from the same directory without an explicit -I. 
> The files here are generated, but that shouldn't make a difference as > they are shipped in the tarball and it appears you're building from the > 1.0.10 source tarball. > > Perhaps the issue is the "#line" directives with full paths in > queryparser_internal.cc - if GCC 2.95 resolves header includes relative > to the filename given by "#line" then that would cause this problem. > > Could you try: > > perl -pi -e 's/^#line.*//' queryparser/queryparser_internal.cc > Huh, mighty good guessing! Yeah, that also fixes the build (without the patch). Cleaning out the "#line"s is a good thing, but I'm not totally sure it should replace the patch? (Since files are being included unqualified from "queryparser/" I think it deserves being in INCLUDES?) I don't think gcc's behaviour here is universally true of all compilers (implying the source directory as an include path entry), but in saying that I'm not sure of a counterexample either. Obviously 2.95(.4) has the required behaviour in some form, but is confused by the #line lines. > (Unfortunately I no longer have access to GCC 2.95 to test this myself). > And nor should you :) I was taken aback when I noticed what version I had to work with too :) > Another question - what's the reason for using GCC 2.95? > > We came quite closing to dropping support for GCC < 3 a while back (but > instead we ended up requiring 2.95.3 which added ). But I'd > assumed that 2.95 was probably no longer used by now, and there are a > few minor issues we currently work around to keep support for it, so if > there are still people using it I'm interested as to why. > Yeah, it's in use on some old systems that are long long overdue for updates. A separate project is working on that, but it's not a variable I can control. 
Essentially it's the usual reasons - the more important the server the harder it is to regularly maintain it :) Cheers, Dave From towel77 at gmail.com Fri Jan 16 23:34:11 2009 From: towel77 at gmail.com (towel moist) Date: Fri, 16 Jan 2009 15:34:11 -0800 Subject: [Xapian-devel] chert vs flint vs lucene Message-ID: <398117540901161534x1315aeb5u86b8fb0c307e50dc@mail.gmail.com> Hi, What's the main difference between chert and flint? What above vs lucene? I am mainly asking about data structure (lexicon, posting list, document data), what's in memory, what's on disk, hash vs b-tree and reasons behind them. Any pointer is appreciated. Thanks! Crystal -------------- next part -------------- An HTML attachment was scrubbed... URL: From olly at survex.com Mon Jan 19 08:17:49 2009 From: olly at survex.com (Olly Betts) Date: Mon, 19 Jan 2009 08:17:49 +0000 Subject: [Xapian-devel] Xapian core build failure under gcc 2.95 In-Reply-To: <496FCE8E.3080108@dtsp.co.nz> References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz> <20090115113943.GN15292@survex.com> <496FCE8E.3080108@dtsp.co.nz> Message-ID: <20090119081749.GQ15292@survex.com> On Fri, Jan 16, 2009 at 01:02:22PM +1300, David Sainty wrote: > I don't think gcc's behaviour here is universally true of all compilers > (implying the source directory as an include path entry), but in saying > that I'm not sure of a counterexample either. I've successfully compiled Xapian with quite a few different compilers without encountering problems with this (at least GCC, Intel, Sun, HP, SGI). My understanding is that #include with "" implicitly adds the source directory to the search path (whereas #include with <> doesn't). I'm reluctant to start coding around behaviour which compilers *might* have, as that's a very open-ended list. But if anyone has actual evidence of a compiler which doesn't behave this way, we probably need to explicitly add -I options for several other subdirectories which rely on this behaviour. 
> Obviously 2.95(.4) has the required behaviour in some form, but is > confused by the #line lines. My guess is that this is because it uses a separate preprocessor and relies on "#line" in the preprocessor output to tell the compiler the filename of the source file. GCC 2.95 is adding the source directory to the search path but is confused as to what the source directory is. This is the patch I've actually applied to trunk, which fixes up "#line" directives rather than nuking them (depcomp parses preprocessor output for "#line" so we no longer need that workaround): http://trac.xapian.org/changeset/11823/trunk/xapian-core/queryparser/Makefile.mk?format=diff&new=11823 I'd be grateful if you could try this (it should apply to 1.0.10 cleanly). If it works I'll backport it for 1.0.11. > Yeah, it's in use on some old systems that are long long overdue for > updates. A separate project is working on that, but it's not a variable > I can control. Essentially it's the usual reasons - the more important > the server the harder it is to regularly maintain it :) OK. I guess that's a minor argument for keeping GCC 2.95 support, though it's problematic that we aren't regularly testing it. At some point we're going to have to just start telling people to upgrade. Cheers, Olly From olly at survex.com Mon Jan 19 11:33:11 2009 From: olly at survex.com (Olly Betts) Date: Mon, 19 Jan 2009 11:33:11 +0000 Subject: [Xapian-devel] chert vs flint vs lucene In-Reply-To: <398117540901161534x1315aeb5u86b8fb0c307e50dc@mail.gmail.com> References: <398117540901161534x1315aeb5u86b8fb0c307e50dc@mail.gmail.com> Message-ID: <20090119113311.GW15292@survex.com> On Fri, Jan 16, 2009 at 03:34:11PM -0800, towel moist wrote: > What's the main difference between chert and flint? What above vs lucene? 
Flint is documented here: http://trac.xapian.org/wiki/FlintBackend The user-visible change in Chert are covered in here (to find them, search for "chert backend:"): http://trac.xapian.org/browser/trunk/xapian-core/NEWS That should be complete at present, but there are likely to be further changes before chert is declared "finished". I don't know of any comparisons with Lucene's low level details. Cheers, Olly From david.sainty at dtsp.co.nz Wed Jan 21 02:59:08 2009 From: david.sainty at dtsp.co.nz (David Sainty) Date: Wed, 21 Jan 2009 15:59:08 +1300 Subject: [Xapian-devel] Xapian core build failure under gcc 2.95 In-Reply-To: <20090119081749.GQ15292@survex.com> References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz> <20090115113943.GN15292@survex.com> <496FCE8E.3080108@dtsp.co.nz> <20090119081749.GQ15292@survex.com> Message-ID: <49768F7C.7040304@dtsp.co.nz> Hi Olly, > This is the patch I've actually applied to trunk, which fixes up "#line" > directives rather than nuking them (depcomp parses preprocessor output > for "#line" so we no longer need that workaround): > > http://trac.xapian.org/changeset/11823/trunk/xapian-core/queryparser/Makefile.mk?format=diff&new=11823 > > I'd be grateful if you could try this (it should apply to 1.0.10 cleanly). > If it works I'll backport it for 1.0.11. > It took me a while to get the maintainer mode tools together to get the patch to have an effect. I've tried a fresh build in maintainer mode with and without this patch. The bad news is that it builds with both :) The reason is that in maintainer mode the #line entries are valid anyway (oddly I don't get the fully qualified paths [without the patch] that you get when building the tarball - something different about how you kick off the build). I think your patch is a good fix (and gets rid of your machine's build paths in the distribtion :). But my test isn't perfect - though you do at least know that the build works after the with-patch manipulation. 
If you want to verify for sure I guess you need to build queryparser_internal.cc as you would for a distribution tar, and then I can do a non-maintainer build in this environment with that file and see if it completes. Cheers, Dave From olly at survex.com Wed Jan 21 04:22:09 2009 From: olly at survex.com (Olly Betts) Date: Wed, 21 Jan 2009 04:22:09 +0000 Subject: [Xapian-devel] Xapian core build failure under gcc 2.95 In-Reply-To: <49768F7C.7040304@dtsp.co.nz> References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz> <20090115113943.GN15292@survex.com> <496FCE8E.3080108@dtsp.co.nz> <20090119081749.GQ15292@survex.com> <49768F7C.7040304@dtsp.co.nz> Message-ID: <20090121042209.GJ8027@survex.com> On Wed, Jan 21, 2009 at 03:59:08PM +1300, David Sainty wrote: > It took me a while to get the maintainer mode tools together to get the > patch to have an effect. I've tried a fresh build in maintainer mode > with and without this patch. The bad news is that it builds with both :) > The reason is that in maintainer mode the #line entries are valid anyway > (oddly I don't get the fully qualified paths [without the patch] that > you get when building the tarball - something different about how you > kick off the build). Ah, sorry about this. I should have thought things through more. The script which builds releases and snapshots does them with builddir != srcdir (sometimes called a VPATH build) - that's the difference you're seeing. With builddir = srcdir you used to get stuff like this (with an extra harmless './'): #line 1234 "./queryparser/queryparser.lemony" Thanks for your efforts though - much appreciated, and they do give me extra confidence in the fix. > I think your patch is a good fix (and gets rid of your machine's build > paths in the distribtion :). Yes, that's certainly an improvement (though there are other places where these leak in still). 
> But my test isn't perfect - though you do > at least know that the build works after the with-patch manipulation. If > you want to verify for sure I guess you need to build > queryparser_internal.cc as you would for a distribution tar, and then I > can do a non-maintainer build in this environment with that file and see > if it completes. The simplest test would be to just use the version from an SVN trunk snapshot - I've extracted one here: http://oligarchy.co.uk/xapian/patches/queryparser_internal.cc Cheers, Olly From david.sainty at dtsp.co.nz Wed Jan 21 05:31:51 2009 From: david.sainty at dtsp.co.nz (David Sainty) Date: Wed, 21 Jan 2009 18:31:51 +1300 Subject: [Xapian-devel] Xapian core build failure under gcc 2.95 In-Reply-To: <20090121042209.GJ8027@survex.com> References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz> <20090115113943.GN15292@survex.com> <496FCE8E.3080108@dtsp.co.nz> <20090119081749.GQ15292@survex.com> <49768F7C.7040304@dtsp.co.nz> <20090121042209.GJ8027@survex.com> Message-ID: <4976B347.4050009@dtsp.co.nz> > The simplest test would be to just use the version from an SVN trunk > snapshot - I've extracted one here: > > http://oligarchy.co.uk/xapian/patches/queryparser_internal.cc > It looks like things have changed since 1.0.10, I get build errors with this for undefined QUERYPARSER and LOGLINE when I drop it in. 
The diff looks "different" too :) From olly at survex.com Wed Jan 21 08:33:11 2009 From: olly at survex.com (Olly Betts) Date: Wed, 21 Jan 2009 08:33:11 +0000 Subject: [Xapian-devel] Xapian core build failure under gcc 2.95 In-Reply-To: <4976B347.4050009@dtsp.co.nz> References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz> <20090115113943.GN15292@survex.com> <496FCE8E.3080108@dtsp.co.nz> <20090119081749.GQ15292@survex.com> <49768F7C.7040304@dtsp.co.nz> <20090121042209.GJ8027@survex.com> <4976B347.4050009@dtsp.co.nz> Message-ID: <20090121083311.GL8027@survex.com> On Wed, Jan 21, 2009 at 06:31:51PM +1300, David Sainty wrote: > It looks like things have changed since 1.0.10, I get build errors with > this for undefined QUERYPARSER and LOGLINE when I drop it in. The diff > looks "different" too :) Ah yes. I had done a quick diff, but only noticed a lot of changes due to using a newer version of lemon, but the debug logging macros have also changed. I've backported the fix now. If you're still up for it, try snapshot 11827 or later of xapian-core from here (it hasn't yet built as I write but should have within the hour): http://oligarchy.co.uk/xapian/branches/1.0/ Cheers, Olly From l.rieder at gmail.com Wed Jan 21 16:04:49 2009 From: l.rieder at gmail.com (Lukas Rieder) Date: Wed, 21 Jan 2009 17:04:49 +0100 Subject: [Xapian-devel] Xapian Ruby bindings as a Ruby Gem Message-ID: <11d847b60901210804o83f7d7dl4d2098166112ba22@mail.gmail.com> Hello everyone, I'm very interested in packaging the Xapian Ruby bindings as a gem for easier installation and integration into Ruby projects. I came to this idea, because many Ruby developers face the problem of installing xapian under hosting service where they have no root rights. And Xapian is THE essential part of the Rails plugin acts_as_xapian, wich brings all the Xapian features to the Rails world. This plugin is maintained by me, Francis the creator has stopped development due to the lack of time. 
Could you please tell me who is the maintainer of the Xapian Ruby bindings? Is there such a project already? Is someone interested in working on this idea? Thank you very much, Lukas Rieder -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard at lemurconsulting.com Wed Jan 21 16:14:20 2009 From: richard at lemurconsulting.com (Richard Boulton) Date: Wed, 21 Jan 2009 16:14:20 +0000 Subject: [Xapian-devel] Xapian Ruby bindings as a Ruby Gem In-Reply-To: <11d847b60901210804o83f7d7dl4d2098166112ba22@mail.gmail.com> References: <11d847b60901210804o83f7d7dl4d2098166112ba22@mail.gmail.com> Message-ID: <20090121161420.GA15317@meerkat> On Wed, Jan 21, 2009 at 05:04:49PM +0100, Lukas Rieder wrote: > Could you please tell me who is the maintainer of the Xapian Ruby bindings? Currently, there isn't an "offical Ruby maintainer" - that is, someone dedicated to improving the Ruby bindings and fixing problems with them. Olly is probably the closest thing to such a maintainer. The Ruby bindings are produced in xapian-bindings using the same SWIG framework that the python, C#, php, etc bindings are produced from. Olly occasionally applies patches to xapian-bindings to fix problems with these, but as far as I know, he's not using Ruby in earnest. > Is there such a project already? > Is someone interested in working on this idea? Not that I'm aware of. Assuming a ruby gem is somewhat like a java jar, or a python egg, in which case this sounds like a useful project. 
--
Richard

From james-xapian at tartarus.org Wed Jan 21 16:16:41 2009
From: james-xapian at tartarus.org (James Aylett)
Date: Wed, 21 Jan 2009 16:16:41 +0000
Subject: [Xapian-devel] Xapian Ruby bindings as a Ruby Gem
In-Reply-To: <20090121161420.GA15317@meerkat>
References: <11d847b60901210804o83f7d7dl4d2098166112ba22@mail.gmail.com> <20090121161420.GA15317@meerkat>
Message-ID: <20090121161641.GF2354@tartarus.org>

On Wed, Jan 21, 2009 at 04:14:20PM +0000, Richard Boulton wrote:
> Assuming a ruby gem is somewhat like a java jar, or a python egg, in
> which case this sounds like a useful project.

It's a bit more like a ports package, I believe, in that it will build
from source (including bundled libraries). So a gem of the Xapian
bindings should be able to work with an installed Xapian, or build its
own, depending on what it finds. (Certainly that's how things like the
ferret gem worked last time I used it.)

J

--
James Aylett
talktorex.co.uk - xapian.org - uncertaintydivision.org

From olly at survex.com Wed Jan 21 16:23:14 2009
From: olly at survex.com (Olly Betts)
Date: Wed, 21 Jan 2009 16:23:14 +0000
Subject: [Xapian-devel] Xapian Ruby bindings as a Ruby Gem
In-Reply-To: <20090121161420.GA15317@meerkat>
References: <11d847b60901210804o83f7d7dl4d2098166112ba22@mail.gmail.com> <20090121161420.GA15317@meerkat>
Message-ID: <20090121162313.GJ15292@survex.com>

On Wed, Jan 21, 2009 at 04:14:20PM +0000, Richard Boulton wrote:
> On Wed, Jan 21, 2009 at 05:04:49PM +0100, Lukas Rieder wrote:
> > Could you please tell me who is the maintainer of the Xapian Ruby bindings?
>
> Currently, there isn't an "official Ruby maintainer" - that is, someone
> dedicated to improving the Ruby bindings and fixing problems with them.
>
> Olly is probably the closest thing to such a maintainer. The Ruby bindings
> are produced in xapian-bindings using the same SWIG framework that the
> python, C#, php, etc bindings are produced from.
Olly occasionally applies > patches to xapian-bindings to fix problems with these, but as far as I > know, he's not using Ruby in earnest. I don't use Ruby except for the work I've done on the bindings. I'd be delighted if someone better versed in the language wanted to take over. There was discussion of a Ruby Gem in the past - see here, and the earlier thread linked to from there: http://thread.gmane.org/gmane.comp.search.xapian.general/6278 Cheers, Olly From towel77 at gmail.com Wed Jan 21 20:24:44 2009 From: towel77 at gmail.com (towel moist) Date: Wed, 21 Jan 2009 12:24:44 -0800 Subject: [Xapian-devel] chert vs flint vs lucene In-Reply-To: <20090119113311.GW15292@survex.com> References: <398117540901161534x1315aeb5u86b8fb0c307e50dc@mail.gmail.com> <20090119113311.GW15292@survex.com> Message-ID: <398117540901211224l2be0c523va601a1506c30fc96@mail.gmail.com> Thanks for the pointers. I tried to look for runtime query serving performance numbers to no avail. What is the latency range for random queries at a given rate (say 500 QPS) to a index built at a certain (say 5M docs) size, assuming not too many concurrent connections? I am expecting sub-second but just curious whether it's more 500ms or 100-200ms range. Thanks! Crystal On Mon, Jan 19, 2009 at 3:33 AM, Olly Betts wrote: > On Fri, Jan 16, 2009 at 03:34:11PM -0800, towel moist wrote: > > What's the main difference between chert and flint? What above vs > lucene? > > Flint is documented here: > > http://trac.xapian.org/wiki/FlintBackend > > The user-visible change in Chert are covered in here (to find them, > search for "chert backend:"): > > http://trac.xapian.org/browser/trunk/xapian-core/NEWS > > That should be complete at present, but there are likely to be further > changes before chert is declared "finished". > > I don't know of any comparisons with Lucene's low level details. > > Cheers, > Olly > -------------- next part -------------- An HTML attachment was scrubbed... 
From l.rieder at gmail.com Wed Jan 21 21:13:17 2009
From: l.rieder at gmail.com (Lukas Rieder)
Date: Wed, 21 Jan 2009 22:13:17 +0100
Subject: [Xapian-devel] Xapian Ruby bindings as a Ruby Gem
In-Reply-To: <20090121162313.GJ15292@survex.com>
References: <11d847b60901210804o83f7d7dl4d2098166112ba22@mail.gmail.com> <20090121161420.GA15317@meerkat> <20090121162313.GJ15292@survex.com>
Message-ID: <11d847b60901211313l36c28d3cr3cb832241efaf534@mail.gmail.com>

Hi everybody,

thank you for your replies.
This is really kind of funny, because Francis (the guy who asked last year)
was the maintainer of the plugin I am leading at the moment.
Hehe, that's how things go, and the idea isn't a new one.

Ok, I've never written a Ruby Gem, but I'll ask some people and maybe someone
is interested and has the knowledge to do so.
Maybe (and hopefully) I can organize something to build and maintain a Gem
version of Xapian and its bindings.

But it is very good to know who is in for what, and I'll definitely contact
Olly again for that reason.

I shall come back here soon ;)
In the meantime, have fun and keep up your good work!

Cheers,
Lukas

2009/1/21 Olly Betts

> On Wed, Jan 21, 2009 at 04:14:20PM +0000, Richard Boulton wrote:
> > On Wed, Jan 21, 2009 at 05:04:49PM +0100, Lukas Rieder wrote:
> > > Could you please tell me who is the maintainer of the Xapian Ruby bindings?
> >
> > Currently, there isn't an "official Ruby maintainer" - that is, someone
> > dedicated to improving the Ruby bindings and fixing problems with them.
> >
> > Olly is probably the closest thing to such a maintainer. The Ruby bindings
> > are produced in xapian-bindings using the same SWIG framework that the
> > python, C#, php, etc bindings are produced from. Olly occasionally applies
> > patches to xapian-bindings to fix problems with these, but as far as I
> > know, he's not using Ruby in earnest.
>
> I don't use Ruby except for the work I've done on the bindings.
I'd be > delighted if someone better versed in the language wanted to take over. > > There was discussion of a Ruby Gem in the past - see here, and the > earlier thread linked to from there: > > http://thread.gmane.org/gmane.comp.search.xapian.general/6278 > > Cheers, > Olly > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olly at survex.com Thu Jan 22 00:29:14 2009 From: olly at survex.com (Olly Betts) Date: Thu, 22 Jan 2009 00:29:14 +0000 Subject: [Xapian-devel] chert vs flint vs lucene In-Reply-To: <398117540901211224l2be0c523va601a1506c30fc96@mail.gmail.com> References: <398117540901161534x1315aeb5u86b8fb0c307e50dc@mail.gmail.com> <20090119113311.GW15292@survex.com> <398117540901211224l2be0c523va601a1506c30fc96@mail.gmail.com> Message-ID: <20090122002914.GO6366@survex.com> On Wed, Jan 21, 2009 at 12:24:44PM -0800, towel moist wrote: > I tried to look for runtime query serving performance numbers to no avail. > What is the latency range for random queries at a given rate (say 500 QPS) > to a index built at a certain (say 5M docs) size, assuming not too many > concurrent connections? > > I am expecting sub-second but just curious whether it's more 500ms or > 100-200ms range. I'm not sure I can usefully answer a question like this - it will depend significantly on the nature of the data, the nature of the queries, and the hardware specs. The best way to get a feel for it is to build a prototype with realistic data and queries and see. BTW, be careful of benchmarking with "random" queries - here's an example where the picture is very different when you look at queries using words from the documents vs "nonsense" queries, few of which actually match any documents: http://tag1consulting.com/Comparing_Xapian_and_Drupal5_Core_Search That's a fairly extreme case, but even pulling random words from the vocabulary won't produce representative queries and may skew results. 
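A rough sketch of how one might keep those two kinds of query apart when timing a prototype is given here. It is plain Ruby with no Xapian calls: `percentile` and `time_queries` are hypothetical helper names, the toy `fake_index` hash is made up, and the block passed to `time_queries` only stands in for whatever runs a real search and returns the number of matches.

```ruby
# Nearest-rank percentile (pct in 0..100) of a non-empty list of numbers.
def percentile(samples, pct)
  raise ArgumentError, "no samples" if samples.empty?
  sorted = samples.sort
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Time each query. The block is a stand-in for a real search call and must
# return the number of matching documents for the query it is given.
def time_queries(queries)
  timings = { matched: [], unmatched: [] }
  queries.each do |query|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    hits = yield(query)
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
    timings[hits > 0 ? :matched : :unmatched] << elapsed_ms
  end
  timings
end

# Toy stand-in for an index (assumption): term => number of matching documents.
fake_index = { "xapian" => 3, "ruby" => 1 }
timings = time_queries(%w[xapian ruby qzxv]) { |q| fake_index.fetch(q, 0) }
timings.each do |kind, ms|
  next if ms.empty?
  puts format("%-9s n=%d p50=%.3fms p95=%.3fms",
              kind, ms.size, percentile(ms, 50), percentile(ms, 95))
end
```

Reporting the matched and unmatched buckets separately makes it easier to spot when queries that match nothing are distorting the overall figure.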
Ideally you want to replay query logs from actual users searching over
the same database, but unfortunately you rarely have those when you
start developing a system.

Cheers,
    Olly

From david.sainty at dtsp.co.nz Thu Jan 22 01:36:58 2009
From: david.sainty at dtsp.co.nz (David Sainty)
Date: Thu, 22 Jan 2009 14:36:58 +1300
Subject: [Xapian-devel] Xapian core build failure under gcc 2.95
In-Reply-To: <20090121083311.GL8027@survex.com>
References: <496E9142.9020209@dtsp.co.nz> <496EB350.7050105@dtsp.co.nz> <20090115113943.GN15292@survex.com> <496FCE8E.3080108@dtsp.co.nz> <20090119081749.GQ15292@survex.com> <49768F7C.7040304@dtsp.co.nz> <20090121042209.GJ8027@survex.com> <4976B347.4050009@dtsp.co.nz> <20090121083311.GL8027@survex.com>
Message-ID: <4977CDBA.6060600@dtsp.co.nz>

Hi Olly,

> I've backported the fix now. If you're still up for it, try snapshot
> 11827 or later of xapian-core from here (it hasn't yet built as I
> write but should have within the hour):
>
> http://oligarchy.co.uk/xapian/branches/1.0/
>

I tested this, it builds fine with no patching required.

Thanks :)

Dave

From me.show at gmail.com Tue Jan 27 14:49:38 2009
From: me.show at gmail.com (Yann ROBIN)
Date: Tue, 27 Jan 2009 15:49:38 +0100
Subject: [Xapian-devel] Segmentation fault in MSetIterator get_weight
Message-ID: <1db5d7c40901270649u13efc302u122dd3831832106d@mail.gmail.com>

Hi,

I'm using Xapian with C# and Mono, and I'm getting a segfault in get_weight.
When I print the index variable, the value is clearly too high. I think
something is writing over it.
Do you have any idea how I could trace where the segmentation fault
originates?
Thanks, -- Yann From me.show at gmail.com Tue Jan 27 16:52:24 2009 From: me.show at gmail.com (Yann ROBIN) Date: Tue, 27 Jan 2009 17:52:24 +0100 Subject: [Xapian-devel] Segmentation fault in MSetIterator get_weight In-Reply-To: <1db5d7c40901270649u13efc302u122dd3831832106d@mail.gmail.com> References: <1db5d7c40901270649u13efc302u122dd3831832106d@mail.gmail.com> Message-ID: <1db5d7c40901270852o398f756ckfc2ae48b20597c73@mail.gmail.com> So i found that it is due to the .Net Garbage Collector. For more information see : http://www.swig.org/Doc1.3/CSharp.html#csharp_memory_management_member_variables I join a new "util.i" for c# bindings with the needed correction. -- Yann -------------- next part -------------- %{ /* csharp/util.i: custom C# typemaps for xapian-bindings * * Copyright (c) 2005,2006,2008 Olly Betts * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License as * published by the Free Software Foundation; either version 2 of the * License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 * USA */ #include // In C#, we don't get SWIG_exception in the generated C++ wrapper sources. #define XapianException(TYPE, MSG) SWIG_CSharpException(TYPE, (MSG).c_str()) %} // Use SWIG directors for C# wrappers. #define XAPIAN_SWIG_DIRECTORS // Rename function and method names to match C# conventions (e.g. from // get_description() to GetDescription()). %rename("%(camelcase)s",%$isfunction) ""; // Fix up API methods which aren't split by '_' on word boundaries. 
%rename("GetTermPos") get_termpos; %rename("GetTermFreq") get_termfreq; %rename("GetTermWeight") get_termweight; %rename("GetDocCount") get_doccount; %rename("GetDocId") get_docid; %rename("GetDocLength") get_doclength; %rename("GetDocumentId") get_document_id; %rename("PositionListBegin") positionlist_begin; %rename("PositionListEnd") positionlist_end; %rename("GetValueNo") get_valueno; %rename("TermListCount") termlist_count; %rename("TermListBegin") termlist_begin; %rename("TermListEnd") termlist_end; %rename("GetFirstItem") get_firstitem; %rename("GetSumPart") get_sumpart; %rename("GetMaxPart") get_maxpart; %rename("GetSumExtra") get_sumextra; %rename("GetMaxExtra") get_maxextra; %rename("GetSumPartNeedsDocLength") get_sumpart_needs_doclength; %rename("PostListBegin") postlist_begin; %rename("PostListEnd") postlist_end; %rename("AllTermsBegin") allterms_begin; %rename("AllTermsEnd") allterms_end; %rename("GetLastDocId") get_lastdocid; %rename("GetAvLength") get_avlength; %rename("StopListBegin") stoplist_begin; %rename("StopListEnd") stoplist_end; %rename("GetMSet") get_mset; %rename("GetESet") get_eset; %ignore ValueRangeProcessor::operator(); %inline { namespace Xapian { // Wrap Xapian::version_string as Xapian.Version.String() as C# can't have // functions outside a class and we don't want Xapian.Xapian.VersionString()! 
class Version { private: Version(); ~Version(); public: static const char * String() { return Xapian::version_string(); } static int Major() { return Xapian::major_version(); } static int Minor() { return Xapian::minor_version(); } static int Revision() { return Xapian::revision(); } }; } } namespace Xapian { %ignore version_string; %ignore major_version; %ignore minor_version; %ignore revision; %typemap(cscode) class MSetIterator %{ private MSet msetRef; internal void addReference(MSet mset) { msetRef = mset; } public static MSetIterator operator++(MSetIterator it) { return it.Next(); } public static MSetIterator operator--(MSetIterator it) { return it.Prev(); } public override bool Equals(object o) { return o is MSetIterator && Equals((MSetIterator)o); } public static bool operator==(MSetIterator a, MSetIterator b) { if ((object)a == (object)b) return true; if ((object)a == null || (object)b == null) return false; return a.Equals(b); } public static bool operator!=(MSetIterator a, MSetIterator b) { if ((object)a == (object)b) return false; if ((object)a == null || (object)b == null) return true; return !a.Equals(b); } // Implementing GetHashCode() to always return 0 is rather lame, but // using iterators as keys in a hash table would be rather strange. 
public override int GetHashCode() { return 0; } %} %typemap(cscode) ESetIterator %{ public static ESetIterator operator++(ESetIterator it) { return it.Next(); } public static ESetIterator operator--(ESetIterator it) { return it.Prev(); } public override bool Equals(object o) { return o is ESetIterator && Equals((ESetIterator)o); } public static bool operator==(ESetIterator a, ESetIterator b) { if ((object)a == (object)b) return true; if ((object)a == null || (object)b == null) return false; return a.Equals(b); } public static bool operator!=(ESetIterator a, ESetIterator b) { if ((object)a == (object)b) return false; if ((object)a == null || (object)b == null) return true; return !a.Equals(b); } // Implementing GetHashCode() to always return 0 is rather lame, but // using iterators as keys in a hash table would be rather strange. public override int GetHashCode() { return 0; } %} %typemap(cscode) TermIterator %{ public static TermIterator operator++(TermIterator it) { return it.Next(); } public override bool Equals(object o) { return o is TermIterator && Equals((TermIterator)o); } public static bool operator==(TermIterator a, TermIterator b) { if ((object)a == (object)b) return true; if ((object)a == null || (object)b == null) return false; return a.Equals(b); } public static bool operator!=(TermIterator a, TermIterator b) { if ((object)a == (object)b) return false; if ((object)a == null || (object)b == null) return true; return !a.Equals(b); } // Implementing GetHashCode() to always return 0 is rather lame, but // using iterators as keys in a hash table would be rather strange. 
public override int GetHashCode() { return 0; } %} %typemap(cscode) ValueIterator %{ public static ValueIterator operator++(ValueIterator it) { return it.Next(); } public override bool Equals(object o) { return o is ValueIterator && Equals((ValueIterator)o); } public static bool operator==(ValueIterator a, ValueIterator b) { if ((object)a == (object)b) return true; if ((object)a == null || (object)b == null) return false; return a.Equals(b); } public static bool operator!=(ValueIterator a, ValueIterator b) { if ((object)a == (object)b) return false; if ((object)a == null || (object)b == null) return true; return !a.Equals(b); } // Implementing GetHashCode() to always return 0 is rather lame, but // using iterators as keys in a hash table would be rather strange. public override int GetHashCode() { return 0; } %} %typemap(cscode) PostingIterator %{ public static PostingIterator operator++(PostingIterator it) { return it.Next(); } public override bool Equals(object o) { return o is PostingIterator && Equals((PostingIterator)o); } public static bool operator==(PostingIterator a, PostingIterator b) { if ((object)a == (object)b) return true; if ((object)a == null || (object)b == null) return false; return a.Equals(b); } public static bool operator!=(PostingIterator a, PostingIterator b) { if ((object)a == (object)b) return false; if ((object)a == null || (object)b == null) return true; return !a.Equals(b); } // Implementing GetHashCode() to always return 0 is rather lame, but // using iterators as keys in a hash table would be rather strange. 
public override int GetHashCode() { return 0; } %} %typemap(cscode) PositionIterator %{ public static PositionIterator operator++(PositionIterator it) { return it.Next(); } public override bool Equals(object o) { return o is PositionIterator && Equals((PositionIterator)o); } public static bool operator==(PositionIterator a, PositionIterator b) { if ((object)a == (object)b) return true; if ((object)a == null || (object)b == null) return false; return a.Equals(b); } public static bool operator!=(PositionIterator a, PositionIterator b) { if ((object)a == (object)b) return false; if ((object)a == null || (object)b == null) return true; return !a.Equals(b); } // Implementing GetHashCode() to always return 0 is rather lame, but // using iterators as keys in a hash table would be rather strange. public override int GetHashCode() { return 0; } %} %typemap(cscode) MSet %{ // Ensure that the GC doesn't collect any MSet instance set from C# private Enquire enquireRef; internal void addReference(Enquire enquire) { enquireRef = enquire; } %} // Add a C# reference to prevent premature garbage collection and resulting use // of dangling C++ pointer. Intended for methods that return pointers or // references to a member variable. 
%typemap(csout, excode=SWIGEXCODE) MSet get_mset(doccount first, doccount maxitems, const RSet *omrset, const MatchDecider *mdecider = 0) { IntPtr cPtr = $imcall;$excode $csclassname ret = null; if (cPtr != IntPtr.Zero) { ret = new $csclassname(cPtr, $owner); ret.addReference(this); } return ret; } %typemap(csout, excode=SWIGEXCODE) MSetIterator begin() { IntPtr cPtr = $imcall;$excode $csclassname ret = null; if (cPtr != IntPtr.Zero) { ret = new $csclassname(cPtr, $owner); ret.addReference(this); } return ret; } } /* vim:set syntax=cpp:set noexpandtab: */ From l.rieder at gmail.com Thu Jan 29 08:11:49 2009 From: l.rieder at gmail.com (Lukas Rieder) Date: Thu, 29 Jan 2009 09:11:49 +0100 Subject: [Xapian-devel] Xapian Ruby bindings do not implement full multi-value-sorting functionality? Message-ID: <11d847b60901290011k774ff8f2u641fe0e317f9b7c6@mail.gmail.com> Hello, this is a question that could be answered by collaborators of the Ruby bindings. Today I've played around with the Xapian::MultiValueSorter class. I've set everything up and then I tried following on an instance of Xapian::Enquire: : enquire = Xapian::Enquire.new(database) enquire.query = options[:query] : sorter = Xapian::MultiValueSorter.new sorter.add(0, true) sorter.add(1, true) : enquire.sort_by_key_then_relevance(sorter) : And it seems that there is no 'sort_by_key_then_relevane' method implented in Xapian::Enquire. The documentation tells me: http://www.xapian.org/docs/apidoc/html/classXapian_1_1Enquire.html#7c6c0c1f66bdeefbd09a0575584ba9b9 Is there a reason for this? How could it be implemented into the Ruby bindings? I've read the HACKING document that comes with the xapian-bindings but I've never used SWIG, I wasn't able to help myself. Any help is highly appreciated. Lukas Rieder -------------- next part -------------- An HTML attachment was scrubbed... 
From richard at lemurconsulting.com Thu Jan 29 09:35:33 2009
From: richard at lemurconsulting.com (Richard Boulton)
Date: Thu, 29 Jan 2009 09:35:33 +0000
Subject: [Xapian-devel] Xapian Ruby bindings do not implement full multi-value-sorting functionality?
In-Reply-To: <11d847b60901290011k774ff8f2u641fe0e317f9b7c6@mail.gmail.com>
References: <11d847b60901290011k774ff8f2u641fe0e317f9b7c6@mail.gmail.com>
Message-ID: <20090129093533.GA3861@meerkat>

On Thu, Jan 29, 2009 at 09:11:49AM +0100, Lukas Rieder wrote:
> this is a question that could be answered by collaborators of the Ruby
> bindings.
...
> enquire.sort_by_key_then_relevance(sorter)
>
> And it seems that there is no 'sort_by_key_then_relevance' method implemented
> in Xapian::Enquire.

> Is there a reason for this? How could it be implemented into the Ruby
> bindings?

Which version of Xapian are you using?
Enquire::set_sort_by_key_then_relevance() was added to the SWIG bindings on
Nov 29th 2007, and included in releases of xapian from 1.0.5 onwards.

I've not checked that it works in the Ruby bindings, but it's certainly
present in the generated xapian_wrap.cc file for Ruby.

--
Richard
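
Note that the snippet in the question calls sort_by_key_then_relevance, whereas the method named in the reply carries a set_ prefix, which may be part of the problem. A minimal sketch of a guard that reports whether the loaded bindings expose that method is given here. The helper name `apply_key_sort` and the `FakeEnquire` stand-in are hypothetical and exist only so the sketch runs without Xapian installed; with real bindings you would pass a Xapian::Enquire and a Xapian::MultiValueSorter instead. Whether the Ruby method behaves correctly is untested, as the reply says.

```ruby
# Hypothetical helper (not part of the Xapian bindings): forward to
# set_sort_by_key_then_relevance if the enquire object has it, otherwise
# fail with a hint about the release that introduced it.
def apply_key_sort(enquire, sorter, *rest)
  unless enquire.respond_to?(:set_sort_by_key_then_relevance)
    raise NotImplementedError,
          "bindings lack set_sort_by_key_then_relevance (added in 1.0.5)"
  end
  enquire.set_sort_by_key_then_relevance(sorter, *rest)
  enquire
end

# Stand-in for Xapian::Enquire (assumption) that just records its calls.
class FakeEnquire
  attr_reader :calls

  def initialize
    @calls = []
  end

  def set_sort_by_key_then_relevance(*args)
    @calls << args
  end
end

enquire = apply_key_sort(FakeEnquire.new, :sorter)
p enquire.calls
```

Any extra arguments are passed through untouched, so the sketch makes no claim about the method's full signature; check the API documentation for your Xapian version.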