[Xapian-devel] A beginner in "Posting list encoding improvements"

Olly Betts olly at survex.com
Wed Feb 12 08:55:23 GMT 2014


On Tue, Feb 11, 2014 at 10:36:07PM +0800, Hurricane Tong wrote:
> Following the guide for beginners in Xapian, I started to build
> Xapian on my computer. I usually work on Windows, with MS Visual
> Studio 2012, but I ran into many problems when building. Some
> source code doesn't support Chinese well, such as
> xapian-core-1.2.8\win32\xapdep\xapdep.c; I needed to modify some code to
> suit a Chinese environment. Some code also doesn't seem to work well
> with the new C++ features in VS2012. If anyone else uses Xapian on
> Windows, I think it would be helpful for us to discuss the issues
> with building on Windows together.

Unfortunately those makefiles haven't been actively maintained for a
while.  If you can figure out what needs changing, I'm happy to apply
patches to improve things.

For GSoC projects, I'd recommend developing on Linux, or another
Unix-like platform.  I think everyone who has so far expressed an
interest in mentoring uses Linux or Mac OS X, so we're much better
placed to help with development on such platforms.

You'll also want to use trunk for GSoC projects - the 1.2 release
series only gets bug fixes and non-invasive new features.

> I have finished reading the paper provided, about VSEncoding, and plan
> to read some of the source code relevant to this project. Then I will
> try to put together my own proposal. I would appreciate it very much if
> you could give me some extra advice on getting started with the project
> "posting list encoding improvements".  I'm looking forward to
> participating in this project.

VSEncoding seems a good all-round candidate for encoding posting lists
compactly, but it would also be good to have some other encodings
available.  For example, document lengths (which are essentially
stored as a posting list) would benefit from being encoded in a way
such that we could skip ahead in a chunk quickly.  E.g. a fixed
width per chunk would give O(1) lookup of any entry.  http://trac.xapian.org/ticket/326
has some discussion about that.
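
To illustrate the fixed-width idea, here is a rough sketch in Python (not
Xapian code, and not the on-disk format of any Xapian backend).  The names
encode_chunk and get_entry are made up for this example, and it assumes a
chunk covers a contiguous run of docids, so that an entry's index is just
its docid minus the first docid in the chunk.  Every entry is packed with
the same bit width, so finding entry i is a multiplication rather than a
scan over the preceding entries:

```python
def encode_chunk(values):
    """Pack non-negative ints using one bit width for the whole chunk.

    Hypothetical helper: returns (width, count, packed_bytes).
    """
    # The width is chosen so the largest value in the chunk fits.
    width = max(1, max(values).bit_length()) if values else 1
    bits = 0
    for i, v in enumerate(values):
        bits |= v << (i * width)
    nbytes = (len(values) * width + 7) // 8
    return width, len(values), bits.to_bytes(nbytes, "little")


def get_entry(width, data, index):
    """Return entry `index`, reading only the bytes that contain it."""
    bit_pos = index * width               # direct offset, no scanning
    first = bit_pos // 8                  # first byte holding the entry
    last = (bit_pos + width + 7) // 8     # one past the last such byte
    window = int.from_bytes(data[first:last], "little")
    return (window >> (bit_pos % 8)) & ((1 << width) - 1)


# Example: six document lengths stored in one chunk.
doclens = [12, 7, 300, 45, 0, 129]
width, count, data = encode_chunk(doclens)
third = get_entry(width, data, 2)         # jumps straight to entry 2
```

The trade-off is that one large value widens every entry in the chunk, so
this is less compact than a variable-length encoding when lengths vary a
lot within a chunk.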

Cheers,
    Olly


