GSOC 2018: Diversification of Search Results

Uppinder Chugh uppinderchugh at gmail.com
Thu Jun 7 19:39:31 BST 2018


Ricchiey Thomas, Vivek Pal and Amanda Jayanetti (Sorry, I don't know your
IRC nicks, so I'm sending this via the mailing list): Please review PR #198 (
https://github.com/xapian/xapian/pull/198). I'd like to get it to a
mergeable state and quickly move on to optimisation and then evaluation of
diversification.

On Tue, Jun 5, 2018 at 9:18 PM, Amanda Jayanetti <amandajayanetti at gmail.com>
 wrote:

> Great! Thanks Uppinder.
>
> Best Regards,
> Amanda
>
> On Tue, Jun 5, 2018 at 5:18 PM, Uppinder Chugh <uppinderchugh at gmail.com>
> wrote:
>
>> Hi Amanda,
>>
>>     I have updated the journal. Regarding the TREC ClueWeb09 dataset, I
>> have contacted Olly. I cannot directly apply for the dataset myself.
>>
>> Sincerely,
>> Uppinder
>>
>> On Tue, Jun 5, 2018 at 12:20 AM, Amanda Jayanetti <
>> amandajayanetti at gmail.com> wrote:
>>
>>> Hi Uppinder,
>>>
>>> I noticed that you have not updated the journal [1] since May 14th, so I
>>> would appreciate it if you could provide an update on the current status
>>> of the project. Also, have you applied for the TREC ClueWeb09 dataset?
>>>
>>> [1] https://trac.xapian.org/wiki/GSoC2018/Diversification/Journal
>>>
>>> Best Regards,
>>> Amanda
>>>
>>> On Sat, Apr 28, 2018 at 8:53 AM, Amanda Jayanetti <
>>> amandajayanetti at gmail.com> wrote:
>>>
>>>> Hi Uppinder,
>>>>
>>>> Congratulations on being accepted into GSoC 2018 with Xapian!
>>>>
>>>>> as discussed in the interview, I might evaluate the
>>>>> GLS-MPT implementation before moving on to optimizations (C2-GLS).
>>>>>
>>>>
>>>> We had a discussion with regard to this, and the decision was to
>>>> perform evaluation after the optimizations as you had originally proposed.
>>>> So let's stick to your original plan and complete the implementation of
>>>> C2-GLS before going ahead with evaluation.
>>>>
>>>> Best Regards,
>>>> Amanda
>>>>
>>>> On Fri, Apr 27, 2018 at 8:37 AM, Gaurav Arora <
>>>> gauravarora.daiict at gmail.com> wrote:
>>>>
>>>>> We are equally excited about working with you over the summer.
>>>>>
>>>>> I think you missed Olly's reply on IRC; you can find it in the logs
>>>>> here: https://botbot.me/freenode/xapian/2018-04-24/?msg=99336093&page=1
>>>>>
>>>>>    - olly
>>>>>    icebyte[m]: i think that probably needs to go through SFC (
>>>>>    https://sfconservancy.org/) as the "legal entity"
>>>>>    - 2:05 am <https://botbot.me/freenode/xapian/msg/99336095/>
>>>>>    icebyte[m]: i can talk to them about it
>>>>>
>>>>>
>>>>>
>>>>> - Gaurav
>>>>>
>>>>> On Fri, Apr 27, 2018 at 12:23 AM, Uppinder Chugh <
>>>>> uppinderchugh at gmail.com> wrote:
>>>>>
>>>>>> Thanks for selecting my proposal for GSoC; I'm looking forward to
>>>>>> contributing further to Xapian. I posted this on IRC but didn't
>>>>>> receive any reply, so I presume it was missed and am posting it
>>>>>> here. As proposed, I plan to use the ClueWeb09 Category B
>>>>>> dataset for evaluating diversification. A hosted copy is available
>>>>>> (http://lemurproject.org/clueweb09.php/index.php#Services) which may
>>>>>> be accessed but requires a license. The license is free and granted to
>>>>>> an organisation by applying online
>>>>>> (http://lemurproject.org/clueweb09/organization_agreement.clueweb09.worder.Mar30-18.pdf).
>>>>>> If a maintainer could have a look at this, that would be great. It's
>>>>>> mentioned on the website that it takes around 2 weeks to obtain the
>>>>>> license, and as discussed in the interview, I might evaluate the
>>>>>> GLS-MPT implementation before moving on to optimizations (C2-GLS).
>>>>>>
>>>>>> On Sat, Mar 10, 2018 at 12:08 AM, Uppinder Chugh
>>>>>> <uppinderchugh at gmail.com> wrote:
>>>>>> >
>>>>>> > Hi, I'd like to share my proposal for GSoC and get feedback on it.
>>>>>> >
>>>>>> > https://docs.google.com/document/d/1A4HF2lZBnLh1TUY3Y2DDUfz-nzbIL1NNAo8Adl3gN-8/edit?usp=sharing
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Uppinder Chugh
>>>>>> >
>>>>>> > On Mon, Feb 26, 2018 at 2:14 AM, Uppinder Chugh <uppinderchugh at gmail.com> wrote:
>>>>>> >>
>>>>>> >> In particular, I have the following questions:
>>>>>> >>
>>>>>> >> a) Is wrapping Xapian::Mset matcher::get_set(..) suitable in this
>>>>>> >> scenario and with the API? Also, how can I let users manually
>>>>>> >> enable diversification while they configure their result set via
>>>>>> >> the Matcher API?
>>>>>> >>
>>>>>> >> b) Should I include the LC clustering algorithm in
>>>>>> >> xapian-core/cluster (as there's the base class Cluster which can
>>>>>> >> be inherited) or make it part of the diversification
>>>>>> >> implementation?
>>>>>> >>
>>>>>> >> c) Apart from the proposed methods, I'd be writing automated
>>>>>> >> tests and examples, and documenting the new feature. Some tips
>>>>>> >> here would be appreciated, as I've never written tests for code.
>>>>>> >> Also, for documentation, I believe only getting-started-with-xapian
>>>>>> >> should be updated with examples for using the new feature.
>>>>>> >>
>>>>>> >> Apart from the above, if I'm missing something or didn't go into
>>>>>> >> enough detail, please let me know. :)
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>
>>>>
>>>
>>
>