[Xapian-discuss] editdistance.cc average case and worst case running time compared to dynamic programming algorithm

Tue Oct 9 07:34:44 BST 2012

Mr. Olly Betts, Thank you for your reply. I had just about given up hope from ever hearing from you. I have a doctor's appointment tomorrow morning. After I return from the doctor's appointment tomorrow, I will follow the steps outlined in your thoughtful email and report back to you. Again, thank you for your courtesy.  

> Date: Tue, 9 Oct 2012 02:17:57 +0100
> From: olly at survex.com
> To: frank_chang91 at hotmail.com
> CC: xapian-discuss at lists.xapian.org
> Subject: Re: [Xapian-discuss] editdistance.cc average case and worst case running time compared to dynamic programming algorithm
> 
> On Thu, Sep 06, 2012 at 01:21:51PM -0400, Frank Chang wrote:
> > Good afternoon, I have been running some timing tests with 
> > retval = edit_distance_unsigned((const unsigned int *)MaryCamsterBuffer, len2,
> > (const unsigned int *)MaryBuffer, len1, 2); where k =2 is the Levenshtein distance cutoff.
> [...]
> > double cFuzzyCompareLevenshtein::CalcCompare(char* Str1_, char* Str2_, int Size_){
> 
> What types are MaryBuffer and MaryCamsterBuffer?
> 
> edit_distance_unsigned() expects pointers to arrays of unsigned int
> (holding Unicode code points) but I notice your edit distance function
> takes char *, so I'm wondering if you're simply casting char* to
> unsigned int* - that's not going to work as you intend, and would likely
> result in computing edit distance for two much longer strings of junk
> data, which would take longer than computing the edit distance of the
> intended input.
> 
> > Our results indicate that after 50,000 string pair comparisons that
> > edit_distance is equal in speed to a conventional Levenshtein distance
> > dynamic programming implementation Big O(N ^^ 2) shown below.
> > I would expect the running time of the Xapian edit distance
> > implementation based on Hal Berghel and David Roach's Big O(linear
> > running time) algorithm would be substantially faster. Could someone
> > please explain how I can make this speeed improvement happen if is is
> > possible? Thank you.
> 
> If it isn't the issue above, could you post the whole test harness
> you're using? It's hard to reproduce if we have to recreate that code.
> 
> Cheers,
> Olly