[Xapian-discuss] problems with indexing xlsx files

Chris Purves chris at northfolk.ca
Fri Apr 5 19:47:11 BST 2013


I have a number of Excel .xlsx files that aren't indexed properly.  To illustrate, I have a file called "this is a test.xlsx".  It consists of four cells:

| this |
| is   |
| a    |
| test |

The file gets indexed, but searching for any of the words in it does not find it.

I was able to determine the record number for the document and use delve to see its term list:

#delve users -r 16496
Term List for record #16496: D20130405 Hvesuvius M201304 P/ Tapplication/vnd.openxmlformats-officedocument.spreadsheetml.sheet Ufile://vesuvius/cpurves/this is a test.xlsx Y2013 Zthisisatest thisisatest

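You can see that the contents of the four cells are concatenated into a single term, "thisisatest", with no separate terms for the individual words.  If I search for "thisisatest" the file comes up, but searching for "this" or "test" does not find it.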
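
One possible explanation, offered as an assumption and not a confirmed diagnosis of what the indexer does: an .xlsx file is a zip archive in which cell strings are stored as adjacent XML elements with no whitespace between them, so a text extractor that simply drops the tags would glue the cells together. The Python sketch below only illustrates that effect. The XML snippet is a hand-written stand-in for the spreadsheet's shared-strings part, and the helper name cell_texts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the shared-strings XML of the four-cell sheet.
# (Assumption: the real file stores the cells in a similar adjacent layout.)
SHARED_STRINGS = (
    '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<si><t>this</t></si><si><t>is</t></si>"
    "<si><t>a</t></si><si><t>test</t></si>"
    "</sst>"
)


def cell_texts(xml_text):
    """Return the non-empty text fragments of the XML in document order."""
    root = ET.fromstring(xml_text)
    return [text for text in root.itertext() if text.strip()]


texts = cell_texts(SHARED_STRINGS)

# Dropping the tags without inserting a separator glues the cells together.
joined_without_separator = "".join(texts)

# Inserting a space at each element boundary keeps the words apart.
joined_with_separator = " ".join(texts)

print(joined_without_separator)  # thisisatest
print(joined_with_separator)     # this is a test
```

If this is what is happening, the first form would produce the single term seen in the delve output, and the second form would produce the four separate terms I expected.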

I'm using version 1.2.3 on Debian.

Chris Purves
Visit my blog: http://chris.northfolk.ca

"The idea is to zap them with lasers and see how they respond." - Dr. Scott Menary
