FastMail Forum All posts relating to FastMail.FM should go here: suggestions, comments, requests for help, complaints, technical issues etc. |
12 Apr 2016, 04:54 AM | #1 |
Essential Contributor
Join Date: Apr 2008
Posts: 371
|
Search problem with "Wedding" (bad stemming?)
I noticed an interesting problem when searching for messages containing the word "wedding" — using either the web interface or an IMAP search. It seems that I got a LOT of hits, and on further investigation, it looks like every message sent to me on a Wednesday gets included.
I suspect it's an unintended stemming problem, but obviously an inconvenient one in this case. In much the same way a search for "bus" matches "buses" (and vice-versa), a search for "wedding" is going to match "wed", since that's another form of the word. The problem is that "Wed" is also the abbreviation for "Wednesday". The only way I could find to work around this was to use the "substr" directive to search for the exact word, but this only works in the web interface, and not when searching from an IMAP client such as Apple Mail. |
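To make the suspected collision concrete, here is a minimal sketch in Python. The `toy_stem` function is a deliberately crude stand-in written for this illustration (it only strips "-ing" and undoubles a trailing consonant). It is not the stemmer the server actually uses, and the three sample messages are made up. It only shows how a stemmed match on "wedding" can pull in any message whose Date header says "Wed", while a plain substring match does not.

```python
import re

VOWELS = "aeiouy"


def toy_stem(word: str) -> str:
    """Crude illustrative stemmer (an assumption, not a real library's algorithm)."""
    w = word.lower()
    if w.endswith("ing"):
        base = w[:-3]
        # Only strip the suffix if what remains still contains a vowel.
        if any(c in VOWELS for c in base):
            # Undouble a trailing consonant: "wedd" -> "wed".
            if len(base) >= 2 and base[-1] == base[-2] and base[-1] not in VOWELS:
                base = base[:-1]
            w = base
    return w


def tokens(text: str) -> list[str]:
    """Split text into alphabetic words."""
    return re.findall(r"[A-Za-z]+", text)


def stemmed_search(query: str, messages: dict[int, str]) -> list[int]:
    """Match messages containing any word with the same stem as the query."""
    q = toy_stem(query)
    return sorted(
        mid for mid, text in messages.items()
        if any(toy_stem(t) == q for t in tokens(text))
    )


def substring_search(query: str, messages: dict[int, str]) -> list[int]:
    """Match messages containing the query as a case-insensitive substring."""
    return sorted(
        mid for mid, text in messages.items() if query.lower() in text.lower()
    )


# Hypothetical sample messages (headers and subject flattened into one string).
MESSAGES = {
    1: "Date: Wed, 13 Apr 2016; Subject: Lunch on Friday",
    2: "Date: Mon, 11 Apr 2016; Subject: Wedding invitation",
    3: "Date: Tue, 12 Apr 2016; Subject: We moved the meeting",
}

print(stemmed_search("wedding", MESSAGES))    # [1, 2]
print(substring_search("wedding", MESSAGES))  # [2]
```

Message 3 contains "We" but is not matched in either mode, since "we" and "wed" are different stems in this sketch.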
12 Apr 2016, 05:11 AM | #2 |
The "e" in e-mail
Join Date: Feb 2006
Location: EU
Posts: 4,945
|
Try: wedding NOT Wed
Works for me in the classic interface.
|
13 Apr 2016, 12:46 AM | #3 | |
The "e" in e-mail
Join Date: Dec 2004
Location: a virtually impossible but finitely improbable position
Posts: 2,320
|
Quote:
/cl |
|
13 Apr 2016, 08:19 AM | #4 |
Ultimate Contributor
Join Date: Dec 2001
Location: Canada.
Posts: 10,355
|
Would it not also turn up results for 'We' .......
|
13 Apr 2016, 09:06 AM | #5 |
Master of the @
Join Date: May 2012
Location: Melbourne, Australia
Posts: 1,007
Representative of:
Fastmail.fm |
We're currently using Xapian's stock English stemmer, also known as "porter2", which is considered the "standard" English stemmer for general use.
(Here's an intro to stemming for those interested).
A standard IMAP SEARCH command (which most clients use) will not use the search index, but instead just does regular substring searches. These are slow, but sometimes more precise. You can get this behaviour in the web interface using substr: or even imap: (more info in the search docs).
Clients can use the IMAP SEARCH=FUZZY extension if they want to use the search index, which will give largely the same results as the web client (but not across folders; that's a FastMail extension). So if you're getting the same results in a client, that's probably what's going on (I don't know what Apple Mail does myself. I can test if you're interested).
Since iOS 6 the iOS Mail app issues regular (non-indexed) searches, but does them across all folders at once, so it ends up really hurting the server. To get around that, when we detect a search from iOS Mail, we automatically enable SEARCH=FUZZY to ensure good performance.
Hopefully that explains it all. |
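For anyone who wants to compare the two search modes from a script, here is a rough sketch using Python's standard `imaplib`. The helper names (`quote`, `search_criteria`, `run_search`) are made up for this example. The only thing it relies on is that the FUZZY search key simply prefixes an ordinary search key, and that the server advertises `SEARCH=FUZZY` in its capabilities. How any particular server interprets a fuzzy search is up to that server.

```python
import imaplib


def quote(term: str) -> str:
    """Wrap a search term as an IMAP quoted string, escaping \\ and "."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def search_criteria(term: str, fuzzy: bool = False) -> list[str]:
    """Build the criteria for SEARCH TEXT, optionally prefixed with FUZZY."""
    criteria = ["TEXT", quote(term)]
    if fuzzy:
        # FUZZY applies to the search key that follows it.
        criteria.insert(0, "FUZZY")
    return criteria


def run_search(conn: imaplib.IMAP4, term: str, fuzzy: bool = False) -> list[bytes]:
    """Run the search on an already logged-in connection with a mailbox selected."""
    if fuzzy and "SEARCH=FUZZY" not in conn.capabilities:
        raise RuntimeError("server does not advertise SEARCH=FUZZY")
    typ, data = conn.search(None, *search_criteria(term, fuzzy))
    if typ != "OK":
        raise RuntimeError(f"SEARCH failed: {typ}")
    # data[0] is a space-separated list of message sequence numbers.
    return data[0].split()


print(" ".join(["SEARCH"] + search_criteria("wedding")))
print(" ".join(["SEARCH"] + search_criteria("wedding", fuzzy=True)))
```

With a real account you would open the connection yourself (for example `imaplib.IMAP4_SSL("imap.example.com")`, a placeholder host), log in, select a folder, and call `run_search(conn, "wedding")` and `run_search(conn, "wedding", fuzzy=True)` to see whether the result counts differ.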
16 Apr 2016, 12:31 AM | #6 |
Essential Contributor
Join Date: Apr 2008
Posts: 371
|
Ah, of course. Great tip, thanks. Seems to do something different with an iOS-based IMAP search, but I haven't quite figured out what yet.... Looks like it might actually break something, in fact, as I get one result, and then the search just kind of sits there, not quite complete. Either way, going back to the Classic interface of the FastMail iOS app isn't a big problem for times when I need to do more complicated searches.
I'd guess not, since "We" isn't a short or alternative form of "Wedding". Stemming, as I understand it, is for word constructs, not merely shorter versions. "Wedding", although commonly used these days as a noun, would I guess technically be the action form of the verb "to wed". It's just unfortunate that it's also the short form for Wednesday. There's probably a message there somewhere, but I'm not sure exactly what. |
16 Apr 2016, 12:42 AM | #7 | ||
Essential Contributor
Join Date: Apr 2008
Posts: 371
|
Quote:
Apple Mail (the OS X app) isn't really impacted by this for me as I don't do IMAP searches from there. I've got all of my mail synced to my Mac, so it's using its own Spotlight system at that point, with its own special variety of twisted logic.
Quote:
That said, I did run into something interesting. An Apple Mail IMAP search that uses "NOT", as janusz described above, seems to work properly but also takes significantly longer to run, on the order of several minutes (to give you an idea, the search was still happening when I finished my last reply, and only spat out the results when I was about halfway through this one). I got one result the first time, and the search bar in iOS Mail skimmed across to about 90% and then sat there for another 3-5 minutes before finishing and displaying the rest of the results. The second time I ran the same search, I got more results (all of them as far as I can tell), but the search bar did the same thing. |