EmailDiscussions.com - Looking for opinions on discarding "spam"


xyzzy

1 Oct 2018 07:39 AM

Looking for opinions on discarding "spam"

I was looking at the headers in various spam emails I get and I noticed that most of them have an X-Spam-source header containing Host='noreverse', i.e., the originator is hiding who they really are. Nothing surprising about that. I only recently switched to using FM as my email service. As an excuse to get acquainted with sieve (and have a little fun) I added the following code to the start of the sieve script, just before section 1.

Code:

if not header :matches "X-Spam-Known-Sender" "yes*" {
   if allof(
        header :contains "X-Spam-source" "Host='noreverse'",
        header :value "ge" :comparator "i;ascii-numeric" "X-Spam-score" "5"
      )
   {
      fileinto "INBOX.noreverse";
      stop;
   }
}

Eventually this could be just a discard rule and I could use the UI to do that, but I wanted to be sure the theory was valid, so I wrote it explicitly to sort the matches into a new folder named "noreverse". That way I could easily see if the stuff I was sorting into there was all truly spam.

Before I turn this into a discard rule I was wondering if there are any exceptions that I might want to actually see. Are there any valid cases where an email is classed as spam and has Host='noreverse' but I wouldn't want to just discard it? So far I haven't seen any exceptions. While I don't get a lot of spam (say 5 to 30 a day) it's still easier to look over the remaining spam that doesn't fit this criterion if the rest isn't listed in the spam folder in the first place. Maybe for safety I should set the X-Spam-score threshold higher than the usual 5, say 10, 15, or 20?
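A minimal sketch of how a tiered version of the rule might look: mail with noreverse and a score of 15 or more is discarded, while scores from 5 up to 15 still go to the review folder. The value 15 is just one of the thresholds floated above, not a recommendation. The require line is there so the fragment is self-contained; in a full script it has to come before all other commands, and the script generated by the FM UI may already declare these extensions. The check for a leading minus sign is a precaution resting on an assumption: i;ascii-numeric treats a value that does not start with a digit as larger than any number, so if a negative score ever showed up in X-Spam-score it would otherwise pass the "ge" test.

```sieve
# Extensions used by this fragment. "require" must precede all other commands.
require ["fileinto", "relational", "comparator-i;ascii-numeric"];

# Only act on unknown senders whose sending host had no reverse DNS.
# The third test is the assumed precaution against a negative score.
if allof(
     not header :matches "X-Spam-Known-Sender" "yes*",
     header :contains "X-Spam-source" "Host='noreverse'",
     not header :matches "X-Spam-score" "-*"
   )
{
   if header :value "ge" :comparator "i;ascii-numeric" "X-Spam-score" "15" {
      # High score and no reverse DNS: drop silently.
      discard;
      stop;
   } elsif header :value "ge" :comparator "i;ascii-numeric" "X-Spam-score" "5" {
      # Ordinary spam score: keep it in its own folder for review.
      fileinto "INBOX.noreverse";
      stop;
   }
}
```

Because i;ascii-numeric only looks at the leading digits, a score such as 14.9 compares as 14 and lands in the review folder rather than being discarded.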

Given the relatively low quantity of spam I get, I am not sure I even want to do this. I'm not totally "in love" with this idea, but it was at least an excuse to read the sieve specs and play around a little bit. It never hurts to ask for other opinions anyhow. I'm just curious what others think about this idea, pro or con.

Thanks in advance.

BritTim

1 Oct 2018 09:48 AM

I have not previously thought about using the fact that the sender is hiding the originator path as part of my spam discard decisions. Off the top of my head, it does make sense. Checking if the email has an apparently valid reply path would also be useful, as many phishing emails send any replies into the ether. I am unsure how often we could determine, in sieve code, if the reply path is valid.

jhollington

25 Oct 2018 02:43 AM

To be clear, "noreverse" does not necessarily mean that the sender is deliberately hiding who they really are. In fact, it's often not something the actual sender even has control over, unless they're running their own mail server AND own their own block of IP addresses.

What "noreverse" actually indicates is that the server the message originated from does not have a reverse DNS record — that is to say, a record that maps its IP address back to a valid DNS name. This is a configuration that is normally handled by an ISP or DNS provider, and it's not even something that can simply be turned off to hide one's identity (nor would it really help to do so, since you still have the IP address, which is a far better identifier than a hostname).

That said, in today's internet, almost all IP addresses should have a reverse DNS associated with them — even the IP that your ISP assigned to you will have something that maps back to a DNS name, even if it's something bizarre-looking like "toroon0717w-lp138-06-70-57-8-62.dsl.bell.ca."

Hence, it's probably a safe assumption that anybody without a proper reverse DNS entry has a higher likelihood of being a spammer, but it's by no means a guarantee of that. I would not recommend turning this into an outright "discard" rule, as it is certainly possible for legitimate emails to come from servers that do not have a properly configured DNS record. I've seen this problem from more than a few small businesses who run their own mail servers over the years (although it's almost always the ISP's fault in these cases).

BritTim

25 Oct 2018 03:21 AM

I think what noreverse will catch is the common case where hacked computers are being used to send a large volume of spam from machines that would not normally be SMTP servers.

jhollington

25 Oct 2018 03:45 AM

Sadly, that's not even really a guarantee anymore either.... Do a reverse DNS lookup on whatever IP address you have at home, and you'll very likely find that it resolves to something, usually just an amalgam of your IP address as a subdomain of your ISP. There are blacklists (RBLs) out there that include ranges of dynamic "residential" IP addresses for this reason (SORBS' DUL and Spamhaus' PBL come to mind), but very few mail providers actually use them, and FastMail definitely doesn't.

Ultimately, however, "noreverse" should be very uncommon these days, as it's usually the result of a misconfiguration, or somebody using some very small and obscure ISP who doesn't really know what they're doing. It's definitely not a bad thing to use it as a factor in spam scoring (and I'm fairly certain FastMail already does this), but it would be a mistake to assume that a message should be discarded simply because a reverse entry can't be found to match the sending IP address.

Keep in mind also that this is a dynamic lookup that occurs when the message is actually received by FastMail. I had one client a few years ago who ran their own mail server and had configured their firm-wide spam filters this way. They ended up losing several hours' worth of ALL legitimate mail simply because of a DNS lookup problem (it looked to their server like nobody had a valid PTR record, so every single message that came in was discarded as spam). While that's less likely to happen to FastMail, there are a lot of pieces to DNS that can cause problems out there on the internet at large, and just because a server can't find a reverse DNS entry for a given IP address, it's no guarantee that one doesn't actually exist.

xyzzy

26 Oct 2018 07:06 PM

Thanks for the new comments. I only just noticed them; I hadn't checked for a few days because no one had commented after the initial reply.

I've continued to use the filter moving the spam that had "noreverse" into its own folder. It's been almost a month and so far the spam (score>=5) + noreverse test hasn't had any failures.

It was mentioned that "it would be a mistake to assume that a message should be discarded simply because a reverse entry can't be found". But remember I am also checking the spam score as well. Does that change the argument any?

I appreciate the comments on the reliability (or lack of it) of using noreverse as a filtering criterion. This is the kind of info I was asking for in deciding whether to even keep this additional stuff. I might just end up taking it out and letting it all end up in the normal spam folder.

FWIW I would estimate more than 90% of the spam I've been getting also had noreverse and thus ended up filtering into my "noreverse" folder which is how I came up with that 90% number (didn't actually count them though). Given that statistic maybe it isn't worth the added test if I cannot trust the reliability and just let the stuff end up in the spam folder where it was originally destined. But hey, at least I got to play with sieve a little.:)

n5bb

29 Oct 2018 08:16 AM

Searching for noreverse

You can search for all such existing messages with this search:

Code:

header:"X-Spam-source: Host='noreverse' "

A warning: Header searches can be slow on large mailboxes. My main Fastmail account is over 14 years old and I have over 11 GB of accumulated mail, so it’s very slow. I found about 970 noreverse messages which were not spam (most from common senders I trust), and looked at the full headers. So that’s less than 1% of my non-spam received.

After using a reverse IP lookup tool, I believe that most or all of the noreverse results I received from non-spammers were transient failures by Fastmail to quickly discover a reverse DNS entry. So as you are doing, I would combine this with a high spam score before using this to block messages.

If you have reported at least 200 messages as spam and 200 as non-spam (see the bottom of the Spam Protection setup screen), your personal Bayes database spam filter will kick in. After years of experience, I decided to use custom settings and set my move-to-spam-folder threshold to 1.8 and spam discard threshold to 9.0. From time to time I check this behavior by removing the discard setting and checking the pile of spam. I also use address book whitelisting, which makes it very hard for messages from known senders to end up in my spam folder. If you discard messages from the spam folder they are marked as spam, so be sure to mark any ham in the spam folder as non-spam if they are desired and they will move to your Inbox.

Bill

xyzzy

29 Oct 2018 07:24 PM

Quote:

Originally Posted by n5bb (Post 608117)

You can search for all such existing messages with this search:

Code:

header:"X-Spam-source: Host='noreverse"

A warning: Header searches can be slow on large mailboxes.

Heh, the deeper I dig into FM or read posts in this forum, the more I keep finding features of FM I didn't even know existed, let alone to ask about. This is one such case. I didn't even know you could do searches like that! Just looked at the FM search doc. Cool. Thanks for the example.

Quote:

My main Fastmail account is over 14 years old and I have over 11 GB of accumulated mail, so it’s very slow. I found about 970 noreverse messages which were not spam (most from common senders I trust), and looked at the full headers. So that’s less than 1% of my non-spam received.

I'm mainly using FM through Thunderbird (POP with delete from server) so there really isn't any "real" email saved in the webmail except stuff that doesn't make it to the inbox. I still have to check the webmail spam though, which is what started this exercise of writing filters to reduce the spam in the first place (although it's not out of hand anyhow).

Quote:

After using a reverse IP lookup tool, I believe that most or all of the noreverse results I received from non-spammers were transient failures by Fastmail to quickly discover a reverse DNS entry. So as you are doing, I would combine this with a high spam score before using this to block messages.

The reason I jumped on "noreverse" as a possibility is that 100% of the stuff I've gotten in the short time I've been using it was indeed spam. But based on the opinions here I think I will abandon the noreverse test and try other criteria (email contents, substrings in titles, etc.). For the moment though I am keeping the noreverse test in just to sort those emails into their own mailbox.
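As a rough sketch of what a "substrings in titles" test might look like in sieve: the two phrases and the folder name "INBOX.suspect" below are made-up placeholders for illustration, not strings taken from any real spam. The :contains match uses the default comparator, which ignores case. Matching on the message contents instead would need the separate "body" extension, which is not shown here.

```sieve
# "require" must precede all other commands; the script generated by the
# FM UI may already declare this extension.
require ["fileinto"];

# Placeholder subject substrings and folder name, for illustration only.
if allof(
     not header :matches "X-Spam-Known-Sender" "yes*",
     header :contains "Subject" ["example phrase one", "example phrase two"]
   )
{
   fileinto "INBOX.suspect";
   stop;
}
```

Filing into a folder rather than discarding keeps the same safety net as the noreverse experiment: the matches can be reviewed before the rule is trusted.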

Quote:

If you have reported at least 200 messages as spam and 200 as non-spam

Yeah, the spam filters are just going up their "learning curve". I've only had FM since July and only recently passed 200. All in all I haven't had to correct the filters too much which is why I am only just passing 200 now.

Quote:

After years of experience, I decided to use custom settings and set my move-to-spam-folder threshold to 1.8 and spam discard threshold to 9.0.

It will take me a while to get a feel for the spam values and what's best for me. I did enable the setting that puts the spam score in the subject line to make it easier to see.

Quote:

I also use address book whitelisting, which makes it very hard for messages from known senders to end up in my spam folder.

Yes, I do that too. Very hard? If they are in the contacts how could they be placed in the spam box in the first place? The sieve code tests X-Spam-Known-Sender before it even gets a chance to sort into the spam box. Are there cases where FM doesn't insert an X-Spam-Known-Sender header?

Quote:

If you discard messages from the spam folder they are marked as spam, so be sure to mark any ham in the spam folder as non-spam if they are desired and they will move to your Inbox.

I realize that marking something as not spam is what trains the spam filters (as does marking stuff outside the spam box as spam). But it's not been clear to me what happens to the spam that's already in the spam box when you empty it. The stuff was given a spam score and that's what got it filtered into the spam box in the first place. So when I empty the spam box (or it gets auto emptied) why would that "mark" them as spam again? Or is that just training my own local spam filters at that point?

somdcomputerguy

29 Oct 2018 10:32 PM

Quote:

Originally Posted by xyzzy (Post 608120)

I think what was meant was to make sure that non-spam in your Spam folder is marked as not spam before you delete it if you don't want it.

xyzzy

30 Oct 2018 05:14 AM

Quote:

Originally Posted by somdcomputerguy (Post 608121)

I think what was meant was to make sure that non-spam in your Spam folder is marked as not spam before you delete it if you don't want it.

I understand that marking spam as not spam (or not spam as spam) before deleting the stuff is the way to train my spam filters. But I believe the determination of what is or is not spam is made at delivery time, not when I delete stuff (i.e., nothing affects the spam filter training at deletion time). When I mark stuff as spam (or not spam) I am only trying to correct (re-train) the spam filters to avoid their mistake on future deliveries.

n5bb

30 Oct 2018 01:32 PM

Quote:

Originally Posted by xyzzy (Post 608120)

...I didn't even know you could do searches like that! Just looked at the FM search doc. Cool. Thanks for the example...

Oops … I made a slight syntax mistake (now corrected in my post). The original search string worked, but I meant to include the trailing single quote ' before the double quote ". You don't need a space between ' and ", but I added it so you could tell them apart.

Quote:

Originally Posted by xyzzy (Post 608120)

...Very hard? If they are in the contacts how could they be placed in the spam box in the first place? The sieve code tests X-Spam-Known-Sender before it even gets a chance to sort into the spam box. Are there cases where FM doesn't insert an X-Spam-Known-Sender header?...

There are rare cases where the sender appears to be spoofed (such as DMARC failure), so X-Spam-Known-Sender is forced to no even if the address is in your contacts list.

Bill

xyzzy

30 Oct 2018 04:21 PM

Quote:

Originally Posted by n5bb (Post 608127)

There are rare cases where the sender appears to be spoofed (such as DMARC failure), so X-Spam-Known-Sender is forced to no even if the address is in your contacts list.

Ok, thanks for the clarification.

jhollington

2 Nov 2018 10:27 PM

Quote:

Originally Posted by xyzzy (Post 608125)

Actually, while you're correct that messages are identified as spam at the point of delivery, it is at the point of deletion that spam messages are trained as spam (in terms of the Bayes database). This is to prevent the false positives that would come from simply training on everything in the spam folder (although you can set your standard junk mail folder to train all messages that are in it as spam, this isn't the default setting, and in fact FastMail specifically recommends that you don't do this).

Basically, if messages that land in your "Junk" folder get trained as spam, then the Bayes database is going to include every message that lands in this folder, whether it's spam or not. While you can later mark these as "not spam" that doesn't actually undo the process of teaching FastMail that the original message was spam, but rather simply adds it to the "ham" database — so now you have two conflicting entries in the Bayes database, one that says a message is spam, and one that says it's not spam.

So unless you actually set your Junk Mail folder to be a source for learning spam, the Bayes database only gets updated when a message is specifically marked as spam or when it's deleted from this folder, either manually or as part of an auto-purge rule.

Also keep in mind that this only applies to marking messages as spam or deleting them from the Junk folder via the FastMail web interface or FastMail mobile app. Messages deleted or marked as spam from an IMAP client don't contribute to the Bayes database, although if your IMAP client moves messages that you mark as spam into the standard Junk Mail folder, and you have an automatic purge on that folder, then they'll eventually get learned as spam once they get automatically deleted by FastMail.

The other alternative, if you use an IMAP client primarily, is to create a second Junk Mail folder that your IMAP client uses to send spam to. Mark this second folder to be the one from which spam is learned. FastMail will put all of the spam that it detects into the default junk folder, while anything you flag as junk in your IMAP app will be moved to its own junk folder, which FastMail will scan regularly to learn more about what is spam.

See https://www.fastmail.com/help/receive/stopspam.html for more details on how this all fits together.

NumberSix

9 Nov 2018 06:57 AM

Quote:

Originally Posted by jhollington (Post 608187)

Holy moly! This possibly explains my recent dissatisfaction with the spam learning... I haven't emptied my Spam folder for quite a long time. Thanks for the explanation!

n5bb

9 Nov 2018 12:18 PM

Quote:

Originally Posted by NumberSix (Post 608233)

... I haven't emptied my Spam folder for quite a long time...

You can set the properties for a folder so that unpinned messages in that folder are auto-purged (permanently deleted) after a chosen time interval. So you could set the auto-purge delay to 7 days and know that your personal Bayes database would be updated, giving you a week to check for false positives.

Bill

All times are GMT +9.