Dont fiddle with your Spam settings - Page 4

xyzzy · 24 Aug 2020, 01:21 PM

Quote:

Originally Posted by Grhm

And your comments are in proper, comprehensible English, too! Bravo!

Huh?

Update:
The other day, just for fun, I reopened my old ticket on the removal of the Spam Score just to tell them screw it and I had long since had a workaround. Just got a reply form them and here's what they said:

Quote:

This setting has now been removed for all accounts and is no longer available. I'm glad you have found a workflow you are happy with.

qwertz123456 · 17 Jun 2023, 01:33 AM

@xyzzy

Thanks so much for providing this code.

Just for us mere mortals can understand too, do I have to insert my spam threshold somewhere in the code or is that pulled directly from my spam settings in my FM account?

So, currently I have my spam settings on "custom" and "move messages with score of 4.5 or higher to spam". Or do I just leave the code as is?

And sorry if this is a stupid question

JeremyNicoll · 17 Jun 2023, 06:37 PM

Quote:

Originally Posted by qwertz123456

@xyzzy

Thanks so much for providing this code.

Just for us mere mortals can understand too, do I have to insert my spam threshold somewhere in the code or is that pulled directly from my spam settings in my FM account?

So, currently I have my spam settings on "custom" and "move messages with score of 4.5 or higher to spam". Or do I just leave the code as is?

And sorry if this is a stupid question

Don't know (because I don't use FM's automatic spam-scoring stuff.) But if you go into the Sieve edit screen and click the "copy to clipboard" button, then paste that into your text editor, then look at what the FM-generated code does, you should be able to find out whether they have a variable of their own containing your current GUI-set threshold value. If they do, you could use it.

The disadvantage is that if they ever change the name of their variable, then your code would then be referring to a no-longer-set variable. It would be safer for you to invent your own variable. Except ... that then you have to /remember/ that your sieve code uses your variable which has nothing to do with the one in the GUI.

Personally I'd use my own variable. I also wouldn't give any of my own variables such "obvious" names as eg "subject" or "spam_score" but instead names like "jn_saved_subject" and "jn_saved_spamscore" and so on, as it's unlikely FM would ever name any of their variables with a "jn_" prefix (those are my initials).

If you look at FM's generated code you'll see that they don't have a sensible naming convention for all of their variables - some of them have names starting L0_ L1_ L2_ ...L99_ etc but they also use eg "deletetotrash", "flagged", "hasmailbox", "processimip_outcome", "read", "skipinbox", "spam" and "stop".

Or even, if I had many pieces of sieve code I might name them by the date I wrote the code, so eg "jn_20230617_saved_subject" and "jn_20230617_saved_spamscore", or instead of date put a "change number" into the varnames (if I had a change history describing all the changes I'd made to the code and wanted to find all the variables related, say, to my 93rd change to that code).

As well as reducing the chance of a nasty accident, using variables with my own prefixes means I can find them all (in a long script) by searching for "jn_" in my text editor.

JeremyNicoll · 17 Jun 2023, 07:44 PM

I've just raised a support ticket asking Fastmail to add a fixed prefix of some sort to all their generated variable names.

qwertz123456 · 17 Jun 2023, 07:46 PM

Quote:

Originally Posted by JeremyNicoll

Don't know (because I don't use FM's automatic spam-scoring stuff.) But if you go into the Sieve edit screen and click the "copy to clipboard" button, then paste that into your text editor, then look at what the FM-generated code does, you should be able to find out whether they have a variable of their own containing your current GUI-set threshold value. If they do, you could use it.

Thanks for your input!
As I'm not a coder I just try to follow examples and instructions to find out by trial and error how things work, which can be frustrating a lot of times

I currently have this in the FM generated part:

Code:

  if anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "4", header :regex "X-Spam-score" "^4\.[2-9]$") {
    fileinto "\\Junk";
    stop;

So would my code be correct like this?

Code:

if allof(string :is "${ADD_SCORE}" "true",
         anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${ST1}",
               header :regex "X-Spam-score" "^${ST1}\.[${ST2}]$")) { # X-Spam-score >= SPAM_THRESHOLD
  if header :matches "Subject" "{SPAM*} *" {        # "Add score" is working
    if string :is "${known_sender}" "true" {        # if known sender...
      deleteheader "Subject";                       # ...strip "{SPAM xx.x}" from Subject line
      addheader "Subject" "${2}";                   # X-Spam-score still shows the spam score
    }
  } elsif allof(string :is "${known_sender}" "false", # FM's "Add score" is not working or off
                header :matches "Subject" "*") {    # don't have to worry about known senders
    set "Subject" "${1}";
    if header :matches "X-Spam-score" "*" {
      set "spam_score" "${1}";
      if header :value "lt" :comparator "i;ascii-numeric" "X-Spam-score" "4" {
        set "spam_score" "0${1}";                   # add leading 0 for spam scores < 10
      }
    }
    deleteheader "Subject";                         # prefix Subject with "{SPAM xx.x}"
    addheader "Subject" "{SPAM ${spam_score}} ${subject}";
  } else {                                          # known sender && spam && no {SPAM*} prefix
    # as in the 1st case X-Spam-score still shows the actual spam score
  }
} # ADD_SCORE && spam

I changed the

Code:

"X-Spam-score" "4"

I'm not sure if the part

Code:

"^4\.[2-9]$") {

is important and where I might need to input that.

BritTim · 17 Jun 2023, 08:35 PM

Quote:

Originally Posted by qwertz123456

I changed the

Code:

"X-Spam-score" "4"

I'm not sure if the part

Code:

"^4\.[2-9]$") {

is important and where I might need to input that.

I have not carefully looked at your full code, but that regular expression is carefully allowing for a spam score that is is a decimal number between 4.2 and 4.9. I am surprised it is necessary, but presumably testing found that the comparison against 4 was only reliable for an X-Spam score of 5 and above. It suggests you need to be careful with your own comparisons. I think my own approach would be try to use Fastmail's comparisons, using them to set a flag that I use later rather than trying to change between GT and LT comparisons.

qwertz123456 · 17 Jun 2023, 08:54 PM

Quote:

Originally Posted by BritTim

I think my own approach would be try to use Fastmail's comparisons, using them to set a flag that I use later rather than trying to change between GT and LT comparisons.

That goes way over my head

I don't even know what GT and LT means. And I wouldn't know where to put the

Code:

"^4\.[2-9]$") {

part.

JeremyNicoll · 17 Jun 2023, 10:17 PM

I think before we go too much further it would help if @xyzzy commented.
I think there's a (small) bug (or a bad comment perhaps) in the code posted on 24 Aug 2020.

To: qwertz123456 - the "gt" or "lt" things indicate if a comparison is being made of one value "gt" ie greater than another, or "lt" less than.

The sieve code that has stuff like

:value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "4"

in it compares a value extracted from a header with another value. In this case it compares the value from the header "X-Spam-score", with 4, and the test succeeds if the value in the header is gt (ie greater than) 4.

However this test only looks at the integer part (the bit to the left of a decimal point) of a number. So if it compares "4.7" with 4 it would think they were the same (because it only extracts the "4" from "4.7", and then tests that 4 with the 4 specified at the end of the test. And a value like that 4.7 certainly wouldn't be considered greater than 4.

So the tests also use a different technique to look at anything that might have been to the right of a decimal point.

This is where I think xyzzy may have made an error. He(?) describes code that he didn't show that chops up a threshold value, described as "x.yy" (but shown as an example as 6.2) into two parts. He puts the "x" part into a variable named ST1 and the "yy" part (modified a bit) into ST2.

ST2 is described as then containing "yy-9". This is fine if "yy" was just a single digit, so eg chopping "4.7" up would produce "4" in ST1 and "7-9" in ST2. But if the bit after the dot had 2 or more digits, eg if the threshold value was "4.75", then ST2 would end up as "75-9" ... and I'm not sure what that would do (because it's syntactically invalid in the regex it's later used in, I think).

The reason there's a dash in ST2 is that in the test that uses its value later on the character after the decimal point in a number in the "X-Spam-score" header is tested to see if it matches (say) "5-9" and that is a shorthand way of saying if it is a 5 or a 6 or a 7 or an 8 or a 9.

That aside, to use this code you also have to

(a) set, probably at the very start of your code, the variable SPAM_THRESHOLD

(b) have some code that chops SPAM_THRESHOLD up into two variables ST1 and ST2.

(c) have a variable named ADD_SCORE and have that set true if you want this code to do anything

(d) you need to correct an error you introduced in two lines;

Code:

if header :value "lt" :comparator "i;ascii-numeric" "X-Spam-score" "4" {
        set "spam_score" "0${1}";                   # add leading 0 for spam scores < 10

which are meant (if an extracted spam score is less than 10) to prefix the single digit score before the decimal point with an extra "0". The place that you replaced a "10" by a "4" should have been left as "10".

JeremyNicoll · 17 Jun 2023, 10:26 PM

Also ...the first thing that the snippet of FM-generated code shows is that Fastmail do NOT put a variable in their script that contains your spam threshold. Instead they generate that first test, knowing what your threshold is. Those lines:

Code:

if anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "4", header :regex "X-Spam-score" "^4\.[2-9]$") {
    fileinto "\\Junk";
    stop;

mean: if either of two conditions is true then move a mail to the Junk folder and stop looking at that mail. The two conditions are

- if the INTEGER part of the number in "X-Spam-score" is greater than 4

and

- if the value in "X-Spam-score", examined character by character (ie not as a number) is a 4, then a dot, then a single digit in range 2-9 inclusive.

This implies a threshold of 4.1 I think, BUT DOESN'T MAKE SENSE in terms of what you said earlier - that you wish mails with a score > 4.5 to be moved to the spam folder.

JeremyNicoll · 17 Jun 2023, 10:33 PM

Oops, and also ... When qwertz123456 asked

Quote:

I'm not sure if the part

"^4\.[2-9]$") {

is important and where I might need to input that.

it shows another fundamental misunderstanding.

YES it is important. But you don't need to put it anywhere. It is already where it needs to be, in the FM-generated code. It is part of the sieve code that does what you configured spam processing to do, ie to move some mails to the spam folder.

That logic really has nothing to do with the separate piece of sieve code that adds in or in some cases removes the "SPAM n.nn" text to/from a Subject header.

The only reason it was relevant was that IF that FM-generated code had contained your current threshold value in a variable, you could have chosen to use its value in the second block of code ... and the only way to find out if such a variable existed was to look at this generated code.

Does that make sense?

qwertz123456 · 17 Jun 2023, 11:00 PM

Thanks. Yeah, I think it would make sense to wait for xyzzy to respond. I thought I could just copy the code insert one or two numbers and it would work. But the code seems to be much more complex than I thought.

And your explanations do make sense, but putting it all together and understanding the whole code is way beyond my pea-sized brain

xyzzy · 18 Jun 2023, 12:05 PM

I'll try to summarize generally on these recent posts to this thread. And I apologize in advance for the length of this post since I'm including my code and some associated explanation.

Unfortunately the UI does not have any predefined Sieve variables for any of the UI settings. The funky names generated in the Sieve code are I think generally sufficient to reduce the odds of clashing with user variable names.

Regex terms like "^4\.[2-9]$" are checking for fractional values in the range specified, 4.2 through 4.9 in that example (from X-Spam-score value start, ^, to end, $). You will see this whenever the range test is for a fractional value. The comparator only tests the integer portion of the score. If the threshold is a integer (e.g., 4.0) then the fractional test is not generated because the range then would be [0-9]. It would still always work but FM doesn't generate it in their Sieve code since the comparator is sufficient. They know the value they're testing for.

Since these recent posts are resurrecting an old thread some things have changed since my post #44.The primary thing is that since FM no longer supports generating the {SPAM x.xx) because they add the score in a badge to messages sent to the Spam folder, then testing for it is not necessary and simplifies my code a little.

So below is my actual Sieve code related to the threshold handling. This time I am including the initialization code. This is all copy/pasted as-is directly from my Sieve script. My code is organized into a global initialization section (define constants and lists) at the start of the Script making stuff easier to find, initialized code section, and finally the code itself in the various Sieve sections provided for that.

Note, my coding style is to comment a lot so I hope that helps in understanding it.

First the global initialization...

Code:

/*--------------------------------*
 | Spam Protection level settings |
 *--------------------------------*

 The following settings REPLACE FM's UI Protection level "Move" and "Permanently delete".
 DISABLE THOSE PROTECTION LEVEL TESTS.  NOTE HOWEVER THAT PROTECTION LEVEL'S CUSTOM RADIO
 BUTTON MUST STILL BE CHECKED TO ENSURE THE SERVER GENERATES THE SPAM SCORE.  
*/
set "SPAM_THRESHOLD"      "5.0";            # SPAM_DISCARD > spam score >= SPAM_THRESHOLD
set "SPAM_FLAGS"          "";               # fileinto "\\Junk" flags
                                            # \\Seen ==> read, \\Flagged ==> pinned
                                            # "\\Seen" | "\\Flagged" | "\\Seen \\Flagged" | ""
                                            # this spam is always filed in \\Junk (same as UI)
                                            # DISABLE UI's PROTECTION LEVEL's "MOVE"

set "SPAM_DISCARD"        "9.2";            # spam score >= SPAM_DISCARD
set "SPAM_DISCARD_FOLDER" "INBOX.Discard";  # folder for msgs in that range (null ==> discard)
set "SPAM_DISCARD_FLAGS"  "";               # fileinto SPAM_DISCARD_FOLDER flags
                                            # set SPAM_DISCARD to 100.0 to disable this test 
                                            # DISABLE PROTECTION LEVEL's "PERMANENTLY DELETE"

# The effect of these settings is identical to Spam Protection's "Move" and "Permanently
# delete" when the SPAM_DISCARD_FOLDER is set to null.  If it specifies a folder then the
# messages are filed into there to give the opportunity to review.

# Note as stated these settings replace the UI "Spam protection" settings.  If it is also
# desired to set the Backscatter Advanced settings that is still done from there as usual. 

/*------------------------------------------------------------*
 | ADD_SCORE - Prefix Subject with spam score: "{SPAM xx.x} " |
 *------------------------------------------------------------*

 This is a legacy setting FM used to have (it's been removed form the FM UI) but is still
 considered wortwhile supporting.  ADD_SCORE determines whether to prefix the Subject line
 with the spam score as "{SPAM xx.x} ".  Three possibilities are supported as shown below.
*/
set "ADD_SCORE" "1";                        # 0 ==> do NOT prefix ANY spam subject lines
                                            # 1 ==> prefix spam that's NOT filed into Spam
                                            # 2 ==> prefix ALL SPAM with with "{SPAM xx.x} "

SPAM_THRESHOLD is my "Move messages with a score..." setting. "SPAM_DISCARD" is the "Permanently delete..." setting. I don't remember now the titles FM once used in the UI but the names I used were "born" from them, not what they have in the recent Spam protection UI. At any rate both these settings must be set off in the UI to allow my code to do it's own threshold checks instead. The UI's Spam protection protection level must still be enabled to get the X-Spam-score header to be generated.

Unlike the UI I support a SPAM_DISCARD_FOLDER. I don't trust the UI's Permanently delete where messages are simply discarded. I want to possibly review them first. So if SPAM_DISCARD_FOLDER is specified such messages are sent to that folder (presumably set to hold message a short time) giving you time to review them. Setting SPAM_DISCARD_FOLDER to null will however cause a discard just like the Permanently delete UI setting would have caused.

Both folders have their respective flags setting too. For example setting SPAM_DISCARD_FLAGS to "\\Seen" would have the same effect as the UI's Spam protection "Mark spam as read" setting.

Finally note that my ADD_SCORE has three settings. 0 is basically FM's way of handling things. No spam score except for showing the flag on stuff filed into the Spam folder. 1 allows adding "{SPAM xx.x} " to stuff anywhere else, specifically spam that's going into SPAM_DISCARD. Finally, for the sake of completeness, show the spam score even on those filed into the Spam folder. Never used that one but what the heck!

Next the initialization code that breaks the threshold values in to variables we can test the spam score values.

Code:

#
# For places where the spam thresholds need to be explicitly checked define the variables
# needed to do those checks as a function of the threshold values.
#
if string :matches "${SPAM_THRESHOLD}" "*.*" {
  set "ST1" "${1}";
  set "ST2" "${2}-9";
} else {            # should never happen if SPAM_THRESHOLD is properly formatted as "x.y"
  set "ST1" "0";    # set so all X-Spam-score's will always be greater than this
  set "ST2" "0-0";
  addheader "Debug" "????????? Miss-formatted SPAM_THRESHOLD value! ?????????";
}

if string :matches "${SPAM_DISCARD}" "*.*" {
  set "SD1" "${1}";
  set "SD2" "${2}-9";
} else {            # should never happen if SPAM_DISCARD is properly formatted as "x.y"
  set "SD1" "0";    # set so all X-Spam-score's will always be greater than this
  set "SD2" "0-0";
  addheader "Debug" "????????? Miss-formatted SPAM_DISCARD value! ?????????";
}

/* Example uses:

  anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${ST1}",
        header :regex "X-Spam-score" "^${ST1}\.[${ST2}]$") # X-Spam-score >= SPAM_THRESHOLD

  anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${SD1}",
        header :regex "X-Spam-score" "^${SD1}\.[${SD2}]$") # X-Spam-score >= SPAM_DISCARD

For some reason FM only uses a single back slash to escape the dot. It should really be two
back slashes but use only single back slash as illustrated to be consistent with FM's
"mistake". Doesn't matter here because X-Spam-score is always generated syntactically correct.
*/

Finally the code that adds the "{SPAM xx.y} " to the header. Note that I have one variable I use called known_sender where I don't show it's initialization. In it's simplest case it's set to "true" if the X-Spam-Known-Sender header says it is (header :matches "X-Spam-Known-Sender" "yes*"). But I have additional code that can set it to true based on a white list for glob constructs that you cannot specify with Contacts globbing (e.g., subdomains). In that case the code needs to change X-Spam-Known-Sender to say "yes" and "in-addressbook" to get other FM generated Sieve code to go down the path I need it to go.

Code:

#
# For unknown senders prefix Subject with "{SPAM xx.y} " if X-Spam-score >= SPAM_THRESHOLD...
#
# FM used to do this under the Spam Protection's "Add Score" option but they have since
# removed that option.                                                               [8/24/20]
#
# Note that the spam subject lines for unknown senders are always prefixed here if ADD_SCORE
# calls for it.  Later, at appropriate points in the code, ADD_SCORE may still call for taking
# it back out, e.g., for messages being filed into the Spam folder when ADD_SCORE is 0.  But
# that's not known at this point.
#
if allof(not string :is "${ADD_SCORE}" "0",          # spam score prefix is to be added
         string :is "${known_sender}" "false",       # but only for unknown senders
         anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "${ST1}",
               header :regex "X-Spam-Score" "^${ST1}\.[${ST2}]$"),
         header :matches "Subject" "*") {
  set "subject" "${1}";                              # X-Spam-score >= SPAM_THRESHOLD
  if header :matches "X-Spam-score" "*" {            # prefix Subject with "{SPAM xx.y} "
    set "spam_score" "${1}";
    if header :value "lt" :comparator "i;ascii-numeric" "X-Spam-score" "10" {
      set "spam_score" "0${1}";                      # add leading 0 for spam scores < 10
    }
    deleteheader "Subject";
    addheader "Subject" "{SPAM ${spam_score}} ${subject}";# prefix Subject with "{SPAM xx.x} "
  }
} # ADD_SCORE == 0 && spam to an unknown sender

So that's the code evolved from the one I posted in post #44.

I hope all this helps clarify everything. If you really want the code I have to set known_sender let me know.

xyzzy · 18 Jun 2023, 05:16 PM

Quote:

Originally Posted by JeremyNicoll

This is where I think xyzzy may have made an error. He(?) describes code that he didn't show that chops up a threshold value, described as "x.yy" (but shown as an example as 6.2) into two parts. He puts the "x" part into a variable named ST1 and the "yy" part (modified a bit) into ST2.

ST2 is described as then containing "yy-9". This is fine if "yy" was just a single digit, so eg chopping "4.7" up would produce "4" in ST1 and "7-9" in ST2. But if the bit after the dot had 2 or more digits, eg if the threshold value was "4.75", then ST2 would end up as "75-9" ... and I'm not sure what that would do (because it's syntactically invalid in the regex it's later used in, I think).

My post above summarized the way I handle this stuff. With all the recent posts I think I probably overlooked some of the specific questions. Let me address the JeremyNicoll thread where he though I had an error.

Looking at "normal" code that Sieve checks X-Spam-score the spam scores are always only in the form of xx.y or x.y, i.e., only one decimal place. There cannot be a 2-digit fractional part. Values range from 0.0 to 99.9 (Spam Protection's UI settings enforce this too). Admittedly I am a little lax in the internal error checks. Thinking about this a bit I probably should change the test that extracts the values from the score to a more rigorous regex as follows:

Code:

 if string :regex "${SPAM_THRESHOLD}" "^([0-9]{1,2})\\.([0-9])$"

This will only match on x.y and xx.y where all the x's and y's are digits.

SPAM_DISCARD is similar but allow 3 digit integers (e.g., 100.0) to disable discards since spam scores are always <100.0.

JeremyNicoll · 18 Jun 2023, 07:29 PM

Thanks to @xyzzy for such thorough answers.

I am pleased to see someone else who writes the sort of comments I put in my own code. I don't class this as "a lot"; I class nearly every other programmer's code as "inadequately commented".

I do have an observation and some questions though. The first is that for code for one's own use one doesn't have to be terribly careful. I would fully expect that if xyzzy changes this code, he(?) pays full attention to the implications of what gets changed. But when code is made available for other people to use one can't be certain that they will concentrate as much, and/or understand what they are doing. That is, I think this publicly-shared code might need to be written so it doesn't assume parameters are configured correctly.

In the code that splits up a configured SPAM_THRESHOLD or SPAM_DISCARD value, the "else" action is to set values to 0. While for SPAM-THRESHOLD that means that later on every email will satisfy the "is my spam score greater than 0" test and thus have the spam score added to its subject, that's unlikely to matter. But the same thing would happen with SPAM_DISCARD and thus every one of a user's emails would be sent to the discard folder. I think that the score values set here should be SD1/ST1 as 9999 (ie much larger than likely to occur) and SD2/ST2 as a regex that will never match. I'd probably also set an error flag (which I'd have initialised to a default error message, clear only if parameters were ok, and set to a specific error message if specific errors were found), and only execute the following code if the error flag had been cleared. I /would/ set both the shouldn't match SD1/ST1/SD2/ST2 values AND the error message even if in theory only the test values or error message needs set, just to be on the safe side (if someone else changes later code without thinking or I change it without paying enough attention).

It's also not enough to check that individual parameters have valid values. You need to use them only if the combinations of parameters make sense.

Something like

not string :is "${ADD_SCORE}" "0"

is safe for xyzzy's own code (because he will be careful when setting ADD_SCORE's value in the first place), but not safe (albeit the risk of the condition accidentally being true is low for this specific action) for a user who might not have set ADD_SCORE at all or mis-set it. This "not string" is a shorthand for ADD_SCORE being set to 1 or 2, but really you should explicitly check that the value IS 1 or 2, not that it's not 0. Lots of things are not 0, but only two things are either 1 or 2.

I understand completely why you "addheader" error messages into the mails - but a user who never looks at headers won't ever see them (until, maybe, directed to look for them by you). For a user who screws up using code like this, a safer action might be the addheader plus stopping the sieve script completely (so that it can't take any unintended action).

Also... irrelevant in a way bearing in mind what I posted above about the "else" values of SD2 and ST2, but I was curious why you set those to "0-0". That makes no sense in regex terms; why not just code "0"?

xyzzy · 18 Jun 2023, 08:49 PM

Quote:

Originally Posted by JeremyNicoll

Thanks to @xyzzy for such thorough answers.

I am pleased to see someone else who writes the sort of comments I put in my own code. I don't class this as "a lot"; I class nearly every other programmer's code as "inadequately commented".

That's one of the reasons I said "a lot"!

The other is that if I have to revisit my code in the future the comments help me remember what I was thinking when I originally wrote it.

Quote:

I do have an observation and some questions though. The first is that for code for one's own use one doesn't have to be terribly careful. I would fully expect that if xyzzy changes this code, he(?) pays full attention to the implications of what gets changed.

Which is the reason I never paid too much attention to the SPAM_THRESHOLD and SPAM_DISCARD error recovery. That code was done early in my use of Sieve a few years ago so I was not too much interested in elegant error recovery.

By the way, it's "he"!

Quote:

But when code is made available for other people to use one can't be certain that they will concentrate as much, and/or understand what they are doing. That is, I think this publicly-shared code might need to be written so it doesn't assume parameters are configured correctly.

Except for posting sections of my code here I doubt it will ever be more widely distributed and if I add too much "hardening" code it might distract from whatever I am trying to illustrate.

Quote:

In the code that splits up a configured SPAM_THRESHOLD or SPAM_DISCARD value, the "else" action is to set values to 0. While for SPAM-THRESHOLD that means that later on every email will satisfy the "is my spam score greater than 0" test and thus have the spam score added to its subject, that's unlikely to matter. But the same thing would happen with SPAM_DISCARD and thus every one of a user's emails would be sent to the discard folder. I think that the score values set here should be SD1/ST1 as 9999 (ie much larger than likely to occur) and SD2/ST2 as a regex that will never match.

This goes back to me originally controlling what I am doing. I honestly never gave much thought on what to do when those values are defined as invalid.

I think your 9999 idea is a good one. In fact 100.0 is sufficient since spam scores never exceed 99.9. In the comments for SPAM_DISCARD I already say setting it to 100.0 effectively disables it. Same idea would work for SPAM_THRESHOLD. So I can also use that fact for the error recovery setting ST1/SD1 to 100. Doesn't matter what ST2/SD2 are,

Quote:

I'd probably also set an error flag (which I'd have initialised to a default error message, clear only if parameters were ok, and set to a specific error message if specific errors were found), and only execute the following code if the error flag had been cleared. I /would/ set both the shouldn't match SD1/ST1/SD2/ST2 values AND the error message even if in theory only the test values or error message needs set, just to be on the safe side (if someone else changes later code without thinking or I change it without paying enough attention).

Setting ST1/SD1 to 100 would effectively disable all spam checking based on the spam score. Such emails would be treated as if they are not spam. Other non-spam score tests (and I do have such stuff) however would still be enabled. I am not sure I want a master bypass switch for that stuff.

Quote:

It's also not enough to check that individual parameters have valid values. You need to use them only if the combinations of parameters make sense.

Something like

not string :is "${ADD_SCORE}" "0"

is safe for xyzzy's own code (because he will be careful when setting ADD_SCORE's value in the first place), but not safe (albeit the risk of the condition accidentally being true is low for this specific action) for a user who might not have set ADD_SCORE at all or mis-set it. This "not string" is a shorthand for ADD_SCORE being set to 1 or 2, but really you should explicitly check that the value IS 1 or 2, not that it's not 0. Lots of things are not 0, but only two things are either 1 or 2.

I am not sure I want to exhaustively check all values. Clutters up the code. Some multiple cases insist on specific switch values but if there is a default I might not insist on the documented value.

Quote:

I understand completely why you "addheader" error messages into the mails - but a user who never looks at headers won't ever see them (until, maybe, directed to look for them by you). For a user who screws up using code like this, a safer action might be the addheader plus stopping the sieve script completely (so that it can't take any unintended action).

I agree with you about the error message header but Sieve doesn't leave a lot of degrees of freedom here. If I set the spam and discard thresholds to 100 that should let most messages through (albeit the ones that aren't spam score dependent - they are filtered normally).

Be it all bypassed or doing what I suggest still leaves the problem on how to error report. Short of adding a header, thinking out loud, I guess I could add a prefix to the subject just like the the spam score. That would show up on the subject line. Not sure that's very good either.

Quote:

Also... irrelevant in a way bearing in mind what I posted above about the "else" values of SD2 and ST2, but I was curious why you set those to "0-0". That makes no sense in regex terms; why not just code "0"?

Don't know now. I think back then I didn't analyze all the implications. I just wanted a path or placeholder to handle mistyping the threshold values. Also that code was some of the earliest code I added to the script and a lot or evolution occurred after that. I don't know about back then but now setting to integer values to 100 makes more sense.

17 Jun 2023, 10:17 PM	#53
JeremyNicoll Essential Contributor Join Date: Dec 2017 Location: Scotland Posts: 490	I think before we go too much further it would help if @xyzzy commented. I think there's a (small) bug (or a bad comment perhaps) in the code posted on 24 Aug 2020. To: qwertz123456 - the "gt" or "lt" things indicate if a comparison is being made of one value "gt" ie greater than another, or "lt" less than. The sieve code that has stuff like :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "4" in it compares a value extracted from a header with another value. In this case it compares the value from the header "X-Spam-score", with 4, and the test succeeds if the value in the header is gt (ie greater than) 4. However this test only looks at the integer part (the bit to the left of a decimal point) of a number. So if it compares "4.7" with 4 it would think they were the same (because it only extracts the "4" from "4.7", and then tests that 4 with the 4 specified at the end of the test. And a value like that 4.7 certainly wouldn't be considered greater than 4. So the tests also use a different technique to look at anything that might have been to the right of a decimal point. This is where I think xyzzy may have made an error. He(?) describes code that he didn't show that chops up a threshold value, described as "x.yy" (but shown as an example as 6.2) into two parts. He puts the "x" part into a variable named ST1 and the "yy" part (modified a bit) into ST2. ST2 is described as then containing "yy-9". This is fine if "yy" was just a single digit, so eg chopping "4.7" up would produce "4" in ST1 and "7-9" in ST2. But if the bit after the dot had 2 or more digits, eg if the threshold value was "4.75", then ST2 would end up as "75-9" ... and I'm not sure what that would do (because it's syntactically invalid in the regex it's later used in, I think). The reason there's a dash in ST2 is that in the test that uses its value later on the character after the decimal point in a number in the "X-Spam-score" header is tested to see if it matches (say) "5-9" and that is a shorthand way of saying if it is a 5 or a 6 or a 7 or an 8 or a 9. That aside, to use this code you also have to (a) set, probably at the very start of your code, the variable SPAM_THRESHOLD (b) have some code that chops SPAM_THRESHOLD up into two variables ST1 and ST2. (c) have a variable named ADD_SCORE and have that set true if you want this code to do anything (d) you need to correct an error you introduced in two lines; Code: if header :value "lt" :comparator "i;ascii-numeric" "X-Spam-score" "4" { set "spam_score" "0${1}"; # add leading 0 for spam scores < 10 which are meant (if an extracted spam score is less than 10) to prefix the single digit score before the decimal point with an extra "0". The place that you replaced a "10" by a "4" should have been left as "10".

17 Jun 2023, 10:26 PM	#54
JeremyNicoll Essential Contributor Join Date: Dec 2017 Location: Scotland Posts: 490	Also ...the first thing that the snippet of FM-generated code shows is that Fastmail do NOT put a variable in their script that contains your spam threshold. Instead they generate that first test, knowing what your threshold is. Those lines: Code: if anyof(header :value "gt" :comparator "i;ascii-numeric" "X-Spam-score" "4", header :regex "X-Spam-score" "^4\.[2-9]$") { fileinto "\\Junk"; stop; mean: if either of two conditions is true then move a mail to the Junk folder and stop looking at that mail. The two conditions are - if the INTEGER part of the number in "X-Spam-score" is greater than 4 and - if the value in "X-Spam-score", examined character by character (ie not as a number) is a 4, then a dot, then a single digit in range 2-9 inclusive. This implies a threshold of 4.1 I think, BUT DOESN'T MAKE SENSE in terms of what you said earlier - that you wish mails with a score > 4.5 to be moved to the spam folder.

17 Jun 2023, 01:33 AM	#47
qwertz123456 Essential Contributor Join Date: Jan 2008 Posts: 378	@xyzzy Thanks so much for providing this code. Just for us mere mortals can understand too, do I have to insert my spam threshold somewhere in the code or is that pulled directly from my spam settings in my FM account? So, currently I have my spam settings on "custom" and "move messages with score of 4.5 or higher to spam". Or do I just leave the code as is? And sorry if this is a stupid question

17 Jun 2023, 07:44 PM	#49
JeremyNicoll Essential Contributor Join Date: Dec 2017 Location: Scotland Posts: 490	I've just raised a support ticket asking Fastmail to add a fixed prefix of some sort to all their generated variable names.

17 Jun 2023, 11:00 PM	#56
qwertz123456 Essential Contributor Join Date: Jan 2008 Posts: 378	Thanks. Yeah, I think it would make sense to wait for xyzzy to respond. I thought I could just copy the code insert one or two numbers and it would work. But the code seems to be much more complex than I thought. And your explanations do make sense, but putting it all together and understanding the whole code is way beyond my pea-sized brain

18 Jun 2023, 07:29 PM	#59
JeremyNicoll Essential Contributor Join Date: Dec 2017 Location: Scotland Posts: 490	Thanks to @xyzzy for such thorough answers. I am pleased to see someone else who writes the sort of comments I put in my own code. I don't class this as "a lot"; I class nearly every other programmer's code as "inadequately commented". I do have an observation and some questions though. The first is that for code for one's own use one doesn't have to be terribly careful. I would fully expect that if xyzzy changes this code, he(?) pays full attention to the implications of what gets changed. But when code is made available for other people to use one can't be certain that they will concentrate as much, and/or understand what they are doing. That is, I think this publicly-shared code might need to be written so it doesn't assume parameters are configured correctly. In the code that splits up a configured SPAM_THRESHOLD or SPAM_DISCARD value, the "else" action is to set values to 0. While for SPAM-THRESHOLD that means that later on every email will satisfy the "is my spam score greater than 0" test and thus have the spam score added to its subject, that's unlikely to matter. But the same thing would happen with SPAM_DISCARD and thus every one of a user's emails would be sent to the discard folder. I think that the score values set here should be SD1/ST1 as 9999 (ie much larger than likely to occur) and SD2/ST2 as a regex that will never match. I'd probably also set an error flag (which I'd have initialised to a default error message, clear only if parameters were ok, and set to a specific error message if specific errors were found), and only execute the following code if the error flag had been cleared. I /would/ set both the shouldn't match SD1/ST1/SD2/ST2 values AND the error message even if in theory only the test values or error message needs set, just to be on the safe side (if someone else changes later code without thinking or I change it without paying enough attention). It's also not enough to check that individual parameters have valid values. You need to use them only if the combinations of parameters make sense. Something like not string :is "${ADD_SCORE}" "0" is safe for xyzzy's own code (because he will be careful when setting ADD_SCORE's value in the first place), but not safe (albeit the risk of the condition accidentally being true is low for this specific action) for a user who might not have set ADD_SCORE at all or mis-set it. This "not string" is a shorthand for ADD_SCORE being set to 1 or 2, but really you should explicitly check that the value IS 1 or 2, not that it's not 0. Lots of things are not 0, but only two things are either 1 or 2. I understand completely why you "addheader" error messages into the mails - but a user who never looks at headers won't ever see them (until, maybe, directed to look for them by you). For a user who screws up using code like this, a safer action might be the addheader plus stopping the sieve script completely (so that it can't take any unintended action). Also... irrelevant in a way bearing in mind what I posted above about the "else" values of SD2 and ST2, but I was curious why you set those to "0-0". That makes no sense in regex terms; why not just code "0"?