The tested LLMs fared much worse, though, when the Apple researchers modified the GSM-Symbolic benchmark by adding “seemingly relevant but ultimately inconsequential statements” to the questions.
Good thing they’re being trained on random posts and comments on the internet, which are known for being succinct and accurate.
That’s my new pet peeve. The thing is, I don’t remember seeing people do this in the past, and certainly not frequently, but now I see it all the time. Mind-boggling selfishness. I think Covid rotted everyone’s brains way more than we realize.