[SpamCop-List] Re: Mail Daemon Spam - what is it?
nobody at spamcop.net
Wed Mar 3 14:29:27 EST 2004
Michael Lefevre (michael.spamcop at michaellefevre.com) wrote in
news:c24olm$vvi$1 at news.spamcop.net:
>> The body of a bounce doesn't advertise websites,
>> products or services; most spam links to at least one website, I've
>> never seen a real bounce that does.
> Firstly, I have actually seen a "real" unknown user bounce with a URL
> (to a directory to find the correct address). If the bounce message
> is due to, say, a DNSBL, then it's quite likely to include a lookup
> URL from that.
OK - I just said I'd never seen one :)
> The killer point is that bounces tend to include the original
> content, which may well be an actual spam, and depending on the format
> of the bounce, it can be tricky to work out where the included message
Um, no, that's not a killer point - on the contrary, it's a way to
actually recognize a real bounce. *If* that pattern occurs, it's most
likely a bounce (though not every bounce includes the original content).
It's one of the things a human uses to recognize a real bounce, and a
nice example of what could be used to develop a better heuristic for
recognizing actual bounces.
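
To make the "included original message" signal concrete, here is a
minimal sketch in Python. It assumes the bounce is MIME-structured
(for instance a multipart/report delivery status notification); older
bounces that paste the original as plain text would not be caught.
The function name is made up for illustration, and a positive result
is only one signal, not proof of a real bounce.

```python
import email

# Content types that usually mean "another message is embedded here".
EMBEDDED_TYPES = {
    "message/rfc822",           # the full original message
    "text/rfc822-headers",      # only the original headers
    "message/delivery-status",  # the machine-readable status part
}


def includes_original_message(raw: str) -> bool:
    """Return True if any MIME part of the mail embeds another message."""
    msg = email.message_from_string(raw)
    # walk() visits the message itself and every nested part.
    return any(part.get_content_type() in EMBEDDED_TYPES
               for part in msg.walk())
```

A spammer can of course fake this structure too, so it would be one
input to a score rather than a decision by itself.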
Actually, you could probably use Bayesian filtering to "teach" a system
to recognize real bounces.
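
As a rough illustration of what "teaching" could look like, here is a
tiny naive Bayes classifier in Python. It is a sketch only: the class
name, the two labels and the word-level tokenizer are assumptions made
for the example, and a real filter would need far more training data
and better tokenization.

```python
import math
import re
from collections import Counter


class BounceBayes:
    """Toy naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.counts = {"bounce": Counter(), "spam": Counter()}
        self.docs = {"bounce": 0, "spam": 0}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        self.counts[label].update(self.tokens(text))
        self.docs[label] += 1

    def classify(self, text):
        vocab = set(self.counts["bounce"]) | set(self.counts["spam"])
        total_docs = sum(self.docs.values())
        # Words never seen in training carry no information; skip them.
        known = [t for t in self.tokens(text) if t in vocab]
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            # Smoothed log prior for the class.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            n = sum(counter.values())
            for tok in known:
                # Smoothed log likelihood of the word given the class.
                score += math.log((counter[tok] + 1) / (n + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

You would call train() with bodies of mails a human has already
labelled, then classify() on new ones; the more examples it sees, the
better the word statistics get.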
> Could you describe exactly what patterns to look for though -
> something that could be implemented in a line or two of code, that
> would be better than what we have now. I'm sure it'd be possible to
> come up with something, but then the spammers can just include those
> lines in the message body - it's not as if they have a problem with
> including some extra junk.
It's not a matter of specific lines, but of patterns. I doubt it could
be done in two lines of code - but I think it could be handed off to a
Bayesian pattern recognizer that gets triggered only if there is a
suspicion the mail *may* be a bounce (going by the headers). In other
words: if the headers raise the suspicion, don't decide right away that
it's a bounce (as happens now), but investigate further. That way not
every mail would have to be subjected to this kind of further scrutiny.
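
The two-stage idea can be sketched like this in Python. The header
test (an empty Return-Path, or a From that mentions MAILER-DAEMON or
postmaster) is only an assumed stand-in for whatever header check is
in use now, and body_check is a hypothetical callable for the second
stage, for example a trained Bayesian classifier.

```python
import email


def headers_suggest_bounce(msg) -> bool:
    """Cheap first pass: look at headers only."""
    return_path = (msg.get("Return-Path") or "").strip()
    sender = (msg.get("From") or "").lower()
    return (return_path == "<>"
            or "mailer-daemon" in sender
            or "postmaster" in sender)


def triage(raw: str, body_check) -> str:
    """Classify a mail as 'ordinary', 'bounce' or 'suspicious'.

    body_check(raw) -> bool is the expensive second stage; it is
    called only when the headers already look like a bounce.
    """
    msg = email.message_from_string(raw)
    if not headers_suggest_bounce(msg):
        return "ordinary"      # no further scrutiny needed
    if body_check(raw):
        return "bounce"        # headers and body agree
    return "suspicious"        # bounce-like headers, but the body is not
```

The point of the design is that the costly check runs on a small
fraction of the mail, and that bounce-like headers alone no longer
settle the question.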
>> Real bounces *are* recognizable to humans (and not by looking at
>> headers); there must be a way to build that recognition process into
>> a heuristic to make a much better determination of what is (most
>> likely) a bounce.
> Possibly, but that's sounding like a lot of work. And whatever
> heuristic you come up with, the spammers can work around it.
I don't think it's necessarily a lot of work. Bayesian and other
pattern-recognition algorithms are already out there; it's more a matter
of wiring them together. Spammers are getting around simple content
filters that look for fixed strings, but it's much harder to get around
a Bayesian filter that keeps learning. That's why you can't just write
two lines of code (which would be the equivalent of a content filter)
but need a more "intelligent" technology.
> Anyway, I'm just pointing out why it's not that easy to fix - I
> imagine Julian will do something at some point...
I won't say it's easy - but I'm sure it's possible. And at some point,
something will have to be done - start thinking and testing now!
Marjolein Katsma - Amsterdam, NL - http://hshelp.com/
Spam reporting addresses: http://banspam.javawoman.com/report3.html
Spammers steal resources: they're my enemy.
Cyveillance steals resources: they're my enemy.
The enemy of my enemy can be my enemy, too.