[SpamCop-List] Re: taobao.com spam
DS
9ucs5y001 at sneakemail.com
Sun Feb 29 11:29:18 EST 2004
"Mike Easter" <MikeE at ster.invalid> wrote in message
news:c1tdbp$b9u$1 at news.spamcop.net...
> DS wrote:
> > I am going to post the attachment as a UNICODE text file in the
> > spamcop.spam group. If it is a faux pas to do so, please forgive me
> > in advance. :)
>
> I can't do anything useful with that thing. I took the isolated
> 'taobao.txt' attachment and renamed it 'taobao.uue' and then decoded it,
> but the result was 'notepad unsatisfactory'. In notepad, it appeared to
> be a 'messed up' condition of the mail's headers [and a little bit more,
> hard to interpret] but/and in WordPad, the headers appeared 'normal' and
> the item was almost exactly like what you originally posted, with the
> exception of WordPad 'choosing' the non-printing 'box' character instead
> of the question marks of your original post.
The UNICODE .txt file has a Unicode BOM (Byte Order Mark) of 0xFF, 0xFE at
the head of it, indicating it is a little-endian UTF-16 file. Notepad on
any WinXP or later system should handle it, and I believe that Win2K should
also handle it just fine. A good font to use for rendering it is "Arial
Unicode MS".
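For anyone who wants to check the attachment without relying on Notepad, here
is a minimal Python sketch of the BOM test described above. The function name
sniff_utf16_bom and the sample bytes are made up for illustration; only the
0xFF 0xFE marker comes from the actual file.

```python
import codecs
from typing import Optional

def sniff_utf16_bom(data: bytes) -> Optional[str]:
    """Return the UTF-16 byte order implied by a leading BOM, if any."""
    # Caveat: a UTF-32 little-endian BOM (FF FE 00 00) also starts with
    # FF FE; this sketch assumes the file is UTF-16, as in the post.
    if data.startswith(codecs.BOM_UTF16_LE):   # b"\xff\xfe"
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # b"\xfe\xff"
        return "utf-16-be"
    return None

# Hypothetical sample standing in for the first bytes of taobao.txt.
sample = codecs.BOM_UTF16_LE + "Subject: test".encode("utf-16-le")

encoding = sniff_utf16_bom(sample)
# Skip the two BOM bytes, then decode the rest with the detected byte order.
text = sample[2:].decode(encoding)
print(encoding, repr(text))
```

A file that starts with 0xFF 0xFE and decodes cleanly this way is most likely
fine, which would point at the rendering side rather than the file itself.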
>
> I'm not using any Chinese characters in my system.
>
> I have a feeling we're dealing with some kind of adverse effect of what
> OL is doing to render the item for you; that is, OL wants you to be
> able to read the Chinese. SpamCop must not only not want to read the
> Chinese but it gags on it.
It could very well be that the raw message needs some sort of MIME type or
charset declaration in it before it can be rendered properly. My beef is
that it renders fine in notepad with the proper font and such, and copies
correctly, but the webform/webparse on Spamcop completely chokes on it. I
need someone on the inside to see what the parser is receiving as well as
what it is barfing on.
>
> If that item had been received by something else besides OL, like OE, it
> would have been stored in its original form, which I call 'smtp mime'
> but 'mime experts' tell me that is simply 'mime', and I'll bet SC could
> handle it.
>
> Once it has been rendered and stored by OL, you can't get the original
> back.
Good idea! I finally sent the raw message including headers that I have to
my external account where OE can pick it up. It has some of the following
in it:
------_=_NextPart_001_01C3FEF8.3E805358
Content-Type: text/plain;
charset="gb2312"
Content-Transfer-Encoding: quoted-printable
with the Chinese text encoded as:
=D6=D0=B9=FA=B9=E3=B6=AB=CA=D6=BB=FA=CD=F8=C9=CF=B3=AC=CA=D0=CA=D6=BB=FA=D0=
=C2=B4=BA=B4=F3=B3=EA=B1=F6 266=D4=AA=C6=F0
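The two encoded lines above can be decoded by hand to confirm they really are
quoted-printable GB2312. This is a rough Python sketch using only the standard
library; the variable names are mine, and it says nothing about how SpamCop's
parser actually processes the message.

```python
import quopri

# The two body lines exactly as they appear in the raw message. The
# trailing "=" on the first line is a quoted-printable soft line break.
encoded = (
    b"=D6=D0=B9=FA=B9=E3=B6=AB=CA=D6=BB=FA=CD=F8=C9=CF=B3=AC=CA=D0=CA=D6=BB=FA=D0=\n"
    b"=C2=B4=BA=B4=F3=B3=EA=B1=F6 266=D4=AA=C6=F0"
)

# Step 1: undo the Content-Transfer-Encoding (quoted-printable -> raw bytes).
raw = quopri.decodestring(encoded)

# Step 2: interpret the raw bytes using the charset named in Content-Type.
text = raw.decode("gb2312")

print(len(raw), "bytes ->", len(text), "characters")
print(text)
```

Without the charset from the Content-Type header, step 2 has nothing to go on,
which fits with the parser choking once that header has been lost.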
I suspect the 'charset="gb2312"' is exactly what is missing for getting
SpamCop to parse the Chinese text in this case. I'll see if I can get my
buddies in Outlook to come up with a way to get proper headers and body for
this. I have no idea if the message itself is correctly formatted when
received at all now. :(
The backend gets them correct when sending the message at least.
>
> In any case, as you noticed from my other post, the provider for the
> website doesn't want to hear about it, so for that particular item, the
> header parse is the only thing that is 'counting' for anything.
Yeah, I know. It is frustrating though. It is pretty obvious that
taobao.com doesn't care about it and in fact likes the extra advert, even if
it isn't coming from them directly.
>
> Are you trying to solve a bigger problem about Chinese spam or just this
> one particular type?
None of the Asian spam I get that contains actual Far East characters works
in the web-parse. I always have to strip the Far East content out of it first.
--
DS