Re: Reading message bodies with charset="utf-8"

Subject: Re: Reading message bodies with charset="utf-8"
Posted by:  Remy Lebeau (Indy Team) (no.spam@no.spam.com)
Date: Tue, 29 Aug 2006

"Nick Sivo" <junk6indyforu…@kogir.com> wrote in message
news:3635321CD005E340junk6indyforu…@kogir.com...

> I'm attempting to use Indy Pop3 to retrieve and process
> some email messages with the following properties:
>
> Content-Type: text/plain; charset="utf-8"

Indy does not support UTF-8 at this time.  You will have to decode UTF-8
data manually.
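
For illustration only (this is Python, not Indy code): a minimal sketch of
what "decoding manually" could look like. It assumes each character of the
received string holds exactly one of the original bytes (code points 0-255),
which may not be true of what Indy actually hands back. The function name is
hypothetical.

```python
def decode_utf8_from_byte_string(text: str) -> str:
    """Reinterpret a one-char-per-byte string as UTF-8.

    Assumption: every character in `text` is in the range 0-255 and stands
    for one raw byte of the original message body.
    """
    # Latin-1 maps code points 0-255 straight onto byte values 0-255,
    # so this recovers the original byte sequence.
    raw = text.encode("latin-1")
    # Decode the recovered bytes as UTF-8.
    return raw.decode("utf-8")


if __name__ == "__main__":
    # "café" encoded as UTF-8 and then misread one byte per character.
    mangled = "caf\u00c3\u00a9"
    print(decode_utf8_from_byte_string(mangled))
```

If any byte was stripped to 7 bits or replaced on the way in, the final
decode step raises an error, which is itself a useful diagnostic.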

> I am unable to access the message body.  The msg.Body.Text property
> mangles the UTF-8 text

A MIME-encoded message will (usually) not populate the msg.Body to begin
with.  It will populate the msg.MessageParts instead.

With that said, however, the Text property is a regular String, which is
Unicode in .NET.  UTF-8 is compatible with ASCII, and Indy reads strings
from the socket as ASCII.  Thus, your UTF-8 data should not be getting
mangled when converted to Unicode.  What exactly do you think is being
mangled?
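
One way to narrow this down is to check whether the body contains anything
outside the ASCII range. The sketch below is Python, not Indy code, and the
helper name is hypothetical. It shows that ASCII-only text is byte-for-byte
identical in ASCII and UTF-8, while accented characters become multi-byte
sequences with the high bit set, which a strict ASCII decoder cannot
represent.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is below 0x80 (plain 7-bit ASCII)."""
    return all(b < 0x80 for b in data)


plain = "Hello".encode("utf-8")
accented = "H\u00e9llo".encode("utf-8")  # "Héllo"

# ASCII-only text has the same bytes in both encodings.
same_bytes = plain == "Hello".encode("ascii")

# Non-ASCII text cannot be decoded by a strict ASCII decoder.
try:
    accented.decode("ascii")
    ascii_decode_ok = True
except UnicodeDecodeError:
    ascii_decode_ok = False

if __name__ == "__main__":
    print(same_bytes, is_pure_ascii(plain), is_pure_ascii(accented), ascii_decode_ok)
```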

> both msg.Body.SaveToStream() and msg.Body.SaveToFile() are unimplemented.

Yes, they are.

> I tried searching msg.MessageParts, but since the message is not mime
> encoded there are no parts (msg.MessageParts.Count = 0).

Then why does it have a MIME header?  What does the message actually look
like?

> In an unsuccessful act of desperation I tried treating each character in
> msg.Body.Text as a byte and parsing the resulting byte array, but it too
> was invalid.

Why do you think it is invalid?  Please be more specific.

> Is there a way to get the raw bytes of the body so I can properly parse
> them?

Use RetrieveRaw() instead of Retrieve().
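
Once you have the raw message, the body has to be decoded using the charset
named in the Content-Type header. As a rough illustration of that step, here
is a sketch using Python's standard email module rather than Indy. The sample
message, addresses, and function name are made up for the example.

```python
import email

# Hypothetical single-part message, as it might arrive over the wire.
RAW_MESSAGE = (
    b"From: sender@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "Caf\u00e9 \u20ac5\r\n".encode("utf-8")
)


def decode_body(raw: bytes) -> str:
    """Parse a raw single-part message and decode its body text."""
    msg = email.message_from_bytes(raw)
    # Charset from the Content-Type header; fall back to ASCII if absent.
    charset = msg.get_content_charset() or "ascii"
    # decode=True undoes the transfer encoding and returns raw bytes.
    body_bytes = msg.get_payload(decode=True)
    return body_bytes.decode(charset, errors="replace")


if __name__ == "__main__":
    print(decode_body(RAW_MESSAGE).strip())
```

Note that this message has a MIME Content-Type header but only one part, so
a parser can report zero sub-parts and still carry a charset to honor.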

Gambit


In response to

Reading message bodies with charset="utf-8" posted by Nick Sivo on Tue, 29 Aug 2006