Re: TIdMessage.Body UTF-8 encoded

Subject: Re: TIdMessage.Body UTF-8 encoded
Posted by:  Remy Lebeau (re…@lebeausoftware.org)
Date: Fri, 14 Aug 2015

Boba wrote:

> IdMsg->Body->Add( "US ASCII" );//looks OK
> IdMsg->Body->Add( "?£¥" );//non-ASCII characters ???

You are using a Unicode version of BCB, so String-based and TStrings-based
properties expect UnicodeString values.  You are passing char* strings to
Body->Add(), so the RTL has to convert them at runtime from the OS default
Ansi locale to Unicode, and that conversion can be lossy depending on the
OS's actual locale.  To avoid that, do not pass Ansi strings to begin with.
Pass Unicode strings instead:

{code:cpp}
IdMsg->Body->Add( L"US ASCII" );
IdMsg->Body->Add( L"?£¥" );
{code}
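
To illustrate why the narrow-to-wide conversion depends on the assumed charset, here is a minimal standalone sketch in standard C++.  It is not the RTL's actual code: the helper names widenAsLatin1 and widenAsUtf8 are hypothetical, and ISO-8859-1 merely stands in for whatever Ansi code page the OS happens to use.  The same bytes decode to different text depending on which charset the converter assumes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: treat every byte as one ISO-8859-1 character.
// Stands in for "convert from the OS default Ansi code page".
std::u32string widenAsLatin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char32_t>(b));
    }
    return out;
}

// Hypothetical helper: decode UTF-8, assuming the input is well-formed
// (malformed sequences are not detected, only truncation is asserted).
std::u32string widenAsUtf8(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        // Number of continuation bytes implied by the lead byte.
        int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : (b >= 0xC0) ? 1 : 0;
        char32_t cp = (extra == 3) ? (b & 0x07)
                    : (extra == 2) ? (b & 0x0F)
                    : (extra == 1) ? (b & 0x1F)
                    : b;
        assert(i + extra < bytes.size()); // sequence must not be truncated
        for (int k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            cp = (cp << 6) | (c & 0x3F); // append 6 payload bits
        }
        out.push_back(cp);
        i += static_cast<std::size_t>(extra) + 1;
    }
    return out;
}
```

With these helpers, the two bytes 0xC2 0xA3 (the UTF-8 encoding of £) decode to the single character U+00A3 via widenAsUtf8, but to the two characters U+00C2 U+00A3 via widenAsLatin1.  A wide literal such as L"US ASCII" is already Unicode, so no such runtime guess is involved.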

> TFileStream *fs = new TFileStream("msg.txt", fmCreate|fmShareExclusive);
> IdMsg->SaveToStream(fs);
> delete fs;

You do know that TIdMessage has a SaveToFile() method, right?

> as you can see "charset=UTF-8" is missing from the header

I cannot reproduce that in the latest version of Indy 10.  If a charset is
assigned, it always appears.

However, you are using a very old version of Indy, and it is likely that
version does not output the charset for non-MIME emails when the
TIdMessage::Encoding property is set to meDefault or mePlainText.  Try
setting TIdMessage::Encoding to meMIME instead, even if you are not using
the TIdMessage::MessageParts property, e.g.:

{code:cpp}
IdMsg->Clear();
IdMsg->Encoding = meMIME; // <--
//...
{code}

Alternatively, put your text into a TIdText object in TIdMessage::MessageParts
instead of in TIdMessage::Body, e.g.:

{code:cpp}
IdMsg->Clear();
IdMsg->Encoding = meMIME;

TIdText *txt = new TIdText(IdMsg->MessageParts);
txt->ContentType = "text/plain";
//txt->ContentType = "text/plain; charset=UTF-8";
txt->CharSet = "UTF-8";
//txt->ContentTransferEncoding = "quoted-printable";
txt->ContentTransferEncoding = "8bit";
txt->Body->Add( L"US ASCII" );
txt->Body->Add( L"?£¥" ); // wide literal, no Ansi->Unicode conversion at runtime
//...
{code}
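
On the two ContentTransferEncoding values in the snippet above: with "8bit" the UTF-8 bytes go out unchanged, while "quoted-printable" escapes every non-ASCII byte as =XX.  Here is a rough standalone sketch of the latter in standard C++, assuming a single short line.  quotedPrintableLine is a hypothetical helper, not Indy's encoder, and it skips the RFC 2045 soft line breaks and trailing-whitespace rules:

```cpp
#include <string>

// Hypothetical helper, not Indy's encoder: quoted-printable for one short line.
// Omits the 76-column soft line breaks and trailing-whitespace handling
// that RFC 2045 requires for real messages.
std::string quotedPrintableLine(const std::string& bytes) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        // Printable ASCII (except '=') plus space and tab pass through as-is.
        bool literal = (b == ' ' || b == '\t' ||
                        (b >= 33 && b <= 126 && b != '='));
        if (literal) {
            out.push_back(static_cast<char>(b));
        } else {
            // Everything else becomes '=' followed by two uppercase hex digits.
            out.push_back('=');
            out.push_back(hex[b >> 4]);
            out.push_back(hex[b & 0x0F]);
        }
    }
    return out;
}
```

With this sketch, the UTF-8 bytes for £ (0xC2 0xA3) come out as =C2=A3, and plain ASCII text such as US ASCII passes through untouched.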

> and non-ASCII chars are screwed.

They are screwed before TIdMessage ever sees them, due to the Ansi->Unicode
conversion you are invoking before Body->Add() receives the data.

--
Remy Lebeau (Indy Team)

In response to

TIdMessage.Body UTF-8 encoded posted by Boba on Fri, 14 Aug 2015