Embarcadero C++ Builder – Sending UnicodeString via COM truncates string to half

C++ Builder 2009 (CB2009) defines BSTR as a wchar_t*, which is also the type returned by c_str() when UNICODE is set. So if you call a COM function that expects a BSTR and pass it UnicodeString.c_str(), the compiler sees exactly the wchar_t* the function expects. Sounds good! Except it doesn’t work!

Both BSTR and the character array inside UnicodeString store a four-byte integer length immediately before the array of wide chars. However, BSTR expects this to be the length IN BYTES, whereas UnicodeString stores the length IN CHARACTERS. The server receives the wchar_t*, reads the int just before the pointer, and uses only that many bytes. So if your UnicodeString is seven chars long (and thus 14 bytes long), CB2009 sets the length to seven, and COM accepts only the first seven bytes, which is three and a half characters. So all your strings are cut in half!
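The mismatch can be modelled in portable C++ without any COM or VCL types. This is only a sketch under stated assumptions: PrefixedBuffer, makeBuffer and readAsByteCount are hypothetical names invented for illustration, char16_t stands in for the 2-byte wchar_t used on Windows, and the real UnicodeString header holds more fields than the single length integer modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical model of a length-prefixed wide string: a 4-byte integer,
// then 2-byte characters, then a terminating zero. char16_t stands in for
// the 2-byte wchar_t used on Windows.
struct PrefixedBuffer {
    std::vector<unsigned char> bytes;

    // Pointer to the first character, i.e. what c_str() would hand out.
    const unsigned char* chars() const {
        return bytes.data() + sizeof(std::uint32_t);
    }
};

// Build a buffer for `text` with a caller-chosen value in the length prefix.
PrefixedBuffer makeBuffer(const std::u16string& text, std::uint32_t prefix) {
    const std::size_t payload = (text.size() + 1) * sizeof(char16_t);
    PrefixedBuffer buf;
    buf.bytes.resize(sizeof(prefix) + payload);
    std::memcpy(buf.bytes.data(), &prefix, sizeof(prefix));
    std::memcpy(buf.bytes.data() + sizeof(prefix), text.c_str(), payload);
    return buf;
}

// A reader that, like a BSTR consumer, treats the prefix as a byte count.
std::u16string readAsByteCount(const PrefixedBuffer& buf) {
    std::uint32_t byteLen = 0;
    std::memcpy(&byteLen, buf.chars() - sizeof(byteLen), sizeof(byteLen));
    const std::size_t charCount = byteLen / sizeof(char16_t);
    // Guard against a prefix that claims more data than the buffer holds.
    assert(sizeof(byteLen) + charCount * sizeof(char16_t) <= buf.bytes.size());
    std::u16string out(charCount, u'\0');
    std::memcpy(&out[0], buf.chars(), charCount * sizeof(char16_t));
    return out;
}
```

Calling readAsByteCount(makeBuffer(u"My Data", 7)) models the UnicodeString case, where the prefix holds the character count, and yields only "My ". With the prefix set to 14, the byte count a BSTR would carry, the same reader recovers the whole string.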

To make things worse, UnicodeString.Length() does not count characters up to the terminating null, but simply returns the stored integer. So, if you change the integer to 14 as COM expects, UnicodeString.Length() now returns 14 for your seven-character string! We dare not mess with UnicodeString’s data.

The solution is to use WideString instead. Instead of:

UnicodeString myString = L"My Data";
ptr->ComFunctionExpectingBSTR(myString.c_str()); // this compiles, but COM only uses the first half of the string

use

UnicodeString myString = L"My Data";
ptr->ComFunctionExpectingBSTR(WideString(myString).c_bstr()); // this has the correct length

References:
http://msdn.microsoft.com/en-us/library/ms221069.aspx – Microsoft’s reference for the BSTR type in MSDN/Win32 and COM Development/Component Development/COM/Automation Programming Reference/Data Types, Structures and Enumerations/IDispatch Data Types and Structures

http://docwiki.embarcadero.com/RADStudio/en/Unicode_in_RAD_Studio#New_String_Type:_UnicodeString – Unicode in RAD Studio

NOTE: In my testing, this is a problem when sending data to Microsoft Outlook 2007. It did not appear to cause a problem when sending to Crystal Reports, so the issue might well be with the server code rather than the COM subsystem, depending on how the server determines the length of the passed string. I have seen example code that just passes a WideString to COM, but for me the compiler gives a type mismatch error unless I pass the pointer returned by c_bstr().
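One way to picture why two servers could behave differently is to compare two length strategies side by side. This is a hedged sketch: charsFromBytePrefix and charsFromTerminator are hypothetical helpers, char16_t stands in for the 2-byte Windows wchar_t, and I do not know which strategy Outlook or Crystal Reports actually uses.

```cpp
#include <cstddef>
#include <cstdint>

// Strategy A (assumed): trust the 4-byte prefix and treat it as a byte count,
// so the number of characters is the prefix divided by the character size.
std::size_t charsFromBytePrefix(std::uint32_t prefix) {
    return prefix / sizeof(char16_t);
}

// Strategy B (assumed): ignore the prefix and scan for the terminating zero,
// the way wcslen does.
std::size_t charsFromTerminator(const char16_t* s) {
    std::size_t n = 0;
    while (s[n] != u'\0') {
        ++n;
    }
    return n;
}
```

For the seven-character "My Data" with a prefix of 7, strategy A reports 3 characters while strategy B reports 7. A server that scans for the terminator would never notice the wrong prefix.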

To see how the length of the string is set:

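// A plain literal has no length prefix. Reading the int before it is
// undefined behavior and just shows whatever happens to be in memory.
const wchar_t* lit = L"My String";
int* litIPtr = (int*) lit;
litIPtr--;
int litLen = *litIPtr;

// WideString wraps a BSTR: the int before the data is the length in bytes.
WideString ws = lit;
wchar_t* wsPtr = ws.c_bstr();
int* wsIPtr = (int*) wsPtr;
wsIPtr--;
int wsLen = *wsIPtr;

// UnicodeString: the int before the data is the length in characters.
UnicodeString us = lit;
wchar_t* usPtr = us.c_str();
int* usIPtr = (int*) usPtr;
usIPtr--;
int usLen = *usIPtr;

// int actualLen = StrLen(lit); // if UNICODE is set
int actualLen = wcslen(lit);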

ShowMessage("For a " + String(actualLen) + " character string, Literal gives " +
String(litLen) + ", WideString gives " +
String(wsLen) + ", UnicodeString gives " + String(usLen));
