
Cross platform between Mac OS X/Win/Linux and char/wchar_t

Started by August 09, 2010 04:17 AM
15 comments, last by SiCrane 14 years, 2 months ago
Hi, I am pretty confused about how Mac OS X and Linux handle their strings. I have searched on Google, and some have said that on Mac and Linux, chars are handled natively as Unicode. Does that mean I don't have to use wchar_t/std::wstring on Mac/Linux??

I have been programming on Windows and have always used std::wstring. What about on Mac/Linux? Is there any way to easily achieve platform independence? For example, on Windows LoadLibrary takes wchar_t strings, but on Mac/Linux dlopen takes chars.

I was thinking of using something like

#include <string>
#ifdef WIN32
#include <windows.h>  // LoadLibrary
typedef std::wstring tstring;
#define TSTR(text) L##text
#else
#include <dlfcn.h>    // dlopen
typedef std::string tstring;
#define TSTR(text) text
#endif

int main()
{
    tstring libraryPath(TSTR("path\myfilename"));
#ifdef WIN32
    LoadLibrary(libraryPath.c_str());
#else
    dlopen(libraryPath.c_str(), RTLD_LAZY);
#endif
    return 0;
}


Is this a recommended method??

thanks.

P.S. Sometimes I wonder why all OSes can't have a standard way of doing things; it would make developers' lives easier when going cross-platform =D
Quote: Original post by littlekid
Hi, I am pretty confused about how Mac OS X and Linux handle their strings. I have searched on Google, and some have said that on Mac and Linux, chars are handled natively as Unicode. Does that mean I don't have to use wchar_t/std::wstring on Mac/Linux??

std::string and char* strings are always plain C/C++ strings, i.e. sequences of bytes with no particular encoding attached. Support for Unicode requires extra care.


Quote: I have been programming on Windows and have always used std::wstring. What about on Mac/Linux?

Personally, I've only ever had a use for std::string, which works fine across OSes.

Quote: Is there any way to easily achieve platform independence?

Of course: stick with what Standard C++ gives you. Full independence, though, is seldom sane (or do you want to support fridges and remote controls?), and if you really need Unicode support or whatever, you might want to use an external library that supports what you need.

The next revision of the C++ standard (C++0x) will also feature a few more string variants, btw, such as std::u16string and std::u32string.
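A minimal sketch of those variants as they ended up in C++11 (the revision then called C++0x), assuming a C++11-or-later compiler. It only shows how many code units the same text takes in each type; the constant names are made up for illustration.

```cpp
#include <cassert>
#include <string>

// char16_t strings hold UTF-16 code units and char32_t strings hold UTF-32 code units
// on every platform, unlike wchar_t whose size and encoding vary.
const std::u16string kTurtle16 = u"\u017C\u00F3\u0142w";  // Polish "żółw": 4 UTF-16 units
const std::u32string kTurtle32 = U"\u017C\u00F3\u0142w";  // same text: 4 UTF-32 units
const std::u16string kSmile16  = u"\U0001F600";           // U+1F600 needs a surrogate pair: 2 units
const std::u32string kSmile32  = U"\U0001F600";           // ...but only 1 UTF-32 unit
```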


Quote: I was thinking of using something like

#include <string>
#ifdef WIN32
#include <windows.h>  // LoadLibrary
typedef std::wstring tstring;
#define TSTR(text) L##text
#else
#include <dlfcn.h>    // dlopen
typedef std::string tstring;
#define TSTR(text) text
#endif

int main()
{
    tstring libraryPath(TSTR("path\myfilename"));
#ifdef WIN32
    LoadLibrary(libraryPath.c_str());
#else
    dlopen(libraryPath.c_str(), RTLD_LAZY);
#endif
    return 0;
}


Is this a recommended method??

If this method is portable enough for you, then I'd say use it. But in practice it will more likely look like this:
int main()
{
#ifdef WIN32
    tstring libraryPath(TSTR("foo\\myfilename"));
    LoadLibrary(libraryPath.c_str());
#else
    tstring libraryPath(TSTR("frob/myfilename"));
    dlopen(libraryPath.c_str(), RTLD_LAZY);
#endif
    return 0;
}



Also, be aware that you have a severe mistake in your code:

tstring libraryPath(TSTR("path\myfilename"));


Use slashes, not backslashes. In portable C/C++ code, file names use the forward slash as the path separator. Backslashes might work, but they are OS-specific (Windows in this case) and error-prone:

path\myfilename

evaluates to

[path] [\m] [yfilename]

In this case you are lucky, but you would be less lucky if it happened to form a valid escape sequence like \n or \0. Then it becomes a runtime error, and possibly one that is hard to detect. The better version would be

path\\myfilename

and the canonical C++ version would be

path/myfilename
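To make the difference concrete, here is a small sketch (the file names are made up) of what the compiler actually stores for each spelling. std::strlen stops at the first '\0', which is why an accidental \0 silently truncates the path.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical paths, spelled four ways.
const char kNewline[] = "path\newfile";   // "\n" is a newline: 11 chars, no backslash at all
const char kNul[]     = "path\0file";     // "\0" ends the C string: strlen sees only "path"
const char kDoubled[] = "path\\newfile";  // "\\" is one real backslash: 12 chars
const char kSlash[]   = "path/newfile";   // forward slash: no escaping needed, 12 chars
```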


Quote: P.S. Sometimes I wonder why all OSes can't have a standard way of doing things; it would make developers' lives easier when going cross-platform =D

Oh, there is: POSIX. You'll see that a plethora of operating systems follow that standard to a large degree.

I can recommend Qt+MinGW for application development; if planned right, it will let you reuse all or most of your code on Windows (my pet project picogen [see sig.] compiles and runs on Windows and Linux thanks to it).
Thanks for the reply. Sorry about the slashes, I typed the code straight into the browser from memory. I normally just use forward slashes "/" :)

Hmm, but the problem I am facing is that Microsoft seems to prefer the wide variant. I know I can switch to multibyte in the project settings. But is there any reason why almost all Microsoft projects default to the Unicode wide-char variant?? Also, sizeof(wchar_t) returns different values on Mac OS X and Windows; why is that?? Mac OS X returns 4 while Windows returns 2.
Quote: Original post by littlekid
Hi, I am pretty confused about how Mac OS X and Linux handle their strings. I have searched on Google, and some have said that on Mac and Linux, chars are handled natively as Unicode. Does that mean I don't have to use wchar_t/std::wstring on Mac/Linux??


wchar_t/char is the same on all the common platforms, at least that's how it's supposed to be.

Quote: I have been programming on Windows and have always used std::wstring. What about on Mac/Linux? Is there any way to easily achieve platform independence? For example, on Windows LoadLibrary takes wchar_t strings, but on Mac/Linux dlopen takes chars.


I'm not sure, but I suspect that dlopen() accepts Unicode strings encoded as UTF-8.

Quote: I was thinking of using something like

*** Source Snippet Removed ***

Is this a recommended method??


It's going to be a mess with so many #if/#endifs all over the place :)

Personally, I use std::wstring everywhere I want strings that support Unicode, coupled with some utilities to convert to and from UTF-8 and such.
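The post doesn't show those utilities, so the following is only a hand-written sketch of one of them (the name Utf8ToUtf16 is hypothetical). It decodes UTF-8 and re-encodes it as UTF-16, the form Windows' wide "W" functions expect; on Windows, where wchar_t is 16 bits, the result can be copied into a std::wstring.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: UTF-8 bytes -> UTF-16 code units.
// Sketch only: rejects bad lead/continuation bytes, truncated sequences and values
// above U+10FFFF, but does not reject overlong forms or encoded surrogates.
std::u16string Utf8ToUtf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes after the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xC0) { throw std::runtime_error("unexpected continuation byte"); }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else if (lead < 0xF8) { cp = lead & 0x07; extra = 3; }
        else                  { throw std::runtime_error("invalid lead byte"); }

        if (i + extra >= utf8.size()) throw std::runtime_error("truncated sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80) throw std::runtime_error("bad continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append 6 payload bits
        }
        if (cp > 0x10FFFF) throw std::runtime_error("code point out of range");
        i += extra + 1;

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);  // BMP: one UTF-16 unit
        } else {
            cp -= 0x10000;                     // supplementary plane: surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```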

For platform independence I prefer to create a "system" or "platform" interface where I can tuck away all the platform-dependent stuff, like dynamic library loading, and then not worry about it anymore.

For example you can create a function called MyAwesomeLoadLibrary(const std::wstring &LibraryPath) with different implementations for Windows/Mac/Linux/etc. Alternatively you can use a middle layer like SDL to take care of stuff like that.
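A minimal sketch of such a function, assuming the rest of the program keeps paths as UTF-8 in a plain std::string (a variation on the post above, which passes a std::wstring). On Windows it converts with MultiByteToWideChar and calls LoadLibraryW directly, so it behaves the same whether or not UNICODE is defined; elsewhere it hands the bytes to dlopen. On older glibc versions you may need to link with -ldl.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <dlfcn.h>
#endif

// Platform layer: the only place that knows how libraries are loaded.
// Takes a UTF-8 path; returns an opaque handle, or nullptr on failure.
void* MyAwesomeLoadLibrary(const std::string& utf8Path) {
#ifdef _WIN32
    // Ask for the UTF-16 length (including the terminating L'\0'), then convert.
    const int len = MultiByteToWideChar(CP_UTF8, 0, utf8Path.c_str(), -1, nullptr, 0);
    if (len <= 0) return nullptr;
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8Path.c_str(), -1, &wide[0], len);
    wide.resize(static_cast<std::size_t>(len) - 1);  // drop the terminator the API wrote
    // Call the W variant explicitly instead of relying on the UNICODE macro.
    return reinterpret_cast<void*>(LoadLibraryW(wide.c_str()));
#else
    // POSIX: the path is just bytes to the OS, so UTF-8 passes through unchanged.
    return dlopen(utf8Path.c_str(), RTLD_LAZY);
#endif
}
```

A matching unload function would wrap FreeLibrary and dlclose in the same way, so callers never see an #ifdef.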
Quote:
I'm not sure, but I suspect that dlopen() accepts Unicode strings encoded as UTF-8.


You mean even when the file name contains foreign characters, like Russian or Polish ones?
Quote: Original post by littlekid
Hmm, but the problem I am facing is that Microsoft seems to prefer the wide variant. I know I can switch to multibyte in the project settings. But is there any reason why almost all Microsoft projects default to the Unicode wide-char variant??


Because it allows strings (especially file names) containing characters that the current 8-bit code page can't represent.

Almost all Win32 library functions are supplied by their respective DLLs in two versions: one that accepts wide characters (suffixed with a "W") and one that accepts normal old-school 8-bit characters (suffixed with an "A"). If UNICODE is #define'd when you #include windows.h, your calls are redirected to the wide-character functions; otherwise you call the "A" ones.

For example:

#ifdef UNICODE
#define LoadLibrary  LoadLibraryW
#else
#define LoadLibrary  LoadLibraryA
#endif // !UNICODE


If you really don't care about wide-character support (e.g. for people who don't speak English :P), then just make sure that UNICODE ain't getting defined when you compile your projects (it's available as a project setting in Visual Studio).
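The same switching trick can be sketched in portable code. NameLengthA, NameLengthW, MY_TCHAR and MY_TEXT below are hypothetical stand-ins for real Win32 names (windows.h does the equivalent with TCHAR and TEXT()). Note that the string literals have to switch together with the function, which is what the TSTR macro in the original post does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical "A" and "W" variants of one function, named the way Win32 names them.
std::size_t NameLengthA(const char* name)    { return std::strlen(name); }
std::size_t NameLengthW(const wchar_t* name) { return std::wcslen(name); }

// The generic name is only a macro; the character type and literal prefix follow it.
#ifdef UNICODE
typedef wchar_t MY_TCHAR;
#define MY_TEXT(s) L##s
#define NameLength NameLengthW
#else
typedef char MY_TCHAR;
#define MY_TEXT(s) s
#define NameLength NameLengthA
#endif
```

Usage: NameLength(MY_TEXT("kernel32")) compiles to NameLengthW(L"kernel32") when UNICODE is defined and to NameLengthA("kernel32") otherwise.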

Quote:
Also, sizeof(wchar_t) returns different values on Mac OS X and Windows; why is that?? Mac OS X returns 4 while Windows returns 2.


wchar_t is implementation-specific; never rely on it having a specific size (unlike char, which is always 1 byte). The problem is that there are many ways of dealing with Unicode (UTF-8, UTF-16, UTF-32, ...).
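A quick way to see the difference for yourself (a sketch; the function name is made up): a character outside the Basic Multilingual Plane, such as U+1F600, fits in one wchar_t where wchar_t is 32 bits (typical for Linux and Mac OS X), but needs a UTF-16 surrogate pair where it is 16 bits (Windows).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// How many wchar_t units the compiler uses for the single code point U+1F600.
std::size_t WideUnitsForOneCodePoint() {
    const std::wstring s = L"\U0001F600";
    return s.size();  // 1 with a 32-bit wchar_t (UTF-32), 2 with a 16-bit one (UTF-16)
}
```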

Quote: Original post by littlekid
Quote:
I'm not sure, but I suspect that dlopen() accepts Unicode strings encoded as UTF-8.


You mean even when the file name contains foreign characters, like Russian or Polish ones?


Yes, if it's UTF-8.

Actually, I'm pretty sure that Unix allows file names to contain anything except '/' and '\0'. That is, the underlying system doesn't care about UTF-8 or any other encoding; it just sees the name as a bunch of arbitrary bytes.

Obviously this means that you can't pass raw UTF-16 or UTF-32 as file names, since those encodings tend to contain a lot of '\0' bytes.
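Here is a sketch of why UTF-8 is safe for such byte-oriented APIs while raw UTF-16 is not (all helper names are hypothetical): a UTF-8 encoder only ever emits a 0x00 byte for U+0000 itself, whereas every ASCII character stored as UTF-16 has a zero high byte.

```cpp
#include <cassert>
#include <string>

// Encode one code point as UTF-8 (assumes cp is a valid Unicode scalar value).
std::string EncodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode a whole UTF-32 string, e.g. a file name you are about to pass to dlopen().
std::string ToUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) out += EncodeUtf8(cp);
    return out;
}

// The raw bytes of a UTF-16 string as they sit in memory (host byte order).
std::string RawBytes(const std::u16string& text) {
    return std::string(reinterpret_cast<const char*>(text.data()),
                       text.size() * sizeof(char16_t));
}

// A Unix path cannot contain a 0x00 byte: the OS would treat it as the end of the name.
bool HasZeroByte(const std::string& bytes) {
    return bytes.find('\0') != std::string::npos;
}
```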
Quote: Original post by rneckelmann
wchar_t is implementation-specific; never rely on it having a specific size (unlike char, which is always 1 byte).

char is not always one byte. The only guarantee C++ gives is that (sizeof(char) == 1) and that (sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)). The "units" used by sizeof are not specified. Probably most current implementations will make char 1 byte, though.

Quote: Original post by Koen
Quote: Original post by rneckelmann
wchar_t is implementation-specific; never rely on it having a specific size (unlike char, which is always 1 byte).

char is not always one byte. The only guarantee C++ gives is that (sizeof(char) == 1) and that (sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)). The "units" used by sizeof are not specified. Probably most current implementations will make char 1 byte, though.


If we're going to be that nitpicky, then we have to call it an octet instead of a byte (a byte doesn't have to be exactly 8 bits). :)

http://drj11.wordpress.com/2007/04/08/sizeofchar-is-1/
http://en.wikipedia.org/wiki/Byte
http://en.wikipedia.org/wiki/Octet_%28computing%29
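For reference, here is what the language itself pins down: sizeof counts in bytes and sizeof(char) is 1 by definition, while the number of bits in a byte is CHAR_BIT from <climits>, which is at least 8. A small sketch (the helper name StorageBits is hypothetical):

```cpp
#include <cassert>
#include <climits>

// Guaranteed by the C and C++ standards: sizeof(char) is 1, and a byte has CHAR_BIT >= 8 bits.
static_assert(sizeof(char) == 1, "sizeof counts in units of char (bytes)");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Storage size of a type in bits (includes any padding bits).
template <typename T>
constexpr int StorageBits() {
    return static_cast<int>(sizeof(T)) * CHAR_BIT;
}
```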
After spending quite a while on the internet, I think it is better for cross-platform purposes to use char instead of wchar_t, mainly because the size of wchar_t varies by implementation: GCC, MinGW and VS don't all agree, and it's usually either 4 or 2 bytes. On Windows wchar_t is 2 bytes because Windows treats wchar_t strings as UTF-16 by default, whereas Unix treats wchar_t as UTF-32. (Correct me if I am wrong.)

Does the Windows 7 OS use UTF-8 internally?? Because if Windows 7 did use UTF-8 in its underlying system, why would there be a need to provide LoadLibraryW and other wide-char variants, since plain char strings would be capable of representing the other glyphs?? Unless maybe Windows doesn't use UTF-8 at all.

