2007/08/01

Using std::cout and std::wcout together in GCC

Since GCC 3.4, the following program misbehaves:
#include <iostream>
#include <locale>

int main()
{
  std::locale::global(std::locale("zh_CN.utf8"));
  std::cout << "Hello" << std::endl; // *
  std::wcout << L"你好" << std::endl;
  std::cout << "Hello" << std::endl;
}
The program prints:
Hello
`}
Hello
If the line marked * is commented out, the output is:
你好
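So whichever stream is used first disables the other: once std::cout has written something, std::wcout stops working, and once std::wcout has written something, std::cout stops working. The reason is that, by default, both streams write through the same C stream, stdout, and C99 7.19.2 specifies: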
5 Byte input/output functions shall not be applied to a wide-oriented stream and wide character input/output functions shall not be applied to a byte-oriented stream. The remaining stream operations do not affect, and are not affected by, a stream's orientation, except for the following additional restrictions: -- Binary wide-oriented streams have the file-positioning restrictions ascribed to both text and binary streams. -- For wide-oriented streams, after a successful call to a file-positioning function that leaves the file position indicator prior to the end-of-file, a wide character output function can overwrite a partial multibyte character; any file contents beyond the byte(s) written are henceforth indeterminate.
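The orientation that this paragraph talks about can be inspected with std::fwide from <cwchar>. The following is only a minimal sketch; the helper name stream_orientation is made up for illustration and is not part of any library.

```cpp
#include <cstdio>
#include <cwchar>

// Hypothetical helper: reports the orientation of a C stream as
// -1 (byte-oriented), 0 (not yet oriented) or 1 (wide-oriented).
// Calling std::fwide with mode 0 only queries the orientation;
// it never changes it.
int stream_orientation(std::FILE* stream)
{
  int r = std::fwide(stream, 0);
  return (r > 0) - (r < 0);
}
```

Assuming the default synchronization with stdio, I would expect stream_orientation(stdout) to return 0 at the start of main and a negative value after the first std::cout output, which is the point at which wide output to the same stream is no longer allowed.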
C++03 27.3 says:
2 Mixing operations on corresponding wide- and narrow-character streams follows the same semantics as mixing such operations on FILEs, as specified in Amendment 1 of the ISO C standard. The objects are constructed, and the associations are established at some time prior to or during first time an object of class ios_base::Init is constructed, and in any case before the body of main begins execution. The objects are not destroyed during program execution.
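It is unclear whether this refers to C89 or C99. One thing is clear, though: even if the current C++ standard permits mixing, a future one may forbid it. See libstdc++/27569.
There is a workaround, however. Add one statement at the start of the program:
#include <iostream>
#include <locale>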

int main()
{
  std::locale::global(std::locale("zh_CN.utf8"));
  std::ios::sync_with_stdio(false);
  std::cout << "Hello" << std::endl;
  std::wcout << L"你好" << std::endl;
  std::cout << "Hello" << std::endl;
}
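However, the fact that this works is itself a bug. See libstdc++/11705.
Another approach is to use codecvt to write your own conversion functions such as to_wstring, like this: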
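#include <cstring>
#include <cwchar>
#include <locale>
#include <string>
#include <vector>

using namespace std;

// Convert a wide string to a multibyte string with the codecvt facet of loc.
string to_string(const wstring& wstr, const locale& loc = locale())
{
  typedef codecvt<wchar_t, char, mbstate_t> converter;
  if (wstr.empty())
    return string(); // &buf[0] below needs a non-empty buffer
  const converter& cvt(use_facet<converter>(loc));
  mbstate_t stat;
  memset(&stat, 0, sizeof(stat));
  // max_length() is the most bytes a single wide character can produce
  vector<char> buf(cvt.max_length() * wstr.size());
  const wchar_t* wfrom = wstr.c_str();
  const wchar_t* wend = wfrom + wstr.size();
  const wchar_t* wnext = wfrom;
  char* from = &buf[0];
  char* end = from + buf.size();
  char* next = from;
  cvt.out(stat, wfrom, wend, wnext, from, end, next);
  return string(from, next);
}

// Convert a multibyte string to a wide string with the codecvt facet of loc.
wstring to_wstring(const string& str, const locale& loc = locale())
{
  typedef codecvt<wchar_t, char, mbstate_t> converter;
  if (str.empty())
    return wstring(); // &buf[0] below needs a non-empty buffer
  const converter& cvt(use_facet<converter>(loc));
  mbstate_t stat;
  memset(&stat, 0, sizeof(stat));
  // every wide character consumes at least one byte, so str.size() is enough
  vector<wchar_t> buf(str.size());
  const char* from = str.c_str();
  const char* end = from + str.size();
  const char* next = from;
  wchar_t* wfrom = &buf[0];
  wchar_t* wend = wfrom + buf.size();
  wchar_t* wnext = wfrom;
  cvt.in(stat, from, end, next, wfrom, wend, wnext);
  return wstring(wfrom, wnext);
}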
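The listing above ignores the value returned by codecvt::in and codecvt::out, so an invalid or truncated byte sequence silently yields a shortened string. The following is a rough sketch of a variant that checks the result; the name to_wstring_checked and the choice to throw std::runtime_error are my own assumptions, not something the approach requires.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical variant of to_wstring: reports a failed or incomplete
// conversion by throwing instead of returning a truncated string.
std::wstring to_wstring_checked(const std::string& str,
                                const std::locale& loc = std::locale())
{
  typedef std::codecvt<wchar_t, char, std::mbstate_t> converter;
  if (str.empty())
    return std::wstring();
  const converter& cvt = std::use_facet<converter>(loc);
  std::mbstate_t state = std::mbstate_t(); // initial conversion state
  // every wide character consumes at least one byte of input
  std::vector<wchar_t> buf(str.size());
  const char* from = str.data();
  const char* next = from;
  wchar_t* wnext = &buf[0];
  std::codecvt_base::result r =
      cvt.in(state, from, from + str.size(), next,
             &buf[0], &buf[0] + buf.size(), wnext);
  if (r == std::codecvt_base::error || r == std::codecvt_base::partial)
    throw std::runtime_error("to_wstring_checked: conversion failed");
  return std::wstring(&buf[0], wnext);
}
```

Whether throwing is appropriate depends on the program; returning the converted prefix together with the position held in next would be an equally reasonable design.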
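Sometimes you really do need to mix std::cout and std::wcout. For example, in an internationalized program I use wstring throughout, but printing the name of the locale becomes a problem: std::locale::name() returns a string, which I can only print with std::cout. In that case the two approaches above are the only options.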
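For the specific case of the locale name, which normally consists only of ASCII characters, a lighter alternative to codecvt is the widen member of the ctype<wchar_t> facet. This is a different technique from the two described above, and the helper name widen_ascii is only illustrative.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: widens each char of s individually with the
// ctype<wchar_t> facet of loc. This is only suitable for single-byte
// text such as ASCII; it does not decode multibyte sequences like UTF-8.
std::wstring widen_ascii(const std::string& s,
                         const std::locale& loc = std::locale())
{
  std::wstring out(s.size(), L'\0');
  if (!s.empty())
    std::use_facet<std::ctype<wchar_t> >(loc)
        .widen(s.data(), s.data() + s.size(), &out[0]);
  return out;
}
```

With this, something like std::wcout << widen_ascii(std::locale().name()) << std::endl; keeps all output on the wide stream, so std::cout is never touched.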
