I am developing an application that has to be compatible with different character encodings. To do that, I always use TCHAR* instead of char* to define strings, and therefore I use _tcslen to get the length of my strings.
Today, I saw in my company's version control system that one of my coworkers edited the line where I wrote _tcslen to use _tcsclen instead.
The only link I found that discusses this function is this one, and it doesn't explain the difference between the two functions.
Can someone explain the difference between _tcslen and _tcsclen?
The _t prefix means that these are text-handling functions (actually macros) that map to different implementations, depending on whether you're compiling for "Unicode" (actually UTF-16) or not.
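To make the macro idea concrete, here is a simplified sketch of how such a mapping can be written. The names MY_T, MY_TCSLEN and MY_TCSCLEN are made up for illustration; this is not the real tchar.h source, and the _MBCS branch assumes the Microsoft CRT (it needs mbstring.h and _mbslen).

```c
#include <string.h>   /* strlen */
#include <wchar.h>    /* wcslen */

#if defined(_UNICODE)
  /* Wide build: both names resolve to the same wide-character function. */
  #define MY_T(s)        L##s
  #define MY_TCSLEN(s)   wcslen(s)
  #define MY_TCSCLEN(s)  wcslen(s)
#elif defined(_MBCS)
  /* Multi-byte build (Microsoft CRT only): the two names diverge. */
  #include <mbstring.h>  /* _mbslen */
  #define MY_T(s)        s
  #define MY_TCSLEN(s)   strlen(s)
  #define MY_TCSCLEN(s)  _mbslen((const unsigned char *)(s))
#else
  /* Single-byte build: bytes and characters coincide, so both use strlen. */
  #define MY_T(s)        s
  #define MY_TCSLEN(s)   strlen(s)
  #define MY_TCSCLEN(s)  strlen(s)
#endif
```

For a plain ASCII string such as MY_T("hello"), both macros give 5 in every configuration; the results only diverge in the _MBCS branch when the string contains two-byte characters.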
When you're compiling for Unicode (_UNICODE is defined), they map to the same function, wcslen, which returns the length of the string in wide (two-byte) characters, that is, UTF-16 code units.
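When you're not compiling for Unicode, the mapping depends on whether _MBCS is defined. Without it (a plain single-byte build), both still map to strlen. With _MBCS defined, they map to different functions: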
_tcslen maps to strlen, which returns the length of the string in bytes. This is intended so that you can allocate buffers of the correct size.
_tcsclen maps to _mbslen, the documentation for which is fairly sparse. I'm guessing, however, that the c in _tcsclen is intended to mean characters.
The difference between characters and bytes is that, in a multi-byte encoding, a single character can take more than one byte (one or two in the double-byte code pages that the Microsoft CRT supports). Thus: _tcsclen (_mbslen) tells you how many characters are in the string, which is useful for rendering, and _tcslen (strlen) tells you how many bytes are in the string, which you need for memory allocation.
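To illustrate the byte/character distinction without depending on the Windows headers, here is a small portable sketch. The helper count_sjis_chars is hypothetical: it approximates what a character count looks like for Shift-JIS, is not the actual _mbslen implementation, and does not validate trail bytes.

```c
#include <stddef.h>
#include <string.h>

/* Returns nonzero if b is a Shift-JIS lead byte
   (the first byte of a two-byte character). */
int is_sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: counts characters, not bytes, in a
   NUL-terminated Shift-JIS string. */
size_t count_sjis_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != '\0') {
        /* A lead byte followed by another byte forms one character. */
        if (is_sjis_lead_byte(*p) && p[1] != '\0')
            p += 2;
        else
            p += 1;
        count++;
    }
    return count;
}

/* Two kanji followed by 'A', encoded in Shift-JIS: 93 FA | 96 7B | 41.
   The literal is split so that 'A' is not read as part of the hex escape. */
const char sjis_sample[] = "\x93\xFA\x96\x7B" "A";
```

For sjis_sample (the text 日本A), strlen reports 5 bytes while count_sjis_chars reports 3 characters. You would size a buffer from the first number and lay out text from the second.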
In general, if you're working primarily on Windows, you'll just compile for Unicode and be done with it. You only need to deal with other character encodings if you're talking to another system (reading/writing files, network messages, etc.), and you'll usually convert to and from UTF-8.
Note that when the Windows SDK documentation refers to "multi-byte", it means older multi-byte encodings, such as Shift-JIS, rather than UTF-8 (which is also a multi-byte encoding).
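The same byte/character gap exists in UTF-8, even though the SDK's "multi-byte" functions aren't aimed at it. As a rough sketch, assuming well-formed input, code points can be counted by skipping continuation bytes; count_utf8_code_points is a hypothetical helper, not a CRT function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts code points in a well-formed,
   NUL-terminated UTF-8 string by skipping continuation bytes
   (those of the form 10xxxxxx). */
size_t count_utf8_code_points(const char *s)
{
    const unsigned char *p;
    size_t count = 0;

    for (p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* U+65E5, U+672C, then 'A', encoded in UTF-8: E6 97 A5 | E6 9C AC | 41.
   The literal is split so that 'A' is not read as part of the hex escape. */
const char utf8_sample[] = "\xE6\x97\xA5\xE6\x9C\xAC" "A";
```

Here strlen(utf8_sample) is 7, while count_utf8_code_points(utf8_sample) is 3, because each of the two kanji takes three bytes in UTF-8.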