对码当歌,猿生几何?

std::mbrlen

From cppreference.com
< cpp‎ | string‎ | multibyte
Defined in header <cwchar>
std::size_t mbrlen( const char* s, std::size_t n, std::mbstate_t* ps);

Determines the size, in bytes, of the remainder of the multibyte character whose first byte is pointed to by s, given the current conversion state ps.

This function is equivalent to the call std::mbrtowc(nullptr, s, n, ps?ps:&internal) for some hidden object internal of type std::mbstate_t, except that the expression ps is evaluated only once.

Parameters

s - pointer to an element of a multibyte character string
n - limit on the number of bytes in s that can be examined
ps - pointer to the variable holding the conversion state

Return value

0 if the next n or fewer bytes complete the null character.

The number of bytes (between 1 and n) that complete a valid multibyte character

(size_t)-1 if encoding error occurs

(size_t)-2 if the next n bytes are part of a possibly valid multibyte character, which is still incomplete after examining all n bytes

Example

#include <clocale>
#include <string>
#include <iostream>
#include <cwchar>
 
int main()
{
    // allow mbrlen() to work with UTF-8 multibyte encoding
    std::setlocale(LC_ALL, "en_US.utf8");
    // UTF-8 narrow multibyte encoding
    std::string str = u8"水"; // or u8"\u6c34" or "\xe6\xb0\xb4"
    std::mbstate_t mb = std::mbstate_t();
    int len1 = std::mbrlen(&str[0], 1, &mb);
    if(len1 == -2) {
        std::cout << "The first 1 byte of " << str
                  << " is an incomplete multibyte char (mbrlen returns -2)\n";
    }
    int len2 = std::mbrlen(&str[1], str.size()-1, &mb);
    std::cout << "The remaining " << str.size()-1 << " bytes of " << str
              << " hold " << len2 << " bytes of the multibyte character\n";
    std::cout << "Attempting to call mbrlen() in the middle of " << str
              << " while in initial shift state returns "
              << (int)mbrlen(&str[1], str.size(), &mb) << '\n';
 
}

Output:

The first 1 byte of 水 is an incomplete multibyte char (mbrlen returns -2)
The remaining 2 bytes of 水 hold 2 bytes of the multibyte character
Attempting to call mbrlen() in the middle of 水 while in initial shift state returns -1

See also

converts the next multibyte character to wide character, given state
(function)
returns the number of bytes in the next multibyte character
(function)
[virtual]
calculates the length of the externT string that would be consumed by conversion into given internT buffer
(virtual protected member function of std::codecvt)
C documentation for mbrlen