I was trying to get an sf::String into std::filesystem::u8path. My first method was converting it into a std::string with (std::string)sfstringbar, but it treats the text as single-byte characters. I also tried auto x = sfstringbar.toUtf8(); std::string(x.begin(), x.end()), but got the same result. My second method was to pass it as a char array, hoping it would be read with the UTF-8 encoding, but the same thing still happens.
EDIT:
char* makeutf8str(str string) {
std::basic_string<sf::Uint8> utf8 = string.toUtf8();
std::vector<char>* out = new std::vector<char>;
for (auto x = utf8.begin(); x != utf8.end(); x++) {
out->push_back(*x);
}
return &(out->at(0));
}
bool neaxfile::isfile(str file) {
std::cout << "\nThis: " << makeutf8str(file) << "\n";
return std::filesystem::is_regular_file(std::filesystem::u8path(makeutf8str(file)));
}
Here's the second solution I tried. I have a file called Яyes.txt as an example, but when I pass it in to check whether it exists, it says it doesn't, because the makeutf8str() function splits Я into Ð and ¯. I can't seem to get the encoder to work properly.
EDIT 2:
str neaxfile::getcwd() {
std::error_code ec;
str path = std::filesystem::current_path(ec).u8string();
if (ec.value() == 0) {
return path;
} else {
return '\0';
}
}
std::vector<str> neaxfile::listfiles() {
std::vector<str> res;
for (auto entry : std::filesystem::directory_iterator((std::string)neaxfile::getcwd())) {
if (neaxfile::isfile(entry.path().wstring())) res.push_back(entry.path().wstring());
}
return res;
}
I tried the first solution below. It no longer prints Я, but it still doesn't confirm that this is a file. I tried to list the files using the code above.
std::filesystem::u8path():
"Constructs a path p from a UTF-8 encoded sequence of chars [or char8_ts (since C++20)], supplied either as an std::string, or as std::string_view, or as a null-terminated multibyte string, or as a [first, last) iterator pair."
A std::string can hold a UTF-8 encoded char sequence (though it is better to use std::u8string in C++20). sf::String::toUtf8() returns a UTF-8 encoded std::basic_string<Uint8>. You can simply cast the Uint8 data to char to construct a std::string; there is no need for your makeutf8str() function to use std::vector<char> or return a raw char* at all (especially since it is leaking the std::vector anyway, and the returned char* is not null-terminated, so printing it or passing it to u8path() reads past the end of the data).
You can use the std::string constructor that takes a char* and a size_t as input, e.g.:
std::string makeutf8str(const str &string) {
auto utf8 = string.toUtf8();
return std::string(reinterpret_cast<const char*>(utf8.c_str()), utf8.size());
}
Or, you can use the std::string constructor that takes a range of iterators as input (despite your claim, this should work just fine), e.g.:
std::string makeutf8str(const str &string) {
auto utf8 = string.toUtf8();
return std::string(utf8.begin(), utf8.end());
}
Either way will work fine with std::cout and std::filesystem::u8path(), e.g.:
bool neaxfile::isfile(const str &file) {
auto utf8 = makeutf8str(file);
std::cout << "\nThis: " << utf8 << "\n";
return std::filesystem::is_regular_file(std::filesystem::u8path(utf8));
}
That being said, the Unicode character Я is encoded in UTF-8 as the bytes 0xD0 0xAF, which, when interpreted as Latin-1 instead of UTF-8, will appear as Ð¯. This means the std::string data is properly UTF-8 encoded; it is just not being processed correctly. For instance, if your console cannot handle UTF-8 output, then you will see Ð¯ instead of Я. But u8path() should process the UTF-8 encoded std::string just fine and convert it to the filesystem's native encoding as needed. That said, there is no guarantee that the underlying filesystem will actually handle a Unicode filename like Яyes.txt properly, but that would be an OS issue, not a C++ issue.
UPDATE: your listfiles() function is not making use of UTF-8 at all when using directory_iterator. It is type-casting the sf::String from getcwd() to an ANSI encoded std::string (which is a lossy conversion), not to a UTF-8 encoded std::string. Worse, that sf::String is being constructed by getcwd() from a UTF-8 encoded std::string, but the std::string constructor of sf::String expects ANSI by default, not UTF-8 (to fix that, you have to give it a UTF-8 std::locale). So you are passing through several lossy conversions trying to get a string from the std::filesystem::path returned from std::filesystem::current_path() to std::filesystem::directory_iterator.
sf::String can convert to/from std::wstring, which std::filesystem::path can also use, so there is no need to go through UTF-8 and std::filesystem::u8path() at all, at least on Windows, where std::wstring uses UTF-16 and the underlying filesystem APIs also use UTF-16.
Try this instead:
bool neaxfile::isfile(const str &file) {
std::wstring wstr = file;
std::wcout << L"\nThis: " << wstr << L"\n";
return std::filesystem::is_regular_file(std::filesystem::path(wstr));
}
str neaxfile::getcwd() {
std::error_code ec;
str path = std::filesystem::current_path(ec).wstring();
if (ec.value() == 0) {
return path;
} else {
return L"";
}
}
std::vector<str> neaxfile::listfiles() {
std::vector<str> res;
std::filesystem::path cwdpath(neaxfile::getcwd().toWideString());
for (auto entry : std::filesystem::directory_iterator(cwdpath)) {
str filepath = entry.path().wstring();
if (neaxfile::isfile(filepath)) res.push_back(filepath);
}
return res;
}
If you really want to use UTF-8 for conversions between C++ strings and SFML strings, then try this instead to avoid any data loss:
std::string makeutf8str(const str &string) {
auto utf8 = string.toUtf8();
return std::string(reinterpret_cast<const char*>(utf8.c_str()), utf8.size());
}
str fromutf8str(const std::string &string) {
return str::fromUtf8(string.begin(), string.end());
}
bool neaxfile::isfile(const str &file) {
auto utf8 = makeutf8str(file);
std::cout << "\nThis: " << utf8 << "\n";
return std::filesystem::is_regular_file(std::filesystem::u8path(utf8));
}
str neaxfile::getcwd() {
std::error_code ec;
auto path = std::filesystem::current_path(ec).u8string();
if (ec.value() == 0) {
return fromutf8str(path);
} else {
return "";
}
}
std::vector<str> neaxfile::listfiles() {
std::vector<str> res;
auto cwdpath = std::filesystem::u8path(makeutf8str(neaxfile::getcwd()));
for (auto entry : std::filesystem::directory_iterator(cwdpath)) {
str filepath = fromutf8str(entry.path().u8string());
if (neaxfile::isfile(filepath)) res.push_back(filepath);
}
return res;
}
That being said, you are doing a lot of unnecessary conversions between C++ strings and SFML strings. You shouldn't be using SFML strings when you are not directly interacting with SFML's API. Use C++ strings as much as possible, especially with the <filesystem> API, e.g.:
bool neaxfile::isfile(const std::string &file) {
std::cout << "\nThis: " << file << "\n";
return std::filesystem::is_regular_file(std::filesystem::u8path(file));
}
std::string neaxfile::getcwd() {
std::error_code ec;
std::string path = std::filesystem::current_path(ec).u8string();
if (ec.value() == 0) {
return path;
} else {
return "";
}
}
std::vector<std::string> neaxfile::listfiles() {
std::vector<std::string> res;
auto cwdpath = std::filesystem::u8path(neaxfile::getcwd());
for (auto entry : std::filesystem::directory_iterator(cwdpath)) {
auto filepath = entry.path().u8string();
if (neaxfile::isfile(filepath)) res.push_back(filepath);
}
return res;
}
Alternatively:
bool neaxfile::isfile(const std::wstring &file) {
std::wcout << L"\nThis: " << file << L"\n";
return std::filesystem::is_regular_file(std::filesystem::path(file));
}
std::wstring neaxfile::getcwd() {
std::error_code ec;
auto path = std::filesystem::current_path(ec).wstring();
if (ec.value() == 0) {
return path;
} else {
return L"";
}
}
std::vector<std::wstring> neaxfile::listfiles() {
std::vector<std::wstring> res;
std::filesystem::path cwdpath(neaxfile::getcwd());
for (auto entry : std::filesystem::directory_iterator(cwdpath)) {
auto filepath = entry.path().wstring();
if (neaxfile::isfile(filepath)) res.push_back(filepath);
}
return res;
}
A better option is to simply not pass around strings at all. std::filesystem::path is an abstraction that helps shield you from that, e.g.:
bool neaxfile::isfile(const std::filesystem::path &file) {
std::wcout << L"\nThis: " << file.wstring() << L"\n";
return std::filesystem::is_regular_file(file);
}
std::filesystem::path neaxfile::getcwd() {
std::error_code ec;
auto path = std::filesystem::current_path(ec);
if (ec.value() == 0) {
return path;
} else {
return {};
}
}
std::vector<std::filesystem::path> neaxfile::listfiles() {
std::vector<std::filesystem::path> res;
for (auto entry : std::filesystem::directory_iterator(neaxfile::getcwd())) {
auto filepath = entry.path();
if (neaxfile::isfile(filepath)) res.push_back(filepath);
}
return res;
}