I'm reading from files whose format is consistent across platforms, but may be big or little endian depending on the platform the file has been built for. Said platform is defined by a value in the file.
Currently, the way I'm handling endianness is with if statements, one reading the file normally, the other using byteswap intrinsics:
// source.h
class File {
public:
enum class Endian {
Little = 1,
Big = 2
};
};
// ...removed...
// source.cpp
#include "source.h"
#include <fstream>
std::ifstream file;
File::Endian endianness;
// ...removed...
bool GetPlatform() {
uint32_t platform;
file.read(reinterpret_cast<char*>(&platform), sizeof(platform));
if (platform == 1) {
endianness = File::Endian::Little;
}
else if (platform == 2 << 24) {
endianness = File::Endian::Big;
}
// ...removed...
}
void ReadData() {
uint32_t data;
uint32_t dataLittle;
if (endianness == File::Endian::Little) {
file.read(reinterpret_cast<char*>(&data), sizeof(data));
}
else if (endianness == File::Endian::Big) {
file.read(reinterpret_cast<char*>(&data), sizeof(data));
dataLittle = _byteswap_ulong(data);
}
}
My question is, would it be possible to forgo swapping of each value when big endian and instead set the endianness universally? Below is a potential example of what I mean:
bool GetPlatform() {
uint32_t platform;
file.read(reinterpret_cast<char*>(&platform), sizeof(platform));
if (platform == 1) {
// Universally set the endianness to little endian
}
else if (platform == 2 << 24) {
// Universally set the endianness to big endian
}
// ...removed...
}
void ReadData() {
uint32_t data;
file.read(reinterpret_cast<char*>(&data), sizeof(data)); // Data is now read correctly regardless of endianness
}
My main reason for asking this is it would essentially halve the amount of code per function as it would no longer require if statements for endianness.
Additionally, could std::endian be of use for this task? Its examples only indicate use in detecting host endianness, but I'm unsure as to whether it has any further uses or not.
If I understand your situation, your basic issue is that you lack a level of abstraction. You have a bunch of functions that read various data structures from your file. Since these functions directly call std::ifstream::read
, they all need to know both the structure they are reading and the layout of the file. That is two tasks, which is one more than ideal. You would be better off splitting this logic into two levels of abstraction. Let's call the functions for the new level ReadBytes
since they focus on getting bytes from the file. Since Microsoft provides three byteswap intrinsics, there would be three of these functions. Here's a first stab at the one for 4-byte values.
void ReadBytes(std::ifstream & file, File::Endian endianness, uint32_t & data) {
file.read(reinterpret_cast<char*>(&data), sizeof(data));
if (endianness == File::Endian::Big) {
data = _byteswap_ulong(data);
}
}
Note that I've returned the data via a parameter. This is to allow all three functions to have the same name; the type of this parameter tells the compiler which overload to use. (There are other approaches. Coding styles differ.)
There are other improvements to be made, but this is enough to create the new level of abstraction. Your various functions that read data from the file would change to look like the following.
void ReadData() {
uint32_t data;
ReadBytes(file, endianness, data);
// More processing here, maybe more reads.
}
With this small sample code, the savings are not readily apparent. However, you indicated that there could be numerous functions filling the role of ReadData
. This approach shifts the burden of correcting endianness from those functions down to the new ReadBytes
functions. The number of if
statements gets reduced from "hundreds, if not thousands" to three.
This change is motivated by a programming principle often called "don't repeat yourself". The same principle can motivate questions like "why is there more than one function that needs this code?"
Another issue complicating things for you is that you seem to have taken a procedural approach to the problem, rather than an object-oriented one. Symptoms of a procedural approach can include excessive function parameters (e.g. endianness
as a parameter) and global variables. The interface would be easier to use if it was wrapped in a class. Here is a start toward declaring such a class (i.e. a start to the header file). Note that the endianness is private, and that this header has no indication of how endianness is determined. If you have good encapsulation, code outside this class will not care which platform created the file.
// Designed as a drop-in replacement for an ifstream.
// (Non-public inheritance *might* be appropriate if you want to restrict the interface.)
class IFile : public std::ifstream {
private:
File::Endian endianness;
public:
// Mimic the constructors of std::ifstream that you need.
explicit IFile(const std::string & filename);
// It should be possible to use some template magic to simplify the
// definition of these three functions, but since there are only three:
void ReadBytes(uint16_t & data) {
file.read(reinterpret_cast<char*>(&data), sizeof(data));
if (endianness == File::Endian::Big) {
data = _byteswap_ushort(data);
}
}
void ReadBytes(uint32_t & data) {
file.read(reinterpret_cast<char*>(&data), sizeof(data));
if (endianness == File::Endian::Big) {
data = _byteswap_ulong(data);
}
}
void ReadBytes(uint64_t & data) {
file.read(reinterpret_cast<char*>(&data), sizeof(data));
if (endianness == File::Endian::Big) {
data = _byteswap_uint64(data);
}
}
};
This is just a start. The interface needs more work, for one thing. Furthermore, the ReadBytes
functions could be written a bit more portably, perhaps using std::endian
instead of assuming little-endian. (Boost has an endian library that could help you make truly portable code. It even defaults to using intrinsics when they are available.)
The determination of endianness is done in the implementation (source) file. It seems like this should be done as part of opening the file. I've put that as part of the constructor for this example, but you may want more flexibility (use the ifstream
interface as a guideline). In any event, the logic for detecting the platform should not need to be accessible outside the implementation of this class. Here is a start for the implementation.
// Helper function, not needed outside this class.
// This should be either static or put into an anonymous namespace.
static File::Endian ReadEndian(std::ifstream & file) {
uint32_t platform;
file.read(reinterpret_cast<char*>(&platform), sizeof(platform));
if (platform == 1) {
return File::Endian::Little;
}
else if (platform == 2 << 24) {
return File::Endian::Big;
}
// Handle unrecognized platform here
}
IFile::IFile(const std::string & filename) : std::ifstream(filename),
endianness(ReadEndian(file))
{}
At this point, your various ReadData
functions could look like the following (without using global variables).
void ReadData(IFile & file) {
uint32_t data;
file.ReadBytes(data);
}
This is even simpler than what you were looking for since there is even less repeated code. (Casting to char*
and getting the size no longer need to be repeated everywhere.)
In summary, there are two major areas for improvement.
Both of these contribute to making it easier to safely make sweeping changes like supporting a new endianness. There is no pre-built switch to set the endianness, but it's not that hard to build one when your code is better organized.