Today I talked to a friend about the differences between statically and dynamically typed languages (more info about the difference between static and dynamic typed languages in this SO question). After that, I was wondering what kind of trick can be used in C++ to emulate such dynamic behavior.
In C++, as in other statically typed languages, the variable type is specified at compile time. For example, let's say I have to read from a file a big amount of numbers, which are in the majority of the cases quite small, small enough to fit in an unsigned short
type. Here comes the tricky thing, a small amount of these values are much bigger, bigger enough to need an unsigned long long
to be stored.
Since I assume I'm going to do calculations with all of them I want all of them stored in the same container in consecutive positions of memory in the same order than I read them from the input file.. The naive approach would be to store them in a vector
of type unsigned long long
, but this means having typically up to 4 times extra space of what is actually needed (unsigned short
2 bytes, unsigned long long
8 bytes).
In dynamically typed languages, the type of a variable is interpreted at runtime and coerced to a type where it fits. How can I achieve something similar in C++?
My first idea is to do that by pointers, depending on its size I will store the number with the appropriate type. This has the obvious drawback of having to also store the pointer, but since I assume I'm going to store them in the heap anyway, I don't think it matters.
I'm totally sure that many of you can give me way better solutions than this ...
#include <iostream>
#include <vector>
#include <limits>
#include <sstream>
#include <fstream>
int main() {
std::ifstream f ("input_file");
if (f.is_open()) {
std::vector<void*> v;
unsigned long long int num;
while(f >> num) {
if (num > std::numeric_limits<unsigned short>::max()) {
v.push_back(new unsigned long long int(num));
}
else {
v.push_back(new unsigned short(num));
}
}
for (auto i: v) {
delete i;
}
f.close();
}
}
Edit 1: The question is not about saving memory, I know in dynamically typed languages the necessary space to store the numbers in the example is going to be way more than in C++, but the question is not about that, it's about emulating a dynamically typed language with some c++ mechanism.
Options include...
The code specifies a set of distinct, supported types T0, T1, T2, T3..., and - conceptually - creates a management type to
struct X
{
enum { F0, F1, F2, F3... } type_;
union { T0 t0_; T1 t1_; T2 t2_; T3 t3_; ... };
};
Because there are limitations on the types that can be placed into union
s, and if they're bypassed using placement-new
care needs to be taken to ensure adequate alignment and correct destructor invocation, a generalised implementation becomes more complicated, and it's normally better to use boost::variant<>
. Note that the type_
field requires some space, the union
will be at least as large as the largest of sizeof t0_
, sizeof t1_
..., and padding may be required.
It's also possible to have a templated constructor and assignment operator that call typeid
and record the std::type_info
, allowing future operations like "recover-the-value-if-it's-of-a-specific-type". The easiest way to pick up this behaviour is to use boost::any
.
You can create a base type with virtual destructor and whatever functions you need (e.g. virtual void output(std::ostream&)
), then derive a class for each of short
and long long
. Store pointers to the base class.
In your particular scenario, you've only got a few large numbers: you could do something like reserve one of the short
values to be a sentinel indicating that the actual value at this position can be recreated by bitwise shifting and ORing of the following 4 values. For example...
10 299 32767 0 0 192 3929 38
...could encode:
10
299
// 32767 is a sentinel indicating next 4 values encode long long
(0 << 48) + (0 << 32) + (192 << 16) + 3929
38
The concept here is similar to UTF-8 encoding for international character sets. This will be very space efficient, but it suits forward iteration, not random access indexing a la [123]
.