I'm pretty new to C++ and recently ran across some info on what it means for a variable to be volatile. As far as I understand, it means a read or write to the variable can never be optimized out of existence. However, a weird situation arises when I declare a volatile variable that isn't 1, 2, 4, or 8 bytes large: the compiler (GCC with C++11 enabled) seemingly ignores the volatile specifier:
#include <iostream>
#include <ctime>
using namespace std;

#define expand1 a, a, a, a, a, a, a, a, a, a
#define expand2 expand1, expand1, expand1, expand1, expand1, expand1, expand1, expand1, expand1, expand1
// expand3 to expand5 are defined the same way;
// expand5 is the equivalent of 1e+005 copies of "a"
struct threeBytes { char x, y, z; };
struct fourBytes { char w, x, y, z; };

template<typename T> void foo(); // declared here so main compiles; defined below

int main()
{
    // requires ~1.5sec
    foo<int>();
    // doesn't take time
    foo<threeBytes>();
    // requires ~1.5sec
    foo<fourBytes>();
}
template<typename T>
void foo()
{
    volatile T a;

    // With my setup, the loop does take time and isn't optimized out
    clock_t start = clock();
    for(int i = 0; i < 100000; i++);
    clock_t end = clock();
    int interval = end - start;

    start = clock();
    for(int i = 0; i < 100000; i++) expand5;
    end = clock();
    cout << end - start - interval << endl;
}
Their timings are:

foo<int>(): ~1.5s
foo<threeBytes>(): 0

I've tested it with different variables (user-defined or not) that are 1 to 8 bytes, and only the 1-, 2-, 4-, and 8-byte ones take time to run. Is this a bug that exists only with my setup, or is volatile just a request to the compiler rather than something absolute?

PS: the four-byte versions always take half the time of the others, which is also a source of confusion.
The struct version will probably be optimized out, as the compiler realizes that there are no side effects (no read from or write to the variable a), regardless of the volatile. You basically have a no-op, a;, so the compiler can do whatever it pleases; it is neither forced to unroll the loop nor to optimize it out, so the volatile doesn't really matter here. In the case of ints, there seem to be no optimizations, but this is consistent with the use case of volatile: you should expect non-optimization only when there is a possible "access to an object" (i.e. a read or write) in the loop. However, what constitutes "access to an object" is implementation-defined (although most of the time it follows common sense); see EDIT 3 at the bottom.
Toy example here:
#include <iostream>
#include <chrono>

int main()
{
    volatile int a = 0;
    const std::size_t N = 100000000;

    // side effects, never optimized out
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < N; ++i)
        ++a; // side effect (write)
    auto end = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;

    // no side effects, may or may not be optimized out
    start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < N; ++i)
        a; // no side effect, this is a no-op
    end = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;
}
EDIT
The no-op is not actually optimized out for scalar types, as you can see in this minimal example. For structs, though, it is optimized out. In the example I linked, clang doesn't optimize the code with optimizations disabled, but optimizes both loops with -O3. gcc doesn't optimize out the loops either without optimizations, but optimizes only the first loop with optimizations on.
EDIT 2
clang spits out a warning: warning: expression result unused; assign into a variable to force a volatile load [-Wunused-volatile-lvalue]. So my initial guess was correct: the compiler may optimize out no-ops, but it is not forced to. Why it does so for structs and not for scalar types is something I don't understand, but it is the compiler's choice, and it is standard-compliant. Correspondingly, clang gives this warning only when the no-op is a struct, not when it's a scalar type.

Also note that you don't have a read/write here, you only have a no-op, so you shouldn't expect anything from volatile.
EDIT 3
From the golden book (C++ standard)
7.1.6.1/8 The cv-qualifiers [dcl.type.cv]
What constitutes an access to an object that has volatile-qualified type is implementation-defined. ...
So it is up to the compiler to decide when to optimize out the loops. In most cases it follows common sense: an access happens when reading from or writing to the object.