I tested the following simple code with NVIDIA's nvcc compiler. When N is less than or equal to 512, the program runs fine. But when I set N to a value greater than 512, it crashes with a segmentation fault. What's the reason for this?
#define N 1024 // changing value
int main(int argc, char *argv[]) {
    float hA[N][N], hB[N][N], hC[N][N];
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            hA[i][j] = 1;
            hB[i][j] = 1;
        }
    }
}
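The three arrays are local variables, so they are placed on the stack. With N = 1024 they need 3 × 1024 × 1024 × 4 bytes = 12 MiB, which is most likely more than your stack limit (commonly 8 MiB by default on Linux). With N = 512 they need only 3 MiB, which fits. The fix is to allocate the matrices dynamically instead.
There are basically two ways you can allocate the matrices. The most common is to use a pointer to pointer to float, allocating the outer dimension first and then each inner dimension in a loop: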
float** hA = new float*[N];
for (size_t i = 0; i < N; ++i)
    hA[i] = new float[N];
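Memory obtained this way has to be released by hand, in the reverse order of allocation. Here is a minimal sketch of the whole round trip. The helper names alloc_matrix and free_matrix are made up for illustration and are not part of any library:

```cpp
#include <cassert>
#include <cstddef>

// Allocate an n x n matrix as an array of row pointers and set every
// element to `value`. (alloc_matrix is a hypothetical helper name.)
float** alloc_matrix(std::size_t n, float value) {
    float** m = new float*[n];            // outer dimension: one pointer per row
    for (std::size_t i = 0; i < n; ++i) {
        m[i] = new float[n];              // inner dimension: one row of n floats
        for (std::size_t j = 0; j < n; ++j)
            m[i][j] = value;
    }
    return m;
}

// Release every row first, then the array of row pointers.
// (free_matrix is a hypothetical helper name.)
void free_matrix(float** m, std::size_t n) {
    assert(m != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        delete[] m[i];
    delete[] m;
}
```

In main you would then write float** hA = alloc_matrix(N, 1); and call free_matrix(hA, N); when you are done. Note that the rows are separate allocations, so the matrix is not one contiguous block of memory.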
The second way is to have a pointer to an array, and allocate that:
float (*hA)[N] = new float[N][N];
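With this form the whole matrix is a single heap allocation, so one delete[] frees it. A small sketch, assuming N is a compile-time constant as in the question. The alias Row and the helpers alloc_contiguous and fill are names I made up for the example:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 1024;  // stands in for the #define in the question
using Row = float[N];            // one row of N floats (illustrative alias)

// Allocate an N x N matrix as one contiguous heap block.
// The result has type float (*)[N], written here as Row*.
Row* alloc_contiguous() {
    return new float[N][N];
}

// Set every element of the matrix to `value`.
void fill(Row* m, float value) {
    assert(m != nullptr);
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            m[i][j] = value;
}
```

Usage would be Row* hA = alloc_contiguous(); fill(hA, 1); and finally delete[] hA;. Indexing with hA[i][j] works exactly as it did with the stack array.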
But all of that is moot, since you might as well use std::vector instead:
// requires #include <vector>
std::vector<std::vector<float>> hA(N);
for (size_t i = 0; i < N; ++i)
    hA[i].resize(N);  // each row gets N value-initialized (zero) floats
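For completeness, here is a sketch of how the matrices from the question could be set up with std::vector. The alias Matrix and the helpers make_matrix and set_all are illustrative names, not standard ones. The vector's elements live on the heap, so the stack limit no longer matters, and the memory is freed automatically:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // illustrative alias

// Build an n x n matrix with every element set to `value`.
// The (count, value) constructor sizes all rows in one step.
Matrix make_matrix(std::size_t n, float value) {
    return Matrix(n, std::vector<float>(n, value));
}

// Overwrite every element, mirroring the nested loops in the question.
void set_all(Matrix& m, float value) {
    for (std::size_t i = 0; i < m.size(); ++i) {
        assert(m[i].size() == m.size());  // this sketch expects a square matrix
        for (std::size_t j = 0; j < m[i].size(); ++j)
            m[i][j] = value;
    }
}
```

In main, Matrix hA = make_matrix(N, 1), hB = make_matrix(N, 1), hC = make_matrix(N, 0); would replace the three stack arrays, and the hA[i][j] indexing stays the same.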