CSCMM for Armadillo Sparse dense multiplications

Environment : armadillo 4.320.0 and 4.400
Compiler : Intel CPP compiler
OS : Ubuntu 12.04

I am trying to replace the Armadillo's native sparse dense multiplication with Intel MKL's CSCMM call. I wrote the following code.

#include <mkl.h>  
#define ARMA_64BIT_WORD
#include <armadillo>

using namespace std;
using namespace arma;

int  main(int argc, char *argv[])
{
   long long m = atoi(argv[1]);
   long long k = atoi(argv[2]);
   long long n = atoi(argv[3]);
   float density = 0.3;
   sp_fmat A = sprandn<sp_fmat>(m,k,density);
   fmat B = randu<fmat>(k,n);
   fmat C(m,n);
   C.zeros();
 //C = alpha * A * B + beta * C;
 //mkl_scscmm (char *transa, MKL_INT *m, MKL_INT *n, MKL_INT *k, float *alpha, char *matdescra,       
 //float *val, MKL_INT *indx, MKL_INT *pntrb, MKL_INT *pntre, float *b, MKL_INT *ldb, float *beta, 
//float *c, MKL_INT *ldc);
  char transa = 'N';
  float alpha = 1.0;
  float beta = 0.0;
  char* matdescra = "GUUC";
  long long ldb = k;
  long long ldc = m;
  cout << "b4 Input A:" << endl << A;
  cout << "b4 Input B:" << endl << B;
  mkl_scscmm (&transa,&m,&n,&k,&alpha,matdescra,
              const_cast<float *>(A.values), (long long *)A.row_indices,
             (long long *)A.col_ptrs,(long long *)(A.col_ptrs + 1),
             B.memptr(),&ldb,
             &beta, C, &ldc);
  cout << "Input A:" << endl << A;
  cout << "Input B:" << endl << B;
  cout << "Input C:" << endl << C;
  return 0;
}

I compiled the above code and ran it as "./testcscmm 10 4 6". I am getting a segmentation fault (core dumped).

[matrix size: 10x4; n_nonzero: 12; density: 30.00%]

 (0, 0)         1.1123
 (4, 0)        -0.3453
 (8, 0)         0.6081
 (1, 1)         0.6410
 (4, 1)        -0.7121
 (5, 1)         1.1592
 (9, 1)        -1.7189
 (0, 2)         0.4175
 (2, 2)        -0.4001
 (4, 2)         2.2809
 (4, 3)        -2.2717
 (9, 3)         0.2251

b4 Input B:
0.1567   0.9989   0.6126   0.4936   0.5267   0.2833
0.4009   0.2183   0.2960   0.9728   0.7699   0.3525
0.1298   0.5129   0.6376   0.2925   0.4002   0.8077
0.1088   0.8391   0.5243   0.7714   0.8915   0.9190
Input A:
[matrix size: 13715672716573367337x13744746204899078486; n_nonzero: 12; density: 0.00%]

Segmentation fault (core dumped)

For some reason the structure of A is getting corrupted. I have the following questions.

Does MKL_CSCMM modify the input array? If not why should A get corrupted?
I changed the matrix C to native float. Still the error persists.
Valgrind shows some memory errors.

Let me know how to make an intel MKL call using Armadillo's matrix data structures. Especially Sparse dense multiplication.

Solution

Libarmadillo library also must be linked with mkl_ilp64 instead of mkl_lp64. Follow the instructions below.

Building and installing armadillo :

export CXX=icpc
export CC=icpc
export PATH=$PATH:/home/ramki/intel/bin:
Edit $armadillo_root/cmake_aux/Modules/ARMA_FindMKL.cmake, include the PATHS correctly.
Edit $armadillo_root/cmake_aux/Modules/ARMA_FindMKL.cmake, change mkl_lp64 to mkl_ilp64
Edit $armadillo_root/CMakeLists.txt and (1) Change CMAKE_SHARED_LINKER_FLAGS to include the link line by intel link advisor and (2) Change CMAKE_CXX_FLAGS as given by intel link advisor
Run ./configure and make sure MKL library is used for blas and lapack, icpc to be the compiler and the rest to be alright.
Run make .
Verify the linked libraries by running ldd libarmadillo.so. Mainly verify whether it is linked with mkl_ilp64 library and mkl blas and lapack libraries.
Now run make install DESTDIR=local path.

In the C++ program

Don't type case the const ptr of A.values to ptr A.values using const_cast function. It distorts the pointer and not sure whether we are passing the right pointer to cscmm. Instead duplicate just the values of A alone in a separate array.
Make sure you are setting the correct ldb and ldc.
pntrb and pntre could be A.col_ptrs and A.col_ptrs+1.
Use MKL_INT in the place of long long.

Hope this helps everyone.