performance-testing benchmarking intel cpu-usage linpack

Understanding linpack input configuration

        5                         # number of tests
        1000 2000 3000 4000 5000  # number of equations (problem sizes)
        1000 2008 3000 4008 5000  # leading dimensions
        4 4 2 1 1                 # number of times to run a test (trials)
        4 4 4 4 4                 # alignment values (in KBytes)

I have read the documentation, but lines 2, 3 and 5 are not clear (I don't know Fortran).

Line 2 - Is it asking to create 1000*1000, 2000*2000 ... 5000*5000 matrices? If yes, what does an equation have to do with creating a matrix? If no, how complex are those equations: something simple like solving a = 1.2 + 2.2, or some more complex problem?

Line 3 - It may be referring to a submatrix. But what's the point of creating a submatrix? What will happen if all the LDA values are equal to the corresponding problem sizes?

Line 5 - What exactly are alignment values?

Solution

This is the setup for the Intel optimized Linpack benchmark. The parameters that you seem to be confused about are all related to the way matrices are represented and accessed.

Input parameters

The Linpack benchmark solves a system of N simultaneous linear equations.

a11 * x1 + a12 * x2 + .. + a1N * xN = b1
a21 * x1 + a22 * x2 + .. + a2N * xN = b2
...
aN1 * x1 + aN2 * x2 + .. + aNN * xN = bN

This is equivalent to solving the matrix equation Ax = b, where x and b are N-dimensional vectors and A is the N*N matrix of the coefficients a11 .. aNN. That is why a problem with N equations needs an N*N matrix.
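
As a rough illustration of what "solving N equations" means, here is a small pure-Python sketch of Gaussian elimination with partial pivoting. It is not the benchmark's code, which relies on a heavily optimized factorization; the function name `solve` and the 3*3 system are made up for the example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the inputs are left untouched.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        # Pick the row with the largest candidate pivot in column k.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        # Eliminate column k from the rows below the pivot row.
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


A = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
x = solve(A, b)
print(x)  # close to [2.0, 3.0, -1.0], up to rounding
```

The work grows roughly with the cube of N, which is why the problem size in line 2 of the input file matters so much for run time.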

An N*N matrix is stored in memory column by column, with the individual columns starting at offsets 0, n, 2*n, etc. Note that we use a different symbol, n, instead of N; n is the leading dimension (LDA). The reason is that for some values of N, setting n = N may cause the algorithm, running in several parallel threads, to run into a phenomenon known as cache thrashing. To avoid this it is advised to choose n >= N so that some padding is inserted between the column data where needed. Often n is selected to be a multiple of 8 that is no smaller than N, as in your input, where 2000 becomes 2008 and 4000 becomes 4008. So we are done with lines 2 and 3: line 2 is N and line 3 is n.
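
To make the column offsets concrete, here is a minimal Python sketch of column-major storage with a padded leading dimension. The helper names `round_up` and `offset` and the tiny size N = 5 are only illustrative assumptions; the benchmark itself does this inside its Fortran/C kernels.

```python
def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


def offset(i, j, lda):
    """Position of element (i, j), 0-based, in column-major storage."""
    return i + j * lda


N = 5                         # number of equations
lda = round_up(N, 8)          # padded leading dimension, here 8
storage = [0.0] * (lda * N)   # N columns, each lda entries long

# Fill in a(i, j) = 10*i + j; the padding entries with i >= N stay unused.
for j in range(N):
    for i in range(N):
        storage[offset(i, j, lda)] = 10.0 * i + j

print(lda)                          # 8
print(offset(0, 1, lda))            # 8: column 1 starts lda entries after column 0
print(storage[offset(2, 3, lda)])   # 23.0
```

With lda equal to N the columns would sit back to back; a larger lda only changes the stride between columns, not the matrix being solved.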

The Linpack benchmark uses several arrays. Once again, to use the cache efficiently it is advised to have all arrays start at the boundary of a memory page, so they are aligned to a 4 KB boundary. With larger pages it might make sense to set this value to a larger number, e.g. 16 or 64. This is our line 5.
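
The arithmetic behind "aligned to 4 KB" can be sketched as follows. The function `align_up` and the address 0x12345 are hypothetical, chosen only to show the rounding; a real allocator call would do this for you.

```python
def align_up(address, alignment_kb):
    """Round address up to the next multiple of alignment_kb kilobytes."""
    alignment = alignment_kb * 1024
    return (address + alignment - 1) // alignment * alignment


raw = 0x12345                 # hypothetical address handed out by an allocator
aligned = align_up(raw, 4)    # 4 is the alignment value from the input file
print(hex(aligned))           # 0x13000
print(aligned % 4096)         # 0
```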

Output quantities

To check the solution the Linpack benchmark computes the residual vector r = Ax - b. The maximum norm of the vector r is the maximum of the absolute values of its elements, max(|r_1|,..,|r_N|). This value is called the residual. It should be on the order of machine epsilon eps, i.e. the smallest number such that 1 + eps > 1. For 64-bit floating point numbers eps is about 2.2e-16.
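
Here is a small Python sketch of both quantities, assuming plain lists for the matrix and vectors. The helper `residual_max_norm` is an illustrative name, not something from the benchmark; `sys.float_info.epsilon` is the standard-library value of machine epsilon for 64-bit floats.

```python
import sys


def residual_max_norm(A, x, b):
    """Return max_i |(A x - b)_i|."""
    return max(
        abs(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i)
        for row, b_i in zip(A, b)
    )


eps = sys.float_info.epsilon
print(eps)                    # 2.220446049250313e-16
print(1.0 + eps > 1.0)        # True
print(1.0 + eps / 2 > 1.0)    # False: the sum rounds back to 1.0

A = [[4.0, 1.0], [2.0, 3.0]]
x = [1.0, 2.0]
b = [6.0, 8.0]
print(residual_max_norm(A, x, b))   # 0.0, since x solves this system exactly
```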

To have a measure that is independent of the machine architecture, the normalized residual is computed. The Linpack documentation gives the following formula for the normalized residual.

|| Ax - b ||_oo / ( eps * ( || A ||_oo * || x ||_oo + || b ||_oo ) * n )

Here || X ||_oo denotes the maximum norm; the funny-looking subscript _oo stands for the infinity symbol. That is, || Ax - b ||_oo is the residual, || x ||_oo and || b ||_oo are the maximum absolute values of the elements of the solution vector and of the right-hand-side vector, and || A ||_oo is the corresponding norm of the matrix A, which for a matrix is conventionally the largest sum of absolute values along a row.
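
The formula can be evaluated with a few lines of Python. This is only a sketch under stated assumptions: the helper names are made up, and the matrix norm is taken to be the usual infinity norm (largest absolute row sum), which may differ in detail from what a particular Linpack build computes.

```python
import sys


def vec_inf_norm(v):
    """Maximum absolute value of the elements of a vector."""
    return max(abs(e) for e in v)


def mat_inf_norm(A):
    # Assumption: the usual matrix infinity norm, i.e. largest absolute row sum.
    return max(sum(abs(e) for e in row) for row in A)


def normalized_residual(A, x, b):
    n = len(b)
    r = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    eps = sys.float_info.epsilon
    scale = eps * (mat_inf_norm(A) * vec_inf_norm(x) + vec_inf_norm(b)) * n
    return vec_inf_norm(r) / scale


A = [[4.0, 1.0], [2.0, 3.0]]
b = [6.0, 8.0]
x_exact = [1.0, 2.0]
x_off = [1.0, 2.0 + 1e-12]    # deliberately perturbed solution

print(normalized_residual(A, x_exact, b))   # 0.0
print(normalized_residual(A, x_off, b) > normalized_residual(A, x_exact, b))  # True
```

Dividing by eps and by the sizes of A, x and b is what makes the number comparable across problem sizes: a small value means the error is within what rounding alone would explain.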

The notation || X ||_oo comes from analysis. There || X ||_1 denotes the sum of the absolute values of the components of X, || X ||_1 = |x1| + ... + |xN|. Similarly, || X ||_2 = sqrt(|x1|^2 + ... + |xN|^2) and, in general, || X ||_k = (|x1|^k + ... + |xN|^k)^(1/k). One can prove that as k goes to infinity, || X ||_k tends to max(|x1|,...,|xN|).
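
The limit is easy to watch numerically. The following Python sketch uses an arbitrary example vector and a made-up helper `p_norm`:

```python
def p_norm(v, k):
    """(|v1|^k + ... + |vN|^k)^(1/k)"""
    return sum(abs(e) ** k for e in v) ** (1.0 / k)


v = [3.0, -4.0, 1.0]
for k in (1, 2, 8, 64):
    print(k, p_norm(v, k))    # values shrink toward 4.0 as k grows
print("max", max(abs(e) for e in v))   # max 4.0
```

For large k the largest component dominates the sum, so taking the k-th root leaves essentially just that component.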

You should also have a look at the original High Performance LINPACK.