Why do I get two pivot values of same type for LU-factorization? LAPACK sgetrf

I'm using LAPACK and I'm getting two types of identical pivot values. I'm using the LAPACK routine sgetrf to compute the LU-factorization

A = L*U*P

This C code below gives the same result as my MATLAB code. Except for one thing, the pivot array have two identical values.

>> A
A =

   0.4746   0.7468   0.3101   0.6307
   0.3254   0.4958   0.5093   0.2149
   0.4385   0.9884   0.5404   0.2465
   0.6281   0.7259   0.2024   0.9674

>> LU = lu(A)
LU =

   0.628080   0.725910   0.202440   0.967430
   0.698239   0.481581   0.399058  -0.429027
   0.518087   0.248672   0.305204  -0.179606
   0.755668   0.411650  -0.023492   0.072064

>>


bool lup(float A[], float LU[], size_t P[], size_t row) {
    integer m = row, lda = row, n = row;
    size_t rowrow = row * row;
    integer* ipiv = (integer*)malloc(row * sizeof(integer));
    integer info;
    memcpy(LU, A, row * row * sizeof(float));
    tran(LU, row, row); /* Transpose - Make it column major */
    sgetrf_(&m, &n, LU, &lda, ipiv, &info);
    tran(LU, row, row); /* Transpose - Make it row major */
    size_t i;
    printf("P:\n");
    for (i = 0; i < row; i++) {
        P[i] = ipiv[i] - 1;
        printf("%i ", P[i]);
    }
    printf("\n");
    printf("LU:\n");
    print(LU, row, row);

    free(ipiv);
    return info == 0;
}

Output:

P:
3 2 2 3

LU:
0.6280800       0.7259100       0.2024400       0.9674300
0.6982390       0.4815813       0.3990585       -0.4290273
0.5180869       0.2486716       0.3052040       -0.1796059
0.7556680       0.4116502       -0.0234923      0.0720639

Question:

Why do I get two identical values as pivot? This don't make sense.

P:
3 2 2 3

The pivots should be index how I will read the rows of the LU matrix. With this one, I could solve the linear equation system

Ax = b

By using this C code

/*
 * This solves Ax=b with LUP-decomposition
 * A [m*n]
 * x [n]
 * b [m]
 * n == m
 * Returns true == Success
 * Returns false == Fail
 */
bool linsolve_lup(float A[], float x[], float b[], size_t row) {
    /* Decleration */
    int32_t i, j;

    float* LU = (float*)malloc(row * row * sizeof(float));
    size_t* P = (size_t*)malloc(row * sizeof(size_t));
    bool ok = lup(A, LU, P, row);

    /* forward substitution with pivoting */
    for (i = 0; i < row; ++i) {
        x[i] = b[P[i]];
        for (j = 0; j < i; ++j) {
            x[i] = x[i] - LU[row * P[i] + j] * x[j];
        }
    }

    /* backward substitution with pivoting */
    for (i = row - 1; i >= 0; --i) {
        for (j = i + 1; j < row; ++j) {
            x[i] = x[i] - LU[row * P[i] + j] * x[j];
        }
        x[i] = x[i] / LU[row * P[i] + i];
    }

    /* Free */
    free(LU);
    free(P);

    return ok;
}

But the problem is that just because the pivot P, I won't get the same result as this MATLAB-code.

A = [0.47462,   0.74679,   0.31008,   0.63073,
        0.32540,   0.49584,   0.50932,   0.21492,
        0.43855,   0.98844,   0.54041,   0.24647,
        0.62808,   0.72591,   0.20244,   0.96743];

    b = [1.588964,
         0.901248,
         0.062029,
         0.142180];

    x = linsolve(A, b)

Output: 

  -44.1551
    6.1363
   15.1259
   21.0440

Solution

"A common error is to think of piv as a permutation vector - it isn't. Rather it is a sequential record of the swaps that have taken place during factorization. An acceptable piv vector could have all the values the same, namely the number of equations. So the first row is swapped with the last. Next the second row is swapped with the last. Then the third and so on".

Text copied from here.

The pseudo-code below should help clarify how to convert this sequence of row changes into a vector of (unique) indices (the thing you are looking for).

    CVector<int, N> ExtractAsIndices ( )
    {
        // Here, we are generating from the "sequential record", the indices that mirror the 1's in the permutation matrix

        CVector<int, N> piv;
        LinSpace ( piv, 0, N-1 ) ;
        for ( int n = 0; n < N; ++n )
        {
            const int i = GetPivot ( n ) ;   // GetPivot accesses your sequential record (ie 3 2 2 3 in your example)
            _ASSERTE ( i >= n ) ;   // We never Pivot with a Row prior to the current one
            if ( i != n )
                swap ( piv ( n ), piv ( i ) ) ;
        }
        return piv;
    }

This example returns a permutation matrix instead

    CMatrix<T, N, N> ExtractAsPermutation ( )
    {
        CMatrix<T, N, N> piv ( 0 ) ;
        const CVector<int, N> indices = ExtractAsIndices ( ) ;
        for ( int n = 0; n < N; ++n )
            piv ( n, indices ( n ) ) = T ( 1 ) ;
        return piv;
    }

Note: these examples use 0-based indexing (from C++) but Matlab uses 1-based indexing, so you will need to take this into account.