I'm using the package libmf to do parallel non-negative matrix factorization, i.e., X = WH. I use the method fit from the class MF. As mentioned in the description below, the resulting matrices are stored in MF.model.
def fit(self, X):
    """
    factorize the i x j data matrix X into (j, k) (k, i) sized matrices stored in MF.model
    :param X: (n, 3) shaped numpy array [known index and values of the data matrix]
    """
    ensure_width(X, 3)
    d = X.astype(np.float32)
    data_p = d.ctypes.data_as(c_float_p)
    nnx = ctypes.c_int(X.shape[0])
    mf.fit_interface.restype = ctypes.POINTER(MFModel)
    mf.fit_interface.argtypes = (ctypes.c_int, c_float_p, options_ptr)
    out = mf.fit_interface(nnx, data_p, self._options)
    self.model = out.contents
From the GitHub page of the package, the class MFModel is:
class MFModel(ctypes.Structure):
    _fields_ = [("fun", ctypes.c_int),
                ("m", ctypes.c_int),
                ("n", ctypes.c_int),
                ("k", ctypes.c_int),
                ("b", ctypes.c_float),
                ("P", c_float_p),
                ("Q", c_float_p)]
Could you explain how to extract information from this class?
# !pip install libmf
import numpy as np
from libmf import mf

# toy data matrix; only the nonzero entries are treated as known values
X = np.array([[1, 2, 3],
              [0, 11, 0],
              [5, 0, 7]])

# turn X into the (n, 3) array of (row index, column index, value) triples that fit() expects
row, col = X.nonzero()
values = X[np.nonzero(X)]
res = np.array(list(zip(row.tolist(), col.tolist(), values.tolist())))

engine = mf.MF(k=2)
engine.fit(res)
engine.model
For convenience, I also put the notebook on Colab here.
I'm not that deep into the library, but here are a few observations that might be interesting (building on top of the code provided):
You can either use engine.p_factors() / engine.q_factors() to obtain the P / Q matrices, or you can iterate through engine.model.P[i]:
print(engine.p_factors())
# [[0.37909135 0.70226544]
#  [2.561905   2.0429273 ]
#  [1.7700745  2.0010414 ]]
print(engine.model.P[0:(engine.model.m * engine.model.k)])
# [0.37909135222435, 0.7022654414176941, 2.5619049072265625, 2.0429272651672363, 1.770074486732483, 2.0010414123535156]
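So the flat buffer behind engine.model.P holds the same m * k values that p_factors() returns, just not reshaped yet. As a small illustrative sketch (the names n_p and P_manual are mine), you can rebuild the matrix by hand from a slice like the one above; the reshape is row-major, which matches the order of the flat list:

n_p = engine.model.m * engine.model.k   # number of floats stored for P
P_manual = np.array(engine.model.P[0:n_p], dtype=np.float32).reshape(engine.model.m, engine.model.k)
P_manual
# array([[0.37909135, 0.70226544],
#        [2.561905  , 2.0429273 ],
#        [1.7700745 , 2.0010414 ]], dtype=float32)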
The object engine has two interesting methods, p_factors and q_factors. In our setup, these methods spit out two (3, 2) matrices:
P = engine.p_factors()
P
# array([[0.37909135, 0.70226544],
#        [2.561905  , 2.0429273 ],
#        [1.7700745 , 2.0010414 ]], dtype=float32)
Q = engine.q_factors()
Q
# array([[0.87586826, 1.6112198 ],
#        [2.5359864 , 2.095469  ],
#        [1.6843219 , 2.0822709 ]], dtype=float32)
The immediate reaction is: Let's multiply!
RES = np.matmul(P, Q.transpose())
RES
# array([[ 1.463538 ,  2.432946 ,  2.1008186],
#        [ 5.535496 , 10.777846 ,  8.569    ],
#        [ 4.7744694,  8.682005 ,  7.1480856]], dtype=float32)
Now, I'm not deep enough into the library (or its usage) and the topic to give an educated assessment of that product.
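Still, one rough sanity check that can be done with the pieces we already have (reusing the res triples from the question; rows and cols are just helper names here) is to compare the reconstruction with the known entries of X:

rows, cols = res[:, 0], res[:, 1]   # observed positions from the (row, col, value) triples
print(res[:, 2])        # known values: [ 1  2  3 11  5  7]
print(RES[rows, cols])  # reconstructed values, read off RES above:
                        # [ 1.463538  2.432946  2.1008186 10.777846  4.7744694  7.1480856]

which at least gives a feeling for how close the factorization gets on the observed entries.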
The next step is to investigate the two methods p_factors / q_factors. Here is the source code of MF.p_factors:
def p_factors(self):
    if self.model is None:
        return LookupError("no model data is saved, try running model.mf_fit(...) first")
    out = np.zeros(self.model.m * self.model.k)
    out = out.astype(np.float32)
    mf.get_P(ctypes.c_void_p(out.ctypes.data), ctypes.byref(self.model))
    return out.reshape((self.model.m, self.model.k))
This is somewhat underwhelming, as it seems the interesting stuff happens in mf.get_P, where mf is the underlying C++ library.
Continuing our quest, the source code of mf.get_P (in libmf_interface.cpp) reads:
#ifdef __cplusplus
extern "C" float* get_P(float *out, mf::mf_model *model)
#else
float* get_P(float *out, mf::mf_model *model)
#endif
{
    for (int i = 0; i < model->m; i++){
        for(int j = 0; j < model->k; j++){
            int idx = i * model->k + j;
            out[idx] = model->P[idx];
        }
    }
    return out;
}
This code (very) roughly translates to
def get_P(out, model: mf.MFModel) -> np.ndarray:
    for i in range(model.m):
        for j in range(model.k):
            idx = i * model.k + j
            out[idx] = model.P[idx]
    return out
which, as far as I can tell, simply copies the data out of model.P by flat (row-major) index. Hence you can also access the raw data directly via engine.model.P[i].
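To wrap up and answer the original question: engine.model is a plain ctypes structure, so the scalar fields (fun, m, n, k, b) can be read directly, and P and Q are flat float buffers. Here is an illustrative sketch (P_view / Q_view are my names, and I'm assuming Q is laid out like P, i.e. n * k floats in row-major order, which matches the (3, 2) shape of q_factors() above) that pulls everything out with np.ctypeslib.as_array instead of going through p_factors / q_factors:

import numpy as np

m, n, k = engine.model.m, engine.model.n, engine.model.k
print(engine.model.fun, m, n, k, engine.model.b)   # scalar metadata stored in the struct

# zero-copy views on the flat float buffers behind the P and Q pointers, reshaped row-major
P_view = np.ctypeslib.as_array(engine.model.P, shape=(m * k,)).reshape(m, k)
Q_view = np.ctypeslib.as_array(engine.model.Q, shape=(n * k,)).reshape(n, k)

print(np.allclose(P_view, engine.p_factors()))   # should print True
print(np.allclose(Q_view, engine.q_factors()))   # should print True

Note that these views share memory with the model, so copy them (e.g. P_view.copy()) if you want to keep the factors around independently.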