Tags: python, scikit-learn, sparse-matrix, python-2.x

Sorting in sparse matrix (Python 2.*)


I'm solving a task on Coursera and got stuck on sorting a sparse matrix. The problem is: I fit a support vector classifier (sklearn.svm.SVC)

    clf = SVC(C=1, kernel='linear', random_state=241)
    clf.fit(X, y)

and as a result got a matrix clf.coef_ of [index_id; weight]. Now I need to extract the top N weights and their indices, but sorting the weights with clf.coef_.argsort() does not sort index_id along with them. How can I sort this matrix without breaking the [index_id; weight] link?


Solution

  • Calling argsort returns the indices that would sort the array, not the sorted array itself, so you can use its result directly as feature indices.

    For example, for the array [1.5, 2.5, 0.5] the result of argsort is [2, 0, 1], meaning that the element at index 2 is the lowest, the element at index 0 is the second lowest, and the element at index 1 is the highest.

    So to extract the top 2, take the last two entries of the array returned by argsort and reverse them to get the feature indices, in this case [1, 0]. Indexing the weights with those indices gives the matching weights, so the index-to-weight link is never broken.

    This is what I usually do to extract top-N weights from linear SVM:

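    import numpy as np
    from scipy import sparse

    coefs = clf.coef_
    if sparse.issparse(coefs):  # coef_ is sparse when the model was fit on sparse X
        coefs = coefs.toarray()
    if len(set(labels)) == 2:
        # Binary case: coef_ has a single row whose positive weights favor the
        # second class, so negate it to get a row for the first class.
        coefs = np.array([-coefs[0, :], coefs[0, :]])
    for cls, coef in zip(sorted(set(labels)), coefs):
        top_k = reversed(np.argsort(coef)[-k:])
        keywords = [mapping[idx] for idx in top_k]
        print('%s: %s' % (cls, keywords))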
    

    Here labels is the collection of class labels, k is the number of top features to extract, and mapping maps each feature index to its feature name (usually a word).
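
To make the argsort idea concrete, here is a minimal sketch that returns the top-N indices together with their weights, so the pairing asked about in the question is preserved. The helper name top_n_weights and the sample arrays are illustrative assumptions, not part of scikit-learn; the sketch assumes the coefficients are either a dense array or a single-row SciPy sparse matrix, and it simply densifies the sparse case before sorting.

```python
import numpy as np
from scipy import sparse


def top_n_weights(coef, n):
    """Return (indices, weights) of the n largest weights, largest first.

    `coef` may be a dense array or a SciPy sparse matrix with one row.
    (Hypothetical helper, written for illustration only.)
    """
    if sparse.issparse(coef):
        coef = coef.toarray()               # densify so NumPy can sort it
    weights = np.asarray(coef).ravel()      # flatten a (1, n_features) row to 1-D
    order = np.argsort(weights)[::-1][:n]   # indices by descending weight
    return order, weights[order]            # index i still points at weight i


# The example array from the answer, once dense and once as a sparse row.
dense = np.array([1.5, 2.5, 0.5])
sparse_row = sparse.csr_matrix([[1.5, 2.5, 0.5]])

idx, w = top_n_weights(dense, 2)
idx_sparse, w_sparse = top_n_weights(sparse_row, 2)

print(idx.tolist(), w.tolist())                 # [1, 0] [2.5, 1.5]
print(idx_sparse.tolist(), w_sparse.tolist())   # [1, 0] [2.5, 1.5]
```

With a fitted linear model, the same call would presumably look like top_n_weights(clf.coef_, N). Densifying a single coefficient row is cheap compared with the training data, which is why the sketch does not try to sort the sparse structure in place.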