I'm solving a task on Coursera and got stuck sorting a sparse matrix.
The problem: I train a support vector classifier (sklearn.svm.SVC):
from sklearn.svm import SVC

clf = SVC(C=1, kernel='linear', random_state=241)
clf.fit(X, y)
and as a result I get clf.coef_, a sparse matrix whose entries print as [index_id; weight] pairs.
Now I need to extract the top N weights and their indices, but sorting the weights with clf.coef_.argsort() does not sort the index_id values along with them.
How can I sort this matrix without breaking the [index_id; weight] link?
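Since argsort returns the indices that would sort the array rather than the sorted values themselves, you can use its result directly as feature indices. If clf.coef_ is a scipy sparse matrix, convert it to a dense array first, e.g. with clf.coef_.toarray().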
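A minimal sketch of that idea, assuming a dense 1-D weight row (the values below are made up and stand in for one row of clf.coef_): each entry returned by argsort is an original feature index, so the matching weight is simply coef[idx] and the index/weight pairing is never lost.

```python
import numpy as np

# Hypothetical weight row standing in for one row of clf.coef_ (made-up values).
coef = np.array([0.3, -1.2, 2.0, 0.0, 0.7])

# Feature indices ordered from largest to smallest weight.
order = np.argsort(coef)[::-1]

# Recover (feature index, weight) pairs for the top 3 features.
pairs = [(int(idx), float(coef[idx])) for idx in order[:3]]
print(pairs)  # [(2, 2.0), (4, 0.7), (0, 0.3)]
```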
So if you have an array [1.5, 2.5, 0.5], argsort returns [2, 0, 1]: the element at index 2 is the lowest, index 0 is the second lowest, and index 1 is the highest.
To extract the top 2, take the last two entries of the array returned by argsort and reverse them; these are the feature indices, in this case [1, 0].
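The same worked example in NumPy, showing both the argsort result and the top-2 extraction:

```python
import numpy as np

a = np.array([1.5, 2.5, 0.5])

idx = np.argsort(a)        # indices that sort a in ascending order
print(idx)                 # [2 0 1]

top2 = idx[-2:][::-1]      # last two indices, reversed: largest first
print(top2)                # [1 0]
print(a[top2])             # [2.5 1.5]
```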
This is what I usually do to extract the top-N weights from a linear SVM:
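import numpy as np
import scipy.sparse as sp

coefs = clf.coef_
if sp.issparse(coefs):  # an SVC fitted on sparse X stores coef_ as a sparse matrix
    coefs = coefs.toarray()
if len(set(labels)) == 2:
    # binary case: one row; positive weights favour the second (sorted) class, negative ones the first
    coefs = np.array([-coefs[0, :], coefs[0, :]])
# assumes one row per class (one-vs-rest, e.g. LinearSVC); multiclass SVC's coef_ has one row per pair of classes
for cls, coef in zip(sorted(set(labels)), coefs):
    top_k = np.argsort(coef)[-k:][::-1]  # indices of the k largest weights, largest first
    keywords = [mapping[idx] for idx in top_k]
    print('%s: %s' % (cls, keywords))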
Here labels holds the class labels of the training data (e.g. y), mapping maps a feature index to its feature name (usually a word), and k is the number of top features to keep.
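The same loop can be wrapped in a reusable function. This is a minimal sketch using only NumPy; `top_k_per_class`, the feature names and the weights in the usage line are hypothetical. It detects a sparse input through the `toarray()` method that scipy.sparse matrices provide. For the binary case it assumes, as in scikit-learn's binary SVC, that positive weights favour `classes[1]`.

```python
import numpy as np


def top_k_per_class(coef, classes, mapping, k):
    """Return {class: [feature names]} for the k largest weights per class.

    coef:    clf.coef_ (dense ndarray or scipy.sparse matrix)
    classes: sorted class labels, e.g. clf.classes_
    mapping: hypothetical feature-index -> feature-name lookup
    """
    if hasattr(coef, "toarray"):  # scipy.sparse matrices expose toarray()
        coef = coef.toarray()
    coef = np.asarray(coef)
    if len(classes) == 2:
        # Binary: a single row whose positive weights favour classes[1],
        # so the negated row ranks features for classes[0].
        coef = np.vstack([-coef[0], coef[0]])
    result = {}
    for cls, row in zip(classes, coef):
        top = np.argsort(row)[-k:][::-1]  # k largest weights, largest first
        result[cls] = [mapping[i] for i in top]
    return result


names = ["good", "bad", "the", "great"]
w = np.array([[1.5, -2.0, 0.1, 0.8]])
print(top_k_per_class(w, ["neg", "pos"], names, 2))
# {'neg': ['bad', 'the'], 'pos': ['good', 'great']}
```

Returning a dict instead of printing keeps the ranking easy to inspect or test.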