I downloaded the data.
from sklearn import datasets, svm
from sklearn.feature_extraction.text import TfidfVectorizer
news = datasets.fetch_20newsgroups(subset='all', categories=['alt.atheism', 'sci.space'])
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(news.data)
y = news.target
print(X.shape)
The shape of X is (1786, 28382)
Next I trained the model and checked the shape of coef_:
clf = svm.SVC(kernel='linear', random_state=241, C=1e-05)
clf.fit(X, y)
data = clf.coef_[0].data
print(data.shape)
The shape is (27189,)
Why is the number of features different?
In short, everything is fine: your weight matrix is clf.coef_, and it has a valid shape, with one column per feature. It is a regular numpy array, or a scipy sparse matrix if the training data is sparse, as it is here. You can do all the usual operations on it, index it, and so on. What you read instead, the .data attribute, holds the internal storage of the array, which can have a different shape: a sparse matrix, for instance, keeps only its explicitly stored (nonzero) entries there, so features whose weight is exactly zero do not appear. The point is that you should not use this internal attribute for your purpose. It is exposed for low-level methods, not for simply reading out the weights.
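To make the difference concrete, here is a minimal sketch on a tiny hand-made sparse matrix instead of the 20 newsgroups data. The toy values, the choice of C=1.0, and the names coef and weights are assumptions for illustration only. It compares the shape of clf.coef_ with the shape of its .data attribute and converts the weights to a dense vector before indexing them.

```python
import numpy as np
import scipy.sparse as sp
from sklearn import svm

# Toy sparse data (hypothetical): 4 samples, 5 features.
# The last column is all zeros, so no sample uses that feature.
X = sp.csr_matrix(np.array([
    [1.0, 0.0, 2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 3.0, 0.0],
    [2.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 1.0, 0.0],
]))
y = np.array([0, 1, 0, 1])

clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X, y)

coef = clf.coef_
print(coef.shape)  # (n_classes - 1 rows for binary problems, n_features columns)

# Convert to a dense 1-D vector before reading individual weights.
weights = coef.toarray().ravel() if sp.issparse(coef) else np.ravel(coef)
print(weights.shape)

if sp.issparse(coef):
    # .data holds only the explicitly stored entries, so it can be
    # shorter than the number of features.
    print(coef.data.shape)
```

On the original data the same idea would be clf.coef_.toarray()[0], which gives one weight per column of X, so it can be lined up with the vectorizer's feature names.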