Given a graphlab.SFrame object with the following column names:
>>> import graphlab
>>> sf = graphlab.SFrame.read_csv('some.csv')
>>> sf.column_names()
['Dataset', 'Domain', 'Score', 'Sent1', 'Sent2']
It is easy to drop the rows that have a "not applicable" (NA) / None value in a particular column. For example, to drop rows with NA values in the "Score" column, I could do this:
>>> sf.dropna('Score')
Or to replace the None value with a certain value (let's say -1), I could do this:
>>> sf.fillna('Score', -1)
After checking the SFrame docs at https://dato.com/products/create/docs/generated/graphlab.SFrame.html, there doesn't seem to be a built-in function to find the rows that contain None in a certain column, something like sf.findna('Score'). Or possibly I missed it.
If there is such a function, what is it called?
If there isn't, how should I extract the rows that have NA values in a specified column?
I think you can use a boolean array to identify the rows with missing values for a given column.
>>> import graphlab
>>> sf = graphlab.SFrame({'a': [1, 2, None, 4],
... 'b': [None, 3, 1, None]})
>>> mask = sf['a'] == None
>>> mask
dtype: int
Rows: 4
[0, 0, 1, 0]
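The mask can then be used to index the SFrame, which should keep only the rows where the mask is 1. Here is a minimal sketch of that idea wrapped in a helper; the name rows_with_missing is my own, not part of the GraphLab API, and it assumes sf is an SFrame that supports the == None comparison shown above and indexing by a 0/1 mask.

```python
def rows_with_missing(sf, column):
    """Return the rows of `sf` whose value in `column` is None.

    `rows_with_missing` is a hypothetical helper name, not a GraphLab function.
    Assumes `sf` behaves like an SFrame: `sf[column] == None` yields a 0/1
    mask, and indexing `sf` with that mask keeps the rows marked 1.
    """
    # Element-wise comparison; `is None` would test the column object itself
    # rather than each value, so `== None` is intentional here.
    mask = sf[column] == None  # noqa: E711
    # Filter the frame with the mask to keep only the rows with a missing value.
    return sf[mask]
```

With the sf from the example above, rows_with_missing(sf, 'a') should give back the single row where a is None and b is 1. The same thing can be written inline as sf[sf['a'] == None].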