I'm working with imported lat/lon data from NASA's fireball data API (https://cneos.jpl.nasa.gov/fireballs/)
The lat/lon data only have positive values, with separate lat-dir/lon-dir columns giving the hemisphere. I want to convert:
- lat values to negative (multiply by -1) if "lat-dir" == 'S'
- lon values to negative if "lon-dir" == 'W'
Below is roughly how I created my dataframe:
import requests
import pandas as pd
response = requests.get('https://ssd-api.jpl.nasa.gov/fireball.api')
j = response.json()
df = pd.DataFrame.from_dict(j[u'data'])
print( j[u'fields'] )
[u'date', u'energy', u'impact-e', u'lat', u'lat-dir', u'lon', u'lon-dir', u'alt', u'vel']
print( df.head() )
0 1 2 3 4 5 6 7 8
0 2019-12-06 10:19:57 4.6 0.15 3.3 S 37.7 W 19.5 None
1 2019-12-03 06:46:27 4.2 0.14 5.6 N 52.2 W 61.5 None
2 2019-11-28 20:30:54 2.7 0.095 35.7 N 31.7 W 35 13.0
3 2019-11-28 13:22:10 2.6 0.092 None None None None None None
4 2019-11-28 11:55:02 2.5 0.089 22.1 S 25.7 E 22.5 24.7
Lines of code I've attempted:
Attempted to use df.apply()
- though through my searching, I don't think you can easily reference two columns in this manner...
df['lat'] = df['lat'].apply(lambda x: x * -1 if (df['lat-dir'][x] == 'S'))
for i, row in df.iterrows():
if (row['lat-dir'] == 'S'):
df['lat'][i].apply(lambda x: x*-1)
For this, I get 'numpy.float64' object has no attribute 'apply' ?
Attempted to use masking:
if( df['lon-dir'] == 'W'):
df['lon'] * -1
But frankly, I'm stumped on what to do next regarding applying the mask.
EDIT:
dfDate['lat'] = dfDate['lat'].apply(lambda row: row['lon'] * -1 , axis = 1 )
Attempted this as well per comments.
Yes, by either of the following:

A) Using a vectorized boolean mask. Your if( df['lon-dir'] == 'W'): fails because if needs a single True/False, not a whole Series. Build the mask with dfDate['lon-dir'].eq('W') (equivalent to == 'W'), then negate the 'lon' column on just those rows, as sketched right after this list.

B) Using apply() row-wise: dfDate['lon'] = dfDate.apply(lambda row: ..., axis=1) - and in your lambda selectively negate row['lon'] based on the value of row['lon-dir']. The reason your apply call failed is that you need to apply to the entire dataframe (or a whole column/Series), not to individual entries: df['lat'][i] is a single float, which has no .apply.

Also, since lat-dir/lon-dir are essentially sign columns, you could convert them to +1/-1 when you read them in.
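For concreteness, a minimal sketch of both options applied to the 'lon' column (use one or the other, not both), assuming dfDate['lon'] has already been converted to a float dtype; the mask name west is just illustrative:
# A) vectorized mask: build a boolean Series, negate only the matching rows
west = dfDate['lon-dir'].eq('W')
dfDate.loc[west, 'lon'] *= -1
# B) row-wise apply: the lambda sees the whole row, so it can read both columns
dfDate['lon'] = dfDate.apply(
    lambda row: -row['lon'] if row['lon-dir'] == 'W' else row['lon'], axis=1)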
Code:
First, some issues with your code that you'll want to fix:
- Assign proper column names: df.columns = j['fields']
- Feeding response.json() into pd.DataFrame.from_dict() is a pain; your dataframe columns all come back as string/'object' dtype rather than the float ones being converted to float. Ideally we should be using pandas.read_json(..., dtype=...) for this and other convenience reasons.
- Convert None -> pandas/numpy nan, so the vectorized code we're going to write handles nan gracefully instead of constantly throwing the annoying TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'. You can do this with astype(...), pd.to_numeric() or (with import numpy as np) df.fillna(value=np.nan, inplace=True).
- Drop the rows with missing values into a working copy:
  df2 = df.dropna(how='any', inplace=False)  # probably not with ..., inplace=True
  Note that this preserves the row indices, so you can always insert the result of processing df2 back into df at the end. Read the dropna doc and figure out at what exact point you want to drop the nan's. Since 'alt' and 'vel' aren't needed here, you can restrict the dropna to the columns you actually use (a combined sketch of these steps follows this list):
  df2 = df[['date','energy','impact-e','lat','lat-dir','lon','lon-dir']].dropna(how='any', inplace=False)
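Putting those fixes together, here is one possible loading/cleanup sketch; pd.to_numeric(..., errors='coerce') is just one of the conversion options mentioned above, not the only way:
import pandas as pd
import requests

response = requests.get('https://ssd-api.jpl.nasa.gov/fireball.api')
j = response.json()

df = pd.DataFrame(j['data'], columns=j['fields'])      # named columns from the start
for col in ['energy', 'impact-e', 'lat', 'lon', 'alt', 'vel']:
    df[col] = pd.to_numeric(df[col], errors='coerce')  # numeric strings -> float, None -> NaN

df2 = df[['date', 'energy', 'impact-e', 'lat', 'lat-dir', 'lon', 'lon-dir']].dropna(how='any')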
Several ways to translate the lat/lon-dir columns into +/-1 signs and fold them into lat/lon:
A1) if you want the 'correct', nan-aware way which doesn't choke on nans, map the letters to signs and multiply them in:
df2['lat'] = df2['lat'] * df2['lat-dir'].map({'N': +1, 'S': -1})
df2['lon'] = df2['lon'] * df2['lon-dir'].map({'E': +1, 'W': -1})
A2) ...or a fast-and-dirty way to get the signs (multiply them into df2['lat']/df2['lon'] the same way):
(-1) ** df2['lat-dir'].eq('S')
(-1) ** df2['lon-dir'].eq('W')
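The (-1) ** ... trick in A2 works because raising -1 to a boolean exponent gives -1 for True and +1 for False, element-wise over the Series. A tiny throwaway example (toy Series, not the API data):
import pandas as pd
signs = (-1) ** pd.Series(['S', 'N', 'S']).eq('S')
print(signs.tolist())  # [-1, 1, -1]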
B) But you can do this all in one row-wise apply() function:
def fixup_latlon_signs(row):
    row['lat'] = row['lat'] * (-1) ** (row['lat-dir'] == 'S')
    row['lon'] = row['lon'] * (-1) ** (row['lon-dir'] == 'W')
    return row

df2 = df2.apply(fixup_latlon_signs, axis=1)  # apply() returns a new dataframe, so assign it back

# Then insert the non-NA rows we processed back into the parent dataframe:
df.update(df2)

# Strictly we can drop 'lat-dir','lon-dir' now...
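If you do take that last step, a minimal sketch (assuming the column names were assigned as above and the direction columns are no longer needed):
df = df.drop(columns=['lat-dir', 'lon-dir'])
print(df.head())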