I have a numpy array with thousands of rows and columns, and I'm wondering how to update each value based on the values in a pandas DataFrame.
For example, let's say my array contains a list of years (here's an incredibly small sample just to give you the basic idea):
[[2020, 2015, 2017],
[2015, 2016, 2016],
[2019, 2018, 2020]]
I want to change each value in the array to "Lat" based on the "Year". So if my pandas dataframe looks like this:
Year | Lat | Lon |
---|---|---|
2020 | 37.2 | 103.45 |
2019 | 46.1 | 107.82 |
2018 | 35.2 | 101.45 |
2017 | 38.6 | 110.62 |
2016 | 29.1 | 112.73 |
2015 | 33.8 | 120.92 |
Then the output array should look like:
[[37.2, 33.8, 38.6],
[33.8, 29.1, 29.1],
[46.1, 35.2, 37.2]]
If my dataset were truly this small, it wouldn't be a problem, but considering I have millions of values in the array and thousands of values in the DataFrame, I'm a little overwhelmed on how to go about this efficiently.
Update:
Perhaps my question might be a bit more complicated than I anticipated. Rather than matching up the years, I'm matching up GPS time, so the numbers don't match up as nicely. Is there a way to take a number in the array and match it up to the closest value in the DataFrame column? In reality, my array would look more like this:
[[2019.99, 2015.2, 2017.1],
[2015.33, 2016.01, 2015.87],
[2019.2, 2018.3, 2020.00]]
Maybe setting Year
as index and using at
(or loc
) would help
# Data
arr = np.array([[2020, 2015, 2017], [2015, 2016, 2016], [2019, 2018, 2020]])
df = pd.DataFrame({'Year': {0: 2020, 1: 2019, 2: 2018, 3: 2017, 4: 2016, 5: 2015},
'Lat': {0: 37.2, 1: 46.1, 2: 35.2, 3: 38.6, 4: 29.1, 5: 33.8},
'Lon': {0: 103.45, 1: 107.82, 2: 101.45, 3: 110.62, 4: 112.73, 5: 120.92}})
df = df.set_index("Year")
np.array([df.loc[years, "Lat"] for years in arr])
# array([[37.2, 33.8, 38.6],
# [33.8, 29.1, 29.1],
# [46.1, 35.2, 37.2]])