I am doing a data visualization assignment where I need to take in a dataset and make certain visualizations. Consider the following about the dataset:
I have to read the dataset and convert the latitudes marked with 'N' into positive float values and those marked with 'S' into negative float values (all of the data is stored as strings).
Similarly, I have to convert the longitudes marked with 'E' into positive float values and those marked with 'W' into negative float values.
Since I am new to Python, Pandas and NumPy, I am having a lot of difficulty achieving this. So far I have been able to strip the 'N', 'S', 'E' and 'W' characters and convert the latitude and longitude strings to floats. However, I cannot figure out how to make the float values positive or negative based on the character ('N', 'S', 'E', 'W') that was attached before the float conversion.
Below is the code I have written so far:
import pandas as pd
df = pd.read_csv("Aug-2016-potential-temperature-180x188.txt", skiprows = range(7))
df.columns = ["longitude"]
df = df.longitude.str.split("\t", expand = True)
smaller = df.iloc[::10,:]
print(df.head(10), end = "\n")
print(smaller, end = "\n")
print(df.iloc[1][3], end = "\n")
print(smaller.iloc[2][175], end = "\n")
import numpy as np
import pandas as pd
data = pd.read_csv('~/documents/datasets/viz_a1/Aug-2016-potential-temperature-180x188.txt', skiprows=7)
data.columns = ['longitudes']
data = data['longitudes'].str.split('\t', expand=True)
df = data.iloc[::10,:]
df.head()
# replace 'E' with '' and 'W' with ''
df.loc[0] = df.loc[0].str.replace('E', '').str.replace('W', '')
# convert the longitude values to float values (THIS ONE WORKS)
df.loc[0] = df.loc[0][1:].astype(float)
# replace 'S' with '' and 'N' with ''
df.loc[:][0] = df.loc[:][0].str.replace('S', '').str.replace('N', '')
# convert latitude values into float values (THIS ONE DOES NOT WORK!!)
df.loc[:][0] = df.loc[:][0].astype(float)
# checking if the float values exist
print(df.loc[0][2], ' data-type ', type(df.loc[0][2])) # columns converted into float
print(df.loc[30][0], ' data-type ', type(df.loc[30][0])) # rows not converted into float
Doubts:
P.S. The conversions for the longitudes were generating a lot of warnings. It would be nice if someone could explain why I am getting those warnings and how to prevent them (again, I am new to Python and Pandas!).
The dataset can be found here
I would pass a few more arguments to the read_csv function to get a dataframe in which the columns are the longitude strings and the index is the latitude strings. The values left in the dataframe are then just the raster data.
df = pd.read_csv(r'Aug-2016-potential-temperature-180x188.txt',
skiprows=8, delimiter='\t', index_col=0)
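To see what these arguments do without the real file, here is a minimal sketch on a made-up three-line sample. The sample text, the name `demo`, and the assumption that the header row starts with an empty cell are illustrative only and are not taken from the actual dataset:

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for the real file
# once its leading metadata lines have been skipped.
sample = (
    "\t10.5E\t20.5W\n"
    "30.5N\t1.0\t2.0\n"
    "40.5S\t3.0\t4.0\n"
)

# delimiter='\t' splits each line on tabs;
# index_col=0 turns the first column (the latitudes) into the index.
demo = pd.read_csv(io.StringIO(sample), delimiter="\t", index_col=0)

print(demo.columns.tolist())  # longitude labels, still strings
print(demo.index.tolist())    # latitude labels, still strings
print(demo.dtypes)            # the raster values are parsed as floats
```

With the labels moved out of the data, only the column names and the index still need converting.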
Then I would convert the longitudinal strings, the columns of the dataframe, to floats with the following code:
column_series = pd.Series(df.columns)
df.columns = column_series.apply(lambda x: float(x.replace('E','')) if x.endswith('E') else -float(x.replace('W','')))
After that, I would convert the latitude strings, the index of the dataframe, to floats with this code:
index_series = pd.Series(df.index)
df.index = index_series.apply(lambda x: float(x.replace('N','')) if x.endswith('N') else -float(x.replace('S','')))
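Both conversions follow the same rule, so they could also share one helper. The following is only a sketch: `to_signed` is a hypothetical name, the small `demo` frame is made-up data, and it assumes every label ends in exactly one of N, S, E or W:

```python
import pandas as pd

def to_signed(label: str) -> float:
    """Turn '12.5N' or '12.5E' into 12.5 and '12.5S' or '12.5W' into -12.5."""
    label = label.strip()
    hemisphere = label[-1].upper()
    value = float(label[:-1])
    if hemisphere in ("N", "E"):
        return value
    if hemisphere in ("S", "W"):
        return -value
    raise ValueError(f"unexpected hemisphere suffix in {label!r}")

# Made-up frame with string labels, shaped like the result of read_csv above.
demo = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=["30.5N", "40.5S"],
    columns=["10.5E", "20.5W"],
)

# Index.map applies the helper to every label and returns a new Index.
demo.index = demo.index.map(to_signed)
demo.columns = demo.columns.map(to_signed)
print(demo)
```

Relabelling the index and columns as a whole also avoids the chained assignment used in the question (`df.loc[:][0] = ...`), which assigns into an intermediate object and is probably why the latitude conversion had no effect. Assigning into a slice such as `data.iloc[::10, :]` is likewise a common trigger for pandas' SettingWithCopyWarning, and taking an explicit `.copy()` of the slice is the usual way to avoid it.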