Tags: python, python-3.x, pandas, dataframe, data-processing

How do I modify my Python code below to convert coordinate strings into signed floats based on their direction letter in Pandas?


I am doing a data visualization assignment where I need to take in a dataset and make certain visualizations. Consider the following about the dataset:

  • The columns are labeled by longitude (a list of strings with an 'E' or 'W' suffix denoting eastern or western longitude respectively)
  • The rows are labeled by latitude (a column of strings with an 'N' or 'S' suffix denoting northern or southern latitude respectively)

So I have to read the dataset and convert the latitudes ending in 'N' into positive float values and those ending in 'S' into negative float values (all of the data is stored as strings).

Similarly, I have to convert the longitudes ending in 'E' into positive float values and those ending in 'W' into negative float values.

Since I am new to Python, Pandas, and NumPy, I am having a lot of difficulty achieving this. So far I have been able to strip the 'N', 'S', 'E', 'W' characters and convert the latitude and longitude strings to floats. However, I cannot figure out how to make the float values positive or negative based on the character ('N', 'S', 'E', 'W') that was present before the float conversion.
Below is the code I have written so far:

import pandas as pd

df = pd.read_csv("Aug-2016-potential-temperature-180x188.txt", skiprows = range(7))
df.columns = ["longitude"]
df = df.longitude.str.split("\t", expand = True)
smaller = df.iloc[::10,:]

print(df.head(10), end = "\n")
print(smaller, end = "\n")
print(df.iloc[1][3], end = "\n")
print(smaller.iloc[2][175], end = "\n")

import numpy as np
import pandas as pd

data = pd.read_csv('~/documents/datasets/viz_a1/Aug-2016-potential-temperature-180x188.txt', skiprows=7)
data.columns = ['longitudes']
data = data['longitudes'].str.split('\t', expand=True)
df = data.iloc[::10,:]
df.head()

# replace 'E' with '' and 'W' with ''
df.loc[0] = df.loc[0].str.replace('E', '').str.replace('W', '')

# convert the longitude values to float values (THIS ONE WORKS)
df.loc[0] = df.loc[0][1:].astype(float)

# replace 'S' with '' and 'N' with ''
df.loc[:][0] = df.loc[:][0].str.replace('S', '').str.replace('N', '')

# convert latitude values into float values (THIS ONE DOES NOT WORK!!)
df.loc[:][0] = df.loc[:][0].astype(float)

# checking if the float values exist
print(df.loc[0][2], ' data-type ', type(df.loc[0][2])) # columns converted into float
print(df.loc[30][0], ' data-type ', type(df.loc[30][0])) # rows not converted into float  

Doubts:

  • How do I convert the values into positive and negative floats based on the direction letter ('S' and 'W' represent negative float values; 'E' and 'N' represent positive float values)?
  • How do I successfully convert the latitudes into float values? (The code I wrote did not convert the rows into floats, and it did not raise any error either!)

P.S. The longitude conversion generated a lot of warnings. It would be nice if someone could explain why I am getting those warnings and how to prevent them. (Again, I am new to Python and Pandas!)

The dataset can be found here

[Screenshot: the data just after loading it into a DataFrame]


Solution

  • I would pass a few more arguments to read_csv so that the columns of the resulting dataframe are the longitude strings and the index is the latitude strings. The body of the dataframe then contains only the raster data.

    df = pd.read_csv(r'Aug-2016-potential-temperature-180x188.txt',
                     skiprows=8, delimiter='\t', index_col=0)
    

    Then I would convert the longitudinal strings, the columns of the dataframe, to floats with the following code:

    column_series = pd.Series(df.columns)
    df.columns = column_series.apply(lambda x: float(x.replace('E','')) if x.endswith('E') else -float(x.replace('W','')))
    

    After that, I would convert the latitude strings, the index of the dataframe, to floats with this code:

    index_series = pd.Series(df.index)
    df.index = index_series.apply(lambda x: float(x.replace('N','')) if x.endswith('N') else -float(x.replace('S','')))
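
The steps above can be tried end to end on a small in-memory table. The following is a minimal sketch, not the original answer's code: the sample values are made up, `to_signed` is a hypothetical helper name, and it assumes every label ends in exactly one of 'N', 'S', 'E', 'W' (anything else raises an error instead of being silently negated).

```python
import io

import pandas as pd

# Made-up, tab-separated sample standing in for the real file
# (assumption: header row holds longitudes, first column holds latitudes).
sample = (
    "\t10E\t20W\n"
    "30N\t1.5\t2.5\n"
    "40S\t3.5\t4.5\n"
)


def to_signed(label):
    """Hypothetical helper: '30N' -> 30.0, '40S' -> -40.0, '20W' -> -20.0."""
    label = label.strip()
    value = float(label[:-1])          # numeric part without the direction letter
    hemisphere = label[-1].upper()     # trailing direction letter
    if hemisphere in ("N", "E"):
        return value
    if hemisphere in ("S", "W"):
        return -value
    raise ValueError(f"unexpected coordinate label: {label!r}")


df = pd.read_csv(io.StringIO(sample), delimiter="\t", index_col=0)

# Assign to df.columns / df.index directly; no chained indexing is involved.
df.columns = [to_signed(c) for c in df.columns]
df.index = [to_signed(i) for i in df.index]

print(df)
```

Regarding the two side questions: `df.loc[:][0] = ...` in the question is chained indexing, so the assignment may land on the intermediate object returned by `df.loc[:]` rather than on `df` itself, which would explain why the latitudes stayed strings without any error. The warnings are most likely pandas' SettingWithCopyWarning, since `df` was created as a slice (`data.iloc[::10, :]`); taking an explicit copy with `data.iloc[::10, :].copy()` is the usual way to avoid it.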