Tags: python, if-statement, select, conditional-statements, row

How to select DataFrame rows by condition when a column contains mixed values


I am trying to select rows by condition from the following dataset:

import pandas as pd

already_read = [("Il nome della rosa","Umberto Eco", 1980), 
        ("L'amore che ti meriti","Daria Bignardi", 2014), 
        ("Memorie dal sottsuolo", " Fëdor Dostoevskij", 1864), 
        ("Oblomov", "Ivan Alexandrovich Goncharov ", '/')]

index = range(1,5,1)
data = pd.DataFrame(already_read, columns = ["Books'Title", "Authors", "Publishing Year"], index = index)
data

I am filtering it like this:

data[(data['Publishing Year'] >= 1850) & (data['Publishing Year'] <= 1950)]

As you can see, the column I chose contains mixed data (int and str), and running the code raises this error:

TypeError: '>=' not supported between instances of 'str' and 'int'

Since I'm taking my very first steps with Python, could you please suggest a way to run that code so that the string value is either excluded or read as an integer, possibly with an *if statement* (or another method)?

Thanks


Solution

  • One way to go would be to use Series.apply with a custom function. Something like this:

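    def check_int(x):
        # Only compare real integers; strings such as '/' are rejected.
        if isinstance(x, int):
            return 1850 <= x <= 1950
        return False

    # Build a boolean mask row by row and use it to filter the DataFrame.
    data[data['Publishing Year'].apply(check_int)]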
    

    Here check_int returns False for every value that is not an int and evaluates the range condition only for the ints. Applied to the column, it gives:

    data['Publishing Year'].apply(check_int)
    
    1    False
    2    False
    3     True
    4    False
    Name: Publishing Year, dtype: bool
    

    Next, we use this boolean pd.Series as a mask to select the matching rows from data:

    data[data['Publishing Year'].apply(check_int)]
    
                 Books'Title             Authors Publishing Year
    3  Memorie dal sottsuolo   Fëdor Dostoevskij            1864
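
Since the question also asks about "another method", here is a minimal sketch of a vectorized alternative. It is not part of the answer above. It assumes the only non-numeric entries are placeholders like '/' that can safely be treated as missing. The names years, mask and selected are just illustrative. pd.to_numeric with errors="coerce" turns unparseable values into NaN, and Series.between then builds the mask in one step:

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet runs on its own.
already_read = [
    ("Il nome della rosa", "Umberto Eco", 1980),
    ("L'amore che ti meriti", "Daria Bignardi", 2014),
    ("Memorie dal sottsuolo", " Fëdor Dostoevskij", 1864),
    ("Oblomov", "Ivan Alexandrovich Goncharov ", "/"),
]
data = pd.DataFrame(
    already_read,
    columns=["Books'Title", "Authors", "Publishing Year"],
    index=range(1, 5),
)

# Convert the column to numbers; anything unparseable (here '/') becomes NaN.
years = pd.to_numeric(data["Publishing Year"], errors="coerce")

# between() is inclusive on both ends by default; NaN never satisfies it.
mask = years.between(1850, 1950)

# The original column is left untouched; only the mask comes from the converted copy.
selected = data[mask]
print(selected)
```

Compared with apply, this avoids calling a Python function once per row, which may matter on larger frames. The trade-off is that a numeric string such as "1864" would also be converted and matched, whereas check_int would reject it.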