Tags: python, pandas, dataframe, subset, isin

How do I subset with .isin (seems like it doesn't work properly)?


I'm a student at Moscow State University doing a small research project on suburban railroads. I scraped information from Wikipedia about all stations in the Moscow region, and now I need to select those that are on Moscow Central Diameter 1 (a railway line). I have a list of Diameter 1 stations (d1_names), and I'm trying to subset the whole dataframe (suburban_rail) with the pandas isin method. The problem is that it returns only 2 stations (the first and the last), though I'm fairly sure there are more: using str.contains with the missing stations returns what I was looking for, so they are in the dataframe. I've already checked the spelling and tried applying strip() to each element of both the dataframe and the list of stations. I've attached several screenshots of my code.

suburban_rail dataframe

stations' list I use to subset

what isin returns

checking manually for Bakovka station

checking manually for Nemchinovka station

Thanks in advance!


Solution

  • Next time provide a minimal reproducible example, such as the one below:

    import pandas as pd

    suburban_rail = pd.DataFrame({'station_name': ['a','b','c','d'], 'latitude': [1,2,3,4], 'longitude': [10,20,30,40]})
    d1_names = pd.Series(['a','c','d'])
    
    suburban_rail
    
        station_name    latitude    longitude
    0   a               1           10
    1   b               2           20
    2   c               3           30
    3   d               4           40
    

    Now, to answer your question: pass the boolean mask returned by .isin to .loc to select the matching rows:

    suburban_rail.loc[suburban_rail.station_name.isin(d1_names)]
    
        station_name    latitude    longitude
    0   a               1           10
    2   c               3           30
    3   d               4           40
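
If the same call still returns only a couple of rows on the real data, one possible explanation is that isin tests whole strings for exact equality, while str.contains matches substrings, so any extra character in the column defeats isin but not str.contains. The sketch below is only an illustration under assumed data: the zero-width space and the footnote marker are hypothetical examples of scraping residue, not something confirmed by the question, whose data was shown only in screenshots. Note that strip() removes leading and trailing whitespace only, so it would not catch either of these.

```python
import pandas as pd

# Hypothetical data: names that look right on screen but carry extra characters.
suburban_rail = pd.DataFrame({
    'station_name': ['Odintsovo', 'Bakovka\u200b', 'Nemchinovka [1]', 'Lobnya'],
})
d1_names = ['Odintsovo', 'Bakovka', 'Nemchinovka', 'Lobnya']

# isin compares whole strings for exact equality.
exact = suburban_rail['station_name'].isin(d1_names)

# str.contains does a substring search, so it still finds 'Bakovka'.
partial = suburban_rail['station_name'].str.contains('Bakovka')

# repr() exposes hidden characters that printing the dataframe hides.
for name in suburban_rail.loc[~exact, 'station_name']:
    print(repr(name))

# Normalize: drop zero-width spaces and bracketed footnote markers, then strip.
cleaned = (
    suburban_rail['station_name']
    .str.replace('\u200b', '', regex=False)
    .str.replace(r'\s*\[\d+\]', '', regex=True)
    .str.strip()
)
matched = suburban_rail.loc[cleaned.isin(d1_names)]
```

Printing the unmatched names with repr() is usually the quickest way to see what the column actually contains; the cleaning step should then be adapted to whatever residue shows up.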