python pandas dataframe data-preprocessing

Keep DataFrame rows whose NAME contains a 5-digit number and replace NAME with that number


Consider the following pandas DataFrame:

              NAME  NUM_OWNERS  NUM_DOCS  NUM_RESIDENTS
             Total    23900137  21028886     44571130.0
      Macael-04062      366607    324413       727945.0
             Spain     4283950   3642683      8464411.0
    Badalona-08911        5829      6250        15480.0
    Vallecas-28031        5691      5215        10358.0
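To make the question reproducible, the sample frame can be rebuilt as below. This is a sketch that assumes NAME is a plain string column and that the counts are integers, except NUM_RESIDENTS, which the question prints as a float.

```python
import pandas as pd

# Sample data from the question (assumed dtypes: NAME is str, counts are int,
# NUM_RESIDENTS is float). NAME stays a string so codes like "04062" keep the leading zero.
df = pd.DataFrame({
    'NAME': ['Total', 'Macael-04062', 'Spain', 'Badalona-08911', 'Vallecas-28031'],
    'NUM_OWNERS': [23900137, 366607, 4283950, 5829, 5691],
    'NUM_DOCS': [21028886, 324413, 3642683, 6250, 5215],
    'NUM_RESIDENTS': [44571130.0, 727945.0, 8464411.0, 15480.0, 10358.0],
})
print(df)
```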

I want to keep only the rows whose NAME contains a 5-digit number and replace the value of NAME with that number.

Resulting DataFrame:

     NAME  NUM_OWNERS  NUM_DOCS  NUM_RESIDENTS
    04062      366607    324413       727945.0
    08911        5829      6250        15480.0
    28031        5691      5215        10358.0

Solution

  • Let's try filtering with str.contains, then splitting on '-' and assigning the last piece as the new NAME:

    # Keep rows whose NAME contains '-', then replace NAME with the text after the last '-'
    out = df[df.NAME.str.contains('-')].assign(NAME=lambda x: x['NAME'].str.split('-').str[-1])
    print(out)
    #     NAME  NUM_OWNERS  NUM_DOCS  NUM_RESIDENTS
    # 1  04062      366607    324413       727945.0
    # 3  08911        5829      6250        15480.0
    # 4  28031        5691      5215        10358.0
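The filter above keys on the hyphen rather than on the digits, so a hyphenated name without a code would also pass. If the rule should literally be "contains a 5-digit number", a regex-based sketch with str.extract is one option. The pattern (?<!\d)(\d{5})(?!\d) is an assumption of mine: it takes the first run of exactly five digits that is not part of a longer number.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'NAME': ['Total', 'Macael-04062', 'Spain', 'Badalona-08911', 'Vallecas-28031'],
    'NUM_OWNERS': [23900137, 366607, 4283950, 5829, 5691],
    'NUM_DOCS': [21028886, 324413, 3642683, 6250, 5215],
    'NUM_RESIDENTS': [44571130.0, 727945.0, 8464411.0, 15480.0, 10358.0],
})

# One capture group + expand=False returns a Series: the 5-digit code, or NaN if none
codes = df['NAME'].str.extract(r'(?<!\d)(\d{5})(?!\d)', expand=False)

# Keep rows where a code was found; assign aligns on the index, so each row gets its own code
out = df[codes.notna()].assign(NAME=codes)
print(out)
```

Because the extracted codes stay strings, leading zeros such as the one in 04062 are preserved; converting them with astype(int) would drop it.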