Get rows from a pandas DataFrame that match a specific string

So here is my code.

import pandas as pd

data = pd.read_csv('cast.csv')  # read_csv already returns a DataFrame

The data look like this.

                          title  year                        name     type  \
0                Closet Monster  2015                    Buffy #1    actor   
1               Suuri illusioni  1985                      Homo $    actor   
2           Battle of the Sexes  2017                     $hutter    actor   
3          Secret in Their Eyes  2015                     $hutter    actor   
4                    Steve Jobs  2015                     $hutter    actor   
...                         ...   ...                         ...      ...   
74996  Mia fora kai ena... moro  2011     Penelope Anastasopoulou  actress   
74997         The Magician King  2004       Tiannah Anastassiades  actress   
74998        Festival of Lights  2010             Zoe Anastassiou  actress   
74999                Toxic Tutu  2016             Zoe Anastassiou  actress   
75000           Fugitive Pieces  2007  Anastassia Anastassopoulou  actress   

                     character     n  
0                      Buffy 4  31.0  
1                       Guests  22.0  
2              Bobby Riggs Fan  10.0  
3              2002 Dodger Fan   NaN  
4      1988 Opera House Patron   NaN  
...                        ...   ...  
74996       Popi voulkanizater  11.0  
74997  Unicycle Race Attendant   NaN  
74998       Guidance Counselor  20.0  
74999        Demon of Toxicity   NaN  
75000             Laundry Girl  25.0  

[75001 rows x 6 columns]

First, I sort the data by year and group it by type, because I only want the rows where type is "actor".

grouped = data.sort_values(['year'],ascending=True).groupby(["type"])
data_2 = pd.DataFrame(grouped.get_group('actor'))

Here is the result.

                                                 title  year  \
21879  From the Manger to the Cross; or, Jesus of Naz...  1912   
20819                               Katastrofen i Dokken  1913   
44273                                   Prins for en dag  1913   
44272                                   Prins for en dag  1913   
17190                                  Ballettens Datter  1913   
...                                                  ...   ...   
44824                                       Devil's Cove  2019   
7343                    Bses Slwl I: The Musical Journey  2020   
35687                              Roses in the Concrete  2020   
24732              Nostradamus Mission 3: Alien Invasion  2020   
28874                                          Inside Me  2023   

                      name   type             character    n  
21879     James D. Ainsley  actor      John the Baptist  6.0  
20819  Hakon Ahnfelt-R?nne  actor        Mental Patient  4.0  
44273         Carl Alstrup  actor    Journalist Herbert  1.0  
44272         Carl Alstrup  actor  Prince Karl Heinrich  1.0  
17190      Svend Aggerholm  actor     Count de Croisset  NaN  
...                    ...    ...                   ...  ...  
44824          Ron Althoff  actor       Officer Bradley  NaN  
7343     Sudarshan Acharya  actor     Sudarshan Acharya  NaN  
35687        Darren Alford  actor                  Seth  NaN  
24732          Misan Akuya  actor      Anunnaki warrior  NaN  
28874       Antonio Alcala  actor                   Max  NaN  

[50000 rows x 6 columns]

Next, I want only the rows where the first name is "Aaron". My idea is to group the data by name and then split each name to get the first name.

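# Group by a plain column name so each key is a string, not a tuple
grouped_2 = data_2.groupby("name")
for keys, group in grouped_2:
  letter_name = keys.split(" ")
  if (letter_name[0] == "Aaron"):
    print(group)  # prints one small DataFrame per matching name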

The result looks like this.

                 title  year               name   type character     n
8266      The Slingers  2013  Aaron (II) Acosta  actor   Bradley   1.0
8267  Unbreakable Bond  2017  Aaron (II) Acosta  actor     James  13.0
        title  year              name   type character   n
9431  Hitters  2017  Aaron (II) Adair  actor     Sonny NaN
                  title  year              name   type character    n
10426  Night Shift (II)  2009  Aaron (II) Adams  actor      Paul  7.0
           title  year               name   type              character     n
27366  Detention  2011  Aaron (II) Albert  actor  Young Principal Verge  51.0
                  title  year               name   type      character   n
10427  The Standard Man  2009  Aaron (III) Adams  actor  Guest of Zach NaN
             title  year                   name   type  \
32555  Patch Adams  1998  Aaron (III) Alexander  actor   

                     character     n  
32555  Children's Ward Patient  43.0  
                      title  year               name   type character     n
10428  Blood on the Highway  2008  Aaron (VII) Adams  actor   Vampire  76.0
               title  year                   name   type character    n
32556  Show of Hands  2008  Aaron (VII) Alexander  actor     Vince  9.0
                 title  year                  name   type   character   n
32558      Big Mistake  2014  Aaron (XI) Alexander  actor       Nemec NaN
32559  Two Men in Town  2014  Aaron (XI) Alexander  actor  Bar Patron NaN
32557   After the Fall  2014  Aaron (XI) Alexander  actor   Bartender NaN
         title  year        name   type   character     n
585  Surrender  2003  Aaron (XV)  actor  Submissive  19.0
           title  year           name   type character    n
586  Two Coyotes  2001  Aaron (XVIII)  actor   Lorenzo  8.0
                title  year                  name   type character     n
32560  Love Like This  2014  Aaron (XX) Alexander  actor   Bernard  11.0
                      title  year         name   type         character   n
478  Bloodshed and Emeralds  1999  Aaron Aames  actor  Cardinal Feelito NaN
              title  year          name   type                character     n
1875  Blood Justice  1995  Aaron Abbott  actor           Anthony's Thug   NaN
1876     Dead Tides  1997  Aaron Abbott  actor  Lt. Quartermaster Green  17.0
              title  year            name   type      character   n
2929  In the Closet  2009  Aaron Abdullah  actor  Spirit Boy #2 NaN
                   title  year             name   type character     n
4227  The Rhino Brothers  2002  Aaron Abernethy  actor     Extra  61.0
        title  year         name   type     character     n
4517  Dagitab  2014  Aaron Abion  actor  Grad Student  60.0
                                     title  year          name   type  \
5784                           The In-Laws  2003  Aaron Abrams  actor   
5785  The Visual Bible: The Gospel of John  2003  Aaron Abrams  actor   
5778             Resident Evil: Apocalypse  2004  Aaron Abrams  actor   
5780                              Siblings  2004  Aaron Abrams  actor   
5779                                 Sabah  2005  Aaron Abrams  actor   
5769                        Cinderella Man  2005  Aaron Abrams  actor   
5787                                  Zoom  2006  Aaron Abrams  actor   
5786                  Young People Fucking  2007  Aaron Abrams  actor   
5772                         Firehouse Dog  2007  Aaron Abrams  actor   
5773                       Flash of Genius  2008  Aaron Abrams  actor   
5767                                Amelia  2009  Aaron Abrams  actor   
5768         At Home by Myself... with You  2009  Aaron Abrams  actor   
5776                    Jesus Henry Christ  2011  Aaron Abrams  actor   
5775                    Jesus Henry Christ  2011  Aaron Abrams  actor   
5766                    388 Arletta Avenue  2011  Aaron Abrams  actor   
5781                       Take This Waltz  2011  Aaron Abrams  actor   
5782                         The Chicago 8  2011  Aaron Abrams  actor   
5774                    It Was You Charlie  2013  Aaron Abrams  actor   
5777                            Regression  2015  Aaron Abrams  actor   
5770                        Closet Monster  2015  Aaron Abrams  actor   
5783                        The Go-Getters  2017  Aaron Abrams  actor   
5765                         #FromJennifer  2017  Aaron Abrams  actor   
5771                                Code 8  2018  Aaron Abrams  actor   

                   character     n  
5784                 Student  17.0  
5785  Man in Temple Crowd #3   NaN  
5778               Assistant  20.0  
5780                  Pastor   9.0  
5779               Paramedic   8.0  
5769                1928 Fan  67.0  
5787      Corporal Lipscombe   NaN  
5786                    Matt   1.0  
5772     Policeman at Bridge  32.0  
5773             Ian Meillor  44.0  
5767             Slim Gordon   8.0  
5768                     Guy   2.0  
5776           Nurse Stewart  23.0  
5775           Malcolm's Dad  23.0  
5766                    Alex   4.0  
5781                   Aaron  10.0  
5782              Lee Weiner   9.0  
5774                     Tom   3.0  
5777                 Farrell  12.0  
5770             Peter Madly   1.0  
5783                    Owen   1.0  
5765          Ralph Sinclair   NaN  
5771                   Actor   NaN  
              title  year         name   type character     n
7639  Night and Day  2003  Aaron Acker  actor  Teenager  15.0
          title  year            name   type character   n
13552  Director  2008  Aaron Addicott  actor    Cop #6 NaN
                     title  year            name   type       character     n
18754              Slammed  2004  Aaron Aguilera  actor  The Eradicator  19.0
18755  The Dead Sleep Easy  2007  Aaron Aguilera  actor        El Tezca   NaN
18752               Avenge  2014  Aaron Aguilera  actor          Vinnie   NaN
18753  Minutes to Midnight  2017  Aaron Aguilera  actor           Angus   NaN
                   title  year           name   type character     n
18893  Vale Tudo Project  2009  Aaron Aguirre  actor      Lupo   NaN
18892     Know Thy Enemy  2009  Aaron Aguirre  actor     Snaps  24.0
         title  year           name   type character     n
19186  Babagwa  2013  Aaron Agustin  actor   Boatman  17.0
                 title  year         name   type      character     n
23895  Split Decisions  1988  Aaron Akins  actor  Man #2 at Bar  34.0
                           title  year          name   type     character   n
26879  George Takei's Allegiance  2016  Aaron Albano  actor  Tom Maruyama NaN
           title  year           name   type            character   n
27706  Del Playa  2015  Aaron Alberti  actor  High School Student NaN
                   title  year            name   type            character   n
28478  Missouri Trippin'  2016  Aaron Albright  actor  Trail Snakes Leader NaN
             title  year                 name   type character   n
28894  Troubadours  2010  Aaron Alcala-Mosley  actor     Jesse NaN
             title  year             name   type             character     n
30807  Pornography  2009  Aaron Aldorisio  actor  Video Store Customer  36.0
                      title  year           name   type  \
31409  Gospel of Wonderland  2008  Aaron Aleiner  actor   

                    character   n  
31409  Second Plainclothesman NaN  
                   title  year             name   type character     n
31439  Deep in the Heart  2012  Aaron Alejandro  actor   Himself  32.0
             title  year             name   type        character     n
32554  The In-Laws  2003  Aaron Alexander  actor  Frat Brother #1  48.0
                title  year       name   type character     n
36180  Mischief Night  2006  Aaron Ali  actor     Ifzah  63.0
                title  year         name   type character   n
38401  Mr. Dungbeetle  2005  Aaron Allen  actor      Tony NaN
                       title  year            name   type character     n
41468  In the Dead of Winter  2013  Aaron Allister  actor  Guard #1  20.0
               title  year          name   type character   n
43792  Use Your Head  1996  Aaron Alpern  actor    Duncan NaN
                  title  year           name   type     character   n
44433  Die Unsichtbaren  2017  Aaron Altaras  actor  Eugen Friede NaN
         title  year          name   type character   n
47791  Swamper  2005  Aaron Amaral  actor  Loud Man NaN
                                title  year                name   type  \
32562                   Fall to Grace  2005  Aaron D. Alexander  actor   
32561                     Cherry Bomb  2011  Aaron D. Alexander  actor   
32563                         Fess Up  2015  Aaron D. Alexander  actor   
32565              Last Girl Standing  2015  Aaron D. Alexander  actor   
32566               Second Impression  2016  Aaron D. Alexander  actor   
32564  Fun with Hackley: Axe Murderer  2017  Aaron D. Alexander  actor   

                  character   n  
32562  Basketball Player #3 NaN  
32561            Ed Randall NaN  
32563         Alan Chambers NaN  
32565      Police Officer 2 NaN  
32566         Store Manager NaN  
32564            Sugar Duke NaN  
                  title  year                       name   type  character  \
587  Rose of Santa Rosa  1947  Aaron Gonzales' Orchestra  actor  Orchestra   

        n  
587  11.0  
              title  year              name   type  \
27947  Terror Tract  2000  Aaron J. Alberts  actor   

                                        character    n  
27947  Lawnmower Man (segment "Make Me An Offer")  6.0  
              title  year             name   type        character   n
4026  I Before Thee  2016  Aaron M. Abelto  actor  Jeffery Douglas NaN
4025   Fight Within  2017  Aaron M. Abelto  actor             Omar NaN
     title  year                  name   type    character   n
4306   LBJ  2016  Aaron Michael Abeyta  actor  Senate Aide NaN
                         title  year                 name   type  character  \
43883  Under the Blood-Red Sun  2014  Aaron Scott Alpeter  actor  Principal   

          n  
43883  37.0  

The problem is that the data is no longer sorted by year, and the header (title, year, name, type, ...) is printed multiple times, so the output is not as tidy as the initial data (variable data). How can I keep the data sorted by year and show the header only once, as in the initial data?


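  • import pandas as pd
    data = pd.read_csv('cast.csv')
    # Keep only the actor rows; no groupby is needed for a simple filter.
    data_2 = data[data['type'] == 'actor']
    # Build the mask from data_2 (not data) so it aligns with the frame being filtered,
    # then sort once so the result is a single DataFrame ordered by year.
    output = data_2[data_2['name'].str.startswith('Aaron')].sort_values('year')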
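
The loop in the question prints one small DataFrame per name, which is why the header repeats. groupby also iterates its groups in sorted key order (here, by name), so the overall year ordering is lost. The sketch below shows one way to get a single frame instead: it compares the first whitespace-separated token of each name to "Aaron" and sorts once at the end. The helper name `actors_with_first_name` and the tiny in-memory frame are assumptions for illustration, standing in for cast.csv. The "Aaronson Example" row is invented to show that a prefix test such as `startswith('Aaron')` would also match it, while the token comparison does not.

```python
import pandas as pd

# Hypothetical miniature stand-in for cast.csv, with the same columns as the question.
# The last row ("Aaronson Example") is invented purely to illustrate prefix matching.
data = pd.DataFrame({
    "title": ["Code 8", "The In-Laws", "Toxic Tutu", "Dagitab", "Steve Jobs", "Made Up Film"],
    "year": [2018, 2003, 2016, 2014, 2015, 1999],
    "name": ["Aaron Abrams", "Aaron Abrams", "Zoe Anastassiou",
             "Aaron Abion", "$hutter", "Aaronson Example"],
    "type": ["actor", "actor", "actress", "actor", "actor", "actor"],
    "character": ["Actor", "Student", "Demon of Toxicity",
                  "Grad Student", "1988 Opera House Patron", "Extra"],
    "n": [None, 17.0, None, 60.0, None, 1.0],
})


def actors_with_first_name(df, first_name):
    """Return the actor rows whose first name token equals first_name, sorted by year."""
    is_actor = df["type"] == "actor"
    # Split on whitespace and keep the first token: "Aaron (II) Adams" -> "Aaron".
    first_token = df["name"].str.split().str[0]
    mask = is_actor & (first_token == first_name)
    # A stable sort keeps the existing row order for rows that share a year.
    return df[mask].sort_values("year", kind="stable")


output = actors_with_first_name(data, "Aaron")
print(output)  # one DataFrame, so the header is printed only once
```

Because `output` is a single DataFrame, it can be printed, saved with `to_csv`, or filtered further like the original `data`.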