I am trying to extract countries from the rows of a data frame. Here is the sample data I currently have:
import pandas as pd

# Each row: [raw author/affiliation string, year].
# Adjacent string literals are concatenated, so each affiliation stays one string.
l = [
    ['[Aydemir, Deniz\', \' Gunduz, Gokhan\', \' Asik, Nejla] Bartin '
     'Univ, Fac Forestry, Dept Forest Ind Engn, TR-74100 Bartin, '
     'Turkey\', \' [Wang, Alice] Lulea Univ Technol, Wood Technol, '
     'Skelleftea, Sweden', 1990],
    ['[Fang, Qun\', \' Cui, Hui-Wang] Zhejiang A&F Univ, Sch Engn, Linan '
     '311300, Peoples R China\', \' [Du, Guan-Ben] Southwest Forestry '
     'Univ, Kunming 650224, Yunnan, Peoples R China', 2005],
    ['[Blumentritt, Melanie\', \' Gardner, Douglas J.\', \' Shaler '
     'Stephen M.] Univ Maine, Sch Resources, Orono, ME USA\', \' [Cole, '
     'Barbara J. W.] Univ Maine, Dept Chem, Orono, ME 04469 USA', 2012],
    ['[Kyvelou, Pinelopi; Gardner, Leroy; Nethercot, David A.] Univ '
     'London Imperial Coll Sci Technol & Med, London SW7 2AZ, '
     'England', 1998],
]
dataf = pd.DataFrame(l, columns=['Authors', 'Year'])
That is the data frame. Here is the code that extracts the countries:
import geotext

# Normalize "USA" first, then let GeoText pick out the country names per row.
df = (dataf['Authors']
      .replace(r"\bUSA\b", "United States", regex=True)
      .apply(lambda x: geotext.GeoText(x).countries))
The original problem was that GeoText didn't recognize "USA", which the .replace call fixes. I have now noticed that I also need to change "England", "Scotland", "Wales" and "Northern Ireland" to "United Kingdom".
How can I extend .replace to achieve this?
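You can pass a dictionary of replacements to Series.replace together with regex=True, just as you already do for "USA". (Series.str.translate is not suitable here: it only maps single characters, so it cannot replace whole words.) Each key is treated as a regular expression, so the \b word boundaries are kept:
dataf.Authors.replace({
    r"\bUSA\b": "United States",
    r"\bEngland\b": "United Kingdom",
    r"\bScotland\b": "United Kingdom",
    r"\bWales\b": "United Kingdom",
    r"\bNorthern Ireland\b": "United Kingdom",
}, regex=True)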
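If the list of aliases grows, it may be easier to keep a plain dictionary and build one pattern from it. The following is only a minimal sketch using the standard re module. The names COUNTRY_ALIASES and normalize_countries are made up for illustration and are not part of pandas or GeoText.

```python
import re

# Hypothetical alias table; extend it with any other names GeoText misses.
COUNTRY_ALIASES = {
    "USA": "United States",
    "England": "United Kingdom",
    "Scotland": "United Kingdom",
    "Wales": "United Kingdom",
    "Northern Ireland": "United Kingdom",
}

# Build a single alternation, longest alias first, wrapped in word boundaries
# so that e.g. "USAID" is left untouched.
_ALIAS_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(k) for k in sorted(COUNTRY_ALIASES, key=len, reverse=True))
    + r")\b"
)


def normalize_countries(text):
    """Replace every known alias in `text` with its canonical country name."""
    return _ALIAS_PATTERN.sub(lambda m: COUNTRY_ALIASES[m.group(0)], text)


print(normalize_countries("Univ Maine, Dept Chem, Orono, ME 04469 USA"))
print(normalize_countries("London SW7 2AZ, England"))
```

Assuming the data frame from the question, this could be applied per row with dataf['Authors'].apply(normalize_countries) before handing the result to GeoText. The text is scanned once regardless of how many aliases the dictionary holds.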