I wrote the code below to look for the word "nationality" in a job postings dataset. I am trying to see how many employers specify that a candidate must be of a particular visa type or nationality.
I know that in the raw data itself (in Excel), there are several cases where the job description mentions the word "nationality".
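nationality_finder = function(string){
  nationality = c(" ")
  # split the string into a vector of single characters
  split_string = strsplit(string, split = NULL)
  split_string = split_string[[1]]
  flag = 0
  for(letter in split_string){
    # while flag > 0, append the current character to the output
    if(flag > 0){nationality = append(nationality, letter)}
    # compare the current character with "nationality "
    if(letter == "nationality "){flag = 1}
    # each space lowers the flag by 0.5, so collection stops after the second space
    if(letter == " "){flag = flag-0.5}
  }
  # glue the collected characters back into one string
  nationality = paste(nationality, collapse = '')
  return(nationality)
}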
for(n in 1:length(df2$description)){
  df2$nationality[n] <- nationality_finder(df2$description[n])
}
df2 %>%
  view()
The code runs without errors, but it is not producing what I am looking for. I want to create another variable where 1 indicates that the word "nationality" is mentioned, and 0 otherwise. Specifically, I am looking for words such as "citizen" and "nationality" in the job description variable. The text of each job description is extremely long, so here I give a shortened version for brevity.
An example of a job description in the dataset:
Title: Event Planner
Nationality: Saudi National
Location: Riyadh, Saudi Arabia
Salary: Open
Salary depends on the candidates skills, experience, and other attributes.
Another job description:
- Have recently graduated or looking for a career change and be looking for
an entry level role (we will offer full training)
- Priority will be taken for applications by U.S. nationality holders
You can try something like this. I'm assuming you have a data.frame called dats and want to add a new column to it.
dats$check <- as.numeric(grepl("nationality", dats$description, ignore.case = TRUE))
dats$check
[1] 1 1 0 1
grepl() detects the string "nationality" in the column dats$description, ignoring case (ignore.case = TRUE), and as.numeric() converts TRUE/FALSE into 1/0.
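A quick console check suggests why the original nationality_finder() returns only a blank: strsplit(string, split = NULL) yields single characters, so the test letter == "nationality " can never be TRUE and the flag is never set. grepl(), by contrast, searches each whole string and is vectorized, so the for loop over rows is not needed either. The string x below is a made-up example, not taken from the dataset.

```r
x <- "Nationality: Saudi National"

# split = NULL breaks the string into single characters
chars <- strsplit(x, split = NULL)[[1]]
head(chars)
#> [1] "N" "a" "t" "i" "o" "n"

# No single character can equal the multi-character string "nationality ",
# so a flag that depends on that comparison is never triggered
any(chars == "nationality ")
#> [1] FALSE

# grepl() tests the whole string, and accepts a whole vector at once
grepl("nationality", c(x, "no match here"), ignore.case = TRUE)
#> [1]  TRUE FALSE
```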
With fake data:
dats <- structure(list(description = c("Title: Event Planner\n \n Nationality: Saudi National\n \n Location: Riyadh, Saudi Arabia\n \n Salary: Open\n \n Salary depends on the candidates skills, experience, and other attributes.",
"- Have recently graduated or looking for a career change and be looking for\n an entry level role (we will offer full training) \n \n - Priority will be taken for applications by U.S. nationality holders ",
"do not have that word here", "aaaaNationalitybb"), check = c(1,
1, 0, 1)), row.names = c(NA, -4L), class = "data.frame")
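The question also mentions "citizen" and counting how many employers use these terms. A possible sketch, assuming the keywords of interest are exactly "nationality" and "citizen" (extend the hypothetical keywords vector as needed) and using made-up descriptions: join the keywords into one regular expression with |. Plain matching also fires inside other words, as the "aaaaNationalitybb" row above shows; adding a \b word boundary (here with perl = TRUE) is one way to tighten that.

```r
# Hypothetical example descriptions; replace with your own column (e.g. df2$description)
dats <- data.frame(
  description = c(
    "Title: Event Planner\n Nationality: Saudi National\n Location: Riyadh, Saudi Arabia",
    "- Priority will be taken for applications by U.S. nationality holders",
    "Applicants must be UAE citizens",
    "do not have that word here",
    "aaaaNationalitybb"
  ),
  stringsAsFactors = FALSE
)

# Build one regular expression: "|" matches either keyword
keywords <- c("nationality", "citizen")
pattern <- paste(keywords, collapse = "|")

# 1 if any keyword appears anywhere (case-insensitive), 0 otherwise
dats$check <- as.integer(grepl(pattern, dats$description, ignore.case = TRUE))
dats$check
#> [1] 1 1 1 0 1

# Stricter variant: require a word boundary before the keyword,
# so "aaaaNationalitybb" no longer counts but "citizens" still does
strict_pattern <- paste0("\\b(", pattern, ")")
dats$check_strict <- as.integer(grepl(strict_pattern, dats$description,
                                      ignore.case = TRUE, perl = TRUE))
dats$check_strict
#> [1] 1 1 1 0 0

# Number of postings that mention at least one keyword
sum(dats$check_strict)
#> [1] 3
```

Applied to the question's data, the equivalent line would be something like df2$nationality <- as.integer(grepl(pattern, df2$description, ignore.case = TRUE)).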