I have a tibble with 28 rows:
> al
# A tibble: 28 x 1
lang_name
<chr>
1 Objective-C,Swift,Other
2 Ruby,Shell
3 Ruby,HTML,Shell
4 Java,HTML,Kotlin,Other
5 TypeScript,JavaScript,CSS,Inno Setup,Shell,HTML
6 Vue,JavaScript,CSS,HTML
7 HTML,JavaScript,CSS
8 JavaScript,HTML,CSS,Other
9 NA
10 Vim script,Ruby,Shell,Python,CoffeeScript,Makefile,Other
# ... with 18 more rows
Which I got by slicing the other data frame with al <- gh[,'lang_name']. I want to extract the data from every row and place it all in a single list, so I can find the unique values.
How do I do that?
I have tried splitting with al <- str_split(al, ","), but it returns the following list:
[[1]]
[1] "c(\"Objective-C" "Swift" "Other\"" " \"Ruby"
[5] "Shell\"" " \"Ruby" "HTML" "Shell\""
[9] " \"Java" "HTML" "Kotlin" "Other\""
[13] " \"TypeScript" "JavaScript" "CSS" "Inno Setup"
[17] "Shell" "HTML\"" " \"Vue" "JavaScript"
[21] "CSS" "HTML\"" " \"HTML" "JavaScript"
[25] "CSS\"" " \"JavaScript" "HTML" "CSS"
[29] "Other\"" " NA" " \"Vim script" "Ruby"
[33] "Shell" "Python" "CoffeeScript" "Makefile"
[37] "Other\"" " \"PHP\"" " \"JavaScript" "TypeScript"
[41] "Other\"" " \"JavaScript" "Other\"" " \"JavaScript"
[45] "CSS" "Shell\"" " \"Ruby" "JavaScript"
[49] "HTML" "Vue" "CSS" "Shell\""
[53] " \"Go" "Assembly" "HTML" "C"
[57] "Shell" "Perl\"" " \"Go" "HCL"
[61] "Other\"" " \"JavaScript\"" " \"C++" "JavaScript"
[65] "Python" "Go" "Shell" "C\""
[69] " \n\"JavaScript" "CSS" "HTML" "Other\""
[73] " \"C++" "Cuda" "C" "CMake"
[77] "Java" "Python" "Other\"" " \"JavaScript"
[81] "GLSL\"" " \"JavaScript" "TypeScript" "CSS\""
[85] " \"Kotlin" "C" "Makefile" "HTML"
[89] "C++" "Java" "Other\"" " \"Java"
[93] "Other\"" " \"Python" "Jupyter Notebook" "C++"
[97] "HTML" "Shell" "JavaScript\"" " \"CSS"
[101] "JavaScript" "HTML" "Other\"" " \"HTML"
[105] "CSS" "JavaScript\")"
And unique(al) simply returns the same string.
I have also tried to put it all as a character:
al <- gh[1,'lang_name']
i = 2
while(i < nrow(gh)) {
  al <- paste(al, ",", gh[i+1,'lang_name'])
  i = i + 1
}
Which results in the following character: [1] "Objective-C,Swift,Other , Ruby,HTML,Shell , Java,HTML,Kotlin,Other , TypeScript,JavaScript,CSS,Inno Setup,Shell,HTML , Vue,JavaScript,CSS,HTML , HTML,JavaScript,CSS , JavaScript,HTML,CSS,Other , NA , Vim script,Ruby,Shell,Python,CoffeeScript,Makefile,Other , PHP , JavaScript,TypeScript,Other , JavaScript,Other , JavaScript,CSS,Shell , Ruby,JavaScript,HTML,Vue,CSS,Shell , Go,Assembly,HTML,C,Shell,Perl , Go,HCL,Other , JavaScript , C++,JavaScript,Python,Go,Shell,C , JavaScript,CSS,HTML,Other , C++,Cuda,C,CMake,Java,Python,Other , JavaScript,GLSL , JavaScript,TypeScript,CSS , Kotlin,C,Makefile,HTML,C++,Java,Other , Java,Other , Python,Jupyter Notebook,C++,HTML,Shell,JavaScript , CSS,JavaScript,HTML,Other , HTML,CSS,JavaScript"
I don't know how to split that into separate strings so I can run unique on them.
If you like tidyverse/purrr functions, you can do this in one piped step. stringr::str_split is a convenient wrapper around stringi::stri_split. purrr::reduce lets you apply a function, in this case c, repeatedly until the entire list of vectors returned by str_split is reduced to one character vector. unlist from base R also works well in place of reduce; I have very purrr-focused habits with tasks like this, but that doesn't need to be the default for a simple task.
library(tidyverse)
al$lang_name %>%
str_split(",") %>%
reduce(c) %>%
unique()
#> [1] "Objective-C" "Swift" "Other" "Ruby"
#> [5] "Shell" "HTML" "Java" "Kotlin"
#> [9] "TypeScript" "JavaScript" "CSS" "Inno Setup"
#> [13] "Vue" NA "Vim script" "Python"
#> [17] "CoffeeScript" "Makefile"
Created on 2018-06-03 by the reprex package (v0.2.0).
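For comparison, here is a minimal base R sketch of the unlist approach mentioned above. It assumes a small stand-in data frame built from a few rows of the question (not the asker's real gh data), and it splits the lang_name column itself rather than the whole tibble, which is likely why the original str_split(al, ",") call produced the c(\"...\") fragments. Dropping the NA at the end is optional and is my own addition.

```r
# Stand-in data: a few rows copied from the question, not the real `gh` data.
al <- data.frame(
  lang_name = c("Objective-C,Swift,Other", "Ruby,Shell", "Ruby,HTML,Shell", NA),
  stringsAsFactors = FALSE
)

# Split each row on commas; strsplit() returns a list with one
# character vector per row (an NA row stays NA).
parts <- strsplit(al$lang_name, ",", fixed = TRUE)

# Flatten the list into one character vector, then keep unique values.
langs <- unique(unlist(parts))

# Optional: drop the NA that comes from rows with no languages listed.
langs <- langs[!is.na(langs)]

print(langs)
```

If you want to keep the NA, as in the reprex output above, just skip the is.na() filtering line.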