I have this dataset, and I am trying to create a new variable (n_commitments) that gives the total number of paragraphs per country. I know this is super basic, but I have somehow been stuck on it for an hour now. I think it has something to do with the fact that both variables are of class character and I want a numeric output.
Please help so I can finally move on. Thank you.
structure(list(country = c("Afghanistan", "Afghanistan"), paragraphs = c("The representative of Afghanistan confirmed that his Government would ensure the transparency of its ongoing privatization programme. He stated that his Government would provide reports to WTO Members on developments in its privatisation programme, periodically and upon request, as long as the programme would be in existence, and along the lines of the information already provided to the Working Party during the accession process. The Working Party took note of this commitment. ",
"The representative of Afghanistan confirmed that from the date of accession, State-trading enterprises (including State-owned and State-controlled enterprises, enterprises with special or exclusive privileges, and unitary enterprises) in Afghanistan would make any purchases or sales, which were not for the Government's own use or consumption, solely in accordance with commercial considerations, including price, quality, availability, marketability, transportation and other conditions of purchase or sale. He further confirmed that these State trading enterprises would afford the enterprises of other Members adequate opportunity, in accordance with customary business practice, to compete for participation in purchases from or sales to Afghanistan's State enterprises. The Working Party took note of these commitments. "
)), row.names = 1:2, class = "data.frame")
Columns: 8
$ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanis…
$ category <chr> "State Ownership and Privatization; State-Trading Entities", "State Ownership and Pr…
$ paragraphs <chr> "The representative of Afghanistan confirmed that his Government would ensure the tr…
$ year_complete <int> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, …
$ year_start <int> 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, …
$ accession_duration <int> 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, …
$ wto <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ n_commitments <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", …
Here's how to count the unique paragraphs by country:
library(dplyr)

df %>%
  group_by(country) %>%
  summarize(n_unique_paragraphs = n_distinct(paragraphs))
If, as you say, "each row of the data is a unique paragraph", then we can simplify and just count rows:
df %>%
  group_by(country) %>%
  summarize(n = n())
There's also a built-in utility function for this:
df %>% count(country)
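The snippets above return one summary row per country. If the goal is instead to keep every row and fill the existing n_commitments column, a grouped mutate() is one way to do it. This is only a sketch: the three-row data frame below is a made-up stand-in for the real data (the "Albania" row and the shortened paragraph texts are assumptions for illustration), and df_counted is a hypothetical name.

```r
library(dplyr)

# Toy stand-in for the real data (hypothetical values, not from the question).
# n_commitments starts out as an empty character column, as in the glimpse() output.
df <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Albania"),
  paragraphs = c("First commitment.", "Second commitment.", "Third commitment."),
  n_commitments = c("", "", ""),
  stringsAsFactors = FALSE
)

# Overwrite n_commitments with the number of rows in each country's group.
# n() returns an integer, so the column is no longer character afterwards.
df_counted <- df %>%
  group_by(country) %>%
  mutate(n_commitments = n()) %>%
  ungroup()

df_counted
```

Because the column is replaced rather than converted, the fact that country and paragraphs are character columns does not matter here; only the number of rows per group is used.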