Search code examples
rdataframecounting

Counting different characters in a large dataframe


I'm looking to count the number of times different words appear in a data frame, and then to remake that into a new data frame that shows the counts for each word.

For example, I have a data table like this:

Col1 Col2 Col3 Col4 Col5 Continues...
Passwords1 GHSME12 POWDER2 JOHNC PLOW01 PLANE
Usercode20 HUNG1 GHSME12 PLOW01 GORGE09 JOHNC
Usercode15 PLOW01 GORGE09 JOHNC POWDER2 SYRUP9
Continues... ... ... ... ... ...

I want to be able to count the number of times each word in the data appears for each Col1. While I could do things such as the number of items WordX = wordX, there are hundreds of passwords, making manual counting difficult, so I wonder if I have to use a for loop and a blank data frame in this situation to achieve something like this:

Passwords Passwords1 Usercode20 Usercode15 Continues...
GHSME12 1 1 0 ...
POWDER2 1 0 1 ...
JOHNC 1 1 1 ...
PLOW01 1 1 1 ...
PLANE 1 0 0 ...
HUNG1 0 1 0 ...
GORGE09 0 1 1 ...
SYRUP9 0 0 1 ...

I would appreciate it if someone has a good idea about tackling this. Thank you!


Solution

  • table(cbind(stack(df, -Col1)['values'], df['Col1']))
    
             Col1
    values    Passwords1 Usercode15 Usercode20
      GHSME12          1          0          1
      GORGE09          0          1          1
      HUNG1            0          0          1
      JOHNC            1          1          1
      PLANE            1          0          0
      PLOW01           1          1          1
      POWDER2          1          1          0
      SYRUP9           0          1          0
    

    tidyverse:

    library(tidyverse)
    df %>%
       pivot_longer(-Col1) %>%
       pivot_wider(names_from = Col1, values_from = name, 
                   values_fn = length, values_fill = 0)
    
    # A tibble: 8 x 4
      value   Passwords1 Usercode20 Usercode15
      <chr>        <int>      <int>      <int>
    1 GHSME12          1          1          0
    2 POWDER2          1          0          1
    3 JOHNC            1          1          1
    4 PLOW01           1          1          1
    5 PLANE            1          0          0
    6 HUNG1            0          1          0
    7 GORGE09          0          1          1
    8 SYRUP9           0          0          1