Tags: r, binary-data, frequency-distribution

Turning a binary database into a frequency table


I am using R to write a report for a class, and I have a fairly large binary database (149 rows × 31 columns) in which 1 indicates presence and NA indicates absence.

    # A tibble: 149 × 31
        Vide Copé. Ca…¹ Copé.…² Copé.…³ Copé.…⁴ Polyc…⁵ Néréi…⁶ Pecti…⁷ Crang…⁸ Mysid…⁹
       <dbl>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
     1     0          0       0       0       0       0       0       0       0       0
     2     0          0       0       0       0       0       1       0       0       0
     3     0          0       0       0       0       0       1       0       0       0
     4     0          0       0       0       0       0       0       0       0       1
     5     0          0       0       0       0       0       1       0       0       0
     6     0          0       0       0       0       0       0       0       0       0
     7     0          0       0       0       0       0       1       0       0       0
     8     0          0       0       0       0       0       1       0       0       0
     9     0          0       0       0       0       0       0       0       0       0
    10     0          0       0       0       0       0       0       0       0       0
    # … with 139 more rows, 21 more variables: `Carides sp.` <dbl>, Amphipodes <dbl>,
    #   `Pandalidés(crevette nordique)` <dbl>, Cumacés <dbl>, Isopodes <dbl>,
    #   `Crustacés sp.` <dbl>, Éperlan...17 <dbl>, Capucette <dbl>,
    #   `Épinoche sp.` <dbl>, `Poisson sp.` <dbl>, Gastéropode <dbl>, Bivalve <dbl>,
    #   `Poulamon Atlantique` <dbl>, `Éperlan arc-en-ciel` <dbl>, Éperlan...25 <dbl>,
    #   HARENG <dbl>, OSMÉRIDÉ <dbl>, Moronidé <dbl>, `Bar rayé` <dbl>, Baret <dbl>,
    #   `Alose savoureuse` <dbl>, and abbreviated variable names ¹`Copé. Cala.`, …
    # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
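The description says absences are coded as NA, while the printout shows 0s. A quick way to check which codes actually occur is to tally every cell with `table()`. This is sketched here on a small hypothetical stand-in for `dat`; the column names and values are made up for illustration:

```r
# Hypothetical stand-in for the real 149 x 31 tibble (names and values invented)
dat <- data.frame(
  Vide      = c(0, 0, 0, NA),
  Copepodes = c(0, 1, NA, 0),
  Nereides  = c(1, 1, 0, NA)
)

# Flatten all columns into one vector and count each distinct value;
# useNA = "ifany" keeps NA as its own category in the tally
codes <- table(unlist(dat), useNA = "ifany")
codes
```

Either coding works for a sum-based count, because `sum(..., na.rm = TRUE)` ignores the NAs and the 0s add nothing to the total.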

I need a table showing the frequency of presence (the number of rows containing a 1) for each category:

                    Frequency
    Vide                    0
    Copépodes               2
    Néréidés sp.            5
    etc.

Is there a way to do this without rebuilding the database from scratch? I can't find how to do it online. This is my first time posting a question here, and I'm quite new to R, so I'm not sure how to approach this.


Solution

  • If we are using the tidyverse, we can `summarise()` every column with `sum()` (and `pivot_longer()` the result if a long table is needed):

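    # the native pipe |> and the \(x) lambda shorthand need R >= 4.1
    library(dplyr)
    library(tidyr)
    
    dat |>
        # one row: number of 1s (presences) per column; NAs are skipped
        summarise(across(everything(), \(x) sum(x, na.rm = TRUE))) |>
        # reshape to one row per category: columns `name` and `Frequency`
        pivot_longer(everything(), values_to = "Frequency")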
    

    Output with @r2evans' sample data:

    # A tibble: 5 × 2
      name  Frequency
      <chr>     <int>
    1 a             5
    2 b             6
    3 c             1
    4 d             8
    5 e             7
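A base-R alternative that needs no packages is `colSums()`, which also gives the layout sketched in the question (categories as row names, one `Frequency` column). This is a sketch that assumes every column of `dat` is numeric (0/1/NA); the toy data, the sorting, and the `Proportion` column are illustrative additions, not part of the original answer:

```r
# Hypothetical toy data: 1 = present, NA = absent; all columns numeric
dat <- data.frame(
  Vide      = rep(NA_real_, 5),
  Copepodes = c(1, NA, 1, NA, NA),
  Nereides  = c(1, 1, 1, NA, 1)
)

# colSums() adds up each column; na.rm = TRUE skips the absent (NA) cells,
# so each total is the number of rows in which the category is present
counts <- colSums(dat, na.rm = TRUE)

# One "Frequency" column with the categories as row names, most frequent first
freq <- data.frame(Frequency = sort(counts, decreasing = TRUE))

# Optional: share of rows (samples) in which each category is present
freq$Proportion <- freq$Frequency / nrow(dat)
freq
```

If `dat` also holds non-numeric columns (a sample ID, say), `colSums()` stops with an error, so select the numeric ones first, e.g. `colSums(dat[sapply(dat, is.numeric)], na.rm = TRUE)`.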