r, sorting, praat, phonetics

Making a new column based on two existing columns


The dummy data frame below comes from some linguistics data. I'm looking to make new columns based on the values of column type ('C1', 'V1', 'C2', 'V2', 'B', and 'E'), filled with the numeric values from column duration.

           file sentence iSentence wordType FCPhrase V1Vowel sampleQuality phoneme type   duration
1 PS_G.TextGrid    kalla         2        G        F       S             B       B    B 0.18929576
2 PS_G.TextGrid    kalla         2        G        F       S             B       k   C1 0.09941696
3 PS_G.TextGrid    kalla         2        G        F       S             B       ə   V1 0.05025876
4 PS_G.TextGrid    kalla         2        G        F       S             B      ll   C2 0.12619718
5 PS_G.TextGrid    kalla         2        G        F       S             B      a:   V2 0.13646904
6 PS_G.TextGrid    kalla         2        G        F       S             B       E    E 0.53416178

Any help would be appreciated. If this is a duplicate question, kindly provide the link. Thanks.


Solution

  • It's always a good idea to create a unique id per row. Include phoneme in values_from if its values vary along with duration. Also, spread() works just fine, but it's superseded; you can use pivot_wider() instead.

    library(tidyverse)
    
    # Toy data `aux` at the end
    new_aux <- aux %>%
      pivot_wider(
        id_cols     = file:sampleQuality,
        names_from  = type,
        values_from = c(phoneme, duration),
        names_sep   = "_",
        names_vary  = "slowest"
      ) %>%
      rowid_to_column("unique_id")
    

    Output:

    > glimpse(new_aux)
    Rows: 1
    Columns: 20
    $ unique_id     <int> 1
    $ file          <chr> "PS_G.TextGrid"
    $ sentence      <chr> "kalla"
    $ iSentence     <dbl> 2
    $ wordType      <chr> "G"
    $ FCPhrase      <lgl> FALSE
    $ V1Vowel       <chr> "S"
    $ sampleQuality <chr> "B"
    $ phoneme_B     <chr> "B"
    $ duration_B    <dbl> 0.1892958
    $ phoneme_C1    <chr> "k"
    $ duration_C1   <dbl> 0.09941696
    $ phoneme_V1    <chr> "ə"
    $ duration_V1   <dbl> 0.05025876
    $ phoneme_C2    <chr> "ll"
    $ duration_C2   <dbl> 0.1261972
    $ phoneme_V2    <chr> "a:"
    $ duration_V2   <dbl> 0.136469
    $ phoneme_E     <chr> "E"
    $ duration_E    <dbl> 0.5341618
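
    If only the durations are needed, as the question literally asks, a smaller variant may be enough. The following is a minimal sketch, not part of the original reprex: `aux_small` and `durations_wide` are hypothetical names, and `aux_small` is a trimmed copy of the toy data that keeps only the columns needed here.

    ```r
    library(tidyr)
    library(tibble)

    # Trimmed copy of the toy data (assumption: only these columns matter here)
    aux_small <- tribble(
      ~file,           ~sentence, ~phoneme, ~type, ~duration,
      "PS_G.TextGrid", "kalla",   "B",      "B",   0.18929576,
      "PS_G.TextGrid", "kalla",   "k",      "C1",  0.09941696,
      "PS_G.TextGrid", "kalla",   "ə",      "V1",  0.05025876,
      "PS_G.TextGrid", "kalla",   "ll",     "C2",  0.12619718,
      "PS_G.TextGrid", "kalla",   "a:",     "V2",  0.13646904,
      "PS_G.TextGrid", "kalla",   "E",      "E",   0.53416178
    )

    # id_cols leaves out phoneme, which differs on every row and would
    # otherwise keep the six rows from collapsing into a single one
    durations_wide <- pivot_wider(
      aux_small,
      id_cols     = c(file, sentence),
      names_from  = type,
      values_from = duration
    )

    durations_wide
    ```

    With a single values_from column, the new columns are named after the values of type alone (B, C1, V1, C2, V2, E), in order of first appearance, so no names_sep is needed.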
    

    Toy data:

    aux <- tibble::tribble(
      ~file, ~sentence, ~iSentence, ~wordType, ~FCPhrase, ~V1Vowel, ~sampleQuality, ~phoneme, ~type,  ~duration,
      "PS_G.TextGrid",   "kalla",          2,       "G",     FALSE,      "S",            "B",      "B",   "B", 0.18929576,
      "PS_G.TextGrid",   "kalla",          2,       "G",     FALSE,      "S",            "B",      "k",  "C1", 0.09941696,
      "PS_G.TextGrid",   "kalla",          2,       "G",     FALSE,      "S",            "B",      "ə",  "V1", 0.05025876,
      "PS_G.TextGrid",   "kalla",          2,       "G",     FALSE,      "S",            "B",     "ll",  "C2", 0.12619718,
      "PS_G.TextGrid",   "kalla",          2,       "G",     FALSE,      "S",            "B",     "a:",  "V2", 0.13646904,
      "PS_G.TextGrid",   "kalla",          2,       "G",     FALSE,      "S",            "B",      "E",   "E", 0.53416178
    )
    

    Created on 2024-05-09 with reprex v2.1.0