r, string, cpu-word, letter

Count the letters of all words in a given text and sort the words from fewest letters to most


I need to take a sample of 5 sentences from, for example, the example sentences in the tidyverse. After taking those 5 sentences, I need a function that counts the letters in every word of the sample and sorts the text by those counts, from the words with the fewest letters to the words with the most.


Solution

    1. Sorted by length of the words only

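    library(tidyverse)  # loads stringr, purrr and the %>% pipe
    
    s       <- "The first worm gets snapped early. The sink is the thing in which we pile dishes. A big wet stain was on the round carpet. A fence cuts through the corner lot. Peep under the tent and see the clowns. Next Sunday is the twelfth of the month."
    # Split the text into words; boundary("word") drops spaces and punctuation
    s_split <- s %>% str_extract_all(stringr::boundary("word")) %>% unlist()
    
    s_split %>% 
      str_length() %>%            # number of characters in each word
      order() %>%                 # positions of the words, shortest first
      s_split[.] %>%              # reorder the words by those positions
      str_c(collapse = " ") %>%   # join the words back into one string
      str_to_lower()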
    
    [1] "a a is in we on is of the the the big wet was the the lot the and see the the the worm gets sink pile cuts peep tent next first early thing which stain round fence under month dishes carpet corner clowns sunday snapped through twelfth"
    

    If you want to analyse several strings, wrap the steps in a function and apply it to each string with map():

    order_by_length <- function(input) {
      
      s_split <- input %>% str_extract_all(stringr::boundary("word")) %>% unlist()
      
      s_split %>% 
        str_length() %>% 
        order() %>% 
        s_split[.] %>% 
        str_c(collapse = " ") %>% 
        str_to_lower()
      
    }
    
    string_1 <- "This is a test string"
    string_2 <- "Here we have another sentence as an example"
    string_3 <- "Let's demonstrate even a third string"
    
    string_list <- list(string_1, string_2, string_3)
    map(string_list, order_by_length)
    [[1]]
    [1] "a is this test string"
    
    [[2]]
    [1] "we as an here have another example sentence"
    
    [[3]]
    [1] "a even let's third string demonstrate"
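
The question also asks for the letter counts themselves, not only the sorted text. A minimal sketch of one way to keep the counts next to the words is shown here; `count_letters` is a hypothetical helper name, not part of the answer above, and "letters" is taken to mean `str_length()`, so an apostrophe as in "Let's" is counted too.

```r
library(tidyverse)

# Hypothetical helper (an assumption, not from the original answer):
# returns one row per word with its character count, shortest words first.
count_letters <- function(input) {
  # Same word splitting as in order_by_length()
  words <- input %>% str_extract_all(boundary("word")) %>% unlist()

  tibble(
    word      = str_to_lower(words),
    n_letters = str_length(words)
  ) %>%
    arrange(n_letters)  # sort rows by word length, ascending
}

res <- count_letters("This is a test string")
print(res)
```

Returning a tibble rather than a single string keeps the counts available for further steps such as filtering or summarising.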
    

    2. Sorted first by length and then alphabetically

    Use split() to group the words by length (the groups come back in ascending order of length) and str_sort() to sort the words within each group alphabetically:

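    order_by_length2 <- function(input) {
      
      input %>% 
        str_extract_all(stringr::boundary("word")) %>% 
        unlist() %>% 
        split(f = str_length(.)) %>%     # one group of words per word length
        map(str_sort) %>%                # sort each group alphabetically
        unlist(use.names = FALSE) %>%    # flatten the groups, shortest first
        str_c(collapse = " ") %>% 
        str_to_lower()
      
    }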
    # 1. One string
    order_by_length2(s)
    [1] "a a in is is of on we and big lot see the the the the the the the the the was wet cuts gets next peep pile sink tent worm early fence first month round stain thing under which carpet clowns corner dishes sunday snapped through twelfth"
    
    # 2. Multiple strings
    map(string_list, order_by_length2)
    [[1]]
    [1] "a is test this string"
    
    [[2]]
    [1] "an as we have here another example sentence"
    
    [[3]]
    [1] "a even let's third string demonstrate"
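
The question starts from a sample of 5 example sentences, a step the answer does not show. The sketch below assumes the sentences meant are `stringr::sentences` (the built-in sentence vector in stringr) and uses base R `sample()` to draw them; the seed value is arbitrary and only makes the draw repeatable. `order_by_length()` is restated in compact form so the block runs on its own.

```r
library(tidyverse)

# Compact restatement of order_by_length() from the answer above
order_by_length <- function(input) {
  s_split <- input %>% str_extract_all(boundary("word")) %>% unlist()
  s_split[order(str_length(s_split))] %>%
    str_c(collapse = " ") %>%
    str_to_lower()
}

set.seed(1)  # arbitrary seed (assumption), only for a reproducible draw
five_sentences <- sample(stringr::sentences, 5)

# Option A: sort the words within each sampled sentence separately
sorted_each <- map_chr(five_sentences, order_by_length)

# Option B: treat the 5 sentences as one text and sort all words together
sorted_all <- five_sentences %>%
  str_c(collapse = " ") %>%
  order_by_length()

print(sorted_each)
print(sorted_all)
```

Which option fits depends on whether "the text" in the question means each sentence or the whole sample; swapping in `order_by_length2()` would additionally sort words of equal length alphabetically.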