
for loop over a tibble in R


I'm working in RStudio.

So, I have the basic titanic.csv dataset, whose fifth column is Age. I'm trying to store that entire Age column in a variable and run a for loop over it. When I do, the variable turns out to be a tibble.

The command I used to read the CSV file and store it in a variable named tata is:

library(readr)   # read_csv() comes from readr
tata <- read_csv("titanic.csv")

The CSV file is in the same directory as the .R file, so reading the file isn't the issue here.

Getting the fifth column (Age) into a variable x:

x <- tata[,5]

When I print x, I get this in the console:

[screenshot of the console output, not reproduced]

Then I try to print one line per person that says: The nth person has age: (the value)

for (age in x) {
  print(paste("The", n , "th person has age:", age))
  n = n + 1
}

I get this output:

  [1] "The 1 th person has age 22"   "The 1 th person has age 38"  
  [3] "The 1 th person has age 26"   "The 1 th person has age 35"  
  [5] "The 1 th person has age 35"   "The 1 th person has age 27"  
  [7] "The 1 th person has age 54"   "The 1 th person has age 2"   
  [9] "The 1 th person has age 27"   "The 1 th person has age 14"  
 [11] "The 1 th person has age 4"    "The 1 th person has age 58"

and this goes on for all 887 rows.

I hope you understand what I need here. Any help would be appreciated.


Solution

  • Because you read the data with read_csv() (from readr) rather than read.csv(), tata is a tibble, so you need to use

    x <- tata$Age
    

    instead of

    x <- tata[, 5]
    

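    This is because [, 5] on a tibble returns another (one-column) tibble by default, not a plain vector. A for loop over a tibble iterates over its columns, so your loop body runs only once, with age set to the entire Age vector. Since paste() is vectorised, that single call returns one string per passenger, all with n = 1, which is exactly the output you see.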
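To see the difference concretely, here is a small sketch using a hypothetical three-row tibble called toy as a stand-in for tata (its columns and values are made up for illustration). It shows that single-bracket column indexing keeps the tibble class, that $ returns a plain vector, and that a for loop over a one-column tibble runs only once.

```r
library(tibble)

# Hypothetical stand-in for tata; the values are illustrative only
toy <- tibble(Name = c("A", "B", "C"), Age = c(22, 38, 26))

one_col <- toy[, 2]    # by default a tibble does not drop to a vector here
age_vec <- toy$Age     # $ (or toy[[2]]) extracts the column as a vector

print(class(one_col))  # includes "tbl_df": still a tibble
print(class(age_vec))  # "numeric": a plain vector

# A tibble is a list of columns, so the loop runs once per column.
# With one column, `age` is the whole Age vector on the single pass,
# and the vectorised paste() returns one string per row.
iterations <- 0
for (age in one_col) {
  iterations <- iterations + 1
  msg <- paste("The", 1, "th person has age:", age)
}
print(iterations)   # 1
print(length(msg))  # 3
```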

    Addendum

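    Explicit for loops are often unnecessary in R, because many functions (paste() included) are vectorised. When you do need to apply a function element by element, have a look at the *apply family of functions or the purrr package.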
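Because paste() is vectorised, the message from the question does not need a loop at all. A minimal base-R sketch, assuming a hypothetical ages vector in place of tata$Age; vapply() is shown as the apply-family equivalent (purrr::map_chr() would play the same role), and writeLines() prints one message per line.

```r
# Hypothetical ages standing in for tata$Age
ages <- c(22, 38, 26, 35)

# Vectorised: one string per element, numbered with seq_along()
msgs <- paste("The", seq_along(ages), "th person has age:", ages)

# Apply-family version; vapply() also checks that each call
# returns exactly one character string
msgs2 <- vapply(seq_along(ages),
                function(i) paste("The", i, "th person has age:", ages[i]),
                character(1))

writeLines(msgs)
```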

    E.g.,

    library(dplyr)
    # pull() extracts Age as a plain vector; sapply() then builds one string per age
    tata %>%
      pull(Age) %>%
      sapply(function(age) paste("Person is", age, "years old"))
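If you would rather keep the explicit loop from the question, one option is to loop over positions with seq_along() instead of over the column itself, so the counter and the value stay in sync and no separate n is needed. A minimal sketch, where ages is a hypothetical stand-in for x <- tata$Age:

```r
# Hypothetical ages standing in for tata$Age (a plain numeric vector)
ages <- c(22, 38, 26)

msgs <- character(length(ages))
for (i in seq_along(ages)) {
  # i is the position, ages[i] the value at that position
  msgs[i] <- paste("The", i, "th person has age:", ages[i])
  print(msgs[i])
}
```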