I'm trying to write a for loop that creates a new variable from an existing variable in a data frame by iterating over each row in turn. I tried using for (i in seq_along(data)), but this only created the new variable correctly for the first 19 rows. I realised that seq_along wasn't working as I had expected: instead of creating the sequence based on the number of rows, it had done so based on the number of columns:
seq_along(data)
returns
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
while nrow(data)
returns
[1] 82
and ncol(data)
returns
[1] 19
Additionally, the output for seq(data) is the same as that for seq_along(data), and length(data) returns [1] 19.
While I've got a workaround that resolves the issue for the for loop (for (i in 1:nrow(data))), I'm curious to know why seq_along (and seq and length) didn't behave the way I'd expected.
Formalizing the comments into a community answer: seq_along(aDataFrame) sequences along the columns of a data frame because a data frame is also a list. We can demonstrate this with the typeof() function, using the Motor Trend Cars data frame (mtcars).
> typeof(mtcars)
[1] "list"
Each element in the list contains one column from the data frame. We can use the names() function to extract the element names from the list.
> names(mtcars)
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
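As a quick check of the "one list element per column" claim, the sketch below compares the list view and the data frame view of mtcars. The variable names (is_list, same_len, first_is_mpg) are only illustrative and are not part of the original question.

```r
# mtcars ships with base R (datasets package), so no setup is needed.

# A data frame passes the list test.
is_list <- is.list(mtcars)

# length() counts list elements, which for a data frame are its columns.
same_len <- length(mtcars) == ncol(mtcars)

# The first list element is the first column (mpg), not the first row.
first_is_mpg <- identical(mtcars[[1]], mtcars$mpg)

print(c(is_list = is_list, same_len = same_len, first_is_mpg = first_is_mpg))
```

All three checks should come back TRUE, which is why length(), seq() and seq_along() all count columns rather than rows.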
Therefore, seq_along(mtcars) will produce the vector 1:11, corresponding to the number of elements in the list.
> seq_along(mtcars)
[1] 1 2 3 4 5 6 7 8 9 10 11
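To iterate over rows instead, one option is seq_len(nrow(x)). Unlike 1:nrow(x), it yields an empty sequence for a zero-row data frame rather than c(1, 0). The sketch below is one possible way to do it, not the only one; the copy named cars, the column name kpl and the miles-per-gallon to km-per-litre conversion are assumptions made purely for illustration.

```r
# Work on a copy so the built-in mtcars data set is left untouched.
cars <- mtcars

# Row indices: 1, 2, ..., nrow(cars).
row_idx <- seq_len(nrow(cars))

# Hypothetical new column: fuel economy in km per litre (US mpg * 0.425144).
cars$kpl <- numeric(nrow(cars))
for (i in row_idx) {
  cars$kpl[i] <- cars$mpg[i] * 0.425144
}

# seq_len() is safe for empty data frames: the loop body would never run.
empty_idx <- seq_len(nrow(cars[0, ]))

# The same column can usually be built without a loop, since arithmetic
# on a column is vectorised.
kpl_vectorised <- cars$mpg * 0.425144
```

For simple element-wise calculations like this one, the vectorised form at the end avoids the indexing question altogether.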