
"Duplicate subscripts for columns" error when using `within` on an R data frame


Under R version 4.4.2 (2024-10-31) -- "Pile of Leaves", on the latest macOS:

$ R --vanilla
> load(file="tttdf")
> str(ttt)
'data.frame':   3 obs. of  17 variables:
 $ .mn.r      : num  0 0 0
 $ .sd.r      : num  0 0 0
 $ .mn.g      : num  0 0 0
 $ .sd.g      : num  0 0 0
 $ .cor.r.g   : num  1 1 1
 $ sep        : num  -1 -1 -1
 $ beta.g.ldp : num  0 0 0
 $ beta.dp.ldp: num  1 1 1
 $ beta.r.ldp : num  0 0 0
 $ sep        : num  -2 -2 -2
 $ lastdpr    : num  -3 -5 -6
 $ declinedpr : num  0 2 3
 $ sep        : num  -3 -3 -3
 $ beta.r.lr  : num  0 0 0
 $ beta.g.lg  : num  0 0 0
 $ beta.g.lr  : num  0 0 0
 $ beta.r.lg  : num  0 0 0

> ttt <- within(ttt, hello <- 22)

Error in `[<-.data.frame`(`*tmp*`, nl, value = list(hello = 22, .mn.r = c(0,  : 
  duplicate subscripts for columns
> ## make it work
> xxx <- ttt[,1:ncol(ttt)]
> xxx <- within(xxx, hello <- 22)

I have no idea what could be causing this, which is also why I can't shorten the example (e.g., by removing columns).


Solution

  • The data frame has three columns named sep, and within() fails when column names are duplicated. Subsetting with ttt[, 1:ncol(ttt)] makes the column names unique as a side effect, which is why the workaround resolves the issue.

    The following example creates a data frame with two identical column names and reproduces the error you get. After subsetting the columns, their names are unique.

    df <- data.frame(a = 1, a = 2, check.names = FALSE)
    
    within(df, hello <- 22)
    # Error in `[<-.data.frame`(`*tmp*`, nl, value = list(hello = 22, a = 1,  : 
    #  duplicate subscripts for columns
    
    df[1:ncol(df)]
    #   a a.1
    # 1 1   2
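
    To find out which columns are affected before calling within(), the names can be checked directly. This is a minimal sketch; the data frame below is a made-up stand-in for ttt, not the original data.

    ```r
    # Hypothetical stand-in for the data frame in the question
    df <- data.frame(a = 1, sep = -1, b = 2, sep = -2, sep = -3,
                     check.names = FALSE)

    # TRUE if any column name occurs more than once
    has_dups <- anyDuplicated(names(df)) > 0

    # The names that occur more than once
    dup_names <- unique(names(df)[duplicated(names(df))])

    # Positions of every column carrying one of those names
    dup_pos <- which(names(df) %in% dup_names)
    ```

    Here dup_names holds "sep" and dup_pos points at the three columns that share it.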
    

    Explanation:

    This behavior is documented in help(`[.data.frame`): when columns are selected, the column names are transformed to be unique if necessary, using make.unique() (e.g., if columns are selected more than once, or if more than one column of a given name is selected if the data frame has duplicate column names). Also see help(make.names), which additionally produces syntactically valid names.

    > make.unique(names(df))
    [1] "a"   "a.1"
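
    Rather than relying on the side effect of subsetting, the names can also be repaired explicitly before calling within(). This is only a sketch of one possible approach, using the same toy data frame as above; the variable valid is just for illustration.

    ```r
    df <- data.frame(a = 1, a = 2, check.names = FALSE)

    # Rename duplicated columns explicitly
    names(df) <- make.unique(names(df))

    # within() now works because every column name is unique
    df <- within(df, hello <- 22)
    names(df)
    # [1] "a"     "a.1"   "hello"

    # make.names(unique = TRUE) also makes names syntactically valid
    valid <- make.names(c("sep", "sep", "beta r"), unique = TRUE)
    valid
    # [1] "sep"    "sep.1"  "beta.r"
    ```

    make.unique() only appends suffixes to repeated names, while make.names() also replaces characters that are not allowed in syntactic names, so which one fits depends on whether the existing names need to be preserved as closely as possible.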