Is it bad practice to rely on column names in tables passed as function arguments?

Talking mainly about R.

Is it bad practice to rely on specific column names being present in a data.frame or tibble that is passed as a function argument? Or should the function also accept relevant column names as arguments?

Are there any widely used libraries that follow this convention?

Solution

I think it's OK in scenarios where it's reasonable to expect that the input conforms to a specification. Especially in packages for small audiences, it doesn't make sense to spend a lot of time developing a very general function when the input isn't going to vary.

If you need to make the function more general in the future, consider (a) accepting the variable names as function parameters, defaulting to your current names, or (b) something more ambitious like formulas.
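To illustrate option (b), here is a minimal sketch of a formula-based interface. The function name `lm_from_formula` and the example columns `weight` and `height` are made up for illustration. It assumes every variable in the formula is a column of `d`, so a formula that uses `.` would need extra handling.

```r
# Hypothetical helper: the caller describes the model with a formula,
# so no column name is hard-coded inside the function.
lm_from_formula <- function(d, formula = y ~ x) {
  stopifnot(is.data.frame(d), inherits(formula, "formula"))

  # all.vars() extracts the variable names used in the formula
  needed  <- all.vars(formula)
  missing <- setdiff(needed, names(d))
  if (length(missing) > 0) {
    stop("`d` is missing column(s): ", paste(missing, collapse = ", "))
  }

  lm(formula, data = d)
}

# Example data; these column names are invented for the illustration
d <- data.frame(weight = c(1.2, 2.3, 2.9, 4.1), height = c(10, 20, 30, 40))
fit <- lm_from_formula(d, weight ~ height)
```

The default `y ~ x` keeps existing callers working, while new callers can pass whatever column names their data uses.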

Whether or not the variable names are hard-coded, consider validating the input with something like checkmate. You can give the user more context with a hand-written stop() message, but I prefer checkmate for smaller audiences.

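lm_nonmissing_only <- function(d, predictor_name = "x") {
  # Fail early if either column is absent, non-numeric, or contains NAs
  checkmate::assert_numeric(d[[predictor_name]], any.missing = FALSE)
  checkmate::assert_numeric(d$y, any.missing = FALSE) # This variable name is still hard-coded

  # Build `y ~ <predictor_name>` so the coefficient is labelled with the column name
  lm(stats::reformulate(predictor_name, response = "y"), data = d)
}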
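
For comparison, the stop() route mentioned above might look roughly like this. It is only a sketch in base R: `assert_numeric_column` is a hypothetical helper, not part of checkmate or any other package, and the wording of the messages is just one possibility.

```r
# Hypothetical base-R check that reports which column failed and why
assert_numeric_column <- function(d, column_name) {
  if (!column_name %in% names(d)) {
    stop("Column `", column_name, "` not found. Available columns: ",
         paste(names(d), collapse = ", "), call. = FALSE)
  }

  x <- d[[column_name]]
  if (!is.numeric(x)) {
    stop("Column `", column_name, "` must be numeric, not ",
         class(x)[1], ".", call. = FALSE)
  }
  if (anyNA(x)) {
    stop("Column `", column_name, "` has ", sum(is.na(x)),
         " missing value(s).", call. = FALSE)
  }

  invisible(d) # return the input invisibly so the check can sit in a pipeline
}
```

This is more code to write and maintain than a one-line checkmate assertion, which is the trade-off to weigh for a small audience.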