Better looping script in R

I have a working script that does the following: it loops through a data frame of taxon names and looks up the corresponding numerical ID for each name. If the ID is NA, it keeps the name from the data frame. The results are written into a new data frame. It works, but I think it's a little messy, and I am looking for suggestions to improve or simplify it. I am using the taxize package to get the IDs, but suggestions don't necessarily need to use it. Here are the example data and the script:

Kingdom	Phylum	Class	Order	Family	Genus
Bacteria	Firmicutes	Clostridia	Eubacteriales	Lachnospiraceae	Dorea
Bacteria	Firmicutes	Clostridia	Eubacteriales	Oscillospiraceae	GGB9634
Bacteria	Firmicutes	Clostridia	Eubacteriales	Clostridiaceae	Clostridiaceae_unclassified

structure(list(Kingdom = c("Bacteria", "Bacteria", "Bacteria"
), Phylum = c("Firmicutes", "Firmicutes", "Firmicutes"), Class = c("Clostridia", 
"Clostridia", "Clostridia"), Order = c("Eubacteriales", "Eubacteriales", 
"Eubacteriales"), Family = c("Lachnospiraceae", "Oscillospiraceae", 
"Clostridiaceae"), Genus = c("Dorea", "GGB9634", "Clostridiaceae_unclassified"
)), row.names = c(NA, -3L), class = "data.frame")

script:

sad <- taxa[1:3, ]                        # data frame shown above
numID <- data.frame(sad$Kingdom)          # data frame to store the IDs
taxize::taxize_options(ncbi_sleep = 0.9)  # adjust HTTP request rate

for (r in seq_len(nrow(sad))) {
  for (c in 2:ncol(sad)) {
    # first UID returned for this cell; NA if there is no unambiguous match
    sadID <- taxize::get_uid(sad[r, c], ask = FALSE)[1]
    if (is.na(sadID)) {
      numID[r, c] <- sad[r, c]  # no ID found: keep the original name
    } else {
      numID[r, c] <- sadID
    }
  }
}
names(numID) <- names(sad)

#numID (wanted output)
structure(list(Kingdom = c("Bacteria", "Bacteria", "Bacteria"),
Phylum = c("Firmicutes", "Firmicutes", "Firmicutes"), 
Class = c("186801","186801", "186801"),
Order = c("186802", "186802", "186802"), 
Family = c("186803", "216572", "31979"), 
Genus = c("189330", "GGB9634", "Clostridiaceae_unclassified")), row.names = c(NA, -3L), class = "data.frame")

If I wanted to start at column 4 instead of column 2, with this script or any other, how could I do that, given that I use the same index c for both sad and numID?

The script is working but I want to improve it.

Solution

get_uid() is vectorised: it accepts a whole vector of names, so there is no need to loop over individual cells. You can simply do:

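numID <- sad  # start from a copy so numID has the same columns and names
numID[] <- lapply(  # assigning to numID[] keeps the data.frame structure
  sad,
  # get_uid() gives NA when there is no unambiguous match; coalesce() falls back to the name
  \(x) dplyr::coalesce(taxize::get_uid(x, ask = FALSE) |> unclass(), x)
)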

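This passes each column of sad to get_uid() and writes the results into numID. unclass() drops the uid class, leaving a plain character vector, and coalesce() replaces the NA values with the original names. Note that this converts every column, including Kingdom (hence the 2 in the output below), whereas the loop started at column 2. The \(x) lambda and the |> pipe require R 4.1 or later. You could improve things further by only requesting IDs for unique values.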

  Kingdom     Phylum  Class  Order Family                       Genus
1       2 Firmicutes 186801 186802 186803                      189330
2       2 Firmicutes 186801 186802 216572                     GGB9634
3       2 Firmicutes 186801 186802  31979 Clostridiaceae_unclassified
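A minimal sketch covering the column-4 part of the question and the unique-values suggestion. It runs offline: `lookup_uid()` is a hypothetical stub standing in for `taxize::get_uid(x, ask = FALSE) |> unclass()`, with IDs copied from the wanted output, and a base-R `is.na()` replacement stands in for `dplyr::coalesce()`. The idea is to start from a full copy (`numID <- sad`) and use one index vector `cols` on both sides, so column positions always line up. In the loop as posted, numID starts with a single column, so writing to column 4 first would stop with R's "new columns would leave holes after existing columns" error.

```r
# Example data from the question (same as the dput above)
sad <- data.frame(
  Kingdom = c("Bacteria", "Bacteria", "Bacteria"),
  Phylum  = c("Firmicutes", "Firmicutes", "Firmicutes"),
  Class   = c("Clostridia", "Clostridia", "Clostridia"),
  Order   = c("Eubacteriales", "Eubacteriales", "Eubacteriales"),
  Family  = c("Lachnospiraceae", "Oscillospiraceae", "Clostridiaceae"),
  Genus   = c("Dorea", "GGB9634", "Clostridiaceae_unclassified"),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL offline stub for taxize::get_uid(x, ask = FALSE) |> unclass():
# one ID per name, NA when there is no unambiguous match.
# The IDs are copied from the wanted output in the question.
lookup_uid <- function(x) {
  known <- c(Clostridia = "186801", Eubacteriales = "186802",
             Lachnospiraceae = "186803", Oscillospiraceae = "216572",
             Clostridiaceae = "31979", Dorea = "189330")
  unname(known[x])
}

cols <- 4:ncol(sad)  # columns to convert; the others are copied unchanged

# Look up each distinct name once, across all selected columns
all_names <- unique(unlist(sad[cols], use.names = FALSE))
ids <- lookup_uid(all_names)
ids[is.na(ids)] <- all_names[is.na(ids)]  # keep the name when no ID was found

# Start from a full copy so the same index works for sad and numID
numID <- sad
numID[cols] <- lapply(sad[cols], function(x) ids[match(x, all_names)])

# The same fix applied to the loop: copy first, then loop over cols
numID_loop <- sad
for (r in seq_len(nrow(sad))) {
  for (c in cols) {
    id <- lookup_uid(sad[r, c])
    if (!is.na(id)) numID_loop[r, c] <- id
  }
}
```

Here the 9 cells in columns 4 to 6 collapse to 7 distinct names, so only 7 lookups are needed. With `cols <- 2:ncol(sad)` the same code reproduces the wanted output from the question.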