Apply function converting data frame to character when error returned from API


I have written a function to query the CMS National Plan and Provider Enumeration System (NPPES) API.

I wish to pass in a data frame of NPI values and return their addresses.

Some of the NPI values are no longer valid and I have tried to build some error handling for these scenarios.

My error handling is an if/else statement: for an errored NPI it builds a data frame of 1 row by 6 columns and inserts the errored NPI value into row 1, column 1.

When I use the apply function against my data frame I get a list [1x6] for all the successful API calls but the errored values are just a single character vector.

I have tried to debug this issue but I can't figure out where the conversion from data frame to character is happening. I would be very grateful if anyone could help me please.

Here is a dataframe of values I wish to query:

install.packages("pacman")
library(pacman)

pacman::p_load(tidyverse,data.table,httr,jsonlite)

values <- c(1598727430,
            1083632731,
            1710983663) # LAST VALUE PRODUCES THE ERROR CASE

npi_values <- data.frame(values)

Here is the URL for the API:

path <- "https://npiregistry.cms.hhs.gov/api/?"

My function:

# CREATE A FUNCTION TO PULL NPI INFORMATION FROM THE NPI REGISTRY
getNPI <- function(object) {
  request <- httr::GET(
    url = path,
    query = list(
      version = "2.0",
      number = object
    )
  )
  Sys.sleep(0.25)
  
  warn_for_status(request)

  npi_details <- content(request,
    as = "text",
    encoding = "UTF-8"
  ) %>%
    fromJSON(.,
      flatten = TRUE
    ) %>%
    data.frame()
  # IF THE API THROWS BACK A RESULT WHERE THE COLUMN NAMES CONTAIN 'ERROR'
  # THEN ASSIGN ALL THE ROW VALUES TO NA AND ADD THE NPI VALUE TO THE FIRST
  # COLUMN
  if (any(grepl("ERROR", toupper(colnames(npi_details))))) {
    
    npi_details <- as.data.frame(matrix(NA,ncol = 6,nrow = 1)) %>% 
      dplyr::rename(`NPI NUMBER` = V1,
             `CMS REF ADDRESS 1` = V2,
             `CMS REF ADDRESS 2` = V3,
             `CMS REF CITY` = V4,
             `CMS REF STATE` = V5,
             `CMS REF ZIP` = V6)
    
    npi_details[1,1] <- object
    
    # ELSE IF THE DATA FRAME DOES NOT CONTAIN 'ERROR' THEN RUN THIS CHUNK
  } else {
    select(npi_details, contains(c("addresses", "number"))) %>%
      unnest(c(contains("address"))) %>%
      filter(address_purpose == "MAILING") %>%
      rename_all(.funs = toupper) %>%
      select(
        `NPI NUMBER` = RESULTS.NUMBER,
        -COUNTRY_CODE,
        -COUNTRY_NAME,
        -ADDRESS_PURPOSE,
        -ADDRESS_TYPE,
        `CMS REF ADDRESS 1` = ADDRESS_1,
        `CMS REF ADDRESS 2` = ADDRESS_2,
        `CMS REF CITY` = CITY,
        `CMS REF STATE` = STATE,
        `CMS REF ZIP` = POSTAL_CODE
      )
  }
}

I then apply this function against the data frame of NPI values above:

out <- apply(npi_values, 1, getNPI)
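
As a side note on what apply passes to the function: with MARGIN = 1 it coerces the data frame to a matrix and hands each row over as a named vector. The sketch below illustrates this with a hypothetical stand-in, show_input, in place of the real API call, and contrasts it with lapply over the column, which passes each NPI as an unnamed scalar.

```r
# Hypothetical stand-in for getNPI(): reports what it receives, no API call.
show_input <- function(object) {
  list(value = unname(object), input_names = names(object))
}

npi_values <- data.frame(values = c(1598727430, 1083632731, 1710983663))

# apply() coerces the data frame to a matrix and passes each row
# as a named numeric vector of length 1.
via_apply <- apply(npi_values, 1, show_input)

# lapply() over the column passes each NPI as an unnamed scalar.
via_lapply <- lapply(npi_values$values, show_input)

via_apply[[1]]$input_names   # "values"
via_lapply[[1]]$input_names  # NULL
```

Either form works with getNPI; the lapply form simply avoids the matrix coercion and the extra names attribute.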

When I apply this to my real dataset, the error case is converted to a character even though I specify a data frame of 1 row by 6 columns.

Based on feedback from @akrun, I modified my apply call to wrap the result of getNPI in a list:

out <- apply(npi_values, 1, function(x) list(getNPI(x)))

The structure of out now looks as follows:

str(out)
List of 3
 $ :List of 1
  ..$ : tibble [1 × 6] (S3: tbl_df/tbl/data.frame)
  .. ..$ NPI NUMBER       : int 1598727430
  .. ..$ CMS REF ADDRESS 1: chr "PO BOX 17567"
  .. ..$ CMS REF ADDRESS 2: chr ""
  .. ..$ CMS REF CITY     : chr "PENSACOLA"
  .. ..$ CMS REF STATE    : chr "FL"
  .. ..$ CMS REF ZIP      : chr "325227567"
 $ :List of 1
  ..$ : tibble [1 × 6] (S3: tbl_df/tbl/data.frame)
  .. ..$ NPI NUMBER       : int 1083632731
  .. ..$ CMS REF ADDRESS 1: chr "PO BOX 17326"
  .. ..$ CMS REF ADDRESS 2: chr ""
  .. ..$ CMS REF CITY     : chr "DENVER"
  .. ..$ CMS REF STATE    : chr "CO"
  .. ..$ CMS REF ZIP      : chr "802170326"
 $ :List of 1
  ..$ : Named num 1.71e+09
  .. ..- attr(*, "names")= chr "values"

When I try to collapse these lists into a data frame of 3 rows by 6 columns, the last (errored) case drops into a 7th column, which is not what I want. I want the value from the third case stored in the first column, with the remaining columns filled with NA.

Desired outcome:

`NPI NUMBER` <- c(1598727430,1083632731,1710983663)
`CMS REF ADDRESS 1` <- c("PO BOX 17567","PO BOX 17326",NA)
`CMS REF ADDRESS 2` <- c("","",NA)
`CMS REF CITY` <- c("PENSACOLA","DENVER",NA)
`CMS REF STATE` <- c("FL","CO",NA)
`CMS REF ZIP` <- c("325227567","802170326",NA)
# check.names = FALSE keeps the spaces in the column names
desired <- data.frame(`NPI NUMBER`,`CMS REF ADDRESS 1`,`CMS REF ADDRESS 2`,`CMS REF CITY`,`CMS REF STATE`,`CMS REF ZIP`, check.names = FALSE)
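
One way to reach this shape, assuming every call returns a 1 x 6 data frame with identical column names, is to row-bind the list of results. The sketch below uses hand-built stand-in rows (via a hypothetical helper, make_row) rather than live API responses. It uses base R's do.call(rbind, ...); dplyr::bind_rows() would be an alternative.

```r
# Hypothetical helper that builds one 1 x 6 result row; stands in for getNPI().
make_row <- function(npi, addr1 = NA, addr2 = NA, city = NA, state = NA, zip = NA) {
  data.frame(
    `NPI NUMBER` = npi,
    `CMS REF ADDRESS 1` = addr1,
    `CMS REF ADDRESS 2` = addr2,
    `CMS REF CITY` = city,
    `CMS REF STATE` = state,
    `CMS REF ZIP` = zip,
    check.names = FALSE,       # keep the column names with spaces
    stringsAsFactors = FALSE
  )
}

results <- list(
  make_row(1598727430, "PO BOX 17567", "", "PENSACOLA", "FL", "325227567"),
  make_row(1083632731, "PO BOX 17326", "", "DENVER", "CO", "802170326"),
  make_row(1710983663)  # errored NPI: only the first column is filled
)

# Stack the three 1 x 6 data frames into one 3 x 6 data frame.
combined <- do.call(rbind, results)
dim(combined)  # 3 6
```

This only works if the errored case really is a data frame with the same six columns; a bare vector in the list is what pushes the value into an extra column.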


Solution

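  • It turns out that I needed to return npi_details explicitly within the if part of the if/else statement. Without it, the last expression evaluated in that branch was the assignment npi_details[1,1] <- object, so the function returned the NPI value itself rather than the workaround tibble I created for the errored cases.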

    # CREATE A FUNCTION TO PULL NPI INFORMATION FROM THE NPI REGISTRY
    getNPI <- function(object) {
      request <- httr::GET(
        url = path,
        query = list(
          version = "2.0",
          number = object
        )
      )
      Sys.sleep(0.25)
    
      warn_for_status(request)
    
      npi_details <- content(request,
        as = "text",
        encoding = "UTF-8"
      ) %>%
        fromJSON(.,
          flatten = TRUE
        ) %>%
        data.frame()
      # IF THE API THROWS BACK A RESULT WHERE THE COLUMN NAMES CONTAIN 'ERROR'
      # THEN ASSIGN ALL THE ROW VALUES TO NA AND ADD THE NPI VALUE TO THE FIRST
      # COLUMN
      if (any(grepl("ERROR", toupper(colnames(npi_details))))) {
        npi_details <- as.data.frame(matrix("error", ncol = 6, nrow = 1), stringsAsFactors = FALSE) %>%
          dplyr::rename(
            `NPI NUMBER` = V1,
            `CMS REF ADDRESS 1` = V2,
            `CMS REF ADDRESS 2` = V3,
            `CMS REF CITY` = V4,
            `CMS REF STATE` = V5,
            `CMS REF ZIP` = V6
          ) %>% as_tibble()
        
        npi_details[1,1] <- as.character(object)
        return(npi_details)
    
        # ELSE IF THE DATA FRAME DOES NOT CONTAIN 'ERROR' THEN RUN THIS CHUNK
      } else {
        select(npi_details, contains(c("addresses", "number"))) %>%
          unnest(c(contains("address"))) %>%
          filter(address_purpose == "MAILING") %>%
          rename_all(.funs = toupper) %>%
          select(
            `NPI NUMBER` = RESULTS.NUMBER,
            -COUNTRY_CODE,
            -COUNTRY_NAME,
            -ADDRESS_PURPOSE,
            -ADDRESS_TYPE,
            `CMS REF ADDRESS 1` = ADDRESS_1,
            `CMS REF ADDRESS 2` = ADDRESS_2,
            `CMS REF CITY` = CITY,
            `CMS REF STATE` = STATE,
            `CMS REF ZIP` = POSTAL_CODE
          ) %>% 
          mutate(`NPI NUMBER` = as.character(`NPI NUMBER`))
      }
    }
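
The difference can be reproduced without the API. In the minimal sketch below, the hypothetical functions without_return and with_return (not the NPPES code itself) differ only in their final expression: the first ends on an indexed assignment, whose value is the assigned value, and the second ends on the data frame.

```r
# Last expression is an assignment, so the function returns the
# assigned value (invisibly) instead of the data frame.
without_return <- function(x) {
  df <- as.data.frame(matrix(NA, ncol = 6, nrow = 1))
  df[1, 1] <- x
}

# Ending on the data frame itself gives the 1 x 6 result.
with_return <- function(x) {
  df <- as.data.frame(matrix(NA, ncol = 6, nrow = 1))
  df[1, 1] <- x
  df
}

class(without_return(1710983663))  # "numeric"
class(with_return(1710983663))     # "data.frame"
dim(with_return(1710983663))       # 1 6
```

An explicit return(df), as in the solution above, behaves the same as ending the branch on df.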