I have written a function to query the CMS National Plan and Provider Enumeration System (NPPES) API.
I wish to pass in a data frame of NPI values and return their addresses.
Some of the NPI values are no longer valid and I have tried to build some error handling for these scenarios.
My error handling if/else statement specifies a data frame of 1 row by 6 columns, and I insert the errored NPI value into row 1, column 1.
When I use the apply function against my data frame, I get a list element of size [1x6] for each successful API call, but the errored values come back as a single character vector.
I have tried to debug this issue but I can't figure out where the conversion from data frame to character is happening. I would be very grateful if anyone could help me please.
Here is a dataframe of values I wish to query:
install.packages("pacman")
library(pacman)
pacman::p_load(tidyverse,data.table,httr,jsonlite)
values <- c(1598727430,
1083632731,
1710983663) # LAST VALUE PRODUCES THE ERROR CASE
npi_values <- data.frame(values)
Here is the URL for the API:
path <- "https://npiregistry.cms.hhs.gov/api/?"
My function:
# CREATE A FUNCTION TO PULL NPI INFORMATION FROM THE NPI REGISTRY
getNPI <- function(object) {
  request <- httr::GET(
    url = path,
    query = list(
      version = "2.0",
      number = object
    )
  )
  Sys.sleep(0.25)
  warn_for_status(request)
  npi_details <- content(request,
    as = "text",
    encoding = "UTF-8"
  ) %>%
    fromJSON(.,
      flatten = TRUE
    ) %>%
    data.frame()
  # IF THE API THROWS BACK A RESULT WHERE THE COLUMN NAMES CONTAIN 'ERROR'
  # THEN ASSIGN ALL THE ROW VALUES TO NA AND ADD THE NPI VALUE TO THE FIRST
  # COLUMN
  if (any(grepl("ERROR", toupper(colnames(npi_details))))) {
    npi_details <- as.data.frame(matrix(NA, ncol = 6, nrow = 1)) %>%
      dplyr::rename(
        `NPI NUMBER` = V1,
        `CMS REF ADDRESS 1` = V2,
        `CMS REF ADDRESS 2` = V3,
        `CMS REF CITY` = V4,
        `CMS REF STATE` = V5,
        `CMS REF ZIP` = V6
      )
    npi_details[1, 1] <- object
    # ELSE IF THE DATA FRAME DOES NOT CONTAIN 'ERROR' THEN RUN THIS CHUNK
  } else {
    select(npi_details, contains(c("addresses", "number"))) %>%
      unnest(c(contains("address"))) %>%
      filter(address_purpose == "MAILING") %>%
      rename_all(.funs = toupper) %>%
      select(
        `NPI NUMBER` = RESULTS.NUMBER,
        -COUNTRY_CODE,
        -COUNTRY_NAME,
        -ADDRESS_PURPOSE,
        -ADDRESS_TYPE,
        `CMS REF ADDRESS 1` = ADDRESS_1,
        `CMS REF ADDRESS 2` = ADDRESS_2,
        `CMS REF CITY` = CITY,
        `CMS REF STATE` = STATE,
        `CMS REF ZIP` = POSTAL_CODE
      )
  }
}
I then apply this function against the data frame of NPI values above:
out <- apply(npi_values, 1, getNPI)
When I apply this to my real dataset, the error case is converted to a character, even though I specify a data frame of 1 row by 6 columns.
Based on feedback from @akrun I have modified my apply statement to include wrapping the getNPI function in a list, see below:
out <- apply(npi_values, 1, function(x) list(getNPI(x)))
The structure of out now looks as follows:
str(out)
List of 3
$ :List of 1
..$ : tibble [1 × 6] (S3: tbl_df/tbl/data.frame)
.. ..$ NPI NUMBER : int 1598727430
.. ..$ CMS REF ADDRESS 1: chr "PO BOX 17567"
.. ..$ CMS REF ADDRESS 2: chr ""
.. ..$ CMS REF CITY : chr "PENSACOLA"
.. ..$ CMS REF STATE : chr "FL"
.. ..$ CMS REF ZIP : chr "325227567"
$ :List of 1
..$ : tibble [1 × 6] (S3: tbl_df/tbl/data.frame)
.. ..$ NPI NUMBER : int 1083632731
.. ..$ CMS REF ADDRESS 1: chr "PO BOX 17326"
.. ..$ CMS REF ADDRESS 2: chr ""
.. ..$ CMS REF CITY : chr "DENVER"
.. ..$ CMS REF STATE : chr "CO"
.. ..$ CMS REF ZIP : chr "802170326"
$ :List of 1
..$ : Named num 1.71e+09
.. ..- attr(*, "names")= chr "values"
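The `Named num` entry with a `names` attribute in the output above is consistent with how apply() hands each row to the function: as a named vector carrying the column name, rather than a bare number. Here is a minimal sketch that contrasts this with lapply() over the column itself. `describe_input` is a hypothetical stand-in for getNPI that makes no API call and only reports what it was given.

```r
npi_values <- data.frame(values = c(1598727430, 1083632731, 1710983663))

# Hypothetical stand-in for getNPI(): reports the value and whether it carried names.
describe_input <- function(object) {
  list(value = unname(object), has_names = !is.null(names(object)))
}

# apply() converts the data frame to a matrix and passes each row as a named vector.
via_apply <- apply(npi_values, 1, describe_input)

# lapply() over the column passes each NPI as a plain, unnamed number.
via_lapply <- lapply(npi_values$values, describe_input)

via_apply[[3]]$has_names   # TRUE
via_lapply[[3]]$has_names  # FALSE
```

Iterating over `npi_values$values` directly may be a simpler choice here, since the function only ever needs one NPI at a time.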
When I try to collapse these lists into a data frame of 3 rows by 6 columns, the last case (the errored case) drops into a 7th column, which is not desired. I wish to have the value from the third case stored in the first column, with the remaining values filled with NA.
Desired outcome:
`NPI NUMBER` <- c(1598727430,1083632731,1710983663)
`CMS REF ADDRESS 1` <- c("PO BOX 17567","PO BOX 17326",NA)
`CMS REF ADDRESS 2` <- c("","",NA)
`CMS REF CITY` <- c("PENSACOLA","DENVER",NA)
`CMS REF STATE` <- c("FL","CO",NA)
`CMS REF ZIP` <- c("325227567","802170326",NA)
desired <- data.frame(`NPI NUMBER`,`CMS REF ADDRESS 1`,`CMS REF ADDRESS 2`,`CMS REF CITY`,`CMS REF STATE`,`CMS REF ZIP`, check.names = FALSE)
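Once every list element is a one-row data frame with the same six columns, one way to collapse them into that shape is dplyr::bind_rows(). The sketch below uses mock tibbles in place of real getNPI output (values copied from the str() output above) and assumes `NPI NUMBER` is stored as character in every element, so the column types line up across rows.

```r
library(dplyr)

# Mock results standing in for getNPI() output; no API call is made here.
results <- list(
  tibble::tibble(
    `NPI NUMBER` = "1598727430", `CMS REF ADDRESS 1` = "PO BOX 17567",
    `CMS REF ADDRESS 2` = "", `CMS REF CITY` = "PENSACOLA",
    `CMS REF STATE` = "FL", `CMS REF ZIP` = "325227567"
  ),
  tibble::tibble(
    `NPI NUMBER` = "1083632731", `CMS REF ADDRESS 1` = "PO BOX 17326",
    `CMS REF ADDRESS 2` = "", `CMS REF CITY` = "DENVER",
    `CMS REF STATE` = "CO", `CMS REF ZIP` = "802170326"
  ),
  # Errored case: NPI in the first column, NA everywhere else.
  tibble::tibble(
    `NPI NUMBER` = "1710983663", `CMS REF ADDRESS 1` = NA_character_,
    `CMS REF ADDRESS 2` = NA_character_, `CMS REF CITY` = NA_character_,
    `CMS REF STATE` = NA_character_, `CMS REF ZIP` = NA_character_
  )
)

# Stack the one-row tibbles into a single 3 x 6 tibble.
combined <- bind_rows(results)

# If each result was wrapped in an extra list(), unwrap one level first.
wrapped <- lapply(results, list)
combined_from_wrapped <- bind_rows(lapply(wrapped, function(x) x[[1]]))
```

Keeping `NPI NUMBER` as the same type in every element matters, because an integer column in one row and a character column in another cannot be stacked without a conversion.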
It turns out that I needed to return the value of npi_details within the if part of the if/else statement in order to keep the workaround tibble I created for the errored cases!
# CREATE A FUNCTION TO PULL NPI INFORMATION FROM THE NPI REGISTRY
getNPI <- function(object) {
  request <- httr::GET(
    url = path,
    query = list(
      version = "2.0",
      number = object
    )
  )
  Sys.sleep(0.25)
  warn_for_status(request)
  npi_details <- content(request,
    as = "text",
    encoding = "UTF-8"
  ) %>%
    fromJSON(.,
      flatten = TRUE
    ) %>%
    data.frame()
  # IF THE API THROWS BACK A RESULT WHERE THE COLUMN NAMES CONTAIN 'ERROR'
  # THEN FILL THE ROW WITH "error" AND ADD THE NPI VALUE (AS CHARACTER) TO
  # THE FIRST COLUMN
  if (any(grepl("ERROR", toupper(colnames(npi_details))))) {
    npi_details <- as.data.frame(matrix("error", ncol = 6, nrow = 1), stringsAsFactors = FALSE) %>%
      dplyr::rename(
        `NPI NUMBER` = V1,
        `CMS REF ADDRESS 1` = V2,
        `CMS REF ADDRESS 2` = V3,
        `CMS REF CITY` = V4,
        `CMS REF STATE` = V5,
        `CMS REF ZIP` = V6
      ) %>%
      as_tibble()
    npi_details[1, 1] <- as.character(object)
    # RETURN THE TIBBLE EXPLICITLY; OTHERWISE THE ASSIGNMENT ABOVE IS THE
    # LAST EXPRESSION AND ITS VALUE IS WHAT THE FUNCTION RETURNS
    return(npi_details)
    # ELSE IF THE DATA FRAME DOES NOT CONTAIN 'ERROR' THEN RUN THIS CHUNK
  } else {
    select(npi_details, contains(c("addresses", "number"))) %>%
      unnest(c(contains("address"))) %>%
      filter(address_purpose == "MAILING") %>%
      rename_all(.funs = toupper) %>%
      select(
        `NPI NUMBER` = RESULTS.NUMBER,
        -COUNTRY_CODE,
        -COUNTRY_NAME,
        -ADDRESS_PURPOSE,
        -ADDRESS_TYPE,
        `CMS REF ADDRESS 1` = ADDRESS_1,
        `CMS REF ADDRESS 2` = ADDRESS_2,
        `CMS REF CITY` = CITY,
        `CMS REF STATE` = STATE,
        `CMS REF ZIP` = POSTAL_CODE
      ) %>%
      # STORE THE NPI AS CHARACTER SO IT MATCHES THE ERROR BRANCH
      mutate(`NPI NUMBER` = as.character(`NPI NUMBER`))
  }
}
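For anyone hitting the same thing, the behaviour comes from how R decides a function's return value: without an explicit return(), a function returns the value of its last evaluated expression, and an assignment such as `df[1, 1] <- object` evaluates to the assigned value, not to the modified data frame. Here is a minimal sketch with two hypothetical toy functions (no API call, and only two columns for brevity):

```r
# Toy stand-in for the error branch: the assignment is the last expression.
make_row_implicit <- function(object) {
  df <- data.frame(id = NA, address = NA)
  df[1, 1] <- object  # evaluates to `object`, so that is what gets returned
}

# Same body, but the data frame itself is the last expression.
make_row_explicit <- function(object) {
  df <- data.frame(id = NA, address = NA)
  df[1, 1] <- object
  df                  # return(df) would work equally well
}

implicit <- make_row_implicit(1710983663)
explicit <- make_row_explicit(1710983663)

class(implicit)  # "numeric"
class(explicit)  # "data.frame"
```

In the original function the value handed in by apply() was a named numeric vector, which would explain why the errored element showed up as `Named num` rather than as a 1 x 6 data frame.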