I'm at the final stage of tidying my data before analysis and have run into an issue I don't really understand when removing whitespace from the data table. The complete code, with a description of each step, is below.
I started from the following page (How to remove all whitespace from a string?) and have tried to troubleshoot using other pages that discuss errors/warnings about atomic vectors, without luck.
At step 6 I received the following warning:
In stri_replace_all_fixed(allData, " ", "") :
argument is not an atomic vector; coercing
And at step 7 the following warnings:
> #Change sold and taxed columns from character to numeric
> allData$SoldAmount <- as.numeric(allData$SoldAmount)
Warning message:
NAs introduced by coercion
> allData$Tax <- as.numeric(allData$Tax)
Warning message:
NAs introduced by coercion
Both step 6 and step 7 seem to run, but two of the columns end up containing NA (see image).
Result after whitespace is removed
The complete code is listed below. I would love some advice on how to get steps 6 and 7 to give me columns that are free of whitespace and numeric.
#Step 1: Load needed libraries
library(tidyverse)
library(rvest)
library(jsonlite)
library(stringi)
#Step 2: Access the URL
url <- "https://www.forsvarsbygg.no/ListApi/ListContent/78635/SoldEstates/0/10/"
#Step 3: Read the JSON data from the URL
data <- jsonlite::fromJSON(url, flatten = TRUE)
#Step 4: Get the total number of items in the API
totalItems <- data$TotalNumberOfItems
#Step 5: Fetch all items from the API into one data frame
allData <- paste0('https://www.forsvarsbygg.no/ListApi/ListContent/78635/SoldEstates/0/', totalItems,'/') %>%
jsonlite::fromJSON(., flatten = TRUE) %>%
.[1] %>%
as.data.frame() %>%
rename_with(~str_replace(., "ListItems.", ""), everything())
#Step 6: remove columns not needed
allData <- allData[, -c(1,4,8,9,11,12,13,14,15)]
#Step 6: remove whitespace in all columns
stri_replace_all_fixed(allData, " ", "")
#Step 7: Change sold and taxed columns from character to numeric
allData$SoldAmount <- as.numeric(allData$SoldAmount)
allData$Tax <- as.numeric(allData$Tax)
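You call stri_replace_all_fixed(allData, " ", "")
but discard its output, so allData is never changed. R functions return a new value rather than modifying their argument, so the result has to be saved somewhere. The function also expects a character vector, not a whole data frame (hence the "not an atomic vector; coercing" warning), so the replacement needs to be applied column by column.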
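As a minimal sketch of the difference between computing a result and saving it, here is the same idea on a made-up one-column data frame (toy and its values are invented for illustration; base R's gsub stands in for stri_replace_all_fixed):

```r
# Toy data frame standing in for allData (values are made up)
toy <- data.frame(Tax = c(" 2 400 000", " 61 950"), stringsAsFactors = FALSE)

# This computes the cleaned strings, but toy itself is left untouched
gsub(" ", "", toy$Tax, fixed = TRUE)
before <- toy$Tax   # still contains the spaces

# Assigning the result back is what actually changes the column
toy$Tax <- gsub(" ", "", toy$Tax, fixed = TRUE)
after <- as.numeric(toy$Tax)   # converts without introducing NAs
```

Applied to the question's data, the same idea looks like this: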
#Step 6: remove whitespace in all columns
allData[] <- lapply(allData, gsub, pattern = " ", replacement = "")
#Step 7: Change sold and taxed columns from character to numeric
allData$SoldAmount <- as.numeric(allData$SoldAmount)
allData$Tax <- as.numeric(allData$Tax)
head(allData)
#     County Municipality      Tax SoldAmount           Type Date
# 1 Akershus        FROGN  2400000    2550000          Bolig 2004
# 2 Akershus        FROGN  2225000    2100000          Bolig 2004
# 3 Akershus          SKI  7600000   18000000    Næringstomt 2006
# 4  Østfold    SARPSBORG  3000000    3815000           Tomt 2004
# 5  Østfold        RYGGE 10000000   16000000 Næringseiendom 2006
# 6 Vestfold       LARVIK    61950      61950           Tomt 2013
Alternatively, do it once, and only to the columns you need:
# allData <- paste0(...) %>% ...
allData <- allData[, -c(1,4,8,9,11,12,13,14,15)]
allData[c("Tax", "SoldAmount")] <- lapply(allData[c("Tax", "SoldAmount")], function(z) as.numeric(gsub(" ", "", z)))
head(allData)
#     County Municipality      Tax SoldAmount           Type Date
# 1 Akershus        FROGN  2400000    2550000          Bolig 2004
# 2 Akershus        FROGN  2225000    2100000          Bolig 2004
# 3 Akershus          SKI  7600000   18000000    Næringstomt 2006
# 4  Østfold    SARPSBORG  3000000    3815000           Tomt 2004
# 5  Østfold        RYGGE 10000000   16000000 Næringseiendom 2006
# 6 Vestfold       LARVIK    61950      61950           Tomt 2013
Restricting the replacement to those two columns matters: several values in the other columns legitimately contain spaces, and I doubt you intended to strip those as well:
str(sapply(allData, function(z) unique(grep(" ", z, value = TRUE)), simplify = FALSE))
# List of 6
# $ County : chr [1:2] "Møre og Romsdal" "Sogn- og fjordane"
# $ Municipality: chr [1:4] "EVJE OG HORNNES" "VESTRE TOTEN" "ØSTRE TOTEN" "NORDRE LAND"
# $ Tax : chr [1:414] " 2 400 000" " 2 225 000" " 7 600 000" " 3 000 000" ...
# $ SoldAmount : chr [1:538] " 2 550 000" " 2 100 000" " 18 000 000" " 3 815 000" ...
# $ Type : chr "Annen kategori"
# $ Date : chr(0)
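If as.numeric() still warns about NAs after the spaces are gone, it can help to look at exactly which strings fail to convert. The following is only a sketch: failed_numeric is a hypothetical helper name, and the sample vector is made up rather than taken from the API.

```r
# Hypothetical helper: return the original strings that do not convert to numbers
failed_numeric <- function(x) {
  cleaned <- gsub(" ", "", x, fixed = TRUE)
  converted <- suppressWarnings(as.numeric(cleaned))
  # keep values that were neither missing nor empty to begin with but became NA
  unique(x[is.na(converted) & !is.na(x) & nzchar(cleaned)])
}

# Made-up sample values, including one that cannot be parsed as a number
sample_values <- c(" 2 400 000", "61 950", "n/a", "", NA)
bad <- failed_numeric(sample_values)
bad
```

Calling failed_numeric(allData$Tax) before the conversion step would list any values that need separate handling.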