I have a data set (available at the link below): https://drive.google.com/file/d/0B4Mldbnr1-avMDIxYmZLSnRfUDA/view?usp=sharing and I want to correct the data using the subset and levels functions. Here is what I have tried, but it does not seem to work:
# Setting working directory
setwd("F:/Intro Data Science/Assignment Part B/Assignment Part B-20170902")
plot.new()
options(digits=2)
# Installing and loading packages
install.packages("lubridate") # run once; installed.packages() only lists what is installed
library(lubridate)
# Reading data set
power <- read.csv("data set 6.csv", na.strings="")
# SUBSETTING
Area <- as.numeric(power$Area)
City <- as.character(power$City)
P.Winter <- as.numeric(power$P.Winter)
P.Summer <- as.numeric(power$P.Summer)
#Data Cleaning
levels(power$City)<- c(levels(power$City),"Auckland")
power$City[power$City == "Ackland"] <- "Auckland"
I really need your help. This was supposed to be easy because I followed exactly what was given in the lecture, but nothing changes when I run the code. I appreciate your help. Nelson
The output requested:
> dput(head(power, 30))
structure(list(Area = c(144.38, 176.83, 268.71, 208.67, 123.61,
199.3, 109.46, 183.28, 110.61, 146.91, 77.451, 232.65, 270.94,
49.191, 234.5, 280.93, 192.18, 95.918, 230.74, 72.698, 129.26,
110.76, 199.44, 129.75, 146.8, 287.97, 162.1, 249.03, 159.3,
272.51), City = c("Auckland ", "Auckland ", "Auckland ", "Auckland ",
"Auckland ", "Auckland ", "Auckland ", "Auckland ", "Auckland ",
"Auckland ", "Auckland ", "Auckland ", "Auckland ", "Auckland ",
"Auckland ", "Auckland ", "Auckland ", "Ackland ", "Auckland ",
"Auckland ", "Auckland ", "Auckland ", "Auckland ", "Auckland ",
"Auckland ", "Auckland ", "Auckland ", "Auckland ", "Auckland ",
"Auckland "), P.Winter = c(1684.9, 1926.7, 2026.9, 1938.1, 1579.9,
1991.4, 1572.5, 1691.2, 1684.2, 1743.6, 1234.6, 2043, 1986.7,
1259.7, 1870.4, 2115.6, 18000, 1452, 1936.2, 1430.2, 1587.3,
1614.3, 1993.2, 1746.4, 1807.6, 2009.4, 1859.1, 1985.5, 1909.4,
1892.7), P.Summer = c(1194.5, 1487.3, 1737.3, -158, 1148.1, 1445.8,
885.77, 1393, 1191.5, 1149.9, 813.38, 1623.8, 1708, 874.48, 1635.7,
1826.1, 1596.6, 793.71, 1668.8, 905.6, 1227.3, 938.38, 1523.1,
1012.6, 1122.8, 1829.5, 1223.3, 1653.2, 1175.5, 1882)), .Names = c("Area",
"City", "P.Winter", "P.Summer"), row.names = c(NA, 30L), class = "data.frame")
I believe that the function you want is droplevels.
First, make up some data.
set.seed(5295) # make the results reproducible
cities <- factor(sample(c("Ackland", "Auckland", "Wellington", "Sidney"), 100, TRUE))
power <- data.frame(City = cities)
Now the code, starting with yours.
power$City[power$City == "Ackland"] <- "Auckland"
power$City <- droplevels(power$City)
levels(power$City) # check if it worked
#[1] "Auckland" "Sidney" "Wellington"
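Since the question mentions levels, here is an alternative sketch, assuming City really is a factor: renaming the misspelled level to one that already exists merges the two levels in a single step, so droplevels is not needed afterwards. The data are made up as above, and power2 and n_before are hypothetical names used only for this illustration.

```r
# Made-up data, same idea as above (assumes City is a factor)
set.seed(5295)
cities <- factor(sample(c("Ackland", "Auckland", "Wellington", "Sidney"), 100, TRUE))
power2 <- data.frame(City = cities)  # hypothetical name, to keep `power` untouched

# Rows that should end up as "Auckland"
n_before <- sum(power2$City %in% c("Ackland", "Auckland"))

# Renaming a level to a name that already exists merges the two levels
levels(power2$City)[levels(power2$City) == "Ackland"] <- "Auckland"

levels(power2$City)                         # "Ackland" is no longer a level
sum(power2$City == "Auckland") == n_before  # former "Ackland" rows are now "Auckland"
```

This only applies to factors; on a character column, levels() returns NULL and there is nothing to rename.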
EDIT.
After seeing the output of dput(head(power, 30)), the solution became obvious. The column City is of class character, not factor, and there are no values "Ackland" or "Auckland": every value has a trailing white space that is messing things up. So all we need to do is replace "Ackland " with "Auckland" and remove the trailing white spaces.
str(power)
#'data.frame': 30 obs. of 4 variables:
# $ Area : num 144 177 269 209 124 ...
# $ City : chr "Auckland " "Auckland " "Auckland " "Auckland " ...
# $ P.Winter: num 1685 1927 2027 1938 1580 ...
# $ P.Summer: num 1194 1487 1737 -158 1148 ...
which(power$City == "Ackland ") # note the white space
#[1] 18
which(power$City == "Auckland ") # note the white space
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 19 20 21 22 23 24 25 26
#[26] 27 28 29 30
# replace the value "Ackland " (note the trailing white space)
power$City[power$City == "Ackland "] <- "Auckland"
power$City <- trimws(power$City) # remove white spaces from all of them
And no columns vanish; just run str(power) to see it.
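The two steps can also be done in the opposite order: trim first, so that no comparison depends on the trailing space, and then fix the typo. A minimal sketch on a small made-up data frame that mimics the structure of the dput output (the name power_small and the choice of three rows are assumptions for illustration); the conversion to factor at the end is optional.

```r
# Small made-up data frame mimicking the structure of the dput output
power_small <- data.frame(
  Area = c(144.38, 95.918, 230.74),
  City = c("Auckland ", "Ackland ", "Auckland "),
  stringsAsFactors = FALSE
)

# 1. Trim first, so comparisons no longer depend on the trailing space
power_small$City <- trimws(power_small$City)

# 2. Fix the misspelling
power_small$City[power_small$City == "Ackland"] <- "Auckland"

# 3. Optional: convert to factor once the values are clean
power_small$City <- factor(power_small$City)

table(power_small$City)  # counts per city, to check the result
str(power_small)         # both columns are still there
```

If the fields in the CSV file are unquoted, read.csv(..., strip.white = TRUE) may remove the spaces already at read time; I have not checked this against the actual file.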