I know many posts have already answered similar questions like mine, but I've tried to figure it out for 2 days now and it seems as if I'm not seeing the picture here...
I got this csv file looking like this:
Werteformat: wertabh. (Q)
Werte:
01.01.76 00:00 0,363
02.01.76 00:00 0,464
...
31.12.10 00:00 1,03
01.01.11 00:00 Lücke
I want to create a timeline with the data, but I can't import the csv properly.
I've tried this so far:
data <- read.csv2(file,
                  header = FALSE,
                  sep = ";",
                  quote = "\"",
                  dec = ",",
                  col.names = c("Datum", "Abfluss"),
                  skip = 2,
                  nrows = length(strs) - 2,
                  colClasses = c("date", "numeric"))
But then I get
"Fehler in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() erwartete 'a real', bekam 'L�cke'"
So I deleted the colClasses argument and it works: I got rid of all the unwanted rows. But everything is stored as factors, so I used as.numeric:
Abfluss1<-as.numeric(data$Abfluss)
Now I can calculate with Abfluss1, but the values are totally different from those in the original csv...
Abfluss1
[1] 99 163 250 354 398 773 927 844 796 772 1010 1468 1091 955 962 933 881 844 803 772 773 803 1006 969 834 779 755
[28] 743 739
Where did I go wrong?! I really would appreciate some helpful hints. By the way, the files I'm working on can be downloaded here: http://ehyd.gv.at/#
Just click on one of these blue-ish triangles and download "Q-Tagesmittel"
First of all, there seems to be a problem with the file encoding. The downloaded file apparently uses a Latin-1 encoding that is not recognized correctly, which is why the error message says L�cke and not Lücke. You can declare the encoding with:
encoding = "latin1"
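As a small illustration of what goes wrong, the sketch below builds the Latin-1 bytes of "Lücke" by hand (the byte values are standard Latin-1, the variable names are made up) and converts them to UTF-8 with iconv. It is only meant to show the mismatch, not how the downloaded file is actually stored.

```r
# Latin-1 bytes for "Lücke": 0xfc is "ü" in Latin-1 but is not valid UTF-8 on its own
raw_bytes <- as.raw(c(0x4c, 0xfc, 0x63, 0x6b, 0x65))
latin1_string <- rawToChar(raw_bytes)

# Tell R how to interpret the bytes and re-encode them as UTF-8
fixed <- iconv(latin1_string, from = "latin1", to = "UTF-8")
print(fixed)
```

Without the conversion, a UTF-8 session has no valid character for the lone 0xfc byte, which is where the replacement symbol comes from.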
Secondly, your example is not reproducible: the variable strs is not defined anywhere in it. From what I understand (I may be wrong), you want to skip the first 28 lines and leave out the last one, so in total
nrows = length( readLines( file ) ) - 29
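The skip/nrows arithmetic can be tried on a tiny in-memory sample. This is only a sketch: the sample lines, the semicolon separator, the four-digit years, and the names n_header and n_trailing are assumptions for illustration, with 2 header lines instead of 28.

```r
# Made-up sample: 2 header lines, 2 data lines, 1 trailing line to drop
sample_lines <- c(
  "Werteformat: wertabh. (Q)",
  "Werte:",
  "01.01.1976 00:00:00;0,363",
  "02.01.1976 00:00:00;0,464",
  "01.01.2011 00:00:00;L\u00fccke"
)
n_header   <- 2  # lines to skip at the top
n_trailing <- 1  # lines to ignore at the bottom

d <- read.csv2(
  text = sample_lines,
  header = FALSE,
  skip = n_header,
  nrows = length(sample_lines) - n_header - n_trailing,
  col.names = c("Datum", "Abfluss"),
  stringsAsFactors = FALSE
)
str(d)
```

Because the non-numeric last line is never read, the Abfluss column can be parsed as numeric directly.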
Finally, you ran into a common R issue; see "How to convert a factor to an integer\numeric without a loss of information?". The entire column is read as a character vector because not all of its elements can be interpreted as numeric. And when a character vector is added to a data.frame, it is by default (in R versions before 4.0.0) converted to a factor column, so as.numeric returns the internal level codes rather than the values. Although it is not necessary once you specify the correct range of lines, you can avoid this with
stringsAsFactors = FALSE
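To see why the converted numbers looked so different, here is a minimal sketch with made-up values (not taken from the real file). It contrasts as.numeric applied to a factor with a conversion that goes through character first.

```r
# Made-up values, including the non-numeric gap marker
x <- factor(c("0,363", "0,464", "1,03", "L\u00fccke"))

# This returns the internal level codes, not the measurements
codes <- as.numeric(x)

# Go through character, swap the decimal comma for a point, then convert;
# the gap marker becomes NA (the coercion warning is suppressed on purpose)
values <- suppressWarnings(
  as.numeric(sub(",", ".", as.character(x), fixed = TRUE))
)
print(codes)
print(values)
```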
So in total:
f <- readLines("Q-Tagesmittel-204586.csv")
df <- read.csv2(
  text = f,
  header = FALSE,
  sep = ";",
  quote = "\"",
  dec = ",",
  skip = 28,
  col.names = c("Datum", "Abfluss"),
  nrows = length(f) - 29,
  encoding = "latin1",
  stringsAsFactors = FALSE
)
Oh, and in case you want to convert the Datum column to a date-time object as a next step, one way to do this would be
df$Datum <- strptime( df$Datum, "%d.%m.%Y %H:%M:%S" )
str(df)
'data.frame': 12784 obs. of 2 variables:
$ Datum : POSIXlt, format: "1976-01-01" "1976-01-02" "1976-01-03" "1976-01-04" ...
$ Abfluss: num 0.691 0.799 0.814 0.813 0.795 0.823 0.828 0.831 0.815 0.829 ...
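If a plain Date column is easier to work with than POSIXlt, a possible variant is sketched below. The timestamp strings are made up to match the format string used for Datum, and the fixed "UTC" time zone is an assumption chosen so the result does not depend on the local settings.

```r
# Made-up timestamps in the same layout as the format string "%d.%m.%Y %H:%M:%S"
stamps <- c("01.01.1976 00:00:00", "02.01.1976 00:00:00")

# Parse to POSIXct with an explicit time zone, then reduce to calendar dates
parsed <- as.POSIXct(stamps, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")
days <- as.Date(parsed)
print(days)
```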