I want to create a data frame from a files. First I have downloaded the ftp
files in to my pc from internet. Files can be found in this link: data from ftp. I want to create a data frame with date (taken from the Launch date
line of each file) on one column and data on other columns as well.
For grabbing all files I used the following code:
`setwd("C:/Users/")
path = "~C:/Users/"
files <- lapply(list.files(pattern = '\\.l100'), readLines)
test.sample<-do.call(rbind, lapply(files, function(lines){
# for each file, return a data.frame of the datetime, pulled with regex
data.frame(datetime = lubridate::dmy(sub('^.*Launch Date* : ', '', lines[6])),
# and the data, read in as text
read.table(text = lines[13:length(lines)]))
}))`
I am wondering what is wrong with the above code. If you can write a new code would be great too. Thank you in advance.
The data looks like this:
National Oceanic and Atmospheric Administration, U.S. Department of Commerce
Station : Pago Pago, American Samoa
Station Height : 5 meters
Latitude : -14.33
Longitude : -170.71
Flight Number : ASA016
Launch Date : 18 July 1986
Launch Time : 02:31:00 GMT
Radiosonde Type:
Radiosonde Num :
O3 Sonde ID : 4A1834 Background: 0.050 microamps (0.20 mPa) Flowrate: 29.90 sec/100ml RH Corr: 2.00 %
Sonde Total O3 : 268 (51) DU Sonde Total O3 (SBUV): 251 (35) DU
Level Press Alt Pottp Temp FtempV Hum Ozone Ozone Ozone Ptemp O3 # DN O3 Res
Num hPa km K C C % mPa ppmv atmcm C 10^11/cc DU
0 1007.7 0.005 304.0 31.5 26.0 73 1.63 0.016 0.0000 38.4 3.883 268
1 9999.9 0.100 9999.9 999.9 999.9 999 99.90 99.999 99.9990 999.9 999.999 9999
2 982.0 0.200 301.2 26.5 21.4 74 1.69 0.017 0.0003 38.4 4.096 267
3 9999.9 0.300 9999.9 999.9 999.9 999 99.90 99.999 99.9990 999.9 999.999 9999
4 960.0 0.400 301.2 24.6 20.6 79 1.64 0.017 0.0006 38.4 3.983 267
5 9999.9 0.500 9999.9 999.9 999.9 999 99.90 99.999 99.9990 999.9 999.999 9999
6 946.0 0.600 301.1 23.2 20.2 83 1.65 0.017 0.0008 38.4 4.021 267
7 932.0 0.700 300.9 21.8 19.9 89 1.65 0.018 0.0010 38.5 4.040 267
8 920.9 0.800 301.1 21.0 16.4 75 1.65 0.018 0.0012 38.5 4.051 266
9 9999.9 0.900 9999.9 999.9 999.9 999 99.90 99.999 99.9990 999.9 999.999 9999
10 9999.9 1.000 9999.9 999.9 999.9 999 99.90 99.999 99.9990 999.9 999.999 9999
11 893.0 1.100 302.0 19.2 12.1 64 1.58 0.018 0.0016 38.5 3.923 266
12 9999.9 1.200 9999.9 999.9 999.9 999 99.90 99.999 99.9990 999.9 999.999 9999
I have many files like this. Thank you again.
Though I like the metadata and somewhat self-documentation present in this fixed-width table file, it can be interesting to import them programmatically. Excel happens to do alright since it doesn't complain about lines that don't match the pattern, but that doesn't help import numerous files.
So we need something a little home-baked. I only downloaded two of the files, but this should work equally well with many more. (I'm assuming the format is consistent, otherwise we'll need to fix some of the magic constants such as 15 (rows to ignore) and 29 (rows that include the table headers).)
(files <- list.files(pattern = "asa*"))
# [1] "asa001_1986_04_01_20.l100" "asa002_1986_04_14_18.l100"
This is the workhorse. Instead of "just" taking the date as you requested, it's about the same effort to grab all the metadata headers. They made it a little problematic by putting multiples on some lines (and some labels with a space), but I think I found a suitable pattern to find them.
alldfs <- lapply(files, function(f) {
# read in all but the data; two MAGIC CONSTANTS
txt <- readLines(f, n = 29)[-(1:15)]
i <- which(! nzchar(trimws(txt)))
hdrs <- sapply(strsplit(txt[1:(i-1)], ": "), trimws)
hdrsdf <- as.data.frame(Reduce(c, lapply(hdrs, function(hdr) {
z <- strsplit(hdr, " ")
if (length(z) == 1) {
setNames(NA_character_, nm = z[[1]])
} else {
setNames(lapply(z[-1], head, n=1), nm = trimws(sapply(z[-length(z)], tail, n=1)))
}
})), stringsAsFactors = FALSE)
df <- as.data.frame(lapply(read.table(f, skip=27, stringsAsFactors=F)[-1,],
as.numeric))
cbind.data.frame(hdrsdf, df)
})
length(alldfs)
# [1] 2
The result is a wide data.frame. The first 13 columns are constant, the metadata headers. If you only want Launch.Date
, feel free to drop the rest.
str(alldfs[[1]])
# 'data.frame': 361 obs. of 27 variables:
# $ Station : chr "Pago Pago, American Samoa" "Pago Pago, American Samoa" "Pago Pago, American Samoa" "Pago Pago, American Samoa" ...
# $ Station.Height : chr "5 meters" "5 meters" "5 meters" "5 meters" ...
# $ Latitude : chr "-14.33" "-14.33" "-14.33" "-14.33" ...
# $ Longitude : chr "-170.71" "-170.71" "-170.71" "-170.71" ...
# $ Flight.Number : chr "ASA001" "ASA001" "ASA001" "ASA001" ...
# $ Launch.Date : chr "01 April 1986" "01 April 1986" "01 April 1986" "01 April 1986" ...
# $ Launch.Time : chr "20:40:00 GMT" "20:40:00 GMT" "20:40:00 GMT" "20:40:00 GMT" ...
# $ Radiosonde.Type : chr NA NA NA NA ...
# $ Radiosonde.Num : chr NA NA NA NA ...
# $ O3.Sonde.ID : chr "4A1911" "4A1911" "4A1911" "4A1911" ...
# $ Background : chr "0.050 microamps (0.19 mPa)" "0.050 microamps (0.19 mPa)" "0.050 microamps (0.19 mPa)" "0.050 microamps (0.19 mPa)" ...
# $ Flowrate : chr "29.10 sec/100ml" "29.10 sec/100ml" "29.10 sec/100ml" "29.10 sec/100ml" ...
# $ RH.Corr : chr "2.00 %" "2.00 %" "2.00 %" "2.00 %" ...
# $ Sonde.Total.O3 : chr "242" "242" "242" "242" ...
# $ Sonde.Total.O3..SBUV.: chr "233 (36) DU" "233 (36) DU" "233 (36) DU" "233 (36) DU" ...
# $ Level : num 1010 10000 10000 10000 971 ...
# $ Press : num 0.005 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 ...
# $ Alt : num 302 10000 10000 10000 302 ...
# $ Pottp : num 30.2 999.9 999.9 999.9 25.8 ...
# $ Temp : num 24.6 999.9 999.9 999.9 20.7 ...
# $ FtempV : num 72 999 999 999 74 999 999 999 75 999 ...
# $ Hum : num 1.65 99.9 99.9 99.9 1.68 99.9 99.9 99.9 1.48 99.9 ...
# $ Ozone : num 0.016 99.999 99.999 99.999 0.017 ...
# $ Ozone.1 : num 0e+00 1e+02 1e+02 1e+02 5e-04 ...
# $ Ozone.2 : num 38.1 999.9 999.9 999.9 38.1 ...
# $ Ptemp : num 3.93 1000 1000 1000 4.07 ...
# $ O3 : num 242 9999 9999 9999 241 ...