Here's what I want to do:
> library(parallel)
> library(bigmemory)
> big.mat=read.big.matrix("cp2006.csv",header=T)
Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("cp2006.csv", header = T) :
Because type was not specified, we chose double based on the first line of data.
> jobs <- lapply(1:10, function(x) mcparallel(colMeans(is.na(big.mat))*100, name = big.mat))
Error in as.character.default(name) :
no method for coercing this S4 class to a vector
> res <- mccollect(jobs)
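The error above comes from the name argument rather than from the computation. mcparallel() turns name into a character label with as.character(), and that has no method for the big.matrix S4 object. The sketch below passes a plain string label for each job instead. So that it runs on its own, it uses a small hypothetical ordinary matrix m in place of big.mat:

```r
library(parallel)

# Hypothetical stand-in for the data: 2 columns, 4 rows, some NAs
m <- matrix(c(1, NA, 3, 4,  NA, NA, 7, NA), nrow = 4)

# `name` should be a short character label, not the data object itself
jobs <- lapply(1:3, function(i)
  mcparallel(colMeans(is.na(m)) * 100, name = paste0("job", i)))
res <- mccollect(jobs)   # results are named after the job labels
names(res)               # "job1" "job2" "job3"
res$job1                 # 25 75
```

mcparallel() relies on forking, so this sketch assumes a Unix-like OS.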
However, the problem is that is.na() apparently doesn't work on big.matrix objects. I searched the web and found mwhich, bigmemory's memory-efficient counterpart of which for big.matrix objects, but unfortunately I couldn't find a good tutorial on using it to find the missing (NA) values in a column. So I'm not sure what function I should pass to mcparallel to make it work with big.matrix objects.
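You can get around the is.na() problem without mwhich. Instead of calling is.na() on the big.matrix object itself, pull out one column at a time. Indexing a big.matrix with x[, j] returns an ordinary R vector, so is.na() works on it, and only that one column has to fit in RAM. The sketch below assumes this standard bigmemory indexing. col_na_percent is a hypothetical helper name, and the check uses a small ordinary matrix so it runs without the CSV:

```r
# Hypothetical helper: percentage of NA values per column, reading one
# column at a time. x can be an ordinary matrix or a big.matrix, since
# x[, j] returns column j as a plain R vector in both cases.
col_na_percent <- function(x, cols = seq_len(ncol(x))) {
  pct <- vapply(cols, function(j) mean(is.na(x[, j])) * 100, numeric(1))
  names(pct) <- colnames(x)[cols]
  pct
}

# Small self-contained check on an ordinary matrix (hypothetical data)
m <- matrix(c(1, NA, 3, NA,  NA, NA, NA, 8,  9, 10, NA, 12), nrow = 4,
            dimnames = list(NULL, c("a", "b", "c")))
col_na_percent(m)
#  a  b  c
# 50 75 25
```

With the data from the question, you would call col_na_percent(big.mat). Because it works column by column, it avoids copying the whole matrix into memory at once. The cost is one extraction per column.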
In addition, calling it directly, outside mcparallel, also fails:
> col.NA.mean<-colMeans(is.na(big.mat))*100
Error in colMeans(is.na(big.mat)) :
'x' must be an array of at least two dimensions
In addition: Warning message:
In is.na(big.mat) : is.na() applied to non-(list or vector) of type 'S4'
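I found the answer: when passing big.mat to ordinary R functions such as is.na(), index it with [,] so that it is first converted to a regular R matrix (note that this copies all of the data into RAM). Here's the partial answer, which gives the fraction of NAs per column: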
> colMeans(is.na(big.mat[,]))
Year Month DayofMonth DayOfWeek
0.00000000 0.00000000 0.00000000 0.00000000
DepTime CRSDepTime ArrTime CRSArrTime
0.02102102 0.00000000 0.02402402 0.00000000
UniqueCarrier FlightNum TailNum ActualElapsedTime
1.00000000 0.00000000 0.97997998 0.02402402
CRSElapsedTime AirTime ArrDelay DepDelay
0.00000000 0.02402402 0.02402402 0.02102102
Origin Dest Distance TaxiIn
1.00000000 1.00000000 0.00000000 0.00000000
TaxiOut Cancelled CancellationCode Diverted
0.00000000 0.00000000 1.00000000 0.00000000
CarrierDelay WeatherDelay NASDelay SecurityDelay
0.00000000 0.00000000 0.00000000 0.00000000
LateAircraftDelay
0.00000000
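The 1.0 (100%) values for UniqueCarrier, Origin, Dest and CancellationCode are most likely not genuinely missing data. A big.matrix can only store numbers, and read.big.matrix picked type double, so text fields such as carrier or airport codes cannot be represented and end up as NA. The "NAs introduced by coercion" warnings from the import point the same way. Base R reproduces the effect on a few hypothetical carrier codes:

```r
# Coercing text fields to double gives NA. This normally triggers the
# "NAs introduced by coercion" warning seen above; it is suppressed here.
carrier <- c("AA", "UA", "DL")   # hypothetical UniqueCarrier values
as_num <- suppressWarnings(as.double(carrier))
as_num                           # NA NA NA
mean(is.na(as_num)) * 100        # 100
```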
Here's the answer:
> library(parallel)
> library(bigmemory)
> big.mat=read.big.matrix("cp2006.csv",header=T)
Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("cp2006.csv", header = T) :
Because type was not specified, we chose double based on the first line of data.
> jobs <- lapply(1:10, function(x) mcparallel(colMeans(is.na(big.mat[,]))*100, name = big.mat))
Error in as.character.default(name) :
no method for coercing this S4 class to a vector
> jobs <- lapply(1:10, function(x) mcparallel(colMeans(is.na(big.mat[,]))*100, name = big.mat[,]))
> res <- mccollect(jobs)
> res
$`2006`
Year Month DayofMonth DayOfWeek
0.000000 0.000000 0.000000 0.000000
DepTime CRSDepTime ArrTime CRSArrTime
2.102102 0.000000 2.402402 0.000000
UniqueCarrier FlightNum TailNum ActualElapsedTime
100.000000 0.000000 97.997998 2.402402
CRSElapsedTime AirTime ArrDelay DepDelay
0.000000 2.402402 2.402402 2.102102
Origin Dest Distance TaxiIn
100.000000 100.000000 0.000000 0.000000
TaxiOut Cancelled CancellationCode Diverted
0.000000 0.000000 100.000000 0.000000
CarrierDelay WeatherDelay NASDelay SecurityDelay
0.000000 0.000000 0.000000 0.000000
LateAircraftDelay
0.000000
(The same $`2006` block is printed nine more times, once for each of the remaining jobs.)
>
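Two things about this final version are worth noting.

- All ten jobs compute exactly the same colMeans() over the whole matrix, because the loop variable x is never used.
- name = big.mat[,] only works by accident. mcparallel() appears to keep just the first element of as.character(name), which is the first Year value, so every result is labeled 2006.

The sketch below instead gives each job its own subset of columns and a readable label. It makes these assumptions:

- It needs a fork-capable OS, because mcparallel() is not available on Windows.
- na_percent_parallel is a hypothetical helper name.
- It reads one column at a time with x[, j], as shown earlier.

```r
library(parallel)

# Hypothetical helper: split the columns of x into n_jobs chunks and let
# each forked job compute the NA percentage of its own chunk only.
na_percent_parallel <- function(x, n_jobs = 2L) {
  chunks <- splitIndices(ncol(x), n_jobs)          # list of column-index vectors
  job_names <- paste0("chunk", seq_along(chunks))  # plain strings for `name`
  jobs <- lapply(seq_along(chunks), function(i) {
    cols <- chunks[[i]]
    mcparallel(vapply(cols, function(j) mean(is.na(x[, j])) * 100, numeric(1)),
               name = job_names[i])
  })
  res <- mccollect(jobs)                            # list named by job label
  out <- unlist(res[job_names], use.names = FALSE)  # back in column order
  names(out) <- colnames(x)
  out
}

# Small self-contained check on an ordinary matrix (hypothetical data)
m <- matrix(c(1, NA, 3, NA,  NA, NA, NA, 8,  9, 10, NA, 12), nrow = 4,
            dimnames = list(NULL, c("a", "b", "c")))
na_percent_parallel(m, n_jobs = 2L)
#  a  b  c
# 50 75 25
```

On the question's data this would be na_percent_parallel(big.mat, n_jobs = 4). Forked children see the same big.matrix as the parent, so nothing is copied up front. Each job only pulls in the columns it is responsible for.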