I'm running k-means clustering with rxKmeans on Azure / R Server and need the output file to be written without a header row.
So far I've tried:
k1 <- rxKmeans(formula = ~ var1 + var2 + var3, data = df, seed = 10, numClusters = 5
, outFile = dfOut, extraVarsToWrite = c('CUST_ID'), overwrite = T
, outColName = F
)
And I get this error:
Error in rxuHandleClusterJobTryFailure(retObject, hpcServerJob, autoCleanup) :
Error completing job on cluster:
Error : rxIsCharacterScalarNonEmpty(outColName) is not TRUE
I've also tried:
k1 <- rxKmeans(formula = ~ var1 + var2 + var3, data = df, seed = 10, numClusters = 5
, outFile = dfOut, extraVarsToWrite = c('CUST_ID'), overwrite = T
, header = F
)
Which returns:
Error in rxuHandleClusterJobTryFailure(retObject, hpcServerJob, autoCleanup) :
Error completing job on cluster:
Error in rxKmeansBase(formula = formula, data = data, outDataSource = outDataSource, :
unused argument (header = FALSE)
Any other suggestions?
The problem was that I was giving conflicting instructions in the file definition and the rxKmeans function.
I fixed it by omitting the header argument from the rxKmeans call and setting firstRowIsColNames to FALSE in the RxTextData file definition instead.
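# Output path on HDFS; hdfsFS is assumed to be defined earlier in the session
kmeansFile <- '~/clusters/ClusterOutput.tsv'
# The header row is controlled by the output data source, not by rxKmeans
dfOut <- RxTextData(kmeansFile, fileSystem = hdfsFS, firstRowIsColNames = FALSE)
k1 <- rxKmeans(formula = ~ var1 + var2 + var3, data = df, seed = 10, numClusters = 5
, outFile = dfOut, extraVarsToWrite = c('id_num'), overwrite = TRUE
# , outColName = F   # rejected: fails the rxIsCharacterScalarNonEmpty check
# , header = F       # rejected: unused argument in rxKmeans
)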
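As for why the first attempt failed: the error message suggests that outColName has to be a single, non-empty character string, so passing the logical F cannot pass that check. Below is a rough base-R sketch of what such a check might look like. Note that is_char_scalar_non_empty is a hypothetical stand-in written for illustration; the actual internals of rxIsCharacterScalarNonEmpty are an assumption, and 'cluster' is just an arbitrary example name.

```r
# Hypothetical stand-in for the check named in the error message.
# The real rxIsCharacterScalarNonEmpty() implementation is not shown in the
# post, so this is only an assumption about its behaviour.
is_char_scalar_non_empty <- function(x) {
  is.character(x) && length(x) == 1L && !is.na(x) && nzchar(x)
}

# A logical value is not a character string, so the check fails
is_char_scalar_non_empty(FALSE)

# A single non-empty string passes
is_char_scalar_non_empty('cluster')
```

In other words, outColName appears to name a column in the output rather than toggle the header, which is why the header has to be switched off on the RxTextData side.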