
Azure / R-server - rxKmeans write file with no header


I'm running k-means clustering with rxKmeans on Azure / R Server and need the output file to be written without a header row.

So far I've tried:

k1 <- rxKmeans(formula = ~ var1 + var2 + var3, data = df, seed = 10, numClusters = 5
               , outFile = dfOut, extraVarsToWrite = c('CUST_ID'), overwrite = T
                , outColName = F
)

And I get this error:

Error in rxuHandleClusterJobTryFailure(retObject, hpcServerJob, autoCleanup) : 
  Error completing job on cluster:
Error : rxIsCharacterScalarNonEmpty(outColName) is not TRUE

I've also tried:

k1 <- rxKmeans(formula = ~ var1 + var2 + var3, data = df, seed = 10, numClusters = 5
               , outFile = dfOut, extraVarsToWrite = c('CUST_ID'), overwrite = T
                , header = F
)

Which returns:

Error in rxuHandleClusterJobTryFailure(retObject, hpcServerJob, autoCleanup) : 
  Error completing job on cluster:
Error in rxKmeansBase(formula = formula, data = data, outDataSource = outDataSource,  : 
  unused argument (header = FALSE)

Any other suggestions?


Solution

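  • The problem was that I was giving conflicting instructions in the file definition and in the rxKmeans call. Whether a header is written is a property of the output data source, not of rxKmeans: rxKmeans has no header argument, and outColName must be a non-empty character string, so passing F fails the check shown in the first error.

    I fixed it by omitting both arguments from the rxKmeans call and setting firstRowIsColNames to FALSE in the RxTextData definition of the output file.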

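    # hdfsFS is assumed to be defined earlier, e.g. hdfsFS <- RxHdfsFileSystem()
    kmeansFile <- '~/clusters/ClusterOutput.tsv'
    # firstRowIsColNames = FALSE on the output data source means no header row is written
    dfOut <- RxTextData(kmeansFile, fileSystem = hdfsFS, firstRowIsColNames = FALSE)
    
    k1 <- rxKmeans(formula = ~ var1 + var2 + var3, data = df, seed = 10, numClusters = 5
                   , outFile = dfOut, extraVarsToWrite = c('id_num'), overwrite = TRUE
                   # Removed: outColName = F (must be a non-empty string)
                   # Removed: header = F (not an rxKmeans argument)
    )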
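
    For anyone who wants to see the same idea without a cluster, here is a minimal local sketch. It uses base R's kmeans and write.table rather than RevoScaleR, so it is only an analog of the fix, not the rxKmeans code path. The toy data frame, the column name cluster and the temp file are made up for illustration. The point is the same: the header is switched off on the writer, not on the clustering call.

```r
# Local base-R analog (NOT RevoScaleR): the clustering call knows nothing
# about headers; the writer decides whether a header row is emitted.
set.seed(10)

# Hypothetical toy data standing in for df
df <- data.frame(
  id_num = 1:20,
  var1 = rnorm(20),
  var2 = rnorm(20),
  var3 = rnorm(20)
)

# Cluster on the three variables only
fit <- kmeans(df[, c("var1", "var2", "var3")], centers = 5)

# Keep the id next to its cluster assignment, similar in spirit to extraVarsToWrite
out <- data.frame(id_num = df$id_num, cluster = fit$cluster)

# col.names = FALSE is the base-R counterpart of firstRowIsColNames = FALSE
out_path <- tempfile(fileext = ".tsv")
write.table(out, out_path, sep = "\t", col.names = FALSE,
            row.names = FALSE, quote = FALSE)

# The first line should be data (an id and a cluster number), not column names
first_line <- readLines(out_path, n = 1)
print(first_line)
```

    If the first line printed is a row of values rather than column names, the file was written without a header.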