
How to rename columns in Sparklyr in R?


This is the code I ran in R against a Spark cluster. The error it produces is shown below.

mydata<-spark_read_csv(spark_cluster,name = "rd_1",path = "IAF_Extracted_Data_Zipped.csv",header = F,delimiter = "|")

mydata %>% select(customer=V1,device_subscriber_id=V2,user_subscriber_id=V3,user_id=V4,location_id=V5) 

Error in .f(.x[[i]], ...) : object 'V1' not found


Solution

  • If you want specific names, provide a vector of names when reading the file:

    columns <- c("customer", "device_subscriber_id", 
                 "user_subscriber_id", "user_id", "location_id")
    
    spark_read_csv(
      spark_cluster, name = "rd_1", path = "IAF_Extracted_Data_Zipped.csv",
      header = FALSE, columns = columns, delimiter = "|"
    )
    

    The length of the columns vector must match the number of fields in the input file.
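
If the table has already been loaded, renaming after the fact may be another option. The sketch below is a minimal illustration, not the answer's method: rename_by_position is a hypothetical helper (it is not part of sparklyr or dplyr), and the data frame raw is a made-up local stand-in for a table read with header = FALSE. It renames columns by position, so it does not depend on the names assigned at read time, which is where the select() call in the question failed.

```r
library(dplyr)

# Hypothetical helper (not part of sparklyr or dplyr): rename the first
# length(new_names) columns of `tbl` by position, whatever they are called now.
rename_by_position <- function(tbl, new_names) {
  old_names <- colnames(tbl)
  stopifnot(length(new_names) <= length(old_names))
  # Build a named vector of the form c(new_name = "old_name") for rename().
  lookup <- setNames(old_names[seq_along(new_names)], new_names)
  rename(tbl, all_of(lookup))
}

# Assumption: a small local data frame standing in for the Spark table.
raw <- data.frame(V1 = "a", V2 = 1L, V3 = 2L, V4 = 3L, V5 = 4L)

columns <- c("customer", "device_subscriber_id",
             "user_subscriber_id", "user_id", "location_id")

renamed <- rename_by_position(raw, columns)
colnames(renamed)
```

Because sparklyr tables are manipulated through dplyr verbs, the same helper would presumably apply to mydata as well, but that is an assumption rather than something the original answer states. Running colnames(mydata) first shows which names were actually assigned on read.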