This is the code I have used in R via Spark Cluster, and error also given below
mydata<-spark_read_csv(spark_cluster,name = "rd_1",path = "IAF_Extracted_Data_Zipped.csv",header = F,delimiter = "|")
mydata %>% select(customer=V1,device_subscriber_id=V2,user_subscriber_id=V3,user_id=V4,location_id=V5)
Error in .f(.x[[i]], ...) : object 'V1' not found
If you want specific names just provide a vector of names on read:
columns <- c("customer", "device_subscriber_id",
"user_subscriber_id", "user_id", "location_id")
spark_read_csv(
spark_cluster, name = "rd_1",path = "IAF_Extracted_Data_Zipped.csv",
header = FALSE, columns = columns, delimiter = "|"
)
The number of columns
should match the number of columns in the input.