I am using Spark 2.x.
I am trying to map a schema dynamically after reading the contents of a pipe-delimited text file without a header into a Spark variable, using Spark with Scala.
Text File Content - File.txt:
12345678910|abc|234567
54182124852|def|784964
Schema to be mapped:
FS1|FS2|FS3
Below is the code I tried. I also tried the code from the example at https://sparkbyexamples.com/spark/spark-read-text-file-rdd-dataframe/#dataframe-read-text, but it is not working.
import org.apache.spark.sql.{DataFrame, Dataset}
val df = spark.read.text("dbfs:/FileStore/tables/Sample1-1.txt")
import spark.implicits._
val dataRDD = df.map(x => {
val elements = x.getString(0).split("|")
(elements(0),elements(1),elements(2))
}).toDF("FS1","FS2","FS3")
dataRDD.printSchema()
dataRDD.show(false)
After executing the above code, I get the output below, which is not what I expected:
+---+---+---+
|fs1|fs2|fs3|
+---+---+---+
|1  |2  |3  |
|5  |4  |1  |
+---+---+---+
I want the new file to be saved as File1.txt, containing the file content along with the header:
FS1|FS2|FS3
12345678910|abc|234567
54182124852|def|784964
You just need to add a header to your CSV file.
You have a text file, and you already know the delimiter, which is |.
You should write something like this:
import org.apache.spark.sql.DataFrame
// Read the pipe-delimited file as CSV; columns default to _c0, _c1, _c2
val df = spark.read.option("delimiter", "|").csv("dbfs:/FileStore/tables/Sample1-1.txt")
// Rename the default columns to the desired header names
val columns = Seq("FS1", "FS2", "FS3")
val resultDF = df.toDF(columns: _*)
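To sanity-check the intermediate result before writing it out, you can print the schema and a few rows. Since no schema is supplied and inferSchema is off by default, all three columns come back as strings:

resultDF.printSchema()
resultDF.show(false)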
// If you want your result as one file, you can use coalesce.
resultDF.coalesce(1)
.write
.option("header","true")
.option("delimiter","|")
.mode("overwrite")
.csv("output/path")