I am working with the Spark Scala shell and trying to create DataFrames and Datasets from a text file.
For reading a text file, there are two options, the text and textFile methods, as shown here:
scala> spark.read.
csv format jdbc json load option options orc parquet schema table text textFile
Here is how I am getting a DataFrame and a Dataset from these two methods:
scala> val df = spark.read.text("/Users/karanverma/Documents/logs1.txt")
df: org.apache.spark.sql.DataFrame = [value: string]
scala> val df = spark.read.textFile("/Users/karanverma/Documents/logs1.txt")
df: org.apache.spark.sql.Dataset[String] = [value: string]
So my question is: what is the difference between these two methods for a text file?
When should I use which method?
As you've noticed, they have almost the same functionality. The difference is that spark.read.text
transforms the data into a DataFrame (an alias for Dataset[Row]),
an untyped distributed collection organized into named columns (here a single string column called value), while spark.read.textFile
transforms the data into a Dataset[String],
a typed distributed collection where each element is a plain String.
Use text when you want to work with the data through column expressions and SQL, and textFile when you want typed, functional transformations such as map and flatMap directly on the strings.
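To make the difference concrete, here is a small sketch (the file path and app name are placeholders, and a local SparkSession is assumed): the DataFrame from text is queried with column expressions, while the Dataset[String] from textFile is transformed with ordinary Scala functions on each line.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object TextVsTextFile {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("text-vs-textFile") // hypothetical app name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val path = "/tmp/logs1.txt" // placeholder path; point this at a real text file

    // text: untyped DataFrame (Dataset[Row]) with one string column named "value",
    // so you filter and select via column expressions
    val df: DataFrame = spark.read.text(path)
    df.filter($"value".contains("ERROR")).show()

    // textFile: typed Dataset[String], so functional ops like flatMap
    // work directly on each line as a plain String
    val ds: Dataset[String] = spark.read.textFile(path)
    ds.flatMap(_.split("\\s+")).show()

    spark.stop()
  }
}
```

Both read the same file; which one fits depends on whether you prefer the column-based DataFrame API or typed Scala transformations.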
Hope it helps.