scala, apache-spark, dataframe, dataset, spark-shell

Difference between SparkSession text and textFile methods?


I am working with the Spark Scala shell and trying to create a DataFrame and a Dataset from a text file.

For reading a text file, there are two options, the text and textFile methods, as shown by tab completion:

scala> spark.read.
csv   format   jdbc   json   load   option   options   orc   parquet   schema   table   text   textFile

Here is how I am getting a DataFrame and a Dataset from these two methods:

scala> val df = spark.read.text("/Users/karanverma/Documents/logs1.txt")
df: org.apache.spark.sql.DataFrame = [value: string]

scala> val df = spark.read.textFile("/Users/karanverma/Documents/logs1.txt")
df: org.apache.spark.sql.Dataset[String] = [value: string]

So my question is: what is the difference between these two methods for reading a text file?

When should I use which method?


Solution

  • As I've noticed, they have almost the same functionality.

    The difference is in the return type: spark.read.text returns a DataFrame, i.e. a Dataset[Row] whose lines are organized into a single named column called value, while spark.read.textFile returns a Dataset[String], a typed, distributed collection where each element is a line of the file.

    Hope it helps.
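    For illustration, here is a minimal sketch (the file path and the "ERROR" filter are hypothetical) of how the two return types differ in use: the DataFrame from text is queried through its value column, while the Dataset[String] from textFile supports typed operations directly on each line.

    import org.apache.spark.sql.SparkSession

    object TextVsTextFile {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("text-vs-textFile")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // spark.read.text: DataFrame (Dataset[Row]) with a single column named "value"
        val df = spark.read.text("logs1.txt") // hypothetical path
        val errorRows = df.filter($"value".contains("ERROR")) // column-based API
        errorRows.show(5, truncate = false)

        // spark.read.textFile: Dataset[String], so each element is a plain String
        val ds = spark.read.textFile("logs1.txt") // hypothetical path
        val errorLines = ds.filter(line => line.contains("ERROR")) // typed lambda
        errorLines.show(5, truncate = false)

        spark.stop()
      }
    }

    In practice you can convert between the two with df.as[String] or ds.toDF(), so the choice mostly comes down to whether you prefer the column-based API or the typed, functional one.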