Tags: apache-spark, delimiter, databricks

How to properly separate columns


I'm having trouble with Spark SQL. I tried to import a CSV file into a Spark table. My columns are separated by semicolons, so I tried to set the separator with the `sep` option, but to my dismay the columns are still not separated properly.

Is this how Spark SQL works, or is there a difference between conventional Spark SQL and the version in Databricks? I am new to Spark SQL, which is a whole new environment compared with traditional SQL, so please pardon my limited knowledge of it.

USE CarSalesP1935727;

CREATE TABLE IF NOT EXISTS Products
USING CSV
OPTIONS (path "/FileStore/tables/Products.csv", header "true",
  inferSchema "true", sep ";");

SELECT * FROM Products LIMIT 10
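
The usual cause of columns not separating is a delimiter mismatch: if the reader's separator doesn't match the one in the file, every line lands in a single column. The effect is easy to reproduce outside Spark with Python's standard-library csv module (a minimal sketch; the sample rows are made up for illustration):

```python
import csv
import io

# Sample semicolon-delimited data, mimicking the Products.csv layout
data = "id;sequence;sequence\n1;657985;657985\n2;689654;685485\n"

# Default delimiter (comma): each line stays one un-split field
wrong = list(csv.reader(io.StringIO(data)))
print(wrong[1])   # ['1;657985;657985'] -- columns not separated

# Matching delimiter (semicolon): three proper columns per row
right = list(csv.reader(io.StringIO(data), delimiter=";"))
print(right[1])   # ['1', '657985', '657985']
```

The same principle applies to Spark's CSV reader: `sep ";"` must match the file's actual separator, so if the columns still come out merged, the file may use a different character (e.g. a tab) than expected.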


Solution

  • Not sure about the problem; this works fine for me.

    Please note that the environment used here is not Databricks.

        val path = getClass.getResource("/csv/test2.txt").getPath
        println(path)
    
        /**
          * file data
          * -----------
          * id;sequence;sequence
          * 1;657985;657985
          * 2;689654;685485
          */
        spark.sql(
          s"""
            |CREATE TABLE IF NOT EXISTS Products
            |USING CSV
            |OPTIONS (path "$path", header "true", inferSchema "true", sep ";")
          """.stripMargin)
    
        spark.sql("select * from Products").show(false)
        /**
          * +---+---------+---------+
          * |id |sequence1|sequence2|
          * +---+---------+---------+
          * |1  |657985   |657985   |
          * |2  |689654   |685485   |
          * +---+---------+---------+
          */