
PySpark: convert a comma-separated string into a DataFrame


I have a string like the one below:

Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,[email protected]"

Is there any way to convert this into a DataFrame in Spark, where each comma-separated field becomes a column?

The final DataFrame should look like this:

[image: expected DataFrame]

Note: this string is generated inside a loop, and each iteration produces a new string. I have to append each string to the DataFrame after splitting it on the comma separator.


Solution

  • Usually you want Spark to read data from a file, or from a set of files so that the work can be parallelized. As already suggested in the comments, spark.read.csv is what you should use to read a CSV file.

    The examples below use a temporary file only to give you an inline working example. For real cases I recommend writing a real file.

    You can pass a schema to the csv function, or include a header row in your file and pass header=True. If neither is provided, Spark names the columns _c0, _c1, and so on.

    import tempfile

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # delete=False keeps each file on disk after close(); remove it yourself when done.

    # 1. No schema and no header: Spark names the columns _c0, _c1, ...
    with tempfile.NamedTemporaryFile(delete=False) as fp:
        fp.write(b"""Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,[email protected]" \n""")
        fp.close()  # flush to disk before Spark opens the file

        spark.read.csv(fp.name).show()

    # 2. Explicit schema, passed as a DDL string
    with tempfile.NamedTemporaryFile(delete=False) as fp:
        fp.write(b"""Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,[email protected]" \n""")
        fp.close()

        spark.read.csv(fp.name, schema="Name string, Surname string, Address string, Phone string, Email string").show()

    # 3. Column names taken from a header row in the file
    with tempfile.NamedTemporaryFile(delete=False) as fp:
        fp.write(b"""Name,Surname,Address,Phone,Email\n""")
        fp.write(b"""Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,[email protected]" \n""")
        fp.close()

        spark.read.csv(fp.name, header=True).show()
    
    +-------+-------+----------------+----+--------------------+
    |    _c0|    _c1|             _c2| _c3|                 _c4|
    +-------+-------+----------------+----+--------------------+
    |Gourav | Joshi |Karnataka, India|NULL|gouravj09@hotmail...|
    +-------+-------+----------------+----+--------------------+
    
    +-------+-------+----------------+-----+--------------------+
    |   Name|Surname|         Address|Phone|               Email|
    +-------+-------+----------------+-----+--------------------+
    |Gourav | Joshi |Karnataka, India| NULL|gouravj09@hotmail...|
    +-------+-------+----------------+-----+--------------------+
    
    +-------+-------+----------------+-----+--------------------+
    |   Name|Surname|         Address|Phone|               Email|
    +-------+-------+----------------+-----+--------------------+
    |Gourav | Joshi |Karnataka, India| NULL|gouravj09@hotmail...|
    +-------+-------+----------------+-----+--------------------+
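
    If the strings really are produced one at a time inside a loop, one possible alternative to writing a file is to parse each string with Python's standard csv module, collect the rows, and build the DataFrame once at the end. The following is only a sketch under assumptions: the helper names parse_csv_line and rows_to_dataframe and the COLUMNS list are hypothetical, the `generated` list stands in for your loop, and `spark` is assumed to be an existing SparkSession such as the one created above. Trimming whitespace and turning empty fields into None are optional choices; note that spark.read.csv in the output above keeps the surrounding spaces.

```python
import csv

# Hypothetical column names, matching the schema used in the answer above.
COLUMNS = ["Name", "Surname", "Address", "Phone", "Email"]


def parse_csv_line(line):
    """Split one CSV-formatted string into a tuple of fields.

    csv.reader honours double quotes, so a comma inside a quoted
    field does not start a new column.
    """
    fields = next(csv.reader([line]))
    # Optional choices: trim surrounding whitespace, map empty fields to None.
    return tuple(field.strip() or None for field in fields)


def rows_to_dataframe(spark, rows, columns=COLUMNS):
    """Build a single DataFrame from all collected rows.

    `spark` is an existing SparkSession; every column is typed as string.
    """
    schema = ", ".join(f"{name} string" for name in columns)
    return spark.createDataFrame(rows, schema=schema)


# Stand-in for the loop in the question that yields one string per iteration.
generated = [
    'Gourav , Joshi ,"Karnataka, India" ,,"gouravj09@hotmail,[email protected]"',
]
rows = [parse_csv_line(line) for line in generated]
```

    Calling rows_to_dataframe(spark, rows).show() after the loop then creates the DataFrame in one step. Collecting rows first and creating the DataFrame once is usually cheaper than appending to a DataFrame with a union on every iteration.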