Tags: python, pyspark, azure-databricks, read-text

How do I read a simple text file of strings in PySpark?


I have a list of strings saved in a text file without a header, and I want to open it in a PySpark notebook in Databricks and print all the lines.

abcdef 
vcdfgrs 
vcvdfrs 
vfdedsew 
kgsldkflfdlfd

text = sc.textFile("path.../filename.txt")
print(text.collect()) 

This code is not printing the lines. I'd appreciate your help.


Solution

  • Here it goes

    #define a function that takes a line and prints it
    def f(line):
        print(line)
    
    #build the text data as a list of one-element rows
    my_list = [['my text line-1'],['line-2 text2 my2'],['some junk line-3']]
    
    #create an RDD from the list (you already have one via sc.textFile)
    txt_file = sc.parallelize(my_list)
    
    #use foreach to call the function, and the print will work
    txt_file.foreach(f)
    
    #if you want each word per line, use flatMap
    
