amazon-s3 pyspark aws-glue

Reading a text file with Spark and passing its contents to Spark SQL


from pyspark import SparkContext
from pyspark import SparkConf

lines = sc.textFile("s3://test_bucket/txt/testing_consol.txt")

llist = lines.collect()

for lines in llist:
        final_query = spark.sql("""{0}
        """.format(lines))

This is what is inside the txt file:

select * from test_table 
where id=1

I'm getting this error message:

"\nmismatched input 'where' expecting {'(', 'SELECT', 'FROM', 'ADD', 'DESC', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 'EXPLAIN', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP', 'SET', 'RESET', 'START', 'COMMIT', 'ROLLBACK', 'REDUCE', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'DFS', 'TRUNCATE', 'ANALYZE', 'LIST', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT', 'LOAD'}(line 1, pos 0)\n\n== SQL ==\nwhere id=1\n^^^\n"

The Spark SQL query works if I change the content of the txt file to a single line:

select * from test_table where id=1

It seems that Spark SQL only recognizes the first line and not the lines after it.


Solution

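  • sc.textFile returns one element per line of the file, so your loop calls spark.sql once for each line, and "where id=1" on its own is not a valid statement. If you merge the lines into a single string before running the query, it should work: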

    llist = ' '.join(lines.collect())
    final_query = spark.sql(llist)
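
The merge step can be tried without a cluster. The following is a minimal sketch, not part of the original answer: build_query is a hypothetical helper name, and the list file_lines stands in for what lines.collect() would return for the file in the question.

```python
def build_query(lines):
    """Join the lines of a SQL file into one statement string."""
    # Strip surrounding whitespace and drop blank lines before joining.
    parts = [line.strip() for line in lines if line.strip()]
    return " ".join(parts)


# Stand-in for lines.collect() on the two-line file from the question.
file_lines = ["select * from test_table ", "where id=1"]
query = build_query(file_lines)
print(query)  # select * from test_table where id=1

# With a live session (assuming `spark` and the RDD `lines` exist as in the
# question), the call would then be:
# final_query = spark.sql(build_query(lines.collect()))
```

One caveat with joining on spaces: if the file contains "--" line comments, everything after the first comment ends up commented out. In that case joining with "\n" instead of " " keeps the original line structure.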