I'm attempting to store the top 100 rows of a view from a SQL database into a DataFrame using a Synapse Notebook. I have tried using this code:
%%pyspark
df = spark.sql("SELECT DataType, TableName, ColumnName FROM dbo.ExternalTableSchema LIMIT 100")
df.show()
But I consistently get the error below even though I have confirmed the name of the view is correct.
I have also tried a variation of the code above:
%%pyspark
df = spark.sql("SELECT TOP (100) [DataType], [TableName], [ColumnName] FROM [dbo].[ExternalTableSchema]")
df.show()
But I get a different error, shown below. I am unsure how to resolve this. Any help would be greatly appreciated!
I have tried the below approach:
from pyspark.sql import SparkSession
jdbcHostname = "dilisqlsrv.database.windows.net"
jdbcDatabase = "db02"
jdbcPort = "1433"
username = "dilip"
password = "Welcome@1"
jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
query = "(SELECT * FROM DepartmentView) AS tmp"
df = spark.read \
    .format("jdbc") \
    .option("url", jdbcUrl) \
    .option("dbtable", query) \
    .option("user", username) \
    .option("password", password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .load()
df.show()
Results:
+------+---------+-----------+---------------+
|DeptID| DeptName| Location| Manager|
+------+---------+-----------+---------------+
| 1| Sales| New York| John Smith|
| 2|Marketing|Los Angeles| Jane Doe|
| 3| Finance| Chicago|Michael Johnson|
| 4| HR| Houston| Emily Brown|
+------+---------+-----------+---------------+
In the above code, I set the JDBC connection parameters and read the data into a DataFrame using a pushdown query.
Since you mentioned you need the top 100 rows, you can use a pushdown query like the one below:
query = "(SELECT TOP 100 * FROM YourTableName) AS tmp"
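Note that the subquery must be wrapped in parentheses and given an alias so SQL Server can treat it like a table name; the TOP 100 then runs server-side before Spark pulls any rows. A minimal sketch of building that string (the view name here is only a placeholder for your own):

```python
# Build a JDBC pushdown query. The parentheses and the "AS tmp" alias are
# required so the subquery can stand in for a table name in the "dbtable" option.
view_name = "dbo.ExternalTableSchema"  # placeholder; substitute your view
pushdown_query = "(SELECT TOP 100 * FROM {0}) AS tmp".format(view_name)
print(pushdown_query)
```

The resulting string is what you pass to `.option("dbtable", ...)` in the read shown above.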
Here is how you can do it using spark.sql:
tableName = "DepartmentView"
spark.read \
    .format("jdbc") \
    .option("url", jdbcUrl) \
    .option("dbtable", tableName) \
    .option("user", username) \
    .option("password", password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .load() \
    .createOrReplaceTempView("temp_table")
top100_rows_df = spark.sql("SELECT * FROM temp_table LIMIT 100")
top100_rows_df.show()
Results:
+------+---------+-----------+---------------+
|DeptID| DeptName| Location| Manager|
+------+---------+-----------+---------------+
| 1| Sales| New York| John Smith|
| 2|Marketing|Los Angeles| Jane Doe|
| 3| Finance| Chicago|Michael Johnson|
| 4| HR| Houston| Emily Brown|
+------+---------+-----------+---------------+
In the above code, the Azure SQL Database view is registered as a temporary view, and then a spark.sql query reads the top 100 rows from it.