I am trying to extract data from oracle source using pyspark.
I am using this code. from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Spark_Job').getOrCreate()
driver = 'oracle.jdbc.driver.OracleDriver'
url = 'jdbc:oracle:thin:@XXXXXXXXXXXXXXXXXX/XXX'
user = 'XXXXX'
password = 'XXXXXXXX'
query = 'SELECT col_1 from schema_1.view_1'
df = spark.read.format('jdbc').option('driver', driver).option('url', url).option('dbtable', query)\
.option('user', user).option('password', password).load()
df.show(10)
I am getting this error :
java.sql.SQLSyntaxErrorException: ORA-00903: invalid table name
ORA-00903
means that you tried to use table name, but - that table doesn't exist.
There's only one select
statement here:
SELECT col_1 from schema_1.view_1
-------- ------
owner table (or view) name
("user")
Info you posted suggests that you're connected as
user = 'XXXXX'
If view_1
belongs to currently connected user, you don't have to specify owner name so query would then be just select col_1 from view_1
.
If view_1
belongs to currently connected user, then view_1
must be correctly spelled. Oracle doesn't care about letter case unless you decided to enclose its (table's, view's) name into double quotes and used lower or mixed case - then you have to do it every time you reference that table, so query might be e.g. select col_1 from "View_1"
.
If view_1
belongs to user who is different from currently logged user (that's what information you posted suggests), then yes - preceding table's name with its owner's name is the way to do it. However, schema_1
(the owner) has to grant at least select
privilege on view_1
to XXXXX
- otherwise, it won't work. (Double quotes/letter case issue still stands.)
I can't tell which situation of these you have, but now you have something to check and act appropriately.