In Hive I have a table called test. In this table I want to have 5 columns:
ID, start_time, end_time, min_value, max_value.
Now, using PySpark, I want to populate this table.
I am doing it like below in my Python script:
start_time='4/5/2017'
end_time='5/4/2017'
min_value='1'
max_value='100'
sqlContext.sql("insert into table testing.test select '{}','{}','{}','{}','{}'".format(id,start_time,end_time,min_value,max_value))
In the above script, the values of start_time, end_time, min_value and max_value are computed earlier in the script.
Now what I want is for the ID column to be auto-incremented:
whenever an insert statement is executed, a new
id
value should be added automatically.
Is it possible to do that using Python? If yes,
how can I auto-increment the column ID
in my script?
You can simply keep an ID variable and increment it whenever you execute the insert command, something like:
id = 1  # outside of the loop or function; possibly make it a global variable
# some loop or function:
sqlContext.sql(....)
id = id + 1
If you are going to populate that table more than once (which is likely), you should seed the id variable from the latest/highest value already in the table, so it doesn't repeat ; )
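As a minimal sketch of that idea, assuming the `sqlContext` and `testing.test` table from the question: build each INSERT statement from the current counter, and seed the counter from `max(id)` in the table (the Spark calls are shown as comments, since they need a live session):

```python
def build_insert(next_id, start_time, end_time, min_value, max_value):
    # Build the INSERT statement; id is numeric, the other columns are
    # quoted string literals, matching the question's original statement.
    return ("insert into table testing.test "
            "select {}, '{}', '{}', '{}', '{}'").format(
                next_id, start_time, end_time, min_value, max_value)

# Seed from the table's current maximum; max(id) is None on an empty table,
# so fall back to 0 and start at 1:
# row = sqlContext.sql("select max(id) as m from testing.test").collect()[0]
# next_id = (row["m"] or 0) + 1
next_id = 1  # placeholder seed for illustration

stmt = build_insert(next_id, '4/5/2017', '5/4/2017', '1', '100')
# sqlContext.sql(stmt)
next_id += 1  # increment after every insert so the next row gets a new id
```

Note this only works safely when a single script writes to the table; with concurrent writers two runs can read the same `max(id)` and produce duplicate ids.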