python, hive, pyspark, apache-spark-sql, auto-increment

Auto-increment a table column using Python


In Hive I have a table called test. In this table I want to have 5 columns:

ID, start_time, end_time, min_value, max_value.

Now I want to populate this table using PySpark.

I am doing the following in my Python file:

start_time = '4/5/2017'
end_time = '5/4/2017'
min_value = '1'
max_value = '100'

# id is not defined anywhere yet -- this is the column I want auto-incremented
sqlContext.sql("insert into table testing.test select '{}','{}','{}','{}','{}'".format(id, start_time, end_time, min_value, max_value))

In the script above, start_time, end_time, min_value, and max_value are values I already get as part of the script.

What I want now is for the ID column to be auto-incremented: whenever an insert statement is executed, an id value should be added to the row automatically.

Is it possible to do that using Python? If yes, how can I auto-increment the ID column in my script?


Solution

  • You can simply make an id variable and add to it whenever you execute the insert command, something like:

    id = 1  # initialize outside of the loop or function, possibly as a global variable

    for row in rows:  # some loop (or function) that drives the inserts
        sqlContext.sql(
            "insert into table testing.test "
            "select {},'{}','{}','{}','{}'".format(
                id, row.start_time, row.end_time, row.min_value, row.max_value))
        id = id + 1  # increment after each insert so the next row gets a new id

    If you are going to populate that table more than once (which is likely), you should initialize the id variable from the latest/highest value already in the table, so it doesn't repeat ;)
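
    A minimal sketch of that seeding step, assuming the table already exists with the schema from the question and that sqlContext is the same HiveContext used above:

    # Seed the counter from the current maximum id in the table.
    # max(id) returns NULL (None in Python) when the table is empty, hence the "or 0".
    row = sqlContext.sql("select max(id) as max_id from testing.test").collect()[0]
    id = (row.max_id or 0) + 1  # the next insert starts just above the existing maximum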