Tags: python, arrays, python-3.x, pyspark, difference

Create an array of the differences between adjacent numbers in an array column (Python/PySpark)


I have a column of arrays of numbers, e.g. [0, 80, 160, 220], and would like to create a column of arrays of the differences between adjacent terms, e.g. [80, 80, 60].

Does anyone have an idea how to approach this in Python or PySpark? I'm thinking of something iterative (the ith term minus the (i-1)th term, starting at the second term), but I'm really stuck on how to code it. Thanks!


Solution

  • With pandas, Series.diff() computes the difference between consecutive rows:

    import pandas as pd

    d = [0, 80, 160, 220]
    df = pd.DataFrame(d, columns=['col_list'])
    df['col_new'] = df['col_list'].diff()  # difference from the previous row
    print(df)
    #output
       col_list  col_new
    0         0      NaN
    1        80     80.0
    2       160     80.0
    3       220     60.0
    

    Also, if you want to delete the row containing the NaN, you can do:

    df = df.dropna(subset=['col_new'])  # dropna returns a new frame, so reassign
    print(df)

    #output

       col_list  col_new
    1        80     80.0
    2       160     80.0
    3       220     60.0
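
    The snippet above assumes each number sits in its own row. If, as in the question, every cell holds a whole list, one option (a sketch, not the only way) is to apply a pairwise difference inside each cell with a list comprehension:

    ```python
    import pandas as pd

    # Each cell holds an entire list, as described in the question
    # (the second row is a made-up extra example)
    df = pd.DataFrame({"col_list": [[0, 80, 160, 220], [5, 10, 20]]})

    # Pairwise difference within each list: b - a for each adjacent pair
    df["col_new"] = df["col_list"].apply(
        lambda xs: [b - a for a, b in zip(xs, xs[1:])]
    )
    print(df["col_new"].tolist())  # → [[80, 80, 60], [5, 10]]
    ```

    `zip(xs, xs[1:])` pairs each element with its successor, so the result has one fewer entry than the input, as expected.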
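
    For the PySpark side of the question, Spark's built-in higher-order functions can express the same idea directly on an array column without a UDF. A sketch, assuming Spark >= 2.4 (where `transform` and `sequence` are available) and a column named `col_list`:

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.master("local[1]").appName("diffs").getOrCreate()

    sdf = spark.createDataFrame([([0, 80, 160, 220],)], ["col_list"])

    # For each index i in 1..size-1, subtract the previous element
    sdf = sdf.withColumn(
        "col_new",
        F.expr(
            "transform(sequence(1, size(col_list) - 1), i -> col_list[i] - col_list[i-1])"
        ),
    )
    ```

    Here `sequence(1, size(col_list) - 1)` builds the index array [1, 2, ...] and `transform` maps each index to the difference with its predecessor, all inside Spark SQL.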