python sql-server stata

Calculating datediff in days in Python / SQL Server / Stata


[Image: sample of the data set]

I want to create a new column called days that holds the date difference, in days, for each distinct user.

For example, for user x1, counting from x1's minimum date, days would look like:

days = 1, 2, 3, 4, 5, 6, ...

What is the best way to do this for every distinct user in the data?

Each user has a different number of dates in the data set, as shown in the picture.

Any solutions using SQL Server, Python, or Stata?

Thanks in advance.


Solution

  • Here is a Stata solution, assuming you are importing the dates as strings:

    * GENERATE SAMPLE DATA
        clear
        set obs 2
        gen user_i = _n
        expand 10
        gen month = runiformint(1,12)
        gen day = runiformint(1,28) if inlist(month,2)
        replace day = runiformint(1,31) if inlist(month,1,3,5,7,8,10,12)
        replace day = runiformint(1,30) if mi(day)
        gen e_date = "2019-" + string(month, "%02.0f") + "-" + string(day, "%02.0f")
        drop month day
    
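    * calculate days
        * convert the string to a Stata daily date
        gen date = date(e_date, "YMD")
        format date %td
        * within each user, sorted by date, subtract that user's earliest date
        * (the first day is 0; add 1 if the count should start at 1)
        bysort user_i (date): gen days_from_min = date - date[1]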
    

    That being said, there is almost certainly an easy way to do this in MSSQL or wherever your data is currently housed.
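
Since the question also asks about Python, here is a minimal sketch of the same idea using only the standard library. It assumes the data arrives as (user, date string) pairs with dates in YYYY-MM-DD format; the function name `days_from_first` and the sample rows are made up for illustration. As in the Stata code, each user's earliest date gets 0, so add 1 if the count should start at 1.

```python
from datetime import date


def days_from_first(rows):
    """Return (user, date_string, days) for each (user, 'YYYY-MM-DD') pair.

    `days` is the number of days between the row's date and the
    earliest date seen for that user (0 on the user's first day).
    """
    parsed = [(user, date.fromisoformat(d)) for user, d in rows]

    # Find the earliest date per user.
    first = {}
    for user, d in parsed:
        if user not in first or d < first[user]:
            first[user] = d

    # Subtract that earliest date from every row of the same user.
    return [(user, d.isoformat(), (d - first[user]).days) for user, d in parsed]


# Hypothetical sample data: two users with different numbers of dates.
sample = [
    ("x1", "2019-03-01"),
    ("x1", "2019-03-02"),
    ("x1", "2019-03-05"),
    ("x2", "2019-07-10"),
    ("x2", "2019-07-12"),
]

result = days_from_first(sample)
for row in result:
    print(row)
```

If the data is already in a pandas DataFrame, the same per-user minimum can be obtained with a groupby and `transform("min")` on the date column, then subtracted from the date column.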