performance sorting date spss

Is it faster to sort dates or sort strings in SPSS? If so, by how much?


I have a dataset of around 5 million records. The dates are read in as strings in the form MM/DD/YYYY HH:MM:SS. I am only interested in the date part, so I read them in with an (A10) format, which keeps only the first ten characters and effectively trims the time.

I then run ALTER TYPE DateVar (SDATE10). I do this because I assumed sorting dates would be quicker than sorting strings, but I can't find confirmation of this.

Is there a way to time SPSS commands to work out questions like this?


Solution

  • The quickest way I can think of is to use Python for the timestamps and normal SPSS syntax for the sorting, so the test replicates real-life conditions:

    * Start the timer in Python.
    begin program.
    import time
    start = time.time()
    end program.

    * Back in normal SPSS syntax: put the commands you want to test here.

    * Return to Python, stop the timer, and print the elapsed time.
    * The start variable is still available from the first program block.
    begin program.
    elapsed = time.time() - start
    print("It took %.2f seconds" % elapsed)
    end program.
    

    Check the output log, and it will show you the time.

    Not very scientific, but quick and easy. I recommend restarting SPSS between tests, just to be sure one test does not affect the other.

    In my experience, ALTER TYPE does something that affects execution times. I'm not sure what, but everything seems slower after an ALTER TYPE, so you might also consider saving and re-opening the dataset after using it.
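
If you want something a bit more repeatable than a single stopwatch reading, a small helper along these lines could be used inside a program block. This is only a sketch: `time_call` and `summarize` are hypothetical names, not part of SPSS, and the stand-in workload is a plain Python sort so the code also runs outside SPSS. It assumes Python 3 (`time.perf_counter`); with an older Python 2 plug-in, `time.time()` would be the fallback.

```python
import statistics
import time


def time_call(func, repeats=3):
    """Run func() `repeats` times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # high-resolution clock meant for intervals
        func()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (fastest, median) of the measured durations in seconds."""
    return min(timings), statistics.median(timings)


if __name__ == "__main__":
    # Stand-in workload (an assumption, so the sketch runs anywhere).
    # Inside SPSS this would be replaced by the command under test.
    workload = lambda: sorted(range(200000, 0, -1))
    fastest, median = summarize(time_call(workload, repeats=5))
    print("fastest %.4f s, median %.4f s" % (fastest, median))
```

Inside SPSS, the workload could be something like `lambda: spss.Submit("SORT CASES BY DateVar.")`, with `import spss` at the top of the program block. Keep in mind that a second sort of the same dataset runs on data that is already sorted, so the data would have to be re-read before each repeat for the timings to be comparable.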