Extract string from a line of SPSS syntax and convert to date


A file handle in my syntax references a folder whose name includes a version number in YYYYMMDD format. For example, the "v20170215" referenced below:

file handle WORKING/name='ROOT\Uploads\20141001_20150930 v20170215'.

The version part of the file handle is routinely updated based on new data that needs to be processed. The file handle always ends with a "v" followed by a YYYYMMDD date.

How can I automatically extract the last "YYYYMMDD" string from the file handle (e.g., "20170215") and create a date variable out of it?

If the date were a string variable in the data, I could use something like below:

* Extract day, month, and year.
compute day = number(char.substr(...),F2.0).
compute month = number(char.substr(...),F2.0).
compute year = number(char.substr(...),F4.0).

* Compute date variable.
compute Version = date.mdy(month,day,year).
formats Version (adate10).
execute.

But since what I need to parse is a line of syntax rather than data, I suspect I should look to Python, and I'm stumped on how to tackle this.


Solution

  • I'll assume you can't get the updated reference as data from the same source that creates the updated syntax (that might have been an easier solution).
    Once the handle is defined, you can extract its definition into data this way:

    dataset declare  myhandle.
    oms/select tables/if commands=['Show'] subtypes=['File Handles']/destination format=SAV outfile='myhandle'.
    show handles.
    omsend.
    dataset activate myhandle.
    

    This opens a dataset called myhandle in which the variable Directory contains the full path defined in the handle. From that, extract only the string you need; see if this works for you:

    compute Directory=char.substr(Directory,char.index(Directory," v")+2,8).
    

    Now that you have the eight-character YYYYMMDD string, you can turn it into a date (for example with the number, char.substr and date.mdy steps from the question) and match it into your data.
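
Since the question mentions Python, here is a minimal sketch of the same extraction done on the text of the handle definition itself. It assumes the line (or just the path) is already available as a Python string and that it ends with "v" plus eight digits, optionally followed by the closing quote and period. The function name version_date is hypothetical, and the sketch does not use any SPSS-specific Python API.

```python
import re
from datetime import date, datetime


def version_date(handle_line: str) -> date:
    """Return the date encoded after the final 'v' of a file handle line.

    Hypothetical helper: assumes the text ends with v + YYYYMMDD,
    optionally followed by a closing quote, a period and whitespace.
    """
    # Anchor at the end of the text so the date range earlier in the
    # folder name (20141001_20150930) is never picked up by mistake.
    match = re.search(r"v(\d{8})['\"]?\.?\s*$", handle_line)
    if match is None:
        raise ValueError("no trailing vYYYYMMDD version found")
    # strptime also validates that the digits form a real calendar date.
    return datetime.strptime(match.group(1), "%Y%m%d").date()


line = r"file handle WORKING/name='ROOT\Uploads\20141001_20150930 v20170215'."
print(version_date(line))  # 2017-02-15
```

Anchoring the pattern at the end of the string is the main design choice here: it matches the rule stated in the question, that the handle always ends with "v" followed by the date, and it also works on the bare path taken from the Directory variable.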