Creating and using multiple datasets in SPSS


Forgive the likely naive question, but despite experience in databases I'm new to SPSS and am probably overlooking something simple.

I have data about Patients (unique-pt-identifier, age, gender, etc.)

The patients take multiple different kinds of Tests, each of which can require a few hundred to several thousand fields (unique-pt-identifier, testtype, testdate, testdata1, testdata2, ... testdata2000). I have sizable datasets of these test results.

I'd like to compute things about the test results, but those computations sometimes need to reference properties of the patients. I know I can add columns to the Test dataset, adding the patient data to each row, but this seems awkward and redundant (patients take the same type of test multiple times, so I'd end up adding the same info multiple times).

This seems conceptually straightforward, but unless I'm just using the wrong terminology, I can't find anything about this in either SPSS command syntax or in multiple web searches. Happy to read the right documentation if pointed to it.

Many thanks.


Solution

  • In SPSS, all the data you want to use together has to sit in the same dataset. So yes, you have to get the patients' properties together with the test results in the same dataset. If this makes for datasets that are too big, there are two simple ways to get what you need with a smaller dataset. First, you don't necessarily have to bring together ALL test results and ALL patient properties, just the relevant ones for each analysis. For example:

    match files /file=testresults /table=patients /by=patientID 
        /keep=patientID test1 test2 property1 property2.
    exe.
    dataset name dataForAnalysis1.
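
    One caveat that is easy to miss: match files with /by expects every input to be sorted in ascending order of the key variable, and the /table file should contain each key only once. A minimal sketch of that preparation, assuming both files are already open and have been given the dataset names testresults and patients (run it before the match files command above):

    * Sort both inputs by the key variable before matching.
    dataset activate patients.
    sort cases by patientID.
    dataset activate testresults.
    sort cases by patientID.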
    

    The second approach is to first aggregate the test data to patient level, and only then match the datasets.

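    * Aggregate the test data to one row per patient (mean of each test).
    dataset activate testresults.
    dataset declare agg1.
    aggregate outfile=agg1 /break=patientID /test1 test2=mean(test1 test2).
    * Attach the patient properties to the aggregated rows.
    match files /file=agg1 /table=patients /by patientID.
    exe.
    dataset name dataForAnalysis2.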
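
    Once the merged dataset exists, a computation can reference test variables and patient properties side by side. A small sketch, assuming test1 and property1 are both numeric; the new variable name test1Adj and the formula itself are purely illustrative:

    * Hypothetical computation that combines a test result with a patient property.
    dataset activate dataForAnalysis1.
    compute test1Adj = test1 - property1.
    execute.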