Search code examples
rexcelsyntaxaggregatespss

How can I get a percentages column when I aggregate a large dataset in SPSS?


I have a large dataset in SPSS as follows:

Col_1   Col_2   Qty
a         x      5
a         x      5
a         y      2
b         x      1
b         y      6
b         y      7 
c         y      4
c         z     10
c         z     11
d         x      2

Using the following syntax I aggregate the dataset:

AGGREGATE
  /OUTFILE='agg'
  /PRESORTED
  /BREAK=Col_1 Col_2
  /Qty_sum=SUM(Qty).

This is the result I get:

Col_1   Col_2   Qty_sum
a          x    10
a          y    2
b          x    1
b          y    13
c          y    4
c          z    21
d          x    2

In addition to this, I want the results to give me a percentage as follows:

Col_1   Col_2   Qty_sum Percent
a          x    10        83%
a          y    2         17%
b          x    1         7%
b          y    13        93%
c          y    4         16%
c          z    21        84%
d          x    2         100%

The Percent column has a percentage that shows how the Qty_sum is spread across the values in Col_2.

Since there's no consistency in the way the data in Col_1 and Col_2 are recorded, I cannot use a generic formula for the entire dataset.

I cannot work in Excel using the Sumifs function as the dataset is too large for Excel.

I tried using the Row Percentages option in the Crosstabs function, but that works only to get a count across two rows of categorical variables.

Instead of SPSS, please let me know if this can be done in R as well.


Solution

  • In SPSS, do this:

    AGGREGATE /OUTFILE=* mode=addvariables /BREAK=Col_1 /Qty_sum_col1=SUM(Qty_sum).
    compute Col2fraction=Qty_sum/Qty_sum_col1.
    

    The aggregate command will add a new variable Qty_sum_col1 which will contain the sum of Qty for every value of col_1. The following compute creates Col2fraction by dividing Qty_sum by Qty_sum_col1, resulting in the wanted fraction.