Search code examples
j

How to reshape data array from wide to long in J?


I like to replicate the reshape function in J.

For example, Stata can reshape a dataset "from wide to long". Below is their Example 1 from their documentation:

. use http://www.stata-press.com/data/r11/reshape1.dta

. list
  +-------------------------------------------------------+
  | id   sex   inc80   inc81   inc82   ue80   ue81   ue82 |
  |-------------------------------------------------------|
  |  1     0    5000    5500    6000      0      1      0 |
  |  2     1    2000    2200    3300      1      0      0 |
  |  3     0    3000    2000    1000      0      0      1 |
  +-------------------------------------------------------+

. reshape long inc ue, i(id) j(year)

. list
  +-----------------------------+
  | id   year   sex    inc   ue |
  |-----------------------------|
  |  1     80     0   5000    0 |
  |  1     81     0   5500    1 |
  |  1     82     0   6000    0 |
  |  2     80     1   2000    1 |
  |  2     81     1   2200    0 |
  |  2     82     1   3300    0 |
  |  3     80     0   3000    0 |
  |  3     81     0   2000    0 |
  |  3     82     0   1000    1 |
  +-----------------------------+

NB. Python Pandas has a similar function ("stack").

I understand that J can import the data files (csv format) as follows.

load 'web/gethttp'
] dataset =: gethttp 'https://bbbyc.github.io/reshape1.csv'

load 'tables/csv'
] dataInJArray =: fixcsv dataset

I am lost after getting this dataInJArray. How can I reshape it? Appreciate any hints / advice!


Solution

  • To actually work on your specific problem using J you could do this:

    NB. t is the data to be stacked:
    
       [ t=: 3 8 $ 1 0 5000 5500 6000 0 1 0   2 1 2000 2200 3300 1 0 0   3 0 3000 2000 1000 0 0 1 
    1 0 5000 5500 6000 0 1 0
    2 1 2000 2200 3300 1 0 0
    3 0 3000 2000 1000 0 0 1
    

    you can select and combine the different columns appropriately

       ({. ,. 1&{ ,. (2 3 4 & {),. (5 6 7 & {))"1 t
    1 0 5000 0
    1 0 5500 1
    1 0 6000 0
    
    2 1 2000 1
    2 1 2200 0
    2 1 3300 0
    
    3 0 3000 0
    3 0 2000 0
    3 0 1000 1
    

    Since this leaves gaps between the groups, you apply ,/ to the whole result

       ,/@:(({. ,. 1&{ ,. (2 3 4 & {),. (5 6 7 & {))"1) t
    1 0 5000 0
    1 0 5500 1
    1 0 6000 0
    2 1 2000 1
    2 1 2200 0
    2 1 3300 0
    3 0 3000 0
    3 0 2000 0
    3 0 1000 1
    

    I am not sure how well this generalizes, but variations could be used on tables of any number of records if they are already organized appropriately.

    To finish off the formatting and the introduction of 'years'

       [s1=. ,. each <"1  |: s0  NB. years inserted in the next step
    +-+-+----+-+
    |1|0|5000|0|
    |1|0|5500|1|
    |1|0|6000|0|
    |2|1|2000|1|
    |2|1|2200|0|
    |2|1|3300|0|
    |3|0|3000|0|
    |3|0|2000|0|
    |3|0|1000|1|
    +-+-+----+-+
       [s2=. ({. , ,.@:(9 $ 80 81 82"_); }.)s1  NB. 80 81 82"_ creates a verb that returns 80 81 82 given any argument
    +-+--+-+----+-+
    |1|80|0|5000|0|
    |1|81|0|5500|1|
    |1|82|0|6000|0|
    |2|80|1|2000|1|
    |2|81|1|2200|0|
    |2|82|1|3300|0|
    |3|80|0|3000|0|
    |3|81|0|2000|0|
    |3|82|0|1000|1|
    +-+--+-+----+-+
       ('id';'year';'sex';'inc';'ue'),:s2
    +--+----+---+----+--+
    |id|year|sex|inc |ue|
    +--+----+---+----+--+
    |1 |80  |0  |5000|0 |
    |1 |81  |0  |5500|1 |
    |1 |82  |0  |6000|0 |
    |2 |80  |1  |2000|1 |
    |2 |81  |1  |2200|0 |
    |2 |82  |1  |3300|0 |
    |3 |80  |0  |3000|0 |
    |3 |81  |0  |2000|0 |
    |3 |82  |0  |1000|1 |
    +--+----+---+----+--+