Search code examples
pythonpandasetl

how to create multiple columns using values in one column pandas


The data is currently structured like:

data
name1
addr1
url1
vol1
name2
addr2
url2
vol2
name3
addr3
url3
vol3

expected structure:

Name Addr url Vol
name1 addr1 url1 vol1
name2 addr2 url2 vol2
name3 addr3 url3 vol3

I need to extract the data from a column and based on a criteria i.e. if the record is name, id, url or volume, create a column for each


Solution

  • This answer assumes that the values you provided are not the real values: ie the values are meaningful and not literally numbered like that. Data usually just isn't that nicely stated.

    It also assumes that you always have a recurrent series of name, addresses, etc that recurs every four rows without exception with a well-behaving df.index that is merely a numeric count for every row.

    Make indicies specifying which row and which column each element will end up in.

    >>> df['columns'] = df.index % 4
    >>> df['rows'] = df.index // 4
    >>> df[['rows', 'columns', 'data']]
        rows  columns   data
    0      0        0  name1
    1      0        1  addr1
    2      0        2   url1
    3      0        3   vol1
    4      1        0  name2
    5      1        1  addr2
    6      1        2   url2
    7      1        3   vol2
    8      2        0  name3
    9      2        1  addr3
    10     2        2   url3
    11     2        3   vol3
    

    That will create a data frame that looks like the above (I sorted the columns to more easily visualise what's going on). Then unstack your data.

    >>> df.set_index(['rows', 'columns']).unstack()
              data                   
    columns      0      1     2     3
    rows                             
    0        name1  addr1  url1  vol1
    1        name2  addr2  url2  vol2
    2        name3  addr3  url3  vol3
    

    After this, collapse columns multi-index df.columns = df.columns.get_level_values(1) and then rename df.rename(columns={INT: NAME, INT: NAME, ...}, inplace=True). If you have different variable names, adjust as required.