The data is currently structured like:
data |
---|
name1 |
addr1 |
url1 |
vol1 |
name2 |
addr2 |
url2 |
vol2 |
name3 |
addr3 |
url3 |
vol3 |
expected structure:
Name | Addr | url | Vol |
---|---|---|---|
name1 | addr1 | url1 | vol1 |
name2 | addr2 | url2 | vol2 |
name3 | addr3 | url3 | vol3 |
I need to extract the data from a column and based on a criteria i.e. if the record is name, id, url or volume, create a column for each
This answer assumes that the values you provided are not the real values: ie the values are meaningful and not literally numbered like that. Data usually just isn't that nicely stated.
It also assumes that you always have a recurrent series of name, addresses, etc that recurs every four rows without exception with a well-behaving df.index
that is merely a numeric count for every row.
Make indicies specifying which row and which column each element will end up in.
>>> df['columns'] = df.index % 4
>>> df['rows'] = df.index // 4
>>> df[['rows', 'columns', 'data']]
rows columns data
0 0 0 name1
1 0 1 addr1
2 0 2 url1
3 0 3 vol1
4 1 0 name2
5 1 1 addr2
6 1 2 url2
7 1 3 vol2
8 2 0 name3
9 2 1 addr3
10 2 2 url3
11 2 3 vol3
That will create a data frame that looks like the above (I sorted the columns to more easily visualise what's going on). Then unstack your data.
>>> df.set_index(['rows', 'columns']).unstack()
data
columns 0 1 2 3
rows
0 name1 addr1 url1 vol1
1 name2 addr2 url2 vol2
2 name3 addr3 url3 vol3
After this, collapse columns multi-index df.columns = df.columns.get_level_values(1)
and then rename df.rename(columns={INT: NAME, INT: NAME, ...}, inplace=True)
. If you have different variable names, adjust as required.