Tags: python, numpy, genfromtxt

Get column names from numpy genfromtxt in Python


Using numpy's genfromtxt in Python, I want to use the column headers as keys for each row's data. I tried the following, but I cannot get the column names attached to the corresponding values.

column = np.genfromtxt(pathToFile,dtype=str,delimiter=',',usecols=(0))
columnData = np.genfromtxt(pathToFile,dtype=str,delimiter=',')
data = dict(zip(column,columnData.tolist()))

Below is the data file

header0,header1,header2
mydate,3.4,2.0
nextdate,4,6
afterthat,7,8

Currently, data looks like this:

{
  "mydate": [
    "mydate",
    "3.4",
    "2.0"
  ],
  "nextdate": [
    "nextdate",
    "4",
    "6"
  ],
  "afterthat": [
    "afterthat",
    "7",
    "8"
  ]
}

I want to get it into this format:

{
  "mydate": {
    "header1":"3.4",
    "header2":"2.0"
  },
  "nextdate": {
    "header1":"4",
    "header2":"6"
  },
  "afterthat": {
    "header1":"7",
    "header2":"8"
  }
}

Any suggestions?


Solution

  • With your sample file and genfromtxt calls I get 2 arrays:

    In [89]: column
    Out[89]: 
    array(['header0', 'mydate', 'nextdate', 'afterthat'], 
          dtype='<U9')
    In [90]: columnData
    Out[90]: 
    array([['header0', 'header1', 'header2'],
           ['mydate', '3.4', '2.0'],
           ['nextdate', '4', '6'],
           ['afterthat', '7', '8']], 
          dtype='<U9')
    

    Pull out the first row of columnData

    In [91]: headers=columnData[0,:]
    In [92]: headers
    Out[92]: 
    array(['header0', 'header1', 'header2'], 
          dtype='<U9')
    

    Now construct a dictionary of dictionaries (I don't need the separate column array):

    In [94]: {row[0]: {h:v for h,v in zip(headers, row)} for row in columnData[1:]}
    Out[94]: 
    {'afterthat': {'header0': 'afterthat', 'header1': '7', 'header2': '8'},
     'mydate': {'header0': 'mydate', 'header1': '3.4', 'header2': '2.0'},
     'nextdate': {'header0': 'nextdate', 'header1': '4', 'header2': '6'}}
    

    Refine it a bit by skipping the first column, which is already used as the key:

    In [95]: {row[0]: {h:v for h,v in zip(headers[1:], row[1:])} for row in columnData[1:]}
    Out[95]: 
    {'afterthat': {'header1': '7', 'header2': '8'},
     'mydate': {'header1': '3.4', 'header2': '2.0'},
     'nextdate': {'header1': '4', 'header2': '6'}}
    

    I like dictionary comprehensions!
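For anyone who wants to run the steps so far outside an interactive session, here is a minimal sketch that gathers them into one script. It assumes the sample data from the question; the `io.StringIO` object stands in for the file on disk, and the function name `rows_to_nested_dict` is made up for illustration.

```python
import io

import numpy as np

# Stand-in for the file on disk: the sample data from the question.
sample = io.StringIO(
    "header0,header1,header2\n"
    "mydate,3.4,2.0\n"
    "nextdate,4,6\n"
    "afterthat,7,8\n"
)


def rows_to_nested_dict(source):
    """Map each row's first field to a {header: value} dict of its other fields."""
    # Read everything as strings, header line included.
    table = np.genfromtxt(source, dtype=str, delimiter=",")
    # First row holds the headers; skip header0 because column 0 becomes the key.
    headers = table[0, 1:]
    return {
        str(row[0]): {str(h): str(v) for h, v in zip(headers, row[1:])}
        for row in table[1:]
    }


data = rows_to_nested_dict(sample)
print(data["mydate"])  # {'header1': '3.4', 'header2': '2.0'}
```

The `str(...)` calls only turn numpy string scalars into plain Python strings, which keeps the result friendly to things like `json.dumps`.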

    Your dictionary-of-lists version, with the key dropped from each list:

    In [100]: {row[0]:row[1:] for row in columnData[1:].tolist()}
    Out[100]: {'afterthat': ['7', '8'], 'mydate': ['3.4', '2.0'], 'nextdate': ['4', '6']}
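As a possible alternative, genfromtxt can read the header line itself when called with `names=True`, which exposes the column names as `dtype.names` on a structured array. The sketch below assumes the same sample data (again fed through `io.StringIO` in place of a real path) and uses `dtype=None`, so the numeric columns come back as floats rather than the strings shown in the question.

```python
import io

import numpy as np

# Stand-in for the file on disk: the sample data from the question.
sample = io.StringIO(
    "header0,header1,header2\n"
    "mydate,3.4,2.0\n"
    "nextdate,4,6\n"
    "afterthat,7,8\n"
)

# names=True takes the field names from the first line;
# dtype=None lets genfromtxt infer a type for each column.
table = np.genfromtxt(
    sample, delimiter=",", names=True, dtype=None, encoding="utf-8"
)
# A file with a single data row would give a 0-d array, so force 1-d.
table = np.atleast_1d(table)

headers = table.dtype.names  # ('header0', 'header1', 'header2')
key, *value_names = headers

# .item() converts numpy scalars to plain Python floats.
data = {
    str(row[key]): {name: row[name].item() for name in value_names}
    for row in table
}
print(data["mydate"])  # {'header1': 3.4, 'header2': 2.0}
```

One thing to watch for: with `names=True`, genfromtxt cleans up the header text (for example, spaces become underscores), so the field names can differ from the raw header line when the headers are not plain identifiers like the ones here.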