
Auto convert strings and float columns using genfromtxt from numpy/python


I have several different data files that I need to import using genfromtxt. Each data file has different content. For example, file 1 may have all floats, file 2 may have all strings, and file 3 may have a combination of floats and strings, etc. The number of columns also varies from file to file, and since there are hundreds of files, I don't know which columns are floats and which are strings in each file. However, all the entries in each column are of the same data type.

Is there a way to set up a converter for genfromtxt that will detect the type of data in each column and convert it to the right data type?

Thanks!


Solution

  • If you're able to use the Pandas library, pandas.read_csv is much more generally useful than np.genfromtxt, and it automatically handles the kind of per-column type inference described in your question. The result will be a DataFrame, but you can get a numpy array out of it in several ways, for example:

    import pandas as pd
    data = pd.read_csv(filename)
    
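    # get a numpy array; this will be an object array if the columns have mixed/incompatible types
    arr = data.to_numpy()
    
    # get a record (structured) array; this is how numpy handles mixed types in a single array
    # index=False leaves out the DataFrame index, which would otherwise become an extra field
    arr = data.to_records(index=False)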
    

    pd.read_csv has dozens of options for various forms of text input; see the pandas.read_csv documentation for details.
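A minimal, self-contained sketch of the inference in action is shown below. The CSV contents are made up for illustration and read from an in-memory buffer rather than a real file. For comparison, it also passes dtype=None to np.genfromtxt, which makes genfromtxt guess a type for each column on its own, so no custom converter is needed for the simple case in the question. The exact integer width and string dtype names in the output may vary with platform and library version.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file contents: a string column, an integer column and a float column.
csv_text = """name,count,value
alpha,1,0.5
beta,2,1.25
gamma,3,2.0
"""

# pandas infers a dtype for each column from its contents.
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # shows the inferred dtype of each column

# Structured (record) array built from the DataFrame, without the index field.
rec = df.to_records(index=False)
print(rec.dtype.names)

# np.genfromtxt can also infer per-column types when dtype=None;
# names=True takes the field names from the header line, and
# encoding=None returns string columns as str rather than bytes.
arr = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True,
                    dtype=None, encoding=None)
print(arr.dtype)
```

With genfromtxt the result is a structured array, so each column is accessed by name, e.g. arr["value"].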