python string numpy truncated genfromtxt

Python loadtxt and genfromtxt truncate strings


I have a 2-column, mixed-type array that I need to read in and reshape into a data cube. I've got most of it working, but for some reason both numpy.loadtxt and numpy.genfromtxt drop everything after the 8th character from the string part of each tuple. The file holds 25 blocks of 8 parameter-value pairs corresponding to stars of varying masses and metallicities.

For instance, Teff \t\t 5.2739E+3 (there are 2 tabs between the string and the float) converts to a key-value pair just fine, but MASS/MSUN \t\t 0.800 gets converted to 'MASS/MSU': 0.800 instead of the 'MASS/MSUN': 0.800 I expected. Similarly, LOG(L/LSUN) \t\t 0.0522 becomes 'LOG(L/LS': 0.0522 instead of 'LOG(L/LSUN)': 0.0522.

Why are the last characters in the strings falling off? I've tried setting the delimiters to only tabs, to only tabs and newlines (it didn't seem to like that), commenting out the lines between blocks, etc. No matter what I do, the character limit for each string is stuck at 8. There must be a string subtype I need to declare. I've made a workaround, but it still bothers me.

This is my code (I'm using the Spyder GUI, BTW):

import numpy as np
from collections import defaultdict

# Strings read with this dtype come back truncated to 8 characters
f = np.genfromtxt("zamsdata.txt", dtype=(str, float))
zcube = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
infotups = []
for row in f:
    # rows are in repeating order of MASS, X, Y, Pc, Tc, R, L, Te, LOG(Te) & LOG(L/LSUN)
    if 'MASS' in row[0]:
        mass = str(row[1])
        continue
    if 'X' in row[0]:
        hydfrac = str(row[1])
        continue
    else:
        infotups.append([hydfrac, mass, str(row[0]), row[1]])

# Nest the values as zcube[hydfrac][mass][parameter]
for l, m, a, o in infotups:
    zcube[l][m][a].append(o)

Solution

  • When the data type of a field is specified as str, genfromtxt appears to assign the field a default size of eight characters. If you know that the maximum number of characters is, say, 12, you can use dtype=['S12', float]. (Note that I've used a list, not a tuple.) On Python 3, 'S12' yields byte strings; 'U12' gives Unicode strings instead. You could also use dtype=None, which tells genfromtxt to infer the data type of each field from what it finds in the file.
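A minimal sketch of the width effect described in this answer, not the asker's actual file: the three sample rows are made up to match the layout in the question, the helper name read_pairs and the field names "name" and "value" are hypothetical, and the Unicode 'U' string type is used in place of 'S' on the assumption that this runs on Python 3 with a NumPy version that supports the encoding argument.

```python
import io
import numpy as np

# Hypothetical sample in the layout described in the question:
# a label, two tabs, then a float.
sample = "Teff\t\t5.2739E+3\nMASS/MSUN\t\t0.800\nLOG(L/LSUN)\t\t0.0522\n"


def read_pairs(text, str_width):
    """Read label/value rows with a fixed-width Unicode string field."""
    return np.genfromtxt(
        io.StringIO(text),
        # Explicit structured dtype; field names are illustrative only.
        dtype=[("name", f"U{str_width}"), ("value", float)],
        encoding="utf-8",
    )


# An 8-character field silently cuts longer labels short.
narrow = read_pairs(sample, 8)

# A 12-character field is wide enough for every label in the sample.
wide = read_pairs(sample, 12)

# dtype=None lets genfromtxt infer each column's type from the data;
# the fields then get the default names f0 and f1.
guessed = np.genfromtxt(io.StringIO(sample), dtype=None, encoding="utf-8")

print(narrow["name"].tolist())
print(wide["name"].tolist())
print(guessed["f0"].tolist())
```

With the 8-character field the labels come back as 'MASS/MSU' and 'LOG(L/LS', matching the symptom in the question, while the 12-character field and dtype=None both keep the full labels. The delimiter is left at its default, so the run of two tabs is treated as a single separator.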