python, numpy, genfromtxt

Using multiple genfromtxt on a single file


I'm fairly new to Python and am having trouble reading my input file. I want my code to take an input file in which the relevant info is stored in blocks of 4 lines. For my specific purpose, I only care about the info in lines 1-3 of each block.

A two-block example of the input I'm dealing with looks like this:

#Header line 1
#Header line 2
'Mn 1',       5130.0059,  -2.765,  5.4052,  2.5,  7.8214,  1.5, 1.310, 2.390, 0.500, 8.530,-5.360,-7.630,
'  LS                                                                       3d6.(5D).4p z6F*'
'  LS                                                                       3d6.(5D).4d e6F'
'K07           A   Kurucz MnI 2007    1 K07       1 K07       1 K07       1 K07       1 K07       1 K07       1 K07       1 K07       1 K07     Mn            '
'Fe 2',       5130.0127,  -5.368,  7.7059,  2.5, 10.1221,  2.5, 1.030, 0.860, 0.940, 8.510,-6.540,-7.900,
'  LS                                                                     3d6.(3F2).4p y4F*'
'  LS                                                                           3d5.4s2 2F2'
'RU                Kurucz FeII 2013   4 K13       5 RU        4 K13       4 K13       4 K13       4 K13       4 K13       4 K13       4 K13     Fe+           '

I would prefer to store the info from each of these three lines in a separate array. Since the entries are a mix of strings and floats, I'm using numpy.genfromtxt to read the input file, as follows:

import itertools
import numpy as np

with open(input_file) as f_in:
  #Opening file, reading every fourth line starting with line 2.
  data = np.genfromtxt(itertools.islice(f_in,2,None,4),dtype=None,delimiter=",")
  #Storing lower transition designation:
  low = np.genfromtxt(itertools.islice(f_in,3,None,4),dtype=str)
  #Storing upper transition designation:
  up = np.genfromtxt(itertools.islice(f_in,4,None,4),dtype=str)

When I run the code, the first call to genfromtxt reads the information from the file correctly. For the second and third calls, however, I get the following warning:

UserWarning: genfromtxt: Empty input file: "<itertools.islice object at 0x102d7a1b0>"
warnings.warn('genfromtxt: Empty input file: "%s"' % fname)

Although this is only a warning, the arrays returned by the second and third calls to genfromtxt are empty rather than containing strings as expected. If I comment out the second and third calls, the code behaves as expected.

As far as I understand, the above should be working, and I'm a bit at a loss as to why it doesn't. Ideas?


Solution

  • After the first genfromtxt call (well, really the first islice), the file iterator has reached the end of the file. Hence the warnings and empty arrays: the second and third islice calls are reading from an exhausted iterator.

    You'll want to either read the whole file into memory with f_in.readlines(), as in hpaulj's answer, or add f_in.seek(0) before each subsequent read to reset the file pointer to the beginning of the input. The seek(0) approach is slightly more memory-friendly, which could be important if those files are really huge.

    # Note: Untested code follows
    import itertools
    import numpy as np

    with open(input_file) as f_in:
        # Numeric data: every fourth line, starting after the two header lines
        data = np.genfromtxt(itertools.islice(f_in,2,None,4),dtype=None,delimiter=",")

        f_in.seek(0)  # Set the file pointer back to the beginning
        # Lower transition designation
        low = np.genfromtxt(itertools.islice(f_in,3,None,4),dtype=str)

        f_in.seek(0)  # Set the file pointer back to the beginning
        # Upper transition designation
        up = np.genfromtxt(itertools.islice(f_in,4,None,4),dtype=str)
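
The exhausted-iterator behaviour, and the read-into-memory alternative mentioned above, can be sketched with the standard library alone. This is a minimal illustration, not the code from hpaulj's answer: the `sample` text is a shortened, hypothetical stand-in for the real input file, and the variable names are made up for the example.

```python
import io
import itertools

# Hypothetical, shortened stand-in for the input file:
# two header lines followed by two 4-line blocks.
sample = io.StringIO(
    "#Header line 1\n"
    "#Header line 2\n"
    "'Mn 1', 5130.0059, -2.765,\n"
    "'  LS  3d6.(5D).4p z6F*'\n"
    "'  LS  3d6.(5D).4d e6F'\n"
    "'K07 A Kurucz MnI 2007'\n"
    "'Fe 2', 5130.0127, -5.368,\n"
    "'  LS  3d6.(3F2).4p y4F*'\n"
    "'  LS  3d5.4s2 2F2'\n"
    "'RU Kurucz FeII 2013'\n"
)

# The first islice runs to the end of the file, even though it only
# yields every fourth line, so the file iterator is left exhausted.
first = list(itertools.islice(sample, 2, None, 4))
second = list(itertools.islice(sample, 3, None, 4))  # nothing left to read
print(len(first), len(second))

# Alternative: read all lines once, then slice the list as often as needed.
sample.seek(0)
lines = sample.readlines()
data_lines = lines[2::4]  # numeric line of each block
low_lines = lines[3::4]   # lower transition designation
up_lines = lines[4::4]    # upper transition designation
```

Because genfromtxt also accepts a list of strings, each of these slices could then be passed to it directly, for example np.genfromtxt(low_lines, dtype=str). The trade-off is that the whole file is held in memory at once.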