Tags: python-3.x, numpy, io

Skip lines with strange characters when I read a file


I am trying to read some '.txt' data files. Some of them contain strange random characters, and even extra columns, in random rows, as in the following example, where the second row shows what a correct row looks like:

CTD 10/07/30 05:17:14.41 CTD 24.7813, 0.15752, 1.168, 0.7954, 1497.¸ 23.4848, 0.63042, 1.047, 3.5468, 1496.542

CTD 10/07/30 05:17:14.47 CTD 23.4846, 0.62156, 1.063, 3.4935, 1496.482

I read the documentation for np.loadtxt and have not found a solution to my problem. Is there a systematic way to skip rows like these?

The code that I use to read the files is:

import numpy as np
from io import StringIO

# Function to read a data file
def Read(filename):
    # Replace the ':', ',' and '/' delimiters with spaces
    with open(filename) as f:
        s = f.read().replace(':', ' ')
    s = s.replace(',', ' ')
    s = s.replace('/', ' ')
    # Take the columns that we need
    data = np.loadtxt(StringIO(s), usecols=(4, 5, 6, 8, 9, 10, 11, 12))
    return data

Solution

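  • This approach does not need the csv module, unlike the other answer. It reads the file line by line and keeps only the lines made up entirely of ASCII characters. Note that it filters on characters only, so it does not check the number of columns in a row.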

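    data = []
    
    def isascii(s):
        # ASCII characters encode to one byte each in UTF-8,
        # so the lengths match only for pure-ASCII text
        return len(s) == len(s.encode())
    
    # errors="replace" turns undecodable bytes into U+FFFD, which is not ASCII,
    # so such lines are skipped instead of raising UnicodeDecodeError
    with open("test.txt", "r", encoding="utf-8", errors="replace") as fil:
        for line in fil:
            if isascii(line):
                data.append(line)
    
    print(data)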
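
The filter above only looks at characters, so a row with extra columns but no strange characters would still get through. Below is a minimal sketch of a stricter filter. It assumes that a well-formed row splits into exactly 13 fields once '/', ':' and ',' are treated as separators, as in the question's Read function. The names clean_rows and EXPECTED_FIELDS are hypothetical, and str.isascii needs Python 3.7 or later.

```python
import re

# Assumption: a good row has 13 fields after splitting on whitespace, ',', '/' and ':'
EXPECTED_FIELDS = 13

def clean_rows(lines, expected_fields=EXPECTED_FIELDS):
    """Keep only rows that are pure ASCII and have the expected number of fields."""
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if not stripped.isascii():
            continue  # skip rows containing stray non-ASCII characters
        fields = re.split(r"[\s,/:]+", stripped)
        if len(fields) != expected_fields:
            continue  # skip rows with missing or extra columns
        # Re-join with single spaces so every kept row has the same layout
        kept.append(" ".join(fields))
    return kept

# The two rows from the question: the first is corrupted, the second is fine
sample = [
    "CTD 10/07/30 05:17:14.41 CTD 24.7813, 0.15752, 1.168, 0.7954, 1497.¸ 23.4848, 0.63042, 1.047, 3.5468, 1496.542",
    "CTD 10/07/30 05:17:14.47 CTD 23.4846, 0.62156, 1.063, 3.4935, 1496.482",
]
rows = clean_rows(sample)
print(rows)
```

Since np.loadtxt accepts a list of strings as well as a file, the kept rows can probably be passed on directly, for example np.loadtxt(rows, usecols=(4,5,6,8,9,10,11,12)), with the same column indices as in the question.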