Tags: python, python-2.7, file-read

Python reading the entire file as one line


I have a data file like the following.

Index   Code    Pos1    Strand  Chr2    Pos2    length  blocks
1   G32_bkd.ctx:Vu01(old4)  62739   47+9-   Vu01(old4)  63651   790 0
2   G32_bkd.ctx:Vu01(old4)  441403  10+0-   Vu01(old4)  446263  4893    0
3   G32_bkd.ctx:Vu01(old4)  450546  15+0-   Vu01(old4)  451091  576 0
4   G32_bkd.ctx:Vu01(old4)  459741  10+0-   Vu01(old4)  460841  1068    0
5   G32_bkd.ctx:Vu01(old4)  612262  14+0-   Vu01(old4)  629013  16788   0
6   G32_bkd.ctx:Vu01(old4)  688380  23+0-   Vu01(old4)  693207  4872    0
7   G32_bkd.ctx:Vu01(old4)  730643  12+0-   Vu01(old4)  740497  7011    0
8   G32_bkd.ctx:Vu01(old4)  834116  16+1-   Vu01(old4)  835797  1752    0

I want to read the header line separately and then read each remaining line in a for loop. My code is:

with open(file) as f:
    title_line = f.readline()
    for line in f:
        line = line.strip()
        cols = line.split()

When I call print(line) inside the for loop, nothing is printed. But when I call print(title_line), the entire file is printed, preserving its exact format. What went wrong?

N.B. When I copied and pasted the whole contents into a new file and saved it under a different name, the same code worked just fine.


Solution

  • One thing that could cause this behavior is that the original file uses end-of-line characters that Python does not recognize as line breaks.

    To confirm that, on Linux you can run od -t a file | less and inspect the output. Perhaps the file follows a different operating system's line-ending convention? If you are not on Linux, you can use Python itself to print each character with ord and see which line ending the file uses (\n, \r, or \r\n).
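
As a rough sketch of that check in Python, the function below reads the file as raw bytes and counts each line-ending style. The name count_line_endings is made up for this illustration and is not part of any library.

```python
import io


def count_line_endings(path):
    """Count each newline style in a file read as raw bytes.

    Hypothetical helper for illustration only.
    """
    with io.open(path, "rb") as f:
        data = f.read()
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs
    # to get the lone CR and lone LF counts.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": cr, "LF": lf}
```

If only the CR count is nonzero, the file uses bare \r line endings. Python 2's default text mode on Linux splits lines on \n only, so that would be consistent with readline() returning the whole file.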

    If that's the case, you have some options:

    • For Python 2, you could open the file in universal newlines mode, that is, with mode "U":
      • open(file, "U")
      • That's a quick and easy way to confirm that this is indeed the issue and to fix it, but it is not recommended for the long term (the "U" mode is deprecated in Python 3).
    • Otherwise, you can use io.open instead of open, and use its newline= argument. The default, None, enables universal newlines and should be what you need.
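
The second option could look roughly like the sketch below. It assumes the columns are separated by whitespace, as in the sample data, and read_table is a name invented for this example.

```python
import io


def read_table(path):
    """Return (header_columns, rows) from a whitespace-delimited file.

    Hypothetical helper for illustration only.
    """
    # newline=None is the default; it is spelled out here to show that
    # \n, \r and \r\n are all treated as line endings and translated to \n.
    with io.open(path, newline=None) as f:
        header = f.readline().split()
        # Skip blank lines; split each remaining line into columns.
        rows = [line.split() for line in f if line.strip()]
    return header, rows
```

Note that io.open returns text (unicode objects on Python 2), decoded with the platform's default encoding unless you pass encoding= explicitly.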

    If this does not fix your issue, please provide:

    • The Operating System you're using
    • The Python version you're using
    • The source Operating System of the original file

    As an unrelated side note, I'd suggest you look at Python's built-in csv module for reading your file. It seems like a perfect fit (the csv module can be configured to use spaces or tabs instead of commas).
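
A minimal sketch of the csv approach follows, under two assumptions that are not confirmed by the question: that the columns are tab-separated, and that the code runs on Python 3 (on Python 2 the csv module expects the file to be opened differently, for example with mode "rU" for this problem). The name read_tsv is made up for this example.

```python
import csv


def read_tsv(path):
    """Return (header, rows) from a tab-delimited file (Python 3).

    Hypothetical helper; the tab delimiter is an assumption.
    """
    # newline="" leaves line endings untranslated so that the csv
    # module can handle them itself, as its documentation recommends.
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        # Drop empty rows produced by blank lines.
        rows = [row for row in reader if row]
    return header, rows
```

If the file turns out to be space-separated, delimiter=" " together with skipinitialspace=True is one option to try.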
