
Read text file where the columns have specific format


I am working with Fortran and I need to read a file that has 3 columns. The problem is that the 3rd column is a run of single-digit integers, e.g. 120120101, and I need to put each digit in a separate column.

Usually, I manually remove the first 2 columns so the file looks like:

Info
0120012545
1254875541
0122110000
2254879933

To read this file with each digit in a separate column, I can use the following Fortran subroutine:

subroutine readF(imp, m, n)
  implicit none
  integer :: n,m,i,imp(n,m)
  open(unit=100, file='file.txt', status='old', action='read')
  do i=2,n
    read(100,'(*(i1))') imp(i,1:m)
  end do
  close(unit=100)
end subroutine readF

I wonder if it is possible to read a file with the following content:

IDs Idx Info
ID001 1 125478521111
ID002 1 525478214147
ID003 2 985550004599
ID004 2 000478520002

and get a result that looks like:

ID001 1 1 2 5 4 7 8 5 2 1 1 1 1
ID002 1 5 2 5 4 7 8 2 1 4 1 4 7
ID003 2 9 8 5 5 5 0 0 0 4 5 9 9
ID004 2 0 0 0 4 7 8 5 2 0 0 0 2

where the digits in the 3rd column are split into m columns.

The first row is the header, but I don't need it, so I start reading from the second line.

I tried the following subroutine, but it didn't work:

subroutine readF(imp, ind, m, n)
  implicit none
  integer :: n,m,i,imp(n,m),ind(n),chip(n)
  open(unit=100, file='file.txt', status='old', action='read')
  do i=2,n
    read(100,'(i8,i1,*(i1))') ind(i),chip(i),imp(i,1:m)
  end do
  close(unit=100)
end subroutine readF

Does anyone know how I could read that file without manually removing the first two columns?

Thank you.


Solution

    I am going to guess what each of the variables means and also try to explain some apparent mistakes.

    I believe your do i=2,n is a mistake because I have seen some of my students make the same one. Starting i at 2 does not make the read begin at the second line; i is only the array index, and each read statement simply takes the next line of the file. So the first pass through the loop tries to read the header line into row 2, row 1 of imp is never filled, and, if you have n data lines, the last one is never read because the loop runs only n-1 times. What you want is a read statement with an empty input list before the loop; it skips the header line. Then let i run from 1 to n.
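A minimal sketch of the header-skipping idea, assuming a sequential formatted file whose name is passed in; the module and routine names (header_skip_demo, read_data_lines) are made up for illustration and it uses newunit= from Fortran 2008. The empty read(u, *) consumes one whole record, so the loop index can start at 1 and every data line lands in its own element.

```fortran
module header_skip_demo
  implicit none
contains
  ! Hypothetical helper: skip one header line, then return the next n
  ! lines of the file as raw character strings.
  subroutine read_data_lines(fname, n, lines)
    character(len=*), intent(in)  :: fname
    integer,          intent(in)  :: n
    character(len=*), intent(out) :: lines(n)
    integer :: u, i
    open(newunit=u, file=fname, status='old', action='read')
    read(u, *)              ! empty input list: consumes the header record
    do i = 1, n             ! i is only an array index, not a line selector
      read(u, '(a)') lines(i)
    end do
    close(u)
  end subroutine read_data_lines
end module header_skip_demo
```

Called on the sample file with n = 4, lines(1) holds the ID001 line and lines(4) the ID004 line; nothing is lost at either end.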

    From the order of the variables in the read statement, I assume ind is the ID number, chip is the Idx number, and imp holds the Info digits, one integer each, m of them.

    Your i8 takes the first 8 columns of the line and tries to interpret them as an integer. The first 8 columns of the first data line are 'ID001 1 ', which is not an integer. You need to skip the 'ID' and read '001' into ind, then skip 1 character and read 1 integer into chip, then skip 1 more character and read the Info digits, 1 integer at a time. The x edit descriptor skips characters: 1x skips 1, 2x skips 2.
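To see the column arithmetic on a single line, here is a small sketch that applies the same format to a character string instead of the file (an internal read). It assumes, as in the sample, that every ID is exactly 'ID' plus 3 digits and Idx is a single digit; the names fixed_format_demo and parse_fixed are hypothetical.

```fortran
module fixed_format_demo
  implicit none
contains
  ! Parse one data line with the fixed-column format:
  !   2x     skip "ID"
  !   i3     read "001" -> 1
  !   1x     skip the blank
  !   i1     read the single Idx digit
  !   1x     skip the blank
  !   *(i1)  read the remaining digits one at a time
  subroutine parse_fixed(line, m, ind, chip, digits)
    character(len=*), intent(in)  :: line
    integer,          intent(in)  :: m
    integer,          intent(out) :: ind, chip, digits(m)
    read(line, '(2x,i3,1x,i1,1x,*(i1))') ind, chip, digits
  end subroutine parse_fixed
end module fixed_format_demo
```

For 'ID001 1 125478521111' with m = 12 this gives ind = 1, chip = 1 and digits = 1 2 5 4 7 8 5 2 1 1 1 1, matching the first row of the desired output.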

    For each digit to go into a separate element of imp, you need an implied do loop with j going from 1 to m; I used j for that. If you do not know about implied do loops, please google them. They are quite standard in Fortran.
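A short sketch of what the implied do loop does, using the digit-only lines from the question; the module and routine names are hypothetical. The list (imp(i,j), j=1,m) expands to imp(i,1), imp(i,2), ..., imp(i,m), and the read fills those elements left to right, so each call fills one row of the 2-D array.

```fortran
module implied_do_demo
  implicit none
contains
  ! Fill row i of imp from a string of single digits.
  ! (imp(i,j), j=1,m) expands to imp(i,1), ..., imp(i,m).
  subroutine fill_row(imp, i, digits_str)
    integer,          intent(inout) :: imp(:,:)
    integer,          intent(in)    :: i
    character(len=*), intent(in)    :: digits_str
    integer :: j, m
    m = size(imp, 2)                ! number of columns = digits per line
    read(digits_str, '(*(i1))') (imp(i,j), j=1,m)
  end subroutine fill_row
end module implied_do_demo
```

The array section imp(i,1:m), as in the question's original code, names the same elements, so either form works in the input list.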

    This code snippet will do just that:

    open(unit=100, file='file.txt', status='old', action='read')
    read(100,*)  ! This skips the header line.
    do i=1,n     ! Read in n data lines.
      read(100,'(2x,i3,1x,i1,1x,*(i1))') ind(i),chip(i),(imp(i,j),j=1,m)
    end do
    close(unit=100)
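Put together with the declarations it needs, the fixed-column approach could look like the sketch below. It takes the file name as an argument, returns chip instead of keeping it local, and uses newunit= (Fortran 2008) rather than a hard-coded unit; read_fixed and its argument order are illustrative choices, not the original readF interface. It still assumes the columns line up exactly as in the sample.

```fortran
module read_fixed_file
  implicit none
contains
  ! Skip the header, then read n lines laid out as "IDnnn d ddd...d"
  ! with m digits in the Info column.
  subroutine read_fixed(fname, n, m, ind, chip, imp)
    character(len=*), intent(in)  :: fname
    integer,          intent(in)  :: n, m
    integer,          intent(out) :: ind(n), chip(n), imp(n,m)
    integer :: u, i, j
    open(newunit=u, file=fname, status='old', action='read')
    read(u, *)                       ! skip the header line
    do i = 1, n                      ! read n data lines
      read(u, '(2x,i3,1x,i1,1x,*(i1))') ind(i), chip(i), (imp(i,j), j=1,m)
    end do
    close(u)
  end subroutine read_fixed
end module read_fixed_file
```

On the sample file (n = 4, m = 12) this yields ind = 1 2 3 4, chip = 1 1 2 2, and rows of imp equal to the digit columns of the desired output.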
    

    Additional answer to address the comment. I see two options. The first is to write your own line parsing. I would not choose this.

    The second option is to read the line using list-directed input, i.e. read with * as the format. List-directed input treats blanks as separators between the input items. I would make the third item a character variable long enough to hold m characters. A read statement can then take this character variable as its input instead of a unit number; this is called reading from an internal file. You would read each integer from it as before. This is what it would look like:

    character(len=m) :: Info
    character(len=16) :: Dumb   ! must be at least as long as the longest ID
    open(unit=100, file='file.txt', status='old', action='read')
    read(100,*)  ! This skips the header line.
    do i=1,n     ! Read in n data lines.
      read(100,*) Dumb, chip(i), Info
      read(Info,'(*(i1))') (imp(i,j),j=1,m)
    end do
    close(unit=100)
    

    The first read statement in the do loop reads from the file. It puts the entire first column into Dumb (as long as the ID is no longer than Dumb's declared length; longer IDs are truncated), the second column into chip(i), and the entire 3rd column into the character string Info.

    The second read statement reads from the "internal file" Info: a read statement can take a character string as its input. Here I use the format specifiers and the implied do loop to extract 1 integer at a time.
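A complete sketch of this second approach, under the same hedges as before: read_listdir and its arguments are hypothetical, and, as a variation on the snippet above, the first column is kept in an ids array instead of being thrown away in Dumb. Because list-directed input splits on blanks, IDs of different widths (e.g. ID001 and ID0002) are handled without changing the format, provided each fits in len(ids).

```fortran
module read_listdir_file
  implicit none
contains
  ! Skip the header, then for each of n lines:
  !   1) list-directed read of the three blank-separated columns,
  !   2) internal read of the Info string, one digit at a time.
  subroutine read_listdir(fname, n, m, ids, chip, imp)
    character(len=*), intent(in)  :: fname
    integer,          intent(in)  :: n, m
    character(len=*), intent(out) :: ids(n)
    integer,          intent(out) :: chip(n), imp(n,m)
    character(len=m) :: info           ! digit string of the current line
    integer :: u, i, j
    open(newunit=u, file=fname, status='old', action='read')
    read(u, *)                         ! skip the header line
    do i = 1, n
      read(u, *) ids(i), chip(i), info             ! from the file
      read(info, '(*(i1))') (imp(i,j), j=1,m)      ! from the string
    end do
    close(u)
  end subroutine read_listdir
end module read_listdir_file
```

The trade-off against the fixed-column version is that the ID width no longer matters, while the number of digits per line is still fixed at m.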