Tags: count, fortran, frequency

Counting frequency of variables in text data in Fortran


I have a data file with the following information:

 x         y       k               !name for the columns to explain

200316157 123 2004121 
200316157 456 2004121 
200316157 789 2004121 
200519776 456 2007234 
200519776 789 2007234 
200812334 123 2010333 
200812334 789 2010333 
200812334 345 2010333 
200812334 567 2010333 

Each line contains one piece of information that differs between the lines of the same person (column y); the rest of the line (columns x and k) is repeated for every line of that person. The file is sorted, so all lines for a given person are always contiguous.

I want to count how many times each unique person (identified by the first column, x) appears in the data and add that count as a new column, so the result looks like this:

     x     y      k        !name for the columns to explain

200316157 123 2004121 3
200316157 456 2004121 3
200316157 789 2004121 3
200519776 456 2007234 2
200519776 789 2007234 2
200812334 123 2010333 4
200812334 789 2010333 4
200812334 345 2010333 4
200812334 567 2010333 4

Is this possible? Any tips or tricks would be appreciated!

This is my feeble attempt so far:

program counting
implicit none 

integer, parameter :: n=40000    !length of file

integer, dimension(1:n) :: x, y, k, icount

integer :: i

open (unit=20, file="data.txt", status="old")



do i = 1, n
read (20,2001) x(i), y(i), k(i)
2001 format (i9, 1x, i3, 1x, i7)
enddo

open (unit=21, file="data2.txt", status="new")   !a new datafile

icount(i) = 0

do
if (x(i) == x(i)) then                      !how can I compare a line to the previous one?
icount = icount +i
write (21,2021) x(i), y(i), k(i), icount(i)
endif
icount(i) = 0
enddo


2021 format (i9, 1x, i3, 1x, i7, 1x, i1)  !output format

endprogram counting

But all this produces is a running count over the whole file:

200316157 123 2004121 1
200316157 456 2004121 2
200316157 789 2004121 3
200519776 456 2007234 4
200519776 789 2007234 5
200812334 123 2010333 6
200812334 789 2010333 7
200812334 345 2010333 8
200812334 567 2010333 9

I don't know how to compare line 2 with line 1 and then count the occurrences of each unique person. How can I do that?
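Once the columns are in arrays, comparing a line with the previous one is just the test `x(i) /= x(i-1)`: because the file is sorted, a change in x marks the start of a new person. The sketch below is a minimal illustration under stated assumptions: the data is already in arrays (the sample rows are hardcoded so it runs without data.txt), and the names `group_count_mod` and `group_counts` are hypothetical.

```fortran
! Hypothetical module: total count per person for data sorted by x
module group_count_mod
  implicit none
contains
  ! Set cnt(i) to the number of entries that share the key x(i).
  ! Assumes x is sorted (equal keys are contiguous) and size(cnt) == size(x).
  pure subroutine group_counts(x, cnt)
    integer, intent(in)  :: x(:)
    integer, intent(out) :: cnt(:)
    integer :: i, istart

    if (size(x) == 0) return
    istart = 1                              ! first line of the current group
    do i = 2, size(x)
      if (x(i) /= x(i-1)) then              ! compare with the previous line
        cnt(istart:i-1) = i - istart        ! close the finished group
        istart = i                          ! a new person starts here
      end if
    end do
    cnt(istart:size(x)) = size(x) - istart + 1   ! close the last group
  end subroutine group_counts
end module group_count_mod

program demo
  use group_count_mod
  implicit none
  integer, parameter :: n = 9
  ! Sample rows from the question (columns x, y, k)
  integer :: x(n) = [200316157, 200316157, 200316157, 200519776, 200519776, &
                     200812334, 200812334, 200812334, 200812334]
  integer :: y(n) = [123, 456, 789, 456, 789, 123, 789, 345, 567]
  integer :: k(n) = [2004121, 2004121, 2004121, 2007234, 2007234, &
                     2010333, 2010333, 2010333, 2010333]
  integer :: cnt(n), i

  call group_counts(x, cnt)
  do i = 1, n
    write (*, '(i0, 1x, i0, 1x, i0, 1x, i0)') x(i), y(i), k(i), cnt(i)
  end do
end program demo
```

For the sample rows this prints the first desired table, with 3, 3, 3, 2, 2, 4, 4, 4, 4 in the last column.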

Edit:

I am also wondering whether I can get this result instead, where the count restarts at 1 for each new person:

200316157 123 2004121 1
200316157 456 2004121 2
200316157 789 2004121 3
200519776 456 2007234 1
200519776 789 2007234 2
200812334 123 2010333 1
200812334 789 2010333 2
200812334 345 2010333 3
200812334 567 2010333 4
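The running count in this second table only depends on the previous line: it restarts at 1 when `x(i)` differs from `x(i-1)` and otherwise adds 1 to the previous value. A minimal sketch under the same assumptions (sorted data already in arrays, sample rows hardcoded); `running_count_mod` and `running_counts` are hypothetical names.

```fortran
! Hypothetical module: position of each line within its person's group
module running_count_mod
  implicit none
contains
  ! Set pos(i) to 1 for the first line of each person, then 2, 3, ...
  ! Assumes x is sorted (equal keys are contiguous) and size(pos) == size(x).
  pure subroutine running_counts(x, pos)
    integer, intent(in)  :: x(:)
    integer, intent(out) :: pos(:)
    integer :: i

    do i = 1, size(x)
      ! Fortran does not guarantee short-circuit evaluation of .or.,
      ! so i == 1 is tested on its own to avoid touching x(0).
      if (i == 1) then
        pos(i) = 1
      else if (x(i) /= x(i-1)) then   ! new person: restart at 1
        pos(i) = 1
      else                            ! same person as the previous line
        pos(i) = pos(i-1) + 1
      end if
    end do
  end subroutine running_counts
end module running_count_mod

program demo
  use running_count_mod
  implicit none
  integer, parameter :: n = 9
  ! Sample rows from the question (columns x, y, k)
  integer :: x(n) = [200316157, 200316157, 200316157, 200519776, 200519776, &
                     200812334, 200812334, 200812334, 200812334]
  integer :: y(n) = [123, 456, 789, 456, 789, 123, 789, 345, 567]
  integer :: k(n) = [2004121, 2004121, 2004121, 2007234, 2007234, &
                     2010333, 2010333, 2010333, 2010333]
  integer :: pos(n), i

  call running_counts(x, pos)
  do i = 1, n
    write (*, '(i0, 1x, i0, 1x, i0, 1x, i0)') x(i), y(i), k(i), pos(i)
  end do
end program demo
```

Because each value depends only on the previous line, the same comparison also works while reading the file line by line, without storing arrays or knowing n in advance.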

Solution

  • You can solve your problem in the following way:

    1. The current user ID starts at line i0.
    2. Iterate over the following lines until you find a new user ID; the last line of the previous ID is saved in i1.
    3. Write out all lines i0..i1, then continue with i0 = i1 + 1.

    Example implementation

    ! a.f90
    program counting
      implicit none
    
      integer, parameter    :: n = 9 ! number of data lines (header lines excluded)
      integer               :: i, iounit, i0, i1
      integer, dimension(n) :: x, y, k
    
      ! read input file
      open (newunit=iounit, file='data.txt', action='read')
      read (iounit, *)
      read (iounit, *)
      do i = 1, n
        read (iounit, *) x(i), y(i), k(i)
      end do
      close (iounit)
    
      ! compare x array and write out data
      open (newunit=iounit, file='data2.txt', action='write')
      i1 = 0
      do
        ! set i0, i1
        !   i0: line where specific user id starts
        !   i1: line where specific user id ends
        i0 = i1 + 1
        do i = i0, n
          if (x(i) /= x(i0)) exit
          i1 = i
        end do
    
        do i = i0, i1
          write (iounit, '(i0, 1x, i0, 1x, i0, 1x, i0)') x(i), y(i), k(i), i1-i0+1   ! last column will be total number
          ! write (iounit, '(i0, 1x, i0, 1x, i0, 1x, i0)') x(i), y(i), k(i), i-i0+1 ! last column will increase by 1, resets for new user id
        end do
    
        if (i1 == n) exit
      end do
      close (iounit)
    end program
    

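    The code includes two options for the last column: the total count per user ID (active) and a running count that restarts at 1 for each new user ID (commented out). To get the second result from the question, comment out the first write statement and uncomment the second.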

    For the given input file

    $ cat data.txt
     x         y       k               !name for the columns to explain
    
    200316157 123 2004121
    200316157 456 2004121
    200316157 789 2004121
    200519776 456 2007234
    200519776 789 2007234
    200812334 123 2010333
    200812334 789 2010333
    200812334 345 2010333
    200812334 567 2010333
    

    you get the following output:

    $ gfortran -g -Wall -fcheck=all a.f90 && ./a.out && cat data2.txt
    200316157 123 2004121 3
    200316157 456 2004121 3
    200316157 789 2004121 3
    200519776 456 2007234 2
    200519776 789 2007234 2
    200812334 123 2010333 4
    200812334 789 2010333 4
    200812334 345 2010333 4
    200812334 567 2010333 4
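The example implementation hardcodes `n = 9`, and the attempt in the question used `n=40000`, so the array size has to match the file. One common alternative, sketched here assuming the file keeps the same two header lines (column names and a blank line), is to count the data lines first using `iostat` and `iostat_end` from the intrinsic module `iso_fortran_env`, then `rewind` and read again into allocatable arrays. The names `read_data_mod` and `read_data` are hypothetical.

```fortran
! Hypothetical module: read x, y, k without knowing the number of lines
module read_data_mod
  use, intrinsic :: iso_fortran_env, only: iostat_end
  implicit none
contains
  ! Read the three columns from a file that starts with two header lines
  ! (column names and a blank line, as in data.txt).
  subroutine read_data(filename, x, y, k)
    character(*), intent(in) :: filename
    integer, allocatable, intent(out) :: x(:), y(:), k(:)
    integer :: iounit, ios, n, i, dummy(3)

    open (newunit=iounit, file=filename, action='read', status='old')
    read (iounit, *)          ! skip column names
    read (iounit, *)          ! skip blank line

    ! First pass: count data lines until end of file
    n = 0
    do
      read (iounit, *, iostat=ios) dummy
      if (ios == iostat_end) exit
      if (ios /= 0) error stop 'read error in data file'
      n = n + 1
    end do

    ! Second pass: rewind, skip the header again, read into arrays
    allocate (x(n), y(n), k(n))
    rewind (iounit)
    read (iounit, *)
    read (iounit, *)
    do i = 1, n
      read (iounit, *) x(i), y(i), k(i)
    end do
    close (iounit)
  end subroutine read_data
end module read_data_mod

program demo
  use read_data_mod
  implicit none
  integer, allocatable :: x(:), y(:), k(:)

  call read_data('data.txt', x, y, k)
  print '(a, i0, a)', 'read ', size(x), ' data lines'
end program demo
```

After the call, `size(x)` can take the place of n in the loops of the example implementation.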