I have a data file that looks like this:
x y k !name for the columns to explain
200316157 123 2004121
200316157 456 2004121
200316157 789 2004121
200519776 456 2007234
200519776 789 2007234
200812334 123 2010333
200812334 789 2010333
200812334 345 2010333
200812334 567 2010333
Each line contains one unique piece of information (column y), while the rest of the line is repeated for every line that belongs to the same person. The file is sorted, so all lines for a given person are always next to each other.
I want to count how many times each unique person (identified by the first column) appears in the data and append that count to every line, so that the result looks like this:
x y k !name for the columns to explain
200316157 123 2004121 3
200316157 456 2004121 3
200316157 789 2004121 3
200519776 456 2007234 2
200519776 789 2007234 2
200812334 123 2010333 4
200812334 789 2010333 4
200812334 345 2010333 4
200812334 567 2010333 4
Is this possible? If you have any tips and tricks I would love to hear!
This is my feeble attempt so far:
program counting
implicit none
integer, parameter :: n=40000 !length of file
integer, dimension(1:n) :: x, y, k, icount
integer :: i
open (unit=20, file="data.txt", status="old")
do i = 1, n
read (20,2001) x(i), y(i), k(i),
2001 format (i9, 1x, i3, 1x, i7,)
enddo
open (unit=21, file="data2.txt", status="new") !a new datafile
icount(i) = 0
do
if (x(i) == x(i)) then !how can I compare a line to the previous one?????
icount = icount +i
write (21,2021) x(i), y(i), k(i), icount(i)
endif
icount(i) = 0
enddo
2021 format (i9, 1x, i3, 1x, i7, 1x, i1) !a new datafile
endprogram counting
But the only thing this does is:
200316157 123 2004121 1
200316157 456 2004121 2
200316157 789 2004121 3
200519776 456 2007234 4
200519776 789 2007234 5
200812334 123 2010333 6
200812334 789 2010333 7
200812334 345 2010333 8
200812334 567 2010333 9
I don't know how to compare line 2 with line 1 and then count the occurrences of each unique person.
Edit:
I am also wondering whether I can get this result instead, where the count restarts at 1 for each new person:
200316157 123 2004121 1
200316157 456 2004121 2
200316157 789 2004121 3
200519776 456 2007234 1
200519776 789 2007234 2
200812334 123 2010333 1
200812334 789 2010333 2
200812334 345 2010333 3
200812334 567 2010333 4
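You can solve your problem in the following way:
1. find the line i0 where a specific user id starts,
2. find the line i1 where that user id ends,
3. write out lines i0..i1 with i1-i0+1 (the number of lines for that user) as the last column, then continue with line i1+1.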
Example implementation
! a.f90
program counting
implicit none
integer, parameter :: n = 9 ! number of data lines (excluding the header); must match the file
integer :: i, iounit, i0, i1
integer, dimension(n) :: x, y, k
! read input file
open (newunit=iounit, file='data.txt', action='read')
read (iounit, *) ! skip the header line; blank lines are skipped by the list-directed reads below
do i = 1, n
read (iounit, *) x(i), y(i), k(i)
end do
close (iounit)
! compare x array and write out data
open (newunit=iounit, file='data2.txt', action='write')
i1 = 0
do
! set i0, i1
! i0: line where specific user id starts
! i1: line where specific user id ends
i0 = i1 + 1
do i = i0, n
if (x(i) /= x(i0)) exit
i1 = i
end do
do i = i0, i1
write (iounit, '(i0, 1x, i0, 1x, i0, 1x, i0)') x(i), y(i), k(i), i1-i0+1 ! last column will be total number
! write (iounit, '(i0, 1x, i0, 1x, i0, 1x, i0)') x(i), y(i), k(i), i-i0+1 ! last column will increase by 1, resets for new user id
end do
if (i1 == n) exit
end do
close (iounit)
end program
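The code includes two options for the last column: the active write statement appends the total number of lines per user id, while the commented-out one appends a running count that restarts at 1 for each new user id (the second result asked for in the edit). To switch, comment out one write statement and uncomment the other.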
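For the running count alone, no arrays are needed at all: it is enough to compare each line's id with the id of the previous line, which is what the if statement in the question was trying to do. The following is a minimal sketch, not part of the solution above, assuming the input has exactly one header line (as in data.txt) and is sorted by the first column. It reads standard input and writes standard output; the program name running_count is only illustrative.

```fortran
! running_count.f90 (illustrative name)
! One-pass running count: compare each id with the previous line's id
! and restart the count at 1 whenever it changes.
! Assumes one header line and input sorted by the first column.
program running_count
  implicit none
  integer :: x, y, k, prev_x, cnt, ios
  logical :: first

  read (*, *)                       ! skip the header line "x y k ..."
  first = .true.
  cnt = 0
  prev_x = 0
  do
    read (*, *, iostat=ios) x, y, k
    if (ios /= 0) exit              ! end of input (or an unreadable line): stop
    if (first .or. x /= prev_x) then
      cnt = 1                       ! new person: restart the count
    else
      cnt = cnt + 1                 ! same person as the previous line
    end if
    write (*, '(i0, 1x, i0, 1x, i0, 1x, i0)') x, y, k, cnt
    prev_x = x                      ! remember this id for the next comparison
    first = .false.
  end do
end program running_count
```

With gfortran this could be used as: gfortran running_count.f90 -o running_count && ./running_count < data.txt > data2.txt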
For the given input file
$ cat data.txt
x y k !name for the columns to explain
200316157 123 2004121
200316157 456 2004121
200316157 789 2004121
200519776 456 2007234
200519776 789 2007234
200812334 123 2010333
200812334 789 2010333
200812334 345 2010333
200812334 567 2010333
you get the following output:
$ gfortran -g -Wall -fcheck=all a.f90 && ./a.out && cat data2.txt
200316157 123 2004121 3
200316157 456 2004121 3
200316157 789 2004121 3
200519776 456 2007234 2
200519776 789 2007234 2
200812334 123 2010333 4
200812334 789 2010333 4
200812334 345 2010333 4
200812334 567 2010333 4
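The example implementation needs n to equal the number of data lines exactly, which is awkward for the 40000-line file in the question. A possible alternative, sketched here under the same assumptions (one header line, input sorted by the first column), avoids n entirely: because the file is sorted, only the lines of the current id have to be kept in memory, and they are written out together with their count as soon as the id changes. It reads standard input and writes standard output; the names group_count and flush_group are illustrative only.

```fortran
! group_count.f90 (illustrative name)
! Append the total number of lines per id without knowing the file length.
! Only the current id's lines are buffered; the buffer is flushed when the
! id changes. Assumes one header line and input sorted by the first column.
program group_count
  implicit none
  integer, allocatable :: buf(:, :), tmp(:, :)   ! buf(1,j) = y, buf(2,j) = k
  integer :: x, y, k, cur_x, ngroup, ios

  allocate (buf(2, 16))
  ngroup = 0
  cur_x = 0
  read (*, *)                                    ! skip the header line
  do
    read (*, *, iostat=ios) x, y, k
    if (ios /= 0) exit                           ! end of input
    if (ngroup > 0 .and. x /= cur_x) call flush_group()  ! id changed
    if (ngroup == size(buf, 2)) then             ! buffer full: double it
      allocate (tmp(2, 2*size(buf, 2)))
      tmp(:, 1:ngroup) = buf(:, 1:ngroup)
      call move_alloc(tmp, buf)                  ! tmp is deallocated afterwards
    end if
    ngroup = ngroup + 1
    buf(:, ngroup) = [y, k]
    cur_x = x
  end do
  if (ngroup > 0) call flush_group()             ! write out the last id

contains

  ! Write every buffered line of the current id with the group size appended.
  subroutine flush_group()
    integer :: j
    do j = 1, ngroup
      write (*, '(i0, 1x, i0, 1x, i0, 1x, i0)') cur_x, buf(1, j), buf(2, j), ngroup
    end do
    ngroup = 0
  end subroutine flush_group

end program group_count
```

Usage would be the same as before, e.g. ./group_count < data.txt > data2.txt. Doubling the buffer with move_alloc keeps the memory use proportional to the largest group rather than to the whole file.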