
Extracting rows from a data frame in R


I have a small problem. I managed to create a data frame from two other data frames, which is nice, but I end up with too many rows. Example:

**PL|WPLF05652203|Terytorium_nowe|F109|2017-05-14|F106|2017-09-05**
PL|WPLF05652203|Terytorium_nowe|F109|2017-05-14|F106|2017-09-07
PL|WPLF05652203|Terytorium_nowe|F109|2017-05-14|F106|2017-09-11
PL|WPLF05652203|Terytorium_nowe|F109|2017-05-14|F106|2017-09-14
PL|WPLF05652203|Terytorium_nowe|F109|2017-05-14|F107|2018-03-04
PL|WPLF05652203|Terytorium_nowe|F109|2017-05-14|KB|2018-05-13
**PL|WPLF05652203|Terytorium_nowe|F106|2017-09-05|F109|2017-09-06**
PL|WPLF05652203|Terytorium_nowe|F106|2017-09-05|F109|2017-09-10
PL|WPLF05652203|Terytorium_nowe|F106|2017-09-05|F109|2017-09-12
PL|WPLF05652203|Terytorium_nowe|F106|2017-09-05|F109|2017-09-17
PL|WPLF05652203|Terytorium_nowe|F106|2017-09-05|F107|2018-03-04
PL|WPLF05652203|Terytorium_nowe|F106|2017-09-05|KB|2018-05-13
**PL|WPLF05652203|Terytorium_nowe|F109|2017-09-06|F106|2017-09-07**

I should only have the rows marked with **. The question is how to extract them: what rule or condition should I use, or how can I drop the rest so that only the relevant rows remain? The condition I used when creating this data was

   if (FullDataSet$date[i] <= FullDataSet1$date[j])

So it's clear that the first date is never later than the second one, but I don't want to have that many records. The new date in one row should match the old date in the next row.

Thank you for your help. Best regards


Solution

  • I reproduced your situation by removing the asterisks from your text, saving it to a file, and reading that file with

    df <- read.table('text.txt', sep = '|')
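
    If you would rather not edit the file by hand, the asterisks can also be stripped in R before parsing. This is only a minimal sketch: the `raw` vector is a shortened, hypothetical stand-in for the file contents (with a real file you could use `raw <- readLines('text.txt')` instead), and `V1` to `V7` are the default column names that `read.table` assigns.

```r
# Shortened stand-in for the lines of 'text.txt', asterisks included
raw <- c(
  "**PL|WPLF05652203|Terytorium_nowe|F109|2017-05-14|F106|2017-09-05**",
  "PL|WPLF05652203|Terytorium_nowe|F109|2017-05-14|F106|2017-09-07",
  "**PL|WPLF05652203|Terytorium_nowe|F106|2017-09-05|F109|2017-09-06**"
)

# Remove the literal '**' markers (fixed = TRUE disables regex matching)
clean <- gsub("**", "", raw, fixed = TRUE)

# Parse the cleaned lines as pipe-separated fields
df <- read.table(text = clean, sep = "|", stringsAsFactors = FALSE)

# Columns 5 and 7 are read as text; convert them to Date for comparisons
df$V5 <- as.Date(df$V5)
df$V7 <- as.Date(df$V7)
```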
    

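    You can then keep only the first row for each unique combination of the first 5 columns; `duplicated()` marks every later repeat, and the `which()` wrapper is not needed.

    df[!duplicated(df[, 1:5]), ]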
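
    One caveat: `duplicated()` keeps whichever row comes first in each group, so the line above relies on the rows already being sorted by the last date column, as they are in the question. A sketch that does not depend on row order sorts the data first and only then drops the repeats. The sample below is a shortened and deliberately shuffled copy of the question's data, and it assumes the row you want is always the one with the earliest new date for a given old event and old date.

```r
# Shortened, shuffled copy of the example data (asterisks already removed)
txt <- "PL|WPLF05652203|Terytorium_nowe|F109|2017-05-14|F106|2017-09-07
PL|WPLF05652203|Terytorium_nowe|F109|2017-05-14|F106|2017-09-05
PL|WPLF05652203|Terytorium_nowe|F109|2017-05-14|KB|2018-05-13
PL|WPLF05652203|Terytorium_nowe|F106|2017-09-05|F109|2017-09-10
PL|WPLF05652203|Terytorium_nowe|F106|2017-09-05|F109|2017-09-06
PL|WPLF05652203|Terytorium_nowe|F109|2017-09-06|F106|2017-09-07"

df <- read.table(text = txt, sep = "|", stringsAsFactors = FALSE)
df$V5 <- as.Date(df$V5)  # old date
df$V7 <- as.Date(df$V7)  # new date

# Sort by the grouping columns (old date before old event code),
# then by the new date, so the earliest new date leads each group
df <- df[order(df$V1, df$V2, df$V3, df$V5, df$V4, df$V7), ]

# Keep the first row of every group defined by the first 5 columns
result <- df[!duplicated(df[, 1:5]), ]

# Check the rule from the question: each new date equals the
# old date of the following row (TRUE for this sample)
chain_ok <- all(head(result$V7, -1) == tail(result$V5, -1))
```

    For this sample, `result` holds three rows whose new dates are 2017-09-05, 2017-09-06 and 2017-09-07, which matches the rows marked in the question.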