Search code examples
rfor-loopif-statementassign

Creating a new column of 'ID's' from a condition


I am a novice when it comes to R and have run in to quite a specific issue that I can't seem to find much help with online. It also doesn't help that asking questions in stack overflow doesn't seem that straightforward. So please bare with me.

To provide some context this is an incredibly large dataset, 250,000 records plus with 9 columns.

One of these columns is called 'Site_ID'.

I now have a huge dataset that is sorted by Site_ID and date. The issue that I have is that a Site ID often does not correspond to just one set of coordinates and what I want is to create a new column that gives a new Site ID when the location changes. As the GPS coordinates have a large amount of error associated with them, the only way to separate them is by essentially 'flagging' when a Site ID is 'deployed' and assigning the same Site ID to all the following records, until the next deploy occurs and then beginning that process again.

In pseudocode it would be something like....

NewColumn <- ifelse(Task_Type == "Deploy trap", assign int val, assign previous rows int val)

The expected output would be a new column that has a series of new site ID's in descending order from 1:n (n = however many Deploys there are in the dataset)

Please let me know if I can provide anymore, I have snapshots of data available, I am simply unsure of how to use the stackoverflow Ask a Question page effectively.

TRIALDATA$SITEIDDEPLOY = ifelse(TRIALDATA$Task_Type == 'Deploy trap', paste0(TRIALDATA$Site_ID, "_1"), TRIALDATA$Site_ID)

This code sample is what I used previously to achieve the data in the attached image. Output1

structure(list(Longitude = c(1414394, 1414394, 1414398, 1411206, 1411206, 1411206, 1411206, 1412729, 1412729, 1412729, 1412733, 1414625, 1414625, 1414623, 1414456, 1414456, 1414456, 1414456, 1414456, 1414456, 1414455, 1411308, 1411308, 1411308, 1411307, 1411215, 1411215, 1411215, 1411214, 1414286, 1414286, 1414286, 1414286, 1414292, 1409923, 1409923, 1409923, 1409923, 1409923, 1409920), Latitude = c(4925150, 4925150, 4925147, 4921828, 4921828, 4921828, 4921827, 4923623, 4923623, 4923623, 4923620, 4925285, 4925285, 4925288, 4923812, 4923812, 4923812, 4923812, 4923812, 4923812, 4923802, 4922003, 4922003, 4922003, 4922002, 4921976, 4921976, 4921976, 4921974, 4924632, 4924632, 4924632, 4924632, 4924636, 4920300, 4920300, 4920300, 4920300, 4920300, 4920300 ), Site_ID = c("100000060049", "100000060049", "100000060049", "100000060070", "100000060070", "100000060070", "100000060070", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060179", "100000060179", "100000060179", "100000060179", "100000060209", "100000060209", "100000060209", "100000060209", "100000060209", "100000060209", "100000060209", "100000060209", "100000060209", "100000060209", "100000060209"), Task_Type = c("Deploy trap", "Check trap", "Remove trap", "Deploy trap", "Check trap", "Check trap", "Remove trap", "Deploy trap", "Check trap", "Check trap", "Remove trap", "Deploy trap", "Check trap", "Remove trap", "Deploy trap", "Check trap", "Check trap", "Check trap", "Check trap", "Check trap", "Remove trap", "Deploy trap", "Check trap", "Check trap", "Remove trap", "Deploy trap", "Check trap", "Check trap", "Remove trap", "Deploy trap", "Check trap", "Check trap", "Check trap", "Remove trap", "Deploy trap", "Check trap", "Check trap", "Check trap", "Check trap", "Remove trap"), Task_Option = c("Leg-hold - possum", "Still set", "Still set", "Leg-hold - possum", "Still set", "Still set", "Still set", "Leg-hold - possum", "Still set", "Still set", "Still set", "Leg-hold - possum", "Still set", "Still set", "Leg-hold - possum", "Still set", "Still set", "Still set", "Still set", "Still set", "Still set", "Leg-hold - possum", "Still set", "Still set", "Still set", "Leg-hold - possum", "Still set", "Still set", "Still set", "Leg-hold - possum", "Still set", "Still set", "Still set", "Still set", "Leg-hold - possum", "Still set", "Still set", "Still set", "Still set", "Still set" ), Additional_Data = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), Time = c("11:32:41", "13:54:16", "12:35:40", "09:23:50", "15:10:06", "14:44:31", "08:10:30", "11:33:14", "14:53:36", "15:05:08", "12:46:37", "09:39:50", "13:54:17", "11:43:09", "09:07:30", "14:32:55", "15:08:07", "16:05:35", "13:37:02", "14:50:06", "08:57:05", "08:25:12", "15:10:07", "14:44:31", "08:03:35", "08:30:32", "15:10:06", "14:44:31", "08:06:16", "12:59:51", "16:05:33", "13:37:05", "12:36:12", "11:47:09", "14:24:09", "15:05:23", "12:58:22", "09:36:56", "14:13:32", "08:08:02"), Date = structure(c(1516060800, 1516147200, 1516233600, 1517875200, 1517961600, 1518048000, 1518134400, 1515542400, 1515628800, 1515715200, 1515801600, 1516060800, 1516147200, 1516233600, 1516579200, 1516665600, 1516752000, 1516838400, 1516924800, 1517011200, 1517097600, 1517875200, 1517961600, 1518048000, 1518134400, 1517875200, 1517961600, 1518048000, 1518134400, 1516752000, 1516838400, 1516924800, 1517011200, 1517097600, 1518048000, 1518134400, 1518220800, 1518307200, 1518393600, 1518480000), tzone = "UTC", class = c("POSIXct", "POSIXt")), SITEIDDEPLOY = c("100000060049_1", "100000060049", "100000060049", "100000060070_1", "100000060070", "100000060070", "100000060070", "100000060155_1", "100000060155", "100000060155", "100000060155", "100000060155_1", "100000060155", "100000060155", "100000060155_1", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155", "100000060155_1", "100000060155", "100000060155", "100000060155", "100000060179_1", "100000060179", "100000060179", "100000060179", "100000060209_1", "100000060209", "100000060209", "100000060209", "100000060209", "100000060209_1", "100000060209", "100000060209", "100000060209", "100000060209", "100000060209")), row.names = c(NA, -40L), class = c("tbl_df", "tbl", "data.frame"))


Solution

  • You can use cumsum here -

    TRIALDATA$result <- cumsum(TRIALDATA$Task_Type == "Deploy trap")
    

    The counts are saved in new column result. This will increment the count every time Task_Type = "Deploy trap".