I have a data frame with agricultural use codes (1-5) for 15 consecutive years. Each row is a polygon representing a field. Ultimately I need R to go through the rows, recognize patterns of use, and tell me how often each pattern occurs. Unfortunately my real data set has over 1 million features, so I do not know all the possible patterns in advance.
# Example data: 500 fields (rows), one use code per year for 2005-2019
a <- data.frame(replicate(15, sample(0:5, 500, replace = TRUE)))
colnames(a) <- paste0("use", 2005:2019)
id <- 1:500
a <- cbind(id, a)
id use2005 use2006 use2007 use2008 use2009 use2010 use2011 use2012 use2013 use2014 use2015 ...
1 1 1 1 1 1 2 2 1 4 4 4 ...
2 4 4 4 4 5 5 5 0 5 5 5 ...
3 1 4 3 2 3 2 4 5 1 1 1 ...
4 1 1 1 1 1 2 2 1 4 4 4 ...
5 4 2 2 2 2 5 3 3 3 3 3 ...
So in this arbitrary example, the code should recognize that ids 1 and 4 have the same pattern.
In the end I imagine the result to be some sort of frequency distribution to see if there are certain patterns in the agricultural use of my fields.
For example:
1 1 1 1 1 2 1 1 1 3 2 4 1 1 1
[50] - occurs 50 times
5 5 5 5 5 1 1 1 1 4 4 4 2 2 3
[35] - occurs 35 times
and so forth with all existing combinations...
Unfortunately I have no idea how to approach this. I have no experience with pattern recognition.
Thank you!
maybe this?
library(tidyverse)
# Drop the id column, group by every remaining column, and count rows per group
a[, -1] %>% group_by_all() %>% count()
# use2005 use2006 use2007 use2008 use2009 use2010 use2011 use2012 use2013 use2014 use2015 n
# <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 1 1 1 1 1 2 2 1 4 4 4 2
# 2 1 4 3 2 3 2 4 5 1 1 1 1
# 3 4 2 2 2 2 5 3 3 3 3 3 1
# 4 4 4 4 4 5 5 5 0 5 5 5 1
Or, if you want to see which fields share each pattern, you could switch to group_by_at, exclude id from the grouping, and then paste the ids together:
a %>%
  group_by_at(vars(-id)) %>%
  summarise(n = n(), ids = paste(id, collapse = ","))
# use2005 use2006 use2007 use2008 use2009 use2010 use2011 use2012 use2013 use2014 use2015 n ids
# <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <chr>
# 1 1 1 1 1 1 2 2 1 4 4 4 2 1,4
# 2 1 4 3 2 3 2 4 5 1 1 1 1 3
# 3 4 2 2 2 2 5 3 3 3 3 3 1 5
# 4 4 4 4 4 5 5 5 0 5 5 5 1 2
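If you would rather avoid the tidyverse, here is a rough base R sketch of the same idea: collapse each row's 15 codes into one string key, then tabulate the keys. The names pattern, freq and ids_by_pattern are just made up for illustration, and the forced duplicate in row 4 is only there so the toy data contains a repeated pattern. I have not timed this on a million rows, so treat it as a starting point rather than a tuned solution.

```r
# Toy data in the same layout as the question
set.seed(42)
a <- data.frame(replicate(15, sample(0:5, 500, replace = TRUE)))
colnames(a) <- paste0("use", 2005:2019)
a <- cbind(id = 1:500, a)

# Assumption for the demo only: copy row 1 into row 4 so one pattern repeats
a[4, -1] <- a[1, -1]

# Collapse the yearly codes of each row into a single string, e.g. "1 1 2 ..."
pattern <- do.call(paste, c(a[, -1], sep = " "))

# Frequency of each distinct pattern, most common first
freq <- sort(table(pattern), decreasing = TRUE)
head(freq)

# Which ids belong to each pattern
ids_by_pattern <- split(a$id, pattern)
ids_by_pattern[[pattern[1]]]
```

The string key is what makes this work without knowing the patterns up front: table() simply counts whatever distinct keys turn up in the data.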