My original dataset(cyclistic_data) has 3 columns: station name
, station_lat
and station_lng
. This is the preview of my data set.
Preview of data set
station_coordinates <-
data.frame(
station = c(Dearborn St, Franklin St, Lake Shore Dr, Dearborn St,
Dearborn St, Franklin St),
station_lat = c(-87.6291273333333, -87.6353428333333,
41.8809828333333, -87.6291273333333,
-87.6291273333333, -87.6353428333333),
station_lng = c(41.8944341666667, 41.900675, 41.8720545,
41.8944341666667, 41.8944341666667, 41.900675)
)
I tried
sort(table(cyclistic_data$start_station),decreasing=TRUE)[1:100]
code to get the most repeated 100 station names. By using this code I identified top 100 stations but I was unable to extract the corresponding lan and lng values from the original data det.I want retrieve the most repeated 100 stations name and its lat and lng values from the original data set.
It's a bit difficult to give an exact solution without a reproducible example, but if I understand you correctly, you could try:
data %%>
add_count(station, name = "station_n")%>%
slice_max(n = 100, order_by = station_n)
Plotting according to long and latitude is then a slightly different issue, and will be dependent on how you want to plot, but you can look at the maps() package and geom_sf() as starters
After your comment, the simplest way I know to do this uses joins. There may well be a better/quicker option:
#Making a minimally complex example:
data <- data.frame(station = c("one", "one", "two", "one", "three", "three"),
lat = c(18938, 18938, 18273, 18938, 199999, 199999))
top <- data %>%
count(station, sort = TRUE)%>%
slice_max(n = 2, order_by = n)
data %>%
inner_join(top)
And we could turn that into a function like so:
keeping_top_n <- function(x, top_n = 100, var_of_interest = station){
top <- x %>%
count({{var_of_interest}}, sort = TRUE)%>%
slice_max(n = top_n, order_by = n)
x %>%
inner_join(top)
}
#you would call this like:
keeping_top_n(data_frame)