Hello Ladies and Gentlemen, I have a problem summarizing my data sample: I also want to see the "zero counts" for combinations that have no matching rows, but my current approach drops them. My data looks like this:
library(dplyr)
set.seed(529)
sampledata <- data.frame(StartPos = rep(1:10, times = 10),
                         Velocity = sample(c(-36, 36), 100, replace = TRUE),
                         Response = c(sample(c("H", "M", "W"), 50, replace = TRUE),
                                      sample(c("M", "W"), 50, replace = TRUE)))
The data consists of 100 rows with start positions (StartPos) from 1 to 10. Each start position occurs 10 times here, though in general some can occur more often (e.g. Start Position 3 could occur 20 times). Each row also has a Response, which is H for Hit, M for Miss or W for Wrong. Some start positions may have no H at all. The Velocity column holds the values -36 and 36, which describe the direction of the stimulus that started at the given StartPos (-36 to the right, 36 to the left).
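A quick way to see which start positions have no hits at all is a base R cross-tabulation. This is a minimal sketch that rebuilds `sampledata` from the code above so it runs on its own; `resp_table` and `no_hit_pos` are illustrative names, and it assumes at least one H occurs somewhere (otherwise the "H" column does not exist).

```r
# Base R only; data recreated from the question so the block is self-contained
set.seed(529)
sampledata <- data.frame(StartPos = rep(1:10, times = 10),
                         Velocity = sample(c(-36, 36), 100, replace = TRUE),
                         Response = c(sample(c("H", "M", "W"), 50, replace = TRUE),
                                      sample(c("M", "W"), 50, replace = TRUE)))

# Rows: StartPos, columns: Response levels; each cell is a number of trials
resp_table <- table(sampledata$StartPos, sampledata$Response)
print(resp_table)

# Start positions whose "H" cell is 0, i.e. that never produced a hit
no_hit_pos <- as.integer(rownames(resp_table)[resp_table[, "H"] == 0])
print(no_hit_pos)
```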
For the percentage calculation that follows, I only care about the StartPos and Velocity combinations with Hits.
To count the number of test trials run per side, I created the following counter:
numbofrunsperside <- sampledata %>%
  mutate(Direction = case_when( # add direction
    Velocity < 0 ~ "Right",
    Velocity > 0 ~ "Left",
    TRUE ~ "None")) %>%
  group_by(StartPos, Direction) %>% # for each combination
  count(Velocity, .drop = FALSE)    # count
numbofrunsperside
For the hit counts per StartPos and Direction (Left/Right):
sampledata_hit_counts <- sampledata %>%
  mutate(Direction = case_when( # add direction
    Velocity < 0 ~ "Right",
    Velocity > 0 ~ "Left",
    TRUE ~ "None")) %>%
  filter(Response == "H") %>%
  group_by(StartPos, Direction, .drop = FALSE) %>% # for each combination
  count(StartPos, .drop = FALSE)                   # count
sampledata_hit_counts
The problem: numbofrunsperside has 20 rows, while sampledata_hit_counts has only 12, because StartPos/Direction combinations without any hit do not appear in it.
I get the following error message when I try to calculate the percentage of hits using:
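The rows go missing because `filter()` removes every trial of a combination without hits, and `.drop = FALSE` only keeps empty groups for factor columns. `StartPos` is an integer and `Direction` a character vector, so the `.drop = FALSE` arguments in the code above have no effect. A sketch that makes them work by declaring both columns as factors first, assuming start positions 1 to 10 and a Velocity that is never 0 (`hit_counts_fct` is a hypothetical name):

```r
library(dplyr)

set.seed(529)
sampledata <- data.frame(StartPos = rep(1:10, times = 10),
                         Velocity = sample(c(-36, 36), 100, replace = TRUE),
                         Response = c(sample(c("H", "M", "W"), 50, replace = TRUE),
                                      sample(c("M", "W"), 50, replace = TRUE)))

hit_counts_fct <- sampledata %>%
  mutate(
    # declare the full level sets up front so empty combinations can survive
    StartPos  = factor(StartPos, levels = 1:10),
    # assumes Velocity is never 0, so only "Left" and "Right" occur
    Direction = factor(if_else(Velocity < 0, "Right", "Left"),
                       levels = c("Left", "Right"))
  ) %>%
  filter(Response == "H") %>%                # unused factor levels are kept
  count(StartPos, Direction, .drop = FALSE)  # combinations without hits get n = 0

nrow(hit_counts_fct)  # 10 start positions x 2 directions = 20 rows
```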
sampledata_hit_counts$PTest = sampledata_hit_counts$n /
numbofrunsperside$n
Error in `$<-.data.frame`(`*tmp*`, PTest, value = c(0.2, 0.2, 0.25, 0.166666666666667,  :
  replacement has 20 rows, data has 12
In addition: Warning message:
In sampledata_hit_counts$n/numbofrunsperside$n :
  longer object length is not a multiple of shorter object length
One fix would be to include the "zero counts" for the missing StartPos/Direction combinations in sampledata_hit_counts, so that both data frames have the same number of rows. Unfortunately, I don't know how to do this. Any help would be greatly appreciated!
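Another way to add the zero counts after the fact is `tidyr::complete()`, which inserts rows for missing combinations of the given columns. A sketch, assuming the tidyr package is installed and that the full value sets are StartPos 1 to 10 and the two directions; they are spelled out because a start position with no hit in either direction would otherwise be unknown to `complete()`. `hit_counts_complete` is a hypothetical name.

```r
library(dplyr)
library(tidyr)

set.seed(529)
sampledata <- data.frame(StartPos = rep(1:10, times = 10),
                         Velocity = sample(c(-36, 36), 100, replace = TRUE),
                         Response = c(sample(c("H", "M", "W"), 50, replace = TRUE),
                                      sample(c("M", "W"), 50, replace = TRUE)))

hit_counts_complete <- sampledata %>%
  # assumes Velocity is never 0, so if_else() replaces the case_when()
  mutate(Direction = if_else(Velocity < 0, "Right", "Left")) %>%
  filter(Response == "H") %>%
  count(StartPos, Direction) %>%          # only combinations with >= 1 hit
  # add every missing StartPos x Direction combination with n = 0
  complete(StartPos = 1:10, Direction = c("Left", "Right"),
           fill = list(n = 0L))

print(hit_counts_complete)
```

If you apply this to sampledata_hit_counts from the question instead, call `ungroup()` first: `count()` keeps the input's grouping, and `complete()` on a grouped data frame only completes within each group. Joining the completed table to numbofrunsperside on StartPos and Direction is safer than dividing the `n` columns by row position.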
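You can do a left join, which keeps every row of numbofrunsperside and gives NA as the hit count wherever there were no hits: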
library(dplyr)
numbofrunsperside %>%
  left_join(
    sampledata_hit_counts,
    by = c("StartPos", "Direction"),
    suffix = c("_runs", "_hits")
  ) %>%
  mutate(
    # combinations without hits have n_hits = NA after the join; count them as 0
    p_test = ifelse(is.na(n_hits), 0, n_hits) / n_runs
  ) %>%
  pull(p_test)
#[1] 0.2000000 0.0000000 0.0000000 0.1666667 0.0000000 0.0000000 0.3333333 0.1428571 0.0000000 0.1250000 0.1666667 0.5000000 0.2000000
#[14] 0.4000000 0.1666667 0.0000000 0.0000000 0.3333333 0.5000000 0.0000000
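As an alternative to the join, the hits can be counted inside `summarise()` instead of being filtered out first, so a combination with runs but no hits keeps its row with `n_hits = 0`. A minimal sketch: it uses `if_else()` in place of the `case_when()` above (assuming Velocity is never 0), needs dplyr 1.0 or later for `.groups`, and `hit_rates` is a hypothetical name.

```r
library(dplyr)

set.seed(529)
sampledata <- data.frame(StartPos = rep(1:10, times = 10),
                         Velocity = sample(c(-36, 36), 100, replace = TRUE),
                         Response = c(sample(c("H", "M", "W"), 50, replace = TRUE),
                                      sample(c("M", "W"), 50, replace = TRUE)))

hit_rates <- sampledata %>%
  mutate(Direction = if_else(Velocity < 0, "Right", "Left")) %>%
  group_by(StartPos, Direction) %>%
  summarise(
    n_runs = n(),                   # all trials for this combination
    n_hits = sum(Response == "H"),  # 0 when the combination had no hit
    .groups = "drop"
  ) %>%
  mutate(p_test = n_hits / n_runs)

print(hit_rates)
```

Both counts come from the same grouped data, so the two tables can no longer disagree in length and no join is needed.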