this is not a hard problem from my understanding, however, I have minimal experience with different R functions and problem solving within it. The text is long to provide a thorough understanding rather than complexity. Any help would be greatly appreciated.
I am trying to utilize a Cobb-Douglas Utility Function [x^a * y^(1 - a)] to calculate a Receiver Operator Curve for every point of alpha for a final result of an AUC vs. Alpha graph.
If you are unfamiliar with ROC - I'd recommend GoogleDev & Wikipedia.
Here are my two initial tables: df, alpha
view(df)
ID | X | Y |
---|---|---|
A | 1 | 4 |
B | 2 | 5 |
C | 3 | 6 |
Alpha <- seq(0.1,0.3,by = 0.1))
view(alpha)
row # | Alpha value |
---|---|
1 | 0.1 |
2 | 0.2 |
3 | 0.3 |
Wanted function #1 here to combine Alpha to XY pairs w/
view(combined_wanted_result)
ID | X | Y | Alpha |
---|---|---|---|
A | 1 | 4 | 0.1 |
A | 1 | 4 | 0.2 |
A | 1 | 4 | 0.3 |
B | 2 | 5 | 0.1 |
B | 2 | 5 | 0.2 |
B | 2 | 5 | 0.3 |
C | 3 | 6 | 0.1 |
C | 3 | 6 | 0.2 |
C | 3 | 6 | 0.3 |
The combined result is wanted due to the fact that every XY pair (1/4, 2/5, 3/6) is unique&connected and I do not want to mix those vales (bad_ex: 1/5, 3/5, 2/6).
I'm trying to use the Cobb-Douglas Utility Function so that I can make a table for every iteration of alpha. The documentation in R does not seem to be appropriate for my use.
Z = (x^a * y^(1 - a)
Wanted cobb_douglas function #2 here
View(cobb_douglas)
ID | X | Y | Alpha | Z |
---|---|---|---|---|
A | 1 | 4 | 0.1 | 3.482 |
A | 1 | 4 | 0.2 | 3.031 |
A | 1 | 4 | 0.3 | 2.649 |
B | 2 | 5 | 0.1 | 4.562 |
B | 2 | 5 | 0.2 | 4.163 |
B | 2 | 5 | 0.3 | 3.798 |
C | 3 | 6 | 0.1 | 5.598 |
C | 3 | 6 | 0.2 | 5.223 |
C | 3 | 6 | 0.3 | 4.876 |
This View(cobb_douglas) will then be grouped by alpha and an ROC curve created from those Z value in comparison to a given an absolute identification.
view(Abs)
ID | Abs |
---|---|
A | 1 |
B | 0 |
C | 1 |
#Grab all Z values of certain alpha, matched by common ID to get absolute
view(0.1_alpha)
view(0.2_alpha)
ID | Alpha | Z | Abs |
---|---|---|---|
A | 0.1 | 3.482 | 1 |
B | 0.1 | 4.562 | 0 |
C | 0.1 | 5.598 | 1 |
ID | Alpha | Z | Abs |
---|---|---|---|
A | 0.2 | 3.031 | 1 |
B | 0.2 | 4.163 | 0 |
C | 0.2 | 5.223 | 1 |
Wanted function #3 - creating an ROC table (w/ Z&Abs) of each alpha table to calculate AUC
Note, the AUC values are not given here - This table will be used to plot Alpha vs. AUC.
Alpha | AUC |
---|---|
0.1 | ? |
0.2 | ? |
0.3 | ? |
What I've tried prompt:
My initial stick up which I got while trying to solve this problem with 'expand.grid' to plot every alpha to an XY pair. But the X&Y values kept getting mixed with each other leading to unwanted value pairs such as 1/5, 3/5, 2/6.
I have reviewed similar articles in Stackoverflow and some of them solve the further problem of calculating the AUC. However the initial data matching I am still lost on.
Similar issue links:
MLE for Cobb Douglas function
How to calculate a partial area under the curve (AUC)?
How to calculate area under the curve (AUC) in several data series?
How to calculate the AUC from a ROC plot without the underlying data?
How to calculate the AUC of a graph in R?
Applying a function for calculating AUC for each subject
While it is not entirely clear how all your desired outputs should look, here is a partial solution that addresses some of the things you are trying to achieve. If there are other elements in need of consideration, please comment below and I will update my answer.
At its heart, your question relates to data manipulation and the dplyr
package is probably the easiest way to achieve your goals. If you haven't already, install dplyr
and MESS
(for the second example) prior to running this code:
library(dplyr)
library(MESS)
# Your data as described
df <- data.frame(ID = LETTERS[1:3],
X = 1:3,
Y = 4:6)
Alpha <- data.frame(row = 1:3,
Alpha = seq(0.1,0.3,by = 0.1))
Abs <- data.frame(ID = LETTERS[1:3],
Abs = c(1,0,1))
# Create full dataset
result <- df %>%
slice(rep(1:n(), each = nrow(Alpha))) %>%
group_by(ID) %>%
mutate(row = 1:n()) %>%
left_join(., Alpha, by = "row") %>%
mutate(Z = (X^Alpha * Y^(1 - Alpha))) %>%
left_join(., Abs, by = "ID") %>%
select(-row) %>%
ungroup()
data.frame(result)
ID X Y Alpha Z Abs
1 A 1 4 0.1 3.482202 1
2 A 1 4 0.2 3.031433 1
3 A 1 4 0.3 2.639016 1
4 B 2 5 0.1 4.562218 0
5 B 2 5 0.2 4.162766 0
6 B 2 5 0.3 3.798289 0
7 C 3 6 0.1 5.598198 1
8 C 3 6 0.2 5.223303 1
9 C 3 6 0.3 4.873514 1
# Create table with AUC for each Alpha (NOTE: edit columns in auc() if incorrect)
result1 <- df %>%
slice(rep(1:n(), each = nrow(Alpha))) %>%
group_by(ID) %>%
mutate(row = 1:n()) %>%
left_join(., Alpha, by = "row") %>%
mutate(Z = (X^Alpha * Y^(1 - Alpha))) %>%
left_join(., Abs, by = "ID") %>%
select(-row) %>%
group_by(Alpha) %>%
summarize(AUC = auc(Z, Abs, type = "spline"), .groups="keep")
data.frame(result1)
Alpha AUC
1 0.1 0.7930400
2 0.2 0.8208068
3 0.3 0.8363540
Update based on OP's comments
To use multiple sequences of alpha values dynamically in a loop, the following method works. NOTE: I have kept your column names consistent with your original question (rather than the ones stated in your comments) to avoid confusing others who may happen across this answer:
# Create list of alpha sequences to iterate over
alphaseq <- list(seq(0.1, 0.3, by = 0.1),
seq(0.1, 1, by = 0.1),
seq(0.01, 0.1, by = 0.01))
# Create empty df to hold the results
auc_results <- data.frame(matrix(NA, nrow = 0, ncol = 3))
# iteration column added to identify which seq() was used
colnames(auc_results) <- c("iteration",
"Alpha",
"AUC")
# Iterate over alphaseq and save results to auc_results
for(i in seq(alphaseq)) {
# Create Alpha dynamically
Alpha <- data.frame(Alpha = alphaseq[[i]],
row = 1:length(alphaseq[[i]]))
tmp <- df %>%
slice(rep(1:n(), each = nrow(Alpha))) %>%
group_by(ID) %>%
mutate(row = 1:n()) %>%
left_join(., Alpha, by = "row") %>%
mutate(Z = (X^Alpha * Y^(1 - Alpha))) %>%
left_join(., Abs, by = "ID") %>%
select(-row) %>%
group_by(Alpha) %>%
summarize(AUC = auc(Z, Abs, type = "spline"), .groups="keep") %>%
mutate(iteration = i) %>%
relocate(iteration)
auc_results <- rbind(auc_results, tmp)
rm(tmp)
}
# Result
data.frame(auc_results)
iteration Alpha AUC
1 1 0.10 0.7930400
2 1 0.20 0.8208068
3 1 0.30 0.8363540
4 2 0.10 0.7930400
5 2 0.20 0.8208068
6 2 0.30 0.8363540
7 2 0.40 0.8419765
8 2 0.50 0.8394922
9 2 0.60 0.8303900
10 2 0.70 0.8159135
11 2 0.80 0.7971144
12 2 0.90 0.7748883
13 2 1.00 0.7500000
14 3 0.01 0.7551021
15 3 0.02 0.7600145
16 3 0.03 0.7647423
17 3 0.04 0.7692901
18 3 0.05 0.7736625
19 3 0.06 0.7778638
20 3 0.07 0.7818983
21 3 0.08 0.7857700
22 3 0.09 0.7894826
23 3 0.10 0.7930400