I have multiple columns with VC's funds.
df <- data.frame(Company = c("X", "Y","Z"),
Valuation = c("10","20","30"),
VC.1 = c("Zeev Ventures","Bedrock Capital","Activant Capital"),
VC.2 = c("Bedrock Capital","Activant Capital","Zeev Ventures"))
Company | Valuatiion | VC.1 | VC.2 |
---|---|---|---|
X | 10 | Zeev Ventures | Bedrock Capital |
Y | 20 | Bedrock Capital | Activant Capital |
Z | 30 | Activant Capital | Zeev Ventures |
I need calcucalte different statistics for each VC fund. For example
VC | Total Valuatiion | Total Investments |
---|---|---|
Zeev Ventures | 40 | 2 |
Bedrock Capital | 40 | 2 |
Activant Capital | 50 | 2 |
You can pivot the VC.1
and VC.2
columns and get your results with the usual dplyr
functions:
library(tidyr)
library(dplyr)
pivot_longer(df, cols = c("VC.1", "VC.2"), values_to = "VC") %>%
group_by(VC) %>%
summarise(TV = sum(as.numeric(Valuation)), TI = n())
# A tibble: 3 × 3
VC TV TI
<chr> <dbl> <int>
1 Activant Capital 50 2
2 Bedrock Capital 30 2
3 Zeev Ventures 40 2
I recommend inspecting the result of the pivot_longer()
function call alone for a better understanding of why this pivoting step solves the problem of having identifiers in multiple columns.