I want to generate a set of dyad identifiers for a bilateral trade flow dataframe (that is coded in from
, to
, and amount
traded format) such that I could use these identifiers for further statistical analysis.
My example data is provided at below, from which I have extracted and identified unique country dyads from the data that involve the US.
# load the example data
trade_flow <- readRDS(gzcon(url("https://www.dropbox.com/s/ep7xldoq9go4f0g/trade_flow.rds?dl=1")))
# extract country dyads
country_dyad <- trade_flow[, c("from", "to")]
# identify unique pairs
up <- country_dyad[!duplicated(t(apply(country_dyad, 1, sort))),]
# extract only unique pairs that involve the US
up <- up[(up$from == "USA") | (up$to == "USA"), ]
## how can I use the unique pair object (up) to generate dyad identifiers and include them as a new column in the trade_flow dataframe
The next step is match these unique dyad pairs from the original dataframe's (trade_flow
) from
and to
columns and generate a list of unique dyad identifiers as a new column (say, dyad
) to the df (trade_flow
). It should look something like the format below in which each unique dyad is identified and coded as a unique numerical value. I will be grateful if someone could help me on this.
from to trade_flow dyad
USA ITA 5100 2
USA UKG 4000 1
USA GMY 17000 3
USA ITA 4500 2
USA JPN 2900 4
USA UKG 6700 1
USA ROK 7000 5
USA UKG 2300 1
USA SAF 1500 6
IND USA 2400 7
Assuming that flows are directioinal so that A/B and B/A are different flows, paste the from
and to
columns together and convert to factor. The internal codes that factor uses are 1, 2, ..., no_of_levels and to extract those use as.numeric
.
transform(DF, dyad = as.numeric(factor(paste(from, to))))
giving:
from to trade_flow dyad
1 USA ITA 5100 3
2 USA UKG 4000 7
3 USA GMY 17000 2
4 USA ITA 4500 3
5 USA JPN 2900 4
6 USA UKG 6700 7
7 USA ROK 7000 5
8 USA UKG 2300 7
9 USA SAF 1500 6
10 IND USA 2400 1
If we want to perform this assignment only for a subset of rows of DF
, for example head(DF)
, and then use those assignments for all of DF
using NA for flows in DF
that are not in DF0
then first perform the assignment of dyads as above (see first line below) and then remove the flow numbers from DF0
and extract its unique rows using unique
. Finally merge that with the DF
along the first two columns using all.x=TRUE
so that unmatched rows in DF
are not dropped.
DF0 <- transform(head(DF), dyad = as.numeric(factor(paste(from, to))))
merge(DF, unique(DF0[-3]), all.x = TRUE, by = 1:2)
giving:
from to trade_flow dyad
1 IND USA 2400 NA
2 USA GMY 17000 1
3 USA ITA 4500 2
4 USA ITA 5100 2
5 USA JPN 2900 3
6 USA ROK 7000 NA
7 USA SAF 1500 NA
8 USA UKG 4000 4
9 USA UKG 2300 4
10 USA UKG 6700 4
Input in reproducible form:
Lines <- "from to trade_flow
USA ITA 5100
USA UKG 4000
USA GMY 17000
USA ITA 4500
USA JPN 2900
USA UKG 6700
USA ROK 7000
USA UKG 2300
USA SAF 1500
IND USA 2400"
DF <- read.table(text = Lines, header = TRUE)