Search code examples
runiquehierarchical-datapairwise

Use unique pairs of column values to generate dyad identifiers in the dataframe


I want to generate a set of dyad identifiers for a bilateral trade flow dataframe (that is coded in from, to, and amount traded format) such that I could use these identifiers for further statistical analysis.

My example data is provided at below, from which I have extracted and identified unique country dyads from the data that involve the US.

# load the example data
trade_flow <- readRDS(gzcon(url("https://www.dropbox.com/s/ep7xldoq9go4f0g/trade_flow.rds?dl=1")))
# extract country dyads
country_dyad <- trade_flow[, c("from", "to")]
# identify unique pairs
up <- country_dyad[!duplicated(t(apply(country_dyad, 1, sort))),]
# extract only unique pairs that involve the US
up <- up[(up$from == "USA") | (up$to == "USA"), ]

## how can I use the unique pair object (up) to generate dyad identifiers and include them as a new column in the trade_flow dataframe

The next step is match these unique dyad pairs from the original dataframe's (trade_flow) from and to columns and generate a list of unique dyad identifiers as a new column (say, dyad) to the df (trade_flow). It should look something like the format below in which each unique dyad is identified and coded as a unique numerical value. I will be grateful if someone could help me on this.

from    to  trade_flow  dyad
USA   ITA      5100       2
USA   UKG      4000       1
USA   GMY     17000       3
USA   ITA      4500       2
USA   JPN      2900       4
USA   UKG      6700       1
USA   ROK      7000       5
USA   UKG      2300       1
USA   SAF      1500       6
IND   USA      2400       7

Solution

  • Assuming that flows are directioinal so that A/B and B/A are different flows, paste the from and to columns together and convert to factor. The internal codes that factor uses are 1, 2, ..., no_of_levels and to extract those use as.numeric.

    transform(DF, dyad = as.numeric(factor(paste(from, to))))
    

    giving:

       from  to trade_flow dyad
    1   USA ITA       5100    3
    2   USA UKG       4000    7
    3   USA GMY      17000    2
    4   USA ITA       4500    3
    5   USA JPN       2900    4
    6   USA UKG       6700    7
    7   USA ROK       7000    5
    8   USA UKG       2300    7
    9   USA SAF       1500    6
    10  IND USA       2400    1
    

    Applying assignments made on subset to whole

    If we want to perform this assignment only for a subset of rows of DF, for example head(DF), and then use those assignments for all of DF using NA for flows in DF that are not in DF0 then first perform the assignment of dyads as above (see first line below) and then remove the flow numbers from DF0 and extract its unique rows using unique. Finally merge that with the DF along the first two columns using all.x=TRUE so that unmatched rows in DF are not dropped.

    DF0 <- transform(head(DF), dyad = as.numeric(factor(paste(from, to))))
    merge(DF, unique(DF0[-3]), all.x = TRUE, by = 1:2)
    

    giving:

       from  to trade_flow dyad
    1   IND USA       2400   NA
    2   USA GMY      17000    1
    3   USA ITA       4500    2
    4   USA ITA       5100    2
    5   USA JPN       2900    3
    6   USA ROK       7000   NA
    7   USA SAF       1500   NA
    8   USA UKG       4000    4
    9   USA UKG       2300    4
    10  USA UKG       6700    4
    

    Note

    Input in reproducible form:

    Lines <- "from to trade_flow
    USA   ITA      5100       
    USA   UKG      4000       
    USA   GMY     17000       
    USA   ITA      4500       
    USA   JPN      2900       
    USA   UKG      6700       
    USA   ROK      7000       
    USA   UKG      2300       
    USA   SAF      1500       
    IND   USA      2400"
    DF <- read.table(text = Lines, header = TRUE)