Circlize chord diagram with multiple levels of data


I am a bit stuck. I want to show flows of trafficked species between regions with a chord diagram in circlize, but I cannot work out how to plot the data when columns 1 and 2 represent the "connection", column 3 is the "factor" of interest, and column 4 holds the values. I have included a sample of the data below (yes, I am aware Indonesia is a region); as you can see, each species is not unique to a particular region. I would like to produce a plot similar to the one included below, but with "species" in place of "countries" for each region. Is this possible?

import_region    export_region  species                flow
North America    Europe         Acanthosaura armata     0.0104
Southeast Asia   Europe         Acanthosaura armata     0.0022
Indonesia        Europe         Acanthosaura armata     0.1971
Indonesia        Europe         Acrochordus granulatus  0.7846
Southeast Asia   Europe         Acrochordus granulatus  0.1101
Indonesia        Europe         Acrochordus javanicus   2.00E-04
Southeast Asia   Europe         Acrochordus javanicus   0.0015
Indonesia        North America  Acrochordus javanicus   0.0024
East Asia        Europe         Acrochordus javanicus   0.0028
Indonesia        Europe         Ahaetulla prasina       4.00E-04
Southeast Asia   Europe         Ahaetulla prasina       4.00E-04
Southeast Asia   East Asia      Amyda cartilaginea      0.0027
Indonesia        East Asia      Amyda cartilaginea      5.00E-04
Indonesia        Europe         Amyda cartilaginea      0.004
Indonesia        Southeast Asia Amyda cartilaginea      0.0334
Europe           North America  Amyda cartilaginea      4.00E-04
Indonesia        North America  Amyda cartilaginea      0.1291
Southeast Asia   Southeast Asia Amyda cartilaginea      0.0283
Indonesia        West Asia      Amyda cartilaginea      0.7614
South Asia       Europe         Amyda cartilaginea      2.8484
Australasia      Europe         Apodora papuana         0.0368
Indonesia        North America  Apodora papuana         0.324
Indonesia        Europe         Apodora papuana         0.0691
Europe           Europe         Apodora papuana         0.0106
Indonesia        East Asia      Apodora papuana         0.0129
Europe           North America  Apodora papuana         0.0034
East Asia        East Asia      Apodora papuana         2.00E-04
Indonesia        Southeast Asia Apodora papuana         0.0045
East Asia        North America  Apodora papuans         0.0042

Example of a diagram similar to what I would like (please click the link below): chord diagram


Solution

  • In the circlize package, the chordDiagram() function only accepts a "from" column, a "to" column and an optional "value" column. In your case, however, the original data frame can be transformed into such a three-column data frame.

    In your example, you want to distinguish e.g. Acanthosaura_armata in North America from Acanthosaura_armata in Europe. One solution is to merge region names and species names, such as Acanthosaura_armata|North_America, to form a unique identifier. Next I will demonstrate how to visualize this dataset with the circlize package.

    Read in the data. Note that I replaced spaces with underscores.

    df = read.table(textConnection(
    "import_region    export_region  species                flow
    North_America    Europe         Acanthosaura_armata     0.0104
    Southeast_Asia   Europe         Acanthosaura_armata     0.0022
    Indonesia        Europe         Acanthosaura_armata     0.1971
    Indonesia        Europe         Acrochordus_granulatus  0.7846
    Southeast_Asia   Europe         Acrochordus_granulatus  0.1101
    Indonesia        Europe         Acrochordus_javanicus   2.00E-04
    Southeast_Asia   Europe         Acrochordus_javanicus   0.0015
    Indonesia        North_America  Acrochordus_javanicus   0.0024
    East_Asia        Europe         Acrochordus_javanicus   0.0028
    Indonesia        Europe         Ahaetulla_prasina       4.00E-04
    Southeast_Asia   Europe         Ahaetulla_prasina       4.00E-04
    Southeast_Asia   East_Asia      Amyda_cartilaginea      0.0027
    Indonesia        East_Asia      Amyda_cartilaginea      5.00E-04
    Indonesia        Europe         Amyda_cartilaginea      0.004
    Indonesia        Southeast_Asia Amyda_cartilaginea      0.0334
    Europe           North_America  Amyda_cartilaginea      4.00E-04
    Indonesia        North_America  Amyda_cartilaginea      0.1291
    Southeast_Asia   Southeast_Asia Amyda_cartilaginea      0.0283
    Indonesia        West_Asia      Amyda_cartilaginea      0.7614
    South_Asia       Europe         Amyda_cartilaginea      2.8484
    Australasia      Europe         Apodora_papuana         0.0368
    Indonesia        North_America  Apodora_papuana         0.324
    Indonesia        Europe         Apodora_papuana         0.0691
    Europe           Europe         Apodora_papuana         0.0106
    Indonesia        East_Asia      Apodora_papuana         0.0129
    Europe           North_America  Apodora_papuana         0.0034
    East_Asia        East_Asia      Apodora_papuana         2.00E-04
    Indonesia        Southeast_Asia Apodora_papuana         0.0045
    East_Asia        North_America  Apodora_papuans         0.0042"),
    header = TRUE, stringsAsFactors = FALSE)
    

    I also removed the rows with very small values (a flow of 0.01 or less).

    df = df[df[[4]] > 0.01, ]
    

    Assign colors for species and regions.

    library(circlize)
    library(RColorBrewer)
    all_species = unique(df[[3]])
    color_species = structure(brewer.pal(length(all_species), "Set1"), names = all_species)
    all_regions = unique(c(df[[1]], df[[2]]))
    color_regions = structure(brewer.pal(length(all_regions), "Set2"), names = all_regions)
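
    A caveat for the full dataset: brewer.pal() can only return as many colors as the palette holds (nine for "Set1", eight for "Set2") and at least three, so the code above only works while the number of species and regions stays within those limits. As a rough sketch, assuming you are happy with a different palette, base R's hcl.colors() gives any number of colors. The helper name named_colors and the species names below are made up for illustration.

    ```r
    # Hypothetical helper (not part of circlize or RColorBrewer):
    # one named color per key, for any number of keys.
    named_colors = function(keys, palette = "Dark 3") {
        # hcl.colors() ships with base R (grDevices, R >= 3.6)
        structure(hcl.colors(length(keys), palette), names = keys)
    }

    # Made-up species names, more than the nine colors "Set1" offers
    many_species = paste0("species_", 1:12)
    color_many = named_colors(many_species)
    ```

    The result has the same shape as color_species and color_regions (a character vector of colors named by key), so it could be used in their place.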
    

    Group by species

    First I will demonstrate how to group the chord diagram by species.

    As mentioned before, we use species|region as the unique identifier.

    df2 = data.frame(from = paste(df[[3]], df[[1]], sep = "|"),
                     to = paste(df[[3]], df[[2]], sep = "|"),
                     value = df[[4]], stringsAsFactors = FALSE)
    

    Next we set the order of all sectors: first by species, then by region.

    combined = unique(data.frame(regions = c(df[[1]], df[[2]]), 
        species = c(df[[3]], df[[3]]), stringsAsFactors = FALSE))
    combined = combined[order(combined$species, combined$regions), ]
    order = paste(combined$species, combined$regions, sep = "|")
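
    Before plotting, it may be worth checking that every identifier used in the links also appears in the sector order, and only once. Below is a minimal sketch of that check on a small made-up data frame with the same four columns as df; all toy_* names and values are invented for illustration.

    ```r
    # Toy data in the same shape as df (values are made up)
    toy = data.frame(
        import_region = c("Asia", "Asia", "Europe"),
        export_region = c("Europe", "Africa", "Africa"),
        species       = c("sp_a", "sp_b", "sp_a"),
        flow          = c(0.5, 0.2, 0.1),
        stringsAsFactors = FALSE)

    # Same construction as df2: species|region identifiers for both ends
    toy2 = data.frame(from = paste(toy[[3]], toy[[1]], sep = "|"),
                      to = paste(toy[[3]], toy[[2]], sep = "|"),
                      value = toy[[4]], stringsAsFactors = FALSE)

    # Same construction as combined/order: sort by species, then region
    toy_combined = unique(data.frame(regions = c(toy[[1]], toy[[2]]),
        species = c(toy[[3]], toy[[3]]), stringsAsFactors = FALSE))
    toy_combined = toy_combined[order(toy_combined$species, toy_combined$regions), ]
    toy_order = paste(toy_combined$species, toy_combined$regions, sep = "|")

    # Every sector used by a link should appear in the order exactly once
    missing_sectors = setdiff(c(toy2$from, toy2$to), toy_order)
    stopifnot(length(missing_sectors) == 0, !anyDuplicated(toy_order))
    ```

    The same two checks can be run on df2 and order from the real data.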
    

    We want the color of the links to be the same as the color of the regions:

    grid.col = structure(color_regions[combined$regions], names = order)
    

    Since the chord diagram is grouped by species, the gaps between species should be larger than the gaps within each species.

    gap = rep(1, length(order))
    gap[which(!duplicated(combined$species, fromLast = TRUE))] = 5
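
    To see what the duplicated(..., fromLast = TRUE) trick does, here is a small sketch on a made-up, already sorted vector of group labels. The last sector of each group is the one that is not a duplicate when scanning from the end, so that is where the larger gap goes.

    ```r
    # Made-up group labels, sorted so that each group is contiguous
    toy_groups = c("A", "A", "B", "C", "C", "C")

    toy_gap = rep(1, length(toy_groups))
    # TRUE only at the last element of each group
    is_last = !duplicated(toy_groups, fromLast = TRUE)
    toy_gap[is_last] = 5

    toy_gap  # 1 5 5 1 1 5
    ```

    Since the gaps are given in degrees, their total has to stay well below 360 for the sectors to have any room on the circle.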
    

    With all settings ready, we can now make the chord diagram.

    In the following code, we set preAllocateTracks so that the circular lines which represent species can be added afterwards.

    circos.par(gap.degree = gap)
    chordDiagram(df2, order = order, annotationTrack = c("grid", "axis"),
        grid.col = grid.col, directional = TRUE,
        preAllocateTracks = list(
            track.height = 0.04,
            track.margin = c(0.05, 0)
        )
    )
    

    Circular lines are added to represent species:

    for(species in unique(combined$species)) {
        l = combined$species == species
        sn = paste(combined$species[l], combined$regions[l], sep = "|")
        highlight.sector(sn, track.index = 1, col = color_species[species], 
            text = species, niceFacing = TRUE)
    }
    circos.clear()
    

    And the legends for regions and species:

    legend("bottomleft", pch = 15, col = color_regions, 
        legend = names(color_regions), cex = 0.6)
    legend("bottomright", pch = 15, col = color_species, 
        legend = names(color_species), cex = 0.6)
    

    The plot looks like this:

    group_by_species

    Group by regions

    The code is similar, so I will not explain it and just attach it below. The plot looks like this:

    group_by_regions

    ## group by regions
    df2 = data.frame(from = paste(df[[1]], df[[3]], sep = "|"),
                     to = paste(df[[2]], df[[3]], sep = "|"),
                     value = df[[4]], stringsAsFactors = FALSE)
    
    combined = unique(data.frame(regions = c(df[[1]], df[[2]]), 
        species = c(df[[3]], df[[3]]), stringsAsFactors = FALSE))
    combined = combined[order(combined$regions, combined$species), ]
    order = paste(combined$regions, combined$species, sep = "|")
    grid.col = structure(color_species[combined$species], names = order)
    
    gap = rep(1, length(order))
    # sectors are grouped by region here, so the larger gap follows the last sector of each region
    gap[which(!duplicated(combined$regions, fromLast = TRUE))] = 5
    
    circos.par(gap.degree = gap)
    chordDiagram(df2, order = order, annotationTrack = c("grid", "axis"),
        grid.col = grid.col, directional = TRUE,
        preAllocateTracks = list(
            track.height = 0.04,
            track.margin = c(0.05, 0)
        )
    )
    for(region in unique(combined$regions)) {
        l = combined$regions == region
        sn = paste(combined$regions[l], combined$species[l], sep = "|")
        highlight.sector(sn, track.index = 1, col = color_regions[region], 
            text = region, niceFacing = TRUE)
    }
    circos.clear()
    
    legend("bottomleft", pch = 15, col = color_regions, 
        legend = names(color_regions), cex = 0.6)
    legend("bottomright", pch = 15, col = color_species, 
        legend = names(color_species), cex = 0.6)