I am finding myself a bit stuck, I want to show flows between regions for trafficked species via a chord diagram on circlize, but am unable to work out how to plot when column 1 and 2 represent the "connection", column 3 is the "factor" of interest and column 4 are the values. I have included a sample of data below (yes I am aware indonesia is a region), as you can see each species is not unique to a particular region. I would like to produce a plot similar to the one included below but replace the "countries" with "species" for each region. Is this possible to do?
import_region export_region species flow
North America Europe Acanthosaura armata 0.0104
Southeast Asia Europe Acanthosaura armata 0.0022
Indonesia Europe Acanthosaura armata 0.1971
Indonesia Europe Acrochordus granulatus 0.7846
Southeast Asia Europe Acrochordus granulatus 0.1101
Indonesia Europe Acrochordus javanicus 2.00E-04
Southeast Asia Europe Acrochordus javanicus 0.0015
Indonesia North America Acrochordus javanicus 0.0024
East Asia Europe Acrochordus javanicus 0.0028
Indonesia Europe Ahaetulla prasina 4.00E-04
Southeast Asia Europe Ahaetulla prasina 4.00E-04
Southeast Asia East Asia Amyda cartilaginea 0.0027
Indonesia East Asia Amyda cartilaginea 5.00E-04
Indonesia Europe Amyda cartilaginea 0.004
Indonesia Southeast Asia Amyda cartilaginea 0.0334
Europe North America Amyda cartilaginea 4.00E-04
Indonesia North America Amyda cartilaginea 0.1291
Southeast Asia Southeast Asia Amyda cartilaginea 0.0283
Indonesia West Asia Amyda cartilaginea 0.7614
South Asia Europe Amyda cartilaginea 2.8484
Australasia Europe Apodora papuana 0.0368
Indonesia North America Apodora papuana 0.324
Indonesia Europe Apodora papuana 0.0691
Europe Europe Apodora papuana 0.0106
Indonesia East Asia Apodora papuana 0.0129
Europe North America Apodora papuana 0.0034
East Asia East Asia Apodora papuana 2.00E-04
Indonesia Southeast Asia Apodora papuana 0.0045
East Asia North America Apodora papuans 0.0042
example of diagram similar to what I would like, please click link below: chord diagram
In circlize package, the ChordDiagram()
function only allows a "from" column, a "to" column and a optional "value" column. However, in your case, actually we can make some transformation for the original data frame to modify it into a three-column data frame.
In you example, you want to distinguish e.g. Acanthosaura_armata in North America from Acanthosaura_armata in Europe, one solution is to merge region names and species names such as Acanthosaura_armata|North_America
to form a unique identifier. Next I will demonstrate how to visualize this dataset by circlize package.
Read in the data. Note I replaced space with underscores.
df = read.table(textConnection(
"import_region export_region species flow
North_America Europe Acanthosaura_armata 0.0104
Southeast_Asia Europe Acanthosaura_armata 0.0022
Indonesia Europe Acanthosaura_armata 0.1971
Indonesia Europe Acrochordus_granulatus 0.7846
Southeast_Asia Europe Acrochordus_granulatus 0.1101
Indonesia Europe Acrochordus_javanicus 2.00E-04
Southeast_Asia Europe Acrochordus_javanicus 0.0015
Indonesia North_America Acrochordus_javanicus 0.0024
East_Asia Europe Acrochordus_javanicus 0.0028
Indonesia Europe Ahaetulla_prasina 4.00E-04
Southeast_Asia Europe Ahaetulla_prasina 4.00E-04
Southeast_Asia East_Asia Amyda_cartilaginea 0.0027
Indonesia East_Asia Amyda_cartilaginea 5.00E-04
Indonesia Europe Amyda_cartilaginea 0.004
Indonesia Southeast_Asia Amyda_cartilaginea 0.0334
Europe North_America Amyda_cartilaginea 4.00E-04
Indonesia North_America Amyda_cartilaginea 0.1291
Southeast_Asia Southeast_Asia Amyda_cartilaginea 0.0283
Indonesia West_Asia Amyda_cartilaginea 0.7614
South_Asia Europe Amyda_cartilaginea 2.8484
Australasia Europe Apodora_papuana 0.0368
Indonesia North_America Apodora_papuana 0.324
Indonesia Europe Apodora_papuana 0.0691
Europe Europe Apodora_papuana 0.0106
Indonesia East_Asia Apodora_papuana 0.0129
Europe North_America Apodora_papuana 0.0034
East_Asia East_Asia Apodora_papuana 2.00E-04
Indonesia Southeast_Asia Apodora_papuana 0.0045
East_Asia North_America Apodora_papuans 0.0042"),
header = TRUE, stringsAsFactors = FALSE)
Also, I removed some rows which have very tiny values.
df = df[df[[4]] > 0.01, ]
Assign colors for species and regions.
library(circlize)
library(RColorBrewer)
all_species = unique(df[[3]])
color_species = structure(brewer.pal(length(all_species), "Set1"), names = all_species)
all_regions = unique(c(df[[1]], df[[2]]))
color_regions = structure(brewer.pal(length(all_regions), "Set2"), names = all_regions)
First I will demonstrate how to group the chord diagram by species.
As mentioned before, we use species|region
as unique identifier.
df2 = data.frame(from = paste(df[[3]], df[[1]], sep = "|"),
to = paste(df[[3]], df[[2]], sep = "|"),
value = df[[4]], stringsAsFactors = FALSE)
Next we adjust the order of all sectors to first order by species, then by regions.
combined = unique(data.frame(regions = c(df[[1]], df[[2]]),
species = c(df[[3]], df[[3]]), stringsAsFactors = FALSE))
combined = combined[order(combined$species, combined$regions), ]
order = paste(combined$species, combined$regions, sep = "|")
We want the color of the links to be the same as the color of regoins
grid.col = structure(color_regions[combined$regions], names = order)
Since the chord diagram is grouped by species, gaps between species should be larger than inside each species.
gap = rep(1, length(order))
gap[which(!duplicated(combined$species, fromLast = TRUE))] = 5
With all settings ready, we now can make the chord diagram:
In following code, we set preAllocateTracks
so that circular lines which represents species will be added afterwards.
circos.par(gap.degree = gap)
chordDiagram(df2, order = order, annotationTrack = c("grid", "axis"),
grid.col = grid.col, directional = TRUE,
preAllocateTracks = list(
track.height = 0.04,
track.margin = c(0.05, 0)
)
)
Circular lines are added to represent species:
for(species in unique(combined$species)) {
l = combined$species == species
sn = paste(combined$species[l], combined$regions[l], sep = "|")
highlight.sector(sn, track.index = 1, col = color_species[species],
text = species, niceFacing = TRUE)
}
circos.clear()
And the legends for regions and species:
legend("bottomleft", pch = 15, col = color_regions,
legend = names(color_regions), cex = 0.6)
legend("bottomright", pch = 15, col = color_species,
legend = names(color_species), cex = 0.6)
The plot looks like this:
The code is similar that I will not explain it but just attach the code in the post. The plot looks like this:
## group by regions
df2 = data.frame(from = paste(df[[1]], df[[3]], sep = "|"),
to = paste(df[[2]], df[[3]], sep = "|"),
value = df[[4]], stringsAsFactors = FALSE)
combined = unique(data.frame(regions = c(df[[1]], df[[2]]),
species = c(df[[3]], df[[3]]), stringsAsFactors = FALSE))
combined = combined[order(combined$regions, combined$species), ]
order = paste(combined$regions, combined$species, sep = "|")
grid.col = structure(color_species[combined$species], names = order)
gap = rep(1, length(order))
gap[which(!duplicated(combined$species, fromLast = TRUE))] = 5
circos.par(gap.degree = gap)
chordDiagram(df2, order = order, annotationTrack = c("grid", "axis"),
grid.col = grid.col, directional = TRUE,
preAllocateTracks = list(
track.height = 0.04,
track.margin = c(0.05, 0)
)
)
for(region in unique(combined$regions)) {
l = combined$regions == region
sn = paste(combined$regions[l], combined$species[l], sep = "|")
highlight.sector(sn, track.index = 1, col = color_regions[region],
text = region, niceFacing = TRUE)
}
circos.clear()
legend("bottomleft", pch = 15, col = color_regions,
legend = names(color_regions), cex = 0.6)
legend("bottomright", pch = 15, col = color_species, l
egend = names(color_species), cex = 0.6)