I'm working on a project to visualize the Tree of Life using JavaFX, based on a dataset from Kaggle: Tree of Life Dataset.
The dataset organizes species into a hierarchical tree structure, including various biological classifications such as domains, kingdoms, phyla, etc., up to individual species. My goal is to create a visual representation of this hierarchy, allowing users to explore the tree interactively.
So far, I've defined three models to represent the data:
My current challenge is correctly assigning hierarchy levels to each node and identifying "clades," significant groupings within the tree that don't necessarily align neatly with the fixed levels of domain, kingdom, etc. This is complicated by the dataset not explicitly marking clades or providing clear biological metrics to determine them.
Here's a simplified version of my TreeNode class:
    import java.util.ArrayList;
    import java.util.List;

    public class TreeNode {
        private String id;
        private String name;
        private List<TreeNode> children = new ArrayList<>();
        private boolean isCluster; // A potential flag for clades?
        // Additional fields and methods...
    }
And here's where I'm attempting to assign levels in my controller, but I'm stuck on incorporating clades effectively:
    private void assignLevel(TreeNode node, int level) {
        if (node == null) return;
        node.setLevel(level);
        node.setHierarchyTag(getHierarchyTag(level));
        // A poor attempt to define clusters
        if (node.getChildren().size() > THRESHOLD || level == 0) {
            node.setCluster(true);
        }
        // Recursively assign levels to the children
        for (TreeNode child : node.getChildren()) {
            assignLevel(child, level + 1);
        }
    }

    private String getHierarchyTag(int level) {
        // Map the depth in the tree to a fixed taxonomic rank
        switch (level) {
            case 0: return "life";
            case 1: return "domain";
            case 2: return "kingdom";
            case 3: return "phylum";
            case 4: return "class";
            case 5: return "order";
            case 6: return "family";
            case 7: return "genus";
            case 8: return "species";
            default: return "unknown";
        }
    }
I'm uncertain how to:
I would greatly appreciate any advice on how to approach this problem, especially any strategies for identifying and integrating clades into the hierarchical structure.
Thank you in advance for your help!
Your post asks too many questions. I will attempt to answer "Determine which nodes should be considered as clades". I am not sure this is correct, but I looked up the definition of clade: it is a group of organisms believed to have evolved from a common ancestor, according to the principles of cladistics. This leads me to believe that nodes with the same parent form a clade.
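To make that idea concrete, here is a minimal sketch, not a definitive implementation. In cladistics a clade is usually taken to be an ancestor together with all of its descendants, so every internal node of the tree roots one. The sketch walks the tree bottom-up, counts the leaves under each node, and flags internal nodes whose subtree holds at least `minLeaves` leaves. `CladeSketch`, `Node`, `markClades`, and the `minLeaves` cutoff are all hypothetical names and assumptions of mine; `Node` is only a stand-in for your `TreeNode`, and the cutoff is a display choice, not a biological rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; Node stands in for the question's TreeNode.
class CladeSketch {

    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        int leafCount;   // number of leaves in this node's subtree (a leaf counts itself)
        boolean clade;   // true if this internal node roots a group worth highlighting

        Node(String name) { this.name = name; }

        // Appends a child and returns this node so calls can be chained.
        Node add(Node child) { children.add(child); return this; }
    }

    /**
     * Post-order walk: fills in leafCount for every node and flags internal
     * nodes whose subtree contains at least minLeaves leaves.
     * Returns the leaf count of the given node.
     */
    static int markClades(Node node, int minLeaves) {
        if (node == null) return 0;
        if (node.children.isEmpty()) {
            node.leafCount = 1;
            node.clade = false; // a single leaf is not highlighted as a group
            return 1;
        }
        int leaves = 0;
        for (Node child : node.children) {
            leaves += markClades(child, minLeaves);
        }
        node.leafCount = leaves;
        node.clade = leaves >= minLeaves; // assumed cutoff, tune for the visualization
        return leaves;
    }

    public static void main(String[] args) {
        Node mammals = new Node("Mammalia")
                .add(new Node("Homo sapiens"))
                .add(new Node("Mus musculus"));
        Node root = new Node("Life").add(mammals).add(new Node("Escherichia coli"));

        markClades(root, 2);
        System.out.println(root.name + " leaves=" + root.leafCount + " clade=" + root.clade);
        System.out.println(mammals.name + " leaves=" + mammals.leafCount + " clade=" + mammals.clade);
    }
}
```

Unlike a check on the number of direct children, the leaf count reflects the size of the whole subtree, so a node with a single child that leads to a large group is still flagged. The recursion depth equals the depth of the tree, so a very deep tree might need an iterative traversal instead.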
I found this and thought I would use it to play around with your data. I reduced the data to a very small set because it produces a big web if it is not reduced. This is my first time using the VisFx library. My guess is that "none" represents null nodes. Those nodes were probably removed when I reduced the dataset.
In the image, I tried to circle anything that I believe is considered a clade. I may have missed some.
I hope this helps.