My code requires creation the file tree of the many file paths as
dir1/file1
dir1/dir2/file2
dir1/dir2/file3
FileTree object visualization example:
dir1
|_file1
|_dir2
|_file2
|_file3
This tree is used for torrent content files visualization in graphical form. It's also used for dynamically show files progress. In a small number subfolders and files it works effectively, but if paths > 10,000 it takes a lot of memory and time (> 4 seconds and 50 MB RAM).
Is there an efficient algorithm for making such a graph? Most critical for me is the graph make speed. An example of algorithm implementation can be written in any language, it doesn't matter for me :-) Thanks in advance.
My Java code for this purpose:
FileTree root = new FileTree(FileTree.ROOT, File.Type.DIR);
FileTree parentTree;
for (String pathToFile : paths) {
parentTree = root;
String[] nodes = FileIOUtils.parsePath(pathToFile); /*String.split(File.separator)*/
for (int i = 0; i < nodes.length; i++) {
/* The last leaf item is a file */
if (i == (nodes.length - 1)) {
parentTree.addChild(new FileTree(nodes[i],
File.Type.FILE, parentTree));
} else {
parentTree.addChild(new FileTree(nodes[i], FileNode.Type.DIR, parentTree));
}
FileTree nextParent = parentTree.getChild(nodes[i]);
/* Skipping leaf nodes */
if (nextParent != null && !nextParent.isFile()) {
parentTree = nextParent;
}
}
}
FileTree class:
public class FileTree {
public static final String ROOT = "/";
/* The name for pointer to the parent node */
public static final String PARENT_DIR = "..";
protected String name;
protected boolean isLeaf;
protected FileTree parent;
protected Map<String, FileTree> children = new LinkedHashMap<>();
public FileTree(String name, int type, FileTree parent) {
this(name, type, parent);
}
public FileTree(String name, int type)
{
this(name, type, null);
}
public FileTree(String name, int type, FileTree parent)
{
this.name = name;
isLeaf = (type == File.Type.FILE);
this.parent = parent;
}
public synchronized void addChild(FileTree node)
{
if (!children.containsKey(node.getName())) {
children.put(node.getName(), node);
}
}
public boolean contains(String name)
{
return children.containsKey(name);
}
public F getChild(String name)
{
return children.get(name);
}
public Collection<FileTree> getChildren()
{
return children.values();
}
public Set<String> getChildrenName()
{
return children.keySet();
}
}
Edit:
It was possible to achieve the speed of creating tree of 1000 subfolders an average of 0.5-1 second (early 30 second).
FileTree root = new BencodeFileTree(FileTree.ROOT, 0L, File.Type.DIR);
FileTree parentTree = root;
/* It allows reduce the number of iterations on the paths with equal beginnings */
String prevPath = "";
/* Sort reduces the returns number to root */
Collections.sort(files);
for (String file : files) {
String path;
/*
* Compare previous path with new path.
* Example:
* prev = dir1/dir2/
* cur = dir1/dir2/file1
* |________|
* equal
*
* prev = dir1/dir2/
* cur = dir3/file2
* |________|
* not equal
*/
if (!prevPath.isEmpty() &&
file.regionMatches(true, 0, prevPath, 0, prevPath.length())) {
/*
* Beginning paths are equal, remove previous path from the new path.
* Example:
* prev = dir1/dir2/
* cur = dir1/dir2/file1
* new = file1
*/
path = file.substring(prevPath.length());
} else {
/* Beginning paths are not equal, return to root */
path = file;
parentTree = root;
}
String[] nodes = FileIOUtils.parsePath(path);
/*
* Remove last node (file) from previous path.
* Example:
* cur = dir1/dir2/file1
* new = dir1/dir2/
*/
prevPath = file.substring(0, file.length() - nodes[nodes.length - 1].length());
/* Iterates path nodes */
for (int i = 0; i < nodes.length; i++) {
if (!parentTree.contains(nodes[i])) {
/* The last leaf item is a file */
parentTree.addChild(makeObject(nodes[i], parentTree,
i == (nodes.length - 1)));
}
FileTree nextParent = parentTree.getChild(nodes[i]);
/* Skipping leaf nodes */
if (!nextParent.isFile()) {
parentTree = nextParent;
}
}
}
The basic algorithm looks good to me, but you are creating a lot of unnecessary FileTree
objects when you call addChild
that will be immediately thrown away in the (common) case they already exist. You could try passing in the parameters to the constructor and only construct the object if it needs to be inserted:
public synchronized void addChild(String name, int type, FileTree parent)
{
if (!children.containsKey(name)) {
children.put(name, new FileTree(name, type, parent));
}
}
and:
if (i == (nodes.length - 1)) {
parentTree.addChild(nodes[i], File.Type.FILE, parentTree);
} else {
parentTree.addChild(nodes[i], FileNode.Type.DIR, parentTree);
}
It might not be necessary to pass in parentTree
: you can just construct it with this
.
Another optimization could be to maintain the array of String objects (and associated FileTree nodes) from the previous path that you processed, and scan along until you find an entry that is different to the previous one before adding children.