Tags: json, r, dataframe, hierarchical-data, hierarchical

Nested, hierarchical data frames in R


I am new to R and don't want to misunderstand the language and its data structures from the start. :)

My data.frame sample.data contains, besides 'normal' attributes (e.g. author), a nested list of data.frames (files), each of which has attributes such as extension.

How can I filter for authors who have created files with a certain extension? Is there an idiomatic R way of doing that? Maybe something along these lines:

t <- subset(data, data$files[['extension']] > '.R')

Ideally, I want to avoid for loops.

Here is some sample data:

d1 <- data.frame(extension=c('.py', '.py', '.c++')) # and some other attributes
d2 <- data.frame(extension=c('.R', '.py')) # and some other attributes

sample.data <- data.frame(author=c('author_1', 'author_2'), files=I(list(d1, d2)))

The JSON that sample.data comes from looks like this:

[
    {
        "author": "author_1",
        "files": [
            {
                "extension": ".py",
                "path": "/a/path/somewhere/"
            },
            {
                "extension": ".c++",
                "path": "/a/path/somewhere/else/"
            }, ...
        ]
    }, ...
]

Solution

  • Interesting, not many people use R to simulate a hierarchical database!

    subset(sample.data, sapply(files, function(df) any(df$extension == ".R")))
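To see why the one-liner works, it can help to split it into steps. The following is only a sketch, assuming the sample.data from the question; the names has.R and r.authors are made up for illustration, and vapply is swapped in for sapply because it always returns a logical vector, even when there are no rows.

```r
# Sample data from the question: one row per author, with a list column
# holding one nested data frame of files per author.
d1 <- data.frame(extension = c('.py', '.py', '.c++'))
d2 <- data.frame(extension = c('.R', '.py'))
sample.data <- data.frame(author = c('author_1', 'author_2'),
                          files = I(list(d1, d2)))

# Step 1: compute one TRUE/FALSE per row. Each element of the files column
# is a data frame, so the function checks whether any of its extensions
# equals '.R'. (has.R is an illustrative name, not from the original post.)
has.R <- vapply(sample.data$files,
                function(df) any(df$extension == '.R'),
                logical(1))

# Step 2: keep only the rows flagged TRUE. This is what subset() does with
# the logical vector in the one-liner above.
r.authors <- sample.data[has.R, ]

# Only author_2 has a '.R' file in the sample data.
print(as.character(r.authors$author))
```

The implicit loop lives inside vapply (or sapply), so no explicit for loop is needed. Comparing with == rather than > tests for an exact extension match.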