Search code examples
gomemory-leaksgarbage-collectionpass-by-referenceflatten

How to efficiently flatten a map?


My app needs to take a large array of nested instances of map[string]interface{} and turn them into flattened maps.

For example:

{
                "foo": {
                    "jim":"bean"
                },
                "fee": "bar",
                "n1": {
                    "alist": [
                        "a",
                        "b",
                        "c",
                        {
                            "d": "other",
                            "e": "another"
                        }
                    ]
                },
                "number": 1.4567,
                "bool":   true
}

After:

json.Unmarshal([]byte(input), &out)
result = Flatten(m.(map[string]interface{}))

Becomes:

{
                "foo.jim":      "bean",
                "fee":          "bar",
                "n1.alist.0":   "a",
                "n1.alist.1":   "b",
                "n1.alist.2":   "c",
                "n1.alist.3.d": "other",
                "n1.alist.3.e": "another",
                "number":       1.4567,
                "bool":         true,
}

Currently I am using the below code:

func Flatten(m map[string]interface{}) map[string]interface{} {
    o := map[string]interface{}{}
    for k, v := range m {
        switch child := v.(type) {
        case map[string]interface{}:
            nm := Flatten(child)
            for nk, nv := range nm {
                o[k+"."+nk] = nv
            }
        case []interface{}:
            for i := 0; i < len(child); i++ {
                o[k+"."+strconv.Itoa(i)] = child[i]
            }
        default:
            o[k] = v
        }
    }
    return o
}

This code is very memory inefficient, when flatting ~800 maps, ~8 GiB memory is used. I have tried passing pointers instead of copies, but that does not change the memory usage. Do I have to do some type of manual memory management? Go should garbage collect any unused memory, but it might be that I am not releasing it somehow.


Solution

  • As Andy Schweig suggested in the comments passing in a map which is updated as you iterate through the data should significantly reduce memory usage (note: have not tested). Something like:

    func Flatten2(prefix string, src map[string]interface{}, dest map[string]interface{}) {
        if len(prefix) > 0 {
            prefix += "."
        }
        for k, v := range src {
            switch child := v.(type) {
            case map[string]interface{}:
                Flatten2(prefix+k, child, dest)
            case []interface{}:
                for i := 0; i < len(child); i++ {
                    dest[prefix+k+"."+strconv.Itoa(i)] = child[i]
                }
            default:
                dest[prefix+k] = v
            }
        }
    }
    

    Try this in the playground.

    Do I have to do some type of manual memory management?

    The Go runtime is designed to handle memory management for you (see the FAQ article on garbage collection). While the garbage collector is very good it can only free memory that is not in use and your original routine created a lot of maps as it went which could not be freed until that level of the map had been processed.

    There are a few flags you can set through environmental variables or using the debug package but its rare that these are needed (and they can have unintended consequences).

    Any suggestions on where to further optimize this function?

    Its difficult to comment without access to your dataset along with an understanding of your requirements and info on how you are measuring memory usage (this can be more complicated than it appears). Profiling your application may give you some pointers on areas for improvement. One possible improvement would be to use a pre-allocated slice rather than a map for dest (but I don't know what you are doing with the result so this may not be appropriate).