
H2O Frame constructed from multiple parts


My training frame is rather large, so I'd like to import it in parts, similar to S3's multipart upload. Is the correct approach to call import_file on each part manually and then rbind all of the parts together? Or is there a better, built-in way to do this?


Solution

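  • The function h2o.import_file (h2o.importFile in R) can import multiple files on its own: pass it a list of paths and it parses them into a single frame, so there is no need to rbind the parts yourself. This works in both Python and R.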

    Python:

        data = h2o.import_file(["/home/some/path/to/airliens/airline1.csv",
                                "/home/some/path/to/airliens/airline2.csv"])

    R:

        data <- h2o.importFile(c("/home/some/path/to/airliens/airline1.csv",
                                 "/home/some/path/to/airliens/airline2.csv"))
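If all the parts sit in one folder, you don't have to type out every path. Below is a minimal sketch, assuming a local H2O cluster so that the Python client and the H2O server see the same filesystem (h2o.import_file reads paths on the server side). The helpers list_part_files and import_parts are hypothetical names used only for illustration: they collect the matching files with Python's glob module and hand the whole list to a single h2o.import_file call. As an alternative, h2o.import_file also accepts a directory path together with a regex pattern argument, which does the file matching on the server instead.

```python
import glob
import os


def list_part_files(directory, pattern="*.csv"):
    """Hypothetical helper: return the files in `directory` that match the
    glob `pattern`, sorted by name so the list is deterministic."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


def import_parts(directory, pattern="*.csv"):
    """Hypothetical helper: import every matching part into one H2OFrame.

    Assumes the client and the H2O server share a filesystem, because glob
    runs on the client while h2o.import_file reads paths on the server.
    """
    parts = list_part_files(directory, pattern)
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory!r}")

    # Imported here so list_part_files works even without H2O installed.
    import h2o

    # A single call with a list of paths yields one frame; no rbind needed.
    return h2o.import_file(parts)
```

For example, import_parts("/home/some/path/to/airliens") would pick up airline1.csv and airline2.csv. Note that the sort is lexical, so a file named airline10.csv would come before airline2.csv.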