
How to import table data from Parquet file to Memgraph graph database


I'm working on a project where I need to import table data from a Parquet file into the Memgraph graph database. My data looks something like this:

+-----------+-------------+---------+------------+--------+
| FirstName | LastName    | Country | Occupation | Salary |
+-----------+-------------+---------+------------+--------+
| John      | Doe         | USA     | Engineer   | 70000  |
| Jane      | Smith       | UK      | Doctor     | 80000  |
| Max       | Johnson     | Canada  | Teacher    | 60000  |
| Emily     | Davis       | Germany | Scientist  | 90000  |
| Luke      | Rodriguez   | France  | Artist     | 50000  |
+-----------+-------------+---------+------------+--------+

I know that I could convert this to CSV and then use the LOAD CSV Cypher clause, but that is inconvenient. What else can I do?


Solution

  • Memgraph supports the Parquet file format via the PyArrow package. To import data from a Parquet file into Memgraph, you can use GQLAlchemy.

    Once you have GQLAlchemy installed (pip install gqlalchemy), you can use the ParquetLocalFileSystemImporter class to import data from a Parquet file. Here's an example:

    from gqlalchemy import Memgraph
    from gqlalchemy.transformations.importing.loaders import ParquetLocalFileSystemImporter
    
    # Define your data configuration object (parsed_yaml)
    # ...
    
    # Create an importer object
    importer = ParquetLocalFileSystemImporter(
        path="path/to/your/parquet/file",
        data_configuration=parsed_yaml,
        memgraph=Memgraph()
    )
    
    # Import the data
    importer.import_data()
    

    You can find more details at https://memgraph.com/docs/gqlalchemy/how-to-guides/table-to-graph-importer.
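    The data configuration object elided above (parsed_yaml) tells the importer how table columns map to node labels, indexes, and relationships. The authoritative schema is in the linked guide; purely as a hedged sketch (the table name "people", the label "PERSON", and the exact key names here are assumptions to be checked against that guide), the parsed YAML ends up as an ordinary Python dict along these lines:

```python
# Hypothetical data configuration, shown as the dict that yaml.safe_load
# would return for the corresponding YAML file. The key names (indices,
# name_mappings, one_to_many_relations) follow the GQLAlchemy
# table-to-graph importer convention -- verify against the linked
# how-to guide before relying on them.
parsed_yaml = {
    "indices": {
        # create an index on FirstName for the (hypothetical) "people" table
        "people": ["FirstName"],
    },
    "name_mappings": {
        # import rows of "people" as nodes labeled PERSON
        "people": {"label": "PERSON"},
    },
    "one_to_many_relations": {
        # a single flat table has no foreign-key relationships
        "people": [],
    },
}
```

    A dict like this (typically loaded from a YAML file) is what gets passed as the data_configuration argument of ParquetLocalFileSystemImporter.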