I'm working on a project where I need to import table data from a Parquet file into the Memgraph graph database. My data looks something like this:
+-----------+-----------+---------+------------+--------+
| FirstName | LastName  | Country | Occupation | Salary |
+-----------+-----------+---------+------------+--------+
| John      | Doe       | USA     | Engineer   | 70000  |
| Jane      | Smith     | UK      | Doctor     | 80000  |
| Max       | Johnson   | Canada  | Teacher    | 60000  |
| Emily     | Davis     | Germany | Scientist  | 90000  |
| Luke      | Rodriguez | France  | Artist     | 50000  |
+-----------+-----------+---------+------------+--------+
I know I could convert the data to CSV and then use the LOAD CSV Cypher clause, but that would be inconvenient. What else can I do?
Memgraph supports the Parquet file format via the PyArrow package, and the easiest way to import a Parquet file is with GQLAlchemy, Memgraph's Python OGM (install it with pip install gqlalchemy). Once GQLAlchemy is installed, you can use the ParquetLocalFileSystemImporter class to import data from a Parquet file. Here's an example:
from gqlalchemy import Memgraph
from gqlalchemy.transformations.importing.loaders import ParquetLocalFileSystemImporter

# Define your data configuration object (parsed_yaml)
# ...

# Create an importer that reads Parquet files from the local file system
importer = ParquetLocalFileSystemImporter(
    path="path/to/your/parquet/file",
    data_configuration=parsed_yaml,
    memgraph=Memgraph(),
)

# Translate the table data into graph objects and import them;
# pass drop_database=True to clear Memgraph before importing
importer.translate(drop_database=False)
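The data_configuration object describes how the table maps to graph objects: which columns to index, what node label each table gets, and any cross-table relationships. Its exact schema is covered in the guide linked below; purely as a hypothetical sketch for the table above (the table name "people", the PERSON label, and the FirstName index are illustrative assumptions, not something GQLAlchemy infers), it could look like this:

import yaml

# Hypothetical mapping for the single table above: index rows on FirstName
# and label the resulting nodes PERSON. The table name "people" is an
# assumption; replace it with the actual name of your table. For
# multi-table data you would also describe relationships here (see the
# linked guide for the full configuration schema).
parsed_yaml = yaml.safe_load(
    """
indices:
  people:
    - FirstName

name_mappings:
  people:
    label: PERSON
"""
)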
You can find more details at https://memgraph.com/docs/gqlalchemy/how-to-guides/table-to-graph-importer.
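Once the import finishes, a quick way to sanity-check it is to count the nodes through the same GQLAlchemy connection, for example:

from gqlalchemy import Memgraph

# Fetch the total node count to confirm the rows were imported
memgraph = Memgraph()
result = next(memgraph.execute_and_fetch("MATCH (n) RETURN count(n) AS node_count"))
print(result["node_count"])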