
redisgraph-bulk-loader issues with huge data in a CSV file


Below are the issues I get when I try to upload a file with around one million records. Please help me resolve them. When I look for solutions in blogs, they all suggest modifying some logic, but I am using the redisgraph-bulk-loader utility directly.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 3565: invalid continuation byte

_csv.Error: line contains NULL byte

Loading does not work if a column value contains quotes and a comma.

redisgraph_bulk_loader.bulk_insert.CSVError: /home/ec2-user/test.csv:2 Expected 4 columns, encountered 5 ('1,3,4,"5,6"')

The graph name must always be unique for each new upload. Given that, how can I add more nodes to the same graph, or establish relationships from another file?


Solution

  • As answered in the issue you opened on the redisgraph-bulk-loader repository:

    Loading does not work if a column value contains quotes and a comma.

    This may be resolvable by using the --quote argument to change input-quoting behavior. The next suggestion would render this unnecessary, however.
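To illustrate why the quoting mode matters, here is a minimal sketch using Python's standard `csv` module on the row from the error message. It assumes the loader's quoting option behaves like the `csv` module's quoting constants, which the source does not confirm; `parse_line` is a hypothetical helper, not part of the loader.

```python
import csv

def parse_line(line, quoting):
    """Parse a single CSV line with the given csv quoting mode (hypothetical helper)."""
    return next(csv.reader([line], quoting=quoting))

row = '1,3,4,"5,6"'

# With quote processing disabled, the comma inside the quotes splits the field.
unquoted = parse_line(row, csv.QUOTE_NONE)

# With minimal quoting, the quoted field "5,6" is kept as one column.
quoted = parse_line(row, csv.QUOTE_MINIMAL)

print(len(unquoted), unquoted)
print(len(quoted), quoted)
```

With quote processing disabled the row yields 5 fields, matching the "Expected 4 columns, encountered 5" error; with minimal quoting it yields 4.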

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 3565: invalid continuation byte

    _csv.Error: line contains NULL byte

    These may also be problems with the type inference logic. You may wish to try an updated branch (to be merged soon) that introduces an enforced schema; this will solve your first problem as well:

        git checkout improve-loader-logic

    Then update your header rows as described in the updated branch's docs.

    If this does not resolve your issues, you may need to look deeper into encoding problems.
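If the encoding does need fixing, one option is to clean the file before handing it to the loader. The sketch below is an assumption-laden example, not part of redisgraph-bulk-loader: it strips NULL bytes and, when the data is not valid UTF-8, falls back to Latin-1 (in which byte 0xe8 is "è"). The function names and the Latin-1 guess are hypothetical; if the file is actually UTF-16, decode it with that codec instead of stripping NULL bytes.

```python
def clean_csv_bytes(raw, fallback="latin-1"):
    """Drop NULL bytes and decode as UTF-8, falling back to an assumed legacy encoding."""
    raw = raw.replace(b"\x00", b"")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the source file was written in a single-byte encoding.
        return raw.decode(fallback)

def clean_csv_file(src_path, dst_path, fallback="latin-1"):
    """Write a UTF-8 copy of src_path with NULL bytes removed."""
    with open(src_path, "rb") as src:
        text = clean_csv_bytes(src.read(), fallback)
    # newline="" keeps the original line endings untouched.
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

For a file with a million records, reading it in one go is usually still feasible, but the same cleaning could be applied line by line if memory is a concern.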

    The graph name must always be unique for each new upload. Given that, how can I add more nodes to the same graph, or establish relationships from another file?

    The bulk loader is a one-time tool, and currently all updates to existing graphs must be made using Cypher queries.
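As a rough sketch of what such Cypher updates could look like, the example below turns a second CSV of edges into `MATCH ... CREATE` queries and sends each one with RedisGraph's `GRAPH.QUERY` command. The column names `src` and `dst`, the `Person` label, the `KNOWS` relationship type, the integer `id` property, and both function names are assumptions for illustration only. The `client` argument is any object with an `execute_command` method, such as a redis-py connection.

```python
import csv
import io

def edge_queries(csv_text, label="Person", rel="KNOWS"):
    """Yield one Cypher query per CSV row linking two existing nodes by integer id."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # int() rejects anything that is not a plain integer id.
        src, dst = int(row["src"]), int(row["dst"])
        yield (
            f"MATCH (a:{label} {{id: {src}}}), (b:{label} {{id: {dst}}}) "
            f"CREATE (a)-[:{rel}]->(b)"
        )

def run_queries(client, graph, queries):
    """Send each query to an existing graph through GRAPH.QUERY."""
    for query in queries:
        client.execute_command("GRAPH.QUERY", graph, query)
```

Issuing one query per row is much slower than the bulk loader, so for very large files it may be worth batching several rows per query.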