I noticed that whenever I import the same CSV file again, my records are duplicated even though I have skip_unchanged=True. Ideally, re-importing the same CSV should not create duplicate records, but it should update existing records if their data has changed.
I have this configuration in my resource file:
bill_date = fields.Field(
    attribute="bill_date", column_name="date", widget=widgets.DateWidget()
)
and import_id_fields = ("account_number",)
I also tried printing original and instance from the skip_row method, and this is what I get in the logs:
print(f"{getattr(original, 'bill_date')} - {getattr(instance, 'bill_date')}")
RESULT: None - 2021-06-07
UPDATE
Fixed my issue: I had mistakenly added get_instance = False during one of my tests.
This should work fine. What you need to do is ensure that account_number is included in the CSV feed and that it uniquely identifies a record in the table you are importing into.
Then, when the import occurs, the logic tries to load the existing record using account_number and updates the row if it is present; otherwise it creates a new row.
This is documented here, and you can debug the get_or_init_instance() method if it is not working.
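To make the lookup step concrete, here is a minimal plain-Python sketch of the "load by account_number, otherwise create" flow. It is a simplified stand-in, not the library's actual code: the Bill class, the dict used as a table, and import_rows are all hypothetical names invented for illustration, and only the function name get_or_init_instance mirrors the method mentioned above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Bill:
    # Hypothetical stand-in for the model being imported into.
    account_number: str
    bill_date: Optional[date] = None


def get_or_init_instance(table: Dict[str, Bill], row: dict,
                         id_field: str = "account_number") -> Tuple[Bill, bool]:
    """Return (instance, is_new): reuse the stored record if the id matches."""
    key = row[id_field]
    existing = table.get(key)
    if existing is not None:
        return existing, False
    # No stored record with this id, so start from a blank instance.
    return Bill(account_number=key), True


def import_rows(table: Dict[str, Bill], rows: List[dict]) -> Tuple[int, int]:
    """Import rows and return (created, updated) counts."""
    created = updated = 0
    for row in rows:
        instance, is_new = get_or_init_instance(table, row)
        instance.bill_date = date.fromisoformat(row["date"])
        table[instance.account_number] = instance
        if is_new:
            created += 1
        else:
            updated += 1
    return created, updated


table: Dict[str, Bill] = {}
rows = [{"account_number": "A-1", "date": "2021-06-07"}]
first = import_rows(table, rows)   # first import creates the record
second = import_rows(table, rows)  # re-import finds it by account_number
```

In this sketch the second import reuses the stored record instead of adding another one. If the lookup never finds the existing record, every row looks new, which is the duplicate behaviour described in the question.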
If skip_unchanged is true, then the logic will compare each field declared in your fields list, and will not update if there are no changes between the stored data and the imported data.
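The comparison can be sketched roughly as follows. This is an illustrative approximation under my own assumptions, not the library's implementation: the skip_row function here is a simplified free function, and the SimpleNamespace objects stand in for the stored record (original) and the record after the CSV values are applied (instance).

```python
from datetime import date
from types import SimpleNamespace


def skip_row(instance, original, field_names, skip_unchanged=True):
    """Return True when the row can be skipped because no field changed."""
    if not skip_unchanged:
        return False
    # Skip only if every compared field has the same value on both objects.
    return all(getattr(instance, name) == getattr(original, name)
               for name in field_names)


fields = ["account_number", "bill_date"]

stored = SimpleNamespace(account_number="A-1", bill_date=date(2021, 6, 7))
same = SimpleNamespace(account_number="A-1", bill_date=date(2021, 6, 7))
changed = SimpleNamespace(account_number="A-1", bill_date=date(2021, 7, 7))
# A blank original, as you would see when no existing record was loaded.
blank = SimpleNamespace(account_number=None, bill_date=None)

skip_same = skip_row(same, stored, fields)       # nothing changed
skip_changed = skip_row(changed, stored, fields)  # bill_date differs
skip_vs_blank = skip_row(same, blank, fields)     # original was never loaded
```

The last case resembles the "None - 2021-06-07" output in the question: when the original has no data, every field looks changed, so the row is never skipped.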