
Conflict when accessing an Azure storage table twice simultaneously


I've been running a script that retrieves data from an Azure storage table (such as this one, as a reference) and copies it into another table in the same storage account, without any problem.

Now, the issue appeared when I tried to access this latter table, run some calculations, and copy the results into yet another table in the same storage account. That script returned the following error:

AzureConflictHttpError: Conflict
{"odata.error":{"code":"EntityAlreadyExists","message":{"lang":"en-US","value":"The specified entity already exists.\nRequestId:57d9b721-6002-012d-3d0c-b88bef000000\nTime:2019-01-29T19:55:53.5984026Z"}}}

At the same time, however, the script I was running previously also stopped, printing the same error, and it won't run again even after a restart, returning the same error over and over.

Is there any way to access the same table in Azure Storage from multiple scripts at the same time?

UPDATE

Adding the source code; sorry for not having done that before. The two scripts I'm running in parallel are essentially the same, just with different filters. This one takes the data from Table 1 (which has one row per second) and averages those numbers per minute, adding a row to Table 2; the other script takes the data from Table 2 and averages those rows into a 5-minute-average row in another Table 3. A few parameters change, but the code is basically the same.

There will be a third script, slightly different from these two, that will take Table 2 as its input source, run other calculations, and write the results as a new row per minute into a future Table 4. So in general, the idea is to have multiple scripts writing to multiple tables at the same time, to build new, specific tables.
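The per-minute averaging step described above can be isolated as a pure function, which makes it easier to test separately from the table access. This is only a sketch: the function name and the list-of-pairs input format are mine, not from the script below.

```python
from datetime import datetime

def average_last_minute(rows):
    """Average the Area values of all rows that fall within 60 seconds
    of the newest row.

    `rows` is a list of (rowkey_time_str, area) pairs in ascending time
    order, e.g. ("08:02:13", 1.5), mirroring the (RowKey, Area) pairs
    collected in the question's script.
    """
    last_time = datetime.strptime(rows[-1][0], "%H:%M:%S")
    total, count = 0.0, 0
    # Walk backwards from the newest row, exactly like the question's loop.
    for key, area in reversed(rows):
        diff = last_time - datetime.strptime(key, "%H:%M:%S")
        if diff.total_seconds() < 60:
            total += area
            count += 1
        else:
            break
    return total / count
```

This keeps the arithmetic independent of `TableService`, so it can be verified offline before wiring it back into the polling loop.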

import datetime
import time
from azure.storage.table import TableService, Entity

delta_time = '00:01:00'
retrieve_time = '00:10:00'
start_time = '08:02:00'
utc_diff = 3

table_service = TableService(account_name='xxx', account_key='yyy')

while True:
    now_time = datetime.datetime.now().strftime("%H:%M:%S") 
    now_date = datetime.datetime.now().strftime("%d-%m-%Y")
    hour = datetime.datetime.now().hour

    if hour >= 21:
        now_date = (datetime.datetime.now() + datetime.timedelta(days=1)).strftime("%d-%m-%Y")

    retrieve_max = (datetime.datetime.now() + datetime.timedelta(hours=utc_diff)+ datetime.timedelta(minutes=-10)).strftime("%H:%M:%S")

    start_diff = datetime.datetime.strptime(now_time, '%H:%M:%S') - datetime.datetime.strptime(start_time, '%H:%M:%S') + datetime.timedelta(hours=utc_diff)
    if start_diff.total_seconds() > 0:

        # Pull the last 10 minutes of today's rows from Table 1
        query = "PartitionKey eq '" + str(now_date) + "' and RowKey ge '" + str(retrieve_max) + "'"
        tasks = table_service.query_entities('Table1', query)
        iqf_0 = []

        for task in tasks:
            if task.Name == "IQF_0":
                iqf_0.append([task.RowKey, task.Area])  

        last_time = iqf_0[-1][0]  # RowKey of the newest matching row
        time_max = datetime.datetime.strptime(last_time, '%H:%M:%S') - datetime.datetime.strptime(delta_time, '%H:%M:%S') #+ datetime.timedelta(hours=utc_diff)
        area = 0.0
        count = 0
        for i in range(len(iqf_0)-1, -1, -1):
            diff = datetime.datetime.strptime(last_time, '%H:%M:%S') - datetime.datetime.strptime(iqf_0[i][0], '%H:%M:%S')
            if diff.total_seconds() < 60:
                area += iqf_0[i][1]
                count += 1
            else: 
                break
        area_average = area/count

        output_row = Entity()
        output_row.PartitionKey = now_date
        output_row.RowKey = last_time
        output_row.Name = task.Name
        output_row.Area = area_average
        table_service.insert_entity('Table2', output_row)

        date_max = datetime.datetime.now() + datetime.timedelta(days=-1)
        date_max = date_max.strftime("%d-%m-%Y")
        query = "PartitionKey eq '" + str(date_max) + "' and RowKey ge '" + str(retrieve_max) + "'"
        tasks = table_service.query_entities('Table2', query)

        for task in tasks:
            diff = datetime.datetime.strptime(now_time, '%H:%M:%S') - datetime.datetime.strptime(task.RowKey, '%H:%M:%S') + datetime.timedelta(hours=utc_diff)
            print(i, datetime.datetime.strptime(now_time, '%H:%M:%S'), datetime.datetime.strptime(task.RowKey, '%H:%M:%S'), diff.total_seconds())
            if task.PartitionKey == date_max and diff.total_seconds()>0:
                table_service.delete_entity('Table2', task.PartitionKey, task.RowKey)

        time.sleep(60 - time.time() % 60)  # wake up at the start of the next minute

Solution

  • It sounds like you were running two scripts at the same time to copy data within the same Azure Storage account, from Table 1 to Table 2 to Table 3. In my experience, this issue is normally caused by writing the same data record (a Table Entity) concurrently, or by using the wrong method for an entity that already exists; in other words, it is a write-contention problem.

    EntityAlreadyExists is a common Table Service error; you can find it here, in the list of Table Service error codes.

    And there is a document, Inserting and Updating Entities, which explains the differences in effect between the Insert Entity, Update Entity, Merge Entity, Insert Or Merge Entity, and Insert Or Replace Entity operations.

    Your code had not been shared with us at the time of writing. Considering all possible cases, there are three ways to fix the issue.

    1. Run your two scripts one after another, in the order the data is copied between tables, rather than concurrently.

    2. Use the correct function to update data for an existing entity; you can refer to the document above and the similar SO thread Add or replace entity in Azure Table Storage.

    3. Use a global lock on the unique primary key (PartitionKey plus RowKey) of a Table Entity, to avoid both scripts operating on the same entity at the same time.
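To illustrate point 2, the difference between Insert Entity and Insert Or Replace Entity can be sketched with an in-memory stand-in. `FakeTable` and this `AzureConflictHttpError` class are mine, for illustration only; in the real SDK, `insert_entity` raises `azure.common.AzureConflictHttpError` when the entity exists, while `insert_or_replace_entity` upserts and never conflicts.

```python
class AzureConflictHttpError(Exception):
    """Stand-in for azure.common.AzureConflictHttpError."""

class FakeTable:
    """In-memory stand-in for one Azure table, keyed on
    (PartitionKey, RowKey), to illustrate the two write semantics."""

    def __init__(self):
        self.rows = {}

    def insert_entity(self, entity):
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in self.rows:
            # This is the condition the real service reports as
            # "EntityAlreadyExists".
            raise AzureConflictHttpError("EntityAlreadyExists")
        self.rows[key] = entity

    def insert_or_replace_entity(self, entity):
        # Upsert: create the entity or overwrite it, never a conflict.
        self.rows[(entity["PartitionKey"], entity["RowKey"])] = entity
```

In the question's script, the fix along these lines would be to call `insert_or_replace_entity('Table2', output_row)` in place of `insert_entity`, so that re-writing the same (PartitionKey, RowKey) minute row overwrites instead of failing.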
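And for point 3, here is a minimal sketch of a per-entity lock (all names are mine). Note that this only coordinates threads within one process; two separate scripts are separate processes, so they would need a cross-process mechanism instead, such as a lease on an Azure blob.

```python
import threading
from collections import defaultdict

# One lock per (PartitionKey, RowKey); the guard protects the dict itself.
_entity_locks = defaultdict(threading.Lock)
_guard = threading.Lock()

def entity_lock(partition_key, row_key):
    """Return the lock dedicated to a single table entity."""
    with _guard:
        return _entity_locks[(partition_key, row_key)]

# Usage around any write to a shared entity (table_service and
# output_row as in the question):
#
# with entity_lock(output_row.PartitionKey, output_row.RowKey):
#     table_service.insert_or_replace_entity('Table2', output_row)
```

Every writer that targets the same (PartitionKey, RowKey) pair serializes on the same lock object, while writes to different entities proceed in parallel.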