python-3.x, azure, azure-sdk-python

Azure Python SDK data tables


I need help getting through this workflow. I have two storage accounts, which I'll call storage1 and storage2.

storage1 contains a list of tables with some data in them, and I would like to loop through all those tables and copy their content into storage2. I tried AzCopy, but had no luck, as this feature is only available in AzCopy v7.3 and I couldn't find that version for macOS on an M1 Mac. The other option is Data Factory, but it's too complex for what I want to achieve. So I decided to go with the Azure Python SDK.

As a library I am using TableServiceClient from azure.data.tables.

The code I wrote looks like this:

from azure.data.tables import TableServiceClient

# connection string for storage1 (the source account)
my_conn_str_out = 'storage1-Conn-Str'

table_service_client_out = TableServiceClient.from_connection_string(my_conn_str_out)

# collect the names of all tables in storage1
list_table = []
for table in table_service_client_out.list_tables():
    list_table.append(table.table_name)

# connection string for storage2 (the destination account)
my_conn_str_in = 'Storage2-Conn-str'

table_service_client_in = TableServiceClient.from_connection_string(my_conn_str_in)

# recreate every source table in storage2 if it does not exist yet
for new_tables in table_service_client_out.list_tables():
    table_service_client_in.create_table_if_not_exists(new_tables.table_name)
    print(f'table created successfully: {new_tables.table_name}')

This is how I structured my code:

for table in table_service_client_out.list_tables():
    list_table.append(table.table_name)

I loop through all the tables in the storage account and append their names to a list.

Then:

for new_tables in table_service_client_out.list_tables():
    table_service_client_in.create_table_if_not_exists(new_tables.table_name)
    print(f'table created successfully: {new_tables.table_name}')

I create the same tables in storage2.

So far everything works just fine.

What I would like to achieve now is to query all the data in each table in storage1 and pass it to the respective table in storage2.

According to the Microsoft documentation, I can query tables using this:

query = table_service_client_out.query_tables(filter=table)

So I integrated this into my loop like this:

for table in table_service_client_out.list_tables():
    query = table_service_client_out.query_tables(filter=table)
    list_table.append(table.table_name)
    print(query)

When I run my Python code, I get back the paged iterator objects rather than the data in the tables:

<iterator object azure.core.paging.ItemPaged at 0x7fcd90c8fbb0>
<iterator object azure.core.paging.ItemPaged at 0x7fcd90c8f7f0>
<iterator object azure.core.paging.ItemPaged at 0x7fcd90c8fd60>

I was wondering if there is a way I can query all the data in my tables and pass it to storage2.


Solution

  • Try this:

    from azure.cosmosdb.table.tableservice import TableService, ListGenerator
    
    table_service_out = TableService(account_name='', account_key='')
    table_service_in = TableService(account_name='', account_key='')
    
    # query 100 items per request, to avoid consuming too much memory by loading all the data at once
    query_size = 100
    
    # save data to storage2 and check whether there is data left in the current table; if so, recurse
    def queryAndSaveAllDataBySize(tb_name, resp_data: ListGenerator, table_out: TableService, table_in: TableService, query_size: int):
        for item in resp_data:
            # remove the etag and Timestamp appended by the table service
            del item.etag
            del item.Timestamp
            print("inserting data: " + str(item) + " into table: " + tb_name)
            table_in.insert_entity(tb_name, item)
        # follow the continuation marker if the table holds more entities than one page
        if resp_data.next_marker:
            data = table_out.query_entities(table_name=tb_name, num_results=query_size, marker=resp_data.next_marker)
            queryAndSaveAllDataBySize(tb_name, data, table_out, table_in, query_size)
    
    
    tbs_out = table_service_out.list_tables()
    
    for tb in tbs_out:
        # create a table with the same name in storage2
        table_service_in.create_table(tb.name)
        # first query
        data = table_service_out.query_entities(tb.name, num_results=query_size)
        queryAndSaveAllDataBySize(tb.name, data, table_service_out, table_service_in, query_size)
    

    Of course, this is a simple demo for your requirement. For more efficiency, you can also query table data by partition key and commit the entities in batches, as sketched below.
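
    For instance, here is a minimal, untested sketch of that batch idea with the same azure-cosmosdb-table SDK; the helper name copy_partition_in_batches and the partition-key value are assumptions for illustration only:

    from azure.cosmosdb.table.tableservice import TableService
    from azure.cosmosdb.table.tablebatch import TableBatch
    
    table_service_out = TableService(account_name='', account_key='')
    table_service_in = TableService(account_name='', account_key='')
    
    # hypothetical helper: copy one partition of one table using batched inserts
    def copy_partition_in_batches(tb_name, partition_key):
        # a batch may only touch a single partition key, so query one partition at a time
        entities = table_service_out.query_entities(
            tb_name, filter="PartitionKey eq '{}'".format(partition_key))
        batch = TableBatch()
        count = 0
        for entity in entities:
            # strip the service-generated fields before re-inserting, as above
            del entity.etag
            del entity.Timestamp
            batch.insert_or_replace_entity(entity)
            count += 1
            # a single batch accepts at most 100 operations, so commit and start a new one
            if count == 100:
                table_service_in.commit_batch(tb_name, batch)
                batch = TableBatch()
                count = 0
        # commit whatever is left over
        if count:
            table_service_in.commit_batch(tb_name, batch)
    
    # e.g. copy_partition_in_batches('mytable', 'pk1') for each partition key you know about

    Each batch must stay within one partition key and at most 100 operations, which is why the counter resets after every commit.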

    Let me know if you have any more questions.
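
    Side note: since the question itself uses the newer azure.data.tables package, roughly the same copy loop can be sketched with that SDK too. This is an untested outline; the placeholder connection strings mirror the ones in the question:

    from azure.data.tables import TableServiceClient
    
    service_out = TableServiceClient.from_connection_string('storage1-Conn-Str')
    service_in = TableServiceClient.from_connection_string('Storage2-Conn-str')
    
    for table in service_out.list_tables():
        # create_table_if_not_exists returns a TableClient for the destination table
        dst = service_in.create_table_if_not_exists(table.table_name)
        src = service_out.get_table_client(table.table_name)
        # list_entities returns the same lazy ItemPaged seen in the question;
        # iterating it is what actually fetches the rows
        for entity in src.list_entities():
            dst.upsert_entity(entity)

    Here the etag and Timestamp are exposed through entity.metadata rather than as regular properties, so the entities can be upserted as-is.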