Tags: azure-data-explorer, kql

Is there a way to clone a table in Kusto?


Is there a way to clone a table in Kusto exactly, so that it has all the extents of the original table? Even if extents can't be retained, is there at least a performant way to copy a table to a new table? I tried the following:

.set new_table <| existing_table;

It ran forever and eventually hit a timeout error. Is there a way to copy a table such that the Kusto engine recognizes this is just a plain copy, so that instead of re-processing the data it simply copies the blobs in the back-end and points the new table at them, bypassing the whole Kusto ingestion route?


Solution

  1. Copying the schema and data of one table to another is possible using the command you mentioned. Another option is to export the table's content to cloud storage, then ingest the resulting storage artifacts using Kusto's ingestion API or a tool that uses it, e.g. LightIngest or Azure Data Factory (ADF).

    Of course, if the source table has a lot of data, you will want to split this command into multiple ones, each dealing with a subset of the source data (which you can 'partition', for example, by ingestion time).

    Below is just one example (the right partitioning obviously depends on how much data you have in the source table):

    .set-or-append [async] new_table <| existing_table | where ingestion_time() >= X and ingestion_time() < X + 1h
    
    .set-or-append [async] new_table <| existing_table | where ingestion_time() >= X + 1h and ingestion_time() < X + 2h
    
    ...
    

    Note that async is optional, and is used to avoid a potential client-side timeout (which defaults to 10 minutes). The command itself continues to run on the backend for up to a non-configurable timeout of 60 minutes (though it's strongly advised to avoid such long-running commands, e.g. by performing the "partitioning" mentioned above).
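    When run with async, the command returns an operation ID that you can poll to track progress. A minimal sketch (the table names and time range are placeholders, and the operation ID below is illustrative):

    // Run one partitioned copy asynchronously; this returns an OperationId
    .set-or-append async new_table <| existing_table
        | where ingestion_time() >= datetime(2023-01-01) and ingestion_time() < datetime(2023-01-01) + 1h
    
    // Poll the returned operation ID until its State is Completed (or Failed)
    .show operations <operation_id>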

    2. To your other question: there's no option to copy data between tables without re-ingesting it (an extent / data shard currently can't belong to more than one table).

    3. If you need to continuously "duplicate" data being ingested into table T1 into table T2, and both T1 and T2 are in the same database, you can achieve that using an update policy.
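    A minimal sketch of such an update policy, assuming tables T1 and T2 already exist and T2's schema matches the query's output (the function name CopyT1 is illustrative):

    // A function whose output is appended to T2 on every ingestion into T1;
    // here it simply forwards the new rows as-is
    .create-or-alter function CopyT1() { T1 }
    
    // Attach the update policy to the target table T2, sourced from T1
    .alter table T2 policy update
    @'[{"IsEnabled": true, "Source": "T1", "Query": "CopyT1()", "IsTransactional": false}]'

    With IsTransactional set to false, a failure of the policy query does not fail the original ingestion into T1; set it to true if you need both to succeed or fail together.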