
How to extract VARCHAR from DuckDB by C API?


I am developing with the DuckDB C API. As recommended, I use data chunks and vectors to extract values from the database. DuckDB stores strings as VARCHAR, but I can't extract one and turn it into a C string.

Information

duckdb version: 0.8.1

with C API

When I tried to extract a value of type VARCHAR, I used duckdb_vector_get_data to get the internal data pointer (a void*). But this pointer actually points at something in VARCHAR format (duckdb_string_t in the DuckDB C API): a structure with a uint32_t length and a char inlined[12]. The full string seems to have been compressed into that 12-byte char array, so I can't get the real string just by casting the pointer to char* or char**. Here's my code.

// start and connect to a database beforehand
// use duckdb_query to execute SQL statements
if (duckdb_query(con, "CREATE TABLE strings(i VARCHAR(255));", NULL) == DuckDBError) {
    fprintf(stderr, "Failed to query database1\n");
    goto cleanup;
}
if (duckdb_query(con, "INSERT INTO strings VALUES ('aaaaa'), ('abcdrfghikjlmn'), ('bbbbbbbbbbbb');", NULL) == DuckDBError) {
    fprintf(stderr, "Failed to query database2\n");
    goto cleanup;
}
if (duckdb_query(con, "SELECT * FROM strings", &result) == DuckDBError) {
    fprintf(stderr, "Failed to query database3\n");
    goto cleanup;
}
// use data chunk and vector to extract data
duckdb_data_chunk res_chunk = duckdb_result_get_chunk(result,0);
duckdb_vector vec = duckdb_data_chunk_get_vector(res_chunk,0);
duckdb_logical_type type_l = duckdb_vector_get_column_type(vec);
duckdb_type type = duckdb_get_type_id(type_l);
printf("type id: %d\n", type);
void* pdata = duckdb_vector_get_data(vec);
// then I cast the void* to char*, trying to figure out the real data
char* data = (char*)pdata;
for (idx_t i = 0; i < 48; i++) {
    printf("%d ", *(data + i));
    if ((i + 1) % 16 == 0) printf("\n");
    // I will explain this later.
}

Here's the result after running:

type id: 17
5 0 0 0 97 97 97 97 97 0 0 0 0 0 0 0
14 0 0 0 97 98 99 100 -112 -52 115 0 0 0 0 0
12 0 0 0 98 98 98 98 98 98 98 98 98 98 98 98

Type id 17 is certainly DUCKDB_TYPE_VARCHAR. In the internal data, every 16 bytes seem to form a 'group': the first four bytes give the length of the string, and the last 12 store the actual data. But when the string is longer than 12 characters, it seems to be compressed into 6 numbers. This looks a lot like the DuckDB type duckdb_string_t (reference).

I haven't found any function in DuckDB's documentation so far that returns the actual string from a VARCHAR value. Am I overlooking or misunderstanding something when extracting the VARCHAR? Or is there a function that can turn a VARCHAR into a string? I hope someone who has succeeded in extracting VARCHAR values through data chunks and vectors can help me.


Solution

  • The part you've missed is what the duckdb_string_t struct actually is: a way to avoid a separate allocation for every string.

    For short strings (<= 12 chars at the moment), we avoid the extra allocation and embed the value inside the struct. You can determine this with the duckdb_string_is_inlined function, if that's available in the DuckDB version you're using. If the string isn't inlined, the 12 bytes after the length hold a 4-byte prefix of the string followed by a char* that points to the actual string, allocated elsewhere. That is what your second row shows: length 14, the prefix 'abcd', and then a pointer, not compressed data.

    Hope that helps!
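
To make the two cases concrete, here is a minimal sketch of the decoding logic. It does not include duckdb.h: sketch_string_t is a hypothetical local stand-in that mirrors the 16-byte layout discussed above, and the helper names and the 12-byte threshold are assumptions taken from this answer, not part of the DuckDB API. The sketch also assumes the string bytes are not NUL-terminated, so it always works from the stored length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in mirroring the layout described above.
 * Real code should use duckdb_string_t from duckdb.h instead. */
typedef struct {
    union {
        struct {
            uint32_t length;
            char prefix[4];   /* first 4 bytes of the string */
            char *ptr;        /* full string, allocated elsewhere */
        } pointer;
        struct {
            uint32_t length;
            char inlined[12]; /* the whole string, when it fits */
        } inlined;
    } value;
} sketch_string_t;

/* Assumption: strings up to 12 bytes are stored inline. */
#define SKETCH_INLINE_MAX 12u

/* Returns the start of the string bytes (do not assume a trailing NUL). */
static const char *sketch_string_data(const sketch_string_t *s) {
    if (s->value.inlined.length <= SKETCH_INLINE_MAX) {
        return s->value.inlined.inlined;
    }
    return s->value.pointer.ptr;
}

/* Copies the string into out as a NUL-terminated C string, truncating
 * if out is too small. Returns the number of bytes copied. */
static size_t sketch_string_copy(const sketch_string_t *s, char *out, size_t out_size) {
    assert(s != NULL && out != NULL);
    if (out_size == 0) {
        return 0;
    }
    size_t n = s->value.inlined.length;
    if (n > out_size - 1) {
        n = out_size - 1;
    }
    memcpy(out, sketch_string_data(s), n);
    out[n] = '\0';
    return n;
}
```

With the real header, the same idea would apply to the question's code by casting the result of duckdb_vector_get_data(vec) to duckdb_string_t* and decoding one entry per row, for row indexes below duckdb_data_chunk_get_size(res_chunk). Where duckdb_string_is_inlined is available, it is the safer replacement for the hard-coded length comparison used here.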