Search code examples
databasetime-seriesquestdb

Some data is not written into QuestDB when using the ILP Python client


I am experiencing a weird behavior with the ILP Python client. Parts of my data doesn’t get written to the database for no obvious reason. Relevant code below:

def send_data(questdb: c_questdb, table_name: str, timestamp_column_name: str, data: pd.DataFrame) -> None:
    """Write dataframe to QuestDB table."""
    conf = (
        f"tcp::addr={questdb['host']}:{questdb['ilp_port']};"
        f"username={questdb['ilp_json_web_key_kid']};"
        f"token={questdb['ilp_json_web_key_d']};"
        f"token_x={questdb['ilp_json_web_key_x']};"
        f"token_y={questdb['ilp_json_web_key_y']};"
    )
    with Sender.from_conf(conf) as sender:
        sender.dataframe(data, table_name=table_name, at=timestamp_column_name)

def main(questdb: c_questdb, start_date: str, end_date: str):
...
    try:
        # Write data to QuestDB with error handling
        print("Writing raw weather data to database...")
        print(all_dfs)
        with ThreadPoolExecutor() as executor:
            futures = []
            for df in all_dfs:
                future = executor.submit(send_data, questdb, "weather_raw_data", "DATE", df)
                futures.append(future)
            for future in futures:
                try:
                    future.result()
                except Exception as e:
                    print(f"Error writing raw data: {str(e)}")

        print("Writing DJU data to database...")
        with ThreadPoolExecutor() as executor:
            futures = []
            for df in all_dju_dfs:
                future = executor.submit(send_data, questdb, "weather_dju", "date", df)
                futures.append(future)
            for future in futures:
                try:
                    future.result()
                except Exception as e:
                    print(f"Error writing DJU data: {str(e)}")

        print("All data processed successfully!")

    except requests.exceptions.HTTPError as e:
        print(f"API Error: {str(e)}")
    except ValueError as e:
        print(f"Date Error: {str(e)}")

Here is all_dfs (a list of 9 dataframes) printed in Python:

6   69029001 2025-01-04 01:00:00  0.0     0  ...     0      NaN   NaN   NaN
7   69029001 2025-01-04 02:00:00  0.0     0  ...     0      NaN   NaN   NaN
8   69029001 2025-01-04 03:00:00  0.0     0  ...     0      NaN   NaN   NaN
9   69029001 2025-01-04 04:00:00  0.0     0  ...     0      NaN   NaN   NaN
10  69029001 2025-01-04 05:00:00  0.0     0  ...     0      NaN   NaN   NaN
11  69029001 2025-01-04 06:00:00  0.0     0  ...     0      NaN   NaN   NaN
12  69029001 2025-01-04 07:00:00  0.0     0  ...     0      NaN   NaN   NaN
13  69029001 2025-01-04 08:00:00  0.0     0  ...     0      NaN   NaN   NaN
14  69029001 2025-01-04 09:00:00  0.0     0  ...     0      NaN   NaN   NaN
15  69029001 2025-01-04 10:00:00  0.0     0  ...     1      NaN   NaN   NaN
16  69029001 2025-01-04 11:00:00  0.0     0  ...     0      NaN   NaN   NaN
17  69029001 2025-01-04 12:00:00  0.0     0  ...     0      NaN   NaN   NaN
18  69029001 2025-01-04 13:00:00  0.0     0  ...     0      NaN   NaN   NaN
19  69029001 2025-01-04 14:00:00  0.0     0  ...     0      NaN   NaN   NaN
20  69029001 2025-01-04 15:00:00  0.0     0  ...     0      NaN   NaN   NaN
21  69029001 2025-01-04 16:00:00  0.0    16  ...     0      NaN   NaN   NaN
22  69029001 2025-01-04 17:00:00  1.2    49  ...     0      NaN   NaN   NaN
23  69029001 2025-01-04 18:00:00  0.6    60  ...     0      NaN   NaN   NaN
24  69029001 2025-01-04 19:00:00  1.2    60  ...     0      NaN   NaN   NaN
25  69029001 2025-01-04 20:00:00  0.2    59  ...     0      NaN   NaN   NaN
26  69029001 2025-01-04 21:00:00  0.2    56  ...     0      NaN   NaN   NaN
27  69029001 2025-01-04 22:00:00  0.0    15  ...     0      NaN   NaN   NaN
28  69029001 2025-01-04 23:00:00  0.0     0  ...     0      NaN   NaN   NaN
29  69029001 2025-01-05 00:00:00  0.0    22  ...     0      NaN   NaN   NaN
30  69029001 2025-01-05 01:00:00  0.2    60  ...     0      NaN   NaN   NaN
31  69029001 2025-01-05 02:00:00  1.0    60  ...     0      NaN   NaN   NaN
32  69029001 2025-01-05 03:00:00  0.2    59  ...     0      NaN   NaN   NaN
33  69029001 2025-01-05 04:00:00  0.0    60  ...     0      NaN   NaN   NaN
34  69029001 2025-01-05 05:00:00  0.0    47  ...     0      NaN   NaN   NaN
35  69029001 2025-01-05 06:00:00  0.0     2  ...     0      NaN   NaN   NaN
36  69029001 2025-01-05 07:00:00  0.0     0  ...     0      NaN   NaN   NaN
[37 rows x 99 columns],        POSTE                DATE  RR1  DRR1  ...  INS2  INFRAR2  DIR2  DIF2
0   73304005 2025-01-03 19:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
1   73304005 2025-01-03 20:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
2   73304005 2025-01-03 21:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
3   73304005 2025-01-03 22:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
4   73304005 2025-01-03 23:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
5   73304005 2025-01-04 00:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
6   73304005 2025-01-04 01:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
7   73304005 2025-01-04 02:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
8   73304005 2025-01-04 03:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
9   73304005 2025-01-04 04:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
10  73304005 2025-01-04 05:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
11  73304005 2025-01-04 06:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
12  73304005 2025-01-04 07:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
13  73304005 2025-01-04 08:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
14  73304005 2025-01-04 09:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
15  73304005 2025-01-04 10:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
16  73304005 2025-01-04 11:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
17  73304005 2025-01-04 12:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
18  73304005 2025-01-04 13:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
19  73304005 2025-01-04 14:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
20  73304005 2025-01-04 15:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
21  73304005 2025-01-04 16:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
22  73304005 2025-01-04 17:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
23  73304005 2025-01-04 18:00:00  0.2   NaN  ...   NaN      NaN   NaN   NaN
24  73304005 2025-01-04 19:00:00  0.2   NaN  ...   NaN      NaN   NaN   NaN
25  73304005 2025-01-04 20:00:00  0.6   NaN  ...   NaN      NaN   NaN   NaN
26  73304005 2025-01-04 21:00:00  1.4   NaN  ...   NaN      NaN   NaN   NaN
27  73304005 2025-01-04 22:00:00  1.2   NaN  ...   NaN      NaN   NaN   NaN
28  73304005 2025-01-04 23:00:00  0.4   NaN  ...   NaN      NaN   NaN   NaN
29  73304005 2025-01-05 00:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
30  73304005 2025-01-05 01:00:00  0.6   NaN  ...   NaN      NaN   NaN   NaN
31  73304005 2025-01-05 02:00:00  0.8   NaN  ...   NaN      NaN   NaN   NaN
32  73304005 2025-01-05 03:00:00  1.0   NaN  ...   NaN      NaN   NaN   NaN
33  73304005 2025-01-05 04:00:00  0.4   NaN  ...   NaN      NaN   NaN   NaN
34  73304005 2025-01-05 05:00:00  0.4   NaN  ...   NaN      NaN   NaN   NaN
35  73304005 2025-01-05 06:00:00  0.2   NaN  ...   NaN      NaN   NaN   NaN
36  73304005 2025-01-05 07:00:00  0.0   NaN  ...   NaN      NaN   NaN   NaN
[37 rows x 99 columns],        POSTE                DATE  RR1  DRR1  ...  INS2  INFRAR2  DIR2  DIF2
0   73329001 2025-01-03 19:00:00  0.0     0  ...     0      NaN   NaN   NaN
1   73329001 2025-01-03 20:00:00  0.0     0  ...     0      NaN   NaN   NaN
2   73329001 2025-01-03 21:00:00  0.0     0  ...     0      NaN   NaN   NaN
3   73329001 2025-01-03 22:00:00  0.0     0  ...     0      NaN   NaN   NaN
4   73329001 2025-01-03 23:00:00  0.0     0  ...     0      NaN   NaN   NaN
5   73329001 2025-01-04 00:00:00  0.0     0  ...     0      NaN   NaN   NaN
6   73329001 2025-01-04 01:00:00  0.0     0  ...     0      NaN   NaN   NaN
7   73329001 2025-01-04 02:00:00  0.0     0  ...     0      NaN   NaN   NaN
8   73329001 2025-01-04 03:00:00  0.0     0  ...     0      NaN   NaN   NaN
9   73329001 2025-01-04 04:00:00  0.0     0  ...     0      NaN   NaN   NaN
10  73329001 2025-01-04 05:00:00  0.0     0  ...     0      NaN   NaN   NaN
11  73329001 2025-01-04 06:00:00  0.0     0  ...     0      NaN   NaN   NaN
12  73329001 2025-01-04 07:00:00  0.0     0  ...     0      NaN   NaN   NaN
13  73329001 2025-01-04 08:00:00  0.0     0  ...     0      NaN   NaN   NaN
14  73329001 2025-01-04 09:00:00  0.0     0  ...    60      NaN   NaN   NaN
15  73329001 2025-01-04 10:00:00  0.0     0  ...    60      NaN   NaN   NaN
16  73329001 2025-01-04 11:00:00  0.0     0  ...    25      NaN   NaN   NaN
17  73329001 2025-01-04 12:00:00  0.0     0  ...     4      NaN   NaN   NaN
18  73329001 2025-01-04 13:00:00  0.0     0  ...     0      NaN   NaN   NaN
19  73329001 2025-01-04 14:00:00  0.0     0  ...     0      NaN   NaN   NaN
20  73329001 2025-01-04 15:00:00  0.0     0  ...     0      NaN   NaN   NaN
21  73329001 2025-01-04 16:00:00  0.0     0  ...     0      NaN   NaN   NaN
22  73329001 2025-01-04 17:00:00  0.2    26  ...     0      NaN   NaN   NaN
23  73329001 2025-01-04 18:00:00  1.4    60  ...     0      NaN   NaN   NaN
24  73329001 2025-01-04 19:00:00  1.4    60  ...     0      NaN   NaN   NaN
25  73329001 2025-01-04 20:00:00  3.1    60  ...     0      NaN   NaN   NaN
26  73329001 2025-01-04 21:00:00  1.2    60  ...     0      NaN   NaN   NaN
27  73329001 2025-01-04 22:00:00  0.2    60  ...     0      NaN   NaN   NaN
28  73329001 2025-01-04 23:00:00  0.2    59  ...     0      NaN   NaN   NaN
29  73329001 2025-01-05 00:00:00  0.2    49  ...     0      NaN   NaN   NaN
30  73329001 2025-01-05 01:00:00  0.2    32  ...     0      NaN   NaN   NaN
31  73329001 2025-01-05 02:00:00  0.4    60  ...     0      NaN   NaN   NaN
32  73329001 2025-01-05 03:00:00  0.0    60  ...     0      NaN   NaN   NaN
33  73329001 2025-01-05 04:00:00  0.4    60  ...     0      NaN   NaN   NaN
34  73329001 2025-01-05 05:00:00  0.0    15  ...     0      NaN   NaN   NaN
35  73329001 2025-01-05 06:00:00  0.2    55  ...     0      NaN   NaN   NaN
36  73329001 2025-01-05 07:00:00  0.0    28  ...     0      NaN   NaN   NaN

When querying the DB SELECT * from weather_raw_data , only data for POSTE 73329001 and 69029001 get written. 73304005 is missing entirely.


Solution

  • After a chat on QuestDB slack, a core engineer suggested to switch to HTTP endpoint, rather than TCP endpoint, as it gives you more significant error messages and error control.

    I could then capture every sending exception and print it out:

    Processing data from 2025-01-01 19:00 to 2025-01-03 07:00...
    Retrieving weather data for station 69029001...
    Computing DJUs...
        station       date  dju_heating  dju_cooling
    0  69029001 2025-01-02         9.45            0
    Retrieving weather data for station 73329001...
    Computing DJUs...
        station       date  dju_heating  dju_cooling
    0  73329001 2025-01-02         13.9            0
    Retrieving weather data for station 73304005...
    Error processing station 73304005 for date 2025-01-01 00:00:00: Could not flush buffer: failed to parse line protocol:errors encountered on line(s):
    error in line 1: table: weather_raw_data, column: DRR1; cast error from protocol type: FLOAT to column type: LONG [id: 03eba5d178eb-4, code: invalid, line: 1]
    Processing data from 2025-01-02 19:00 to 2025-01-04 07:00...
    Retrieving weather data for station 69029001...
    Computing DJUs...
        station       date  dju_heating  dju_cooling
    0  69029001 2025-01-03        15.15            0
    Retrieving weather data for station 73329001...
    Computing DJUs...
        station       date  dju_heating  dju_cooling
    0  73329001 2025-01-03         13.2            0
    Retrieving weather data for station 73304005...
    Error processing station 73304005 for date 2025-01-02 00:00:00: Could not flush buffer: failed to parse line protocol:errors encountered on line(s):
    error in line 1: table: weather_raw_data, column: DRR1; cast error from protocol type: FLOAT to column type: LONG [id: 03eba5d178eb-5, code: invalid, line: 1]
    Processing data from 2025-01-03 19:00 to 2025-01-05 07:00...
    Retrieving weather data for station 69029001...
    Computing DJUs...
        station       date  dju_heating  dju_cooling
    0  69029001 2025-01-04         15.5            0
    Retrieving weather data for station 73329001...
    Computing DJUs...
        station       date  dju_heating  dju_cooling
    0  73329001 2025-01-04         16.4            0
    Retrieving weather data for station 73304005...
    Error processing station 73304005 for date 2025-01-03 00:00:00: Could not flush buffer: failed to parse line protocol:errors encountered on line(s):
    error in line 1: table: weather_raw_data, column: DRR1; cast error from protocol type: FLOAT to column type: LONG [id: 03eba5d178eb-6, code: invalid, line: 1]
    

    So I could pinpoint the problem with some type mismatch on the DRR1 column. It turns out the API from where I am reading is sending some NaNs which are causing trouble. I only had to dropna on the dataframe before sending and now it's all good.