Search code examples
pythonpython-3.xpandasdataframefacebook-prophet

convert prometheus time series JSON metrics to pandas dataframes


I am trying to convert prometheus time series JSON metrics to pandas dataframes.

FYI...

#!/usr/bin/env python3

import requests
import copy 
import pandas as pd
URL = "http://x.x.x.x:30000/api/v1/query?query=node_memory_MemFree_bytes{instance=\"10.56.0.4:9100\",job=\"node-exporter\"}[1d]"
  
r = requests.get(url = URL) 

data = r.json()

data_dict={}
metric_list = []
# print(data['data']['result']['values'])
# exit()
for i in data['data']['result']:
    for j in i['values']:
        data_dict = copy.deepcopy(i['metric'])
        data_dict['time'] = j[0]
        data_dict['value'] = j[1]
        metric_list.append(data_dict)
        
  
df_metric = pd.DataFrame(metric_list)

df1 = df_metric[['time', 'value']]
df1['time'] = pd.to_datetime(df1['time'],unit='s')
df1 = df1.set_index('time')
df1["value"] = df1.values.astype(float)

### before resampling
print(df1)

### after resampling to 15mins
df1 = df1.resample('15min').mean()
print(df1)

print(type(df1))

But the output i get is a single column output with value as column header. Instead i want the output to be with two cloumns like date and value.

Current Output:

                            value
time                             
2021-08-12 16:15:00  1.309109e+08
2021-08-12 16:30:00  1.376348e+08
2021-08-12 16:45:00  1.376068e+08
2021-08-12 17:00:00  1.354043e+08
2021-08-12 17:15:00  1.355775e+08

Expected Output:

                            
        ds             y             
2021-08-12 16:15:00  1.309109e+08
2021-08-12 16:30:00  1.376348e+08
2021-08-12 16:45:00  1.376068e+08
2021-08-12 17:00:00  1.354043e+08
2021-08-12 17:15:00  1.355775e+08

So that now i could apply different timeseries forecasting models like Facebook Propher


Solution

  • Currently, you have set time as an index, so if you want it to be a column, you can use reset_index().

    df1 = df1.reset_index()
    

    Then, you can rename the column labels by:

    df1.columns = ['ds', 'y']
    

    or by:

    df1 = df1.rename({'time': 'ds', 'value': 'y'}, axis=1)
    

    Result:

    print(df1)
    
                        ds            y
    0  2021-08-12 16:15:00  130910900.0
    1  2021-08-12 16:30:00  137634800.0
    2  2021-08-12 16:45:00  137606800.0
    3  2021-08-12 17:00:00  135404300.0
    4  2021-08-12 17:15:00  135577500.0