Tags: python, pandas, dataframe, statsmodels, cryptocurrency

Error using Santiment sanpy library for cryptocurrency data analysis


I am using sanpy to gather crypto market data and statsmodels to compute alpha, beta and R-squared. I then want a while loop around crypto = input("Cryptocurrency: ") that asks the user for a specific crypto, prints its statistics, and shows the prompt again.

With the following code I receive the error: ValueError: If using all scalar values, you must pass an index

import san
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
import statsmodels.api as sm
from statsmodels import regression

cryptos = ["bitcoin", "ethereum", "ripple", "bitcoin-cash", "tether",
"bitcoin-sv", "litecoin", "binance-coin", "eos", "chainlink",
"monero", "bitcoin-gold"]

def get_and_process_data(c):
    raw_data = san.get("daily_closing_price_usd/" + c, from_date="2014-12-31", to_date="2019-12-31", interval="1d") # "query/slug"
    return raw_data.pct_change()[1:]


df = pd.DataFrame({c: get_and_process_data(c) for c in cryptos})

df['MKT Return'] = df.mean(axis=1) # avg market return
#print(df) # show dataframe with all data

def model(x, y):
    # Calculate r-squared
    X = sm.add_constant(x) # artificially add intercept to x, as advised in the docs
    model = sm.OLS(y,X).fit()
    rsquared = model.rsquared
    
    # Fit linear regression and calculate alpha and beta
    X = sm.add_constant(x)
    model = regression.linear_model.OLS(y,X).fit()
    alpha = model.params[0]
    beta = model.params[1]

    return rsquared, alpha, beta

results = pd.DataFrame({c: model(df[df[c].notnull()]['MKT Return'], df[df[c].notnull()][c]) for c in cryptos}).transpose()
results.columns = ['rsquared', 'alpha', 'beta']
print(results)

The error is in the following line:

df = pd.DataFrame({c: get_and_process_data(c) for c in cryptos})

I tried solving the issue by changing it to:

df = {c: get_and_process_data(c) for c in cryptos}

df['MKT Return'] = df.mean(axis=1) # avg market return
print(df) # show dataframe with all data

But with that, it gave me a different error: AttributeError: 'dict' object has no attribute 'mean'.

The goal is to create a single DataFrame with a datetime column, one column per crypto holding its pct_change data, and an additional MKT Return column with the daily mean of all the cryptos' pct_change values. I then want to use this data to calculate each crypto's statistics and finally build the input loop mentioned at the beginning.

I hope I made myself clear and that someone is able to help me with this matter.


Solution

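  • This is a great start, but I think you are getting confused by what san.get returns. Judging by the code below, each call gives back a DataFrame with a value column rather than a Series, so select that column when you build the combined DataFrame: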

    import san
    import pandas as pd
    
    # List of data we are interested in    
    cryptos = ["bitcoin", "ethereum", "ripple", "bitcoin-cash", "tether",
    "bitcoin-sv", "litecoin", "binance-coin", "eos", "chainlink",
    "monero", "bitcoin-gold"]
    
    # function to get the data from san into a dataframe and turn it into
    # a daily percentage change
    def get_and_process_data(c):
        raw_data = san.get("daily_closing_price_usd/" + c, from_date="2014-12-31", to_date="2019-12-31", interval="1d") # "query/slug"
        return raw_data.pct_change()[1:]
    
    # now set up an empty dataframe to get all the data put into
    df = pd.DataFrame()
    # cycle through your list
    for c in cryptos:
        # get the data as percentage changes
        dftemp = get_and_process_data(c)
        # then add it to the output dataframe df
        df[c] = dftemp['value']
    
    # have a look at what you have
    print(df)
    

    From that point on you know you have good data and can work with it as you go forward.
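
    As a rough illustration of the next step, here is a minimal sketch of adding the MKT Return column once each crypto is a single column. It does not call the Santiment API: fake_san_get is a hypothetical stand-in that only mimics the shape assumed above (a DataFrame with one value column and a daily date index), and the numbers are random.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for san.get(): one "value" column on a daily date index.
# The shape is an assumption based on the dftemp['value'] lookup in the answer.
def fake_san_get(seed, start, periods=6):
    rng = np.random.default_rng(seed)
    idx = pd.date_range(start, periods=periods, freq="D", name="datetime")
    prices = 100 + rng.normal(0, 1, periods).cumsum()
    return pd.DataFrame({"value": prices}, index=idx)

raw = {
    "bitcoin": fake_san_get(1, "2019-01-01"),
    "ethereum": fake_san_get(2, "2019-01-01"),
    "chainlink": fake_san_get(3, "2019-01-03", periods=4),  # starts later
}

# Selecting "value" makes every dict entry a Series; pandas aligns the Series
# on their dates and fills days a crypto has no data for with NaN.
returns = pd.DataFrame(
    {c: d["value"].pct_change().iloc[1:] for c, d in raw.items()}
)

# Row-wise mean over the crypto columns only; NaN entries are skipped by default.
returns["MKT Return"] = returns[list(raw)].mean(axis=1)
print(returns)
```

    Cryptos that started trading later simply show NaN on the earlier dates, which is why the regression step in the question filters each column with notnull() first.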

    I would suggest getting the regression working for a single currency first, then moving on to cycling through all of them.