Search code examples
pythonpython-2.7tableau-apihypothesis-testtabpy

Tableau error "All Fields must be aggregate or constant" when invoking TabPy SCRIPT_REAL


I am calling a TabPy server via a calculated field in a Tableau worksheet to run a hypothesis test: does the rate of Bookings vary significantly by Group?

I have a table such as:

     Group  Bookings
0        A         1
1        A         0
3998     B         1
3999     B         0

In Python, on the same server (using the python 2.7 docker image) the test I want is simply:

from scipy.stats import fisher_exact
df_cont_tbl = pd.crosstab(df['Group'], df['Bookings'])
prop_test = fisher_exact(df_cont_tbl)
print 'Fisher exact test: Odds ratio = {:.2f}, p-value = {:.3f}'.format(*prop_test)

Returns: Fisher exact test: Odds ratio = 1.21, p-value = 0.102

I connected Tableau to the TabPy server and can execute a hello-world calculated field. For example, I get 42 back with the calculated field: SCRIPT_REAL("return 42", ATTR([Group]),ATTR([Bookings]) )

However, I try to invoke the stats function above with a calculated field to extract the p-value:

SCRIPT_REAL("
import pandas as pd
from scipy.stats import fisher_exact
df_cont_tbl = pd.crosstab(_arg1, _arg2)
prop_test = fisher_exact(df_cont_tbl)
return prop_test[1]
", [Group], [Bookings] )

I get the notification: The calculation contains errors with the drop-down All fields must be aggregate or constant when using table calculation functions or fields from multiple data sources

error box

I tried wrapping the inputs with ATTR(), as in:

SCRIPT_REAL("
import pandas as pd
from scipy.stats import fisher_exact
df_cont_tbl = pd.crosstab(_arg1, _arg2)
prop_test = fisher_exact(df_cont_tbl)
return prop_test[1]
",ATTR([Group]), ATTR([Bookings])
)

Which changes the notification to "The calculation is valid" but returns a Pandas ValueError from the server:

An error occurred while communicating with the External Service.
Error processing script
Error when POST /evaluate: Traceback
Traceback (most recent call last):
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tabpy_server/tabpy.py", line 467, in post
result = yield self.call_subprocess(function_to_evaluate, arguments)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/gen.py", line 1014, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tabpy_server/tabpy.py", line 488, in call_subprocess
ret = yield future
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/_base.py", line 400, in result
return self.__get_result()
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/_base.py", line 359, in __get_result
reraise(self._exception, self._traceback)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/_compat.py", line 107, in reraise
exec('raise exc_type, exc_value, traceback', {}, locals_)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/concurrent/futures/thread.py", line 61, in run
result = self.fn(*self.args, **self.kwargs)
File "<string>", line 5, in _user_script
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/tools/pivot.py", line 479, in crosstab
df = DataFrame(data)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 266, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 402, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 5398, in _arrays_to_mgr
index = extract_index(arrays)
File "/opt/conda/envs/Tableau-Python-Server/lib/python2.7/site-packages/pandas/core/frame.py", line 5437, in extract_index
raise ValueError('If using all scalar values, you must pass'
ValueError: If using all scalar values, you must pass an index
Error type : ValueError
Error message : If using all scalar values, you must pass an index

Example dataset:

To generate the CSV I am connecting to:

import os
import pandas as pd
import numpy as np
from collections import namedtuple

OUTPUT_LOC = os.path.expanduser('~/TabPy_demo/ab_test_demo_results.csv')

GroupObs = namedtuple('GroupObs', ['name','n','p'])

obs = [GroupObs('A',3000,.10),GroupObs('B',1000,.13)] 
# note true odds ratio = (13/87)/(10/90) = 1.345

np.random.seed(2019)

df = pd.concat( [ pd.DataFrame({'Group': grp.name,
                                'Bookings':  pd.Series(np.random.binomial(n=1, 
                                                            p=grp.p, size=grp.n))
                              }) for grp in obs
                  ],ignore_index=True )

df.to_csv(OUTPUT_LOC,index=False)

Solution

  • Old question, but perhaps this will help someone else. There are a couple of issues. First is in relation to the way the data is passed to the pd.crosstab. Tableau passes a list of values to the tabpy server so wrap this in an array to fix your error you are getting.

    SCRIPT_REAL(
      "
       import pandas as pd
       import numpy as np
       from scipy.stats import fisher_exact
       df_cont_tbl = pd.crosstab(np.array(_arg1), np.array(_arg2))
       prop_test = fisher_exact(df_cont_tbl)
       return prop_test[1]
       ", 
       attr([Group]), attr([Bookings])
    
       )
    

    Another problem is the way the table calculation is being performed. You want to send tabpy two lists of information each as long as your table. In the default case tableau wants to calculate at the row level which is not going to work.

    I included the row count F1 into the csv that I built the workbook on and made sure to calculate the python value along this function.

    Tableau Call to tabpy

    Now when you put F1 into the worksheet it will return the P-value as many times as you have rows, A workaround for this is to wrap your calculation in another calculation to only return the value if it is the first row and place this in your worksheet.

    Wrapper Calculation

    Now you can place this into a worksheet.

    Final Desired Output