I use a BI tool (Tableau) that has visual problems for my clients when certain filters are applied. Reason why is that some categorical values disappear if there is no data for them under these filters selection. The client needs that all combinations are present at all times.
In order to avoid I am trying to inject dummy values that prevents the filtering from happening:
SELECT * FROM TABLE_A
UNION
SELECT Null AS FIRST_COLUMN, Null AS SECOND_COLUMN, 'Peter' AS THIRD_COLUMN
UNION
SELECT Null AS FIRST_COLUMN, Null AS SECOND_COLUMN, 'Michael' AS THIRD_COLUMN
UNION
SELECT Null AS FIRST_COLUMN, Null AS SECOND_COLUMN, 'James' AS THIRD_COLUMN
...
In my real case I have more than 50 columns and from time to time more columns are going to be added.
I was wondering if there is a way in that I can set the values for the columns I am interested in and Null for everything else without the need to define every single column like in the example I provide.
I only have read permissions on the table
I solved my problem using Python to achieve it (detailed below):
First let me detail the problem a little more with an example table
Student | Sport | Score |
---|---|---|
Mark | Football | 50 |
Mark | Golf | 20 |
Mark | Basketball | 70 |
Peter | Golf | 30 |
Peter | Basketball | 50 |
In Tableau if you filter by sport and select Football will make Peter dissapear from the map since there is no row in the datasource combining Peter and Football, so resulting table after filtering will be this:
Student | Sport | Score |
---|---|---|
Mark | Football | 50 |
But customer expects this:
Student | Sport | Score |
---|---|---|
Mark | Football | 50 |
Peter | Football | 0 |
So even if Sport its not in the visualization the effect of filtering by Football makes Peter dissapear.
For my solution I take the data from a MSSQL Server put it inside a pandas dataframe. Then I use the following function:
import pandas as pd
def dummyGenerator(df: pd.DataFrame, distinct_fields: list, combined_fields: list, filter_selection: dict = {}):
"""
dummyGenerator takes a pandas DataFrame and generates dummy values based on arguments
Args:
df (pandas DataFrame) : Source data to input
distinct_fields (list): Values to take as a distinct chunk. It will take the combinations present ONLY in the datasource of this fields
combined_fields (list): Name of destination datasource
filter_selection (dict): filter for specific values Example: {"field A": ["value1", "value2"], "field B": ["value1", "value2"]}
Returns:
resulting dataframe
"""
#Distinct: Fields in which we want to take the default combinations available in original datasource
distinct = df[distinct_fields].drop_duplicates().reset_index(drop=True)
for field in filter_selection:
distinct = distinct[distinct[field].isin(filter_selection[field])].reset_index(drop=True)
#Get combinations combine all possibilities
combinations = []
for field in combined_fields:
combinations.append(df[field].unique())
combined_df = pd.DataFrame(list(product(*combinations)), columns=combined_fields)
result_df = pd.DataFrame()
for column in df.columns:
if column not in combined_df.columns:
result_df[column] = None
else:
result_df[column] = combined_df[column]
result = distinct.merge(result_df, how="cross", suffixes= ["", "_y"])
return result[df.columns.to_list()]
After that I concat the data to the original data source and then I refresh an existent Tableau's datasource with the following script (Note that tableau version refrence your Tableaus's desktop folder Example: 2022.1":
def to_Tableau_DataSource(df: pd.DataFrame, project: str, datasource: str, tableau_version: str, site: str, server: str):
"""
to_Tableau_DataSource takes a pandas DataFrame query and uses it to refresh a Tableau DataSource
It also requires an enviromental variable named ASRA containing the username and password in the format <user>@email.com:<password>
Args:
df (pandas DataFrame) : Source data to input
project (str): Name of destination project
datasource (str): Name of destination datasource
tableau_version (str): Tableau Desktop version of hosting computer
Returns:
Nothing
"""
home = str(Path.home()) + "\\Downloads\\"
connection_string = os.getenv("ASRA")
u, p = connection_string.split(":")
si = site
s= server
filename = home + "output.xlsx"
print(filename)
df.to_excel(filename, index = False)
filename.replace("/","\\")
os.chdir(f"C:\\Program Files\\Tableau\\Tableau {tableau_version}\\bin\\")
os.system(f'tableau refreshextract --server {s} --username {u} --password {p} --project "{project}" --site {si} --datasource "{datasource}" --original-file "{filename}"')
os.remove(filename)