python pandas stata dta

Is there a way to save value labels for Stata categorical data within Python?


I know that pandas' read_stata can return either the value labels or the underlying numeric values of Stata categorical variables, depending on the convert_categoricals parameter.

I was looking for a way to export a pandas DataFrame to Stata with the value labels included. However, all I could find was either

data_label : str, optional, for the dataset label

or

variable_labels : dict, for the column (variable) labels,

but nothing for the values themselves.


Solution

  • Here is an answer to your question. It is probably not what you were expecting, because it does not use pandas' DataFrame.to_stata but the Python integration introduced in Stata 16.

    The code below must be executed within Stata (version 16 or later). Briefly, I build a pandas DataFrame (df) and copy it into Stata. The trick is to define the labels for the values with ValueLabel.setLabelValue() from the sfi (Stata Function Interface) module, and then attach that value label to the variable with ValueLabel.setVarValueLabel().

    clear all
    
    python:
    from sfi import ValueLabel, Data
    import pandas as pd
    
    data = [['Eren Jaeger', 15, 1, 'Soldier'], ['Mikasa Ackerman', 14, 1, 'Soldier'], ['Armin Arlert', 14, 1, 'Soldier'], ['Levi Ackerman', 30, 2, 'Captain']]
    # Create the DataFrame
    df = pd.DataFrame(data, columns=['Name', 'Age', 'Rank_num', 'Rank'])
    
    ##              Name  Age  Rank_num     Rank
    ##0      Eren Jaeger   15         1  Soldier
    ##1  Mikasa Ackerman   14         1  Soldier
    ##2     Armin Arlert   14         1  Soldier
    ##3    Levi Ackerman   30         2  Captain
    
    
    # Set number of observations in Stata
    Data.setObsTotal(len(df))
    
    #Create variables on Stata (from Python)
    Data.addVarStr("Name",10)
    Data.addVarDouble("Age")
    Data.addVarInt("Rank_num")
    
    #Store the content of "df" object from Python to Stata
    Data.store("Name", None, df['Name'], None)
    Data.store("Age", None, df['Age'], None)
    Data.store("Rank_num", None, df['Rank_num'], None)
    
    # HERE is where I solve your question!
    # 1) Create the value label: map each numeric value to its text
    ValueLabel.setLabelValue('rank_num_LABEL', 1, 'Soldier')
    ValueLabel.setLabelValue('rank_num_LABEL', 2, 'Captain')
    # Optional check: returns the value-to-text mapping just defined
    ValueLabel.getValueLabels('rank_num_LABEL')
    
    # 2) Attach the value label to the variable
    ValueLabel.setVarValueLabel('Rank_num', 'rank_num_LABEL')
    
    end 
    
    br
    
    * At the end, you will see the following on the Stata browser
    * Name              Age Rank_num
    * Eren Jaeger       15  Soldier
    * Mikasa Ackerman   14  Soldier
    * Armin Arlert      14  Soldier
    * Levi Ackerman     30  Captain
    
    

    If you want to understand the reasoning behind the code above, these are the references I learned it from:

    1. Stata/Python integration part 9: Using the Stata Function Interface to copy data from Python to Stata
    2. Stata/Python integration part 8: Using the Stata Function Interface to copy data from Stata to Python
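
For comparison, pandas on its own can also write value labels, without going through Stata. The sketch below is not part of the answer above and rests on two assumptions: that to_stata exports a Categorical column as integer codes plus a value label (the codes then start at 0, not 1), and that the installed pandas is 1.4 or later, which as far as I know is when the value_labels argument of to_stata was added. The file names are made up for illustration.

```python
import os
import tempfile

import pandas as pd

names = ["Eren Jaeger", "Mikasa Ackerman", "Armin Arlert", "Levi Ackerman"]
ages = [15, 14, 14, 30]
tmpdir = tempfile.mkdtemp()

# Option A (assumption: Categorical columns become value-labeled variables).
# pandas writes the category codes, so here 0 = Soldier and 1 = Captain.
df = pd.DataFrame({"Name": names, "Age": ages})
df["Rank"] = pd.Categorical(
    ["Soldier", "Soldier", "Soldier", "Captain"],
    categories=["Soldier", "Captain"],
)
path_cat = os.path.join(tmpdir, "ranks_categorical.dta")  # hypothetical file name
df.to_stata(path_cat, write_index=False)

# Read the file back both ways to see labels versus raw codes.
labeled = pd.read_stata(path_cat, convert_categoricals=True)
codes = pd.read_stata(path_cat, convert_categoricals=False)

# Option B (assumption: pandas >= 1.4). Keep the numeric column and pass
# the value -> text mapping explicitly, matching the 1/2 coding used above.
df_num = pd.DataFrame({"Name": names, "Age": ages, "Rank_num": [1, 1, 1, 2]})
path_num = os.path.join(tmpdir, "ranks_value_labels.dta")  # hypothetical file name
df_num.to_stata(
    path_num,
    write_index=False,
    value_labels={"Rank_num": {1: "Soldier", 2: "Captain"}},
)

labeled_num = pd.read_stata(path_num, convert_categoricals=True)
codes_num = pd.read_stata(path_num, convert_categoricals=False)

print(labeled["Rank"].astype(str).tolist())
print(codes["Rank"].tolist())
print(labeled_num["Rank_num"].astype(str).tolist())
print(codes_num["Rank_num"].tolist())
```

The round trip through read_stata is the only check this sketch performs; opening either file in Stata and running label list would be the real confirmation. Option B is the closer match to the sfi code above, since it lets you choose the numeric codes yourself.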