Using numpy random and pandas sample to make a random choice from a DataFrame: unexpected distribution of choices after 100k runs


I have made a small script for my D&D group that, when a spell is cast, chooses whether another random spell gets cast, the original spell gets cast, or nothing happens (with a 50%, 25%, 25% ratio, respectively).

To do this, I made a pd.DataFrame of every possible spell in the game (using data from http://dnd5e.wikidot.com/spells). I then appended two data frames, one with "Original Cast" as the spell name and the other with "Nothing Happens", each with len == len(df)/2, so that the full df is twice the size of the original, as shown in the code below.

import pandas as pd
import numpy as np
from os import urandom

def create_df():

    df = pd.read_csv("all_spells.csv", encoding="utf-8")
    df = df.dropna(axis=0, subset="Spell Name")
    df_b = df[~df["Spell Name"].str.contains(r"\((.*?)\)")].reset_index(drop=True)


    OG_cast = ['Orginal Cast' for i in range(int(len(df.index)/2))]
    No_Magic = ['Nothing Happens' for i in range(int(len(df.index)/2))]
    Nadda = [None for i in range(int(len(df.index)/2))]

    df_same = pd.DataFrame(columns=df.columns,
                           data={
                               df.columns[0]: OG_cast,
                               df.columns[1]: Nadda,
                               df.columns[2]: Nadda,
                               df.columns[3]: Nadda,
                               df.columns[4]: Nadda
                           })
    df_nothing = pd.DataFrame(columns=df.columns,
                           data={
                               df.columns[0]: No_Magic,
                               df.columns[1]: Nadda,
                               df.columns[2]: Nadda,
                               df.columns[3]: Nadda,
                               df.columns[4]: Nadda
                           })

    df_full = pd.concat([df_b, df_same, df_nothing], axis=0).reset_index(drop=True)

    return df_full

df_full.sample(n=10) is shown below, for reference.

+-----+----------------------------+---------------+----------------+---------+--------------------------------+--------------+
|     | Spell Name                 | School        | Casting Time   | Range   | Duration                       | Components   |
+=====+============================+===============+================+=========+================================+==============+
|  12 | Psychic Scream             | Enchantment   | 1 Action       | 90 feet | Instantaneous                  | S            |
+-----+----------------------------+---------------+----------------+---------+--------------------------------+--------------+
|  18 | True Polymorph             | Transmutation | 1 Action       | 30 feet | Concentration up to 1 hour     | V S M        |
+-----+----------------------------+---------------+----------------+---------+--------------------------------+--------------+
| 670 | Orginal Cast               |               |                |         |                                | nan          |
+-----+----------------------------+---------------+----------------+---------+--------------------------------+--------------+
| 193 | Conjure Woodland Beings    | Conjuration   | 1 Action       | 60 feet | Concentration up to 1 hour     | V S M        |
+-----+----------------------------+---------------+----------------+---------+--------------------------------+--------------+
| 795 | Orginal Cast               |               |                |         |                                | nan          |
+-----+----------------------------+---------------+----------------+---------+--------------------------------+--------------+
| 218 | Otiluke's Resilient Sphere | Evocation     | 1 Action       | 30 feet | Concentration up to 1 minute   | V S M        |
+-----+----------------------------+---------------+----------------+---------+--------------------------------+--------------+
| 353 | Levitate                   | Transmutation | 1 Action       | 60 feet | Concentration up to 10 minutes | V S M        |
+-----+----------------------------+---------------+----------------+---------+--------------------------------+--------------+
| 839 | Nothing Happens            |               |                |         |                                | nan          |
+-----+----------------------------+---------------+----------------+---------+--------------------------------+--------------+
| 459 | Silent Image               | Illusion      | 1 Action       | 60 feet | Concentration up to 10 minutes | V S M        |
+-----+----------------------------+---------------+----------------+---------+--------------------------------+--------------+
| 719 | Orginal Cast               |               |                |         |                                | nan          |
+-----+----------------------------+---------------+----------------+---------+--------------------------------+--------------+       

I then call the script below to get what happens when a spell is cast.

    df = create_df()

    seed = int(np.random.uniform(0, len(df.index)*10))
    spell = df.sample(1, random_state=seed)['Spell Name'].values[0]
    print("The spell cast is:", spell)

To test if this was giving me the distribution I wanted (50% of the time, there is a random spell cast, 25% nothing happens, and 25% the spell works as intended), I ran

    OC = 0
    NH = 0
    N = 0
    for i in range(100000):
        seed = int(np.random.uniform(0, len(df.index)*10))
        arb = df.sample(1, random_state=seed)['Spell Name'].values[0]
        # print(arb)
        if arb == 'Orginal Cast':
            OC += 1
        elif arb == 'Nothing Happens':
            NH += 1
        else: N += 1


    print(OC, NH, N)

Instead of getting 50/25/25 (for the outcomes listed above), I get an extremely consistent 47/26.5/26.5. Does anyone know why this is happening? And does anyone have a better idea for the random sampling, so that I more consistently get the correct ratio?


Solution

  • Your approach seems unnecessarily complicated.

    In the end, what you want is: when a spell is cast, randomly choose whether that spell gets cast, another random spell gets cast, or nothing happens.

    So just do that; there is no need to build complex pandas structures:

    import numpy as np
    import pandas as pd
    
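    # load dataset: read_html returns one DataFrame per HTML table on the page;
    # [0] keeps only the first table
    df = pd.read_html('http://dnd5e.wikidot.com/spells')[0]
    
    def cast_spell(df, spell, p=(0.25, 0.5, 0.25)):
        # p holds the probabilities of: original spell, other spell, nothing happens
        # check the input is valid
        assert spell in df['Spell Name'].values
        
        # randomly pick one of the outcomes (match requires Python 3.10+)
        match np.random.choice(['Original Spell', 'Other Spell', 'Nothing happens'], p=p):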
            case 'Original Spell':
                return f'The "{spell}" spell was successfully cast.'
            case 'Nothing happens':
                return f'The "{spell}" spell failed: nothing happens.'
            case _:
                other = df.loc[df['Spell Name'].ne(spell), 'Spell Name'].sample(1).item()
                return f'The "{spell}" spell failed. "{other}" was cast instead.'
    
    cast_spell(df, 'True Strike')
    

    Example output of calling cast_spell(df, 'True Strike') 20 times:

    The "True Strike" spell failed. "Poison Spray" was cast instead.
    The "True Strike" spell was successfully cast.
    The "True Strike" spell failed. "Lightning Lure" was cast instead.
    The "True Strike" spell failed. "Thaumaturgy" was cast instead.
    The "True Strike" spell was successfully cast.
    The "True Strike" spell was successfully cast.
    The "True Strike" spell failed. "Light" was cast instead.
    The "True Strike" spell failed. "Infestation" was cast instead.
    The "True Strike" spell was successfully cast.
    The "True Strike" spell failed. "Mind Sliver" was cast instead.
    The "True Strike" spell failed: nothing happens.
    The "True Strike" spell failed. "Lightning Lure" was cast instead.
    The "True Strike" spell failed: nothing happens.
    The "True Strike" spell failed: nothing happens.
    The "True Strike" spell failed. "Mage Hand" was cast instead.
    The "True Strike" spell was successfully cast.
    The "True Strike" spell failed: nothing happens.
    The "True Strike" spell was successfully cast.
    The "True Strike" spell failed. "Primal Savagery" was cast instead.
    The "True Strike" spell failed. "Frostbite" was cast instead.
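
To confirm that weighted outcomes like these converge to the intended 25/50/25 split, one could tally a large number of draws, much as the question does over 100k runs. The following is a minimal sketch rather than part of the answer's code: it swaps `np.random.choice` for NumPy's `np.random.default_rng` generator with a fixed seed so the run is repeatable, and the names `rng`, `draws`, `counts`, and `shares` are introduced here purely for illustration.

```python
from collections import Counter

import numpy as np

# Fixed seed only so that the check is repeatable (an assumption of this sketch).
rng = np.random.default_rng(0)

outcomes = ['Original Spell', 'Other Spell', 'Nothing happens']

# Draw 100,000 outcomes in one call, using the same weights as cast_spell.
draws = rng.choice(outcomes, size=100_000, p=[0.25, 0.5, 0.25])

# Tally the draws and convert the counts to shares of the total.
counts = Counter(draws.tolist())
shares = {name: counts[name] / len(draws) for name in outcomes}

print(shares)
```

With this many draws, each share should land well within a percentage point of its target, so a persistent gap such as 47% instead of 50% would point to a problem in how the choices are set up rather than to random noise.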
    

    Pure Python variant (pandas is only used to load the spell list):

    import pandas as pd
    import random
    
    df = pd.read_html('http://dnd5e.wikidot.com/spells')[0]
    
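    # a set gives fast membership tests and easy exclusion of the cast spell
    spells = set(df['Spell Name'])
    
    def cast_spell(spells, spell, p=(0.25, 0.5, 0.25)):
        # p holds the weights for: original spell, other spell, nothing happens
        assert spell in spells
    
        # random.choices returns a list, so take its single element
        choice = random.choices(['Original Spell', 'Other Spell', 'Nothing happens'],
                                weights=p)[0]
        if choice == 'Original Spell':
            return f'The "{spell}" spell was successfully cast.'
        elif choice == 'Nothing happens':
            return f'The "{spell}" spell failed: nothing happens.'
        else:
            # pick any spell other than the one being cast
            other = random.choice(list(spells-{spell}))
            return f'The "{spell}" spell failed. "{other}" was cast instead.'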
    
    cast_spell(spells, 'True Strike')
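
As for why the question's test gave 47/26.5/26.5: judging from the posted `create_df`, a likely cause is that the spell rows come from the filtered frame `df_b` (spell names containing parentheses are removed), while the two filler lists are sized from the unfiltered `df`. The two filler blocks together then outnumber the spell rows, so spells make up less than half of `df_full`. The sketch below works through that arithmetic. The row counts 586 and 520 are hypothetical values picked for illustration, not taken from the real data, and `outcome_shares` is a helper invented here.

```python
def outcome_shares(n_spells, n_filler_each):
    """Return the (random spell, original cast, nothing happens) shares
    of a table built from n_spells spell rows plus two filler blocks."""
    total = n_spells + 2 * n_filler_each
    return (n_spells / total, n_filler_each / total, n_filler_each / total)

# Hypothetical row counts, chosen only for illustration.
n_all = 586   # rows in df, before the parentheses filter
n_kept = 520  # rows in df_b, after the filter

# Filler blocks sized from the unfiltered frame, as in the question's code.
skewed = outcome_shares(n_kept, int(n_all / 2))

# Filler blocks sized from the filtered frame instead.
balanced = outcome_shares(n_kept, int(n_kept / 2))

print([round(s, 3) for s in skewed])    # [0.47, 0.265, 0.265]
print([round(s, 3) for s in balanced])  # [0.5, 0.25, 0.25]
```

If this is indeed the cause, sizing the filler lists with `len(df_b.index)` rather than `len(df.index)` should restore the 50/25/25 split. Separately, drawing the seed from `np.random.uniform(0, len(df.index)*10)` allows only a limited number of distinct seeds, and each seed always selects the same row, which may explain why repeated runs look so consistent.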