Let's say I have the following simple situation:
import pandas as pd

def multiply(row):
    global results
    results.append(row[0] * row[1])

def main():
    results = []
    df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}])
    df.apply(multiply, axis=1)
    print(results)

if __name__ == '__main__':
    main()
This results in the following traceback:
Traceback (most recent call last):
  File "<ipython-input-2-58ca95c5b364>", line 1, in <module>
    main()
  File "<ipython-input-1-9bb1bda9e141>", line 11, in main
    df.apply(multiply, axis=1)
  File "C:\Users\bbritten\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4262, in apply
    ignore_failures=ignore_failures)
  File "C:\Users\bbritten\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4358, in _apply_standard
    results[i] = func(v)
  File "<ipython-input-1-9bb1bda9e141>", line 5, in multiply
    results.append(row[0] * row[1])
NameError: ("name 'results' is not defined", 'occurred at index 0')
I know that I can move results = [] out to the if statement to get this example to work, but is there a way to keep the structure I have now and make it work?
You need to declare results outside the functions, at module level:

import pandas as pd

results = []

def multiply(row):
    # the rest of your code...
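For reference, here is a minimal sketch of the whole script with results moved to module level (same DataFrame and function names as in the question):

import pandas as pd

# module-level list; multiply() can see it when apply() calls it
results = []

def multiply(row):
    # appending mutates the existing list, so no global statement is needed
    results.append(row[0] * row[1])

def main():
    df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}])
    df.apply(multiply, axis=1)
    print(results)

if __name__ == '__main__':
    main()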
Also note that a list in Python is mutable, so as long as you only mutate it (e.g. with append), you don't need a global statement at the beginning of the function; global is only required when you rebind the name itself (e.g. results = [...]). Example:
def multiply(row):
    # global results -> This is not necessary!
    results.append(row[0] * row[1])
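To make the distinction concrete, here is a small sketch (the names counter, items, and record are just illustrative): global is only needed when a function rebinds a module-level name, not when it mutates an existing object.

counter = 0
items = []

def record(value):
    global counter            # required: the assignment below rebinds counter
    counter = counter + 1
    items.append(value)       # mutation only, so no global needed for items

record('x')
print(counter, items)  # 1 ['x']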