I am trying to pass a relation's field to a Python UDF in Pig, but it throws an error. Below are my Pig Latin script, my Python script, and the error log.

Pig Latin script:
REGISTER '/home/cloudera/jython-installer-2.7.0.jar';
REGISTER '/home/cloudera/Code.py' USING jython as myfunc;
A = LOAD '/home/cloudera/Link.txt' as (line:chararray);
B = FOREACH A GENERATE myfunc.codefunc(line);
Python script (Code.py):
import io

import pandas as pd

def count(A, crime):
    # Read the dataset, lower-case it, and count non-overlapping matches of crime
    # (io.open accepts encoding= on Python 2 as well as Python 3)
    with io.open(A, 'r', encoding='utf-8') as fileA:
        data = fileA.read().lower()
    return data.count(crime.lower())

def codefunc(A):
    # A is one line of Link.txt: the path of the dataset to analyse
    crime = ['Rape', 'Murder', 'Extortion', 'Felony', 'Burglary', 'Property Damage', 'Arrest', 'Political Unrest', 'Civil Unrest', 'Solicitation', 'Larceny', 'Abettor', 'Trafficking', 'Trespasser', 'Robbery']
    crimecount = {}
    for c in crime:
        crimecount[c] = count(A, c)
    final_count = pd.DataFrame(list(crimecount.items()), columns=['Crime', 'Value'])
    total_count = final_count['Value'].sum()
    # Vectorised; multiplying by 100.0 first keeps the division floating point on Python 2
    final_count['Percentage'] = final_count['Value'] * 100.0 / total_count
    # sort_values returns a new DataFrame, so the result must be assigned
    final_count = final_count.sort_values(by=['Percentage'], ascending=False)
    final_count.to_csv('/home/cloudera/solution.csv', header=False)
Error log: (link to the error log)
Link.txt contains the path where the dataset resides, and Pig passes that path to Python. Python should open that path, read the dataset, and run the code above. I am confident the Python code itself is fine, but Pig throws an error at relation B. I tried to paste the error here, but Stack Overflow would not let me, so I have linked it instead. Sorry for the inconvenience. Can anyone please help? Thanks in advance.
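Code that runs fine under CPython is not necessarily fine inside Pig: a script registered with USING jython is executed by Jython, not by the CPython interpreter where pandas is installed. A minimal sketch of a check is a script that reports which interpreter is running it and whether pandas imports; the helper name describe_runtime is hypothetical, and running the file directly under the same Jython jar is assumed to approximate what Pig does.

```python
import platform


def describe_runtime():
    """Report the running Python implementation and whether pandas can be imported."""
    try:
        import pandas  # noqa: F401  (only the import itself is being tested)
        pandas_available = True
    except Exception:  # usually ImportError; other errors are possible on Jython
        pandas_available = False
    return {
        'implementation': platform.python_implementation(),  # e.g. 'CPython' or 'Jython'
        'version': platform.python_version(),
        'pandas_available': pandas_available,
    }


if __name__ == '__main__':
    print(describe_runtime())
```

Under CPython with pandas installed this should report pandas as available; under Jython the same import is expected to fail.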
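The problem is not your Pig Latin but Jython. A UDF registered with USING jython runs on Jython, which executes on the JVM and cannot load CPython C extensions. pandas depends on such compiled extensions (NumPy, plus its own Cython/C modules), so import pandas as pd fails under Jython, and the UDF has to avoid pandas if it is to run as a Jython UDF.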
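One way to keep this as a Jython UDF is to drop pandas and compute the counts and percentages with the standard library only. The sketch below assumes the dataset is UTF-8 text and keeps the original semantics (case-insensitive, non-overlapping substring counts, no header row in the output). The helpers count_terms and crime_percentages are hypothetical names, the corrected spellings 'Solicitation' and 'Trespasser' assume the dataset uses standard spelling, and returning the output path is an assumption. It is written to run under both Jython 2.7 and CPython 3, but it is a sketch rather than a verified Pig UDF.

```python
# -*- coding: utf-8 -*-
# Standard-library-only version of codefunc (no pandas), intended to import
# under Jython 2.7 as well as CPython 3.
import io

# Search terms from the question; 'Solicitation' and 'Trespasser' are spelled
# correctly here (assumption: the dataset uses the standard spellings).
CRIMES = ['Rape', 'Murder', 'Extortion', 'Felony', 'Burglary',
          'Property Damage', 'Arrest', 'Political Unrest', 'Civil Unrest',
          'Solicitation', 'Larceny', 'Abettor', 'Trafficking', 'Trespasser',
          'Robbery']


def count_terms(text, terms):
    """Count case-insensitive, non-overlapping occurrences of each term in text."""
    lowered = text.lower()
    return dict((term, lowered.count(term.lower())) for term in terms)


def crime_percentages(counts):
    """Return (term, count, percentage) rows sorted by percentage, highest first."""
    total = sum(counts.values())
    rows = []
    for term, value in counts.items():
        # Multiplying by 100.0 forces float division; an empty dataset gives 0.0
        pct = value * 100.0 / total if total else 0.0
        rows.append((term, value, pct))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


def codefunc(path, out_path='/home/cloudera/solution.csv'):
    """Read the dataset at path and write term,count,percentage lines to out_path."""
    # io.open accepts an encoding argument on both Python 2 and Python 3
    with io.open(path, 'r', encoding='utf-8') as handle:
        rows = crime_percentages(count_terms(handle.read(), CRIMES))
    with io.open(out_path, 'w', encoding='utf-8') as out:
        for term, value, pct in rows:
            # No header row, matching header=0 in the original to_csv call
            out.write(u'{0},{1},{2}\n'.format(term, value, pct))
    # Returning the output path is an assumption; the original returned nothing
    return out_path
```

Note that FOREACH calls myfunc.codefunc(line) once per line of Link.txt, so if Link.txt lists several paths, every call overwrites the same solution.csv; writing one file per input may suit better in that case.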