I want to add a new column to a dataframe. The value of the new column depends on some rules.
This is my code:
#!/usr/bin/python3.6
# coding=utf-8
import sys
import pandas as pd
import numpy as np
import io
import csv
df = pd.read_csv(sys.stdin,sep=',',encoding='utf-8',engine="python")
col_0 = check
df['df_cal'] = df.groupby(col_0)[col_0].transform('count')
df['status'] = np.where(
    df['df_cal'] > 1,'change',
    'New')
df = df.drop_duplicates(
    subset=df.columns.difference(['keep']),keep = False)
df = df[(df.keep == '2')]
df.drop(['keep','df_cal'],axis = 1,inplace = True)
# print(sys.stdin)
df.to_csv(sys.stdout,encoding='utf-8',index = None)
sample csv:
VIP_number,keep
ab1,1
ab1,2
ab2,2
ab3,1
When I run this code, I use the following command:
python3.6 nifi_python.py < test.csv check = VIP_number
and I get the error:
name 'check' is not defined
This does not work because I don't know how to pass the column name to col_0 when the data comes from stdin. col_0 should be 'VIP_number'. I don't want to hardcode the column name because the script will be reused later with different columns.
How can I specify the column name when the dataframe is read from stdin? Any help would be very much appreciated.
#!/usr/bin/python3.6
# coding=utf-8
import sys
import pandas as pd
import numpy as np
import io
import csv
# The column name is passed as a command-line argument, not through stdin.
if len(sys.argv) < 2:
    print("Usage: nifi_python.py check=<column>")
    sys.exit(1)
df = pd.read_csv(sys.stdin,sep=',',encoding='utf-8',engine="python")
col_0 = sys.argv[1].split('=')[1]
...
python nifi_python.py check=VIP_number < test.csv
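Note that the argument has to be written without spaces around the equals sign: the shell splits `check = VIP_number` into three separate arguments, so `sys.argv[1]` would be just `check`. As a slightly more defensive variant of the same idea, here is a minimal sketch that validates the `key=value` argument and the column name before adding the `status` column. The helper names `parse_key_value_args`, `add_status`, and `main` are made up for this illustration, and the sketch only covers the `status` rule from the question, not the later de-duplication steps.

```python
import sys

import numpy as np
import pandas as pd


def parse_key_value_args(argv):
    """Turn arguments such as 'check=VIP_number' into {'check': 'VIP_number'}."""
    options = {}
    for arg in argv:
        key, sep, value = arg.partition('=')
        if not sep or not value:
            # Covers 'check', '=', and 'check=' (e.g. when spaces were typed around '=').
            raise ValueError("expected key=value, got %r" % arg)
        options[key.strip()] = value.strip()
    return options


def add_status(df, column):
    """Label rows whose value in `column` occurs more than once as 'change', others as 'New'."""
    counts = df.groupby(column)[column].transform('count')
    out = df.copy()
    out['status'] = np.where(counts > 1, 'change', 'New')
    return out


def main(argv, stdin=sys.stdin, stdout=sys.stdout):
    """Read CSV from stdin, add the status column, write CSV to stdout."""
    try:
        column = parse_key_value_args(argv)['check']
    except (ValueError, KeyError):
        print("Usage: nifi_python.py check=<column> < input.csv", file=sys.stderr)
        return 1
    df = pd.read_csv(stdin)
    if column not in df.columns:
        print("Unknown column: %s" % column, file=sys.stderr)
        return 1
    add_status(df, column).to_csv(stdout, index=False)
    return 0

# In a script, finish with: sys.exit(main(sys.argv[1:]))
```

With the sample CSV from the question and `check=VIP_number`, both `ab1` rows are labelled `change` and the `ab2` and `ab3` rows are labelled `New`. Error messages go to stderr so they do not end up in the CSV written to stdout.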