python pandas dataframe export-to-csv

Formatting List to Columns for CSV


I am trying to add a calculated list as 2 additional columns in an existing CSV, but I am struggling to reshape it into 2 columns.

MWE:

import pandas as pd
from transformers import BertTokenizer, BertForSequenceClassification, pipeline

df = pd.read_csv('original.csv')
dtype_before = type(df["text"])
text_list = df["text"].tolist()
tokenizer = BertTokenizer.from_pretrained("daigo/bert-base-japanese-sentiment")
model = BertForSequenceClassification.from_pretrained("daigo/bert-base-japanese-sentiment")
sentiment_analyzer = pipeline("sentiment-analysis",model=model, tokenizer=tokenizer)
results = list(map(sentiment_analyzer, text_list))
print(results)

Printing the list results in this:

[[{'label': 'ポジティブ', 'score': 0.7804045081138611}], [{'label': 'ポジティブ', 'score': 0.9542087912559509}], [{'label': 'ポジティブ', 'score': 0.8557115793228149}], [{'label': 'ポジティブ', 'score': 0.9135494232177734}], [{'label': 'ポジティブ', 'score': 0.86244797706604}], [{'label': 'ネガティブ', 'score': 0.8266600370407104}], [{'label': 'ポジティブ', 'score': 0.9198371767997742}], [{'label': 'ポジティブ', 'score': 0.9033421874046326}], [{'label': 'ポジティブ', 'score': 0.7705154418945312}], [{'label': 'ポジティブ', 'score': 0.8205435872077942}], [{'label': 'ポジティブ', 'score': 0.8045720458030701}], [{'label': 'ネガティブ', 'score': 0.5160148739814758}], [{'label': 'ポジティブ', 'score': 0.8745550513267517}], [{'label': 'ポジティブ', 'score': 0.941367506980896}], [{'label': 'ポジティブ', 'score': 0.899341344833374}], [{'label': 'ポジティブ', 'score': 0.9200822710990906}], [{'label': 'ポジティブ', 'score': 0.6254457235336304}], [{'label': 'ポジティブ', 'score': 0.8494048714637756}], [{'label': 'ポジティブ', 'score': 0.6723847389221191}], [{'label': 'ポジティブ', 'score': 0.9329613447189331}], [{'label': 'ポジティブ', 'score': 0.9084392786026001}], [{'label': 'ポジティブ', 'score': 0.7804917693138123}], [{'label': 'ポジティブ', 'score': 0.6737139225006104}], [{'label': 'ネガティブ', 'score': 0.5254362225532532}], [{'label': 'ネガティブ', 'score': 0.7653219103813171}], [{'label': 'ネガティブ', 'score': 0.7342881560325623}], [{'label': 'ポジティブ', 'score': 0.8476402163505554}]]

I would like 'label' to be the header of one column and 'score' the header of the second, so that the 2 final columns look something like this:

label         score
ポジティブ      0.7804045081138611
ポジティブ      0.9542087912559509
ポジティブ      0.8557115793228149
...
ネガティブ      0.5160148739814758
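
For illustration, the target layout can be built by hand from the first three entries of the printed list. This is only a minimal sketch, assuming pandas is installed; `results` is hard-coded here instead of coming from the pipeline, and the names `flat` and `preview` are made up for the example.

```python
import pandas as pd

# First three entries of the printed output, copied verbatim.
results = [
    [{'label': 'ポジティブ', 'score': 0.7804045081138611}],
    [{'label': 'ポジティブ', 'score': 0.9542087912559509}],
    [{'label': 'ポジティブ', 'score': 0.8557115793228149}],
]

# Each inner list holds exactly one dict, so take its first element.
flat = [inner[0] for inner in results]

# A list of dicts becomes a DataFrame with one column per key.
preview = pd.DataFrame(flat)
print(preview)
```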

I think that once I have that, I could use pandas to add these columns to the CSV, right? So something like:

import csv
import re
import pandas as pd
from transformers import BertTokenizer, BertForSequenceClassification, pipeline

df = pd.read_csv('original.csv')
dtype_before = type(df["text"])
text_list = df["text"].tolist()
tokenizer = BertTokenizer.from_pretrained("daigo/bert-base-japanese-sentiment")
model = BertForSequenceClassification.from_pretrained("daigo/bert-base-japanese-sentiment")
sentiment_analyzer = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
results = list(map(sentiment_analyzer, text_list))
<some magic to prepare results to a proper list>

df[['label', 'score']] = <some magic to prepare results to a proper list>
df.to_csv("filepath.csv", index=False)

Solution

  • You could try:

    # Flatten the nested result: [[{...}], [{...}], ...] -> [{...}, {...}, ...]
    label_score_flat = [dc for ls in map(sentiment_analyzer, text_list) for dc in ls]

    # One new column per dict key
    df['label'] = [dc['label'] for dc in label_score_flat]
    df['score'] = [dc['score'] for dc in label_score_flat]

    df.to_csv("filepath.csv", index=False)

    I have not tested this, so there might be bugs.
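
A variation on the same idea builds both columns in one step from the flattened dicts. This is a sketch under stated assumptions, not a tested drop-in: `fake_analyzer` is a hypothetical stand-in that only mimics the output shape shown in the question (a one-element list holding one dict per input), so the example runs without downloading the model. The sample texts, the fixed score, and the keyword rule are invented for illustration.

```python
import io
import pandas as pd

def fake_analyzer(text):
    # Hypothetical stand-in for sentiment_analyzer: returns the same
    # shape as in the question, a one-element list holding one dict.
    label = 'ポジティブ' if 'good' in text else 'ネガティブ'
    return [{'label': label, 'score': 0.5}]

# Invented sample data in place of original.csv.
df = pd.DataFrame({'text': ['good day', 'bad day', 'good food']})

# Flatten [[{...}], [{...}], ...] into [{...}, {...}, ...].
flat = [dc for ls in map(fake_analyzer, df['text'].tolist()) for dc in ls]

# Build a two-column frame that shares df's index, then append it column-wise.
scores = pd.DataFrame(flat, index=df.index)
df = pd.concat([df, scores], axis=1)

# Write to an in-memory buffer here; pass a file path to write to disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

With the real pipeline, `fake_analyzer` would be replaced by `sentiment_analyzer`. Both versions rely on the analyzer returning exactly one dict per text; if it returned more, the flattened list would be longer than `df` and the lengths would no longer match.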