Tags: python, dictionary, dataframe, text-analysis

How to convert a dictionary to a DataFrame in Python


data = {'documents': [{'score': 0.8806856870651245, 'id': '1'}, {'score': 0.15902310609817505, 'id': '2'}, {'score': 0.9225043058395386, 'id': '3'}, {'score': 0.9872093200683594, 'id': '4'}], 'errors': []}

And comments is a pandas Series:
0    I love how we walk in to the fruit and vegetab...
1    When stores upgrade finished nothing to improve??
2    I was pleased with the cheerful efficiency wit...
3    Affordable prices, varieties and staff are ve..

There are two parts to the data. How can I remove data["errors"] and convert the rest into a DataFrame like the one below, and then merge in the comments data, which is a Series?

score                        id       comments
0.8806856870651245            1       I love how
0.15902310609817505           2       When stores
0.9225043058395386            3       I was pleased with
0.9872093200683594            4       Affordable prices

Solution

  • You don't need to delete the errors; just build the DataFrame from data['documents']. That value is a list of dictionaries, which pd.DataFrame converts automatically, using the dictionary keys as the column names.

    Then convert the comments Series into a DataFrame with to_frame() and merge it in. Note that I used string values for the comments index so that they match the string id values in the documents data.

    import pandas as pd

    # Create sample comments, indexed by the same string ids as data['documents'].
    comments = pd.Series(['I love how', 'When stores', 'I was pleased with', 'Affordable prices'],
                         index=['1', '2', '3', '4'])
    
    >>> pd.DataFrame(data['documents']).merge(
    ...     comments.to_frame('comments'), left_on='id', right_index=True)
          score id            comments
    0  0.880686  1          I love how
    1  0.159023  2         When stores
    2  0.922504  3  I was pleased with
    3  0.987209  4   Affordable prices
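If the comments Series keeps the default integer index shown in the question (0 to 3) instead of the string ids, its labels cannot be matched against the id column, so the label-based merge does not apply directly. A minimal sketch, assuming each comment is in the same order as the entries in data['documents'], aligns the two by position instead. The shortened comment strings are placeholders taken from the desired output in the question.

```python
import pandas as pd

# The response dictionary from the question; 'errors' is never read below.
data = {'documents': [{'score': 0.8806856870651245, 'id': '1'},
                      {'score': 0.15902310609817505, 'id': '2'},
                      {'score': 0.9225043058395386, 'id': '3'},
                      {'score': 0.9872093200683594, 'id': '4'}],
        'errors': []}

# Shortened comments with the default 0..n-1 index, as printed in the question.
comments = pd.Series(['I love how', 'When stores',
                      'I was pleased with', 'Affordable prices'])

# A list of dicts becomes one row per dict, with the keys as column names.
df = pd.DataFrame(data['documents'])

# Assumption: comments[i] belongs to data['documents'][i] (same order, same length).
assert len(comments) == len(df), "comments and documents must line up one-to-one"

# reset_index(drop=True) gives the Series a 0..n-1 index, matching df's default
# index, so the assignment pairs rows by position.
df['comments'] = comments.reset_index(drop=True)

print(df)
```

The positional approach only holds if both sequences come from the same ordered input; when the comments carry real ids, the merge on left_on='id' with right_index=True is the safer choice because it matches by label.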