
Transform an entire column into a corpus


My DataFrame df has two columns containing text. I want to transform each column into its own corpus.

df

id | Description 1               | Description 2
---|-----------------------------|-------------------
1  | that book is good           | better than book2
2  | book 2 is not better than 1 | not good
...| ...                         | ...

Treat Description 1 as the document and Description 2 as the query.

Expected Output

Corpus 1: that book is good book 2 is not better than 1 ...
Corpus 2: better than book2 not good ...

Solution

  • Join all the rows of each column into one string with str.join, then append that string to a list. The result is a list holding one corpus per column.

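    # Build one corpus per column by joining all of that column's rows with spaces
    corpus = []
    for col in df.columns:
        # astype(str) avoids a TypeError if a cell holds a non-string value
        corpus.append(' '.join(df[col].astype(str)))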
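A self-contained sketch using the two sample rows from the question. Instead of an explicit loop it uses pandas' Series.str.cat, which concatenates every string in a Series when no `others` argument is passed (missing values are skipped by default). Treating `id` as the index, and the names `corpora`, `document` and `query`, are illustrative assumptions rather than part of the original answer.

```python
import pandas as pd

# Sample data taken from the first two rows of the question's table;
# "id" is assumed to be the index, so only the two text columns remain.
df = pd.DataFrame(
    {
        "Description 1": ["that book is good", "book 2 is not better than 1"],
        "Description 2": ["better than book2", "not good"],
    },
    index=pd.Index([1, 2], name="id"),
)

# With no `others`, Series.str.cat joins all strings in the column into one;
# NaN cells are left out because na_rep defaults to None.
corpora = {col: df[col].str.cat(sep=" ") for col in df.columns}

document = corpora["Description 1"]  # corpus built from the document column
query = corpora["Description 2"]     # corpus built from the query column

print("Corpus 1:", document)
print("Corpus 2:", query)
```

Running it prints `Corpus 1: that book is good book 2 is not better than 1` and `Corpus 2: better than book2 not good`, which matches the expected output for these two rows. A dict keyed by column name keeps track of which corpus came from which column, which a plain list does not.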