Tags: python, numpy, dataframe, nlp, stanford-nlp

Python: Convert a DataFrame into natural language text


We're implementing an NLP solution for content that contains both paragraphs of text and tables. We've used Google's BERT for NLP, and it works great on text. However, if we ask a question whose answer lies in a table value, our NLP solution doesn't work, because it only handles natural language text (sentences, paragraphs, etc.).

So, in order to get answers from a table (dataframe), we're thinking of converting the whole dataframe into natural language text that preserves the relation of each cell to its column name and row. For example:

+------------+-----------+--------+
| First Name | Last Name | Gender |
+------------+-----------+--------+
| Ali        | Asad      | Male   |
| Sara       | Dell      | Female |
+------------+-----------+--------+

Will become:

  • First Name is Ali, Last Name is Asad, and Gender is Male
  • First Name is Sara, Last Name is Dell, and Gender is Female

This will help us find the right answer. For example, if I ask "What is the gender of Ali?", our NLP solution will give the answer "Male".

I'm wondering whether there is a Python library that converts a dataframe into natural language text, or whether I have to do it manually.

Many thanks


Solution

  • If you want to store the sentences in a list, you can do it with a simple loop:

    text = []
    # iterrows() yields (index, row); read each cell from the row by column name
    for index, row in df.iterrows():
        a = 'First Name is {0}, Last Name is {1}, and Gender is {2}'.format(
            row['First Name'], row['Last Name'], row['Gender'])
        text.append(a)
    print(text)
    

    You can then join the sentences in this list into a single passage of natural language text that the model can read.
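
    The loop above hard-codes the three column names. If the tables vary, the same idea can be generalized to any set of columns. The following is only a minimal sketch, assuming pandas is installed; the helper names row_to_sentence and dataframe_to_text are made up for illustration and are not part of any library, and skipping missing cells is an assumption about the desired behavior.

```python
import pandas as pd


def row_to_sentence(row):
    """Render one row as 'Col is value, ..., and Col is value'."""
    # Skip missing cells so the sentence does not contain 'X is nan'.
    parts = [f"{col} is {val}" for col, val in row.items() if pd.notna(val)]
    if len(parts) <= 1:
        return "".join(parts)
    if len(parts) == 2:
        return f"{parts[0]} and {parts[1]}"
    return ", ".join(parts[:-1]) + f", and {parts[-1]}"


def dataframe_to_text(df):
    """Return one sentence per row, using the column names as labels."""
    return [row_to_sentence(row) for _, row in df.iterrows()]


# Example data taken from the table in the question.
df = pd.DataFrame(
    {
        "First Name": ["Ali", "Sara"],
        "Last Name": ["Asad", "Dell"],
        "Gender": ["Male", "Female"],
    }
)

sentences = dataframe_to_text(df)
# Join the per-row sentences into one passage to use as context for the model.
passage = ". ".join(sentences) + "."
print(passage)
```

    Whether a QA model answers reliably from sentences phrased this way depends on the model, so it is worth checking a few questions against the generated passage.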