python-3.x, pandas, csv, statistics, t-test

How to import values from a column of csv dataset into python for t-test?


New coder here, trying to run some t-tests in Python 3.6. Right now, to run my t-tests between my 2 data sets, I have been doing the following:

import plotly.plotly as py
import plotly.graph_objs as go
from plotly.tools import FigureFactory as FF
import numpy as np
import pandas as pd
import scipy
from scipy import stats

long_term_survivor_GENE1 = [-0.38, -0.99, -1.04, 0.1]  # etc.
short_term_survivor_GENE1 = [0.32, 0.33, 0.96]  # etc.
stats.ttest_ind(long_term_survivor_GENE1,short_term_survivor_GENE1)

This requires me to manually enter the values from each column of both data sets for each specific gene (GENE1 in this case). Is there a way to pull the values straight from the data set, so that Python reads them without me typing them out myself? For example, some way that I can just say:

long_term_survivor_GENE1 = ##call values from GENE1 column from dataset 1##
short_term_survivor_GENE1 = ## call values from GENE1 column from dataset 2## 

Thanks for any help, and sorry that I'm not very well-versed in this stuff. Appreciate any feedback/tips. If you have any other questions, please let me know!


Solution

  • If your data is already in the columns of a pandas dataframe, it can be as easy as this:

    >>> import pandas as pd
    >>> long_term_survivor_GENE1 = [-0.38,-0.99,-1.04,0.1]
    >>> short_term_survivor_GENE1 = [0.32, 0.33,0.96, 0.56]
    >>> df = pd.DataFrame({'long_term_survivor_GENE1': long_term_survivor_GENE1, 'short_term_survivor_GENE1': short_term_survivor_GENE1})
    >>> from scipy import stats
    >>> stats.ttest_ind(df['long_term_survivor_GENE1'], df['short_term_survivor_GENE1'])
    Ttest_indResult(statistic=-3.615804684179662, pvalue=0.011153077626049458)
    

    It might be a good idea to review the statistics behind this, though: ttest_ind treats the two samples as independent and, by default, assumes they have equal variances. If your data isn't in a dataframe yet, have a look at some of the many answers here on SO about loading CSV files with read_csv.
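
To make the read_csv step concrete, here is a minimal sketch. It assumes each data set is a CSV file with one column per gene and a header row; the question doesn't show the real layout, so treat that as a guess. The in-memory StringIO objects stand in for the real files, the file names in the comment are hypothetical, and the GENE2 values are made up for illustration (the GENE1 values are the ones used above).

```python
import io

import pandas as pd
from scipy import stats

# Stand-ins for the two CSV files. With real files you would pass a path
# instead, e.g. pd.read_csv("long_term.csv") (hypothetical file name).
# GENE2 values are invented purely for illustration.
long_term_csv = io.StringIO(
    "GENE1,GENE2\n-0.38,1.2\n-0.99,0.8\n-1.04,1.1\n0.1,0.9\n"
)
short_term_csv = io.StringIO(
    "GENE1,GENE2\n0.32,1.0\n0.33,1.3\n0.96,0.7\n0.56,1.1\n"
)

long_term = pd.read_csv(long_term_csv)
short_term = pd.read_csv(short_term_csv)

# Pull one gene's column from each data set; dropna() skips empty cells.
result = stats.ttest_ind(long_term["GENE1"].dropna(),
                         short_term["GENE1"].dropna())
print(result.statistic, result.pvalue)

# Repeat the test for every gene column that appears in both data sets.
results = {}
for gene in long_term.columns.intersection(short_term.columns):
    res = stats.ttest_ind(long_term[gene].dropna(), short_term[gene].dropna())
    results[gene] = (res.statistic, res.pvalue)

for gene, (t_stat, p_value) in results.items():
    print(gene, t_stat, p_value)
```

If the equal-variance assumption looks doubtful for your data, ttest_ind also accepts equal_var=False, which runs Welch's t-test instead. When looping over many genes, keep in mind that running lots of tests raises the chance of false positives, so a multiple-comparison correction may be worth looking into.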