
Python nested loop - table index as variable


I am not a Python programmer, but I need to use a function from the SciPy library. I just want to repeat the inner loop several times, each time with a different column index. Here is my code so far:

from scipy.stats import pearsonr

fileName = open('ILPDataset.txt', 'r')
attributeValue, classValue = [], []

for index in range(0, 10, 1):
    for line in fileName.readlines():
        data = line.split(',')
        attributeValue.append(float(data[index]))
        classValue.append(float(data[10]))
    print(index)
    print(pearsonr(attributeValue, classValue))

And I am getting the following output:

0
(-0.13735062681256097, 0.0008840631556260505)
1
(-0.13735062681256097, 0.0008840631556260505)
2
(-0.13735062681256097, 0.0008840631556260505)
3
(-0.13735062681256097, 0.0008840631556260505)
4
(-0.13735062681256097, 0.0008840631556260505)
5
(-0.13735062681256097, 0.0008840631556260505)
6
(-0.13735062681256097, 0.0008840631556260505)
7
(-0.13735062681256097, 0.0008840631556260505)
8
(-0.13735062681256097, 0.0008840631556260505)
9
(-0.13735062681256097, 0.0008840631556260505)

As you can see, the index changes, but the result of the function is always the same, as if the index were still 0.

When I run the script several times, changing the index value by hand each time, like this:

attributeValue.append(float(data[0]))
attributeValue.append(float(data[1]))
...
attributeValue.append(float(data[9]))

everything is fine and I get correct results, but I can't get the same thing from a single loop. What am I doing wrong?

EDIT: Test file:

62,1,6.8,3,542,116,66,6.4,3.1,0.9,1
40,1,1.9,1,231,16,55,4.3,1.6,0.6,1
63,1,0.9,0.2,194,52,45,6,3.9,1.85,2
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,6.2,3,240,1680,850,7.2,4,1.2,1
20,1,1.1,0.5,128,20,30,3.9,1.9,0.95,2
84,0,0.7,0.2,188,13,21,6,3.2,1.1,2
57,1,4,1.9,190,45,111,5.2,1.5,0.4,1
52,1,0.9,0.2,156,35,44,4.9,2.9,1.4,1
57,1,1,0.3,187,19,23,5.2,2.9,1.2,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
30,1,1.3,0.4,482,102,80,6.9,3.3,0.9,1
17,0,0.7,0.2,145,18,36,7.2,3.9,1.18,2
46,0,14.2,7.8,374,38,77,4.3,2,0.8,1

Expected results of pearsonr for the 10 script runs (data[0] to data[9]):

data[0] (0.06050513030608389, 0.8238536636813034)
data[1] (-0.49265895172303803, 0.052525691067199995)
data[2] (-0.5073312383613632, 0.0448647312201305)
data[3] (-0.4852842899321005, 0.056723468068371544)
data[4] (-0.2919584357031029, 0.27254138535817224)
data[5] (-0.41640591455640696, 0.10863082761524119)
data[6] (-0.46954072465442487, 0.0665061785375443)
data[7] (0.08874739193909209, 0.7437895010751641)
data[8] (0.3104260624799073, 0.24193152445774302)
data[9] (0.2943030868699842, 0.26853066217221616)

Solution

  • Turn each line of the file into a list of floats

    data = []
    with open('ILPDataset.txt') as fileName:
        for line in fileName:
            line = line.strip()
            line = line.split(',')
            line = [float(item) for item in line[:11]]
            data.append(line)
    

    Transpose the data so that each list in data holds the values of one column from the original file: data --> [[column 0 items], [column 1 items], [column 2 items], ...]

    data = list(zip(*data))    # Python 3.x: zip returns an iterator, so build a list to allow indexing
    #data = zip(*data)    # Python 2.7.x, where zip already returns a list
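To make the transpose step concrete, here is a minimal sketch on a tiny table. The rows are just the first three values of the first three lines of the test file from the question; the names rows and columns are chosen for illustration only.

```python
# Three rows with three values each, as they would look after parsing.
rows = [
    [62.0, 1.0, 6.8],
    [40.0, 1.0, 1.9],
    [63.0, 1.0, 0.9],
]

# zip(*rows) pairs up the first items of every row, then the second items,
# and so on, which turns rows into columns. list() makes the result indexable.
columns = list(zip(*rows))

print(columns[0])  # first column of the table
print(columns[2])  # third column of the table
```

Each entry of columns is a tuple, which works as a sequence argument wherever a list of numbers would.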
    

    Correlate:

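    from scipy.stats import pearsonr

    # Correlate each of the 10 attribute columns with the class column (index 10).
    for n in range(10):
        corr = pearsonr(data[n], data[10])
        print('data[{}], {}'.format(n, corr))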
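As for why the loop in the question prints the same value every time: fileName.readlines() reads the file to its end on the first pass of the outer loop, so on every later pass it returns an empty list and the inner loop body never runs. attributeValue and classValue are also never reset, so they still hold the values collected for index 0. The sketch below reproduces this with an in-memory io.StringIO object standing in for the real file (the three-row sample and the variable names are made up for illustration), then reads the rows once and builds a fresh column list for each index.

```python
import io

# Stand-in for the opened file: three short rows instead of the real dataset.
fake_file = io.StringIO("1,2,3\n4,5,6\n7,8,9\n")

first_pass = fake_file.readlines()   # reads every line and leaves the position at the end
second_pass = fake_file.readlines()  # nothing is left to read

print(len(first_pass), len(second_pass))  # prints: 3 0

# Parse the lines once, then reuse the parsed rows for every index.
rows = [[float(item) for item in line.strip().split(',')] for line in first_pass]

for index in range(2):
    column = [row[index] for row in rows]  # new list on every pass, nothing carried over
    print(index, column)
```

If the original structure has to be kept, calling fileName.seek(0) and resetting both lists to [] at the start of each outer pass should have the same effect, at the cost of re-reading the file ten times.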