python, arrays, csv, dataset

How to transform a multidimensional array from a CSV file into a list


[Screenshot of the CSV file]

Hi (sorry if this is a dumb question). I have a data set as a CSV file. Every row contains 44 columns, and every cell contains 44 float numbers separated by two spaces (see the screenshot). I tried the csv module, readline/readlines, and numpy, and none of them worked. I want to read every row as a list of 1936 values (44*44) and then combine the whole data set into a 2D array: my_data[n_of_samples][1936]


Solution

  • As stated by user ybl, this is not a CSV. It's not even close to being a CSV.

    This means that you have to implement some processing to turn this into something usable. I ran the screenshot through OCR to extract the actual text values, but next time please provide the input file itself. Screenshots of data are annoying to work with.

    The processing you need to do is to find the start and end of each row, using the [ and ] characters respectively. Then you split the text in between with a plain str.split(), which splits on any run of whitespace and so doesn't care about the number of spaces.

    Try the code below and see if that works for the input file.

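    rows = []          # one list of value strings per [...] group
    current_row = ""   # text collected so far for the group being read
    
    
    with open("somefile.txt") as infile:
        for line in infile:
            # Drop the quotes and turn the line break into a space.
            cleaned = line.replace('"', '').replace("\n", " ")
            if "]" in cleaned:
                # End of a group (assumes at most one "]" per line):
                # keep the text before "]" and store the finished row.
                current_row = f"{current_row} {cleaned.split(']')[0]}"
                rows.append(current_row.split())
                current_row = ""
                cleaned = cleaned.split(']')[1]
            if "[" in cleaned:
                # Start of a group: keep only the text after "[".
                cleaned = cleaned.split("[")[1]
            current_row = f"{current_row} {cleaned}"
    
    
    # Each row should contain 44 values (still as strings at this point).
    for row in rows:
        print(len(row))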
    

    output:

    44
    44
    44
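
The loop above yields one list of 44 strings per bracketed cell, while the question asks for my_data[n_of_samples][1936]. A minimal sketch of that last step follows. It assumes every CSV record holds 44 consecutive cells and that the cells appear in file order, which is a guess based on the question and not something the screenshot confirms. The helper name rows_to_samples and its cells_per_sample parameter are made up for illustration.

```python
def rows_to_samples(rows, cells_per_sample=44):
    """Group parsed cells into flat samples of floats.

    `rows` is a list of cells, each cell a list of numeric strings
    (the shape produced by the parsing loop). Every run of
    `cells_per_sample` cells is concatenated into one flat list.
    """
    if len(rows) % cells_per_sample != 0:
        raise ValueError("number of cells is not a multiple of cells_per_sample")

    samples = []
    for start in range(0, len(rows), cells_per_sample):
        flat = []
        for cell in rows[start:start + cells_per_sample]:
            # Convert each value string (e.g. "1.79619717e+04") to a float.
            flat.extend(float(value) for value in cell)
        samples.append(flat)
    return samples


# Tiny demo with 2 cells per sample and 3 values per cell,
# standing in for the real 44 cells of 44 values each.
demo_rows = [
    ["1.0", "2.0", "3.0"], ["4.0", "5.0", "6.0"],
    ["7.0", "-8.0", "9.0"], ["1e1", "2e1", "3e1"],
]
my_data = rows_to_samples(demo_rows, cells_per_sample=2)
print(len(my_data), len(my_data[0]))
```

With the real file and the default of 44 cells per sample, each inner list would have 44*44 = 1936 floats. If a NumPy array is preferred, passing the result to numpy.array should give a 2D array of shape (n_of_samples, 1936).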
    

    input file:

    "[ 1.79619717e+04    1.09988207e+02     4.13270009e+01    1.72227906e+01
       1.06178751e+01    5.20957856e+00     7.50891645e+00    4.57943370e+00
       2.65572713e+00    2.96725867e-01     2.43040664e+00    1.32822091e+00
       4.09853169e-01    1.18412873e+00     6.43398990e-01    1.23796528e+00
       9.63975374e-02    2.95295579e-01     7.68998970e-01    4.98040980e-01
       2.84036936e-01    1.76004564e-01     1.43527613e-01    1.64765236e-01
       1.51171075e-01    1.02586637e-01     3.27835810e-02    1.21872869e-02
       -7.59824907e-02   8.48217334e-02     7.29953754e-02    4.89750588e-02
       5.89426950e-02    5.05485266e-02     2.34761263e-02    -2.41095452e-02
       5.15952510e-02    1.39933210e-02     2.12354074e-02    3.40820680e-03
       -2.57466949e-03   -1.06481222e-02    -8.35155410e-03   1.21653512e-12]","[-6.12189619e+02       1.03584744e+04     2.34417495e+02     7.01761526e+01
       3.92495170e+01    1.81609738e+01     2.58114624e+01    1.52275550e+01
       8.59676934e+00    9.45036161e-01     7.71943506e+00    4.17516432e+00
       1.27920413e+00    3.68862368e+00     1.99582544e+00    3.82999035e+00
       2.96068511e-01    9.06341796e-01     2.35621065e+00    1.52094079e+00
       8.64565916e-01    5.34605108e-01     4.35456793e-01    4.99450615e-01
       4.57778770e-01    3.10324997e-01     9.90860520e-02    3.68281889e-02
       -2.29532895e-01   2.56108491e-01     2.20284123e-01    1.47727878e-01
       1.77724506e-01    1.52350751e-01     7.07318164e-02    -7.26252404e-02
       1.55364050e-01    4.21222079e-02     6.39113311e-02    1.02558665e-02
       -7.74736016e-03   -3.20368093e-02    -2.51241082e-02   1.21653512e-12]","[-5.03959282e+02       -5.64452044e+02    7.90433958e+03     1.94146598e+02
       1.06178751e+01    5.20957856e+00     7.50891645e+00    4.57943370e+00
       2.65572713e+00    2.96725867e-01     2.43040664e+00    1.32822091e+00
       4.09853169e-01    1.18412873e+00     6.43398990e-01    1.23796528e+00
       9.63975374e-02    2.95295579e-01     7.68998970e-01    4.98040980e-01
       2.84036936e-01    1.76004564e-01     1.43527613e-01    1.64765236e-01
       1.51171075e-01    1.02586637e-01     3.27835810e-02    1.21872869e-02
       -7.59824907e-02   8.48217334e-02     7.29953754e-02    4.89750588e-02
       5.89426950e-02    5.05485266e-02     2.34761263e-02    -2.41095452e-02
       5.15952510e-02    1.39933210e-02     2.12354074e-02    3.40820680e-03
       -2.57466949e-03   -1.06481222e-02    -8.35155410e-03   1.21653512e-12]"
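
As an alternative to the line-by-line loop, the same idea (use [ and ] as row boundaries, then split on whitespace) could be sketched with a regular expression over the whole file contents. This is a sketch that assumes the file fits in memory and that brackets are never nested; the function name parse_bracketed_rows is hypothetical. Unlike the loop, it converts the values to floats and also copes with several bracketed groups on one line.

```python
import re


def parse_bracketed_rows(text):
    """Return one list of floats per [...] group found in `text`."""
    # [^\]]* matches everything up to the next "]", including newlines.
    groups = re.findall(r"\[([^\]]*)\]", text)
    return [[float(value) for value in group.split()] for group in groups]


# Small stand-in for the real file contents.
sample = '"[ 1.5   2.0\n   -3e-2]","[4.0 5.0\n 6.0]"'
rows = parse_bracketed_rows(sample)
print(rows)  # [[1.5, 2.0, -0.03], [4.0, 5.0, 6.0]]
```

For the real data, the text would come from reading the input file in one go, for example with open("somefile.txt") as infile followed by infile.read().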