How to read a value returned from a previous function into pandas (Python)?


In the following program, I want to pass the data returned by one function to a downstream function.

The Python code looks something like this:

import pandas as pd

def main():
    data1, data2, data3 = read_file()
    do_calc(data1, data2, data3)

def read_file():
    data1 = ""
    data2 = ""
    data3 = ""

    # calculated_values stands for the text computed from each line
    with open('file1.txt') as file1:
        for line in file1:
            # do something....
            data1 += calculated_values

    with open('file2.txt') as file2:
        for line in file2:
            # do something...
            data2 += calculated_values

    with open('file1.txt') as file1:
        for line in file1:
            # do something...
            data3 += calculated_values

    return data1, data2, data3

def do_calc(data1, data2, data3):
    d1_frame = pd.read_table(data1, sep='\t')
    d2_frame = pd.read_table(data2, sep='\t')
    d3_frame = pd.read_table(data3, sep='\t')

    all_data = [d1_frame, d2_frame, d3_frame]

main()

What is wrong with the given code? It looks like pandas isn't able to read the input properly, but the values from data1, data2 and data3 are printed to the screen.

read_hdf seems to read the file, but not properly. Is there a way to read the data returned from a function directly into pandas (without writing it to a file and reading it back)?

Error message:

Traceback (most recent call last):
  File "calc.py", line 757, in <module>
    main()
  File "calc.py", line 137, in main
    merge_tables(pop1_freq_table, pop2_freq_table, f1_freq_table)
  File "calc.py", line 373, in merge_tables
    df1 = pd.read_table(pop1_freq_table, sep='\t')
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4019)
  File "pandas/parser.pyx", line 665, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:7967)
FileNotFoundError: File b'0.667,0.333\n2\t15800126\tT\tT,A\t0.667,0.333\n2\t15800193\tC\tC,T\t0.667,0.333\n2\t15800244\tT\tT,C\......

I would appreciate any explanation.


Solution

  • pd.read_table(data1, sep='\t') treats data1 as a file path, because data1 is a plain str and has no read() method. The stack trace shows this: pandas tries to open a file whose name is the tab-separated content itself, and fails with FileNotFoundError.

    from read_table help:

    Parameters
    --------
    filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any
    object with a read() method (such as a file handle or StringIO)
    

    Wrap the string in an io.StringIO object so that pandas can read from it.

    Quick fix (requires import io):

    pd.read_table(io.StringIO(data1), sep='\t')
    

    However, that creates a copy of the data. A better fix is to write into io.StringIO buffers directly:

    import io

    def read_file():
        data1 = io.StringIO()

        with open('file1.txt') as file1:
            for line in file1:
                # do something....
                data1.write(calculated_values)

        # in the end
        data1.seek(0)  # reset to start of "file"
        return data1  # build and return data2 and data3 the same way
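
The two approaches above can be tried end to end with a minimal, self-contained sketch. It is only an illustration under stated assumptions: the helper name build_table and the column names are hypothetical, and the two sample rows are taken from the content visible in the error message. It writes tab-separated rows into an io.StringIO buffer, rewinds it, and passes it to pd.read_table; it then shows that a plain string parses to the same frame once it is wrapped in io.StringIO.

```python
import io

import pandas as pd


def build_table(rows):
    """Write tab-separated rows into an in-memory text buffer (hypothetical helper)."""
    buffer = io.StringIO()
    # Hypothetical header; the original files' column names are not shown.
    buffer.write("contig\tpos\tref\talleles\tfreq\n")
    for row in rows:
        buffer.write("\t".join(str(value) for value in row) + "\n")
    buffer.seek(0)  # rewind so pandas reads from the start of the "file"
    return buffer


# Sample rows copied from the content shown in the FileNotFoundError message.
rows = [
    (2, 15800126, "T", "T,A", "0.667,0.333"),
    (2, 15800193, "C", "C,T", "0.667,0.333"),
]

# Approach 1: hand the buffer itself to pandas.
frame = pd.read_table(build_table(rows), sep="\t")

# Approach 2 (quick fix): wrap an existing string in io.StringIO.
text = build_table(rows).getvalue()
frame_from_text = pd.read_table(io.StringIO(text), sep="\t")

print(frame)
print(frame.equals(frame_from_text))
```

Passing text directly, without the io.StringIO wrapper, would reproduce the original problem, since pandas would interpret the string as a path.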