Tags: python, pandas, visual-studio-code, parquet

Pandas read_parquet works in Python, but not in VSCode


I'm trying to read a parquet folder using the following code:

import pandas as pd
df = pd.read_parquet('PASP0001.parquet')

I'm working in a virtual environment. The code works perfectly if I open a Python session from the terminal (note the output of the which command):

(.venv) (base) vado@DESKTOP-JROHEGR:~/python-projects/SUSano/pysus$ which python
/home/vado/python-projects/SUSano/.venv/bin/python
(.venv) (base) vado@DESKTOP-JROHEGR:~/python-projects/SUSano/pysus$ python
Python 3.12.3 | packaged by Anaconda, Inc. | (main, May  6 2024, 19:46:43) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.read_parquet('PASP0001.parquet')
>>> df
       PA_CONDIC PA_GESTAO PA_CODUNI PA_DATREF PA_CODPRO  ...    PA_NUMAPA PA_CODOCO PA_CIDPRI PA_CIDSEC PA_MORFOL
0             PB    353950    000779    200001   0201205  ...  00000000000       S04                              
1             PB    353950    000779    200001   0201205  ...  00000000000       S04                              
2             PB    353950    000779    200001   0201205  ...  00000000000       S04                              
3             PB    353950    000779    200001   0201205  ...  00000000000       S04                              
4             PB    353950    000779    200001   0201205  ...  00000000000       S04                              
...          ...       ...       ...       ...       ...  ...          ...       ...       ...       ...       ...
725999        MP    354870    016046    200001   1101138  ...  00011401577       S01      Q610      N180          
726000        MP    354870    016046    200001   1101138  ...  00011401588       S01      N039      N180          
726001        MP    354870    016046    200001   1101138  ...  00011401599       S01      N039      N180          
726002        MP    354870    016046    200001   1101138  ...  00011401600       S01      I10       N180          
726003        MP    354870    016046    200001   1101138  ...  00011401610       S01      N390      N180          

[726004 rows x 24 columns]
>>> 

But when I execute it in VSCode (I'm new to VSCode) I get an error message:

ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
A suitable version of pyarrow or fastparquet is required for parquet support.
 - Missing optional dependency 'pyarrow'. pyarrow is required for parquet support. Use pip or conda to install pyarrow.
 - Missing optional dependency 'fastparquet'. fastparquet is required for parquet support. Use pip or conda to install fastparquet.

In both cases the same Python interpreter is being used: [screenshot]
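A quick way to check that claim is to have the script itself report which interpreter is running it and whether that interpreter can see a parquet engine, then run it once from the terminal and once from VSCode. This is only a diagnostic sketch and not part of the original post; the function name describe_environment is hypothetical.

```python
import importlib.util
import sys


def describe_environment(engines=("pyarrow", "fastparquet")):
    """Report the running interpreter and which parquet engines it can import.

    `describe_environment` is a hypothetical helper name used for illustration.
    """
    return {
        # Path of the interpreter that is executing this code.
        "executable": sys.executable,
        # Root directory of the active environment (the venv, if one is in use).
        "prefix": sys.prefix,
        # True for each engine the interpreter can locate, False otherwise.
        "engines": {
            name: importlib.util.find_spec(name) is not None for name in engines
        },
    }


if __name__ == "__main__":
    info = describe_environment()
    print("executable:", info["executable"])
    print("prefix:    ", info["prefix"])
    for name, available in info["engines"].items():
        print(f"{name}: {'found' if available else 'missing'}")
```

If the two runs print different executable paths, the ImportError most likely comes from VSCode using an interpreter that has neither pyarrow nor fastparquet installed.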

I'm working on WSL Ubuntu.

I need help!!


Solution

  • Thanks, Jay and Javad, for your kind answers. After trying many of your suggestions I found how to fix it. First of all, I found where VSCode shows the interpreter in use. In fact, it is quite evident. [screenshot of VSCode]

    Then I discovered that if you start VSCode from a bash terminal using code ., you should do so from the folder that contains the .venv directory. I was running code . with the venv activated, but from a sub-directory. Presumably VSCode only picks up a .venv that sits in the folder it was opened on, so it fell back to an interpreter that had no pyarrow installed.
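To find the right folder to open, one option is to walk up from the current directory until a .venv directory turns up. The sketch below assumes the environment directory is literally named .venv, as in the question; the function name find_venv_root is hypothetical.

```python
from pathlib import Path


def find_venv_root(start, venv_name=".venv"):
    """Return the nearest directory at or above `start` that contains `venv_name`.

    Returns None if no such directory exists. `find_venv_root` is a
    hypothetical helper name used for illustration.
    """
    start = Path(start).resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / venv_name).is_dir():
            return candidate
    return None


if __name__ == "__main__":
    root = find_venv_root(Path.cwd())
    if root is None:
        print("No .venv found at or above the current directory")
    else:
        print("Open VSCode on this folder:", root)
```

With the layout shown in the question, running this from the pysus sub-directory should point at the SUSano folder, which is the one to run code . from.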