Tags: python, pandas, docker, docker-image, seal

How to access CSV file (located in pc hdd) from a docker container with python pandas?


I want to implement a machine learning algorithm that can operate on homomorphically encrypted data using the PySEAL library. PySEAL is released as a Docker container with an 'examples.py' file that shows some homomorphic encryption examples. I want to edit 'examples.py' to implement the ML algorithm. I am trying to import a CSV file in this way -

dataset = pd.read_csv ('Dataset.csv')

I have imported the pandas library successfully. I have tried many approaches to import the CSV file, but all of them failed. How can I import it?

I am new to Docker. A detailed procedure would be really helpful.


Solution

  • You can either do it via the Docker build process (assuming you are the one creating the image) or through a volume mapping that would be accessed by the container during runtime.

    Building source with Dataset.csv within

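    For access through the build, you can add a COPY instruction to the Dockerfile to place the file inside the container's workspace. The source path is resolved relative to the build context, so Dataset.csv should sit next to the Dockerfile:

    FROM python:3.7
    
    COPY Dataset.csv /app/Dataset.csv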
    ...
    

    Then you can read the file directly at /app/Dataset.csv from inside the container using the pandas.read_csv() function, like -

    data = pandas.read_csv('/app/Dataset.csv')
    

    Mapping volume share for Dataset.csv

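    If you don't have direct control over the source image creation, or do not want the dataset packaged with the container (which may be the best practice depending on the use case), you can share it through a volume mapping when starting the container.

    Assuming your Dataset.csv is in /my/user/dir/Dataset.csv on the host (both the host path and the container path in the mapping must be absolute)

    From CLI:

    docker run -v /my/user/dir:/app my-python-container

    Note that the mount hides anything the image already had at /app, so pick a different container directory if the image keeps files there.

    Inside the container, read the file from the mount point:

    dataset = pd.read_csv('/app/Dataset.csv')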
    

    The benefit of the latter solution is that you can keep editing 'Dataset.csv' on your host: the container sees the same file, so changes made by you on the host, or by the Python process inside the container, show up on both sides without rebuilding the image.
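
Whichever approach is used, a read_csv failure inside the container usually means the file is not where the script expects it. The following is a minimal sketch of a loader that reports what the container can actually see before handing the path to pandas. The environment variable name DATASET_PATH and the helper names are hypothetical (they are not part of PySEAL, Docker, or pandas), and /app/Dataset.csv is only the default path used in this answer.

```python
import os
from pathlib import Path

# Hypothetical environment variable used to override the path; not a Docker or PySEAL setting.
DATASET_ENV_VAR = "DATASET_PATH"
# Default location used in this answer (COPY target or volume mount point).
DEFAULT_DATASET_PATH = "/app/Dataset.csv"


def resolve_dataset_path(default=DEFAULT_DATASET_PATH):
    """Return the dataset path, or raise a descriptive error if it is missing."""
    path = Path(os.environ.get(DATASET_ENV_VAR, default))
    if not path.is_file():
        parent = path.parent
        if parent.is_dir():
            # List what the container sees, which helps spot a wrong mount.
            seen = sorted(p.name for p in parent.iterdir())
        else:
            seen = "directory does not exist"
        raise FileNotFoundError(
            f"{path} is not visible inside the container. "
            f"Contents of {parent}: {seen}. "
            "Check the COPY instruction or the -v mapping."
        )
    return path


def load_dataset():
    """Read the CSV into a DataFrame once the path has been verified."""
    # Imported here so the path check can run even where pandas is not installed.
    import pandas as pd

    return pd.read_csv(resolve_dataset_path())
```

Calling load_dataset() from 'examples.py' would then either return the DataFrame or fail with a message that lists the contents of the directory as seen from inside the container, which makes a missing or misplaced mount easier to spot.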