Tags: ubuntu, anaconda, google-colaboratory, bioinformatics, virtual-environment

How do I create a virtual environment for BiG-SCAPE in Google Colab using Conda?


I would like to use BiG-SCAPE (https://git.wageningenur.nl/medema-group/BiG-SCAPE/-/wikis/home) in Google Colab. How can I set it up and run an example?


Solution

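  • Conda (Anaconda or Miniconda) must already be installed in the Colab runtime, which does not include it by default. The commands below assume Conda lives under /usr/local, so that the environment created in /usr/local/envs/bigscape can also be activated by name.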

    The following commands create a Conda environment, install BiG-SCAPE's dependencies into it, clone BiG-SCAPE, and download the Pfam database it needs:

    %%shell
    eval "$(conda shell.bash hook)" # enable "conda activate" in this non-interactive shell
    
    # Create the BiG-SCAPE environment, then install the dependencies into it (this will take a while).
    # hmmer, mafft and fasttree are distributed through bioconda, so the channels are listed explicitly.
    conda create --prefix /usr/local/envs/bigscape python=3.6 -y
    conda install --prefix /usr/local/envs/bigscape -c conda-forge -c bioconda \
      hmmer biopython mafft fasttree networkx numpy scipy scikit-learn=0.19.1 -y
    conda activate /usr/local/envs/bigscape
    
    # Clone the BiG-SCAPE source into the environment folder; bigscape.py is run directly from this checkout
    cd /usr/local/envs/bigscape
    git clone https://git.wur.nl/medema-group/BiG-SCAPE.git
    
    # Download the Pfam-A profile HMMs (release 32.0) and index them with hmmpress
    cd BiG-SCAPE
    mkdir -p databases
    cd databases
    wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam32.0/Pfam-A.hmm.gz && gunzip Pfam-A.hmm.gz
    hmmpress Pfam-A.hmm
    
    # Check that everything was installed correctly
    cd ..
    python bigscape.py --version
    conda deactivate
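
    hmmpress writes four index files next to Pfam-A.hmm (Pfam-A.hmm.h3f, .h3i, .h3m and .h3p). If the download or the pressing step fails, BiG-SCAPE only complains later, so a quick check after the cell above can save a rerun. The sketch below is a hypothetical helper (check_pfam_dir is not part of BiG-SCAPE or HMMER) and assumes the databases folder layout created above.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper, not part of BiG-SCAPE or HMMER: checks that a Pfam
    # directory holds a non-empty Pfam-A.hmm plus the four files hmmpress creates.
    check_pfam_dir() {
      local dir="$1"
      local missing=0
      local f
      for f in Pfam-A.hmm Pfam-A.hmm.h3f Pfam-A.hmm.h3i Pfam-A.hmm.h3m Pfam-A.hmm.h3p; do
        # -s is true only if the file exists and is larger than zero bytes
        if [ ! -s "$dir/$f" ]; then
          echo "missing or empty: $dir/$f" >&2
          missing=1
        fi
      done
      return "$missing"
    }
    ```

    For example, check_pfam_dir /usr/local/envs/bigscape/BiG-SCAPE/databases && echo "Pfam database ready" prints the message only when all five files are present and non-empty.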
    

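    The following commands download an example dataset and run BiG-SCAPE on it. The --outputdir path assumes Google Drive is mounted at /gdrive (for example with drive.mount('/gdrive') from google.colab in a Python cell); point it at any writable folder otherwise: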

    %%shell
    eval "$(conda shell.bash hook)" # enable "conda activate" in this non-interactive shell
    
    # Download example dataset
    cd /usr/local/envs/bigscape/BiG-SCAPE
    mkdir -p demo
    cd demo
    wget https://raw.githubusercontent.com/nselem/bigscape-corason/master/scripts/data_bigscape_corason.sh
    chmod a+x data_bigscape_corason.sh
    bash data_bigscape_corason.sh -b
    
    # Run BiG-SCAPE on example dataset
    conda activate bigscape
    
    python /usr/local/envs/bigscape/BiG-SCAPE/bigscape.py \
      --inputdir /usr/local/envs/bigscape/BiG-SCAPE/demo/gbks \
      --outputdir /gdrive/My\ Drive/Github/cluster_identification/demo/output/BiG-SCAPE \
      --pfam_dir /usr/local/envs/bigscape/BiG-SCAPE/databases
    
    conda deactivate
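
    Two things commonly go wrong with the run above: the input folder may contain no .gbk files if the download script failed, and a path containing a space (My Drive) is easy to break when it is edited. The sketch below is a hypothetical wrapper (run_bigscape and its DRY_RUN switch are illustrative names, not BiG-SCAPE options) that uses only the three BiG-SCAPE flags from the command above. It refuses to start when no .gbk files exist under the input folder at all (BiG-SCAPE may still filter the files it finds), and it keeps paths intact by quoting them in a bash array instead of escaping spaces.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical wrapper around the BiG-SCAPE call above; run_bigscape and
    # DRY_RUN are illustrative names, not part of BiG-SCAPE.
    run_bigscape() {
      local bigscape="$1" indir="$2" outdir="$3" pfamdir="$4"
      local n_gbk
      # Count GenBank files anywhere under the input directory
      n_gbk="$(find "$indir" -type f -name '*.gbk' 2>/dev/null | wc -l)"
      if [ "$n_gbk" -eq 0 ]; then
        echo "no .gbk files found under $indir" >&2
        return 1
      fi
      # Each array element stays one argument, so "My Drive" is not split
      local cmd=(python "$bigscape" --inputdir "$indir" --outputdir "$outdir" --pfam_dir "$pfamdir")
      if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[@]}"   # print one argument per line instead of running
      else
        "${cmd[@]}"
      fi
    }
    ```

    For example, DRY_RUN=1 run_bigscape /usr/local/envs/bigscape/BiG-SCAPE/bigscape.py /usr/local/envs/bigscape/BiG-SCAPE/demo/gbks "/gdrive/My Drive/Github/cluster_identification/demo/output/BiG-SCAPE" /usr/local/envs/bigscape/BiG-SCAPE/databases prints the arguments without running anything; dropping DRY_RUN=1 (with the bigscape environment activated) runs BiG-SCAPE itself.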