Python project file structure with relative imports, or how to properly structure a project like the one described

I have been trying to solve bioinformatics problems from the website, and I am now facing some trouble when I want to perform some simple testing.

My project is structured in the following way:

├─ bioinformatics_stronghold/
│  ├─ data/
│  ├─ modules/
│  │  ├─
│  │  ├─
│  ├─
│  ├─
├─ tests/
│  ├─
│  ├─
│  ├─

The goal here is to be able to test all the individual files (, etc.) in the bioinformatics stronghold folder. The problem that I have encountered however is:

  • Running works as expected.
  • Running does NOT work.

See all the affected files below:

import pytest
from bioinformatics_stronghold.IEV import calculate_offspring

def test_calculate_offspring():
    assert calculate_offspring([1, 0, 0, 1, 0, 1]) == 3.5
    assert calculate_offspring([1, 1, 1, 1, 1, 1]) == 8.5

def calculate_offspring(input_list: list[int]) -> float:
    """This function will take an input list of non-negative integers no larger than 20,000. The function will then calculate the expected offspring showing the dominant phenotype.

    Args:
        input_list (list): Input a list of integers representing the number of couples

    Returns:
        float: The expected number of offspring
    """
    expected_dominant_offspring = 0
    # For all cases, it is assumed that each couple will have exactly 2 offspring
    for index, count in enumerate(input_list):
        print("Index:", index, "   ", "Num couples:", count)
        # Case AA-AA, all offspring will be dominant phenotype
        if index == 0:
            expected_dominant_offspring += count * 2 * 1
        # Case AA-Aa, all offspring will be dominant phenotype
        elif index == 1:
            expected_dominant_offspring += count * 2 * 1
        # Case AA-aa, all offspring will be dominant phenotype
        elif index == 2:
            expected_dominant_offspring += count * 2 * 1

        # Case Aa-Aa, 3 out of 4 offspring will be dominant phenotype
        elif index == 3:
            expected_dominant_offspring += count * (2 * (3/4))

        # Case Aa-aa, 2 out of 4 offspring will be dominant phenotype
        elif index == 4:
            expected_dominant_offspring += count * (2 * (2/4))

        # Case aa-aa, no offspring will be dominant phenotype
        elif index == 5:
            expected_dominant_offspring += count * 2 * 0

    return expected_dominant_offspring

These two work just fine.

Now to the problematic files...

import pytest
from bioinformatics_stronghold.CONS import find_consensus_sequence

def test_find_consensus_sequence():
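    # The expected value is parenthesized so the result is compared against the
    # (profile_matrix, consensus_sequence) tuple; without the parentheses the
    # second list would be parsed as the assert message.
    assert find_consensus_sequence("tests\\data\\CONS_sample_data.fasta") == (
        [[5, 1, 0, 0, 5, 5, 0, 0], [0, 0, 1, 4, 2, 0, 6, 1], [1, 1, 6, 3, 0, 1, 0, 0], [1, 5, 0, 0, 0, 1, 1, 6]],
        ['A', 'T', 'G', 'C', 'A', 'A', 'C', 'T'],
    )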

Adding the line from bioinformatics_stronghold.modules.read_fasta import read_fasta_file just gives me a ModuleNotFoundError. Adding . or .. in front results in ImportError: attempted relative import with no known parent package.

from modules.read_fasta import read_fasta_file

def find_consensus_sequence(fasta_location):
    """This function will read a given fasta file and extract all sequences using the module.
    The function will then create a profile matrix as well as a consensus sequence, both as lists.

    Args:
        fasta_location (str): The location of the fasta file as a string.

    Returns:
        profile_matrix (list[lists]): The profile matrix of all given sequences.
        consensus_sequence (list): The consensus sequences of all given sequences.
    """
    fasta_content = read_fasta_file(fasta_location, debug=False)
    # Create a matrix with all sequences
    sequence_matrix = []
    for item in fasta_content:
        # ASSUMPTION: the loop body is missing from the post. This line assumes
        # each item is a single sequence string.
        sequence_matrix.append(list(item))
    # print(sequence_matrix)
    # Create the empty profile matrix
    # [A, C, G, T]
    sequence_length = len(sequence_matrix[0])
    profile_matrix = [[0] * sequence_length for _ in range(4)]
    # print(profile_matrix)
    # Add to the nucleotide count depending on the sequence
    for sublist in sequence_matrix:
        for index, nucleotide in enumerate(sublist):
            if nucleotide == "A":
                profile_matrix[0][index] += 1
            if nucleotide == "C":
                profile_matrix[1][index] += 1
            if nucleotide == "G":
                profile_matrix[2][index] += 1
            if nucleotide == "T":
                profile_matrix[3][index] += 1
    # print(profile_matrix)

    consensus_sequence = []
    # NOTE: Ugly solution, but it seems to work. Quite ineffective, but not sure how to improve at this time.
    # For each position in the sequence, check which "letter" is larger than all other
    for index in range(len(profile_matrix[0])):
        a_count = profile_matrix[0][index]
        c_count = profile_matrix[1][index]
        g_count = profile_matrix[2][index]
        t_count = profile_matrix[3][index]
        if a_count > c_count and a_count > g_count and a_count > t_count:
            consensus_sequence.append("A")
        elif c_count > a_count and c_count > g_count and c_count > t_count:
            consensus_sequence.append("C")
        elif g_count > a_count and g_count > c_count and g_count > t_count:
            consensus_sequence.append("G")
        elif t_count > a_count and t_count > c_count and t_count > g_count:
            consensus_sequence.append("T")
    # print(consensus_sequence)
    return profile_matrix, consensus_sequence

This just won't work. The problem seems to be that the modules folder cannot be found.

Adding an to the bioinformatics_stronghold folder does not solve this problem.

If I move the tests folder into the bioinformatics_stronghold folder, pytest just breaks with no apparent error messages, and I cannot set up testing in VSCodium.

My questions then are:

  • Why can pytest not import the read_fasta function in the modules?
  • How should I arrange a project like this so that I can have several of these small scripts while still being able to test them?


  • I think changing the import to this should do it:

    from .modules.read_fasta import read_fasta_file

    If that doesn't do it, there's probably some sort of import issue in, and I'd encourage you to comment here with the full error traceback you're seeing, rather than just the error message.
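To see the difference between the two import styles in isolation, here is a minimal sketch that builds a throwaway package in a temporary directory and tries both. Everything in it (demo_pkg, cons_absolute, cons_relative, the stub read_fasta_file) is a hypothetical stand-in for the real project, and it only approximates what pytest does by putting the project root, rather than the package folder, on sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical miniature of the project layout; all names are illustrative.
FILES = {
    "demo_pkg/__init__.py": "",
    "demo_pkg/modules/__init__.py": "",
    "demo_pkg/modules/read_fasta.py": (
        "def read_fasta_file(path, debug=False):\n"
        "    return ['stub']\n"
    ),
    # Absolute-style import: only resolves if demo_pkg/ itself is on sys.path.
    "demo_pkg/cons_absolute.py": "from modules.read_fasta import read_fasta_file\n",
    # Relative import: resolves whenever demo_pkg is imported as a package.
    "demo_pkg/cons_relative.py": "from .modules.read_fasta import read_fasta_file\n",
}


def build_demo_project(root: Path) -> None:
    """Write the demo files below the given root directory."""
    for relative_path, source in FILES.items():
        target = root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)


def try_import(module_name: str) -> str:
    """Return 'ok' if the import succeeds, else the exception class name."""
    try:
        importlib.import_module(module_name)
        return "ok"
    except ImportError as error:
        return type(error).__name__


def run_demo() -> dict[str, str]:
    """Import both variants with only the project root on sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_project(root)
        sys.path.insert(0, str(root))
        importlib.invalidate_caches()
        try:
            return {
                name: try_import(f"demo_pkg.{name}")
                for name in ("cons_absolute", "cons_relative")
            }
        finally:
            # Undo the sys.path and sys.modules changes made by the demo.
            sys.path.remove(str(root))
            for loaded in [m for m in sys.modules if m.split(".")[0] == "demo_pkg"]:
                del sys.modules[loaded]
```

Calling run_demo() should report an import failure for the cons_absolute variant and success for cons_relative, which mirrors the behaviour described in the question: the plain modules import only works when the script's own folder happens to be on sys.path.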

    Note: Your naming conventions do not follow the PEP 8 guidelines.

    Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability.

    Edit: Here's an example of how to structure your project and make it callable.

    ├─ bioinformatics_stronghold/
    │  ├─ data/
    │  ├─ modules/
    │  │  ├─
    │  │  ├─
    │  ├─
    │  ├─
    │  ├─
    ├─ tests/
    │  ├─
    │  ├─
    │  ├─

    from .modules import read_fasta

    To execute this, you just type python -m bioinformatics_stronghold in the terminal. With a single main entrypoint, you can do all sorts of things, like accepting user input, adding an argparse interface, etc.
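As a rough illustration of the single entrypoint idea, the sketch below shows what an argparse-based main function might look like. It is an assumption about how one could wire things up, not the layout from the question: in a real package the code would live in a __main__.py and import calculate_offspring with a relative import, whereas here a compact table-driven stand-in is defined inline so the example runs on its own.

```python
import argparse

# Probability of a dominant-phenotype child for each couple type, in the order
# AA-AA, AA-Aa, AA-aa, Aa-Aa, Aa-aa, aa-aa (same order as in the question).
DOMINANT_PROBABILITY = (1.0, 1.0, 1.0, 0.75, 0.5, 0.0)


def calculate_offspring(couples):
    """Stand-in for the question's function; each couple has 2 offspring."""
    return sum(2 * count * p for count, p in zip(couples, DOMINANT_PROBABILITY))


def main(argv=None) -> float:
    """Parse six couple counts from the command line and print the result."""
    parser = argparse.ArgumentParser(
        prog="bioinformatics_stronghold",
        description="Expected number of dominant-phenotype offspring.",
    )
    parser.add_argument(
        "couples",
        nargs=6,
        type=int,
        help="number of couples for each of the six genotype pairings",
    )
    args = parser.parse_args(argv)
    result = calculate_offspring(args.couples)
    print(result)
    return result

# In a real __main__.py the file would end with a call to main(), so that
# "python -m bioinformatics_stronghold 1 0 0 1 0 1" runs it.
```

Passing argv explicitly, as in main(["1", "0", "0", "1", "0", "1"]), keeps the function easy to call from a test without touching sys.argv.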

    More reading: