Tags: mysql, git, docker, content-management-system, modx

How should I dockerize a CMS, such that MySQL works nice with git?


I want to dockerize a MODX application for development and store it in git (again, for development). There's a solution here, but then all the MySQL data files are binary, and the database also cares about their permissions. I'd like to either:

  • put all of MySQL's data in a single massive binary file, so I don't have to care about permissions and can put it in LFS, or
  • somehow export the database to an SQL file on container shutdown and import it on launch, so I can use diffs (roughly along the lines of the sketch below).
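
For the second option, what I have in mind is something along these lines (a rough sketch only, assuming the official mysql image as the base and its standard MYSQL_ROOT_PASSWORD/MYSQL_DATABASE environment variables; the wrapper script itself and the dump path are placeholders):

    #!/bin/bash
    # Rough sketch: the official mysql image already imports any *.sql file found
    # in /docker-entrypoint-initdb.d/ when it initializes a fresh data directory,
    # so this entrypoint wrapper only adds the "dump on shutdown" half.
    set -e

    dump_and_stop() {
        # `docker stop` sends SIGTERM: write a fresh text dump, then stop mysqld
        mysqldump -uroot -p"$MYSQL_ROOT_PASSWORD" "$MYSQL_DATABASE" \
            > /docker-entrypoint-initdb.d/dump.sql
        mysqladmin -uroot -p"$MYSQL_ROOT_PASSWORD" shutdown
    }
    trap dump_and_stop SIGTERM

    # Run mysqld through the image's own entrypoint, in the background,
    # and wait on it so the trap above can fire.
    docker-entrypoint.sh mysqld &
    MYSQLD_PID="$!"
    wait "$MYSQLD_PID"    # returns early when SIGTERM arrives and the trap runs
    wait "$MYSQLD_PID"    # then wait for mysqld itself to finish shutting down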

Solution

  • So I've actually implemented a partial solution to this problem (I'm still learning to use Docker, but this approach covers everything else).

    I use MODx as my CMS of choice; however, this should in theory work for other CMSes too.

    In my git workflow, I have a pre-commit hook that runs mysqldump to export the database into a series of SQL files, which, when deploying to production, serve as the input to mysql for recreating the entire database.
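
    Stripped to its bare essence, that idea is just a mysqldump into the repository followed by a git add. A minimal sketch (the database name and paths here are placeholders; the real hook further below does considerably more):

    #!/bin/bash
    # Minimal pre-commit sketch: dump the database as plain SQL and stage it.
    # "my_database" and the sql/ path are placeholders for your own setup.
    mysqldump --defaults-file=.git/hooks/mysql-defaults \
        --skip-comments --skip-extended-insert my_database > sql/my_database.sql
    git add sql/my_database.sql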

    Some of the code in the following example is not directly related to the answer. It is also worth noting that I add a final column to each table of the database which separates individual rows of data into the different branches of the git repo (my chosen workflow involves three parallel branches, for local development, staging, and production respectively).
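
    That branch column is the flighter_key you will see in the dumps below; adding it to one of MODx's tables might look roughly like this (the table name and column definition are assumptions, not verbatim from my schema):

    # Hypothetical: add the branch/key column that the dumps below filter on.
    # Rows whose key is NULL are treated as shared baseline data.
    mysql --defaults-file=.git/hooks/mysql-defaults ka_local2019 \
        -e "ALTER TABLE modx_site_content ADD COLUMN flighter_key VARCHAR(64) NULL DEFAULT NULL;"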

    The sample code below is the pre-commit hook from one of my older projects. I no longer use that project, but essentially the same code is still in use (with a few exceptions unrelated to this post). It goes far beyond the question, because it is verbatim from my repo, but perhaps it might spark some inspiration.

    In this example you'll also see references to "lists", which are text files containing the names of the various individual repos plus some settings; these get read into bash arrays (the per-prefix counters further down use an associative array, which requires bash 4.0 or higher). There is also a reference to 'mysql-defaults', a text file containing my database credentials so the script can run without prompting.
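
    For reference, 'mysql-defaults' is just a standard MySQL option file, and each list is a plain newline-separated set of names; a minimal sketch of both (every value below is a placeholder):

    # Create the credentials file used by the mysql/mysqldump aliases in the hook
    # below (placeholder values); the [client] group is read by both tools.
    printf '%s\n' '[client]' 'user=root' 'password=secret' 'host=127.0.0.1' 'port=8889' \
        > .git/hooks/mysql-defaults

    # Each "list" is simply one repo name per line (hypothetical names shown).
    printf '%s\n' 'gallery' 'newsletter' > /Users/cjholowatyj/Dev/modules-list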

    #!/bin/bash
    
    # Set repository and script variables
    REPO_NAME='MODX';REPO_FOLDER='modx';REPO_KEY='modx';REPO_TYPE='MODX';
    
    declare -A REPO_PREFIX_COUNTS=();  # associative array keyed by table prefix (requires bash 4+)
    
    MODULES_STRING=$(cat /Users/cjholowatyj/Dev/modules-list | tr "\n" " ");MODULES_ARRAY=(${MODULES_STRING});# echo ${MODULES_ARRAY[1]};
    PROJECTS_STRING=$(cat /Users/cjholowatyj/Dev/projects-list | tr "\n" " ");PROJECTS_ARRAY=(${PROJECTS_STRING});# echo ${PROJECTS_ARRAY[1]};
    THEMES_STRING=$(cat /Users/cjholowatyj/Dev/themes-list | tr "\n" " ");THEMES_ARRAY=(${THEMES_STRING});# echo ${THEMES_ARRAY[1]};
    
    alias mysql='/Applications/MAMP/Library/bin/mysql --defaults-file=.git/hooks/mysql-defaults';
    alias dump='/Applications/MAMP/Library/bin/mysqldump --defaults-file=.git/hooks/mysql-defaults';
    alias dump-compact='/Applications/MAMP/Library/bin/mysqldump --defaults-file=.git/hooks/mysql-defaults --no-create-info --skip-add-locks --skip-disable-keys --skip-comments --skip-extended-insert --compact';
    shopt -s expand_aliases
    
    # Print status message in terminal console
    /bin/echo "Running ${REPO_NAME} Pre-Commits...";
    
    # Switch to repository directory
    # shellcheck disable=SC2164
    cd "/Users/cjholowatyj/Dev/${REPO_FOLDER}/";
    
    # Fetch database tables dedicated to this repository
    mysql -N information_schema -e "select table_name from tables where table_schema = 'ka_local2019' and table_name like '${REPO_KEY}_%'" | tr '\n' ' ' > sql/${REPO_KEY}_tables.txt;
    tablesExist=$(wc -c "sql/${REPO_KEY}_tables.txt" | awk '{print $1}')
    
    # Reset pack_ sql files
    if [[ -f sql/pack_structure.sql ]]; then rm sql/pack_structure.sql; fi
    if [[ -f sql/pack_data.sql ]]; then rm sql/pack_data.sql; fi
    touch sql/pack_structure.sql
    touch sql/pack_data.sql
    
    dump --add-drop-database --no-create-info --no-data --skip-comments --databases ka_local2019 >> sql/pack_structure.sql
    
    # Process repository tables & data
    if [[ ${tablesExist} -gt 0 ]]; then
      dump --no-data --skip-comments ka_local2019 --tables `cat sql/${REPO_KEY}_tables.txt` >> sql/pack_structure.sql
      dump-compact ka_local2019 --tables `cat sql/${REPO_KEY}_tables.txt` --where="flighter_key IS NULL" >> sql/pack_data.sql
      sed -i "" -E "s/AUTO_INCREMENT=[0-9]+[ ]//g" sql/pack_structure.sql
    fi
    dump-compact ka_local2019 --where="flighter_key='${REPO_KEY}'" >> sql/pack_data.sql
    isLocalHead=$(grep -c cjholowatyj .git/HEAD);
    if [[ ${isLocalHead} = 1 ]]; then
      dump-compact ka_local2019 --where="flighter_key='${REPO_KEY}-local'" >> sql/pack_data.sql
      sed -i "" "s/\.\[${REPO_KEY}-local]//g" sql/pack_data.sql
    fi
    isDevelopHead=$(grep -c develop .git/HEAD);
    if [[ ${isDevelopHead} = 1 ]]; then
      dump-compact ka_local2019 --where="flighter_key='${REPO_KEY}-develop'" >> sql/pack_data.sql
      sed -i "" "s/\.\[${REPO_KEY}-develop]//g" sql/pack_data.sql
      sed -i "" "s/ka_local2019/ka_dev2019/g" sql/pack_structure.sql
    fi
    isReleaseHead=$(grep -c release .git/HEAD);
    if [[ ${isReleaseHead} = 1 ]]; then
      dump-compact ka_local2019 --where="flighter_key='${REPO_KEY}-release'" >> sql/pack_data.sql
      sed -i "" "s/\.\[${REPO_KEY}-release]//g" sql/pack_data.sql
      sed -i "" "s/ka_local2019/ka_rel2019/g" sql/pack_structure.sql
    fi
    
    # Create master structure sql file for this repository (and delete it once again if it is empty)
    awk '/./ { e=0 } /^$/ { e += 1 } e <= 1' < sql/pack_structure.sql > sql/${REPO_KEY}_structure.sql
    structureExists=$(wc -c "sql/${REPO_KEY}_structure.sql" | awk '{print $1}')
    if [[ ${structureExists} -eq 0 ]]; then rm sql/${REPO_KEY}_structure.sql; fi
    
    # Create master sql data file in case the entire database needs to be rebuilt from scratch
    awk '/./ { e=0 } /^$/ { e += 1 } e <= 1' < sql/pack_data.sql > sql/all_${REPO_KEY}_data.sql
    
    # Commit global repository sql files
    git add sql/all_${REPO_KEY}_data.sql
    if [[ ${structureExists} -gt 0 ]]; then git add sql/${REPO_KEY}_structure.sql; fi
    
    # Deleting any existing sql files to recreate them fresh below
    if [[ -f sql/create_modx_data.sql ]]; then rm sql/create_modx_data.sql; fi
    if [[ -f sql/create_flighter_data.sql ]]; then rm sql/create_flighter_data.sql; fi
    for i in "${MODULES_ARRAY[@]}"
    do
        if [[ -f sql/create_${i}_data.sql ]]; then rm sql/create_${i}_data.sql; fi
    done
    if [[ -f sql/create_${REPO_KEY}_data.sql ]]; then rm sql/create_${REPO_KEY}_data.sql; fi
    
    # Parse global repository data and separate out data filed by table prefix
    lastPrefix='';
    lastTable='';
    while IFS= read -r iLine;
    do
        thisLine="${iLine}";
        thisPrefix=$(echo ${thisLine} | grep -oEi '^INSERT INTO `([0-9a-zA-Z]+)_' | cut -d ' ' -f 3 | cut -d '`' -f 2 | cut -d '_' -f 1);
        thisTable=$(echo ${thisLine} | grep -oEi '^INSERT INTO `([0-9a-zA-Z_]+)`' | cut -d ' ' -f 3 | cut -d '`' -f 2);
        if [[ $(echo -n ${thisPrefix} | wc -m) -gt 0 ]]; then
             if [[ -n "${REPO_PREFIX_COUNTS[$thisPrefix]}" ]]; then
                  if [[ ${REPO_PREFIX_COUNTS[$thisPrefix]} -lt 1  ]]; then
                       if [[ -f sql/create_${thisPrefix}_data.sql ]]; then rm sql/create_${thisPrefix}_data.sql; fi
                       touch "sql/create_${thisPrefix}_data.sql";
                  fi
                  REPO_PREFIX_COUNTS[$thisPrefix]=0;
              fi
             REPO_PREFIX_COUNTS[$thisPrefix]+=1;
             echo "${thisLine}" >> sql/create_${thisPrefix}_data.sql;
             if [[ ${thisTable} != ${lastTable} ]]; then
                 if [[ ${thisPrefix} != ${lastPrefix} ]]; then
                     if [[ -f sql/delete_${thisPrefix}_data.sql ]]; then rm sql/delete_${thisPrefix}_data.sql; fi
                     touch "sql/delete_${thisPrefix}_data.sql";
                 fi
                 if [[ $(echo -n ${thisTable} | wc -m) -gt 0 ]]; then
                     echo "DELETE FROM \`${thisTable}\` WHERE \`flighter_key\` LIKE '${REPO_KEY}%';" >> sql/delete_${thisPrefix}_data.sql
                 fi
             fi
             # Add previous prefix sql file to git if lastPrefix isn't ''
             if [[ $(echo -n ${lastPrefix} | wc -m) -gt 0 ]]; then
                 git add "sql/create_${lastPrefix}_data.sql";
                 git add "sql/delete_${lastPrefix}_data.sql";
             fi
        fi
        lastPrefix=${thisPrefix};
        lastTable=${thisTable};
    done < sql/all_${REPO_KEY}_data.sql
    # Add previous prefix sql file to git for the final lastPrefix value
    if [[ $(echo -n ${lastPrefix} | wc -m) -gt 0 ]]; then
        git add "sql/create_${lastPrefix}_data.sql";
        git add "sql/delete_${lastPrefix}_data.sql";
    fi
    
    # Clean up unused files
    rm "sql/${REPO_KEY}_tables.txt";
    rm "sql/pack_data.sql";
    rm "sql/pack_structure.sql";
    
    git add sql/;
    
    

    A few nuances worth noting:

    (1) My code strips out all the AUTO_INCREMENT cursors from each table, because they were creating a lot of unnecessary changes in the SQL files and made commits more complicated than they needed to be.

    (2) My code also strips out the database name itself, because on the production server I will be specifying the database to use, and it is not the same name as the one I use for local development; we don't want the data going to the wrong place.

    (3) This workflow also separates the database structure from the data itself in the files committed to git, which may be a source of confusion if you didn't already pick up on that.

    On the flip side, when deploying the project to a server, I also have code which simply iterates through all of the *.sql files and imports them into my database one at a time. I won't share the exact code for security reasons, but the general gist is: mysql mysql_database < database_file.sql
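
    As a rough sketch of that import step (this is not my actual deploy code; the defaults file, database name, and glob ordering are assumptions), it boils down to importing the structure files first, then the data files:

    #!/bin/bash
    # Hypothetical deploy-side import: load structure first, then delete/create data.
    # The defaults file and database name are placeholders for the production setup.
    shopt -s nullglob   # skip a loop entirely if its glob matches nothing
    import() { mysql --defaults-file=/etc/deploy/mysql-defaults ka_production < "$1"; }

    for f in sql/*_structure.sql;   do import "$f"; done
    for f in sql/delete_*_data.sql; do import "$f"; done
    for f in sql/create_*_data.sql; do import "$f"; done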