Tags: node.js, amazon-web-services, docker, aws-lambda, pdf2htmlex

Install Linux package in Node.js image for AWS Lambda


I am trying to build a container image for a Node.js Lambda function. My Dockerfile looks like this:

FROM public.ecr.aws/lambda/nodejs:20

COPY index.js ${LAMBDA_TASK_ROOT}

CMD [ "index.handler" ]

However, my Node.js function also uses the pdf2htmlEX package. One way to install it is with apt-get, but running apt-get in the above Dockerfile returns a "command not found" error. That is understandable, because apt-get is not available in the AWS Node.js image.

Maybe that's not the way to do it. Ultimately, how do I get a Node.js Lambda function to execute a Linux package (pdf2htmlEX in this case)?


Update 1: After testing various options and combinations, I created an alternative base image in order to install pdf2htmlEX, Node.js, and npm:

ARG FUNCTION_DIR="/function"
FROM ubuntu:18.04
ARG FUNCTION_DIR
ENV NODE_VERSION=16.13.0
RUN apt-get update
COPY ./pdf2htmlEX.deb /tmp
# Install pdf2htmlEX and node.js and related packages
RUN apt-get install -y /tmp/pdf2htmlEX.deb curl cmake autoconf libtool
RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
ENV NVM_DIR=/root/.nvm
RUN . "$NVM_DIR/nvm.sh" && nvm install ${NODE_VERSION}
RUN . "$NVM_DIR/nvm.sh" && nvm use v${NODE_VERSION}
RUN . "$NVM_DIR/nvm.sh" && nvm alias default v${NODE_VERSION}
ENV PATH="/root/.nvm/versions/node/v${NODE_VERSION}/bin/:${PATH}"
RUN mkdir -p ${FUNCTION_DIR}
COPY index.js package.json ${FUNCTION_DIR}
WORKDIR ${FUNCTION_DIR}
RUN npm install aws-lambda-ric
CMD [ "index.handler" ]

The build failed when installing aws-lambda-ric, the runtime interface client required when using an alternative base image. The error log is too long to post here, so maybe it's not the right config.
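For context on why aws-lambda-ric is required: a custom image has to talk to the Lambda Runtime API itself, and the runtime interface client is the piece that does that. Below is a heavily simplified sketch of that polling loop, assuming Node 18+ (global `fetch`) and the documented Runtime API paths under `2018-06-01`. `runtimeUrls` and `runLoop` are hypothetical names, and this is an illustration of the protocol, not a substitute for aws-lambda-ric.

```javascript
// Hypothetical illustration (not part of the original post) of the loop aws-lambda-ric implements.
// Omitted on purpose: init-error reporting, the context object, response streaming.
const API_VERSION = "2018-06-01";

// Builds the Runtime API endpoints for a given host:port and request id.
function runtimeUrls(api, requestId = "") {
  const base = `http://${api}/${API_VERSION}/runtime`;
  return {
    next: `${base}/invocation/next`,
    response: `${base}/invocation/${requestId}/response`,
    error: `${base}/invocation/${requestId}/error`,
  };
}

// Long-polls for events, calls the handler, and reports the result or the error.
async function runLoop(handler, api = process.env.AWS_LAMBDA_RUNTIME_API) {
  if (!api) throw new Error("AWS_LAMBDA_RUNTIME_API is not set");
  for (;;) {
    const res = await fetch(runtimeUrls(api).next);
    // The request id arrives in a response header (header lookup is case-insensitive).
    const requestId = res.headers.get("lambda-runtime-aws-request-id");
    const event = await res.json();
    const urls = runtimeUrls(api, requestId);
    try {
      const result = await handler(event);
      await fetch(urls.response, { method: "POST", body: JSON.stringify(result) });
    } catch (err) {
      await fetch(urls.error, {
        method: "POST",
        body: JSON.stringify({
          errorMessage: String(err?.message ?? err),
          errorType: err?.name ?? "Error",
        }),
      });
    }
  }
}

module.exports = { runtimeUrls, runLoop };
```

Outside Lambda (or the emulator) the variable is unset, which is why the sketch refuses to start rather than polling nothing.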


Update 2: Another attempt, using node:20-buster:

ARG FUNCTION_DIR="/function"

FROM node:20-buster as build-image

# Include global arg in this stage of the build
ARG FUNCTION_DIR

COPY ./pdf2htmlEX.deb /tmp
# Install build dependencies
RUN apt-get update && \
apt-get install -y \
/tmp/pdf2htmlEX.deb

This produced a different error:

The following packages have unmet dependencies:
pdf2htmlex : Depends: libjpeg-turbo8 but it is not installable
Unable to correct problems, you have held broken packages.

Solution

  • Summary

    • The public AWS Lambda images for Node.js are based on Amazon Linux (an RPM-based distribution related to CentOS/Fedora), so they have no apt-get:
      • FROM public.ecr.aws/lambda/nodejs:20
      • FROM amazon/aws-lambda-nodejs:18
    • pdf2htmlEX does not support CentOS or Fedora
    • If you build your own Docker image from Ubuntu without following the AWS specifications, it will throw errors or behave unexpectedly when you deploy it to a real AWS account
    • The specifications I discovered are: the OS is read-only except for /tmp, and aws-lambda-ric (the runtime interface client) is required
    • To build locally in a way that is compatible with the real AWS servers, you need aws-lambda-runtime-interface-emulator
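Because only /tmp is writable at run time, conversion output has to go there. A minimal sketch, assuming pdf2htmlEX is on PATH and that the installed build accepts the `--dest-dir` option (check `pdf2htmlEX --help` in your image); `buildConvertArgs` and `convertPdf` are hypothetical helper names, and the input PDF is assumed to have been written to /tmp already.

```javascript
// Hypothetical helpers (not from the original answer) for running pdf2htmlEX inside Lambda.
const { execFile } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);

// Builds the pdf2htmlEX argument list, refusing output directories outside /tmp,
// because /tmp is the only writable location in the Lambda environment.
function buildConvertArgs(pdfPath, outDir = "/tmp/out") {
  const resolved = path.resolve(outDir);
  if (resolved !== "/tmp" && !resolved.startsWith("/tmp/")) {
    throw new Error(`output directory must be under /tmp, got ${resolved}`);
  }
  // Assumed CLI shape: pdf2htmlEX [options] <input.pdf>
  return ["--dest-dir", resolved, pdfPath];
}

// Runs pdf2htmlEX directly (no shell), so file names are passed through verbatim.
async function convertPdf(pdfPath, outDir = "/tmp/out") {
  const args = buildConvertArgs(pdfPath, outDir);
  await fs.promises.mkdir(args[1], { recursive: true });
  const { stdout, stderr } = await execFileAsync("pdf2htmlEX", args);
  return { outDir: args[1], stdout, stderr };
}

module.exports = { buildConvertArgs, convertPdf };
```

Inside the handler this would be called as `await convertPdf("/tmp/input.pdf")`. Using `execFile` instead of `exec` avoids a shell, so file names with spaces or quotes need no escaping.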

    Attempts

    • I tried to convert the .deb to an .rpm (using the alien tool) and then install it on the AWS Lambda image. After fixing almost all of the errors, one last error was fatal: something about a JPEG library

    Steps

    • Build a base image compatible with real aws servers and local testing
    • Finally, install the pdf2htmlEX tool and its prerequisites

    Folder and files

    I tested this image locally and in a real AWS account.

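    The folder should look like this (screenshot not reproduced): Dockerfile and entry.sh at the top level, with the function code, including app.js, in an app/ subfolder.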

    Dockerfile

    # Get a base image
    FROM public.ecr.aws/ubuntu/ubuntu:22.04
    
    # Set some defaults
    ARG LAMBDA_TASK_ROOT="/app"
    ARG LAMBDA_RUNTIME_DIR="/usr/local/bin"
    ARG PLATFORM="linux/amd64"
    
    RUN groupadd --gid 1000 node; \
        useradd --uid 1000 --gid node --shell /bin/bash --create-home node
    
    # node
    ENV NVM_DIR=/usr/local/nvm
    ENV NODE_VERSION=v20.13.0
    RUN mkdir -p /usr/local/nvm && apt-get update && apt-get install -y curl
    RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
    RUN /bin/bash -c "source $NVM_DIR/nvm.sh && nvm install $NODE_VERSION && nvm use --delete-prefix $NODE_VERSION"
    ENV NODE_PATH=$NVM_DIR/versions/node/$NODE_VERSION/bin
    ENV PATH=$NODE_PATH:$PATH
    
    WORKDIR /app
    
    ## Install aws-lambda-ric
    RUN apt-get update && \
        apt-get install -y \
            g++ \
            make \
            cmake \
            unzip \
            libcurl4-openssl-dev \
            autoconf \
            automake \
            build-essential \
            libtool \
            m4 \
            python3 \
            libssl-dev && \
        rm -rf /var/lib/apt/lists/*
    RUN npm install aws-lambda-ric -g
    
    # Copy function code
    COPY ./app/ ${LAMBDA_TASK_ROOT}/
    RUN npm install 
    
    
    # Prevent this warning:
    # npm WARN logfile Error: ENOENT: no such file or directory, scandir '/home/sbx_user1051/.npm/_logs'
    # https://stackoverflow.com/a/73394694/3957754
    RUN mkdir -p /tmp/.npm/_logs
    ENV npm_config_cache=/tmp/.npm
    
    # (Optional) Add Lambda Runtime Interface Emulator and use a script in the ENTRYPOINT for simpler local runs
    WORKDIR ${LAMBDA_TASK_ROOT}
    ADD "https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie" "/usr/bin/aws-lambda-rie"
    COPY entry.sh /
    RUN chmod 755 "/usr/bin/aws-lambda-rie" "/entry.sh"
    
    ## Install pdf2htmlEX
    
    ENV DEBIAN_FRONTEND=noninteractive
    
    RUN apt-get update && \
        apt-get -y install tzdata libjpeg-turbo8 wget gpg curl xz-utils jq libglib2.0-dev libcairo2-dev
    
    RUN curl -s https://api.github.com/repos/pdf2htmlEX/pdf2htmlEX/releases/latest \
        | jq -r '.assets[] | select(.name=="pdf2htmlEX-0.18.8.rc1-master-20200630-Ubuntu-bionic-x86_64.deb").browser_download_url' \
        | wget -qi - -O /tmp/pdf2htmlEX.deb
    
    RUN dpkg -i /tmp/pdf2htmlEX.deb
    RUN pdf2htmlEX -v
    
    
    ENTRYPOINT [ "/entry.sh" ]
    CMD [ "app.handler" ]
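The curl | jq | wget pipeline in this Dockerfile picks one asset out of the GitHub releases API response by its exact name. The same selection in Node.js, as a sketch that assumes the response shape the jq filter relies on (an `assets` array whose entries carry `name` and `browser_download_url`); `pickAssetUrl` and `latestAssetUrl` are hypothetical helpers, and the fetch part needs Node 18+.

```javascript
// Hypothetical helpers (not from the original answer).
// Mirrors: jq -r '.assets[] | select(.name==NAME).browser_download_url'
function pickAssetUrl(release, assetName) {
  const assets = Array.isArray(release?.assets) ? release.assets : [];
  const match = assets.find((a) => a.name === assetName);
  if (!match) {
    throw new Error(`asset ${assetName} not found in release ${release?.tag_name ?? "(unknown)"}`);
  }
  return match.browser_download_url;
}

// Fetches the latest release metadata for "owner/repo" and selects the asset URL.
async function latestAssetUrl(repo, assetName) {
  const res = await fetch(`https://api.github.com/repos/${repo}/releases/latest`, {
    headers: { Accept: "application/vnd.github+json" },
  });
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
  return pickAssetUrl(await res.json(), assetName);
}

module.exports = { pickAssetUrl, latestAssetUrl };
```

The design choice here is to fail with a message naming the missing asset, rather than relying on how wget behaves when jq prints nothing, which matters if a future "latest" release stops shipping the bionic .deb.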
    

    entry.sh

    #!/bin/sh
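    # AWS_LAMBDA_RUNTIME_API is set by the Lambda environment, not by a plain `docker run`;
    # when it is missing, run the runtime client behind the local emulator.
    if [ -z "${AWS_LAMBDA_RUNTIME_API}" ]; then
      exec /usr/bin/aws-lambda-rie npx aws-lambda-ric "$1"
    else
      exec npx aws-lambda-ric "$1"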
    fi
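When entry.sh starts the emulator (no AWS_LAMBDA_RUNTIME_API set), the function can be invoked over HTTP. A sketch, assuming the container was started with the emulator's port 8080 published as 9000 (for example `docker run -p 9000:8080 <image>`) and using the invocation path from the emulator's documentation; `invokeUrl` and `invokeLocal` are hypothetical helpers and need Node 18+ for the global `fetch`.

```javascript
// Hypothetical helpers (not from the original answer) for calling the local emulator.
// Invocation path documented for the Lambda Runtime Interface Emulator.
const INVOKE_PATH = "/2015-03-31/functions/function/invocations";

function invokeUrl(host = "localhost", port = 9000) {
  return `http://${host}:${port}${INVOKE_PATH}`;
}

// POSTs an event to the emulator and returns the handler result, parsed if it is JSON.
async function invokeLocal(event = {}, port = 9000) {
  const res = await fetch(invokeUrl("localhost", port), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  });
  const text = await res.text();
  try {
    return JSON.parse(text);
  } catch {
    return text; // Non-JSON payloads are returned as plain text.
  }
}

module.exports = { invokeUrl, invokeLocal };
```

With the container running, `invokeLocal({}).then(console.log)` should print the object returned by the handler in app.js.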
    
    

    app.js

    // Promise-based exec so the shell command can be awaited
    const exec = require('util').promisify(require('child_process').exec);
    
    exports.handler = async (event, context) => {
    
        // Print the pdf2htmlEX version; on failure, keep the error object instead of throwing
        let out = await exec(`pdf2htmlEX -v`).catch(e => e);
    
        console.log("pdf2htmlEX command", JSON.stringify(out));
    
        return {
            statusCode: 200,
            body: {
                code: 200,
                message: "Hello!!"
            }
        };
    }
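In app.js, `.catch(e => e)` means `out` is either `{ stdout, stderr }` or an Error object, and `JSON.stringify` omits an Error's `message` because that property is not enumerable. A sketch of a helper that always returns the same shape, assuming you want the exit code and both output streams in the log; `runCommand` is a hypothetical name, and it swaps `exec` for `execFile` (no shell).

```javascript
// Hypothetical helper (not from the original answer).
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);

// Always resolves to { ok, code, stdout, stderr, error } instead of mixing results and Errors.
async function runCommand(file, args = []) {
  try {
    const { stdout, stderr } = await execFileAsync(file, args);
    return { ok: true, code: 0, stdout, stderr, error: null };
  } catch (err) {
    return {
      ok: false,
      // Numeric exit status for non-zero exits, or a string such as "ENOENT" if the binary is missing.
      code: err.code ?? null,
      stdout: err.stdout ?? "",
      stderr: err.stderr ?? "",
      error: err.message, // message is not enumerable, so copy it explicitly
    };
  }
}

module.exports = { runCommand };
```

In the handler, `const out = await runCommand("pdf2htmlEX", ["-v"]);` followed by the same `console.log` would then always log a plain object with the exit code and both output streams.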
    

    pdf2htmlEX

    As proof that pdf2htmlEX was installed, I printed the output of pdf2htmlEX -v in three places (screenshots not reproduced): on localhost, inside the container, and in a real AWS account.

    Docker Hub

    I published the image, so it is ready to use:

    https://hub.docker.com/repository/docker/jrichardsz/aws-lambda-nodejs/general