Usage
Deployment to AWS
Follow the steps to build and deploy the application to AWS. For information about the architecture deployed please see Architecture.
Quick Start
This list outlines the basic steps for deployment. For more details please see the following sections.
Install prerequisites
Check the config JSONs (parameters and state) and edit the values as desired
Run
make deployto deploy the stacks to AWSRun
make load.database release=<version>to load Neo4jRun
make get.neo4jto get the URL for the Neo4j browser
Prerequisites
Please refer to the respective documentation for specific installation instructions.
GNU Make 3.81
coreutils (optional for macOS)
AWS CLI
SAM CLI
Docker
jq
Environment
AWS Credentials
Valid AWS credentials must be available to AWS CLI and SAM CLI. The easiest way to do this is with the following steps.
Run
aws configureand follow the prompts, or copy/paste them into~/.aws/credentialsExport the
AWS_PROFILEvariable for the chosen profile to the shell environment.
export AWS_PROFILE=default
For more information visit the documentation page: Configuration and credential file settings
Shell Variables
These variables must be present in the shell environment before running
Make. The best way to set these variables is with a .env file
following these steps.
Create a
.envfile in the project root and add the values.
STAGE=<dev or prod>
APP_NAME=gfe-db
AWS_REGION=<AWS region>
NEO4J_USERNAME=<secret>
NEO4J_PASSWORD=<secret>
GITHUB_PERSONAL_ACCESS_TOKEN=<secret>
Source the variables to the environment.
# Export .env file variables
set -a && source .env && set +a
# Check that the variables were set
env
Important
Always use a .env file, AWS SSM Parameter Store, or
Secrets Manager for sensitive variables like credentials and API keys.
Never hard-code them, including when developing. AWS will quarantine an
account if any credentials get accidentally exposed and this will cause
problems. Make sure to update .gitignore to avoid pushing sensitive
data to public repositories.
Makefile Usage
Once an AWS profile is configured and environment variables are
exported, the application can be deployed using make. For more make
commands please see Makefile Command Reference.
make deploy
It is also possible to deploy or update individual services.
# Deploy/update only the infrastructure service
make deploy.infrastructure
# Deploy/update only the database service
make deploy.database
# Deploy/update only the pipeline service
make deploy.pipeline
Note
It is recommended to only deploy from the project root. This is because common parameters are passed from the root Makefile to nested Makefiles. If a stack has not been changed, the deployment script will continue until it reaches a stack with changes and deploy that.
Deploying Configuration Files
To deploy updates to state and/or pipeline input parameters, run the command.
make deploy.config
Clean Up
To tear down resources run the command. You will need to manually delete the data in the S3 bucket first.
make delete
Use the following commands to tear down individual services.
# Delete only the infrastructure service.
make delete.infrastructure
# Delete only the database service
make delete.database
# Delete only the pipeline service
make delete.pipeline
Warning
Deleting and re-deploying a layer may cause the parameters shared by other layers to go out of date. To avoid this the recommendation is to deploy sequentially and teardown in reverse sequence. For example, tearing down and re-deploying the database stack may affect parameters shared with the pipeline stack, causing the pipeline stack to fail. The solution would be to also be tear down and re-deploy the pipeline stack.
Loading Releases
Input Parameters
Base input parameters (excluding the releases value) are passed to
the Step Functions State Machine and determine it’s behavior during
build. These are stored in a configuration file in S3 (see the Data Pipeline
configuration reference) but can be overridden. The releases value
is appended at runtime by the trigger Lambda when it finds a new release
in the source repository.
Variable |
Example Value |
Type |
Description |
|---|---|---|---|
LIMIT |
100 |
string |
Number of alleles to build. Leave blank (“”) to build all alleles. |
ALIGN |
False |
string |
Include or exclude alignments in the build |
KIR |
False |
string |
Include or exclude KIR data alignments in the build |
MEM_PROFILE |
False |
string |
Enable memory profiling (for catching memory leaks during build) |
Pipeline Execution
The data pipeline can also be invoked from the command line.
make load.database releases=<version> align=<boolean> kir=<boolean> limit=<int>
Retrieving logs, data and parameters
See the reference section Retrieve logs, data and configuration values for useful commands.
Developing Locally
Note
This information is incomplete but will be updated soon.
Creating a Python Virtual Environment
When developing locally, you will need to create an individual virtual
environment to run scripts in the jobs or functions directories,
since they require different dependencies.
cd <specific job or function directory>
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
To use the virtual environment inside a Jupyter Notebook, first activate the virtual environment, then create a kernel for it.
# Install ipykernal
pip install ipykernel python-dotenv
# Add the kernel
python3 -m ipykernel install --user --name=<environment name>
# Remove the kernel
jupyter kernelspec uninstall <environment name>