So you want to use Postgres in Beehive?

Andrew Dai / May 17, 2018

tl;dr: pg_initalaser is a short and sweet SQL script for creating a user and database on a Postgres server packaged as a Docker container made for easy deployment as an Init Container in Kubernetes to allow for the automatic creation of deployment-specific databases.

AKA arguably the world’s simplest use case for Init Containers in k8s

Background

But first, let’s talk about MongoDB1.

A reasoned discussion about the pros and cons of certain technologies

For the most part, performance issues have never rarely been an issue with our deployed apps. The choice of database or other technologies has mostly been driven by what people are familiar with or want to use. Most HackGT applications use MongoDB, taking advantage of its simple set up in development environments and natural and easy integration with Node.js.

MongoDB allows for the easy (automatic) creation of new databases2. Whenever a connection is made to a MongoDB host, a new database is automatically created if the requested database does not already exist.

When applications in Beehive request a MongoDB resource, a connection string URI3 is made available as an environment variable for each instance. These connection strings specify different databases for each deployment to prevent deployments from interfering with each other. This is allows us to deploy the same application multiple times for simultaneous events (hosting registration for 2 events at a time) and keeping testing auto-deployments made by pull requests separate from interfering with production data and vice versa.

Accurate representation of our deployment system

Because MongoDB will automatically create a database if one does not already exist during a connection request, no additional steps are required. Beehive merely generates a unique-ish database name for each deployment, appends it to the host portion of a MongoDB connection string and then makes the URI available in the environment variables in production4.

The Problem

This does not work for Postgres.

Due to preference and learning opportunities and other forgotten, arbitrary reasons, we decided to use Postgres for SponsorshipPortal. But the same magic connection string generation logic in Beehive does not work for Postgres. This is because Postgres requires that databases already exist prior to establishing a connection. Connections made to non-existent databases on a given host will fail.

We wanted to keep the database-level separation at the infrastructure level rather than use schema-level separation because that would require additional (potentially buggy) application-level logic. This strategy would also keep our support for all two(!) datastores more consistent for the application developer.

The Solution

Introducing pg_initialaser.

We have a relatively simple problem to solve. A database needs to be created in Postgres before the application can be initialized and created. This must happen or else the container will enter a crash/backoff loop which routinely occassionally floods our #devops-onfire Slack channel.

Beehive uses Kubernetes (k8s) which formally adopted Init Containers in version 1.6. pg_initalaser is a very simple Init Container which creates a specified database (and a user) in Postgres before the app’s container is run. If a database or user already exists the container will fail silently, stop and allow the application to run.

When an application requests Postgres from Beehive, pg_initalaser is added as an Init Container in the app’s deployment.yaml in Kubernetes in order to create the database needed. Now, connection string logic5 analogous to our support for MongoDB will work for Postgres as well. The configuration for application developers remains the same but Beehive now supports Postgres!

Check out pg_initialaser and Beehive on Github, follow HackGT on Facebook, Twitter, Instagram, checkout our other projects, and stay tuned for info about HackGT 5 registration!


Author’s Note: Yes, the logic for pg_initalaser is only 2 lines long. Yes, much more time was spent learning and implementing the (straightforward) Init Container logic. Yes, even more time was spent coming up with a “clever” name for this project.

  1. MongoDB is web scale. See https://www.youtube.com/watch?v=b2F-DItXtZs 

  2. Here, a database is not the same as a database instance/server/host. A single MongoDB host can have multiple databases. Collections and data are associated with a database. The same applies for Postgres. 

  3. MongoDB connection string/URI documentation: https://docs.mongodb.com/manual/reference/connection-string/ 

  4. https://github.com/HackGT/biodomes/blob/218ac134a09164e87f33ab08443762e85b10bd7c/.travis.d/templates/deployment.yaml.erb#L24 

  5. https://github.com/HackGT/biodomes/blob/c277475436b5e82610cbdc31959bf32f289c1c2a/.travis.d/templates/deployment.yaml.erb#L43