Skip to content

WAL-E on PostgreSQL with AWS S3

I run PostgreSQL on two AWS EC2 instances and have binary streaming replication between them for high availability. Since both VMs are in AWS, it only makes sense to use S3 for archiving and backups. I will assume you already have two working EC2 instances running PostgreSQL with replication and only wish to add WAL-E with S3 into the mix.

Before we get started you will need to make sure you have python pip and python virtualenv installed. I also needed to install zlib but that may have been because I started from a very minimal install.

I will go over the setup once as I have my master and slave configured 100% identical with the exception that the master has recovery.conf named recovery.conf.use to keep it from becoming a slave.

We will need the AWS CLI tools before we get into WAL-E.

# pip install awscli

You will need an AWS account/IAM user that has permissions on the S3 bucket we will be using. If you haven't set an account or an S3 bucket do that now. Both my EC2 instance and S3 bucket are in the US-West region so be sure to adjust that if needed. Now we need to create config and credential files for user postgres:

$ sudo -u postgres -i

postgres ~ $ mkdir ~/.aws

postgres ~ $ echo -e "[default]\nregion = us-west-2\noutput = json" > ~/.aws/config

postgres ~ $ chmod 600 ~/.aws/config

postgres ~ $ echo -e "[default]\naws_secret_access_key = \naws_access_key_id = " > ~/.aws/credentials

postgres ~ $ chmod 600 ~/.aws/config

Now would be a good time to test your awscli install and your credentials:

postgres ~ $ aws s3 ls

If you get an error go back and check your credentials as well as your IAM permissions for the user and bucket.

Next we will install WAL-E:

# pip install wal-e

We need to make a directory to hold the configuration files:

# mkdir -p /etc/wal-e.d/env/

# chown root:postgres /etc/wal-e.d

# chmod 750 /etc/wal-e.d

# chown root:postgres /etc/wal-e.d/env

# chmod 750 /etc/wal-e.d/env

Now we need to create the configuration/credential files:

# echo "" > /etc/wal-e.d/env/AWS_ACCESS_KEY_ID

# echo "us-west-2" > /etc/wal-e.d/env/AWS_REGION

# echo "" > /etc/wal-e.d/env/AWS_SECRET_ACCESS_KEY

# echo "s3://" > /etc/wal-e.d/env/WALE_S3_PREFIX

# chown postgres:postgres /etc/wal-e.d/env/

# chmod 640 /etc/wal-e.d/env/*

Note that the S3 URL prefix must be a lowercase "s3://" or it will fail!

Now is a good place to stop and manually do a basebackup to test the WAL-E install, configuration, and credentials (be sure to adjust the path to match where your database resides on the filesystem):

postgres ~ $ envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.5/main

If everything goes properly it should automatically start a backup, copy the files, and finish. If you encounter any errors be sure to double check your credentials and paths.

Next we will edit our postgresql.conf to use WAL-E for archiving:

archive_command = 'envdir /etc/wal-e.d/env /usr/bin/wal-e wal-push %p'

Then we will edit our recovery.conf to also use WAL-E:

restore_command = 'envdir /etc/wal-e.d/env /usr/bin/wal-e wal-fetch "%f" "%p"'

At this point we can restart postgresql to make use of the new configuration (be sure to adjust this commend if you are using systemd or another init system):

service postgresql-9.5 restart

We also want our basebackups to go to S3 so we'll create a cronjob for that:

postgres ~ $ crontab -e

Then add a line similar to this (be sure to adjust the path to match where your database resides on the filesystem):

0 2  postgres envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.5/main

We should be good to go at this point and you can watch your postgresql log file (postmaster.log for me) to make sure everything is running smoothly on both the master and slave respectively.

If you plan on keeping a lot of archive files but do not need quick access to them you may want to consider setting up an S3 bucket policy that moves things to glacier after a certain number of days. For me I have it configured to move files older than one week into a cheaper storage medium and then into glacier after two weeks.

That's about all there is to it and if you need further reading check out the WAL-E github page and of course Google is your friend when it comes to error messages!

PostgreSQL on a Raspberry Pi 3 64-bit with Binary Streaming Replication

In my home office I run PostgreSQL with streaming binary replication between two servers. I don't need streaming replication in my home office, but it is a good way to learn the system and it's also nice having the ability to switch which server is the master during upgrades. After a hardware failure in my VM server I decided it was really stupid to have both the master and slave on the same physical machine. My database needs are small as I only use mediawiki, zabbix, and a few other minor things so I decided to look at using a Raspberry Pi 3 in 64-bit mode as a database server. I was also curious if PostgreSQL could replicate between two different architectures: aarch64 (arm 64-bit) and amd64 (x86_64) as the endianness is the same and the bitness is as well. I'm always on the lookout for new projects involving Raspberry Pis, Linux, and other related things and thought this might be a fun thing to try. I run Gentoo on all of my Linux machines and I also refuse to use SystemD and instead use OpenRC which means you'll need to adjust a few commands here and there if you use something else!

There is a Gentoo guide that covers installing on the Raspberry Pi 3 in 64-bit mode and is a separate beast from the normal install guide for other Raspberry Pis due to complications with getting the Pi3 into 64-bit mode.

Once you have your Pi3 up and running the first we need is to keyword the version of PostgreSQL we want to use by creating a file in /etc/portage/package.keywords/postgresql:

=app-eselect/eselect-postgresql-1.2.1 ~arm64
=dev-db/postgresql-9.5.5 ~arm64

Next we want to select the use flags we want to incorporate:

dev-db/postgresql nls pam python readline server ssl threads zlib

Gentoo has a great quickstart guide that you should look over before continuing to see if you need any additional settings.

Then install PostgreSQL as you normally would:

emerge -av =dev-db/postgresql-9.5.5

On my setup I have a slightly different datadir than the Gentoo default due to migration(s) from other distros and systems, so I had to edit /etc/conf.d/postgresql and change the line 'DATA_DIR="/var/lib/postgresql/9.3/data' to 'DATA_DIR="/var/lib/postgresql/9.3/main"'. If you are doing a fresh install you will not likely need to change this.

Now we need to initialize the database:

emerge --config dev-db/postgresql:9.5.5

If you need to make changes to the default configs, now is the time to do so. For me I had existing configs to copy over.

For streaming replication I followed the official streaming replication guide and official binary replication guide to get it all up and running on my previous setup.

Finally, we will set postgresql to start at boot and start the service:

rc-update add postgresql-9.5 default

service postgresql-9.5 start

We can check the status on the master with the following psql command:

postgres=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
30872 | 19764 | replication | walreceiver | | | 38556 | 2017-04-12 03:52:50.486721-07 | | streaming | 15C/A80133C0 | 15C/A80133C0 | 15C/A8012E38 | 15C/A8012E38 | 0 | async
(1 row)

That's about all there is to it!

Raspberry Pi 2 as a Zabbix Server

I was running a Zabbix server in a VM with 4G of RAM and 4 CPUs. One day I was looking at resources being utilized and realized the VM was way overkill. This was on my mind for a while and one day I was wanting to migrate several VMs off of one physical server to another and one of those happened to be the Zabbix server and I thought it would be nice to put that on it’s own dedicated machine and not on the same hardware it was being used to monitor. Plus, I’m always on the lookout for new projects involving Raspberry Pis, Linux, and other related things.

My monitoring needs are low: I monitor a total of 30 devices/VMs/machines. I use PostgreSQL as my database of choice which is running on its own VM, this is important for this project as it removes one of the resource consumers leaving us with only needing to run Zabbix and Apache.

At this point in time it no longer makes sense to do a new install using Zabbix 2.x, so I will be using the latest stable Zabbix 3.0.x here. I also run Gentoo on all of my machines and installation will not be covered here because Gentoo has a great wiki covering Gentoo on Raspberry PiGentoo on Raspberry Pi. I also use OpenRC and refuse to use SystemD; However, this guide would be the same regardless of the init system used with the exception of the service and rc commands.

The first step is to keyword Zabbix on arm by creating a file (or editing an existing one) in /etc/portage/package.keywords/zabbix with the following line:

net-analyzer/zabbix **

Next we need to select our use flags by creating a file (or editing an existing one) in /etc/portage/package.use/zabbix with the following line:

net-analyzer/zabbix agent curl frontend ipv6 -java -ldap libxml2 -mysql -odbc openipmi -oracle postgres -proxy server snmp -sqlite ssh ssl -static
app-eselect/eselect-php apache2
media-libs/gd png
dev-lang/php xmlreader gd sysvipc bcmath postgres truetype apache2 xmlwriter sockets

Now we can install Zabbix and Apache along with their dependencies:

emerge -av zabbix apache

This will take a while and for me was aided by setting up distcc and crossdev on another machine.

Once the emerge is finished we need to enable the Zabbix web frontend with webapp-config:

webapp-config -I zabbix 3.0.8

Be sure to replace “3.0.8” with the specific version you emerged.

Next we need to enable PHP in Apache by editing /etc/conf.d/apache2 and adding “-D PHP” to the APACHE2_OPTS.

Now we need to start Apache:

service apache2 start

At this point you should be able to configure Zabbix like normal following the official guide. Once you have finished with your Zabbix configuration you need to enable and start the Zabbix agent and Zabbix server:

rc-update add zabbix-server

rc-update add zabbix-agentd

service zabbix-server start

service zabbix-agentd start

That’s about all there is to the install. As for resource usage, here are some stats from mine:

# free -h
total used free shared buff/cache available
Mem: 925M 54M 453M 10M 417M 846M
Swap: 1.0G 0B 1.0G

top – 09:30:13 up 58 min, 1 user, load average: 0.00, 0.04, 0.05
Tasks: 137 total, 1 running, 136 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.9 us, 1.2 sy, 0.0 ni, 97.7 id, 0.1 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 948016 total, 462944 free, 57204 used, 427868 buff/cache
KiB Swap: 1048572 total, 1048572 free, 0 used. 865836 avail Mem

That about covers the basics and should be enough to get you up and running!