Blue/Green Deployment for RDS.

We had a Blue/Green deployment process. But migrations were stuck in RDS occasionally. That’s why we had a downtime. I’ve done my investigation regarding the possibility of implementing Zero Downtime Deployment with RDS.
Surprisingly enough AWS does not offer out of the box solution for such situation. So the only one approach I’ve come up with was using AWS DMS (Database Migration Service).
The idea is that we are creating the second RDS DB while deployment process is taking place, cloning the first RDS DB into the second one and doing ongoing replication. DMS is doing continuous replication via transactions tracking. So the possibility of any data loss is negligible.
From the cost point of view it’s pretty cheap too as it requires EC2 instance for the time of deployment process in order to host DMS instance and additional RDS instance for data to be copied in.
Steps to implement:
- Create new RDS.
- Do migrations in the newly created RDS.
- Clone the existing RDS into the new RDS.
- Check that the new RDS is working and processing queries.
- Connect the new deployment to the new RDS.
- Roll new deployment to prod.
- Shut down the old RDS and DMS instance used for replication.
For the PostgreSQL database with 240GB of data the whole process took 4 minutes.
Problems with this approach.
This solution wasn’t implemented because we had a couple of tables with LOBs (Large OBjects). Our CTO decided to not mess with them and just be more careful with migrations.
What’s the problem with LOBs?
It’s just taking hours to clone them. DMS even has several strategies to deal with LOBs but neither of them met our needs.
Apart from aforementioned problem this approach totally deserves your attention if you are interested in zero-downtime with RDS.