Backing up and restoring a MongoDB Cluster via EBS Volumes
Backing up and restoring MongoDB clusters at large scale is challenging. There are hosting providers like MongoDB Atlas that will handle it for you, but they can come with a price tag that many (like my organization) find prohibitive. This article will assume that you find yourself in that camp as well.
This is not a definitive guide, nor am I a leading expert on the subject - the methods listed herein are what has worked for my organization. Note that this assumes that your MongoDB servers are running in AWS EC2.
We are using i2.4xlarge instances, which have:
- 16 cores - /proc/cpuinfo shows them to be Xeon E5-2670 v2
- 120 GB of memory apiece
- 3.2 TB of ephemeral SSD storage. These disks are part of an LVM volume group (more on that later!)
- 2000Mbps ethernet link
- Alongside these, a smaller instance functions as a backup controller. It orchestrates backups of the Production cluster and restores of that data into our Test and Dev environments every week.
Our cluster configuration:
- Two groups of three servers apiece. Two servers per group are active; the third acts as a hidden secondary
- Each group contains multiple replica sets, with the primaries distributed evenly between the two instances
- Our environment is currently a mix of WiredTiger and MMAPv1 storage engines. I won't go into this oddity in detail - suffice it to say that while this is not a typical arrangement, it is entirely possible for the two to coexist even within the same replicaset.
- Our hidden secondaries, which are the hosts from which we take a data snapshot and backup, are both 100% WiredTiger. Roughly 10% of the volume group on which our data resides is reserved for snapshotting purposes.
- All databases are sharded
- MongoS processes are run on any other EC2 resource that needs access to data from the cluster. MongoS are given a connection string containing the DNS addresses of all three config servers, which keep track of where sharded data chunks are stored across the cluster.
- Currently on MongoDB 3.0.12
Notes about Logical Volume Manager
We leverage LVM to ensure a consistent point-in-time snapshot and backup without stopping writes to the cluster. In order to do this, we leave 10% of the disk space in our volume group free - this space is utilized by the snapshot.
LVM snapshots only record data that has changed since the snapshot was taken - they start out very small, then grow as data is written to disk, removed from disk, or moved on disk. Therefore, the free space in your volume group needs to be large enough to accommodate the rate of change in your Mongo cluster for as long as your backup runs. If the backup takes too long and the snapshot runs out of free space in your volume group, LVM simply "drops" the snapshot - and obviously at that point, your backup will fail. Every cluster and workload is different, so I can't tell you what the right amount is for your environment.
10% has proven to be enough for us - we typically use well under 25% of that free space in the time it takes for our backup to finish.
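As a rough illustration of the sizing reasoning above, here is a minimal sketch that estimates how much free space to reserve for a snapshot. The function names, the change rate, and the safety factor are all hypothetical; measure your own rate of change rather than trusting these numbers. A safety factor of 4 roughly mirrors our experience of using well under a quarter of the reserved space.

```python
import math


def snapshot_space_needed_gb(change_rate_gb_per_hour, backup_hours, safety_factor=4.0):
    """Estimate the free volume-group space (in GB) to reserve for an LVM snapshot.

    The snapshot grows with every block changed on the origin volume while the
    backup runs, so the estimate is rate * duration, padded by a safety factor.
    All inputs are assumptions you have to measure for your own workload.
    """
    if change_rate_gb_per_hour < 0 or backup_hours < 0 or safety_factor <= 0:
        raise ValueError("rate and duration must be non-negative, safety factor positive")
    return math.ceil(change_rate_gb_per_hour * backup_hours * safety_factor)


def reserve_fraction(volume_group_gb, needed_gb):
    """Express the reserved space as a fraction of the whole volume group."""
    if volume_group_gb <= 0:
        raise ValueError("volume group size must be positive")
    return needed_gb / volume_group_gb


# Hypothetical example: 20 GB of changed blocks per hour, a 90 minute backup.
needed = snapshot_space_needed_gb(change_rate_gb_per_hour=20, backup_hours=1.5)
fraction = reserve_fraction(volume_group_gb=3200, needed_gb=needed)
```

With these made-up inputs the estimate comes out well below 10% of a 3.2 TB volume group, which is why a flat 10% reservation leaves us comfortable headroom.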
We use two forms of backup. The first method, which won't be discussed in much detail in this article, leverages the mongodump utility. We use this to dump and restore every individual database in our cluster. This is useful if someone does something questionable with Production data and we need to restore it from backup without impacting any other database. We have databases ranging from a few megabytes in size to hundreds of gigabytes. We utilize Python, the mongodump utility, celery, RabbitMQ and S3 to achieve this.
The other method is used to back up and restore our entire cluster in one go - this is what we'd use in the event of a total disaster, like our primary EC2 AZ failing. This method utilizes EBS volumes and some custom Python code.
Essentially, the process is:
- Stop shard balancing across the cluster. This will finish any current migrating chunks and then leave the data where it is until balancing is re-enabled. This is done via MongoS.
- Issue an fsyncLock against the cluster to temporarily stop writes and ensure a consistent state.
- On both hidden secondaries (both contain hidden members of half our replicasets):
  - Take a snapshot using LVM
  - Issue an fsyncUnlock against the cluster to allow writes to recommence. fsyncLock should only be active for a few moments, since snapshots are created almost instantaneously.
  - Mount the snapshot. We'll use the snapshotted data to compress and copy to EBS volumes
  - For each MongoD, create a /backup/rs1 type directory.
  - For each MongoD, create an EBS volume. We check the size of each MongoD's data directory, then use boto to create an EBS volume of appropriate size. The size on disk will not be 1:1 with the size it will be on EBS, as we compress the data while copying it to the EBS volume
  - Attach and mount the appropriately sized EBS volume to each /backup/rsx directory
  - Tar and zip the data into the backup directories for each MongoD
  - Once the data has all been compressed and copied to EBS, drop the snapshot and detach the EBS volumes (again, this is doable via boto)
- Snapshot each EBS volume and then create a new volume from each snapshot in another EC2 region to ensure that a meteor strike in Northern Virginia can't destroy our livelihoods.
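The per-host part of the steps above can be sketched as a plan of shell commands plus a sizing helper for the EBS volumes. This is not our production code: the volume group and logical volume names, the directory layout (one data directory per replicaset under the snapshot mount), the compression ratio, and the headroom are all assumptions made for illustration. The functions only build the commands; running them (for example with subprocess), issuing fsyncLock/fsyncUnlock, and creating and attaching the volumes via boto are left out.

```python
import math


def ebs_size_gib(data_bytes, compression_ratio=0.5, headroom=1.25, minimum_gib=1):
    """Size an EBS volume for a compressed copy of a MongoD data directory.

    compression_ratio and headroom are guesses; check them against real archives.
    """
    estimated_bytes = data_bytes * compression_ratio * headroom
    return max(minimum_gib, math.ceil(estimated_bytes / 1024 ** 3))


def backup_commands(vg, lv, snap_name, snap_mount, replica_sets):
    """Build the ordered commands for one hidden secondary.

    The lvcreate call is the step that must happen while fsyncLock is held;
    everything after it can run once writes have been unlocked again.
    """
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap_name}"
    commands = [
        # Let the snapshot use all free space left in the volume group.
        ["lvcreate", "--snapshot", "--name", snap_name, "--extents", "100%FREE", origin],
        # Some filesystems need extra mount options for snapshots (e.g. nouuid on XFS).
        ["mount", "-o", "ro", snapshot, snap_mount],
    ]
    for rs in replica_sets:
        # Assumes an EBS volume is already mounted at /backup/<rs>.
        commands.append(
            ["tar", "-czf", f"/backup/{rs}/{rs}.tar.gz", "-C", f"{snap_mount}/{rs}", "."]
        )
    commands.append(["umount", snap_mount])
    commands.append(["lvremove", "-f", snapshot])
    return commands


plan = backup_commands("vg0", "mongo", "mongo_snap", "/mnt/mongo_snap", ["rs1", "rs2"])
```

Keeping the plan as data makes it easy to log or dry-run the whole sequence before letting it touch a production host.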
Restoring from EBS Volumes
Restoring is fairly simple. We'd only use this method in a fairly dire situation, so we don't care about taking down our MongoDs in order to perform it. The procedure is:
- Attach a full set of EBS volumes to each instance. Which volume goes to which instance must be controlled via code, naturally
- Mount all of the EBS volumes
- Stop all MongoD processes
- Wipe out all data inside each MongoD's data directory with rm -rf
- Untar each MongoD's data from the mounted EBS directory into the MongoD's data directory
- If deploying the backup onto new servers (i.e. new DNS addresses for each replicaset config), update the replicaset configs now.
- Drop the replset.minvalid collection (it lives in the local database). This is used to track replication status; you want it starting from a blank state or you'll get replicasets failing to start and other unpleasantness. It can be safely removed in this context.
- Start all MongoDs! Great success, your data is restored.
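Deciding which volume goes to which instance is the part that most needs code. Below is a minimal sketch of that mapping, assuming each backup volume carries (hypothetical) `instance_id` and `replica_set` labels, for example stored as EBS tags. It only produces a plan; the actual attach calls (such as boto's attach-volume API) and the mounts are not shown. Device names stay within /dev/sdf to /dev/sdp, the range AWS recommends for EBS volumes on Linux.

```python
from collections import defaultdict
import string


def attach_plan(volumes, first_letter="f", last_letter="p"):
    """Assign each backup volume a device name and mount point on its target instance.

    `volumes` is a list of dicts with the hypothetical keys
    'volume_id', 'instance_id' and 'replica_set'.
    """
    letters = string.ascii_lowercase
    start = letters.index(first_letter)
    end = letters.index(last_letter)

    by_instance = defaultdict(list)
    for vol in volumes:
        by_instance[vol["instance_id"]].append(vol)

    plan = []
    for instance_id in sorted(by_instance):
        # Sort so the same replicaset always lands on the same device letter.
        ordered = sorted(by_instance[instance_id], key=lambda v: v["replica_set"])
        for offset, vol in enumerate(ordered):
            if start + offset > end:
                raise ValueError(f"too many volumes for instance {instance_id}")
            plan.append({
                "instance_id": instance_id,
                "volume_id": vol["volume_id"],
                "device": f"/dev/sd{letters[start + offset]}",
                "mount_point": f"/backup/{vol['replica_set']}",
            })
    return plan


example = attach_plan([
    {"volume_id": "vol-b", "instance_id": "i-1", "replica_set": "rs2"},
    {"volume_id": "vol-a", "instance_id": "i-1", "replica_set": "rs1"},
    {"volume_id": "vol-c", "instance_id": "i-2", "replica_set": "rs3"},
])
```

The volume and instance IDs here are placeholders. Sorting before assigning devices keeps the plan deterministic, so a re-run after a partial failure produces the same mapping.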
This methodology allows us to rapidly back up and restore large amounts of Mongo data on a daily basis. Mongodumping our whole cluster takes roughly 18 hours, and restoring it takes even longer - fine for doing a single database, very bad in a total disaster scenario. With this method, we can perform a complete backup in ~90 minutes and do a complete restore in roughly the same amount of time. We validate these backups weekly by restoring a complete copy of our Production data into our Test and Development environments. These have different DNS addresses, which necessitates automatic reconfiguration of each replicaset by issuing commands against each individual MongoD.
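The replicaset reconfiguration mentioned above boils down to rewriting the `host` field of every member in the replicaset config document and bumping its version. Here is a minimal sketch of that transformation; the host names are invented, and the function name is hypothetical. Applying the result is not shown: one option is MongoDB's replSetReconfig command with the force option, issued against each MongoD, but treat that as an assumption to verify against your MongoDB version rather than a description of our exact code.

```python
import copy


def remap_replset_hosts(config, host_map):
    """Return a copy of a replicaset config with member hosts swapped via host_map.

    `config` has the usual shape: {'_id': ..., 'version': ..., 'members': [...]}.
    Every member host must appear in host_map, so that no Production address
    can slip into a Test or Dev replicaset unnoticed.
    """
    new_config = copy.deepcopy(config)
    for member in new_config["members"]:
        old_host = member["host"]
        if old_host not in host_map:
            raise KeyError(f"no replacement host for {old_host}")
        member["host"] = host_map[old_host]
    # A reconfig needs a higher version number than the stored config.
    new_config["version"] = config["version"] + 1
    return new_config


prod_config = {
    "_id": "rs1",
    "version": 3,
    "members": [
        {"_id": 0, "host": "prod-a.example.internal:27017"},
        {"_id": 1, "host": "prod-b.example.internal:27017", "hidden": True, "priority": 0},
    ],
}
test_config = remap_replset_hosts(prod_config, {
    "prod-a.example.internal:27017": "test-a.example.internal:27017",
    "prod-b.example.internal:27017": "test-b.example.internal:27017",
})
```

Failing loudly on an unmapped host is a deliberate choice: a restored Test replicaset that still points at a Production member is worse than a restore that stops.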