Changing your document structure without downtime or risk

Often, our document structure (the schema of our data) is forgotten about. It’s a hassle to change, so we don’t bother updating it. This, as we all know by now, is bad. We should be treating our data just like every other part of our system, and refactoring it regularly. This is the only way we can keep future updates simple and painless.

When the document structure changes we need to do two things: change the existing data, and change the code which uses the data. If we try to use the system when we have done one but not the other, we have a bad time. This is why we sometimes use a maintenance mode. Or, if we’re feeling adventurous (read: reckless), we just do it on live and hope nobody clicks anything before we’re done.


But there’s a better way. And as with all great refactoring methods, it’s safe and less exciting. Here are the steps:

For this example let’s assume we want to move from this:

{
    id,
    username,
    age,
    location
}

To this:

{
    id,
    username,
    bio: {
        age,
        location
    }
}

Step 1: Do both

The first thing we change is the code which generates or edits the data. We update it so that every save writes both the old document structure and the new one. We will end up with data which looks like this:

{
    id,
    username,
    age,
    location,
    bio: {
        age,
        location
    }
}
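
As a minimal sketch of the dual write, assuming documents are plain Python dicts and the store is an in-memory dict: the names `with_both_structures` and `save_user` are hypothetical stand-ins for whatever your save path looks like. The flat fields are treated as the source of truth, since the reading code still uses them at this stage.

```python
from typing import Any

# Fields that are moving from the top level into "bio".
MOVED_FIELDS = ("age", "location")


def with_both_structures(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of doc carrying both the flat fields and the nested bio."""
    updated = dict(doc)
    bio = dict(updated.get("bio", {}))
    for field in MOVED_FIELDS:
        if field in updated:
            # The flat (old) field wins: readers still depend on it.
            bio[field] = updated[field]
    updated["bio"] = bio
    return updated


def save_user(store: dict[str, dict[str, Any]], doc: dict[str, Any]) -> None:
    """Hypothetical save path: every write goes through the dual-structure helper."""
    store[doc["id"]] = with_both_structures(doc)
```

Because the helper returns a copy and can be applied repeatedly without changing the result, it is safe to call on every save.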

Step 2: Update everyone

Write an update script (ideally re-using the well-tested code from Step 1) which takes each document in your store and saves it with the new structure added, leaving the old structure in place.

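At this point we know that all the data is, and will continue to be, in both the new and the old structures: the script has covered the existing documents, and the code from Step 1 covers every future save.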
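
A backfill script along these lines might look like the sketch below. It again assumes plain dicts in an in-memory store; `add_bio` and `migrate_all` are hypothetical names, and a real store would need its own way to iterate and save documents.

```python
from typing import Any

# Fields that are being copied from the top level into "bio".
MOVED_FIELDS = ("age", "location")


def add_bio(doc: dict[str, Any]) -> dict[str, Any]:
    """Copy the flat fields into bio, leaving the flat fields in place."""
    updated = dict(doc)
    bio = dict(updated.get("bio", {}))
    for field in MOVED_FIELDS:
        if field in updated:
            bio[field] = updated[field]
    updated["bio"] = bio
    return updated


def migrate_all(store: dict[str, dict[str, Any]]) -> int:
    """Backfill every document; return how many documents were changed."""
    changed = 0
    # Iterate over a snapshot so we can write back while looping.
    for doc_id, doc in list(store.items()):
        migrated = add_bio(doc)
        if migrated != doc:
            store[doc_id] = migrated
            changed += 1
    return changed
```

The script only rewrites documents that actually differ, so running it a second time changes nothing. That makes it safe to re-run if it is interrupted part way through.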

Step 3: Use the new structure

We can now update the code which reads the data to read from the new document structure. What was previously a risky update is now safe. Worst case, we can roll back and all our data is still fine.
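
To illustrate, here is a hypothetical reader (`describe_user` is an invented example, not part of any real codebase) that touches only the new `bio` structure. Since it never looks at the flat fields, it works the same on documents that still carry them and on documents that no longer do.

```python
from typing import Any


def describe_user(doc: dict[str, Any]) -> str:
    """Build a display string using only the new nested structure."""
    bio = doc["bio"]
    return f"{doc['username']} ({bio['age']}, {bio['location']})"
```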

Step 4: Remove the old

Once we’re using the new structure, and we’re happy it’s all working, we can safely remove the old stuff. Start with the code which writes the data to both locations, then update the data to remove the duplication.
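
The data clean-up half of this step could be sketched as follows, under the same assumptions as before (plain dicts, in-memory store, hypothetical function names). As a precaution, the sketch refuses to drop the flat fields from a document that has no `bio`, since that would lose data.

```python
from typing import Any

# Flat fields that now live inside "bio" and can be dropped.
OLD_FIELDS = ("age", "location")


def remove_old_fields(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of doc without the flat fields that moved into bio."""
    if "bio" not in doc:
        # Never delete the only copy of the data.
        raise ValueError(f"document {doc.get('id')!r} has not been migrated")
    return {key: value for key, value in doc.items() if key not in OLD_FIELDS}


def cleanup_all(store: dict[str, dict[str, Any]]) -> int:
    """Strip the old fields from every document; return how many were changed."""
    changed = 0
    for doc_id, doc in list(store.items()):
        cleaned = remove_old_fields(doc)
        if cleaned != doc:
            store[doc_id] = cleaned
            changed += 1
    return changed
```

Run this only after the dual-write code from Step 1 has been removed, otherwise the next save will put the old fields straight back.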