Do your routine backend code releases require downtime?

koreth · 2 years ago

Do your routine backend code releases require downtime?

@none · 2 years ago

Zero downtime here. We use ECS with services sitting behind ALBs. At deploy time we spin up a new task set, wait for the new tasks to become healthy, and then direct a small amount of traffic to the new set for evaluation. If no alarms go off due to a degradation in metrics, the amount of traffic is increased until the old version has 0 traffic. After a period of time to allow for instant rollbacks if necessary, the old version is shut down.
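As a rough illustration of that flow, here is a minimal sketch of the shift-and-evaluate loop. It does not call any AWS API: `set_new_weight` and `alarms_firing` are hypothetical callbacks standing in for whatever adjusts the load balancer weights and checks the metric alarms, and the step percentages are made up for the example.

```python
import time
from typing import Callable, Iterable


def canary_rollout(
    steps: Iterable[int],
    set_new_weight: Callable[[int], None],
    alarms_firing: Callable[[], bool],
    bake_seconds: float = 0.0,
) -> bool:
    """Shift traffic to the new version one step at a time.

    `steps` is the sequence of traffic percentages for the new version and
    is expected to end at 100. Returns True if every step passed, False if
    an alarm fired and traffic was sent back to the old version.
    """
    for percent in steps:
        set_new_weight(percent)    # hypothetical: give the new set this share of traffic
        time.sleep(bake_seconds)   # let metrics accumulate before evaluating
        if alarms_firing():        # hypothetical: any alarm in ALARM state?
            set_new_weight(0)      # roll back: the old version takes all traffic again
            return False
    return True


# Usage with in-memory stand-ins instead of real infrastructure.
history: list[int] = []
succeeded = canary_rollout([5, 25, 50, 100], history.append, lambda: False)
```

The old task set is deliberately left running after the loop returns, so a rollback during the grace period is just another weight change.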

Web services make this easy because it’s trivial to direct traffic to one version or the other. What’s more interesting to me is how to accomplish the same thing for other kinds of workloads. For example, if you have workers consuming a queue, I haven’t found a way to gradually increase the amount of work available to the new version without implementing custom application logic (I work on a platform with thousands of services, so I’m looking for ways to do it at the infrastructure level rather than having each service implement something).
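To make the queue case concrete, here is a minimal sketch of the kind of per-service application logic in question. It is one possible approach, not a recommendation: each message is hashed into a stable bucket, and both worker versions consult the same rollout percentage to decide which of them handles it. The names `bucket`, `should_handle`, and `new_version_percent` are hypothetical, and how the percentage is distributed to workers is left out.

```python
import hashlib


def bucket(message_id: str) -> int:
    """Map a message ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def should_handle(message_id: str, is_new_version: bool, new_version_percent: int) -> bool:
    """Decide whether this worker version should process the message.

    Messages whose bucket falls below `new_version_percent` belong to the
    new version; all others belong to the old version.
    """
    in_new_share = bucket(message_id) < new_version_percent
    return in_new_share if is_new_version else not in_new_share


# Example: with the rollout at 10%, exactly one version claims each message.
claimed_by_new = should_handle("order-1234", is_new_version=True, new_version_percent=10)
claimed_by_old = should_handle("order-1234", is_new_version=False, new_version_percent=10)
```

A worker that declines a message still has to return it to the queue so the other version can pick it up, which adds extra deliveries, and every service has to ship this check in both versions. That is the overhead an infrastructure-level mechanism would avoid.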