I’d love to hear some stories about how you or your organization is using Kubernetes for development! My team is experimenting with using it because our “platform” is getting into the territory of too large to run or manage on a single developer machine. We’ve previously used Docker Compose to enable starting things up locally, but that started getting complicated.

The approach we’re trying now is to have a Helm chart deploy the entire platform to a k8s namespace unique to each developer, and then use Telepresence to connect a developer’s laptop to the cluster so they can run the specific services they’re working on locally.

This seems to be working well, but now I’m finding myself concerned with resource utilization in the cluster as devs don’t remember to uninstall or scale down their workloads when they’re not active any more, leading to inflation of the cluster size.

Would love to hear some stories from others!

  • @PriorProject
    2 years ago

    Yes, people do this kind of thing. As far as I know they all do it in fairly different ways, but what you’re describing sounds reasonable. And yes, it does tend to be expensive. The whole point, as you note, is that the env has grown large, so you’re hosting a bunch of large personal environments, which gets pricey when (not if) people aren’t tidy with them.

    Some strategies I’ve seen people employ to limit the cost implications:

    • Narrow the interface. Don’t give devs direct access to the infra, but rather give them build tooling that saves very rich observability data from each run. Think not just metrics/logs, but configurable tracing/debugging as well. This does limit certain debugging techniques by not granting devs full, unfettered access to the environment, but it makes clear when an env is “in use”. Once the CI/build job is complete, the env can be reused or torn down and only the observability data/artifacts need to be retained, which is much cheaper.
    • Use pools of envs rather than personal envs. You still have to solve the problem of knowing when an env is “in use”, and now also have scheduling/reservation challenges that need to be addressed.
    • Or automatically tear down “idle” envs. The definition of “idle” is going to get complex, and you’re definitely going to tear down an env that someone still wants at some point. But if you establish the precedent that envs get destroyed by default after some max-lifetime unless renewed, you can encourage people to treat them as ephemeral resources rather than a home away from home.
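As a rough illustration of the "destroyed by default after some max-lifetime unless renewed" idea, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `DevEnv` record, the three-day `MAX_LIFETIME`, and the namespace and release names are hypothetical, and how a renewal gets recorded (a namespace annotation, a small database, and so on) is left open. It only decides which envs have expired and prints the `helm uninstall` commands it would run; it does not execute anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_LIFETIME = timedelta(days=3)  # assumed default; tune per team


@dataclass
class DevEnv:
    namespace: str        # one namespace per developer, as in the thread
    release: str          # Helm release name inside that namespace
    renewed_at: datetime  # creation time or last explicit renewal


def expired(env: DevEnv, now: datetime, max_lifetime: timedelta = MAX_LIFETIME) -> bool:
    """An env expires once max_lifetime has passed since its last renewal."""
    return now - env.renewed_at >= max_lifetime


def reap_commands(envs: list[DevEnv], now: datetime) -> list[str]:
    """Return `helm uninstall` commands for expired envs (dry run, nothing is executed)."""
    return [
        f"helm uninstall {env.release} --namespace {env.namespace}"
        for env in envs
        if expired(env, now)
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 9, 0, tzinfo=timezone.utc)
    envs = [
        DevEnv("dev-alice", "platform", now - timedelta(days=5)),  # not renewed for 5 days
        DevEnv("dev-bob", "platform", now - timedelta(hours=6)),   # renewed this morning
    ]
    for cmd in reap_commands(envs, now):
        print(cmd)  # prints: helm uninstall platform --namespace dev-alice
```

Keying expiry off an explicit renewal timestamp, rather than trying to infer activity, sidesteps the hard "what counts as idle" question at the cost of asking people to renew.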

    None of these approaches are trivial to implement, and all have serious tradeoffs even when done well. But fundamentally, you can’t carry the cavalier attitude of how you treat your laptop as a dev env into the “cloud” (even if it’s a private cloud). Rather, the dev envs need to be immutable and ephemeral by default, those properties need to be enforced by frequent refreshes so people acclimate to the constraints they imply, and you need some way to reserve, schedule, and do idle detection on the dev envs so they can be efficiently shared and reaped. Getting a version of these things that works can be a significant culture shock for eng teams used to extended intermittent debugging sessions and to installing random tools on their laptops and having them available forever.

    • epchris (OP)
      2 years ago

      Thanks for all of the suggestions!

      Right now our guidance is that each developer is given a namespace and a helm chart to install, and the wording is such that developers wouldn’t think of it as an ephemeral resource (i.e., people have their helm installation up for months, and periodically upgrade it).

      It would be nice to have users do a fresh install each time they “start” working, and have some way to automatically remove helm installations after a time period, but we do have times where it’s nice to have a longer-lived env because you’re working within some accumulated state.

      Maybe there’s something to automatically scaling down workloads on a cadence or after a certain time period, but it would be challenging to figure out the triggers for that.
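One possible trigger for scaling down on a cadence is simply the clock, with an opt-out for people who need the env overnight. The Python sketch below assumes working hours of 08:00 to 19:00 on weekdays and a hypothetical per-developer `keep_awake_until` timestamp; both are made up for illustration. It only builds a `kubectl scale` command string for Deployments and does not run it, and it does not cover StatefulSets or scaling back up.

```python
from datetime import datetime, time
from typing import Optional

WORK_START = time(8, 0)  # assumed working hours, in the team's local time
WORK_END = time(19, 0)


def should_scale_down(now: datetime, keep_awake_until: Optional[datetime] = None) -> bool:
    """True when `now` is outside working hours and no keep-awake request is active."""
    if keep_awake_until is not None and now < keep_awake_until:
        return False           # the developer asked to keep the env running
    if now.weekday() >= 5:     # Saturday is 5, Sunday is 6
        return True
    return not (WORK_START <= now.time() < WORK_END)


def scale_down_command(namespace: str) -> str:
    """Command that scales every Deployment in the namespace to zero replicas."""
    return f"kubectl scale deployment --all --replicas=0 --namespace {namespace}"


if __name__ == "__main__":
    late_evening = datetime(2024, 1, 10, 22, 0)  # a Wednesday, 22:00
    if should_scale_down(late_evening):
        print(scale_down_command("dev-alice"))
```

Because the Helm release and any persistent volumes stay in place, this keeps accumulated on-disk state while the pods are stopped, though in-memory state is still lost.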

      • @PriorProject
        2 years ago

        > Right now our guidance is that each developer is given a namespace and a helm chart to install, and the wording is such that developers wouldn’t think of it as an ephemeral resource (i.e., people have their helm installation up for months, and periodically upgrade it).

        Right, the tradeoff here is that to maintain that state you’re paying for envs even when they’re not in use. Extended periods of “accumulated state” are definitely a thing, and you want some escape valve to enable them occasionally. But the way to reduce hosting costs is definitely to make them the exception rather than the rule, which involves adapting workflows to rely more on storing telemetry and analyzing it offline rather than interactively debugging everything.

        > Maybe there’s something to automatically scaling down workloads on a cadence or after a certain time period, but it would be challenging to figure out the triggers for that.

        Another approach here is using something like EBS so that stateful pods can be stopped and later reattached to their persistent disk. But unless you do some kind of deep hibernation you lose memory state, and even if you do that you lose socket and other environmental state. IMO telemetry is a stronger long-term strategy, as it can capture the state that hibernation destroys.

      • thelastknowngod
        1 year ago

        You can build a workflow for ephemeral environments with ArgoCD using an ApplicationSet resource with the pull request generator and the CreateNamespace=true sync option.

        If a developer opens a pull request, create a generated namespace based on the branch name and PR number, then deploy their changes to the cluster, in the new namespace, automatically.

        With GitHub, if there is no activity on a PR after X time frame, you can have the PR closed automatically. When it’s closed, Argo will not see it as an open PR anymore so it will automatically destroy the environment it created. If the dev wants to keep it active or reopen, just do normal git updates to the PR…
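To make the ApplicationSet idea more concrete, here is a hedged Python sketch that builds such a manifest as a plain dict and prints it as JSON, which `kubectl apply -f -` accepts. The org and repo names, the chart path, the `github-token` secret, the `argocd` namespace, and the `pr-...` naming scheme are all placeholders, not anything from this thread. The template parameters `{{branch_slug}}`, `{{number}}`, and `{{head_sha}}` are the ones exposed by the pull request generator.

```python
import json


def pr_environment_appset(owner: str, repo: str, chart_path: str) -> dict:
    """Build an ApplicationSet that generates one Application per open pull request."""
    env_name = "pr-{{branch_slug}}-{{number}}"  # filled in by the generator, per PR
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": "pr-environments", "namespace": "argocd"},
        "spec": {
            "generators": [{
                "pullRequest": {
                    "github": {
                        "owner": owner,
                        "repo": repo,
                        # Placeholder secret holding a GitHub API token
                        "tokenRef": {"secretName": "github-token", "key": "token"},
                    },
                    "requeueAfterSeconds": 300,  # how often to poll for PR changes
                }
            }],
            "template": {
                "metadata": {"name": env_name},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": f"https://github.com/{owner}/{repo}.git",
                        "targetRevision": "{{head_sha}}",  # deploy the PR's head commit
                        "path": chart_path,
                    },
                    "destination": {
                        "server": "https://kubernetes.default.svc",
                        "namespace": env_name,
                    },
                    "syncPolicy": {
                        "automated": {"prune": True},
                        "syncOptions": ["CreateNamespace=true"],
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = pr_environment_appset("example-org", "platform", "deploy/chart")
    print(json.dumps(manifest, indent=2))
```

A plain YAML file would work just as well; generating it from code is only one option, and long branch names may need shortening to fit Kubernetes name length limits.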