Is it possible to "manage up" on customer expectations? Or am I doomed to unreasonable SLAs? (Database as a Service Company)

th3raid0r · edited 9 months ago

@LordCrom · 9 months ago

I’ve managed an SRE team. I see three issues in your story.

Upper management and sales need to establish the SLA and the notification policy. Does the customer want to be notified of high CPU lasting over 5 minutes, or do they want to be notified of a slowdown? If it’s the latter, then you are alerting on the wrong metric. Someone needs to set expectations. If the customer wants, and pays for, notification of any anomaly, then it’s your job to report it.
SRE should make sense of the metrics. If all you do is stare at dashboards, then you are ops, not SRE. SRE should set up and gather metrics and present them in ways that are meaningful to dev and ops.
If you are SRE, then man up and tell your manager what SREs should be doing, and show some kind of idea or plan to push your monitoring forward. Be the lead SRE if no one else is doing it.
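
To make the "wrong metric" point concrete, here is a rough sketch of the difference between alerting on a cause (sustained high CPU) and alerting on a symptom (slow queries). Everything in it is an assumption for illustration: the function names, the 90% CPU threshold, the 200 ms latency target, and the one-sample-per-minute cadence are hypothetical and would come from whatever the SLA actually says.

```python
from statistics import quantiles


def cpu_alert(cpu_samples, threshold=90.0, window=5):
    """Cause-based check: fire only if each of the last `window` samples
    (assumed to be one per minute) is above `threshold` percent."""
    recent = cpu_samples[-window:]
    return len(recent) == window and all(s > threshold for s in recent)


def latency_alert(latencies_ms, slo_ms=200.0, pct=95):
    """Symptom-based check: fire if the `pct`-th percentile of observed
    query latencies exceeds the (hypothetical) SLO target in milliseconds."""
    if len(latencies_ms) < 2:
        return False  # not enough data to estimate a percentile
    # quantiles(..., n=100) returns 99 cut points; index pct-1 is the pct-th percentile
    p = quantiles(latencies_ms, n=100)[pct - 1]
    return p > slo_ms


# Busy but healthy: CPU is pegged, yet queries are still fast.
busy_cpu = [95.0] * 5
fast_queries = [50.0] * 100

# Quiet but unhealthy: CPU is low, yet a fifth of the queries are slow.
quiet_cpu = [40.0] * 5
slow_queries = [50.0] * 80 + [500.0] * 20

print(cpu_alert(busy_cpu), latency_alert(fast_queries))
print(cpu_alert(quiet_cpu), latency_alert(slow_queries))
```

In the first case only the CPU check fires even though the customer would notice nothing; in the second only the latency check fires even though CPU looks fine. Which one pages a human is exactly the expectation that needs to be agreed with the customer.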

th3raid0r · edited 9 months ago

They want to be notified of anything that could potentially slow down their system, so any anomaly. The catch is that their patterns constantly change because they introduce new workloads weekly, which wouldn’t be a problem if they communicated their forecasts better. And that’s just one of a few dozen customers, again all with unique cluster configurations and needs.
Yeah, it sucks. The first year was pretty great and we had a fully integrated and unified managed services team where we were getting some great automation done. Then they split the team in half in order to focus on a different flavor of our product (with an entirely new backend) and left folks who were newer (myself included) with maintaining the old product. We were even told that we should be doing minimal maintenance on the thing as the new product would be the new norm. Then once upper management remembered how contracts work, they decided we needed to support 3 new platforms without growing the team. All while onboarding new customers and growing the environment count. We’re now in operational overload after some turnover that was backfilled with offshore support that has a very minimal presence.
I have tried championing this, but I don’t expect an ableist, masculinity-shaming person like you to understand a call for social pointers on how to “manage up”.

“Man Up” - good lord, way to be an ass.