"Microsoft hasn’t really been helpful in trying to track this, either. I’ve sent over logs and information, but they haven’t really followed this up. They seem more interested in closing the case.”
That’s the Microsoft way: ignore the bug report for a month or two then close the case for “inactivity”.
It’s sometimes the issue with relying on metrics and stuff and being purely quantitative. A lot of us have worked at companies where it’s been like this. To deal with volume they need to rely on numbers to gauge performance, so you tell the workers they’ll be ranked on closed cases.
With these updated routing tables, a lot of people were unable to make calls, as we didn’t have a correct state
You’re relying on windows for critical infrastructure? Are you nuts?
Linux can also die in weird ways…
It’s just that Windows is more prone to some issues.
Indeed, nothing is perfect, but closed source stuff doesn’t provide a lot of recourse. If you have a linux expert in your team, they can investigate and if need be even dig into the code of linux itself to find the core issue. Microsoft doesn’t provide anything even remotely similar.
How many dev teams have a kernel dev on them?
Don’t need one. If you can read C/C++ you can read the kernel code. And in most cases, you won’t have to, as the problem is probably in a component in the distro. Those are written in python, ruby, or bash, which are all much more readable than C/C++.
No such luck on windows
I worked at a small company without a kernel dev and we periodically looked into the code to solve problems. I don’t know how much we upstreamed, but we relied on Linux so it was either that or try to get someone on the mailing list to care.
It’s really not that hard to look through the kernel source, it’s pretty well written and documented. It’s a lot harder to be a kernel developer writing new code, but finding bugs and contributing fixes isn’t that bad.
The US navy ran on windows xp for so long that they paid Microsoft to continue maintaining it after EOL.
“feature”
This is the best summary I could come up with:
A few months ago, an engineer in a data center in Norway encountered some perplexing errors that caused a Windows server to suddenly reset its system clock to 55 days in the future.
“With these updated routing tables, a lot of people were unable to make calls, as we didn’t have a correct state!” the engineer, who asked to be identified only by his first name, Simen, wrote in an email.
Simen had experienced a similar error last August when a machine running Windows Server 2019 reset its clock to January 2023 and then changed it back a short time later.
Windows systems with clocks set to the wrong time can cause disastrous errors when they can’t properly parse timestamps in digital certificates or they execute jobs too early, too late, or out of the prescribed order.
The mechanism, Microsoft engineers wrote, “helped us to break the cyclical dependency between client system time and security keys, including SSL certificates.”
Simen and Ken, who both asked to be identified only by their first names because they weren’t authorized by their employers to speak on the record, soon found that engineers and administrators had been reporting the same time resets since 2016.
The original article contains 701 words, the summary contains 200 words. Saved 71%. I’m a bot and I’m open source!
Ignoring tickets and then closing them for inactivity is how big companies ignore their own fuck-ups.
If you read the article it’s explained that some SSL implementations put random data in the time field (OpenSSL was given as an example). Microsoft knows about this and so needs a certain number of closely matching timestamps before it is confident enough to change the system time. However, if you get particularly unlucky with a string of random timestamps that match, you end up with a random time.
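For anyone who wants to see the failure mode, here is a rough Python sketch of "only trust a time when enough samples agree". It is not Microsoft's actual algorithm; the function name `consensus_time`, the 5-minute tolerance, and the "3 matching samples" rule are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def consensus_time(
    samples: Sequence[datetime],
    tolerance: timedelta = timedelta(minutes=5),  # assumed value
    min_agree: int = 3,                           # assumed value
) -> Optional[datetime]:
    """Return a timestamp only if enough samples sit close together.

    Hypothetical helper: it scans the sorted samples and accepts the
    first group of at least `min_agree` values that all fall within
    `tolerance` of the group's earliest member.
    """
    ordered = sorted(samples)
    for i, start in enumerate(ordered):
        # Every later sample that is within `tolerance` of this one.
        cluster = [t for t in ordered[i:] if t - start <= tolerance]
        if len(cluster) >= min_agree:
            # Use the middle element of the cluster as the agreed time.
            return cluster[len(cluster) // 2]
    return None  # no agreement, so leave the clock alone


if __name__ == "__main__":
    real = datetime(2023, 8, 1, 12, 0, tzinfo=timezone.utc)
    bogus = real + timedelta(days=55)

    # Two honest servers plus three "random" values that happen to match.
    unlucky = [
        real,
        real + timedelta(seconds=30),
        bogus,
        bogus + timedelta(minutes=1),
        bogus + timedelta(minutes=2),
    ]
    print(consensus_time(unlucky))
```

In the unlucky case the three bogus values outvote the two honest ones, so the function hands back a time 55 days in the future. Nothing in the logic can tell a matching cluster of garbage from a matching cluster of real timestamps.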
Yes, it’s a dog shit implementation to rely on 3rd parties to make guarantees about their data that they never agreed to.
Linux and MacOS handle this just fine. Why blame SSL when you’re the one using it wrong?
And most NTP clients already handle this by not changing the time automatically if it would be too much of a jump. Microsoft is trying to fix what’s not broken.
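The "don't jump too far" guard is simple enough to sketch in a few lines of Python. The name `should_apply` and the 15-minute limit are assumptions for illustration, not what any particular NTP client does.

```python
from datetime import datetime, timedelta, timezone

# Illustrative limit only; real clients choose and configure their own.
MAX_STEP = timedelta(minutes=15)


def should_apply(
    current: datetime,
    proposed: datetime,
    max_step: timedelta = MAX_STEP,
) -> bool:
    """Accept a new time only if it is within `max_step` of the current one."""
    return abs(proposed - current) <= max_step


if __name__ == "__main__":
    now = datetime(2023, 8, 1, 12, 0, tzinfo=timezone.utc)
    print(should_apply(now, now + timedelta(seconds=5)))  # small correction
    print(should_apply(now, now + timedelta(days=55)))    # huge jump
```

With a check like this, a 55-day jump gets rejected and would need a human (or at least a louder alarm) before the clock moves.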
I’ve read the documentation on that feature, and I still can’t get over it. How can anyone with knowledge of computers be so dumb as to even consider such an idea, let alone implement it?
This feature is just a BIG flag waving “AbUsE mE!”
Fun fact: Apparently M$ laid off their QA team for Windows so if you’re wondering why updates break so much, that’s why.
Well, now you know why.
And when they laid off their QA team with the testing lab of thousands of unique computers, they replaced it with VMs and AI. Because VMs are a totally good way to troubleshoot very specific bugs. The AI part is supposedly used to figure out when you’re “idle” so that Windows can update.
Imagine needing AI to update a computer lmao
They replaced it with VMs and AI
That… explains a lot.
“The false assumption is that most SSL implementations return the server time,” Simen said. “This was probably true in a Microsoft-only ecosystem back when they implemented it, but at that time [when STS was introduced], OpenSSL was already sending random data instead.”
This is so amazing, NTP is too insecure, so we relied on random data from random servers instead
Companies still using windows are causing problems