So if my model says RFK has a 98% probability of winning, then it is no more right or wrong than Silver’s model?
If so, then probability would be useless. But it isn’t useless. Probability is useful because it can make predictions that can be tested against reality.
In 2016, Silver’s model predicted that Clinton would win. Which was wrong. He knew his model was wrong, because he adjusted his model after 2016. Why change something that is working properly?
You’re conflating things.
Your model itself can be wrong, absolutely.
But for the person above to say Silver got something wrong because a lower-probability event happened is a little silly. It’d be like flipping a coin heads side up twice in a row and saying you’ve disproved statistics because heads twice in a row should only happen 1/4 of the time.
Silver made a prediction. That’s the deliverable. The prediction was wrong.
Nobody is saying that statistical theory was disproved. But it’s impossible to tell whether Silver applied theory correctly, and it doesn’t even matter. When a Boeing airplane loses a door, that doesn’t disprove physics but it does mean that Boeing got something wrong.
Comparing it to Boeing shows you still misunderstand probability. Suppose his model covers 4 separate elections where each underdog candidate has a 1 in 4 chance of winning. If only 1 of those underdogs wins, then the model is likely working. But when that candidate wins, everyone will say “but he said it was only a 1 in 4 chance!”. It’s as dumb as people being surprised by rain when the forecast says 25% chance of rain. As long as you only get rain 1/4 of the time with that prediction, the model is working. Presidential elections are tricky because there are so few of them, so forecasters test their models against past data to verify they are working. But it’s just probability: it’s not saying this WILL happen, it’s saying these are the odds at this snapshot in time.
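To put a number on that “4 separate elections” scenario: if four independent races each give the underdog a 1 in 4 chance, seeing at least one upset is actually more likely than not. A quick toy calculation (mine, not from any forecaster’s model):

```python
import random

# Chance that at least one of four independent 1-in-4 underdogs wins.
p_upset = 0.25
exact = 1 - (1 - p_upset) ** 4  # 1 - 0.75^4 = 175/256, roughly 0.684

# Monte Carlo sanity check of the same quantity.
trials = 100_000
hits = sum(
    any(random.random() < p_upset for _ in range(4))
    for _ in range(trials)
)

print(f"exact: {exact:.3f}, simulated: {hits / trials:.3f}")
```

So across a handful of races, at least one “but he said it was only a 1 in 4 chance!” outcome is exactly what a well-calibrated model should be expected to produce.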
Presidential elections are tricky because there is only one prediction.
Suppose your model says Trump has a 28% chance of winning in 2024, and mine says Trump has a 72% chance of winning in 2024.
There will only be one 2024 election. And suppose Trump loses it.
If that outcome doesn’t tell us anything about the relative strength of our models, then what’s the point of using a model at all? You might as well write a single line of code that spits out “50% Trump”; it would be equally useful.
The point of a model is to make a testable prediction. When the TV predicts a 25% chance of rain, that means that it will rain on one fourth of the days that they make such a prediction. It doesn’t have to rain every time.
But Silver only makes a 2016 prediction once, and then he makes a new model for the next election. So he has exactly one chance to get it right.
His model has always been closer, state to state and election to election, than anyone else’s, which is why people use his models. He is basically using the same model and tweaking it each time; you make it sound like he’s starting over from scratch. When Trump won, none of the prediction models were predicting he would win, but his at least showed a fairly reasonable chance that he could. His competitors were forecasting a much more likely Hillary win, while he was showing that Trump would win basically 3 out of 10 times. In terms of probability, that’s not a blowout prediction. His model was working better than its competitors’. Additionally, he basically predicted the battleground states within half a percentage point, IIRC, which happened to be the difference between a win and a loss in some states.
You’re saying that hitting one of those 3 in 10 is “getting it wrong”, and that’s the problem with your understanding of probability. By saying that, you’re showing that you don’t actually internalize the purpose of a predictive model’s forecast. It’s not a magic wand, it’s just a predictive tool. That tool is useful if you understand what it’s really saying, instead of extrapolating something it absolutely is not saying. If a model says something will happen 3 of 10 times, that thing happening is not evidence of an issue with the model. A flawless model with ideal inputs can still show a 3 in 10 chance and should hit in 30% of scenarios. Certainly, because we have a limited number of elections, it’s hard to prove the model, but considering he has come closer than his competitors, it certainly seems he knows what he is doing.
First, we need to distinguish Silver’s state-by-state prediction from his “win probability”. The former was pretty unremarkable in 2016, and I think we can agree that, like everyone else, he incorrectly predicted WI, MI, and PA.
However, his win probability is a different algorithm. It considers alternate scenarios, eg Trump wins Pennsylvania but loses Michigan. It somehow finds the probability of each scenario, and somehow calculates a total probability of winning. This does not correspond to one specific set of states that Silver thinks Trump will win. In 2016, it came up with a 28% probability of Trump winning.
You say that’s not “getting it wrong”. In that case, what would count as “getting it wrong”? Are we just supposed to have blind faith that Silver’s probability calculation, and all its underlying assumptions, are correct? Because when the candidate with a higher win probability wins, that validates Silver’s model. And when that candidate loses, that “is not evidence of an issue with the model”. Heads I win, tails don’t count.
If I built a model with different assumptions and came up with a 72% probability of Trump winning in 2016, that differs from Silver’s result. Does that mean that I “got it wrong”? If neither of us got it wrong, what does it mean that Trump’s probability of winning is simultaneously 28% and 72%?
And if there is no way for us to tell, even in retrospect, whether 28% is wrong or 72% is wrong or both are wrong, if both are equally compatible with the reality of Trump winning, then why pay any attention to those numbers at all?
I think you’re missing the point of predictive modeling. The probability of separate outcomes is built in. This isn’t fortune telling; there is no crystal ball. Two predictive models can have different predictions and both may have value, just like separate meteorologists can have different forecasts but predict accurately the same amount over time, albeit at different intervals. IIRC, the average meteorologist correctly predicts rain over 80% of the time, which is far better than predicting by chance. But if you look at the forecast in more than one place you often get slightly different forecasts. They have different models and yet arrive at similar conclusions, usually getting it mostly right. It’s the same with political forecasts: they are only as valuable as your understanding of predictive modeling. If you think they are intended to mirror reality flawlessly, you will be sorely disappointed. That doesn’t make the models “wrong”, but it doesn’t make them “right” either. They are just models that usually predict a probable outcome.
I don’t expect a model to be perfect. But it is certainly possible for one model to be better than another, for example one might think the Weather Channel forecast is less accurate than AccuWeather (at least for your region).
Which, in turn, means that it is possible to decide when a forecast is more “right” or “wrong” than another, because what other basis would you have for judging which is better?
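For what it’s worth, one standard way to make “more ‘right’ or ‘wrong’ than another” concrete for probabilistic forecasts is a proper scoring rule such as the Brier score: the mean squared difference between the stated probabilities and the 0/1 outcomes. Nobody in this exchange brings it up, and the numbers below are invented, so treat it as a sketch of the idea rather than anyone’s actual method:

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and observed 0/1 outcomes.
    Lower is better; a perfect forecaster scores 0."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# A hypothetical week of rain forecasts from two sources, plus what happened (1 = rain).
forecaster_a = [0.30, 0.70, 0.10, 0.90, 0.50]
forecaster_b = [0.60, 0.40, 0.20, 0.80, 0.50]
observed     = [0,    1,    0,    1,    1]

print("A:", brier_score(forecaster_a, observed))  # 0.09 (better)
print("B:", brier_score(forecaster_b, observed))  # 0.21
```

The catch, which the rest of the thread keeps circling, is that scores like this only become informative when there are many scored outcomes; a single presidential election gives you exactly one.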
I see what you’re not getting! You are confusing giving the odds with making a prediction and those are very different.
Let’s go back to the coin flips, maybe it’ll make things more clear.
I or Silver might point out there’s a 75% chance of anything besides two heads in a row happening (which is accurate). If, as will happen 1/4 of the time, two heads in a row does happen, does that somehow mean the odds I gave were wrong?
I or Silver might point out there’s a 75% chance of anything besides two heads in a row happening (which is accurate).
Is it?
Suppose I gave you two coins, which may or may not be weighted. You think they aren’t, and I think they are weighted 2:1 towards heads. Your model predicts one head, and mine predicts two heads.
We toss and get two heads. Does that mean the odds I gave are right? Does it mean the odds you gave are wrong?
In the real world, your odds will depend on your priors, which you can never prove or disprove. If we were working with coins, then we could repeat the experiment and possibly update our priors.
But suppose we have only one chance to toss them, after which they shatter. In that case, the model we use for the coins, weighted vs unweighted, is just a means to arrive at a prediction. The prediction can be right or wrong, but the internal workings of a one-shot model - including its odds - are unfalsifiable. Same with Silver and the 2016 election.
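To make the “update our priors” step concrete for the two-coin example, here is the arithmetic under the stated hypotheses (each coin fair, versus each coin weighted 2:1 towards heads, i.e. P(heads) = 2/3), starting from an even prior between them. The even prior is my own assumption for illustration:

```python
# Two hypotheses about the pair of coins, with an even prior between them.
prior_fair = 0.5      # each coin lands heads with probability 1/2
prior_weighted = 0.5  # each coin lands heads with probability 2/3 (2:1 towards heads)

# Likelihood of the observed data (two heads) under each hypothesis.
lik_fair = 0.5 * 0.5         # 1/4
lik_weighted = (2 / 3) ** 2  # 4/9

# Bayes' rule: posterior is proportional to prior times likelihood.
evidence = prior_fair * lik_fair + prior_weighted * lik_weighted
post_weighted = prior_weighted * lik_weighted / evidence

print(post_weighted)  # 0.64: two heads nudge you towards "weighted", but only modestly
```

A single toss only nudges the odds between the two models; without a second toss, neither model’s internal probabilities ever get pinned down, which is the point of the shattering-coins hypothetical.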
The thing is, Nate Silver did not make a prediction about the 2016 race.
He said that Hillary had a higher chance of winning. He didn’t say Hillary was going to win.
How can you falsify the claim “Clinton has a higher chance of winning”?
Alternatively:
Silver said “Clinton has a higher chance of winning in 2016” whereas Michael Moore said “Trump has a higher chance of winning in 2016”.
In hindsight, is one of these claims more valid than the other? Because if two contradictory claims are equally valid, then they are both meaningless.
You can’t really falsify the claim “Clinton has a higher chance of winning”, at least the way Nate Silver models it. His model is based upon statistics, and he basically runs a bunch of simulations of the election. In more of these simulations, Clinton won, hence his claim. But we had exactly one actual election, and in the election, Trump won. Perhaps his model is just wrong, or perhaps the outcome matched one of the simulations in his model where Trump won. If we could somehow run the election hundreds of times (or observe what happened in hundreds of parallel universes) then maybe we could see if his model matched the outcome of a statistically significant number of election results. But nevertheless, Nate Silver had a model and statistics to back up his claim.
As for Michael Moore, I’m not sure exactly how he came up with his prediction, but I get the impression it was mostly a gut feeling based upon his observations of what was happening. Nevertheless, Michael Moore still could back up his statement by articulating why he was claiming that and the observations he had made.
Though one crucial difference is still the whole prediction thing. Michael Moore actually made a prediction of a Trump win. Whereas Nate Silver just stated that Clinton had a higher chance of winning, and once again that was not a prediction. So you’re really comparing two different things here.
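As an aside on the “runs a bunch of simulations” description above, here is a minimal sketch of how a simulation-based win probability can be produced. The states, margins, electoral votes, and error sizes are all made up, and this is emphatically not Silver’s actual model:

```python
import random

# Hypothetical polling margins (candidate A minus candidate B, in points) and electoral votes.
states = {
    "State1": (+2.0, 20),
    "State2": (-1.0, 16),
    "State3": (+0.5, 10),
}
TO_WIN = 24  # majority of the 46 hypothetical electoral votes

def candidate_a_wins_once():
    # One shared national polling error plus independent state-level noise (sizes invented).
    national_error = random.gauss(0, 2.5)
    ev = 0
    for margin, votes in states.values():
        state_error = random.gauss(0, 3.0)
        if margin + national_error + state_error > 0:
            ev += votes
    return ev >= TO_WIN

runs = 20_000
wins = sum(candidate_a_wins_once() for _ in range(runs))
print(f"Candidate A win probability: {wins / runs:.0%}")
```

The fraction of simulated elections a candidate wins is the headline “win probability”; different modelers make different choices about error sizes and correlations, which is how 28% and 72% can both be someone’s honest output.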
Silver claimed that Trump had a 28% chance of winning in 2016.
Suppose I built a model that claimed Trump had a 72% chance of winning in 2016.
Given there is only one 2016 election and Trump won it, is there any reason to believe that Silver’s results are better or worse than mine?
Would you mind restating the prediction?
He predicted Clinton would win. That’s the only reasonable prediction if her win probability was over 50%.
If I say a roll of a 6-sided die has a >50% chance of landing on a number above 2, and after a single roll it lands on 2, was I wrong?
If anything, the problem is in the unfalsifiability of the claim.
Admittedly, 538 was pretty good about showing their work after. While individual events suffer from the unfalsifiability issue, 538, back when Silver was around, did pretty good “how did we do?” retrospectives for individual races and states and compared their given odds to the actual results.
If you predict that a particular die will land on a 3-6 and it lands on a 2, then you were wrong. Predictions are occasionally wrong, that’s unavoidable in the real world. Maybe the die wasn’t fair and you should adjust your priors.
On the other hand, if you refuse to make a prediction but simply say a particular die has a >50% chance of landing above 2, then your claim is non-falsifiable. I could roll a hundred 1’s in a row, and you could say that your probability is correct and I was just unlucky. That’s why non-falsifiable claims are ultimately worthless.
Finally, if you claim that a theoretically fair die has a 2/3 probability of landing on 3-6 then you are correct, but that does not necessarily have anything to do with the real world of dice.
He said Trump had a 28% chance of winning, and Trump won. So he was also “right.” Do you see now why what you’re saying is incorrect?
If I say there is a 4 in 6 probability of a six-sided die rolling a 1-4, I’m correct, even though I’m going to be “wrong” many times. My probability is still correct, and we would verify that by rolling the die a thousand times and looking at the statistical distribution of each number coming up.
But you can’t rerun an election 1000 times to “prove” the probability.
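The die half of that comparison really is checkable in the way just described, which is exactly what the election does not allow. A trivial sketch of the thousand-roll check:

```python
import random

rolls = [random.randint(1, 6) for _ in range(1000)]
hits = sum(r <= 4 for r in rolls)
print(hits / 1000)  # hovers around 4/6 (about 0.667) across repeated runs
```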
Suppose I said Trump had a 72% chance of winning the same election, which Trump won. Am I also “right”?
If so, how can it be that Trump has a 28% chance of winning and a 72% chance of winning?
If not, why is he right instead of me?
It’s forecasting, not a prediction. If the weather forecast said there was a 28% chance of rain tomorrow and then tomorrow it rained, would you say the forecast was wrong? You could say that if you want, but the point isn’t to give a definitive prediction of the outcome (because that’s not possible); it’s to give you an idea of what to expect.
If there’s a 28% chance of rain, it doesn’t mean it’s not going to rain, it actually means you might want to consider taking an umbrella with you because there’s a significant probability it will rain. If a batter with a .280 batting average comes to the plate with 2 outs at the bottom of the ninth, that doesn’t mean the game is over. If a politician has a 28% probability of winning an election, it’s not a statement that the politician will definitely lose the election.
Is it possible for the forecast to be wrong?
I think so. If you look at all the times the forecast predicts a 28% chance of rain, then it should rain on 28% of those days. If it rained, say, on half the days that the forecast gave a 28% chance of rain then the forecast would be wrong.
With Silver, the same principle applies. Clinton should win at least 50% of the 2016 elections where she has at least a 50% chance of winning. She didn’t.
If Silver kept the same model over multiple elections, then we could look at his probabilities in finer detail. But he doesn’t.
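If a forecaster did keep the same model across many forecasts, “looking at the probabilities in finer detail” would amount to a calibration check: bucket the forecasts by stated probability and compare each bucket to how often the event actually happened. A sketch with made-up data:

```python
from collections import defaultdict

# (stated probability of rain, did it rain?) for a hypothetical forecast history
history = [(0.28, 0), (0.28, 1), (0.28, 0), (0.28, 0),
           (0.70, 1), (0.70, 1), (0.70, 0),
           (0.90, 1), (0.90, 1)]

buckets = defaultdict(list)
for prob, rained in history:
    buckets[prob].append(rained)

for prob, outcomes in sorted(buckets.items()):
    observed = sum(outcomes) / len(outcomes)
    print(f"forecast {prob:.0%}: rained {observed:.0%} of {len(outcomes)} days")
```

With one presidential election per cycle, and a model that gets reworked between cycles, the 28% bucket only ever holds a single entry, which is the complaint being made above.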
How about this:
Two people give the odds for the result of a coin flip of non-weighted coins.
Person A: Heads = 50%, Tails = 50%
Person B: Heads = 75%, Tails = 25%
The result of the coin flip ends up being Heads. Which person had the more accurate model? Did Person A get something wrong?
Person B’s predicted outcome was closer to the truth.
Perhaps person A’s prediction would improve if multiple trials were allowed. Perhaps their underlying assumptions are wrong (ie the coins are not unweighted).
But in this hypothetical scenario of explicitly unweighted coins, Person A was entirely correct in the odds they gave. There’s nothing to improve.
We are talking about testing a model in the real world. When you evaluate a model, you also evaluate the assumptions made by the model.
Let’s consider a similar example. You are at a carnival. You hand a coin to a carny. He offers to pay you $100 if he flips heads. If he flips tails then you owe him $1.
You: The coin I gave him was unweighted so the odds are 50-50. This bet will pay off.
Your spouse: He’s a carny. You’re going to lose every time.
The coin is flipped, and it’s tails. Who had the better prediction?
You maintain you had the better prediction because you know you gave him an unweighted coin. So you hand him a dollar to repeat the trial. You end up losing $50 without winning once.
You finally reconsider your assumptions. Perhaps the carny switched the coin. Perhaps the carny knows how to control the coin in the air. If it turns out that your assumptions were violated, then your spouse’s original prediction was better than yours: you’re going to lose every time.
Likewise, in order to evaluate Silver’s model we need to consider the possibility that his model’s many assumptions may contain flaws. Especially if his prediction, like yours in this example, differs sharply from real-world outcomes. If the assumptions are flawed, then the prediction could well be flawed too.
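To put a number on the carny example: under the “unweighted coin, 50-50” assumption, the chance of losing 50 honest flips in a row is vanishingly small, so the streak is overwhelming evidence that some assumption is violated. Just the arithmetic:

```python
p_fair = 0.5 ** 50  # P(50 straight losses | the coin is fair and the flip is honest)
p_spouse = 1.0      # P(50 straight losses | "he's a carny, you're going to lose every time")
print(p_fair)             # about 8.9e-16
print(p_spouse / p_fair)  # the streak favors the spouse's model by roughly 10**15 to 1
```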
Yes. But you’d have to run the test repeatedly and see if the outcome, i.e. Clinton winning, happens as often as the model predicts.
But we only get to run an election once. And there is no guarantee that the most likely outcome will happen on the first try.
If you can only run an election once, then how do you determine which of these two results is better (given that Trump won in 2016): Silver’s 28% chance of a Trump win, or my 72%?
You do it by comparing the state voting results to pre-election polling. If the pre-election polling said D+2 and your final result was R+1, then you have to look at your polls and individual polling firms and determine whether some bias is showing up in the results.
Is there selection bias or response bias? You might find that a set of polls is randomly wrong, or you might find that they’re consistently wrong, adding 2 or 3 points in the direction of one party but generally tracking with results across time or geography. In that case, you determine a “house effect,” in that either the people that firm is calling or the people who will talk to them lean 2 to 3 points more Democratic than the electorate.
All of this is explained on the website and it’s kind of a pain to type out on a cellphone while on the toilet.
You are describing how to evaluate polling methods. And I agree: you do this by comparing an actual election outcome (eg statewide vote totals) to the results of your polling method.
But I am not talking about polling methods, I am talking about Silver’s win probability. This is some proprietary method that takes other people’s polls as input (Silver is not a pollster) and outputs a number, like 28%. There are many possible ways to combine the poll results, giving different win probabilities. How do we evaluate Silver’s method, separately from the polls?
I think the answer is basically the same: we compare it to an actual election outcome. Silver said Trump had a 28% win probability in 2016, which means he should win 28% of the time. The actual election outcome is that Trump won 100% of his 2016 elections. So as best as we can tell, Silver’s win probability was quite inaccurate.
Now, if we could rerun the 2016 election maybe his estimate would look better over multiple trials. But we can’t do that, all we can ever do is compare 28% to 100%.