I let people rate how much they like different things on a scale of 1-10. How do I actually tell if people like one thing more than another thing if the sample sizes are different? This is not about any real scientific study, more like a personal test :)

For example, if one thing got voted on 10 times and has an average of 6.5, and another thing got voted on 6 times and has a 6.1, is the 6.5 thing actually more liked? Or are the sample sizes still so small that it could easily go either way by chance?

I’ve never done anything like this. If someone could explain it or direct me to the correct key words/links, that would be hugely appreciated :)

I’ve read up a bit on p-value determination, but I’m not sure what my “null hypothesis” is here actually, numerically. If I’d put it in words I guess my hypothesis would be “this thing is more liked than the other thing”, but honestly, it seems like my specific case would be much simpler than all the stuff I’m reading here :D

  • @[email protected]
    51 year ago

    Your situation reminds me of the way IMDb sorts movies by rating, even though different movies may receive vastly different numbers of votes. They use something called a credibility formula, which is apparently a Bayesian way of doing it: each movie's average gets pulled toward the overall average, and the fewer votes it has, the stronger the pull. That's different from the frequentist approach with p-values and null hypotheses that you are looking at atm.
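
To make that a bit more concrete, here is a rough sketch of that kind of weighted ("Bayesian") average, using the numbers from the question. The formula is the one commonly attributed to IMDb, (v·R + m·C) / (v + m), where R is an item's average, v its vote count, C the overall average, and m a weight you pick yourself. The function name, the value of `prior_weight`, and the choice to use the vote-weighted mean of the two items as C are all my own assumptions for illustration, not anything IMDb specifies.

```python
def weighted_rating(mean, votes, prior_mean, prior_weight):
    """Shrink an item's mean rating toward prior_mean.

    The fewer votes an item has, the closer its score stays to prior_mean.
    """
    return (votes * mean + prior_weight * prior_mean) / (votes + prior_weight)


# Numbers from the question: (average rating, number of votes).
items = {"A": (6.5, 10), "B": (6.1, 6)}

# Assumption: take the vote-weighted mean over all items as the prior mean C.
total_votes = sum(votes for _, votes in items.values())
prior_mean = sum(mean * votes for mean, votes in items.values()) / total_votes

# Assumption: the prior counts as much as 5 real votes. This is a free choice.
prior_weight = 5

scores = {
    name: weighted_rating(mean, votes, prior_mean, prior_weight)
    for name, (mean, votes) in items.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Both scores move toward the overall average, and the item with only 6 votes moves proportionally more. Note that this only gives you a fairer ranking; it does not tell you how confident you can be that one thing is really more liked than the other.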