It’s vital to “keep humans in the loop” to avoid humanizing machine-learning models in research
Machine-learning models are quickly becoming common tools in scientific research. These artificial intelligence systems are helping bioengineers discover new potential antibiotics, veterinarians interpret animals’ facial expressions, papyrologists read words on ancient scrolls, mathematicians solve baffling problems and climatologists predict sea-ice movements. Some scientists are even probing large language models’ potential as proxies or replacements for human participants in psychology and behavioral research. In one recent example, computer scientists ran ChatGPT through the conditions of the Milgram shock experiment—the famous study on obedience in which people gave what they believed were increasingly painful electric shocks to an unseen person when told to do so by an authority figure—and other well-known psychology studies. The artificial intelligence model responded in a similar way as humans did—75 percent of simulated participants administered shocks of 300 volts and above.
But relying on these machine-learning algorithms also carries risks. Some of those risks are commonly acknowledged, such as generative AI’s tendency to spit out occasional “hallucinations” (factual inaccuracies or nonsense). Artificial intelligence tools can also replicate and even amplify human biases about characteristics such as race and gender. And the AI boom, which has given rise to complex, trillion-variable models, requires water- and energy-hungry data centers that likely have high environmental costs.
The part you’re calling “a hell of a stretch” is actually the reason LLMs work. It’s not a good text parser. It’s a great pattern matcher. And it matches patterns that aren’t obvious or intuitive.
Many of the listed uses are actually great for this type of tech.
In theory, because of the amount of data used, there should be matched patterns that would allow it to be used for psychological research. Replicating well-known studies in that area with the tech is a good way to test that theory.
Using it as a first-line simulation might not be a bad idea as long as it’s followed up with a real study to validate the results.
We just need to make sure that humans are checking the work properly because, as you say, it’s not sentient, nor is it really capable of following a code, like the scientific method.
The real thing to fear is humans not doing their part out of greed, laziness, or malice.