Developed by the company OpenAI, ChatGPT is an example of a deep neural network, a type of machine learning system that has made its way into virtually every aspect of science and technology.
Large language models refer to the use of deep neural network [sic] to predict the next word, and these models are large in the sense that they have billions, or hundreds of billions of parameters.
These models far exceed the complexity of conventional neural networks, often encompassing dozens of neural network layers and containing billions to trillions of parameters.
It is a great technology that is based on the agglomerate of all human knowledge. It’s a pity it’s in the wrong hands.
It is by no means based on the aggregate of all human knowledge. It is based on the aggregate of all human knowledge that techbrodudes could easily rip off on the Internet.
There are enormous swaths of material that are not incorporated into these models. There are likely entire LANGUAGES that are under-represented in, if not flatly absent from, the training data. Approximately 50% to 70% or even beyond (depending on the specific analyses involved) of the training material pulled into LLMs, according to the Allen Institute, is in English. About 17% of the planet speaks English. “All human knowledge” indeed. There are approximately 7000 living languages on the planet. The best of the LLMs barely cover 50 of them to any degree of linguistic or cultural competence. (I know ChatGPT claims coverage of 80+ languages. I’ve also seen its unfortunate attempts at the outlying ones…)
And then there is a whole lot of knowledge and information in print form which is not yet incorporated. As a trivial example of this, the very important book in tea production and consumption circles, 中国茶经 (not to be confused with the ancient classic 茶经), is not available in any electronic form anywhere. Its encyclopedic coverage of the fractally complicated Chinese tea sphere is not in any LLM anywhere. Books of this calibre number in the thousands, possibly hundreds of thousands, and are not in any LLM anywhere. This means that if you query an LLM about tea, you’re going to get the amalgamated opinions of dumbasses on Reddit instead of authoritative sources like 中国茶经.
(And I’m not even going to start going down the epistemic rabbit warren of non-textual knowledge. Go ahead and ask your LLMbecile what it feels like when the clay is too wet on the wheel, or how to read a hostile room before negotiations. It will generate text … but what is the source of the physicality and instinct? It has none. It regurgitates what some dumbass on Reddit said.)
You’re right, but I was hoping people wouldn’t take my comment literally :) It’s not ALL human knowledge, obviously. But if it were a tool by humans for humans, instead of one for companies to make money off of, we could add more and more of our global knowledge to it and have more to gain from the tool.
I also am fully aware that this tool is not applicable to EVERY situation, and everyone should also be aware of this.
It’s a technological dead end. Neural networks are much more promising in the long run.
LLMs are neural networks, my man.
No, they’re not.
MIT’s McGovern Institute disagrees with you: https://mcgovern.mit.edu/2023/03/27/smart-bots-what-language-models-like-chatgpt-tell-us-about-intelligence-and-the-human-brain/
The University of Michigan disagrees with you: https://online.umich.edu/collections/artificial-intelligence/short/what-is-generative-ai-what-are-llm/?playlist=ai-foundations
Surveys of academic literature disagree with you: https://ar5iv.labs.arxiv.org/html/2412.03220v1
Are you sure this is the hill you want to die on, Sparky?
I’m afraid they are
Whatever you say, Dunning-Kruger
Don’t let a google search stop ya