Blaed

cross-posted from: https://lemmy.world/post/228673

I would like to share with everyone an incredible resource put together by the LMSYS Org.

In essence, it is a new Elo rating leaderboard for large language models based on anonymous voting data collected in the wild. I will be sharing major updates to this leaderboard regularly.
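For anyone curious how an Elo leaderboard turns head-to-head votes into ratings, here is a minimal Python sketch of the standard Elo update rule. This is not LMSYS's code: the K-factor of 32, the starting rating of 1000, and the model names are assumptions picked purely for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after a single vote.

    score_a is 1.0 if A won the vote, 0.0 if B won, and 0.5 for a tie.
    k (assumed value) controls how far one vote can move a rating.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


# Hypothetical models and votes, both starting from an assumed rating of 1000.
ratings = {"model_x": 1000.0, "model_y": 1000.0}
votes = [
    ("model_x", "model_y", 1.0),  # a voter preferred model_x
    ("model_y", "model_x", 0.5),  # a voter called it a tie
]

for a, b, score_a in votes:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score_a)

print(ratings)
```

The takeaway is that beating a higher-rated model moves your rating more than beating a lower-rated one, which is why a few thousand anonymous votes can sort models into a sensible order.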

If you have no idea what is going on in AI and simply want to know which models in the open-source communities are the best, this is the thread for you.

Consider visiting and subscribing to /c/FOSAI if you want to keep up with the latest and greatest advancements in free, open-source artificial intelligence.

You should also bookmark or save this leaked article from one of Google’s employees. It does a great job illustrating how big tech companies are scrambling to contain free, open-source AI.

Google “We Have No Moat, And Neither Does OpenAI”

It is a fascinating insight into some of the minds that pioneered this technology in the first place. We can do a deeper conversational post on this article later on, but watching how it ages over these coming months will be interesting to say the least.

I did not make this leaderboard, so if you want more insights into the rankings please visit the LMSYS site to show your support and dive deeper into the models.

For fun, I asked GPT-4 to convert the table into esports-inspired ladder rankings. These were the results. Hopefully this helps give you a better frame of reference for this field of emerging tech.

Challenger #1 - GPT-4 by OpenAI (Elo Rating: 1225) - This is a proprietary model.

Challenger #2 - Claude-v1 by Anthropic (Elo Rating: 1195) - This is also a proprietary model.

Challenger #3 - Claude-instant-v1 (Elo Rating: 1153) - This model is a lighter, less expensive, and much faster version of Claude. It’s a proprietary model as well.

Master - GPT-3.5-turbo by OpenAI (Elo Rating: 1143) - Another proprietary model from OpenAI.

Diamond I - Vicuna-13B (Elo Rating: 1054) - This is a chat assistant fine-tuned from LLaMA on user-shared conversations by LMSYS. The weights are available for non-commercial use.

Diamond II - PaLM 2 (Elo Rating: 1042) - This is a model tuned for chat that powers Bard. It’s a proprietary model.

Diamond III - Vicuna-7B (Elo Rating: 1007) - Similar to Vicuna-13B, this model has also been fine-tuned from LLaMA on user-shared conversations by LMSYS. The weights are available for non-commercial use.

Platinum I - Koala-13B (Elo Rating: 980) - A dialogue model for academic research by BAIR. The weights are available for non-commercial use.

Platinum II - mpt-7b-chat (Elo Rating: 952) - This is a chatbot fine-tuned from MPT-7B by MosaicML. It is released under the CC-BY-NC-SA-4.0 license.

Platinum III - FastChat-T5-3B (Elo Rating: 941) - This chat assistant was fine-tuned from FLAN-T5 by LMSYS. It’s available under the Apache 2.0 license.

Gold I - Alpaca-13B (Elo Rating: 937) - This model is fine-tuned from LLaMA on instruction-following demonstrations by Stanford. The weights are available for non-commercial use.

Gold II - RWKV-4-Raven-14B (Elo Rating: 928) - An RNN with transformer-level LLM performance. It’s available under the Apache 2.0 license.

Gold III - Oasst-Pythia-12B (Elo Rating: 921) - An Open Assistant for everyone by LAION. It’s also available under the Apache 2.0 license.

Silver I - ChatGLM-6B (Elo Rating: 921) - An open bilingual dialogue language model by Tsinghua University. The weights are available for non-commercial use.

Silver II - StableLM-Tuned-Alpha-7B (Elo Rating: 882) - Stability AI language models. These models are released under the CC-BY-NC-SA-4.0 license.

Silver III - Dolly-V2-12B (Elo Rating: 866) - An instruction-tuned open large language model by Databricks. It is

I think we all recognize #1. But notice how fast we are catching up. In case you forgot, GPT-4 was released by OpenAI on March 14, 2023. Three months later, we have multiple competing models, with many commercially available ones not far behind.
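To put those rating gaps in perspective, the standard Elo formula converts a rating difference into an expected head-to-head win rate. Here is a small Python sketch using two ratings from the list above (GPT-4 at 1225 and Vicuna-13B at 1054). It assumes the usual logistic Elo model with a scale of 400; the function name is my own, and LMSYS may compute win rates differently.

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Expected chance that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings taken from the ladder above.
gpt4 = 1225
vicuna_13b = 1054

p = win_probability(gpt4, vicuna_13b)
print(f"GPT-4 is expected to win about {p:.0%} of votes against Vicuna-13B")
```

Under these assumptions, a 171-point gap works out to GPT-4 being preferred in roughly 73% of matchups, which also means the open model is expected to win more than one vote in four.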

This leaderboard is a testament to the power of open-source. Perhaps there is a moat, perhaps there isn’t. I don’t know the answer to this, but I am very hopeful we’ll see some incredible breakthroughs in our lifetimes. Perhaps even in the near future if we continue advancing at this rapid pace.

It’s also important to consider that NVIDIA’s Grace Hopper AI Superchip was announced not too long ago. With AMD’s surge towards AI, we’re in for a boom in the coming years.

What does that mean for us? For you? For technology as a whole?

I know this is a lot to keep up with, so if you’re overwhelmed, consider subscribing. All will be explained in due time! This next year is going to be very interesting. I appreciate you taking a break from Reddit and sharing this journey with us here.

We have much more exciting news around the corner. We’ll do our best to break it down ELI5 style, and in depth for those who want a deep dive into the technology and how it all works.

AI Chatbot Arena Leaderboard & Elo Rankings (June 2023)
