Comparison of Popular LLM Models
| Model / Model Family | Created By | Sizes | Versions | Pretraining Data | Fine-tuning and Alignment Details | License | What's Interesting | Architectural Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4 | OpenAI | Not specified (rumored to be on the order of 1.8 trillion parameters) | Not specified | Not specified | Reinforcement learning from human feedback (RLHF), adversarial testing | Proprietary | Multimodal; excels at complex reasoning and advanced coding | OpenAI's first multimodal GPT model; improved factuality |
| GPT-3 | OpenAI | Various (125M to 175B parameters) | Multiple | Large-scale text corpora | Not specified | Proprietary (API access) | Record-breaking 175 billion parameters at release; revolutionized NLP | Decoder-only transformer architecture |
| GPT-3.5 | OpenAI | Not specified | Not specified | Large-scale text corpora | Reinforcement learning from human feedback (RLHF) | Proprietary (API access) | Reportedly smaller than GPT-3; serves as the underlying model family for ChatGPT | GPT-3.5 Turbo variant offers fast inference |
| Gemini | Google | Not specified (Ultra, Pro, and Nano variants) | Not specified | Not specified | Fine-tuned on various datasets | Proprietary | Reported to outperform ChatGPT at understanding text, images, video, and speech | Natively multimodal; strong results on academic benchmarks |
| LLaMA | Meta AI | Various (7B, 13B, 33B, 65B) | Not specified | Publicly available text corpora | Not specified | Non-commercial research license | Range of model sizes; LLaMA-13B reportedly outperforms the much larger GPT-3 on many benchmarks | Openly released weights gave developers access to strong base models |
| PaLM 2 (Bison-001) | Google AI | Not specified (the original PaLM had 540 billion parameters) | Not specified | Large-scale multilingual text corpora | Multilingual proficiency; comprehension of idioms | Proprietary | Advanced proficiency in formal logic and mathematical reasoning | Multilingual; fast responses |
| Bard | Google AI | Not specified (initially powered by LaMDA, later by PaLM 2) | Not specified | Not specified | Tailored for natural conversation; internet-connected | Proprietary | Real-time access to online information; tailored for dialogue | Conversational interface over Google's underlying models |
| Claude v1 | Anthropic | Not specified | Not specified | Not specified | RLHF plus Anthropic's Constitutional AI approach | Proprietary | Reported to outperform PaLM 2 on benchmark tests; offers a 100k-token context window | Positioned as a direct competitor to GPT-4 |
| Falcon | Technology Innovation Institute (TII), UAE | Various (e.g., 7B, 40B) | Not specified | Web text (the RefinedWeb dataset) and curated sources | Incorporates enhancements such as rotary positional embeddings | Apache 2.0 (open source) | Topped open-source leaderboards at release | Trained on an extensive dataset; uses multi-query attention (see the sketch after the table) |
| Cohere | Cohere | Various (e.g., 6B, 52B) | Not specified | Not specified | Custom-trained and fine-tuned for a specific company's use case | Commercial | Customizable for enterprise applications | Custom-trained and fine-tuned models |
| Orca | Microsoft | 13 billion parameters | Not specified | Synthetic training data (explanation traces generated by larger models) | Imitation learning on synthetic data; the follow-up Orca 2 adds the Prompt Erasure technique | Not specified | Reported to match ChatGPT-level quality on some reasoning benchmarks despite a size that can run on a laptop | Fine-tuned from LLaMA (Orca 2 builds on LLaMA 2); trained on synthetic data |
| Guanaco | University of Washington researchers (the QLoRA paper) | Various (e.g., Guanaco-7B through Guanaco-65B) | Not specified | OASST1 dataset | QLoRA fine-tuning technique (sketched after the table) | Not specified | Reported to approach ChatGPT (GPT-3.5) quality on the Vicuna benchmark with sharply reduced memory usage | LLaMA models fine-tuned on OASST1 via QLoRA |
| Vicuna | LMSYS | Various (e.g., 7B, 13B) | Not specified | User-shared ChatGPT conversations (via ShareGPT) | Fine-tuned on a small budget (roughly $300 of compute); high performance for its size | Not specified | Efficient training process; competitive performance | LLaMA fine-tuned on user-shared conversations |
| MPT-30B | MosaicML | 30 billion parameters | Base, Instruct, and Chat variants | Various datasets (roughly 1 trillion tokens) | Long context lengths; reported to exceed the quality of the original GPT-3 | Apache 2.0 | Multiple model configurations optimized for specific requirements | Uses ALiBi attention for long contexts; fine-tuned variants trained on large corpora |
| 30B Lazarus | CalderaAI | 30 billion parameters | Not specified | LoRA-tuned datasets | Built by merging several LoRA-tuned LLaMA-30B models | Not specified | Ranked among the top open-source models for text generation at release | Merges LoRA-tuned adapters to target specific use cases |
| Flan-T5 | Google researchers | Various (e.g., Flan-T5-Small through Flan-T5-XXL) | Not specified | Supervised and unsupervised datasets | Instruction-tuned across a large collection of language tasks in a text-to-text paradigm | Apache 2.0 (open source) | Supports multiple language tasks; can detect "toxic" language | Encoder-decoder model; text-to-text paradigm |
| WizardLM | Microsoft and Peking University researchers | 13 billion parameters | Not specified | Instruction data generated with the Evol-Instruct approach | Fine-tuned on automatically evolved instructions | Open source | Efficient and compact; excels at executing complex instructions despite its size | Uses the Evol-Instruct approach to generate progressively harder training instructions |
| Alpaca 7B | Stanford University | 7 billion parameters | Not specified | 52K instruction-following demonstrations generated with text-davinci-003 | Supervised fine-tuning of LLaMA-7B; very low training cost | Non-commercial research license | Cost-effective; performance qualitatively comparable to text-davinci-003 | Trained with mixed precision and Fully Sharded Data Parallel (FSDP) |
| LaMDA | Google | 137 billion parameters | Not specified | Billions of documents, dialogs, and utterances | Fine-tuned for quality, safety, and groundedness; can consult external symbolic text-processing systems | Proprietary | Versatile; can call out to external tools such as a calculator or an information-retrieval system | Decoder-only Transformer architecture |
| BERT | Google | BERT-Base (110M parameters) and BERT-Large (340M parameters) | Not specified | Large-scale text corpora (BooksCorpus and English Wikipedia) | Pretrained with masked language modeling (illustrated after the table) and next-sentence prediction; fine-tuned per downstream task | Apache 2.0 (open source) | Pioneering model in NLP; long the standard for language understanding | Encoder-only Transformer architecture |
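
Falcon's multi-query attention is worth a closer look: all query heads share a single key/value head, which shrinks the KV cache by a factor of the head count at inference time. Below is a minimal sketch in plain PyTorch, assuming dense attention with no masking or caching; the function and weight names are illustrative, not Falcon's actual implementation.

```python
# Minimal multi-query attention (MQA) sketch: many Q heads, one shared K/V head.
import torch
import torch.nn.functional as F

def multi_query_attention(x, w_q, w_k, w_v, n_heads):
    """x: (batch, seq, d_model); w_q: (d_model, d_model);
    w_k and w_v: (d_model, head_dim) -- the single shared K/V head."""
    b, t, d = x.shape
    head_dim = d // n_heads
    q = (x @ w_q).view(b, t, n_heads, head_dim).transpose(1, 2)  # (b, heads, t, hd)
    k = (x @ w_k).unsqueeze(1)  # (b, 1, t, hd): one K head shared by all queries
    v = (x @ w_v).unsqueeze(1)  # (b, 1, t, hd): one V head shared by all queries
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5  # broadcasts over heads
    out = F.softmax(scores, dim=-1) @ v                 # (b, heads, t, hd)
    return out.transpose(1, 2).reshape(b, t, d)

# Toy usage: 8 query heads share one 8-dimensional K/V head.
x = torch.randn(2, 16, 64)
out = multi_query_attention(
    x, torch.randn(64, 64), torch.randn(64, 8), torch.randn(64, 8), n_heads=8
)
print(out.shape)  # torch.Size([2, 16, 64])
```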
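Guanaco's QLoRA recipe quantizes the frozen base model to 4-bit NF4 and trains only small LoRA adapter matrices on top of it. Here is a hedged sketch using the Hugging Face `transformers`, `peft`, and `bitsandbytes` libraries; the checkpoint name and hyperparameters are placeholders, not the exact Guanaco configuration.

```python
# QLoRA-style setup: 4-bit base model + trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4, introduced by QLoRA
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for dequantized compute
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                  # illustrative base checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attach adapters to attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the adapter weights are trainable
model.print_trainable_parameters()          # typically well under 1% of all parameters
```

Training then proceeds as ordinary supervised fine-tuning (e.g., on OASST1 in Guanaco's case), with gradients flowing only into the adapters.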
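BERT's masked-language-modeling pretraining objective is easy to see in action with the openly released `bert-base-uncased` checkpoint and the `transformers` fill-mask pipeline: the model predicts the token hidden behind `[MASK]`.

```python
# Demonstrate BERT's masked-language-modeling objective.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The capital of France is [MASK]."):
    # Each prediction carries the candidate token and its probability.
    print(pred["token_str"], pred["score"])
```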