class: center, middle, inverse, title-slide

.title[
# Introduction to Large Language Models
🤖
]
.author[
### S. Mason Garrison
]

---

layout: true

<div class="my-footer">
<span>
<a href="https://DataScience4Psych.github.io/DataScience4Psych/" target="_blank">Data Science for Psychologists</a>
</span>
</div>

---
class: middle

# Getting Started with Large Language Models

---

## What are Large Language Models?

.pull-left[
- Large Language Models (LLMs)
  - are a type of artificial intelligence designed to
      - understand and
      - generate human language.
- These advanced AI systems are trained on *vast* amounts of text data.
]

--

.pull-right[
Key characteristics:

- Massive scale (billions of parameters)
- Self-supervised learning
- Ability to generate human-like text
- Versatile applications

Examples:

- GPT (Generative Pre-trained Transformer)
- BERT (Bidirectional Encoder Representations from Transformers)
- T5 (Text-to-Text Transfer Transformer)
]

---

# Evolution of LLMs

.question[
How have LLMs evolved over time?
]

<img src="img/chattimeline.png" width="50%" style="display: block; margin: auto;" />

.footnote[
Image Source: DALL-E, based on the following prompt: "A visual timeline showcasing the evolution of large language models. The timeline starts from early simple models, progressing through key milestones such as the introduction of transformers, GPT-3, and up to GPT-4. Each milestone is marked with distinct icons and dates, illustrating the increasing complexity and capabilities of these models. The background features a blend of abstract digital patterns representing data and artificial intelligence. The color scheme is professional with shades of blue and gray, with highlighted elements in contrasting colors to emphasize progress."
]

---

# History of LLMs

- Early models:
  - Rule-based systems
  - Statistical models
- Modern models:
  - Google's LaMDA (Language Model for Dialogue Applications) (May 2021)
  - OpenAI's ChatGPT (Chat Generative Pre-trained Transformer) (November 2022)
  - Stanford CRFM and MosaicML's BioMedLM (Biomedical Language Model) (December 2022)
      - A domain-specific LLM for biomedical research (trained on PubMed)
  - Meta AI's LLaMA (Large Language Model Meta AI) (February 2023)
  - OpenAI's GPT-4 (Generative Pre-trained Transformer 4) (March 2023)

.footnote[Alan D. Thompson maintains a really nice [timeline of AI](https://lifearchitect.ai/timeline/) and [table of models](https://lifearchitect.ai/models-table/).
]

---
class: middle

# How do LLMs work?

---

# Architecture of LLMs

.pull-left[
Key components:

- Transformer architecture
- Self-attention mechanism
- Positional encoding
- Layer normalization
- Feed-forward neural networks

(A minimal code sketch of self-attention appears just before the wrap-up.)
]

.pull-right[
<img src="img/transformer.png" width="60%" style="display: block; margin: auto;" />
]

--

.footnote[
Image Source: DALL-E, based on the following prompt: "A detailed infographic highlighting the architecture of transformer models. The diagram includes key components such as the encoder, decoder, attention mechanisms, and layers. Arrows indicate the flow of information between these components. The background features a modern design with subtle tech-related patterns. Use professional colors like shades of blue and gray, with important elements highlighted in contrasting colors to draw attention."
]

---

# Training Process

- Pre-training on a large corpus of text data
- Fine-tuning for specific tasks
- Prompt engineering for task-specific performance

--

.question[
What are the challenges in training LLMs?
]

---

# Applications in Data Science

LLMs can be used for various tasks in data science (a brief code sketch follows this list):

- Text generation and summarization
- Sentiment analysis
- Named Entity Recognition (NER)
- Question answering
- Text classification
- Language translation
---
class: middle

# Wrapping Up...

---

# Sources

- Claude 3.5 Sonnet, on 07/18/2024
- TBD

---