89 Lecture: What are Large Language Models?

You can follow along with the slides here if they do not appear below.

89.1 Data Science and LLMs

Although we have covered some of the basic concepts of LLMs in previous lectures, we primarily used them in the context of APIs and debugging. We will review those concepts here, and I’d be remiss if I didn’t also discuss LLMs in the context of data science.

89.1.1 What are Large Language Models?

Large Language Models (LLMs) are a type of artificial intelligence designed to understand and generate human language. These advanced AI systems are trained on vast amounts of text data. Key characteristics of LLMs include massive scale (billions of parameters), self-supervised learning, and the ability to generate human-like text. LLMs have versatile applications in data science, including text generation, summarization, sentiment analysis, question answering, and more. Some popular LLMs include GPT (Generative Pre-trained Transformer), BERT (Bidirectional Encoder Representations from Transformers), and T5 (Text-to-Text Transfer Transformer).
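To make “generating human-like text” concrete, here is a small sketch using the Hugging Face transformers library’s pipeline API with the publicly available gpt2 checkpoint. The prompt and generation settings are illustrative assumptions, and the snippet assumes transformers and a backend such as PyTorch are installed.

```python
# A minimal text-generation sketch with a small GPT-style model.
# Assumes `transformers` and a backend (e.g., PyTorch) are installed:
#   pip install transformers torch
from transformers import pipeline

# "gpt2" is a small, publicly available checkpoint; larger LLMs work the same way.
generator = pipeline("text-generation", model="gpt2")

prompt = "In data science, large language models can"
result = generator(prompt, max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])
```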

89.1.3 History of LLMs

Early models include rule-based systems such as ELIZA (1966) and statistical systems like IBM’s Watson (2011). Modern models include Google’s LaMDA (Language Model for Dialogue Applications; May 2021), OpenAI’s ChatGPT (Chat Generative Pre-trained Transformer; November 2022), Stanford CRFM and MosaicML’s BioMedLM (Biomedical Language Model; December 2022/January 2023), Meta AI’s LLaMA (Large Language Model Meta AI; February 2023), and OpenAI’s GPT-4 (Generative Pre-trained Transformer 4; March 2023).

Alan D. Thompson has a really nice timeline of AI and a table of models that you might find interesting.

89.1.3 How do LLMs work?

The architecture of LLMs is based on the transformer model, a neural network architecture that uses self-attention mechanisms to process sequential data. Key components include encoder and decoder stacks, attention mechanisms, positional encoding, layer normalization, and feed-forward neural networks. Training involves pre-training on a large corpus of text data, followed by fine-tuning for specific tasks; prompt engineering can then be used to steer a trained model toward task-specific performance without further training.
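To make the self-attention idea concrete, below is a minimal NumPy sketch of single-head scaled dot-product self-attention. The names (`self_attention`, `W_q`, `W_k`, `W_v`) and the toy dimensions are illustrative assumptions rather than any production model’s internals; real transformers add multiple heads, positional encodings, layer normalization, feed-forward layers, and learned parameters.

```python
# A minimal sketch of scaled dot-product self-attention using NumPy.
# Names and dimensions are illustrative, not any particular model's implementation.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over a sequence of token embeddings X.

    X: (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_head) projection matrices
    """
    Q = X @ W_q  # queries
    K = X @ W_k  # keys
    V = X @ W_v  # values
    d_head = Q.shape[-1]
    # Each token attends to every token; scores are scaled dot products.
    scores = Q @ K.T / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)  # attention weights sum to 1 per token
    return weights @ V                  # weighted sum of value vectors

# Toy example: 4 tokens with 8-dimensional embeddings, one 4-dimensional head.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q = rng.normal(size=(8, 4))
W_k = rng.normal(size=(8, 4))
W_v = rng.normal(size=(8, 4))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 4): one contextualized vector per token
```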


89.1.4 Applications in Data Science

LLMs can be used for various tasks in data science, including text generation and summarization, sentiment analysis, named entity recognition (NER), question answering, text classification, and language translation. These applications span a wide range of industries, including healthcare, finance, marketing, and customer service, and LLMs have the potential to automate and improve many aspects of data science, from data processing to decision-making. We’ve seen some of these applications in previous lectures and will continue to explore them in future ones; for now, let’s dive into LLMs’ capabilities and limitations. Keep in mind that many of the models discussed here are still in active development and new models are released regularly, so we’re likely just scratching the surface of what LLMs can do.
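As a small illustration of one of these tasks, the sketch below runs sentiment analysis with the Hugging Face transformers pipeline API. The example sentences are made up, and the snippet assumes transformers and a backend such as PyTorch are installed; the pipeline downloads a default pretrained sentiment model on first use.

```python
# A minimal sentiment-analysis sketch with an off-the-shelf pretrained model.
# Assumes `transformers` and a backend (e.g., PyTorch) are installed:
#   pip install transformers torch
from transformers import pipeline

# Downloads a default pretrained sentiment classifier on first use.
classifier = pipeline("sentiment-analysis")

reviews = [
    "The new dashboard made our reporting workflow so much faster.",
    "The model's predictions were confusing and hard to trust.",
]
for review in reviews:
    result = classifier(review)[0]  # e.g., {'label': 'POSITIVE', 'score': 0.99}
    print(f"{result['label']:>8}  ({result['score']:.2f})  {review}")
```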

However, as I’ve written elsewhere (Garrison & Tyson, 2026), we need to use these tools in a way that doesn’t inhibit your creativity, critical thinking, or development. My advice is to use these tools to enhance your work, not replace it. That means understanding the limitations of these tools and using them in conjunction with your own expertise and judgment. LLMs can greatly enhance productivity at both the novice and expert level, but to move beyond the novice stage you need to understand the underlying principles of the models you are using. If you can’t explain how a model works or why it’s making a particular prediction, then you probably shouldn’t be using it. My general rule of thumb is to use these tools to help you think, not to do the thinking for you.