100 ACT: Working with OpenAI’s API
This module introduces the basics of interacting with OpenAI’s API from R. We’ll explore how to make API calls, handle responses, and integrate AI capabilities into data science workflows. OpenAI’s API documentation describes the endpoints in detail, and the R package documentation for httr and jsonlite covers making HTTP requests and handling JSON data.
100.1 Getting Started
First, we need to load the required packages:
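The introduction names httr and jsonlite as the packages this module relies on, so a minimal loading step, assuming both are already installed, might look like this:

```r
# Load the two packages the rest of the module calls
library(httr)      # POST(), add_headers(), content_type_json(), content()
library(jsonlite)  # fromJSON(), toJSON()
```

If either call fails, install the missing package first with install.packages().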
100.1.1 API Authentication
To use OpenAI’s API, you’ll need an API key. As we learned with other APIs, it’s important to keep this key secure:
# Store API key securely (NEVER commit to Git!)
openai_api_key <- readLines("path/to/api_key.txt")
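An alternative to a key file is an environment variable, which keeps the key out of the project folder entirely. The sketch below is one way to do that; get_api_key() is a hypothetical helper (not part of any package), and DEMO_API_KEY is a made-up variable name used so the example runs without touching a real key.

```r
# Hypothetical helper: read a key from an environment variable and
# fail early with a clear message if it is missing
get_api_key <- function(var = "OPENAI_API_KEY") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# Demonstration with a throwaway variable (assumed name, not a real key)
Sys.setenv(DEMO_API_KEY = "not-a-real-key")
demo_key <- get_api_key("DEMO_API_KEY")
nchar(demo_key) > 0
```

In real use you would add OPENAI_API_KEY to your .Renviron file, restart R, and then run openai_api_key <- get_api_key().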
100.1.2 Making API Requests
The core workflow involves:
- Constructing the API request
- Sending it to OpenAI’s endpoint
- Processing the response
Next, we define a basic text-generation function. It takes a prompt as input, sends it to OpenAI’s API, and returns the parsed response:
generate_text <- function(prompt, model = "gpt-5-nano", max_output_tokens = 200) {
  response <- POST(
    # curl https://api.openai.com/v1/chat/completions
    url = "https://api.openai.com/v1/chat/completions",
    # -H "Authorization: Bearer $OPENAI_API_KEY"
    add_headers(Authorization = paste("Bearer", openai_api_key)),
    # -H "Content-Type: application/json"
    content_type_json(),
    # -d '{
    #   "model": "gpt-5-nano",
    #   "messages": [{"role": "user", "content": "What is a banana?"}]
    # }'
    encode = "json",
    body = list(
      model = model,
      messages = list(list(role = "user", content = prompt)),
      # The token cap is a top-level field (max_completion_tokens in the
      # Chat Completions API), not part of a message. Reasoning models count
      # hidden reasoning tokens against it, so raise it if content comes back empty.
      max_completion_tokens = max_output_tokens
    )
  )
  # Read the raw JSON body as text, then parse it into an R list
  str_content <- content(response, "text", encoding = "UTF-8")
  parsed <- fromJSON(str_content)
  # Return the whole parsed response; the text is in parsed$choices$message$content
  return(parsed)
}
I have included comments in the code to show how the API request corresponds to a typical curl command you might use in the terminal.
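To see how the R list in body maps onto the JSON payload in the curl comments, we can serialize a small body ourselves. This is only a sketch: it calls jsonlite::toJSON() directly, which as far as I know is close to what httr does when encode = "json", and the prompt is the banana example from the comments above.

```r
library(jsonlite)

# The same structure generate_text() passes as `body`
request_body <- list(
  model = "gpt-5-nano",
  messages = list(
    list(role = "user", content = "What is a banana?")
  )
)

# auto_unbox = TRUE keeps length-one vectors as JSON scalars ("gpt-5-nano")
# instead of one-element arrays (["gpt-5-nano"])
json_body <- toJSON(request_body, auto_unbox = TRUE, pretty = TRUE)
cat(json_body, "\n")
```

Note that the outer list() around the message becomes a JSON array, which is why messages needs the nested list(list(...)) form even for a single message.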
100.2 Example Usage and Handling the Response
Now that we’ve defined our generate_text() function, let’s test it by sending a request to OpenAI’s API and working with the response.
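The call itself can be sketched as follows. The exact prompt wording is an assumption (the module only tells us the model was asked for a haiku about data science), and the request is guarded so the snippet does nothing when generate_text() and openai_api_key from the earlier sections are not defined.

```r
# Assumed prompt wording; only the topic (a haiku about data science) is known
haiku_prompt <- "Write a haiku about data science."

# Send the request only when the function and key from earlier sections exist
if (exists("generate_text") && exists("openai_api_key")) {
  generated_text <- generate_text(haiku_prompt)
}
```

The result, generated_text, is the parsed response examined in the next step.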
100.2.2 Step 2: Examine the Raw API Response
When we call the generate_text(prompt) function, OpenAI’s API returns a structured response in JSON format, which R reads as a list. This response contains multiple components, but the most important part is the generated text.
Let’s print the raw response to see its structure.
print(generated_text)
#> $id
#> [1] "chatcmpl-DmR864WSRQ8fdAjP4FWiaX54ubBW6"
#>
#> $object
#> [1] "chat.completion"
#>
#> $created
#> [1] 1780436402
#>
#> $model
#> [1] "gpt-5-nano-2025-08-07"
#>
#> $choices
#>   index message.role
#> 1     0    assistant
#>                                                                    message.content
#> 1 Data threads whisper\npatterns bloom in cold dashboards\nanswers spark in graphs
#>   message.refusal message.annotations finish_reason
#> 1              NA                NULL          stop
#>
#> $usage
#> $usage$prompt_tokens
#> [1] 14
#>
#> $usage$completion_tokens
#> [1] 663
#>
#> $usage$total_tokens
#> [1] 677
#>
#> $usage$prompt_tokens_details
#> $usage$prompt_tokens_details$cached_tokens
#> [1] 0
#>
#> $usage$prompt_tokens_details$audio_tokens
#> [1] 0
#>
#>
#> $usage$completion_tokens_details
#> $usage$completion_tokens_details$reasoning_tokens
#> [1] 640
#>
#> $usage$completion_tokens_details$audio_tokens
#> [1] 0
#>
#> $usage$completion_tokens_details$accepted_prediction_tokens
#> [1] 0
#>
#> $usage$completion_tokens_details$rejected_prediction_tokens
#> [1] 0
#>
#>
#>
#> $service_tier
#> [1] "default"
#>
#> $system_fingerprint
#> NULL
As you can see, the response is a nested list containing various metadata (e.g., request ID, model name, creation time), the AI-generated response (inside $choices$message$content, because fromJSON() turns choices into a data frame), token usage information (inside $usage$total_tokens), and more.
100.2.3 Step 3: Extract the AI-Generated Text
Since the response contains both metadata and content, we need to extract only the generated text. It is stored in generated_text$choices$message$content, which we assign to ai_response.
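Here is a minimal sketch of that extraction, run against a small mock response so it works offline. The mock JSON is an assumption that mirrors only a few fields of the real response printed above.

```r
library(jsonlite)

# Trimmed-down stand-in for the API's JSON reply (assumed shape)
mock_json <- '{
  "model": "gpt-5-nano-2025-08-07",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Data threads whisper\\npatterns bloom in cold dashboards\\nanswers spark in graphs"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 14, "completion_tokens": 663, "total_tokens": 677}
}'

mock_response <- fromJSON(mock_json)

# fromJSON() simplifies the array of choices into a data frame,
# so we index by column name rather than with [[1]]
is.data.frame(mock_response$choices)

ai_response <- mock_response$choices$message$content
```

With the real response, the last line would read ai_response <- generated_text$choices$message$content.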
Now, let’s print the AI-generated text:
print(ai_response)
#> [1] "Data threads whisper\npatterns bloom in cold dashboards\nanswers spark in graphs"
That wasn’t very readable, because print() shows the line breaks as \n. Let’s format it a bit better:
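One way to render the line breaks is writeLines(), sketched below. The formatting call originally used here is not shown, so this choice is an assumption, and the string is copied from the output above so the snippet runs on its own.

```r
# The haiku as returned by the API, with embedded newline characters
ai_response <- "Data threads whisper\npatterns bloom in cold dashboards\nanswers spark in graphs"

# writeLines() writes the string as-is, so each \n starts a new line
writeLines(ai_response)

# If you need the lines as separate elements, split on the newline
haiku_lines <- strsplit(ai_response, "\n", fixed = TRUE)[[1]]
length(haiku_lines)
```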
Data threads whisper
patterns bloom in cold dashboards
answers spark in graphs
Now we can see the haiku about data science that the model generated in response to our prompt. This is the core workflow for interacting with OpenAI’s API: send a request, receive a structured response, and extract the relevant content for use in your applications.
100.2.4 Step 4: Understanding Token Usage
Since OpenAI charges based on token usage, it’s useful to monitor how many tokens are used per request. The API response includes:
- usage$prompt_tokens → Tokens in the input prompt
- usage$completion_tokens → Tokens generated by the model
- usage$total_tokens → The total token count for billing
To check token usage:
print(generated_text$usage$total_tokens) # Total tokens used
#> [1] 677
print(generated_text$usage$completion_tokens) # Tokens used for output
#> [1] 663
print(generated_text$usage$prompt_tokens) # Tokens used for input
#> [1] 14
The token usage information can help you optimize your prompts and manage costs when using the API.
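The numbers in the raw response above also show where the completion tokens went. The sketch below reuses those figures in a hand-built list (so it runs offline) and separates hidden reasoning tokens from the tokens that ended up in the visible haiku. My understanding is that reasoning tokens are counted as part of completion_tokens, which the arithmetic here is consistent with.

```r
# Usage figures copied from the printed response above
usage <- list(
  prompt_tokens = 14,
  completion_tokens = 663,
  total_tokens = 677,
  completion_tokens_details = list(reasoning_tokens = 640)
)

# Prompt plus completion tokens add up to the total
usage$prompt_tokens + usage$completion_tokens == usage$total_tokens

# Tokens left over for the visible answer
reasoning_tokens <- usage$completion_tokens_details$reasoning_tokens
visible_tokens <- usage$completion_tokens - reasoning_tokens
visible_tokens  # 23

# Share of the completion spent on reasoning, in percent
reasoning_share <- round(100 * reasoning_tokens / usage$completion_tokens, 1)
reasoning_share
```

For this request, nearly all of the completion tokens went to reasoning rather than to the three-line haiku, which is worth keeping in mind when setting token limits.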
100.3 Error Handling
As we’ve seen with other APIs, calls can fail because of network issues, invalid requests, or rate limits. To keep a script from crashing when that happens, we can wrap API calls in tryCatch():
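generate_text_safe <- function(prompt) {
  tryCatch(
    {
      result <- generate_text(prompt)
      # generate_text() does not raise on HTTP errors; a failed request comes
      # back as a parsed body with an $error element, so turn that into an R error
      if (!is.null(result$error)) stop(result$error$message)
      result
    },
    error = function(e) {
      warning("API call failed: ", conditionMessage(e))
      return(NULL)
    }
  )
}
Now, we can use generate_text_safe() to handle errors. If an error occurs, the function issues a warning and returns NULL.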
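To see the pattern in action without calling the API, here is a small offline sketch. failing_call() is a hypothetical stand-in that always fails, and safe_call() wraps it the same way generate_text_safe() wraps generate_text().

```r
# Hypothetical stand-in for an API call that always fails
failing_call <- function(prompt) {
  stop("simulated network failure")
}

# Same tryCatch() pattern: convert the error into a warning and return NULL
safe_call <- function(prompt) {
  tryCatch(
    failing_call(prompt),
    error = function(e) {
      warning("API call failed: ", conditionMessage(e))
      NULL
    }
  )
}

# The script keeps running; the failure shows up as a NULL result
result <- suppressWarnings(safe_call("Define p-value"))
is.null(result)
```

Downstream code then needs to check for NULL before trying to extract text from a result.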
100.4 Processing Multiple Requests
When working with multiple prompts, we can use purrr::map() to apply the function to each one in turn:
library(purrr)
prompts <- c(
  "Define p-value",
  "Explain Type I error",
  "What is statistical power?"
)
# map() returns a list with one parsed response (or NULL) per prompt
responses <- map(prompts, generate_text_safe)
This code generates text for each prompt in the prompts vector. If an error occurs, the response will be NULL. After running this code, we can examine the responses and handle any errors. I’ve included a table below to display the responses.
As you can see, the table displays the prompts, AI-generated responses, token usage, model name, and completion time for each request. This information can help us monitor the API usage and response quality.
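A table like this can be assembled from the list of responses. The sketch below is one possible approach, not necessarily how the table above was built. mock_response() and extract_text() are hypothetical helpers, and the responses are mocked (with one NULL to simulate a failed call) so the code runs offline.

```r
library(purrr)
library(jsonlite)

# Hypothetical helper: build a response with the same shape fromJSON() produces
mock_response <- function(text, tokens) {
  fromJSON(toJSON(list(
    model = "gpt-5-nano",
    choices = list(list(index = 0, message = list(role = "assistant", content = text))),
    usage = list(total_tokens = tokens)
  ), auto_unbox = TRUE))
}

prompts <- c("Define p-value", "Explain Type I error")
responses <- list(
  mock_response("A p-value is ...", 120),  # placeholder text, not real output
  NULL                                     # what generate_text_safe() returns on failure
)

# Hypothetical helper: pull out the text, or NA when the call failed
extract_text <- function(r) {
  if (is.null(r)) NA_character_ else r$choices$message$content
}

results <- data.frame(
  prompt = prompts,
  response = map_chr(responses, extract_text),
  total_tokens = map_int(responses, ~ if (is.null(.x)) NA_integer_ else as.integer(.x$usage$total_tokens))
)
results
```

Failed calls stay in the table as NA rows, which makes them easy to spot and retry.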
100.4.1 Rate Limiting
OpenAI has rate limits we need to respect. We can add delays between requests to avoid exceeding these limits. Here’s a throttled version of the generate_text() function:
generate_text_throttled <- function(prompt) {
  Sys.sleep(1) # Wait 1 second between requests
  generate_text_safe(prompt)
}
This function adds a 1-second delay between requests to avoid exceeding OpenAI’s rate limits. You can adjust the delay as needed based on the API’s rate limits.
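The same idea can be written as a reusable wrapper so the delay is easy to change. This is a sketch under stated assumptions: throttle() is a hypothetical helper, and echo_prompt() stands in for the API call so the timing can be checked offline with a short delay.

```r
# Hypothetical helper: return a version of f that pauses before every call
throttle <- function(f, delay = 1) {
  force(f)
  function(...) {
    Sys.sleep(delay)
    f(...)
  }
}

# Stand-in for an API call so the example runs without network access
echo_prompt <- function(prompt) toupper(prompt)

throttled_echo <- throttle(echo_prompt, delay = 0.1)

# Three calls with a 0.1-second pause each take at least about 0.3 seconds
timing <- system.time(
  out <- vapply(c("a", "b", "c"), throttled_echo, character(1))
)
timing[["elapsed"]]
```

With the real function, throttle(generate_text_safe, delay = 1) would behave like generate_text_throttled() above.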
100.5 Your Turn!
Now it’s your turn to experiment with the OpenAI API! Try different prompts, explore various models, and see how you can integrate AI-generated text into your projects. Remember to monitor your token usage and handle errors gracefully as you work with the API.
I’ve crafted a prompt to generate your very own activity for this module. You can modify the prompt to create different activities or explore other topics. Here’s the prompt I used:
activity_prompt <- "Create a meme-tastic data science activity for graduate students learning about using OpenAI's API in Tidyverse. The activity should involve making API calls, handling responses, and analyzing the results. Include clear, concise instructions and learning objectives."
activity_response <- generate_text(activity_prompt,
  model = "gpt-5-nano",
  max_output_tokens = 3000
)
writeLines(activity_response$choices$message$content, "includes/activity.txt")
Because the response is quite long (at 8426 tokens), I’ve written it to a text file in the includes directory. You can open that file to see the generated activity. The activity is designed to help students learn how to use OpenAI’s API in R, including making API calls, handling responses, and analyzing results. It includes clear instructions and learning objectives to guide students through the process.
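One practical detail: writeLines() fails if the target directory does not exist yet. The sketch below creates the directory first. It writes to a temporary folder and uses placeholder text (both assumptions for the demo) so it can run anywhere without touching your project.

```r
# Demo location inside the session's temporary directory (assumed for this sketch)
out_dir <- file.path(tempdir(), "includes")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

out_file <- file.path(out_dir, "activity.txt")

# Placeholder standing in for activity_response$choices$message$content
activity_text <- "Line one of a generated activity\nLine two"

writeLines(activity_text, out_file)

# Read it back to confirm the embedded newline produced two lines
saved_lines <- readLines(out_file)
length(saved_lines)
```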
You may notice that the activity is generated each time I render this book. If you want to keep a specific version of the activity, you can find it in the commit history of the includes/activity.txt file in the GitHub repository for this book. You can also modify the prompt to generate a new activity or explore different topics as you see fit. Happy experimenting!
Remember that this activity is generated by the OpenAI API, so it requires careful review and editing to ensure it is accurate, clear, and appropriate. Always review AI-generated content before using it. This advice is especially important in an educational setting, where the content has to meet your standards and learning objectives. Don’t be just an AI passenger. Trust but verify, as they say.
The generated activity:
#> Meme-tastic data science with OpenAI API in the Tidyverse
#>
#> Idea in short:
#> Graduate students build a small, repeatable workflow that calls OpenAI’s chat API from R, generates meme captions for a few classic templates, and then wrangles, analyzes, and visualizes the results with tidyverse tools. The activity emphasizes prompt design, handling API responses, data wrangling, and simple text analytics.
#>
#> Learning objectives
#> - Call OpenAI’s Chat API from R (via httr/jsonlite) and handle responses and usage data.
#> - Engineer prompts and vary model parameters (model, temperature, max_tokens) to study effects on output.
#> - Use the Tidyverse (dplyr, purrr, tidyr, ggplot2, tidytext) to wrangle, analyze, and visualize API results.
#> - Perform lightweight text analytics (token counts, sentiment) to compare captions across meme templates and styles.
#> - Discuss cost, rate limits, and ethical content considerations in real-world API usage.
#>
#> Estimated time
#> - Setup: 15 minutes
#> - API calls and data collection: 30–60 minutes (depends on batch size and rate limits)
#> - Analysis and visualization: 30–45 minutes
#> - Discussion and reflection: 15–20 minutes
#> Total: ~2–2.5 hours
#>
#> What you’ll need
#> - R (4.x) and RStudio (or any R environment)
#> - OpenAI API key (set as environment variable OPENAI_API_KEY)
#> - Packages: httr, jsonlite, dplyr, purrr, tidyr, tibble, stringr, ggplot2, tidytext, readr, glue, lubridate (optional)
#>
#> Setup and prerequisites
#> - Obtain an OpenAI API key from your OpenAI account.
#> - In R, set the API key (do not hard-code it):
#> - Sys.setenv(OPENAI_API_KEY = "sk-...")
#> - Or add it to your .Renviron and restart R.
#>
#> - Install required packages (one line):
#> install.packages(c("httr","jsonlite","dplyr","purrr","tidyr","tibble","stringr","ggplot2","tidytext","readr","glue"))
#>
#> - Load libraries (R):
#> library(httr)
#> library(jsonlite)
#> library(dplyr)
#> library(purrr)
#> library(tidyr)
#> library(tibble)
#> library(stringr)
#> library(ggplot2)
#> library(tidytext)
#> library(glue)
#>
#> Code you can drop into a script
#>
#> 1) API helper: call OpenAI Chat API, returning content and usage
#> - This function returns both the caption and the usage tokens.
#> - It uses the gpt-3.5-turbo model by default, but you can switch to gpt-4-turbo if available.
#>
#> # OpenAI API helper
#> get_openai_response <- function(messages,
#> model = "gpt-3.5-turbo",
#> temperature = 0.6,
#> max_tokens = 180) {
#> api_key <- Sys.getenv("OPENAI_API_KEY")
#> if (api_key == "") stop("OPENAI_API_KEY not set. Please set your API key.")
#> url <- "https://api.openai.com/v1/chat/completions"
#> body <- list(model = model,
#> messages = messages,
#> temperature = temperature,
#> max_tokens = max_tokens)
#> res <- POST(url,
#> add_headers(Authorization = paste("Bearer", api_key)),
#> content_type_json(),
#> body = toJSON(body, auto_unbox = TRUE))
#> stop_for_status(res)
#> parsed <- jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"))
#> content <- parsed$choices[[1]]$message$content
#> usage <- if (!is.null(parsed$usage)) parsed$usage else NULL
#> list(content = content, usage = usage)
#> }
#>
#> # Build a pair of messages (system + user) for each caption task
#> build_messages <- function(template, style) {
#> list(
#> list(role = "system",
#> content = "You are a witty meme caption generator known for concise, punchy lines."),
#> list(role = "user",
#> content = glue("Create a short, funny caption for the meme template '{template}'. Style: {style}. Keep it under 180 characters. The caption should be relevant to the template and meme culture.")
#> )
#> )
#> }
#>
#> # Wrapper: generate caption for a given template and style
#> generate_caption <- function(template, style, temperature = 0.6, max_tokens = 180) {
#> msgs <- build_messages(template, style)
#> resp <- get_openai_response(msgs,
#> model = "gpt-3.5-turbo",
#> temperature = temperature,
#> max_tokens = max_tokens)
#> caption <- str_trim(resp$content)
#> list(caption = caption,
#> usage = resp$usage)
#> }
#>
#> 2) Meme templates (data) and styles (prompt variation
#> - You’ll generate captions for a small set of classic meme formats.
#> - Styles are qualitative adjectives you might use to prompt different vibes.
#>
#> templates <- tibble(
#> template = c("Doge", "Distracted Boyfriend", "Two Buttons", "This Is Fine", "Expanding Brain"),
#> description = c("Shiba Inu meme with Comic Sans caption",
#> "Man looking at another woman while his girlfriend looks on",
#> "Two buttons with conflicting choices",
#> "A dog in a burning room",
#> "Progression of brain expansion")
#> )
#>
#> styles <- c("sarcastic", "wholesome", "absurd")
#>
#> # Create all template-style combinations
#> combinations <- templates %>% crossing(style = styles)
#>
#> 3) Generate captions for each combination
#> - This loop will call the OpenAI API for each row, with a short delay to respect rate limits.
#>
#> set.seed(123) # for reproducibility of any random aspects if added later
#>
#> captions_df <- combinations %>%
#> mutate(res = map2(template, style, ~ generate_caption(.x, .y, temperature = 0.6, max_tokens = 180)),
#> caption = map_chr(res, ~ .x$caption),
#> prompt_tokens = map_int(res, ~ if (!is.null(.x$usage)) .x$usage$prompt_tokens else NA),
#> completion_tokens = map_int(res, ~ if (!is.null(.x$usage)) .x$usage$completion_tokens else NA),
#> total_tokens = map_int(res, ~ if (!is.null(.x$usage)) .x$usage$total_tokens else NA)
#> ) %>%
#> select(-res)
#>
#> Notes:
#> - The total_tokens, prompt_tokens, completion_tokens come from the API usage data. Some environments may not return usage; handle with NA as above.
#> - You may want to add a short delay between requests: Sys.sleep(0.7) inside the map function if needed.
#>
#> 4) Basic text analytics with tidyverse
#> - Add word counts, sentiment, and simple readability-ish proxy metrics.
#>
#> # Word count per caption
#> captions_df <- captions_df %>%
#> mutate(caption_id = row_number())
#>
#> # Add a read-friendly word count (rough)
#> captions_df <- captions_df %>%
#> mutate(word_count = str_count(caption, "\\S+"))
#>
#> # Sentiment (AFINN) via tidytext
#> sentiment_df <- tibble(text = captions_df$caption, id = captions_df$caption_id) %>%
#> unnest_tokens(word, text) %>%
#> inner_join(get_sentiments("afinn"), by = "word") %>%
#> group_by(id) %>%
#> summarize(sentiment_score = sum(value, na.rm = TRUE)) %>%
#> ungroup()
#>
#> captions_df <- captions_df %>%
#> left_join(sentiment_df, by = c("caption_id" = "id"))
#>
#> # Quick categorical flag: positive vs non-positive
#> captions_df <- captions_df %>%
#> mutate(sentiment_label = case_when(
#> is.na(sentiment_score) ~ "unknown",
#> sentiment_score > 0 ~ "positive",
#> sentiment_score < 0 ~ "negative",
#> TRUE ~ "neutral"
#> ))
#>
#> 5) Visualizations (ggplot2)
#> - Compare caption length vs. sentiment, by meme template/style.
#>
#> # Plot: caption length by template-style
#> ggplot(captions_df, aes(x = template, y = word_count, fill = sentiment_label)) +
#> geom_bar(stat = "identity", position = "dodge") +
#> coord_flip() +
#> labs(title = "Caption Length by Meme Template-Style",
#> x = "Meme template",
#> y = "Word count",
#> fill = "Sentiment") +
#> theme_minimal()
#>
#> # Plot: distribution of sentiment scores
#> ggplot(captions_df, aes(x = sentiment_score)) +
#> geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
#> labs(title = "Distribution of Caption Sentiment Scores (AFINN)",
#> x = "Sentiment score",
#> y = "Frequency") +
#> theme_minimal()
#>
#> 6) Interpretation questions you can pose to students
#> - How did changing temperature affect caption creativity vs. coherence with the meme template?
#> - Are some meme templates more tolerant of absurd/sarcastic prompts than others?
#> - Do sentiment scores align with the intended style (e.g., sarcastic captions with negative sentiment, wholesome captions with positive sentiment)?
#> - What are the costs (tokens) associated with each caption? How would you budget a larger batch?
#> - How would you automate quality checks (e.g., filter out captions that are nonsensical or off-brand for your template)?
#>
#> 7) Extensions (optional)
#> - Compare multiple models (gpt-3.5-turbo vs. gpt-4-turbo) and timings.
#> - Add embeddings/semantic similarity to rank captions by relevance to the meme template.
#> - Implement a lightweight human-in-the-loop rubric: students rate a subset of captions; correlate with automated sentiment/length features.
#> - Save results to a tidy CSV for reproducibility:
#> write_csv(captions_df, "meme_captions_openai.csv")
#>
#> 8) Ethical, practical, and safety notes
#> - Create prompts that avoid generating harmful content. Use a system prompt that nudges towards light-hearted, non-abusive humor.
#> - Be mindful of OpenAI usage costs and rate limits; batch calls and include delays.
#> - Do not expose API keys; avoid sharing notebooks that contain keys.
#> - If you plan to publish results, consider licensing and attribution around meme formats and generated content.
#>
#> What to deliver (for students)
#> - A reproducible R script or RMarkdown notebook that:
#> - Sets up packages and API key.
#> - Defines the API wrapper and prompt logic.
#> - Generates captions for a small set of meme templates with several styles.
#> - Produces a tidy data frame of results with usage stats.
#> - Performs basic text analytics (word count, sentiment).
#> - Produces a couple of plots summarizing results.
#> - Includes a short write-up section interpreting findings and discussing limitations.
#>
#> Compact rubric (quick grading guide)
#> - API integration: correct setup and handling of response/usage data (30%)
#> - Data wrangling: clean, tidy data frame with captions and metrics (25%)
#> - Analysis: reasonable use of tidytext for sentiment and basic metrics (20%)
#> - Visualization: clear, informative plots (15%)
#> - Reflection: thoughtful discussion of prompts, costs, limitations (10%)
#>
#> Ready-to-run starter snippet (full script outline)
#> - This is a compact roadmap you can copy into a script and expand.
#>
#> # Setup
#> library(httr); library(jsonlite); library(dplyr); library(purrr); library(tidyr)
#> library(tibble); library(stringr); library(ggplot2); library(tidytext); library(glue)
#>
#> # API helpers (as above)
#> get_openai_response <- function(messages, model = "gpt-3.5-turbo", temperature = 0.6, max_tokens = 180) {
#> api_key <- Sys.getenv("OPENAI_API_KEY")
#> if (api_key == "") stop("OPENAI_API_KEY not set.")
#> url <- "https://api.openai.com/v1/chat/completions"
#> body <- list(model = model, messages = messages, temperature = temperature, max_tokens = max_tokens)
#> res <- POST(url, add_headers(Authorization = paste("Bearer", api_key)),
#> content_type_json(), body = toJSON(body, auto_unbox = TRUE))
#> stop_for_status(res)
#> parsed <- jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"))
#> content <- parsed$choices[[1]]$message$content
#> usage <- if (!is.null(parsed$usage)) parsed$usage else NULL
#> list(content = content, usage = usage)
#> }
#> build_messages <- function(template, style) {
#> list(
#> list(role = "system", content = "You are a witty meme caption generator."),
#> list(role = "user", content = glue("Create a short, funny caption for the meme template '{template}'. Style: {style}. Keep under 180 characters."))
#> )
#> }
#> generate_caption <- function(template, style, temperature = 0.6, max_tokens = 180) {
#> msgs <- build_messages(template, style)
#> resp <- get_openai_response(msgs, model = "gpt-3.5-turbo", temperature = temperature, max_tokens = max_tokens)
#> list(caption = str_trim(resp$content), usage = resp$usage)
#> }
#>
#> # Data & run
#> templates <- tibble(template = c("Doge","Distracted Boyfriend","Two Buttons","This Is Fine","Expanding Brain"),
#> description = c("Shiba Inu meme", "Man looks at another woman", "Two buttons choices", "Room on fire", "Brain expansion"))
#>
#> styles <- c("sarcastic","wholesome","absurd")
#> combinations <- templates %>% crossing(style = styles)
#>
#> captions_df <- combinations %>%
#> mutate(res = map2(template, style, ~ generate_caption(.x, .y, temperature = 0.6, max_tokens = 180)),
#> caption = map_chr(res, ~ .x$caption),
#> prompt_tokens = map_int(res, ~ if (!is.null(.x$usage)) .x$usage$prompt_tokens else NA),
#> completion_tokens = map_int(res, ~ if (!is.null(.x$usage)) .x$usage$completion_tokens else NA),
#> total_tokens = map_int(res, ~ if (!is.null(.x$usage)) .x$usage$total_tokens else NA)
#> ) %>% select(-res)
#>
#> # Simple analytics
#> captions_df <- captions_df %>% mutate(caption_id = row_number(), word_count = str_count(caption, "\\S+"))
#> sentiment_df <- tibble(text = captions_df$caption, id = captions_df$caption_id) %>%
#> unnest_tokens(word, text) %>%
#> inner_join(get_sentiments("afinn"), by = "word") %>%
#> group_by(id) %>% summarize(sentiment_score = sum(value, na.rm = TRUE))
#> captions_df <- captions_df %>% left_join(sentiment_df, by = c("caption_id" = "id"))
#>
#> # Visualize
#> ggplot(captions_df, aes(x = template, y = word_count, fill = factor(ifelse(is.na(sentiment_score), "unknown", ifelse(sentiment_score>0,"positive","negative"))))) +
#> geom_bar(stat = "identity", position = "dodge") + coord_flip() +
#> labs(title = "Caption Length by Meme Template and sentiment", x = "Meme Template", y = "Word Count", fill = "Sentiment") +
#> theme_minimal()
#>
#> This activity provides a hands-on, meme-flavored way to practice OpenAI API integration with the Tidyverse, while also teaching students how to structure experiments, analyze results, and reflect on the tradeoffs of prompt design and API usage. If you want a more advanced version, you can add embeddings-based ranking, multi-turn prompts, or a human-in-the-loop evaluation.
100.6 Best Practices
- Always keep your API key secure and never hard-code it in your scripts.
- Monitor token usage to manage costs effectively.
- Handle errors gracefully to ensure your application remains robust.
- Use batching and throttling to manage multiple requests and respect rate limits.
- Regularly check OpenAI’s API documentation for updates and changes to endpoints, parameters, and best practices.
100.7 Conclusion
In this module, we’ve covered how to generate text using OpenAI’s API in R. We’ve defined a function to interact with the API, handled responses, extracted generated text, monitored token usage, and processed multiple requests. We’ve also discussed error handling, rate limiting, and best practices for working with the API.
Now that you have the basics down, you can start experimenting with different prompts, models, and applications. The OpenAI API is powerful and flexible, allowing you to integrate AI capabilities into a wide range of projects, from chatbots to content generation to data analysis. Happy coding!