97 ACT: Working with OpenAI’s API

This module introduces the basics of interacting with OpenAI’s API from R. We’ll explore how to make API calls, handle responses, and integrate AI capabilities into data science workflows. You can find the API documentation here and the R package documentation for httr and jsonlite for making HTTP requests and handling JSON data.

97.1 Getting Started

First, we need to load the required packages:

97.1.1 API Authentication

To use OpenAI’s API, you’ll need an API key. Like we learned with other APIs, it’s important to keep this secure:

# Store API key securely (NEVER commit to Git!)
openai_api_key <- readLines("path/to/api_key.txt")

97.1.2 Making API Requests

The core workflow involves:

Constructing the API request
Sending it to OpenAI’s endpoint
Processing the response

Next, we define a function to generate text using OpenAI’s API. The function takes a prompt as input and returns the generated text.

Here’s a basic function for text generation:

generate_text <- function(prompt, model = "gpt-5-nano", max_output_tokens = 200) {
  response <- POST(
    # curl https://api.openai.com/v1/chat/completions
    url = "https://api.openai.com/v1/chat/completions",
    # -H "Authorization: Bearer $OPENAI_API_KEY"
    add_headers(Authorization = paste("Bearer", openai_api_key)),
    # -H "Content-Type: application/json"
    content_type_json(),
    # -d '{
    #   "model": "gpt-5-nano",
    #   "messages": [{"role": "user", "content": "What is a banana?"}]
    # }'
    encode = "json",
    body = list(
      model = model,
      messages = list(list(role = "user", content = prompt,
                           max_output_tokens = max_output_tokens
                           ))
    )
  )

  str_content <- content(response, "text", encoding = "UTF-8")
  parsed <- fromJSON(str_content)

  # return(parsed$choices[[1]]$text)
  return(parsed)
}

I have included comments in the code to show how the API request corresponds to a typical curl command you might use in the terminal.

97.2 Example Usage and Handling the Response

Now that we’ve defined our generate_text() function, let’s test it by sending a request to OpenAI’s API and working with the response.

97.2.1 Step 1: Send a Request

prompt <- "Write a haiku about data science."
generated_text <- generate_text(prompt)

97.2.2 Step 2: Examine the Raw API Response

When we call the generate_text(prompt) function, OpenAI’s API returns a structured response in JSON format, which R reads as a list. This response contains multiple components, but the most important part is the generated text.

Let’s print the raw response to see its structure.

print(generated_text)
#> $id
#> [1] "chatcmpl-DIi1whGCdWHMnactoM864UX0CPz5G"
#> 
#> $object
#> [1] "chat.completion"
#> 
#> $created
#> [1] 1773351528
#> 
#> $model
#> [1] "gpt-5-nano-2025-08-07"
#> 
#> $choices
#>   index message.role
#> 1     0    assistant
#>                                                              message.content
#> 1 Data hums softly\nPatterns emerge from the noise\nInsights bloom in graphs
#>   message.refusal message.annotations finish_reason
#> 1              NA                NULL          stop
#> 
#> $usage
#> $usage$prompt_tokens
#> [1] 14
#> 
#> $usage$completion_tokens
#> [1] 1304
#> 
#> $usage$total_tokens
#> [1] 1318
#> 
#> $usage$prompt_tokens_details
#> $usage$prompt_tokens_details$cached_tokens
#> [1] 0
#> 
#> $usage$prompt_tokens_details$audio_tokens
#> [1] 0
#> 
#> 
#> $usage$completion_tokens_details
#> $usage$completion_tokens_details$reasoning_tokens
#> [1] 1280
#> 
#> $usage$completion_tokens_details$audio_tokens
#> [1] 0
#> 
#> $usage$completion_tokens_details$accepted_prediction_tokens
#> [1] 0
#> 
#> $usage$completion_tokens_details$rejected_prediction_tokens
#> [1] 0
#> 
#> 
#> 
#> $service_tier
#> [1] "default"
#> 
#> $system_fingerprint
#> NULL

As you can see, the response is a nested list containing various metadata (e.g., request ID, model name, creation time), the AI-generated response (inside $choices[[1]]$message$content), token usage information (inside $usage$total_tokens), and more.

97.2.3 Step 3: Extract the AI-Generated Text

Since the response contains both metadata and content, we need to extract only the generated text. The key part of the response is stored in:

ai_response <- generated_text$choices$message$content

Now, let’s print the AI-generated text:

print(ai_response)
#> [1] "Data hums softly\nPatterns emerge from the noise\nInsights bloom in graphs"

Ok, so that wasn’t really readable. Let’s try to format it a bit better:

cat(ai_response, sep = "\n")

Data hums softly Patterns emerge from the noise Insights bloom in graphs

Now we can see the haiku about data science that the model generated in response to our prompt. This is the core workflow for interacting with OpenAI’s API: send a request, receive a structured response, and extract the relevant content for use in your applications.

97.2.4 Step 4: Understanding Token Usage

Since OpenAI charges based on token usage, it’s useful to monitor how many tokens are used per request. The API response includes:

usage$prompt_tokens → Tokens in the input prompt
usage$completion_tokens → Tokens generated by the model
usage$total_tokens → The total token count for billing

To check token usage:

print(generated_text$usage$total_tokens) # Total tokens used
#> [1] 1318
print(generated_text$usage$completion_tokens) # Tokens used for output
#> [1] 1304
print(generated_text$usage$prompt_tokens) # Tokens used for input
#> [1] 14

The token usage information can help you optimize your prompts and manage costs when using the API.

97.3 Error Handling

Like we’ve seen with other APIs, it’s important to handle errors gracefully. As with any API call, errors can occur due to network issues, invalid requests, or rate limits. To ensure our script doesn’t crash, we can wrap API calls in tryCatch():

generate_text_safe <- function(prompt) {
  tryCatch(
    {
      generate_text(prompt)
    },
    error = function(e) {
      warning("API call failed: ", e$message)
      return(NULL)
    }
  )
}

Now, we can use generate_text_safe() to handle errors. If an error occurs, the function will return NULL and print a warning message.

97.4 Processing Multiple Requests

When working with multiple prompts, we can use purrr::map_chr() to process them efficiently:

library(purrr)
prompts <- c(
  "Define p-value",
  "Explain Type I error",
  "What is statistical power?"
)
responses <- list()
responses <- map(prompts, generate_text_safe)

This code generates text for each prompt in the prompts vector. If an error occurs, the response will be NULL. After running this code, we can examine the responses and handle any errors. I’ve included a table below to display the responses.

As you can see, the table displays the prompts, AI-generated responses, token usage, model name, and completion time for each request. This information can help us monitor the API usage and response quality.

97.4.1 Rate Limiting

OpenAI has rate limits we need to respect. We can add delays between requests to avoid exceeding these limits. Here’s a throttled version of the generate_text() function:

generate_text_throttled <- function(prompt) {
  Sys.sleep(1) # Wait 1 second between requests
  generate_text_safe(prompt)
}

This function adds a 1-second delay between requests to avoid exceeding OpenAI’s rate limits. You can adjust the delay as needed based on the API’s rate limits.

97.5 Your Turn!

Now it’s your turn to experiment with the OpenAI API! Try different prompts, explore various models, and see how you can integrate AI-generated text into your projects. Remember to monitor your token usage and handle errors gracefully as you work with the API.

I’ve crafted a prompt to generate your very own activity for this module. You can modify the prompt to create different activities or explore other topics. Here’s the prompt I used:

activity_prompt <- "Create a meme-tastic data science activity for graduate students learning about using OpenAI's API in Tidyverse. The activity should involve making API calls, handling responses, and analyzing the results. Include clear, concise instructions and learning objectives."


activity_response <- generate_text(activity_prompt, 
                                   model = "gpt-5-nano", 
                                   max_output_tokens = 4000)

writeLines(activity_response$choices$message$content, "includes/activity.txt")

Because the response is quite long (at 7822 tokens), I’ve written it to a text file in the includes directory. You can open that file to see the generated activity. The activity is designed to help students learn how to use OpenAI’s API in R, including making API calls, handling responses, and analyzing results. It includes clear instructions and learning objectives to guide students through the process.

You may notice that the activity is generated each time I render this book. If you want to keep a specific version of the activity, you can find it in the commit history of the includes/activity.txt file in the GitHub repository for this book. You can also modify the prompt to generate a new activity or explore different topics as you see fit. Happy experimenting!

Remember that this activity is generated by the OpenAI API, so it requires careful review and editing to ensure it is accurate, clear, and appropriate. Always review AI-generated content before using it. This advice is especially important in an educational setting to ensure it meets your standards and learning objectives. Don’t be just an AI passenger. Trust but verify, as they say.

Click to see the generated activity

#> Title: Meme-tastic Data Science with OpenAI API and the Tidyverse
#> 
#> Overview
#> A hands-on graduate activity that blends OpenAI API usage with tidyverse data wrangling, prompting, and text analysis. You’ll:
#> - make API calls to generate meme captions
#> - handle and parse API responses robustly
#> - analyze and visualize the generated captions (length, sentiment, word usage)
#> - reflect on prompt design, reproducibility, and cost considerations
#> 
#> Learning objectives
#> - Learn to call OpenAI’s API from R (authentication, request formatting, response parsing)
#> - Build tidyverse pipelines to manage prompts, responses, and metadata
#> - Experiment with prompt design and model parameters (temperature, max_tokens)
#> - Analyze generated text: length, sentiment, lexical diversity; create informative visuals
#> - Practice error handling, rate limiting, and reproducibility in API workflows
#> - Discuss ethical considerations, bias, safety, and cost when using LLMs
#> 
#> Prerequisites
#> - R and RStudio (or equivalent)
#> - Basic familiarity with tidyverse (dplyr, tibble, purrr, ggplot2)
#> - OpenAI API key (set as OPENAI_API_KEY in your environment)
#> - Internet access for API calls
#> 
#> Materials
#> - A small dataset of meme templates and themes (provided below)
#> - R script or R Markdown to run the activity
#> - Optional: Shiny app or R Markdown HTML report to showcase results
#> 
#> Dataset: meme templates and themes
#> The dataset has 8 meme templates and themes. You can expand or modify as you like.
#> 
#> - id: 1, template: "Drake Hotline Bling", theme: "grad school life vs. literature review"
#> - id: 2, template: "Distracted Boyfriend", theme: "choosing between R and Python for a project"
#> - id: 3, template: "Two Buttons", theme: "finding time for open science vs. chasing novelty"
#> - id: 4, template: "Expanding Brain", theme: "data wrangling stages: messy to tidy"
#> - id: 5, template: "Success Kid", theme: "getting a reproducible workflow on the first try"
#> - id: 6, template: "Mocking Spongebob", theme: "peers underestimating your model choice"
#> - id: 7, template: "Change My Mind", theme: "which visualization is most honest for your results?"
#> - id: 8, template: "Mocking Twitter", theme: "pre-registered vs. exploratory analyses"
#> 
#> Instructions (step-by-step)
#> 1) Setup (10–15 minutes)
#> - Install and load necessary packages
#> - Set the OpenAI API key
#> - Prepare the dataset
#> 
#> Code (you can copy/paste; replace with your own key or rely on an env var)
#> - Install packages (first time only)
#> install.packages(c("httr", "jsonlite", "tidyverse", "tidytext", "purrr", "stringr", "textdata", "ggplot2", "readr"))
#> 
#> - Load libraries
#> library(httr)
#> library(jsonlite)
#> library(tidyverse)
#> library(tidytext)
#> library(purrr)
#> 
#> - Set API key (don’t commit keys)
#> Sys.setenv(OPENAI_API_KEY = Sys.getenv("OPENAI_API_KEY"))  # ensure env var is set in your session
#> 
#> - Create the meme dataset
#> meme_df <- tibble(
#>   id = 1:8,
#>   template = c("Drake Hotline Bling","Distracted Boyfriend","Two Buttons","Expanding Brain",
#>                "Success Kid","Mocking Spongebob","Change My Mind","Mocking Twitter"),
#>   theme = c("grad school life vs. literature review",
#>             "choosing between R and Python for a project",
#>             "finding time for open science vs. chasing novelty",
#>             "data wrangling stages: messy to tidy",
#>             "getting a reproducible workflow on the first try",
#>             "peers underestimating your model choice",
#>             "which visualization is most honest for your results?",
#>             "pre-registered vs. exploratory analyses")
#> )
#> 
#> 2) Define a robust API call function (15–25 minutes)
#> What you’re doing: send a prompt to OpenAI’s chat/completions API, receive a caption, and store usage details for cost awareness.
#> 
#> Code: create a reusable OpenAI chat helper (with simple retry)
#> - You can tweak model, temperature, and max_tokens as part of experimentation.
#> 
#> openai_chat <- function(prompt_text, model = "gpt-3.5-turbo", temperature = 0.6, max_tokens = 60, max_retries = 3) {
#>   url <- "https://api.openai.com/v1/chat/completions"
#>   messages <- list(
#>     list(role = "system", content = "You are a witty meme caption generator that creates short, clean captions suitable for meme templates."),
#>     list(role = "user", content = prompt_text)
#>   )
#>   body <- list(model = model,
#>                messages = messages,
#>                temperature = temperature,
#>                max_tokens = max_tokens)
#>   for (i in seq_len(max_retries)) {
#>     resp <- httr::POST(url,
#>                        httr::add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY")),
#>                                          "Content-Type" = "application/json"),
#>                        body = jsonlite::toJSON(body, auto_unbox = TRUE),
#>                        encode = "json")
#>     if (httr::http_error(resp)) {
#>       message("OpenAI request failed: ", httr::http_status(resp)$message)
#>       Sys.sleep(2^(i))  # exponential backoff
#>       next
#>     } else {
#>       content_resp <- httr::content(resp, as = "parsed", type = "application/json")
#>       caption <- content_resp$choices[[1]]$message$content
#>       usage <- content_resp$usage
#>       return(list(caption = caption, usage = usage, raw = content_resp))
#>     }
#>   }
#>   stop("OpenAI API request failed after retries.")
#> }
#> 
#> 3) Prompt design and generation (30–45 minutes)
#> What you’ll do:
#> - For each meme row, build a concise prompt that describes the template and the theme.
#> - Call the API to generate a caption.
#> - Collect results into a tidy data frame.
#> 
#> Code: generate captions for all memes
#> generate_caption_row <- function(id, template, theme, temperature = 0.6, max_tokens = 60) {
#>   prompt <- paste0("Generate a concise, witty caption for the '", template, "' meme template about: ", theme,
#>                    ". Keep it under 12 words, avoid profanity, and aim for humor that a graduate student would appreciate.")
#>   out <- openai_chat(prompt, model = "gpt-3.5-turbo", temperature = temperature, max_tokens = max_tokens)
#>   tibble(
#>     id = id,
#>     template = template,
#>     theme = theme,
#>     caption = str_trim(out$caption),
#>     prompt_tokens = as.integer(out$usage$prompt_tokens),
#>     completion_tokens = as.integer(out$usage$completion_tokens),
#>     total_tokens = as.integer(out$usage$total_tokens)
#>   )
#> }
#> 
#> captions_df <- meme_df %>% mutate(across(everything(), as.character)) %>% 
#>   pmap_dfr(~ generate_caption_row(..1, ..2, ..3, temperature = 0.6, max_tokens = 60))
#> 
#> 4) Basic cleaning and metadata (5–10 minutes)
#> - Normalize captions (trim whitespace)
#> - Compute caption length in characters and words
#> - Optionally remove or flag captions that look empty or too long
#> 
#> Code:
#> captions_df <- captions_df %>%
#>   mutate(
#>     caption = str_squish(caption),
#>     word_count = str_count(caption, "\\S+"),
#>     char_count = nchar(caption)
#>   )
#> 
#> 5) Text analysis: sentiment and lexical patterns (20–30 minutes)
#> What you’ll do:
#> - Use a simple sentiment lexicon (AFINN) via tidytext to estimate caption sentiment.
#> - Explore word frequencies across all captions (after removing stop words).
#> 
#> Code:
#> # sentiment using AFINN
#> afinn <- get_sentiments("afinn")
#> 
#> # tokenize captions and compute sentiment per caption
#> word_sentiment <- captions_df %>%
#>   unnest_tokens(word, caption) %>%
#>   anti_join(stop_words, by = "word") %>%
#>   inner_join(afinn, by = "word") %>%
#>   group_by(id) %>%
#>   summarise(sentiment = sum(value), .groups = "drop")
#> 
#> captions_df <- captions_df %>% left_join(word_sentiment, by = "id")
#> 
#> # overall word frequencies (excluding stop words)
#> word_freq <- captions_df %>%
#>   unnest_tokens(word, caption) %>%
#>   anti_join(stop_words, by = "word") %>%
#>   count(word, sort = TRUE)
#> 
#> 6) Visualization and interpretation (20–40 minutes)
#> What you’ll produce:
#> - Histogram of caption lengths (characters or words)
#> - Bar chart of top 15 words across all captions
#> - Scatter or hexbin plot of caption length vs. sentiment
#> 
#> Code (examples):
#> # Caption length distribution
#> ggplot(captions_df, aes(x = word_count)) +
#>   geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
#>   labs(title = "Caption Word Count Distribution", x = "Word count", y = "Frequency") +
#>   theme_minimal()
#> 
#> # Top words
#> top_words <- word_freq %>% head(15)
#> ggplot(top_words, aes(x = reorder(word, n), y = n)) +
#>   geom_col(fill = "tomato") +
#>   coord_flip() +
#>   labs(title = "Top Words in Generated Captions", x = "Word", y = "Frequency") +
#>   theme_minimal()
#> 
#> # Caption length vs. sentiment
#> ggplot(captions_df, aes(x = char_count, y = sentiment)) +
#>   geom_point() +
#>   geom_smooth(method = "lm", se = FALSE, color = "blue") +
#>   labs(title = "Caption Length vs. Sentiment", x = "Character count", y = "Sentiment") +
#>   theme_minimal()
#> 
#> 7) Reflection and discussion (15–20 minutes)
#> Prompts to discuss in groups or write as short answers:
#> - How did temperature affect caption creativity vs. coherence?
#> - Are certain meme templates more prone to specific sentiment or word usage?
#> - How might you improve prompts to reduce ambiguity or improve rating consistency?
#> - What are the ethical considerations when generating humorous content with LLMs (bias, stereotypes, moderation)?
#> - How would you reproduce this workflow in a classroom vs. in a lab with real cost constraints?
#> 
#> 8) Deliverables
#> - A single R script or R Markdown document with:
#>   - Setup, API call helper, and caption generation
#>   - The tidyverse pipeline building the results
#>   - Text analysis and visualizations
#>   - Brief discussion prompts and reflection
#> - Optional: a small Shiny app or an HTML report summarizing results and memes
#> 
#> Extensions (optional, for more advanced work)
#> - Experiment with different models (gpt-4, gpt-3.5-turbo), different temperatures, and different max_tokens; compare results.
#> - Add a prompt-testing loop: generate multiple captions per meme with varying prompts and summarize differences.
#> - Build a Shiny gallery that displays the meme template name, theme, and the generated caption side-by-side, with a user-adjustable temperature slider.
#> - Include bootstrapping or resampling to quantify variability in captions across runs.
#> 
#> Notes and best practices
#> - Cost and rate limits: OpenAI API costs scale with tokens; cap max_tokens and batch requests to stay within budget. Be mindful of rate limits and include retry/backoff logic as shown.
#> - Reproducibility: Set seed where applicable (e.g., if you sample data or randomize prompts). Save results to a CSV/RDS and document session information (R version, package versions).
#> - Ethics and safety: Keep prompts respectful; avoid content that could generate harmful, hateful, or unsafe captions. Consider adding a safety check or moderation step if used in a classroom setting.
#> - Data provenance: Track the exact prompts used for each caption (template, theme, temperature, max_tokens) so results are reproducible.
#> 
#> Example prompts to spark ideas
#> - Prompt: “Generate a short meme caption for the ‘Drake Hotline Bling’ template about grad school life vs. literature review. The caption should be under 12 words and skew toward light-hearted humor.”
#> - Prompt: “For the ‘Distracted Boyfriend’ meme, craft a witty caption about choosing between R and Python in a data science project, focusing on the tension and humor of the choice.”
#> - Prompt: “Create a ‘Two Buttons’ meme caption about balancing open science and novelty in a long-term research project, with a playful, graduate-student tone.”
#> 
#> Closing note
#> This activity blends practical OpenAI API usage with the tidyverse’s strengths in data wrangling and visualization. It’s designed to be engaging (meme-tastic) while teaching robust API handling, data analysis, and critical thinking about prompt design and model behavior. If you’d like, I can tailor the prompts or provide a ready-to-run R Markdown file with the exact dataset and full code.

97.6 Best Practices

Always keep your API key secure and never hard-code it in your scripts.
Monitor token usage to manage costs effectively.
Handle errors gracefully to ensure your application remains robust.
Use batching and throttling to manage multiple requests and respect rate limits.
Regularly check OpenAI’s API documentation for updates and changes to endpoints, parameters, and best practices.

97.7 Conclusion

In this guide, we’ve covered how to generate text using OpenAI’s API in R. We’ve defined a function to interact with the API, handled responses, extracted generated text, monitored token usage, and processed multiple requests. We’ve also discussed error handling, rate limiting, and best practices for working with the API.

Now that you have the basics down, you can start experimenting with different prompts, models, and applications. The OpenAI API is powerful and flexible, allowing you to integrate AI capabilities into a wide range of projects, from chatbots to content generation to data analysis. Happy coding!