98 ACT: Working with OpenAI’s API

This module introduces the basics of interacting with OpenAI’s API from R. We’ll explore how to make API calls, handle responses, and integrate AI capabilities into data science workflows. Along the way you’ll want to keep the OpenAI API reference handy, along with the documentation for the httr and jsonlite packages, which we use for making HTTP requests and handling JSON data.

98.1 Getting Started

First, we need to load the required packages: httr for making HTTP requests and jsonlite for parsing JSON:

library(httr)
library(jsonlite)

98.1.1 API Authentication

To use OpenAI’s API, you’ll need an API key. As we learned with other APIs, it’s important to keep this key secure:

# Store API key securely (NEVER commit to Git!)
openai_api_key <- readLines("path/to/api_key.txt")
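An alternative is to keep the key in an environment variable rather than a file inside your project. A minimal sketch, assuming you’ve added a line like `OPENAI_API_KEY=sk-your-key-here` to your `~/.Renviron` file (which, like the key file, must never be committed):

```r
# Read the API key from an environment variable. R loads ~/.Renviron at
# startup, so a line such as
#   OPENAI_API_KEY=sk-your-key-here
# makes the key available to Sys.getenv() in every session.
openai_api_key <- Sys.getenv("OPENAI_API_KEY")

# Warn early if the key is missing, rather than sending a doomed request
if (!nzchar(openai_api_key)) {
  warning("OPENAI_API_KEY is not set; add it to your ~/.Renviron file.")
}
```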

98.1.2 Making API Requests

The core workflow involves:

  • Constructing the API request
  • Sending it to OpenAI’s endpoint
  • Processing the response

Next, we define a function to generate text using OpenAI’s API. It takes a prompt as input and returns the parsed response as an R list (we’ll extract the generated text from it in a moment). Here’s a basic function for text generation:

generate_text <- function(prompt, model = "gpt-5-nano", max_output_tokens = 1000) {
  response <- POST(
    # curl https://api.openai.com/v1/chat/completions
    url = "https://api.openai.com/v1/chat/completions",
    # -H "Authorization: Bearer $OPENAI_API_KEY"
    add_headers(Authorization = paste("Bearer", openai_api_key)),
    # -H "Content-Type: application/json"
    content_type_json(),
    # -d '{
    #   "model": "gpt-5-nano",
    #   "messages": [{"role": "user", "content": "What is a banana?"}],
    #   "max_completion_tokens": 1000
    # }'
    encode = "json",
    body = list(
      model = model,
      messages = list(list(role = "user", content = prompt)),
      # The output cap is a top-level field of the request body, not part
      # of a message; the chat completions API calls it max_completion_tokens
      max_completion_tokens = max_output_tokens
    )
  )

  str_content <- content(response, "text", encoding = "UTF-8")
  parsed <- fromJSON(str_content)

  # Return the full parsed response; we extract the generated text later
  return(parsed)
}

I have included comments in the code to show how the API request corresponds to a typical curl command you might use in the terminal.

98.2 Example Usage and Handling the Response

Now that we’ve defined our generate_text() function, let’s test it by sending a request to OpenAI’s API and working with the response.

98.2.1 Step 1: Send a Request

prompt <- "Write a haiku about data science."
generated_text <- generate_text(prompt)

98.2.2 Step 2: Examine the Raw API Response

When we call the generate_text(prompt) function, OpenAI’s API returns a structured response in JSON format, which R reads as a list. This response contains multiple components, but the most important part is the generated text.

Let’s print the raw response to see its structure.

print(generated_text)
#> $id
#> [1] "chatcmpl-DCe9Lcjkv3sabbpva1unEkugqVB1v"
#> 
#> $object
#> [1] "chat.completion"
#> 
#> $created
#> [1] 1771906643
#> 
#> $model
#> [1] "gpt-5-nano-2025-08-07"
#> 
#> $choices
#>   index message.role
#> 1     0    assistant
#>                                                           message.content
#> 1 Data whispers truths\nModels seek the truth within\nData sparks insight
#>   message.refusal message.annotations finish_reason
#> 1              NA                NULL          stop
#> 
#> $usage
#> $usage$prompt_tokens
#> [1] 14
#> 
#> $usage$completion_tokens
#> [1] 1240
#> 
#> $usage$total_tokens
#> [1] 1254
#> 
#> $usage$prompt_tokens_details
#> $usage$prompt_tokens_details$cached_tokens
#> [1] 0
#> 
#> $usage$prompt_tokens_details$audio_tokens
#> [1] 0
#> 
#> 
#> $usage$completion_tokens_details
#> $usage$completion_tokens_details$reasoning_tokens
#> [1] 1216
#> 
#> $usage$completion_tokens_details$audio_tokens
#> [1] 0
#> 
#> $usage$completion_tokens_details$accepted_prediction_tokens
#> [1] 0
#> 
#> $usage$completion_tokens_details$rejected_prediction_tokens
#> [1] 0
#> 
#> 
#> 
#> $service_tier
#> [1] "default"
#> 
#> $system_fingerprint
#> NULL

As you can see, the response is a nested list containing various metadata (e.g., request ID, model name, creation time), the AI-generated response (inside $choices$message$content), token usage information (inside $usage$total_tokens), and more.

98.2.3 Step 3: Extract the AI-Generated Text

Since the response contains both metadata and content, we need to extract only the generated text. The key part of the response is stored in:

ai_response <- generated_text$choices$message$content
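A note on why there is no `[[1]]` here: by default, jsonlite::fromJSON() simplifies the JSON array of choices into a data frame, so the message fields are reached with $choices$message$content. Here’s a small sketch using a mock, trimmed-down payload (the JSON string is illustrative, not a real API response):

```r
library(jsonlite)

# A trimmed, mock version of a chat completion payload (illustrative only)
mock_json <- '{
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello!"},
     "finish_reason": "stop"}
  ]
}'

parsed <- fromJSON(mock_json)
# fromJSON() turns the choices array into a data frame, with the nested
# message object becoming a data-frame column, hence the $ chain:
parsed$choices$message$content
#> [1] "Hello!"
```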

Now, let’s print the AI-generated text:

print(ai_response)
#> [1] "Data whispers truths\nModels seek the truth within\nData sparks insight"

OK, so that wasn’t really readable: print() shows the newline characters as literal \n. cat() renders them as actual line breaks:

cat(ai_response)

Data whispers truths
Models seek the truth within
Data sparks insight

Now we can see the haiku about data science that the model generated in response to our prompt. This is the core workflow for interacting with OpenAI’s API: send a request, receive a structured response, and extract the relevant content for use in your applications.

98.2.4 Step 4: Understanding Token Usage

Since OpenAI charges based on token usage, it’s useful to monitor how many tokens are used per request. The API response includes:

  • usage$prompt_tokens → Tokens in the input prompt
  • usage$completion_tokens → Tokens generated by the model
  • usage$total_tokens → The total token count for billing

To check token usage:

print(generated_text$usage$total_tokens) # Total tokens used
#> [1] 1254
print(generated_text$usage$completion_tokens) # Tokens used for output
#> [1] 1240
print(generated_text$usage$prompt_tokens) # Tokens used for input
#> [1] 14

The token usage information can help you optimize your prompts and manage costs when using the API.
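For example, you can turn the token counts into a rough per-request cost estimate. The prices below are placeholders of my own (rates differ by model and change over time), so check OpenAI’s pricing page for the actual numbers:

```r
# Rough cost estimate from token counts. The per-million-token prices are
# PLACEHOLDERS for illustration -- look up the real rates for your model.
estimate_cost <- function(prompt_tokens, completion_tokens,
                          input_price_per_1m = 0.05,
                          output_price_per_1m = 0.40) {
  prompt_tokens / 1e6 * input_price_per_1m +
    completion_tokens / 1e6 * output_price_per_1m
}

# Token counts from the haiku request above: 14 in, 1240 out
estimate_cost(14, 1240)
#> [1] 0.0004967
```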

98.3 Error Handling

As we’ve seen with other APIs, it’s important to handle errors gracefully: calls can fail due to network issues, invalid requests, or rate limits. To ensure our script doesn’t crash, we can wrap API calls in tryCatch():

generate_text_safe <- function(prompt) {
  tryCatch(
    {
      generate_text(prompt)
    },
    error = function(e) {
      warning("API call failed: ", e$message)
      return(NULL)
    }
  )
}

Now, we can use generate_text_safe() to handle errors. If an error occurs, the function will return NULL and print a warning message.
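For transient failures such as rate-limit errors, you can go a step further and retry with an increasing wait before giving up. A sketch of that pattern, written generically so it can wrap any of our API functions (call_with_retry is a name I made up, not from a package):

```r
# Retry a zero-argument function with exponential backoff: wait
# base_wait, 2*base_wait, 4*base_wait, ... seconds between attempts,
# returning NULL after max_attempts failures.
call_with_retry <- function(fn, max_attempts = 3, base_wait = 1) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(fn(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    if (attempt < max_attempts) Sys.sleep(base_wait * 2^(attempt - 1))
  }
  warning("All ", max_attempts, " attempts failed")
  NULL
}

# Usage: call_with_retry(function() generate_text("Define p-value"))
```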

98.4 Processing Multiple Requests

When working with multiple prompts, we can use purrr::map() to process them efficiently:

library(purrr)
prompts <- c(
  "Define p-value",
  "Explain Type I error",
  "What is statistical power?"
)
responses <- map(prompts, generate_text_safe)

This code generates text for each prompt in the prompts vector. If an error occurs, the response will be NULL. After running this code, we can examine the responses and handle any errors. I’ve included a table below to display the responses.
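A table like that can be assembled with a couple of purrr helpers. A sketch, assuming responses is the list produced by map() above; NULL entries from failed calls become NA rows:

```r
library(purrr)
library(tibble)

# Build a summary table from a list of parsed responses, where failed
# calls are NULL (as returned by generate_text_safe())
summarize_responses <- function(prompts, responses) {
  pluck_or_na <- function(extract) {
    map_chr(responses, function(r) {
      if (is.null(r)) NA_character_ else as.character(extract(r))
    })
  }
  tibble(
    prompt = prompts,
    response = pluck_or_na(function(r) r$choices$message$content[[1]]),
    total_tokens = pluck_or_na(function(r) r$usage$total_tokens),
    model = pluck_or_na(function(r) r$model),
    created = pluck_or_na(function(r) r$created)
  )
}

# Usage: summarize_responses(prompts, responses)
```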

As you can see, the table displays the prompts, AI-generated responses, token usage, model name, and completion time for each request. This information can help us monitor the API usage and response quality.

98.4.1 Rate Limiting

OpenAI has rate limits we need to respect. We can add delays between requests to avoid exceeding these limits. Here’s a throttled version of the generate_text() function:

generate_text_throttled <- function(prompt) {
  Sys.sleep(1) # Wait 1 second between requests
  generate_text_safe(prompt)
}

This function adds a 1-second delay between requests to avoid exceeding OpenAI’s rate limits. You can adjust the delay as needed based on the API’s rate limits.

98.5 Your Turn!

Now it’s your turn to experiment with the OpenAI API! Try different prompts, explore various models, and see how you can integrate AI-generated text into your projects. Remember to monitor your token usage and handle errors gracefully as you work with the API.

I’ve crafted a prompt to generate your very own activity for this module. You can modify the prompt to create different activities or explore other topics. Here’s the prompt I used:

activity_prompt <- "Create a meme-tastic data science activity for graduate students learning about using OpenAI's API in Tidyverse. The activity should involve making API calls, handling responses, and analyzing the results. Include clear instructions and learning objectives."


activity_response <- generate_text(activity_prompt, 
                                   model = "gpt-5-nano", 
                                   max_output_tokens = 7000)

writeLines(activity_response$choices$message$content, "includes/activity.txt")

Because the response is quite long (at 8253 tokens), I’ve written it to a text file in the includes directory. You can open that file to see the generated activity. The activity is designed to help students learn how to use OpenAI’s API in R, including making API calls, handling responses, and analyzing results. It includes clear instructions and learning objectives to guide students through the process.

You may notice that the activity is generated each time I render this book. If you want to keep a specific version of the activity, you can find it in the commit history of the includes/activity.txt file in the GitHub repository for this book. You can also modify the prompt to generate a new activity or explore different topics as you see fit. Happy experimenting!

Remember that this activity is generated by the OpenAI API, so it requires careful review and editing to ensure it is accurate, clear, and appropriate. Always review AI-generated content before using it in an educational setting to ensure it meets your standards and learning objectives. Don’t be just an AI passenger. Trust but verify, as they say.

Click to see the generated activity
if (exists("activity_response")) {
  cat(activity_response$choices$message$content, sep = "\n")
} else if (file.exists("includes/activity.txt")) {
  activity_content <- readLines("includes/activity.txt")
  cat(activity_content, sep = "\n")
} else {
  cat("Activity file not found. Please run the code to generate the activity.")
}
#> Meme-tastic data science activity: OpenAI API + Tidyverse for grad students
#> 
#> Overview
#> - Goal: Learn to call OpenAI's API from R, handle and parse responses, and analyze the outputs inside a tidyverse workflow.
#> - Theme: Meme-inspired prompts to generate concise, educational, and humorous captions. Templates include classic memes like Doge, Distracted Boyfriend, Two-Button, Expanding Brain, Drake Hotline Bling, and Futurama Fry.
#> - Deliverables: A reproducible R workflow (script or R Markdown) that makes API calls, stores results in a tidy data frame, computes simple analytics, and visualizes outputs.
#> 
#> Learning objectives
#> - Practice making REST API calls to OpenAI from R (tidyverse-friendly workflow).
#> - Learn to structure prompts, pass them through a system prompt, and handle responses (text + usage metadata).
#> - Use dplyr, purrr, tidyr, and stringr to wrangle API results into tidy data frames.
#> - Analyze and compare outputs across multiple meme-style prompts (length, token usage, readability proxies).
#> - Create a simple “Memeboard” to showcase best-performing captions and potential improvements.
#> 
#> Prerequisites
#> - R and RStudio (or any editor with R support).
#> - Packages: tidyverse (dplyr, purrr, tidyr, ggplot2, readr), httr, jsonlite, stringr, possibly glue.
#> - OpenAI API key stored as an environment variable OPENAI_API_KEY (recommended) or read from an .Renviron file.
#> - Basic familiarity with the tidyverse and basic API concepts.
#> 
#> Setup (one-time)
#> - Install packages (if needed):
#>   - install.packages(c("tidyverse", "httr", "jsonlite"))
#> - Load libraries:
#>   - library(tidyverse)
#>   - library(httr)
#>   - library(jsonlite)
#> 
#> - Save your OpenAI API key (recommended):
#>   - Sys.setenv(OPENAI_API_KEY = "your-key-here")
#>   - Or place in ~/.Renviron: OPENAI_API_KEY=your-key-here
#> 
#> - OpenAI Chat API endpoint:
#>   - endpoint <- "https://api.openai.com/v1/chat/completions"
#> 
#> - System prompt (shared persona for all prompts):
#>   - system_msg <- "You are a witty data-science meme generator. Produce a single concise meme caption tailored to the given meme template and concept. Keep it under 200 characters. If possible, reference the meme template by name in the caption."
#> 
#> R function to call the API
#> - This function sends a chat-style prompt to gpt-3.5-turbo (or another model), returns the caption and usage data.
#> 
#> - Code (paste into an R script or R Markdown cell):
#> 
#> ```r
#> library(httr)
#> library(jsonlite)
#> library(tidyverse)
#> 
#> endpoint <- "https://api.openai.com/v1/chat/completions"
#> 
#> system_msg <- "You are a witty data-science meme generator. Produce a single concise meme caption tailored to the given meme template and concept. Keep it under 200 characters. If possible, reference the meme template by name in the caption."
#> 
#> make_openai_call <- function(prompt_text, model = "gpt-3.5-turbo", temperature = 0.6, max_tokens = 180) {
#>   body <- list(
#>     model = model,
#>     messages = list(
#>       list(role = "system", content = system_msg),
#>       list(role = "user", content = prompt_text)
#>     ),
#>     temperature = temperature,
#>     max_tokens = max_tokens,
#>     n = 1
#>   )
#>   
#>   res <- POST(
#>     endpoint,
#>     add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))),
#>     content_type_json(),
#>     body = toJSON(body, auto_unbox = TRUE)
#>   )
#>   
#>   stop_for_status(res)
#>   raw <- content(res, as = "text", encoding = "UTF-8")
#>   parsed <- fromJSON(raw, flatten = TRUE)
#>   caption <- parsed$choices[[1]]$message$content
#>   usage <- parsed$usage
#>   list(caption = caption, usage = usage)
#> }
#> ```
#> 
#> Data: define prompts using meme templates and topics
#> - Create a small data frame of meme templates and data-science topics you want to explain or illustrate.
#> 
#> ```r
#> prompts <- tibble(
#>   id = 1:6,
#>   template = c("Doge", "Distracted Boyfriend", "Two-Button", "Expanding Brain", "Drake Hotline Bling", "Futurama Fry"),
#>   topic = c("gradient descent", "overfitting vs. underfitting", "cross-validation", "p-values vs. confidence intervals", "feature engineering vs. baselines", "Bayesian updating")
#> )
#> 
#> prompts <- prompts %>% 
#>   mutate(prompt = paste0(
#>     "Explain ", topic, " in the style of the ", template, " meme. ",
#>     "Provide a single concise caption suitable for a slide or social post. ",
#>     "Reference the meme name where appropriate."
#>   ))
#> ```
#> 
#> - Optional: print prompts to inspect
#> ```r
#> prompts
#> ```
#> 
#> Step-by-step activity (30–60 minutes, depending on group size)
#> 1) Generate meme captions
#> - For every row in prompts, call the API and store the result.
#> 
#> ```r
#> prompts <- prompts %>%
#>   mutate(result = map(prompt, ~ {
#>     Sys.sleep(0.5)  # gently rate-limit if running many prompts
#>     make_openai_call(.x)
#>   }))
#> 
#> # Unnest the results into tidy columns
#> results_tbl <- prompts %>%
#>   mutate(
#>     caption = map_chr(result, "caption"),
#>     usage = map(result, "usage")
#>   ) %>%
#>   select(-result) %>%
#>   unnest_wider(usage)
#> 
#> # If unnest_wider(usage) doesn't work due to structure, you can do:
#> # results_tbl <- prompts %>% mutate(ut = map(usage, ~ as.list(.x))) %>% unnest_wider(ut)
#> ```
#> 
#> 2) Quick quality checks (tidyverse style)
#> - Basic metadata
#> ```r
#> results_tbl <- results_tbl %>%
#>   mutate(
#>     word_count = str_count(caption, "\\S+"),
#>     char_count = str_length(caption)
#>   )
#> ```
#> 
#> - Summaries
#> ```r
#> results_tbl %>% 
#>   select(id, template, topic, caption, word_count, char_count) %>%
#>   arrange(id)
#> ```
#> 
#> 3) Visualize meme captions (analysis and communication)
#> - Caption length by meme template (for engagement intuition)
#> ```r
#> ggplot(results_tbl, aes(x = template, y = word_count, fill = template)) +
#>   geom_col(show.legend = FALSE) +
#>   coord_flip() +
#>   labs(title = "Caption length by meme template",
#>        x = "Meme template",
#>        y = "Word count") +
#>   theme_minimal()
#> ```
#> 
#> - Optional: distribution of token usage vs caption length (if you want to explore API costs)
#> ```r
#> results_tbl %>% 
#>   ggplot(aes(x = total_tokens, y = word_count)) +
#>   geom_point(alpha = 0.8) +
#>   geom_smooth(method = "lm", se = FALSE, color = "steelblue") +
#>   labs(title = "API usage vs caption length",
#>        x = "Total tokens used",
#>        y = "Caption word count")
#> ```
#> Note: The usage data typically includes prompt_tokens, completion_tokens, total_tokens depending on model version.
#> 
#> 4) Create a Memeboard (deliverable)
#> - Save a cute, shareable table of the best captions and their metrics
#> ```r
#> memeboard <- results_tbl %>%
#>   select(id, template, topic, caption, word_count, total_tokens) %>%
#>   arrange(desc(word_count), total_tokens)
#> 
#> write_csv(memeboard, "memeboard.csv")
#> ```
#> 
#> 5) Optional refinement: prompt experiment
#> - If you have budget or want to compare prompts, duplicate the prompt column and add a variant prompt_variant (e.g., more or less humorous, or ask for 2 options). Run API calls again and compare captions and token usage.
#> 
#> 6) Reflection prompts (for students)
#> - Which meme template produced the most informative caption for the concept?
#> - Did any prompts produce overly long captions or ambiguous captions?
#> - How did token usage vary by template or topic? What does this imply for cost-performance trade-offs?
#> 
#> Clean-up and best practices
#> - Respect rate limits and costs: insert small sleeps (e.g., Sys.sleep(0.5) between calls).
#> - Handle errors gracefully: wrap API calls in tryCatch to collect failed prompts for retry.
#> - Keep API keys secure: do not hard-code keys in scripts; rely on environment variables or .Renviron.
#> - Version control: track API prompts and results, but avoid sharing sensitive keys in repos.
#> - Reproducibility: pin a fixed seed for any randomness (if you add random prompts or sampling).
#> 
#> Optional extensions (for more advanced groups)
#> - Use a different model with higher quality (e.g., gpt-4-turbo if available; note cost).
#> - Run a mini A/B test: compare two different system prompts or two different prompts for the same topic; analyze which produces captions with higher readability or perceived humor (students can rate captions on a scale).
#> - Add sentiment or readability proxy metrics:
#>   - Basic readabilty proxy: word_count, sentence_count (if captions contain punctuation), average word length.
#>   - If you want more, bring in a simple Flesch-Kincaid-like proxy or use an external package for readability (e.g., quanteda or textstat, if you install extra packages).
#> 
#> Deliverables for the course
#> - A single R script or R Markdown notebook containing:
#>   - Setup code (library load, API key handling, endpoint, system prompt).
#>   - Data frame of prompts (meme templates and topics).
#>   - The API call function (make_openai_call).
#>   - The tidyverse pipeline to collect, unnest, and analyze results.
#>   - Visualizations (caption length by meme template, token usage vs length).
#>   - The Memeboard export (CSV) with the best captions and metrics.
#>   - Short reflective prompts or discussion questions.
#> 
#> Assessment rubric (quick guide)
#> - API integration (40%): correct API call wiring, error handling, and response parsing.
#> - Data wrangling (25%): tidy data frame construction, proper use of purrr and tidyr to flatten nested results.
#> - Analysis and visualization (20%): meaningful metrics (length, token usage) and clear visuals.
#> - Reproducibility and hygiene (10%): secure key handling, clear setup instructions, and an executable script.
#> - Creativity and engagement (5%): meme-tastic prompts that are both entertaining and educational.
#> 
#> Example outputs (mocked for illustration)
#> - Caption example 1 (Doge, gradient descent): "Much descent. Very converge. Wow, optimization done."
#> - Caption example 2 (Distracted Boyfriend, overfitting): "Boyfriend = model; Girlfriend = training data; New girl = real world. Overfit, whoops."
#> - Caption example 3 (Two-Button, cross-validation): "Yes: Reboot model with CV. No: Stay with train/test split."
#> - Caption example 4 (Expanding Brain, p-values): "Brain grows: p < .05 means significance. Brain bigger: confidence intervals tell the story."
#> 
#> Safety and costs note
#> - OpenAI API usage costs can accumulate quickly with many prompts; manage prompts and max_tokens intentionally.
#> - Do not share or hard-code API keys; use environment variables.
#> - Ensure prompts adhere to your institution’s safety and ethics guidelines; consider adding a brief discussion on bias, reliability, and the limitations of AI-generated explanations.
#> 
#> If you’d like, I can tailor the prompts to your graduate course topics (e.g., statistics, ML theory, NLP, causal inference) or provide a ready-to-run R Markdown file with all steps pre-woven.

98.6 Best Practices

  • Always keep your API key secure and never hard-code it in your scripts.
  • Monitor token usage to manage costs effectively.
  • Handle errors gracefully to ensure your application remains robust.
  • Use batching and throttling to manage multiple requests and respect rate limits.
  • Regularly check OpenAI’s API documentation for updates and changes to endpoints, parameters, and best practices.

98.7 Conclusion

In this guide, we’ve covered how to generate text using OpenAI’s API in R. We’ve defined a function to interact with the API, handled responses, extracted generated text, monitored token usage, and processed multiple requests. We’ve also discussed error handling, rate limiting, and best practices for working with the API.

Now that you have the basics down, you can start experimenting with different prompts, models, and applications. The OpenAI API is powerful and flexible, allowing you to integrate AI capabilities into a wide range of projects, from chatbots to content generation to data analysis. Happy coding!