92 Working with OpenAI’s API
This module introduces the basics of interacting with OpenAI’s API from R. We’ll explore how to make API calls, handle responses, and integrate AI capabilities into data science workflows.
92.1 Getting Started
First, we need to load the required packages:
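library(httr)     # POST(), add_headers(), content_type_json(), content()
library(jsonlite) # fromJSON()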
92.1.1 API Authentication
To use OpenAI’s API, you’ll need an API key. As we learned with other APIs, it’s important to keep this secure:
# Store API key securely (NEVER commit to Git!)
openai_api_key <- readLines("path/to/api_key.txt")
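Alternatively, you can keep the key out of your project files entirely by storing it in an environment variable (for example in your .Renviron file) and reading it with Sys.getenv():
# Read the key from an environment variable instead of a file
openai_api_key <- Sys.getenv("OPENAI_API_KEY")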
92.1.2 Making API Requests
The core workflow involves:
- Constructing the API request
- Sending it to OpenAI’s endpoint
- Processing the response
Next, we define a basic function for text generation. It takes a prompt as input, sends it to OpenAI’s chat completions endpoint, and returns the parsed response:
generate_text <- function(prompt) {
  response <- POST(
    # curl https://api.openai.com/v1/chat/completions
    url = "https://api.openai.com/v1/chat/completions",
    # -H "Authorization: Bearer $OPENAI_API_KEY"
    add_headers(Authorization = paste("Bearer", openai_api_key)),
    # -H "Content-Type: application/json"
    content_type_json(),
    # -d '{
    #   "model": "gpt-3.5-turbo",
    #   "messages": [{"role": "user", "content": "What is a banana?"}]
    # }'
    encode = "json",
    body = list(
      model = "gpt-3.5-turbo",
      messages = list(list(role = "user", content = prompt))
    )
  )
  # Parse the JSON body of the response into an R list
  str_content <- content(response, "text", encoding = "UTF-8")
  parsed <- fromJSON(str_content)
  # The generated text lives in parsed$choices$message$content; we return
  # the full parsed list so we can inspect the whole response below
  return(parsed)
}
92.2 Example Usage and Handling the Response
Now that we’ve defined our generate_text() function, let’s test it by sending a request to OpenAI’s API and working with the response.
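92.2.1 Step 1: Send a Prompt to the API
First, we define a prompt and call generate_text(). The exact prompt behind the output below isn’t recorded here, but it asked for the steps of a data science workflow, so something along these lines:
# A representative prompt (the original isn't shown in this section)
prompt <- "What are the steps of a data science workflow?"
generated_text <- generate_text(prompt)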
92.2.2 Step 2: Examine the Raw API Response
When we call the generate_text(prompt) function, OpenAI’s API returns a structured response in JSON format, which R reads as a list. This response contains multiple components, but the most important part is the generated text.
Let’s print the raw response to see its structure.
print(generated_text)
#> $id
#> [1] "chatcmpl-B4CBXmLrA4uHUJR7Rdl94BxVjoQcI"
#>
#> $object
#> [1] "chat.completion"
#>
#> $created
#> [1] 1740339851
#>
#> $model
#> [1] "gpt-3.5-turbo-0125"
#>
#> $choices
#> index message.role
#> 1 0 assistant
#> message.content
#> 1 1. Define the problem: clearly identify the business problem or question that needs to be addressed with data analysis.\n\n2. Collect and prepare data: gather relevant data sources and clean, preprocess, and transform the data to make it usable for analysis.\n\n3. Explore and analyze data: conduct exploratory data analysis to understand patterns and relationships in the data, and use statistical and visualization techniques to gain insights.\n\n4. Model development: develop predictive models using machine learning algorithms to make predictions or find patterns in the data.\n\n5. Model evaluation: evaluate the performance of the models using metrics such as accuracy, precision, and recall to assess their effectiveness.\n\n6. Deployment: deploy the model in production to make predictions or recommendations for decision-making.\n\n7. Monitor and iterate: continuously monitor the model's performance and make improvements or updates as needed based on new data or changing requirements.
#> message.refusal logprobs finish_reason
#> 1 NA NA stop
#>
#> $usage
#> $usage$prompt_tokens
#> [1] 19
#>
#> $usage$completion_tokens
#> [1] 174
#>
#> $usage$total_tokens
#> [1] 193
#>
#> $usage$prompt_tokens_details
#> $usage$prompt_tokens_details$cached_tokens
#> [1] 0
#>
#> $usage$prompt_tokens_details$audio_tokens
#> [1] 0
#>
#>
#> $usage$completion_tokens_details
#> $usage$completion_tokens_details$reasoning_tokens
#> [1] 0
#>
#> $usage$completion_tokens_details$audio_tokens
#> [1] 0
#>
#> $usage$completion_tokens_details$accepted_prediction_tokens
#> [1] 0
#>
#> $usage$completion_tokens_details$rejected_prediction_tokens
#> [1] 0
#>
#>
#>
#> $service_tier
#> [1] "default"
#>
#> $system_fingerprint
#> NULL
As you can see, the response is a nested list containing various metadata (e.g., request ID, model name, creation time), the AI-generated response (inside $choices$message$content, since jsonlite simplifies the choices array into a data frame), token usage information (inside $usage$total_tokens), and more.
92.2.3 Step 3: Extract the AI-Generated Text
Since the response contains both metadata and content, we need to extract only the generated text. The key part of the response is stored in $choices$message$content:
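# Extract just the generated text from the parsed response
ai_response <- generated_text$choices$message$content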
Now, let’s print the AI-generated text:
print(ai_response)
#> [1] "1. Define the problem: clearly identify the business problem or question that needs to be addressed with data analysis.\n\n2. Collect and prepare data: gather relevant data sources and clean, preprocess, and transform the data to make it usable for analysis.\n\n3. Explore and analyze data: conduct exploratory data analysis to understand patterns and relationships in the data, and use statistical and visualization techniques to gain insights.\n\n4. Model development: develop predictive models using machine learning algorithms to make predictions or find patterns in the data.\n\n5. Model evaluation: evaluate the performance of the models using metrics such as accuracy, precision, and recall to assess their effectiveness.\n\n6. Deployment: deploy the model in production to make predictions or recommendations for decision-making.\n\n7. Monitor and iterate: continuously monitor the model's performance and make improvements or updates as needed based on new data or changing requirements."
Ok, so that wasn’t really readable. Let’s try to format it a bit better:
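One simple option (the exact formatting code isn’t shown here) is cat(), which honors the newline characters embedded in the response:
cat(ai_response)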
1. Define the problem: clearly identify the business problem or question that needs to be addressed with data analysis.
2. Collect and prepare data: gather relevant data sources and clean, preprocess, and transform the data to make it usable for analysis.
3. Explore and analyze data: conduct exploratory data analysis to understand patterns and relationships in the data, and use statistical and visualization techniques to gain insights.
4. Model development: develop predictive models using machine learning algorithms to make predictions or find patterns in the data.
5. Model evaluation: evaluate the performance of the models using metrics such as accuracy, precision, and recall to assess their effectiveness.
6. Deployment: deploy the model in production to make predictions or recommendations for decision-making.
7. Monitor and iterate: continuously monitor the model’s performance and make improvements or updates as needed based on new data or changing requirements.
92.2.4 Step 4: Understanding Token Usage
Since OpenAI charges based on token usage, it’s useful to monitor how many tokens are used per request. The API response includes:
- usage$prompt_tokens → Tokens in the input prompt
- usage$completion_tokens → Tokens generated by the model
- usage$total_tokens → The total token count for billing
To check token usage:
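# Token counts reported by the API (values match the response above)
generated_text$usage$prompt_tokens     # 19
generated_text$usage$completion_tokens # 174
generated_text$usage$total_tokens      # 193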
92.3 Error Handling
As we’ve seen with other APIs, it’s important to handle errors gracefully: calls can fail due to network issues, invalid requests, or rate limits. To ensure our script doesn’t crash, we can wrap API calls in tryCatch():
generate_text_safe <- function(prompt) {
  tryCatch({
    generate_text(prompt)
  }, error = function(e) {
    warning("API call failed: ", e$message)
    return(NULL)
  })
}
Now we can use generate_text_safe() for our requests: if a call fails, the function emits a warning and returns NULL instead of stopping the script.
92.4 Processing Multiple Requests
When working with multiple prompts, we can use purrr::map() to process them efficiently:
library(purrr)

prompts <- c(
  "Define p-value",
  "Explain Type I error",
  "What is statistical power?"
)

# Call the API once per prompt; failed calls yield NULL
responses <- map(prompts, generate_text_safe)
This code generates text for each prompt in the prompts vector; if a call fails, the corresponding response is NULL. After running it, we can examine the responses and handle any errors. To display the results, we can collect them into a table. Here’s one sketch, assuming the parsed response structure we inspected in Step 2:
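# Build a summary table; the field paths follow the parsed response
# structure shown above, and NULL (failed) responses become NA rows
response_table <- data.frame(
  prompt       = prompts,
  response     = map_chr(responses, \(r) if (is.null(r)) NA_character_ else r$choices$message$content),
  total_tokens = map_int(responses, \(r) if (is.null(r)) NA_integer_ else r$usage$total_tokens),
  model        = map_chr(responses, \(r) if (is.null(r)) NA_character_ else r$model),
  created      = map_dbl(responses, \(r) if (is.null(r)) NA_real_ else r$created) # Unix timestamp
)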
As you can see, the table displays the prompts, AI-generated responses, token usage, model name, and completion time for each request. This information can help us monitor the API usage and response quality.
92.4.1 Rate Limiting
OpenAI enforces rate limits that we need to respect, and we can add delays between requests to avoid exceeding them. Here’s a throttled wrapper around generate_text_safe():
generate_text_throttled <- function(prompt) {
  Sys.sleep(1) # Wait 1 second between requests
  generate_text_safe(prompt)
}
This function pauses for one second before each request to stay under OpenAI’s rate limits; you can adjust the delay to fit your account’s limits. To process the prompts from above with throttling, simply swap in the throttled wrapper:
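responses <- map(prompts, generate_text_throttled)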
92.5 Conclusion
In this guide, we’ve covered how to generate text from R with OpenAI’s chat completions API and the gpt-3.5-turbo model. We defined a function to interact with the API, handled responses, extracted the generated text, monitored token usage, and processed multiple requests, and we discussed error handling, rate limiting, and other best practices. By following these steps, you can use OpenAI’s API to generate text in R for a wide range of applications. For the curious: yes, these prompts and responses are generated with the OpenAI API every time you render this notebook.