Interacting with APIs: Example with Spoonacular Food and Recipe API

Rina Deka 06/25/2023

Requirements
API Interaction Functions
- getRecipesByDiet
- getIngredientsByQuery
Exploratory Data Analysis
Conclusion

This document is a vignette to demonstrate how one may retrieve data from an API. In this tutorial, I will be interacting with Spoonacular’s food API. I have created a few functions to interact with a few endpoints for user ease, and I’ll use these to explore some of the data that was retrieved.

Requirements

To use the functions for interacting with the food API, I used the following packages:

httr : This is a package that contains useful tools for Working with URLs and HTTP, organised by HTTP verbs (such as GET() and POST()). We use this for the URL call GET() as indicated in the spoonacular documentation.
jsonlite: API interaction
tidyverse: A package which is in and of itself a set of packages that help you manipulate and visualize data.

Please note that the API key I used was “987c314948d14831ac64f7edaa24a25c”, but yours will be different.

myKey = "987c314948d14831ac64f7edaa24a25c"

API Interaction Functions

This is where I created functions for interacting with spoonacular’s complex search AP, which is itself a combination of searching by query, ingredients, and by nutrients into a single endpoint. I also created functions for interacting with spoonacular’s Glycemic Load calculator for ingredients. This way, you can parse through recipe data based on certain conditions (such as a dietary restriction), look at the nutritional information, and then also try to find grocery products that might suit your recipe.

`getRecipesByDiet`

Suppose that I would like to create a function that will return well-parsed data so that one can look at the corresponding ingredient and nutrition information for a recipe given some sort of dietary restriction. The function below allows the user to get pertinent information about recipes with the dietary restriction in question.

A full list of supported diets is available here.

Spoonacular defines the diets as such: > Diet Definitions Every API endpoint asking for an diet parameter can be fed with any of these diets. Gluten Free: Eliminating gluten means avoiding wheat, barley, rye, and other gluten-containing grains and foods made from them (or that may have been cross contaminated). Ketogenic: The keto diet is based more on the ratio of fat, protein, and carbs in the diet rather than specific ingredients. Generally speaking, high fat, protein-rich foods are acceptable and high carbohydrate foods are not. The formula we use is 55-80% fat content, 15-35% protein content, and under 10% of carbohydrates. Vegetarian: No ingredients may contain meat or meat by-products, such as bones or gelatin. Lacto-Vegetarian: All ingredients must be vegetarian and none of the ingredients can be or contain egg. Ovo-Vegetarian: All ingredients must be vegetarian and none of the ingredients can be or contain dairy. Vegan: No ingredients may contain meat or meat by-products, such as bones or gelatin, nor may they contain eggs, dairy, or honey. Pescetarian: Everything is allowed except meat and meat by-products - some pescetarians eat eggs and dairy, some do not. Paleo: Allowed ingredients include meat (especially grass fed), fish, eggs, vegetables, some oils (e.g. coconut and olive oil), and in smaller quantities, fruit, nuts, and sweet potatoes. We also allow honey and maple syrup (popular in Paleo desserts, but strict Paleo followers may disagree). Ingredients not allowed include legumes (e.g. beans and lentils), grains, dairy, refined sugar, and processed foods. Primal: Very similar to Paleo, except dairy is allowed - think raw and full fat milk, butter, ghee, etc.

Thus, the valid options for the “diet” argument include any of the above in a string. Note that strings such as “paleolithic” work for “paleo, and so on. Note that vegetarian implies lacto-ovo vegetarian.

getRecipesByDiet <- function(diet,number=10,apiKey="987c314948d14831ac64f7edaa24a25c") {
  #note: this function defaults to my API key, but you should use your own in the function argument
  endpoint <- "/recipes/complexSearch"
  
  # Set the parameters for the API request
  parameters <- list(
    apiKey = apiKey,
    number = number,  # Number of recipes to retrieve (default: 10)
    diet = diet,  # Dietary restriction (e.g., "vegan", "gluten-free", "keto")
    addRecipeNutrition = TRUE  # Include additional recipe information
    # You can add more parameters as required by the endpoint
  )
  
  # Build the URL
  base_url <- "https://api.spoonacular.com"
  url <- paste0(base_url, endpoint)
  
  # Send the API request
  response <- GET(url, query = parameters)
  
  # Check for successful response
  if (http_type(response) == "application/json") {
    content <- fromJSON(rawToChar(response$content), flatten = TRUE)
    
    # Extract relevant information from the response
    recipes <- content$results
    
    #store in a dataframe
    recipes <- as.data.frame(recipes)
    
    #removing the redundant columns
    recipes <- recipes %>% select(-c("gaps", "creditsText", "sourceName", "sourceUrl" ,"image","imageType","summary","analyzedInstructions","spoonacularSourceUrl","license","nutrition.properties","nutrition.flavonoids"))
    
    ###resetting the ingredients dataframe in the dataframe, so we can pivot wider
    
    #Rename "id" column in nutrition.ingredients to "ingredient_id"
    recipes$nutrition.ingredients <- lapply(recipes$nutrition.ingredients, function(x) {
      colnames(x) <- c("ingredient_id", names(x)[-1])
      x
    }) 
    
    #Rename "name" column in nutrition.ingredients to "ingredient_name"
     recipes$nutrition.ingredients <- lapply(recipes$nutrition.ingredients, function(x) {
      colnames(x)[colnames(x) == "name"] <- "ingredient_name"
      x
    })
     
    #Rename the "amount" column in nutrition.ingredients to "ingredient_amount"
     recipes$nutrition.ingredients <- lapply(recipes$nutrition.ingredients, function(x) {
      colnames(x)[colnames(x) == "amount"] <- "ingredient_amount"
      x
    })
     
    #Rename the "unit" column in nutrition.ingredients to "ingredient_unit"
     recipes$nutrition.ingredients <- lapply(recipes$nutrition.ingredients, function(x) {
      colnames(x)[colnames(x) == "unit"] <- "ingredient_unit"
      x
    })
     
     #Rename the "unit" column in nutrition.ingredients to "ingredient_unit"
     recipes$nutrition.ingredients <- lapply(recipes$nutrition.ingredients, function(x) {
      colnames(x)[colnames(x) == "nutrients"] <- "ingredient_nutrient"
      x
    })
     
   #unnest the ingredients column to wider 
   recipes<- unnest_wider(recipes, nutrition.ingredients)
   
   ###resetting the nutrients dataframe in the dataframe, so we can pivot wider

    #Rename "name" column in nutrition.nutrients to "nutrient"
     recipes$nutrition.nutrients <- lapply(recipes$nutrition.nutrients, function(x) {
      colnames(x)[colnames(x) == "name"] <- "nutrition_info"
      x
    })
     
      #Rename "amount" column in nutrition.nutrients to "nutrient_amount"
     recipes$nutrition.nutrients <- lapply(recipes$nutrition.nutrients, function(x) {
      colnames(x)[colnames(x) == "amount"] <- "nutrient_amount"
      x
    })
    
  #Rename "unit" column in nutrition.nutrients to "nutrient_unit"
     recipes$nutrition.nutrients <- lapply(recipes$nutrition.nutrients, function(x) {
      colnames(x)[colnames(x) == "unit"] <- "nutrient_unit"
      x
    })
   
      
   #unnest the ingredients column to wider 
   recipes<- unnest_wider(recipes, nutrition.nutrients)
     
   
    return(recipes)
  } else {
    stop("API request failed. Status code: ", status_code(response))
  }
}

`getIngredientsByQuery`

This function gets ingredient information for a given query. Suppose we would like to utilize macro-nutrient information for when we go shopping for our recipes, after we’ve made a judgement call on what restrictions we’d like to put in based on the available recipes given by the previous function.

I’ve created the function below to allow a user to filter grocery products based off of ingredient information, and specifically filter by macro-nutrients and dietary intolerance(/s). We’ll be using this as a demonstration of the limits of APIs at the end.

getIngredientsByQuery <- function(apiKey, query, carbs, fats, protein, intolerances) {
  # Prepare the query parameters
  query_params <- list(
    apiKey = apiKey,
    query = query,
    minCarbs = carbs,
    minFats = fats,
    minProtein = protein,
    intolerances = paste(intolerances, collapse = ",")
  )
  
  # Construct the API URL
  base_url <- "https://api.spoonacular.com/food/ingredients/search"
  api_url <- modify_url(base_url, query = query_params)
  
  # Make the API request
  response <- GET(api_url)
  
  # Check if the request was successful
  if (http_status(response)$category != "Success") {
    stop("API request failed: ", http_status(response)$reason)
  }
  
  # Parse the response JSON
  response_json <- content(response, "text", encoding = "UTF-8")
  ingredients <- fromJSON(response_json)
  
  # Check if there are no results
  if (ingredients$totalResults == 0) {
    message("No ingredients found for the given query and criteria.")
    return(NULL)
  }

  
  
  #return(ingredient_data)
  return(ingredients$results)
}

Here’s an example showing how a user can utilize the function above.

# Example usage
apiKey <- "987c314948d14831ac64f7edaa24a25c"
query <- "pasta"
carbs <- 10
fats <- 5
protein <- 20
intolerances <- c("vegetarian")

ingredient_df <- getIngredientsByQuery(apiKey, query, carbs, fats, protein, intolerances)

print(ingredient_df)

##          id               name            image
## 1     20420              pasta      fusilli.jpg
## 2  10118334        pasta dough        dough.jpg
## 3  11020420       pasta shells  shell-pasta.jpg
## 4     99036    pasta salad mix      fusilli.jpg
## 5  10520420 jumbo pasta shells jumbo-shells.jpg
## 6  10920420               orzo         orzo.jpg
## 7  11520420               ziti         ziti.jpg
## 8     20499        short pasta        elbow.jpg
## 9     20093        fresh pasta      fusilli.jpg
## 10 11320420    corkscrew pasta      fusilli.jpg

Exploratory Data Analysis

Suppose that I wanted to check how healthy the diets of two patients are: one who is a vegan and another who is ketogenic. Let’s produce the data frames for both of them. The data will come from 100 recipes.

vegan_df <- getRecipesByDiet("vegan",number=100,apiKey="987c314948d14831ac64f7edaa24a25c") %>% unnest("cuisines",keep_empty=TRUE) %>% 
  unnest("dishTypes",keep_empty=TRUE) %>%
  unnest("diets",keep_empty=TRUE) %>%
  unnest("occasions",keep_empty=TRUE) %>% group_by()
head(vegan_df)

## # A tibble: 6 × 38
##   vegetarian vegan glutenFree dairyFree veryHealthy cheap veryPopular sustainable lowFodmap weightWatcherSmartPoints
##   <lgl>      <lgl> <lgl>      <lgl>     <lgl>       <lgl> <lgl>       <lgl>       <lgl>                        <int>
## 1 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 2 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 3 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 4 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 5 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 6 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## # ℹ 28 more variables: preparationMinutes <int>, cookingMinutes <int>, aggregateLikes <int>, healthScore <int>,
## #   pricePerServing <dbl>, id <int>, title <chr>, readyInMinutes <int>, servings <int>, cuisines <chr>,
## #   dishTypes <chr>, diets <chr>, occasions <chr>, author <chr>, nutrition_info <list<chr>>,
## #   nutrient_amount <list<dbl>>, nutrient_unit <list<chr>>, percentOfDailyNeeds <list<dbl>>,
## #   ingredient_id <list<int>>, ingredient_name <list<chr>>, ingredient_amount <list<dbl>>,
## #   ingredient_unit <list<chr>>, ingredient_nutrient <list<list>>, nutrition.caloricBreakdown.percentProtein <dbl>,
## #   nutrition.caloricBreakdown.percentFat <dbl>, nutrition.caloricBreakdown.percentCarbs <dbl>, …

Similarly, for 100 recipes, let’s look at the data for someone who is on a keto diet.

keto_df <- getRecipesByDiet("keto",number=100,apiKey="987c314948d14831ac64f7edaa24a25c") %>% unnest("cuisines",keep_empty=TRUE) %>% 
  unnest("dishTypes",keep_empty=TRUE) %>%
  unnest("diets",keep_empty=TRUE) %>%
  unnest("occasions",keep_empty=TRUE) 
head(keto_df)

## # A tibble: 6 × 38
##   vegetarian vegan glutenFree dairyFree veryHealthy cheap veryPopular sustainable lowFodmap weightWatcherSmartPoints
##   <lgl>      <lgl> <lgl>      <lgl>     <lgl>       <lgl> <lgl>       <lgl>       <lgl>                        <int>
## 1 TRUE       FALSE TRUE       FALSE     FALSE       FALSE TRUE        FALSE       FALSE                           19
## 2 TRUE       FALSE TRUE       FALSE     FALSE       FALSE TRUE        FALSE       FALSE                           19
## 3 TRUE       FALSE TRUE       FALSE     FALSE       FALSE TRUE        FALSE       FALSE                           19
## 4 TRUE       FALSE TRUE       FALSE     FALSE       FALSE TRUE        FALSE       FALSE                           19
## 5 TRUE       FALSE TRUE       FALSE     FALSE       FALSE TRUE        FALSE       FALSE                           19
## 6 TRUE       FALSE TRUE       FALSE     FALSE       FALSE TRUE        FALSE       FALSE                           19
## # ℹ 28 more variables: preparationMinutes <int>, cookingMinutes <int>, aggregateLikes <int>, healthScore <int>,
## #   pricePerServing <dbl>, id <int>, title <chr>, readyInMinutes <int>, servings <int>, cuisines <chr>,
## #   dishTypes <chr>, diets <chr>, occasions <chr>, author <chr>, nutrition_info <list<chr>>,
## #   nutrient_amount <list<dbl>>, nutrient_unit <list<chr>>, percentOfDailyNeeds <list<dbl>>,
## #   ingredient_id <list<int>>, ingredient_name <list<chr>>, ingredient_amount <list<dbl>>,
## #   ingredient_unit <list<chr>>, ingredient_nutrient <list<list>>, nutrition.caloricBreakdown.percentProtein <dbl>,
## #   nutrition.caloricBreakdown.percentFat <dbl>, nutrition.caloricBreakdown.percentCarbs <dbl>, …

For later use, I’ll also create a combined dataframe with an indicator function variable to indicate whether or not a diet came from the keto data frame. I’ll also make sure

vegan_df2 <-vegan_df
vegan_df2$keto <- rep(0,nrow(vegan_df))
keto_df2 <- keto_df
keto_df2$keto <- rep(1,nrow(keto_df))

combined_df <-rbind(vegan_df2,keto_df2)
head(combined_df)

## # A tibble: 6 × 39
##   vegetarian vegan glutenFree dairyFree veryHealthy cheap veryPopular sustainable lowFodmap weightWatcherSmartPoints
##   <lgl>      <lgl> <lgl>      <lgl>     <lgl>       <lgl> <lgl>       <lgl>       <lgl>                        <int>
## 1 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 2 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 3 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 4 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 5 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 6 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## # ℹ 29 more variables: preparationMinutes <int>, cookingMinutes <int>, aggregateLikes <int>, healthScore <int>,
## #   pricePerServing <dbl>, id <int>, title <chr>, readyInMinutes <int>, servings <int>, cuisines <chr>,
## #   dishTypes <chr>, diets <chr>, occasions <chr>, author <chr>, nutrition_info <list<chr>>,
## #   nutrient_amount <list<dbl>>, nutrient_unit <list<chr>>, percentOfDailyNeeds <list<dbl>>,
## #   ingredient_id <list<int>>, ingredient_name <list<chr>>, ingredient_amount <list<dbl>>,
## #   ingredient_unit <list<chr>>, ingredient_nutrient <list<list>>, nutrition.caloricBreakdown.percentProtein <dbl>,
## #   nutrition.caloricBreakdown.percentFat <dbl>, nutrition.caloricBreakdown.percentCarbs <dbl>, …

Suppose that we wanted to compare the protein percentage in the vegan recipes as compared to the keto recipes. Let’s take a look at the distributions.

h1 <- hist(vegan_df$nutrition.caloricBreakdown.percentProtein,
             main = "Protein Distribution for Vegan Recipes", 
     xlab = "Percent Protein Vegan")

h2<-hist(keto_df$nutrition.caloricBreakdown.percentProtein,
         main = "Protein Distribution for Keto Recipes", 
     xlab = "Percent Protein Keto")

plot( h1, col=rgb(0,0,1,1/4), xlim=c(0,50),main="",xlab="",ylab="")  # first histogram
plot( h2, col=rgb(1,0,0,1/4), xlim=c(0,50), add=T,main="",xlab="",ylab="")  # second
title(main = "Vegan vs. Keto Protein Distribution Comparison", 
      xlab = "Diet Percent Protein: Vegan (Purple) and Keto (Pink)", ylab ="Frequency")

It’s fairly clear from this histograms (and especially that overlayed histogram), that it’s easier to get a higher percentage of your daily protein from the keto recipe. What about fats and carbs? Clearly by the name, a keto diet must necessarily be higher in fats than the vegan diet. I’m also guessing that there are more carbs in a vegan diet. Let’s use box plots to investigate.

boxplot(vegan_df$nutrition.caloricBreakdown.percentCarbs,keto_df$nutrition.caloricBreakdown.percentCarbs,names=c("Vegan","Keto"),col=c("Green","Yellow"))
title(main = "Vegan vs. Keto Carb Distribution Comparison", 
      xlab = "Diet", ylab ="Percentage Carbs")

Unsurprisingly, it seems that keto diets has less percentage of carbs as compared to vegan recipes.

boxplot(vegan_df$nutrition.caloricBreakdown.percentFat,keto_df$nutrition.caloricBreakdown.percentFat,names=c("Vegan","Keto"),col=c("darkgreen","cornsilk"))
title(main = "Vegan vs. Keto Fat Distribution Comparison", 
      xlab = "Diet", ylab ="Percentage Fats")

No surprises here! A ketogenic diet is intended to be high in fats, so the average percentage fats in a keto recipe is way higher than that of the vegan diet. Furthermore, there seems to be less spread so these diets are pretty homogeneously high-fat, it seems.

Do people seem to like the vegan or keto recipes more? Let’s compare the aggregate likes!

TotalLikes <- cbind(vegan_df$aggregateLikes,keto_df$aggregateLikes)
TotalLikes <- as.data.frame(TotalLikes) %>% rename("Vegan"= V1, "Keto" = V2)
barplot(colSums(TotalLikes),log="y",col=c("chartreuse2","coral")) 
title(main = "Aggregate Like for Vegan vs. Keto Recipes", 
      xlab = "Diet", ylab ="Aggregate Likes (Log Scale)")

It seems that people like vegan recipes more than keto recipes! I wonder if it is because of anything in particular (does keto not gain a lot of traction? Do more people like vegan stuff?) Or perhaps, if it’s just that more carb-heavy recipes are liked? We could create a variable in combined_df that could classify whether something is high or low carb, and see how well-liked it is.

combined_df_carbs<- combined_df %>%
  mutate(carb_content = case_when(nutrition.caloricBreakdown.percentCarbs < 10 ~ 'Very Low',                nutrition.caloricBreakdown.percentCarbs < 26 ~ 'Low',
                           nutrition.caloricBreakdown.percentCarbs < 60 ~'Moderate',
                           nutrition.caloricBreakdown.percentCarbs >60 ~ 'High'
                           ))
combined_df_carbs$carb_content <- as.factor(combined_df_carbs$carb_content)
head(combined_df_carbs)

## # A tibble: 6 × 40
##   vegetarian vegan glutenFree dairyFree veryHealthy cheap veryPopular sustainable lowFodmap weightWatcherSmartPoints
##   <lgl>      <lgl> <lgl>      <lgl>     <lgl>       <lgl> <lgl>       <lgl>       <lgl>                        <int>
## 1 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 2 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 3 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 4 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 5 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 6 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## # ℹ 30 more variables: preparationMinutes <int>, cookingMinutes <int>, aggregateLikes <int>, healthScore <int>,
## #   pricePerServing <dbl>, id <int>, title <chr>, readyInMinutes <int>, servings <int>, cuisines <chr>,
## #   dishTypes <chr>, diets <chr>, occasions <chr>, author <chr>, nutrition_info <list<chr>>,
## #   nutrient_amount <list<dbl>>, nutrient_unit <list<chr>>, percentOfDailyNeeds <list<dbl>>,
## #   ingredient_id <list<int>>, ingredient_name <list<chr>>, ingredient_amount <list<dbl>>,
## #   ingredient_unit <list<chr>>, ingredient_nutrient <list<list>>, nutrition.caloricBreakdown.percentProtein <dbl>,
## #   nutrition.caloricBreakdown.percentFat <dbl>, nutrition.caloricBreakdown.percentCarbs <dbl>, …

Make sure to group the carb content variable we just created.

carb_df <- combined_df_carbs %>%  group_by(carb_content)
head(carb_df)

## # A tibble: 6 × 40
## # Groups:   carb_content [1]
##   vegetarian vegan glutenFree dairyFree veryHealthy cheap veryPopular sustainable lowFodmap weightWatcherSmartPoints
##   <lgl>      <lgl> <lgl>      <lgl>     <lgl>       <lgl> <lgl>       <lgl>       <lgl>                        <int>
## 1 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 2 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 3 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 4 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 5 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## 6 TRUE       TRUE  TRUE       TRUE      TRUE        FALSE FALSE       FALSE       FALSE                           12
## # ℹ 30 more variables: preparationMinutes <int>, cookingMinutes <int>, aggregateLikes <int>, healthScore <int>,
## #   pricePerServing <dbl>, id <int>, title <chr>, readyInMinutes <int>, servings <int>, cuisines <chr>,
## #   dishTypes <chr>, diets <chr>, occasions <chr>, author <chr>, nutrition_info <list<chr>>,
## #   nutrient_amount <list<dbl>>, nutrient_unit <list<chr>>, percentOfDailyNeeds <list<dbl>>,
## #   ingredient_id <list<int>>, ingredient_name <list<chr>>, ingredient_amount <list<dbl>>,
## #   ingredient_unit <list<chr>>, ingredient_nutrient <list<list>>, nutrition.caloricBreakdown.percentProtein <dbl>,
## #   nutrition.caloricBreakdown.percentFat <dbl>, nutrition.caloricBreakdown.percentCarbs <dbl>, …

likes_sum <- combined_df_carbs %>%
  group_by(carb_content) %>%
  summarize(total_likes = sum(aggregateLikes))

# Create the barplot
ggplot(likes_sum, aes(carb_content, total_likes,fill=carb_content)) +
  geom_col() +
  xlab("Carb Content") +
  ylab("Total Likes") +
  ggtitle("Total Likes by Carb Content")

It seems that both high and very low carb recipes are not that well-liked! Furthermore, it seems that low carb recipes are the best liked, followed by moderate carb recipes.

Let’s get a numerical summary for this to investigate further:

#getting summary statistics for each of the categories
likesSummary <- combined_df_carbs %>%
  group_by(carb_content) %>% 
   summarize("Min." = min(aggregateLikes),
            "1st Quartile" = quantile(aggregateLikes, 0.25, na.rm=TRUE),
            "Median" = quantile(aggregateLikes, 0.5, na.rm=TRUE),
            "Mean" = mean(aggregateLikes, na.rm=TRUE),
            "3rd Quartile" = quantile(aggregateLikes, 0.75, na.rm=TRUE),
            "Max" = max(aggregateLikes),
            "Std. Dev." = sd(aggregateLikes, na.rm=TRUE)
            )
knitr::kable(likesSummary, 
             caption="Summary Statistics for Aggregate Likes by Carb Content Classification ",
             digits=2)

carb_content	Min.	1st Quartile	Median	Mean	3rd Quartile	Max	Std. Dev.
High	1	1	2	35.65	11	471	100.56
Low	0	1	2	208.62	11	5518	907.18
Moderate	0	1	4	164.00	11	3689	698.91
Very Low	0	1	1	45.35	7	991	163.40

Summary Statistics for Aggregate Likes by Carb Content Classification

Indeed, there is a difference in distributions for each of the carb content categories. The mean # of likes is highest for the low carb recipes. However, there also appears to be more variation for the distribution of likes for the low carb recipes!

What about the relationship for these recipes between the macro-nutrient? Is there some sort of pattern? We’ll explore the relationships between carbs and fat, carbs and protein, as well as protein and fat. I’ll also do a breakdown for these relationships for each of the diets.

#Carbs vs Fat, all
ggplot(combined_df, aes(x=nutrition.caloricBreakdown.percentCarbs, y=nutrition.caloricBreakdown.percentFat)) + 
  geom_point(col="red")+
  geom_smooth(method=lm) +
  xlab("Percent Carbs") +
  ylab("Percent Fat") +
  ggtitle("Percent Fat by Percent Carbs for Vegan and Keto Diets")

## `geom_smooth()` using formula = 'y ~ x'

#Carbs vs Protein, all
ggplot(combined_df, aes(x=nutrition.caloricBreakdown.percentCarbs, y=nutrition.caloricBreakdown.percentProtein)) + 
  geom_point(col="red")+
  geom_smooth(method=lm) +
  xlab("Percent Carbs") +
  ylab("Percent Protein") +
  ggtitle("Percent Protein by Percent Carbs for Vegan and Keto Diets")

## `geom_smooth()` using formula = 'y ~ x'

ggplot(combined_df, aes(x=nutrition.caloricBreakdown.percentProtein, y=nutrition.caloricBreakdown.percentFat)) + 
   geom_point(col="red")+
  geom_smooth(method=lm) +
  xlab("Percent Protein") +
  ylab("Percent Fat") +
  ggtitle("Percent Fat by Percent Protein for Vegan and Keto Diets")

## `geom_smooth()` using formula = 'y ~ x'

There seems to be a positive linear relationship between percent protein and percent fat, but curiously there looks like there is a strongly negative relationship between carbs and fat.

Let’s break it down by the specific diets:

#vegan, carbs vs. fat
ggplot(vegan_df, aes(x=nutrition.caloricBreakdown.percentCarbs, y=nutrition.caloricBreakdown.percentFat)) + 
  geom_point(col="green")+
  geom_smooth(method=lm) +
  xlab("Percent Carbs") +
  ylab("Percent Fat") +
  ggtitle("Percent Fat by Percent Carbs for Vegan Diets")

## `geom_smooth()` using formula = 'y ~ x'

#vegan, carbs vs. protein
ggplot(vegan_df, aes(x=nutrition.caloricBreakdown.percentCarbs, y=nutrition.caloricBreakdown.percentProtein)) + 
  geom_point(col="green")+
  geom_smooth(method=lm) +
  xlab("Percent Carbs") +
  ylab("Percent Protein") +
  ggtitle("Percent Protein by Percent Carbs for Vegan Diets")

## `geom_smooth()` using formula = 'y ~ x'

#vegan, protein vs. fat
ggplot(vegan_df, aes(x=nutrition.caloricBreakdown.percentProtein, y=nutrition.caloricBreakdown.percentFat)) + 
  geom_point(col="green")+
  geom_smooth(method=lm) +
  xlab("Percent Protein") +
  ylab("Percent Fat") +
  ggtitle("Percent Fat by Percent Protein for Vegan Diets")

## `geom_smooth()` using formula = 'y ~ x'

#keto, carbs vs. fat
ggplot(keto_df, aes(x=nutrition.caloricBreakdown.percentCarbs, y=nutrition.caloricBreakdown.percentFat)) + 
  geom_point(col="purple")+
  geom_smooth(method=lm) +
  xlab("Percent Carbs") +
  ylab("Percent Fat") +
  ggtitle("Percent Fat by Percent Carbs for Keto Diets")

## `geom_smooth()` using formula = 'y ~ x'

#keto, carbs vs. protein
ggplot(keto_df, aes(x=nutrition.caloricBreakdown.percentCarbs, y=nutrition.caloricBreakdown.percentProtein)) + 
  geom_point(col="purple")+
  geom_smooth(method=lm) +
  xlab("Percent Carbs") +
  ylab("Percent Protein") +
  ggtitle("Percent Protein by Percent Carbs for Keto Diets")

## `geom_smooth()` using formula = 'y ~ x'

#keto, protein vs. fat
ggplot(keto_df, aes(x=nutrition.caloricBreakdown.percentProtein, y=nutrition.caloricBreakdown.percentFat)) + 
  geom_point(col="purple")+
  geom_smooth(method=lm) +
  xlab("Percent Protein") +
  ylab("Percent Fat") +
  ggtitle("Percent Fat by Percent Protein for Keto Diets")

## `geom_smooth()` using formula = 'y ~ x'

Curiously, it seems that there is a strongly negative relationship between fat and protein for both keto and vegan diets (as you gain more protein, you get less fat). This isn’t the case for the combination of all the diets! I was especially surprised for keto, as you would think that your protein percentage and fat percentage would both be relatively high if you’re keeping starches (carbs) low. I also can’t really explain why the relationship between protein and fat would be negative for the vegan or keto diets, but combined the relationship is semi-positive.

We also have a lot of interesting categorical variables (especially logicals) that we could look at. Let’s look at some contingency tables for each of the diets. Let’s start by looking at the contingency table between a recipe being very healtlhy and being gluten-free.

vegan_health_tab<-table(vegan_df$glutenFree,vegan_df$veryHealthy)
colnames(vegan_health_tab)=c("Not Very Healthy","Very Healthy")
rownames(vegan_health_tab)=c("Not Gluten Free","Gluten Free")
kable(vegan_health_tab)

	Not Very Healthy	Very Healthy
Not Gluten Free	39	393
Gluten Free	336	2717

keto_health_tab<-table(keto_df$glutenFree,keto_df$veryHealthy)
colnames(keto_health_tab)=c("Not Very Healthy","Very Healthy")
rownames(keto_health_tab)=c("Not Gluten Free","Gluten Free")
kable(keto_health_tab)

	Not Very Healthy	Very Healthy
Not Gluten Free	52	0
Gluten Free	1967	530

I’m a little surprised that we pulled no recipes that were NOT gluten free and very healthy for the keto diet!

What about in general?

combined_health_tab<-table(combined_df$glutenFree,combined_df$veryHealthy)
colnames(combined_health_tab)=c("Not Very Healthy","Very Healthy")
rownames(combined_health_tab)=c("Not Gluten Free","Gluten Free")
kable(combined_health_tab)

	Not Very Healthy	Very Healthy
Not Gluten Free	91	393
Gluten Free	2303	3247

It seems in aggregate, there is some sort of relationship between a recipe being very healthy and being gluten-free.

What about recipes being popular and cheap?

combined_price_tab<-table(as.factor(combined_df$veryPopular),as.factor(combined_df$cheap))
colnames(combined_price_tab)=c("Not Cheap")
rownames(combined_price_tab)=c("Not Very Popular","Very Popular")
kable(combined_price_tab)

	Not Cheap
Not Very Popular	5699
Very Popular	335

Surprisingly, it seems that we weren’t able to pull in any cheap recipes at all! This might just be a peculiarity of the websites themselves and the most recent recipes.

Let’s try our other function, out of curiosity to see what kind of data we pull. Does this API pull relevant data? Suppose that I’m a user looking for bread. I’m also interested in a high carb diet. However, I’m also gluten-free.

apiKey <- "987c314948d14831ac64f7edaa24a25c"
query <- "bread"
carbs <- 60
fats <- 20
protein <- 20
intolerances <- c("gluten")

bread_ingredient_df <- getIngredientsByQuery(apiKey, query, carbs, fats, protein, intolerances)

print(bread_ingredient_df)

##          id                      name              image
## 1      9059                breadfruit     breadfruit.jpg
## 2     18372               baking soda   white-powder.jpg
## 3  10118375             instant yeast yeast-granules.jpg
## 4     18216                crispbread     crispbread.png
## 5     18019              banana bread    quick-bread.png
## 6     93671          banana bread mix   banana-bread.jpg
## 7   1052035         gingerbread spice   garam-masala.jpg
## 8  10099050         gluten free bread    white-bread.jpg
## 9     93694   gluten free breadcrumbs    breadcrumbs.jpg
## 10    99049 gluten free cornbread mix      cornbread.jpg

Of 10 rows, I received about 5 (breadcrumbs, bread, banana bread, banana bread mix, and instant yeast) if we’re interpreting this generously. It seems that you have to be careful with how you query, since this returns things that are kind of tangentially related about half the time here.

Conclusion

In summary, I built functions to interact with a couple of of spoonacular’s API’s endpoints. I retrieved some of the data, manipulated it, and explored it visually as well as with numerical summaries and tables. Some of my findings were nothing to call home about,since we knew that vegan recipes would probably be more carb-y and less protein-y than a keto diet.However, I was surprised by some things such as keto recipes having an inverse relationship between percent protein and percent carbs whereas it didn’t appear to be the case for the aggregated data. I also demonstrated the limits of some of the APIs themselves, and how they can return things very broadly.

Finally, I hope this helps with understanding how to interact with and utilize APIs! Happy coding!