Logo

Resume
Published on

Instacart Market Basket Data Analysis

Author
    Jiannina Pinto
    Name
    Jiannina Pinto

Photo by Marques Thomas on Unsplash

1. Introduction

Project Overview: Analyzed Instacart market basket data to understand business trends over time without using explicit dates. Used SQL queries to derive insights from the data set.

Skills Demonstrated: SQL data exploration, business analysis.

2. Data Exploration and Understanding

Schema Exploration:

  • Explored the schema of five tables: ic_order_products_curr, ic_order_products_prior, ic_products, ic_departments, and ic_aisles.
  • Identified primary keys and relationships between the tables.
  • Created Entity Relationship Diagram (ERD).

Instacart SQL Schema.png

  • ic_order_porducts_curr contains the order contents for all customers. The data was collected in Q3.
  • ic_order_products_prior contains the order contents for all customers. The data was collected in Q2.

Notice that these two tables have the same schema. We will compare them later.

  • ic_products contains info about each item in the Instacart product catalog.

  • ic_departments contains info about each department.

  • ic_aisles contains info about each aisle.

Queries and Results:

  • Explored data using SQL queries to understand table structures and the size of the data.

Query 1: Exploring ic_order_products_curr Table Structure

SELECT *
FROM ic_order_products_curr
LIMIT 5;

Explanation: This query retrieves all columns from the "ic_order_products_curr" table and displays the first 5 rows of the results.

Results:

| "order_id" | "product_id" | "add_to_cart_order" | "reordered" |
|------------|--------------|---------------------|-------------|
| 1          | 49302        | 1                   | true        |
| 1          | 11109        | 2                   | true        |
| 1          | 10246        | 3                   | false       |
| 1          | 49683        | 4                   | false       |
| 1          | 43633        | 5                   | true        |

Query 2: Exploring ic_order_products_prior Table Structure

SELECT *
FROM ic_order_products_prior
LIMIT 5;

Explanation: This query retrieves all columns from the "ic_order_products_prior" table and displays the first 5 rows of the results.

Results:

| "order_id" | "product_id" | "add_to_cart_order" | "reordered" |
|------------|--------------|---------------------|-------------|
| 1742049    | 47059        | 3                   | true        |
| 1742049    | 6343         | 4                   | true        |
| 1742049    | 6750         | 5                   | true        |
| 1742049    | 18740        | 6                   | true        |
| 1742049    | 27845        | 7                   | true        |

Query 3: Exploring ic_products Table Structure

SELECT *
FROM ic_products
LIMIT 5;

Explanation: This query retrieves all columns from the "ic_products" table and displays the first 5 rows of the results.

Results:

| "product_id" | "product_name"                                                      | "aisle_id" | "department_id" |
|--------------|---------------------------------------------------------------------|------------|-----------------|
| 1            | "Chocolate Sandwich Cookies"                                        | 61         | 19              |
| 2            | "All-Seasons Salt"                                                  | 104        | 13              |
| 3            | "Robust Golden Unsweetened Oolong Tea"                              | 94         | 7               |
| 4            | "Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce" | 38         | 1               |
| 5            | "Green Chile Anytime Sauce"                                         | 5          | 13              |

Query 4: Exploring ic_departments Table Structure

SELECT *
FROM ic_departments
LIMIT 5;

Explanation: This query retrieves all columns from the "ic_departments" table and displays the first 5 rows of the results.

Results:

| "department_id" | "department"                                         |
|-----------------|------------------------------------------------------|
| 1               | "frozen                                            " |
| 2               | "other                                             " |
| 3               | "bakery                                            " |
| 4               | "produce                                           " |
| 5               | "alcohol                                           " |

Query 5: Exploring ic_aisles Table Structure

SELECT *
FROM ic_aisles
LIMIT 5;

Explanation: This query retrieves all columns from the "ic_aisles" table and displays the first 5 rows of the results.

Results:

| "aisle_id" | "aisle"                                              |
|------------|------------------------------------------------------|
| 1          | "prepared soups salads                             " |
| 2          | "specialty cheeses                                 " |
| 3          | "energy granola bars                               " |
| 4          | "instant foods                                     " |
| 5          | "marinades meat preparation                        " |

Query 6: Total number of products ordered in the previous quarter (Q2)

SELECT COUNT(*)
FROM ic_order_products_prior;

Explanation: This query calculates and returns the total count of products in the "ic_order_products_prior" table, as indicated by the result value of 32,434,489.

Results:

| "count"  |
|----------|
| 32434489 |

Query 7: Total number of products ordered in the current quarter (Q3)

SELECT COUNT(*)
FROM ic_order_products_curr;

Explanation: This query calculates and returns the total count of products in the "ic_order_products_curr" table, as indicated by the result value of 1,384,617.

Results:

| "count" |
|---------|
| 1384617 |

Data Quality Check

Query 8: Missing values in ic_order_products_curr

SELECT 
    COUNT(*) AS total_rows,
    COUNT(order_id) AS non_null_order_id,
    COUNT(product_id) AS non_null_product_id,
    COUNT(add_to_cart_order) AS non_null_add_to_cart_order,
    COUNT(reordered) AS non_null_reordered
FROM ic_order_products_curr;

Explanation: This query calculates the total number of rows in the "ic_order_products_curr" table and separately counts non-null values in specific columns, including "order_id," "product_id," "add_to_cart_order," and "reordered."

Results:

| "total_rows" | "non_null_order_id" | "non_null_product_id" | "non_null_add_to_cart_order" | "non_null_reordered" |
|--------------|---------------------|-----------------------|------------------------------|----------------------|
| 1384617      | 1384617             | 1384617               | 1384617                      | 1384617              |

The "ic_order_products_curr" table does not contain any null values in the specified columns. All 1,384,617 rows in the table have non-null values in these columns.

Query 9: Missing values in ic_order_products_prior

SELECT 
    COUNT(*) AS total_rows,
    COUNT(order_id) AS non_null_order_id,
    COUNT(product_id) AS non_null_product_id,
    COUNT(add_to_cart_order) AS non_null_add_to_cart_order,
    COUNT(reordered) AS non_null_reordered
FROM ic_order_products_prior;

Explanation: This query calculates the total number of rows in the "ic_order_products_prior" table and separately counts non-null values in specific columns, including "order_id," "product_id," "add_to_cart_order," and "reordered."

Results:

| "total_rows" | "non_null_order_id" | "non_null_product_id" | "non_null_add_to_cart_order" | "non_null_reordered" |
|--------------|---------------------|-----------------------|------------------------------|----------------------|
| 32434489     | 32434489            | 32434489              | 32434489                     | 32434489             |

The "ic_order_products_prior" table does not contain any null values in the specified columns. All 32,434,489 rows in the table have non-null values in these columns.

Query 10: Missing values in ic_products

SELECT 
    COUNT(*) AS total_rows,
    COUNT(product_id) AS non_null_product_id,
    COUNT(product_name) AS non_null_product_name,
    COUNT(aisle_id) AS non_null_aisle_id,
    COUNT(department_id) AS non_null_department_id
FROM ic_products;

Explanation: This query calculates counts for total rows and non-null values in specific columns ("product_id," "product_name," "aisle_id," and "department_id") in the "ic_products" table.

Results:

| "total_rows" | "non_null_product_id" | "non_null_product_name" | "non_null_aisle_id" | "non_null_department_id" |
|--------------|-----------------------|-------------------------|---------------------|--------------------------|
| 49688        | 49688                 | 49688                   | 49688               | 49688                    |

The "ic_products" table does not contain any null values in the specified columns.

Query 11: Missing values in ic_departments

SELECT 
    COUNT(*) AS total_rows,
    COUNT(department_id) AS non_null_department_id,
    COUNT(department) AS non_null_department
FROM ic_departments;

Explanation: This query calculates and displays the total number of rows in the "ic_departments" table, along with counts of non-null values in the "department_id" and "department" columns.

Results:

| "total_rows" | "non_null_department_id" | "non_null_department" |
|--------------|--------------------------|-----------------------|
| 21           | 21                       | 21                    |

The "ic_departments" table does not contain any null values in the specified columns.

Query 12: Missing values in ic_aisles

SELECT 
    COUNT(*) AS total_rows,
    COUNT(aisle_id) AS non_null_aisle_id,
    COUNT(aisle) AS non_null_aisle
FROM ic_aisles;

Explanation: This query calculates the total number of rows in the "ic_aisles" table and provides counts of non-null values in the "aisle_id" and "aisle" columns.

Results:

| "total_rows" | "non_null_aisle_id" | "non_null_aisle"                   |
|--------------|---------------------|------------------------------------|
| 134          | 134                 | 134                                |

The "ic_aisles" table does not contain any null values in the specified columns.

Query 13: Total number of unique products ordered in the current quarter (Q3)

SELECT COUNT (DISTINCT product_id)
FROM ic_order_products_curr;

Explanation: This query calculates and returns the count of unique product IDs in the "ic_order_products_curr" table, representing the number of distinct products ordered in the dataset for the current quarter (Q3).

Results:

| "count" |
|---------|
| 39123   |

There were 39,123 unique products ordered in Q3.

Query 14: Total number of unique products ordered in the previous quarter (Q2)

SELECT COUNT (DISTINCT product_id)
FROM ic_order_products_prior;

Explanation: This query calculates and returns the count of unique product IDs in the "ic_order_products_prior" table, representing the number of distinct products ordered in the dataset for the previous quarter (Q2). Results:

| "count" |
|---------|
| 49677   |

There were 49,677 unique products ordered in Q3.

3. Data Analysis

In our analysis, we will be focusing on a representative subset comprising 80% of the rows from both the ic_order_products_curr and ic_order_products_prior tables. This strategic choice stems from the considerable size of these datasets, each containing millions of records. Given the limitations of our current computational resources, working with smaller, randomly sampled data allows us to enhance query performance and alleviate processing challenges. By employing this approach, we aim to expedite our analysis, ensuring efficient query execution and reducing overall processing time substantially.

Query 15: Create a view of the sampled data for the ic_order_products_prior

CREATE VIEW sampled_order_products_prior_view AS 
SELECT * FROM ic_order_products_prior TABLESAMPLE SYSTEM (0.80);

Explanation: This query creates a view named "sampled_order_products_prior_view" containing 80% of the rows from "ic_order_products_prior" table.

Query 16: Create a view of the sampled data for the ic_order_products_curr

CREATE VIEW sampled_order_products_curr_view AS 
SELECT * FROM ic_order_products_curr TABLESAMPLE SYSTEM (0.80);

Explanation: This query creates a view named "sampled_order_products_curr_view" containing 80% of the rows from "ic_order_products_curr" table.

We created these views so that we can use the sampled data in many different queries. This makes our main analysis queries simpler and easier to manage.

Business Questions

1. What is the average number of products per order for the current quarter?

Query 17: Average number of products per order in the current quarter (Q3)

-- CTE to calculate the number of products per order from sampled data
WITH order_product_counts_curr_cte AS (
	SELECT
		order_id, 
		COUNT(product_id) AS products_per_order
	FROM sampled_order_products_prior_view
	GROUP BY order_id
)

SELECT 
	ROUND(AVG(products_per_order), 0) AS avg_products_per_order
FROM order_product_counts_curr_cte;

Explanation: In this query, a common table expression (CTE) named "order_product_counts_curr_cte" uses the view "sampled_order_products_curr_view" to calculate the number of products per order. This is done for each "order_id" using the "COUNT(product_id)" function and grouped by "order_id". Finally, we calculate the average number of products per order across all orders in the sampled data. The result is then rounded to the nearest whole number using the "ROUND" function.

Results:

| "avg_products_per_order" |
|--------------------------|
| 10                       |

The calculated average number of products per order, based on the sampled data, is 10. This means that, on average, customers in the current quarter (Q3) are ordering 10 products per order.

Query 18: Average number of products per order in the previous quarter (Q2)

-- CTE to calculate the number of products per order from sampled data
WITH order_product_counts_prior_cte AS (
	SELECT
		order_id, 
		COUNT(product_id) AS products_per_order
	FROM sampled_order_products_prior_view
	GROUP BY order_id
)

SELECT 
	ROUND(AVG(products_per_order), 0) AS avg_products_per_order
FROM order_product_counts_prior_cte;

Explanation: A common table expression (CTE) named "order_product_counts_prior_cte" uses the representative subset of Q2 data "sampled_order_products_prior_view" to calculate the number of products per order. This is done for each order_id using the "COUNT(product_id)" function and grouped by "order_id". The final step computes the average number of products per order by taking the average of products per order across all orders in the sampled Q2 data. The result is rounded to the nearest whole number using the "ROUND" function.

Results:

| "avg_products_per_order" |
|--------------------------|
| 10                       |

Customers in the current quarter (Q3) are also purchasing an average of 10 products per order, based on the sampled data.

Query 19: The 10 most reordered products for the current quarter (Q3)

WITH order_product_counts_curr_cte AS (
    SELECT
        order_id,
        COUNT(product_id) AS products_per_order
    FROM sampled_order_products_curr_view
    GROUP BY order_id
)

SELECT
    products.product_id,
    products.product_name,
    COUNT(current_order.reordered) AS total_reordered
FROM ic_order_products_curr AS current_order
INNER JOIN ic_products AS products
	ON current_order.product_id = products.product_id
GROUP BY products.product_id, products.product_name
ORDER BY total_reordered DESC
LIMIT 10;

Explanation: This query first calculates the count of products per order from the sampled view representing the current quarter's data. It then combines this information by joining it with the products table using the product ID as a common identifier. It further groups the results by product ID and name, calculating the total count of reorders for each product in the current orders. Finally, it orders the products by the total number of reorders in descending order and presents the top 10 products with the highest reorder counts.

Results:

| "product_id" | "product_name"           | "total_reordered" |
|--------------|--------------------------|-------------------|
| 24852        | "Banana"                 | 18726             |
| 13176        | "Bag of Organic Bananas" | 15480             |
| 21137        | "Organic Strawberries"   | 10894             |
| 21903        | "Organic Baby Spinach"   | 9784              |
| 47626        | "Large Lemon"            | 8135              |
| 47766        | "Organic Avocado"        | 7409              |
| 47209        | "Organic Hass Avocado"   | 7293              |
| 16797        | "Strawberries"           | 6494              |
| 26209        | "Limes"                  | 6033              |
| 27966        | "Organic Raspberries"    | 5546              |

The results display the top 10 products with the highest reorder counts in the current orders. For instance, the product with "product_id" 24852, named "Banana," has been reordered 18,726 times, making it the most frequently reordered product among the top 10 listed.

Query 20: The 10 most reordered products for the previous quarter (Q2)

WITH order_product_counts_prior_cte AS (
    SELECT
        order_id,
        COUNT(product_id) AS products_per_order
    FROM sampled_order_products_prior_view
    GROUP BY order_id
)

SELECT
    products.product_id,
    products.product_name,
    COUNT(prior_order.reordered) AS total_reordered
FROM ic_order_products_prior AS prior_order
INNER JOIN ic_products AS products
	ON prior_order.product_id = products.product_id
GROUP BY products.product_id, products.product_name
ORDER BY total_reordered DESC
LIMIT 10;

Explanation: This query first calculates the count of products per order from a sampled view representing the prior quarter's data. Then, it joins this information with the main products table using the product ID as a common identifier. It counts how often each product has been reordered in prior orders, grouping the results by product, and finally, it presents the top 10 products with the highest reorder counts.

Results:

| "product_id" | "product_name"           | "total_reordered" |
|--------------|--------------------------|-------------------|
| 24852        | "Banana"                 | 472565            |
| 13176        | "Bag of Organic Bananas" | 379450            |
| 21137        | "Organic Strawberries"   | 264683            |
| 21903        | "Organic Baby Spinach"   | 241921            |
| 47209        | "Organic Hass Avocado"   | 213584            |
| 47766        | "Organic Avocado"        | 176815            |
| 47626        | "Large Lemon"            | 152657            |
| 16797        | "Strawberries"           | 142951            |
| 26209        | "Limes"                  | 140627            |
| 27845        | "Organic Whole Milk"     | 137905            |

The results show the top 10 products with the highest reorder counts from the prior quarter's data. For example, the product with "product_id" 24852, named "Banana," has been reordered 472,565 times, making it the most frequently reordered product among the top 10 listed. These results offer valuable insights into the most popular and frequently repurchased items, providing crucial information for inventory management and customer preferences analysis.

2. Are there products that were rarely re-purchased in the last quarter (Q2) but are now being reordered more frequently in the current orders (Q3) compared to before?

Query 21: Products reordered less than 10 times in Q2 that have been reordered more than 10 or more times in Q3.

SELECT
      products.product_id,
      products.product_name,
      products.aisle_id,
      products.department_id,
      departments.department,
      aisles.aisle,
      SUM(CASE WHEN prior_orders.reordered::int = 1 THEN 1 ELSE 0 END) AS prior_reorders,
      SUM(CASE WHEN curr_orders.reordered::int =1 THEN 1 ELSE 0 END) AS curr_reorders
  FROM
      ic_products AS products
  JOIN
      sampled_order_products_prior_view AS prior_orders
      ON products.product_id = prior_orders.product_id
  JOIN
      sampled_order_products_curr_view AS curr_orders
      ON products.product_id = curr_orders.product_id
  JOIN
      ic_departments AS departments
      ON products.department_id = departments.department_id
  JOIN
      ic_aisles AS aisles
      ON products.aisle_id = aisles.aisle_id
  GROUP BY
      products.product_id,
      products.product_name,
      products.aisle_id,
      products.department_id,
      departments.department,
      aisles.aisle
  HAVING
      SUM(CASE WHEN prior_orders.reordered::int = 1 THEN 1 ELSE 0 END) < 10
      AND SUM(CASE WHEN curr_orders.reordered::int = 1 THEN 1 ELSE 0 END) >= 10
  ORDER BY
      curr_reorders DESC;

Explanation: This query calculates the total number of reorders for products in both the previous quarter (Q2) and the current quarter (Q3) based on the sampled data. It joins the "ic_products" table with sampled data from both "sampled_order_products_prior_view" (representing Q2) and "sampled_order_products_curr_view" (representing Q3) using product IDs. The "SUM" function is used with conditional logic ("CASE WHEN") to count reorders: 1 for reorders and 0 for non-reorders.

The query groups the results by product ID, product name, aisle ID, department ID, department name, and aisle name. It filters out products that were reordered less than 10 times in Q2 but are reordered 10 or more times in Q3. The final results are sorted by the number of reorders in the current quarter (Q3) in descending order.

Results:

| product_id | product_name                               | aisle_id | department_id | department  | aisle                           | prior_reorders  | curr_reorders |
|------------|--------------------------------------------|----------|---------------|-------------|---------------------------------|----------------|---------------|
| 16462      | Ready-to-Bake 9 Inch Pie Crusts            | 105      | 13            | pantry      | doughs gelatins bake mixes      | 8              | 38            |
| 44303      | Organic Shredded Unsweetened Coconut       | 17       | 13            | pantry      | baking ingredients              | 6              | 35            |
| 8006       | Chopped Organic Garlic                     | 110      | 13            | pantry      | pickled goods olives            | 4              | 34            |
| 45002      | Organic Balsamic Vinegar Of Modena         | 19       | 13            | pantry      | oils vinegars                   | 9              | 32            |
| 35269      | Vegetable Tray With Low Fat Dressing       | 32       | 4             | produce     | packaged produce                | 6              | 30            |
| 19767      | Old Fashioned Oatmeal                      | 130      | 14            | breakfast   | hot cereal pancake mixes        | 6              | 30            |
| 43920      | Organic Powdered Sugar                     | 17       | 13            | pantry      | baking ingredients              | 3              | 27            |
| 17679      | San Marzano Peeled Tomatoes                | 81       | 15            | canned goods| canned jarred vegetables        | 8              | 26            |
| 9871       | Beef Loin New York Strip Steak             | 122      | 12            | meat seafood| meat counter                    | 8              | 26            |
| 25861      | Freshly Shaved Parmesan Cheese             | 21       | 16            | dairy eggs | packaged cheese                  | 6              | 26            |
...
| Total: 289 rows                                   |

The results show a selection of products with their respective IDs, names, aisle and department IDs, aisle names, department names, prior reorders, and current reorders. These products, representing a subset of 289 items, indicate various categories (e.g., pantry, produce, breakfast) with differing levels of reordered quantities, suggesting diverse consumer preferences across departments.

3. Which departments exhibit the highest increase in reordered products?

Query 22: Identify the departments with the most significant increase in reordered products

WITH increased_reorders AS (
	SELECT
		products.product_id,
		products.product_name,
		products.aisle_id,
		products.department_id,
		departments.department,
		aisles.aisle,
		-- Total number of times each product has been reordered in the previous quarter (Q2)
		SUM(CASE WHEN prior_orders.reordered THEN 1 ELSE 0 END) AS prior_reorders,
		-- Total number of times each product has been reordered in the current quarter (Q3)
		SUM(CASE WHEN curr_orders.reordered THEN 1 ELSE 0 END) AS current_reorders
	FROM ic_products AS products
	JOIN sampled_order_products_prior_view AS prior_orders -- Get prior order data (Q2)
		ON products.product_id = prior_orders.product_id
	JOIN sampled_order_products_curr_view AS curr_orders -- Get current order data (Q3)
		ON products.product_id = curr_orders.product_id
	JOIN ic_departments AS departments
		ON products.department_id = departments.department_id
	JOIN ic_aisles AS aisles
		ON products.aisle_id = aisles.aisle_id
	GROUP BY
		products.product_id,
		products.product_name,
		products.aisle_id,
		products.department_id,
		departments.department,
		aisles.aisle
	HAVING
		SUM(CASE WHEN prior_orders.reordered THEN 1 ELSE 0 END) < 10 -- Filter on products previously reordered less than 10 times 
		AND SUM(CASE WHEN curr_orders.reordered THEN 1 ELSE 0 END) >= 10 -- Filter on products currently reordered more than 10 times 
	ORDER BY
		current_reorders DESC
)

SELECT department,
COUNT(*) AS num_products
FROM increased_reorders
GROUP BY department
ORDER BY num_products DESC;

Explanation: This query uses a Common Table Expression (CTE) named "increased_reorders" to filter and aggregate the data. The CTE calculates the total number of reorders for products in both the current quarter (Q3) and the previous quarter (Q2). It filters products that were reordered less than 10 times in Q2 but have been reordered more than 10 times in Q3.

The subsequent "SELECT" statement then counts the number of products in each department based on the criteria specified in the CTE. Departments are ordered by the count of products meeting the specified conditions, from highest to lowest, providing an overview of departments with the most significant changes in reorder patterns.

Results:

| "department"                                         | "num_products" |
|------------------------------------------------------|----------------|
| "dairy eggs                                        " | 35             |
| "pantry                                            " | 34             |
| "snacks                                            " | 34             |
| "frozen                                            " | 26             |
| "produce                                           " | 24             |
| "beverages                                         " | 24             |
| "household                                         " | 21             |
| "canned goods                                      " | 17             |
| "dry goods pasta                                   " | 16             |
| "bakery                                            " | 15             |
| "breakfast                                         " | 10             |
| "babies                                            " | 8              |
| "meat seafood                                      " | 7              |
| "deli                                              " | 6              |
| "missing                                           " | 4              |
| "alcohol                                           " | 3              |
| "personal care                                     " | 2              |
| "international                                     " | 2              |

The results show the number of products in each department that were reordered more than 10 times in the current quarter (Q3) but had less than 10 reorders in the previous quarter (Q2). The departments "dairy eggs," "pantry," and "snacks" have the highest number of such products, indicating a significant increase in reorders for items within these categories. On the other hand, departments like "personal care" and "international" have fewer products meeting these criteria.

4. Which aisles show the highest increase in reordered products?

Query 23: Identify the aisles with the most significant increase in reordered products

WITH increased_reorders AS (
	SELECT
		products.product_id,
		products.product_name,
		products.aisle_id,
		products.department_id,
		department.department,
		aisles.aisle,
		-- Total number of times each product has been reordered in the previous quarter (Q2)
		SUM(CASE WHEN prior_orders.reordered THEN 1 ELSE 0 END) AS prior_reorders,
		-- Total number of times each product has been reordered in the current quarter (Q3)
		SUM(CASE WHEN curr_orders.reordered THEN 1 ELSE 0 END) AS current_reorders
	FROM ic_products AS products
	JOIN sampled_order_products_prior_view AS prior_orders -- Get prior order data (Q2)
		ON products.product_id = prior_orders.product_id
	JOIN sampled_order_products_curr_view AS curr_orders -- Get current order data (Q3)
		ON products.product_id = curr_orders.product_id
	JOIN ic_departments AS department
		ON products.department_id = department.department_id
	JOIN ic_aisles AS aisles
		ON products.aisle_id = aisles.aisle_id
	GROUP BY
		products.product_id,
		products.product_name,
		products.aisle_id,
		products.department_id,
		department.department,
		aisles.aisle
	HAVING
		SUM(CASE WHEN prior_orders.reordered THEN 1 ELSE 0 END) < 10 -- Filter on products previously reordered less than 10 times 
		AND SUM(CASE WHEN curr_orders.reordered THEN 1 ELSE 0 END) >= 10 -- Filter on products currently reordered more than 10 times 
	ORDER BY
		current_reorders DESC
)

SELECT aisle,
COUNT(*) AS num_products
FROM increased_reorders
GROUP BY aisle
ORDER BY num_products DESC
LIMIT 10;

Explanation: In this query, a Common Table Expression (CTE) named increased_reorders is created. This CTE first filters products that were reordered less than 10 times in the previous quarter (Q2) but have been reordered more than 10 times in the current quarter (Q3). It calculates the total number of reorders for each qualifying product in both quarters and groups the results by product attributes such as product ID, product name, aisle ID, department ID, department name, and aisle name.

The main query then takes the results from the CTE and groups them by aisle. It counts the number of products in each aisle that meet the specified criteria (reordered less than 10 times in Q2 but more than 10 times in Q3) and orders the results by the count of products in descending order. Finally, it limits the output to the top 10 aisles with the most products meeting these criteria.

This query provides insights into specific aisles where there has been a significant increase in reorders between the previous and current quarters, indicating changing customer preferences or demand patterns.

Results:

| "aisle"                                              | "num_products" |
|------------------------------------------------------|----------------|
| "ice cream ice                                     " | 13             |
| "packaged cheese                                   " | 11             |
| "fresh vegetables                                  " | 10             |
| "yogurt                                            " | 10             |
| "frozen meals                                      " | 9              |
| "dry pasta                                         " | 9              |
| "chips pretzels                                    " | 8              |
| "refrigerated                                      " | 8              |
| "paper goods                                       " | 7              |
| "packaged vegetables fruits                        " | 7              |

These results indicate specific product categories within these aisles that have seen varying levels of changes in customer demand and reordering behavior, suggesting shifts in consumer preferences or market dynamics affecting these particular product types. According to these results, most of the products come from aisles related to "dairy eggs", "pantry" and "snacks".

4. Visualizations

These visualizations provide valuable insights into the changing reordering patterns, top-performing departments, and popular aisles.

Instacart Market Basket Analysis.png Image: Tableau Dashboard developed by author

5. Business Insights

The notable shifts in reordering patterns within top departments and aisles may be attributed to various influencing factors. Given that we are currently in Q3, aligning with the summer months of July to September, several hypotheses can be considered for these changes:

  • Seasonal Demand: Warmer weather in summer often increases the demand for certain dairy products such as ice cream, cheese, and yogurt. This surge in demand during the hot months could lead to fluctuations in reordering patterns as businesses respond to seasonal trends.

  • Promotional Strategies: Strategic promotions, discounts, or marketing campaigns targeted at specific products within these aisles might attract more customer orders. Variations in reordering could be a response to these promotional efforts, impacting consumer buying behavior.

  • Special Events and Occasions: Holidays like Independence Day, parties, or other significant events during summer could lead to heightened orders for items like ice cream, frozen meals, and chips/pretzels. These occasions may prompt changes in reordering patterns as businesses adapt to accommodate increased consumer demand during these events.

6. Business Recommendations

After analyzing the reordering patterns between the previous and current quarters, and the potential influencing factors, these are some recommendations that could help optimize business performance:

  • Seasonal Inventory Management: Understanding the ebb and flow of customer demand throughout the year is crucial. In summer, prioritize items like ice cream, cheese, yogurt, snacks, and fresh fruits and vegetables. By aligning inventory with these seasonal preferences, the business will ensure shelves are well-stocked without excess during slower months.

  • Targeted Promotions: Leverage the insight into consumer behavior during promotions. Identify products with increased demand during special offers and craft enticing promotions, such as bundle deals, to capitalize on customer interest. By tailoring offers to align with customer preferences, the impact of marketing campaigns could be maximized.

  • Event-Based Marketing: Special events and holidays provide excellent opportunities for tailored marketing. Strategically plan promotions and bundles around occasions like Independence Day and other significant summer events. Engaging marketing initiatives during these periods can not only boost sales but also foster customer loyalty, capitalizing on the festive spirit.

Data Source

Tableau Dashboard

Next Post

← Back to the projects

© 2023 DataScienceGal. All rights reserved

LinkedinGithub

Quick Links

AboutProjectsHire Me