<- read_csv("https://bcdanl.github.io/data/beer_markets.csv") beer_data
Let’s analyze the beer_data
data:
::paged_table(beer_data) rmarkdown
Variable Description for beer_data
data.frame
The following describes the variables in the beer_data
data.frame.
hh
: Household identifier_purchase_desc
: Description of the purchasequantity
: The quantity of beer purchasedbrand
: The brand of beerdollar_spent
: The amount spentbeer_floz
: Fluid ounces of beerprice_per_floz
: Price per fluid ouncecontainer
: Type of containerpromo
: Whether the purchase was on promotionmarket
: The market where the purchase was made- Demographics:
age
,employment
status,degree
, class of worker (cow
),race
, and household information likemicrowave
,dishwasher
,tvcable
,singlefamilyhome
, andnpeople
(number of people in the household)
Purchase Patterns
We’ll explore the purchase patterns in the dataset. This includes understanding the most popular brands, the average quantity purchased, and spending habits across different markets. Here are some specific analyses we can perform:
Calculate the total quantity and spending for each brand.
Find the average quantity purchased and average spending per purchase.
Compare the total spending across different markets.
I’ll begin with these analyses and create visualizations to help us understand the data better. Let’s start by calculating the total quantity and spending for each brand.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Reading the CSV file
= pd.read_csv("https://bcdanl.github.io/data/beer_markets.csv")
beer_data
# Setting up the visualisation settings
set(style="whitegrid")
sns.
# Calculate total quantity and spending for each brand
= beer_data.groupby('brand').agg({'quantity':'sum', 'dollar_spent':'sum'}).reset_index()
brand_summary
# Sort by total quantity and spending
= brand_summary.sort_values('quantity', ascending=False)
brand_summary_sorted_quantity = brand_summary.sort_values('dollar_spent', ascending=False) brand_summary_sorted_spent
# Plotting total quantity for each brand
=(10, 8))
plt.figure(figsize='quantity', y='brand', data=brand_summary_sorted_quantity, palette='viridis')
sns.barplot(x'Total Quantity of Beer Purchased by Brand')
plt.title('Total Quantity')
plt.xlabel('Brand')
plt.ylabel( plt.show()
# Plotting total spending for each brand
=(10, 8))
plt.figure(figsize='dollar_spent', y='brand', data=brand_summary_sorted_spent, palette='viridis')
sns.barplot(x'Total Spending on Beer by Brand')
plt.title('Total Spending')
plt.xlabel('Brand')
plt.ylabel( plt.show()
The bar charts above display the total quantity of beer purchased and the total spending by brand. From the looks of it, certain brands dominate in terms of quantity sold and total spending, indicating their popularity.
Now, let’s calculate the average quantity purchased and average spending per purchase. For this, we’ll consider each row in the dataset as a separate purchase and compute the averages accordingly.
# Calculate average quantity purchased and average spending per purchase
= beer_data.groupby('brand').agg({
average_purchase 'quantity': 'mean',
'dollar_spent': 'mean'
}).reset_index()
# Sort by average quantity and average spending
= average_purchase.sort_values('quantity', ascending=False)
average_purchase_sorted_quantity = average_purchase.sort_values('dollar_spent', ascending=False)
average_purchase_sorted_spent
# Plotting average quantity for each brand
=(10, 8))
plt.figure(figsize='quantity', y='brand', data=average_purchase_sorted_quantity, palette='viridis')
sns.barplot(x'Average Quantity of Beer Purchased by Brand')
plt.title('Average Quantity')
plt.xlabel('Brand')
plt.ylabel( plt.show()
# Plotting average spending for each brand
=(10, 8))
plt.figure(figsize='dollar_spent', y='brand', data=average_purchase_sorted_spent, palette='viridis')
sns.barplot(x'Average Spending on Beer by Brand')
plt.title('Average Spending')
plt.xlabel('Brand')
plt.ylabel( plt.show()
The visualizations above depict the average quantity of beer purchased per brand and the average spending per brand. This shows which brands tend to be bought in larger quantities on average and which brands tend to have higher spending per purchase, which could be indicative of their price point or the purchase of premium products.
Next, we’ll look at the total spending across different markets to see if there are any notable differences in spending habits geographically. To do this, we’ll sum up the spending in each market and visualize it.
# Calculate total spending in each market
= beer_data.groupby('market').agg({'dollar_spent':'sum'}).reset_index()
market_spending_summary
# Sort by total spending
= market_spending_summary.sort_values('dollar_spent', ascending=False)
market_spending_summary_sorted
# Plotting total spending in each market
=(12, 10))
plt.figure(figsize='dollar_spent', y='market', data=market_spending_summary_sorted, palette='viridis')
sns.barplot(x'Total Spending on Beer by Market')
plt.title('Total Spending')
plt.xlabel('Market')
plt.ylabel( plt.show()
The bar chart illustrates the total spending on beer by market, showcasing the differences in spending habits across various regions. Some markets have significantly higher spending, which could be due to a variety of factors including market size, consumer preferences, or economic factors.
Now, let’s move on to the second analysis:
Demographic Analysis
We will examine which demographics are buying what kind of beer and whether spending habits vary by demographics such as age, employment, and race. For this, we could look at:
- Spending by age group
- Spending by employment status
- Spending by race
I’ll start by analyzing spending by age group.
# Calculate total spending by age group
= beer_data.groupby('age').agg({'dollar_spent':'sum'}).reset_index()
age_group_spending
# Sort by total spending
= age_group_spending.sort_values('dollar_spent', ascending=False)
age_group_spending_sorted
# Plotting total spending by age group
=(10, 6))
plt.figure(figsize='dollar_spent', y='age', data=age_group_spending_sorted, palette='viridis')
sns.barplot(x'Total Spending on Beer by Age Group')
plt.title('Total Spending')
plt.xlabel('Age Group')
plt.ylabel( plt.show()
The bar chart demonstrates the total spending on beer segmented by age group, highlighting which age groups spend the most on beer. It appears that certain age groups are more dominant in beer spending, which may align with the purchasing power or preferences of those groups.
Next, we will examine spending by employment status.
# Calculate total spending by employment status
= beer_data.groupby('employment').agg({'dollar_spent':'sum'}).reset_index()
employment_spending
# Sort by total spending
= employment_spending.sort_values('dollar_spent', ascending=False)
employment_spending_sorted
# Plotting total spending by employment status
=(10, 6))
plt.figure(figsize='dollar_spent', y='employment', data=employment_spending_sorted, palette='viridis')
sns.barplot(x'Total Spending on Beer by Employment Status')
plt.title('Total Spending')
plt.xlabel('Employment Status')
plt.ylabel( plt.show()
The visualization shows the total spending on beer by employment status. We can see that certain employment groups, such as full-time workers, are spending more on beer, which might be related to their disposable income.
Finally, let’s look at spending by race to complete the demographic analysis.
# Calculate total spending by race
= beer_data.groupby('race').agg({'dollar_spent':'sum'}).reset_index()
race_spending
# Sort by total spending
= race_spending.sort_values('dollar_spent', ascending=False)
race_spending_sorted
# Plotting total spending by race
=(10, 6))
plt.figure(figsize='dollar_spent', y='race', data=race_spending_sorted, palette='viridis')
sns.barplot(x'Total Spending on Beer by Race')
plt.title('Total Spending')
plt.xlabel('Race')
plt.ylabel( plt.show()
The bar chart above indicates the total spending on beer broken down by race, highlighting which racial groups account for the most beer spending within the dataset. This could reflect both the demographics of the regions where the data was collected and cultural preferences regarding beer.
Now, let’s proceed to the third analysis:
Price Sensitivity
We’ll look at the price per fluid ounce and see if there are any trends or correlations with the quantity purchased or the brand popularity. To do this, we’ll calculate the average price per fluid ounce for each brand and then visualize how this relates to the average quantity purchased and the total quantity purchased by brand.
First, let’s calculate the average price per fluid ounce for each brand.
# Calculate average price per fluid ounce for each brand
= beer_data.groupby('brand').agg({
brand_price_sensitivity 'price_per_floz': 'mean',
'quantity': 'sum'
}).reset_index()
# Sort by price per fluid ounce
= brand_price_sensitivity.sort_values('price_per_floz', ascending=True)
brand_price_sensitivity_sorted
# Plotting average price per fluid ounce for each brand and the total quantity purchased
= plt.subplots(figsize=(12, 10))
fig, ax1
= 'tab:red'
color 'Brand')
ax1.set_xlabel('Average Price per Fluid Ounce', color=color)
ax1.set_ylabel('brand'], brand_price_sensitivity_sorted['price_per_floz'], color=color)
ax1.bar(brand_price_sensitivity_sorted[='y', labelcolor=color)
ax1.tick_params(axis'brand'], rotation=90)
ax1.set_xticklabels(brand_price_sensitivity_sorted[
= ax1.twinx() # instantiate a second axes that shares the same x-axis
ax2 = 'tab:blue'
color 'Total Quantity Purchased', color=color) # we already handled the x-label with ax1
ax2.set_ylabel('brand'], brand_price_sensitivity_sorted['quantity'], color=color)
ax2.plot(brand_price_sensitivity_sorted[='y', labelcolor=color)
ax2.tick_params(axis
# otherwise the right y-label is slightly clipped
fig.tight_layout() 'Average Price per Fluid Ounce & Total Quantity Purchased by Brand')
plt.title( plt.show()
In the visualization, we have a bar graph showing the average price per fluid ounce for each brand (in red) and a line graph showing the total quantity purchased for each brand (in blue). This gives us a sense of whether there’s a relationship between the price and the quantity purchased. The x-axis labels are quite compressed due to the number of brands, but we can still observe trends such as whether lower-priced beers tend to be purchased in larger quantities.
Lastly, let’s move to the fourth analysis:
Promotional Impact
We’ll assess the impact of promotions on the quantity of beer purchased. For this analysis, we can calculate the average quantity purchased with and without promotions and visualize the difference. We’ll do this for each brand to see which brands are most affected by promotions.
Let’s begin this analysis by looking at the average quantity purchased with and without promotions for each brand.
# Calculate average quantity purchased with and without promotions for each brand
= beer_data.groupby(['brand', 'promo']).agg({'quantity':'mean'}).reset_index()
promo_impact
# Pivot the data to have promo and non-promo side by side for each brand
= promo_impact.pivot(index='brand', columns='promo', values='quantity').reset_index()
promo_impact_pivot = ['brand', 'non_promo', 'promo']
promo_impact_pivot.columns
# Calculate the difference in average quantity purchased between promo and non-promo
'promo_impact'] = promo_impact_pivot['promo'] - promo_impact_pivot['non_promo']
promo_impact_pivot[
# Sort by the impact of promo
= promo_impact_pivot.sort_values('promo_impact', ascending=False)
promo_impact_pivot_sorted
# Plotting the difference in average quantity purchased between promo and non-promo for each brand
=(12, 10))
plt.figure(figsize='promo_impact', y='brand', data=promo_impact_pivot_sorted, palette='viridis')
sns.barplot(x'Impact of Promotions on Average Quantity Purchased by Brand')
plt.title('Difference in Average Quantity Purchased (Promo - Non-Promo)')
plt.xlabel('Brand')
plt.ylabel( plt.show()
The bar chart illustrates the impact of promotions on the average quantity of beer purchased by brand. A positive value indicates that, on average, more beer is purchased when there is a promotion compared to when there isn’t. Some brands appear to be significantly more influenced by promotions, with customers buying more when the products are on sale or promotion.
This comprehensive analysis has provided insights into purchase patterns, demographic preferences, price sensitivity, and the impact of promotions on beer purchases.