Marketing Campaigns Analysis with Python

Python for Data Science

Python is a top choice for data science due to its simplicity, readability, and vast ecosystem of powerful libraries. Tools like Pandas, NumPy, and scikit-learn make data manipulation, analysis, and machine learning accessible and efficient. Combined with visualization libraries like Matplotlib and Seaborn, and integration with big data platforms and deep learning frameworks, Python offers a flexible, end-to-end solution for data science tasks. Its large community also ensures strong support and continual development.

See the marketing mix report below and explore the different visualizations to understand how Python brings data-driven insights to life.

Problem scenario

The marketing mix is a widely used framework for executing marketing strategy. It covers the components of a comprehensive marketing plan and centers on the four Ps of marketing: product, price, place, and promotion.

Problem objective

As a data scientist, you must conduct exploratory data analysis and hypothesis testing to better understand the factors that influence customer acquisition.

Data description

The dataset aligns with the Four Ps of Marketing, categorizing variables to analyze consumer behavior. Product-related variables track spending across categories, while Price factors like income and deal-based purchases indicate affordability. Place covers shopping channels and web visits, reflecting purchase preferences. Promotion measures campaign engagement, complaints, and recency. Additionally, demographics support segmentation for personalized marketing. This structured approach helps businesses optimize products, pricing, distribution, and promotions for better customer engagement and market performance.

Explore the Data

Take a closer look at the dataset below to uncover patterns and insights firsthand. Reviewing the raw records is a good way to understand how each variable shapes marketing strategies and customer behavior. See the Column Guide below for a description of each variable.

Note that the table below shows only a sample of the data. To see the full dataset, download the CSV file.

ID Year_Birth Education Marital_Status Income Kidhome Teenhome Dt_Customer Recency MntWines MntFruits MntMeatProducts MntFishProducts MntSweetProducts MntGoldProds NumDealsPurchases NumWebPurchases NumCatalogPurchases NumStorePurchases NumWebVisitsMonth AcceptedCmp3 AcceptedCmp4 AcceptedCmp5 AcceptedCmp1 AcceptedCmp2 Response Complain Country
1826 1970 Graduation Divorced $84,835.00 0 0 6/16/14 0 189 104 379 111 189 218 1 4 4 6 1 0 0 0 0 0 1 0 SP
1 1961 Graduation Single $57,091.00 0 0 6/15/14 0 464 5 64 7 0 37 1 7 3 7 5 0 0 0 0 1 1 0 CA
10476 1958 Graduation Married $67,267.00 0 1 5/13/14 0 134 11 59 15 2 30 1 3 2 5 2 0 0 0 0 0 0 0 US
1386 1967 Graduation Together $32,474.00 1 1 5/11/14 0 10 0 1 0 0 0 1 1 0 2 7 0 0 0 0 0 0 0 AUS
5371 1989 Graduation Single $21,474.00 1 0 4/8/14 0 6 16 24 11 0 34 2 3 1 2 7 1 0 0 0 0 1 0 SP
7348 1958 PhD Single $71,691.00 0 0 3/17/14 0 336 130 411 240 32 43 1 4 7 5 2 0 0 0 0 0 1 0 SP
4073 1954 2n Cycle Married $63,564.00 0 0 1/29/14 0 769 80 252 15 34 65 1 10 10 7 6 1 0 0 0 0 1 0 GER
1991 1967 Graduation Together $44,931.00 0 1 1/18/14 0 78 0 11 0 0 7 1 2 1 3 5 0 0 0 0 0 0 0 SP
4047 1954 PhD Married $65,324.00 0 1 1/11/14 0 384 0 102 21 32 5 3 6 2 9 4 0 0 0 0 0 0 0 US
9477 1954 PhD Married $65,324.00 0 1 1/11/14 0 384 0 102 21 32 5 3 6 2 9 4 0 0 0 0 0 0 0 IND
2079 1947 2n Cycle Married $81,044.00 0 0 12/27/13 0 450 26 535 73 98 26 1 5 6 10 1 0 0 0 0 0 0 0 US
5642 1979 Master Together $62,499.00 1 0 12/9/13 0 140 4 61 0 13 4 2 3 1 6 4 0 0 0 0 0 0 0 SP
10530 1959 PhD Widow $67,786.00 0 0 12/7/13 0 431 82 441 80 20 102 1 3 6 6 1 0 0 0 0 0 1 0 IND
2964 1981 Graduation Married $26,872.00 0 0 10/16/13 0 3 10 8 3 16 32 1 1 1 2 6 0 0 0 0 0 0 0 CA
10311 1969 Graduation Married $4,428.00 0 1 10/5/13 0 16 4 12 2 4 321 0 25 0 0 1 0 0 0 0 0 0 0 SP
837 1977 Graduation Married $54,809.00 1 1 9/11/13 0 63 6 57 13 13 22 4 2 1 5 4 0 0 0 0 0 0 0 SP
10521 1977 Graduation Married $54,809.00 1 1 9/11/13 0 63 6 57 13 13 22 4 2 1 5 4 0 0 0 0 0 1 0 SP
10175 1958 PhD Divorced $32,173.00 0 1 8/1/13 0 18 0 2 0 0 2 1 1 0 3 4 0 0 0 0 0 0 0 SP
1473 1960 2n Cycle Single $47,823.00 0 1 7/23/13 0 53 1 5 2 1 10 2 2 0 3 8 0 0 0 0 0 0 0 CA
2795 1958 Master Single $30,523.00 2 1 7/1/13 0 5 0 3 0 0 5 1 1 0 2 7 0 0 0 0 0 0 0 CA
2285 1954 Master Together $36,634.00 0 1 5/28/13 0 213 9 76 4 3 30 3 5 2 5 7 0 0 0 0 0 0 0 SA
115 1966 Master Single $43,456.00 0 1 3/26/13 0 275 11 68 25 7 7 3 5 1 8 5 0 0 0 0 0 0 0 IND
10470 1979 Master Married $40,662.00 1 0 3/15/13 0 40 2 23 0 4 23 2 2 1 3 4 0 0 0 0 0 0 0 GER
4065 1976 PhD Married $49,544.00 1 0 2/12/13 0 308 0 73 0 0 23 2 5 1 8 7 0 0 0 0 0 0 0 SP
10968 1969 Graduation Single $57,731.00 0 1 11/23/12 0 266 21 300 65 8 44 4 8 8 6 6 0 0 0 0 0 0 0 IND
5985 1965 Master Single $33,168.00 0 1 10/13/12 0 80 1 37 0 1 3 3 2 1 4 7 0 0 0 0 0 0 0 SP
5430 1956 Graduation Together $54,450.00 1 1 9/14/12 0 454 0 171 8 19 32 12 9 2 8 8 0 0 0 0 0 0 0 SP
8432 1956 Graduation Together $54,450.00 1 1 9/14/12 0 454 0 171 8 19 32 12 9 2 8 8 0 0 0 0 0 0 0 SP
453 1956 PhD Widow $35,340.00 1 1 6/29/14 1 27 0 12 0 1 5 2 2 0 3 5 0 0 0 0 0 0 0 SP
9687 1975 Graduation Single $73,170.00 0 0 5/31/14 1 184 174 256 50 30 32 1 5 4 6 2 0 0 0 0 0 0 0 CA
8890 1971 PhD Divorced $65,808.00 1 1 5/30/14 1 155 7 80 13 7 10 3 5 1 5 6 0 0 0 0 0 0 0 SP
9264 1986 Graduation Married $79,529.00 0 0 4/27/14 1 423 42 706 73 197 197 1 4 8 9 2 0 0 0 0 0 0 0 CA
5824 1972 PhD Together $34,578.00 2 1 4/11/14 1 7 0 1 0 0 0 1 1 0 2 6 0 0 0 0 0 0 0 AUS
5794 1974 PhD Married $46,374.00 0 1 3/17/14 1 408 0 21 0 0 17 3 7 1 7 8 0 1 0 1 0 1 0 IND
3068 1990 Graduation Married $18,351.00 0 0 10/29/13 1 1 12 9 0 14 7 1 2 0 3 7 0 0 0 0 0 0 0 SP
7962 1987 PhD Single $95,169.00 0 0 10/9/13 1 1285 21 449 106 20 20 1 4 3 4 1 0 0 1 1 0 1 0 SP
2681 1984 2n Cycle Married $65,370.00 0 0 8/1/13 1 71 22 112 138 89 29 1 2 3 13 1 0 0 0 0 0 0 0 SP
10141 1960 Master Divorced $39,228.00 0 0 5/10/13 1 7 1 6 0 3 3 1 0 0 3 4 0 0 0 0 0 0 0 SA
3725 1961 PhD Single $84,865.00 0 0 5/9/13 1 1248 16 349 43 16 16 1 2 4 9 4 0 1 1 1 1 1 0 SP
3767 1968 Graduation Married $61,314.00 0 1 4/25/13 1 378 0 189 97 172 172 2 5 5 12 3 0 0 0 0 0 0 0 SP
5585 1972 Graduation Single $21,359.00 1 0 4/20/13 1 12 2 17 6 1 10 2 2 0 3 8 0 0 0 0 0 1 0 CA
7030 1955 PhD Married $66,465.00 0 1 3/30/13 1 1200 0 204 38 29 14 3 11 9 12 6 0 0 0 1 0 0 0 SP
1524 1983 2n Cycle Single $81,698.00 0 0 3/1/13 1 709 45 115 30 160 45 1 8 2 5 5 0 0 0 1 0 1 0 SP
3657 1986 Graduation Single $39,146.00 1 0 2/14/13 1 94 1 33 13 12 12 3 4 0 4 8 0 0 0 0 0 0 0 SP
5740 1970 2n Cycle Divorced $25,959.00 1 1 2/14/13 1 4 2 12 7 5 26 2 1 2 2 6 0 0 0 0 0 1 0 SP
9595 1961 Graduation Together $64,260.00 0 0 1/11/13 1 539 169 816 20 0 30 1 4 5 4 3 0 0 0 0 0 1 0 SP
3158 1973 Graduation Married $32,300.00 1 0 1/3/13 1 13 3 6 6 5 6 1 1 0 3 8 0 0 0 0 0 0 0 SP
5114 1965 Master Married $74,806.00 0 1 12/19/12 1 670 9 249 0 28 9 2 5 4 5 4 0 0 0 0 0 0 0 AUS
340 1970 Graduation Divorced $72,967.00 0 1 12/15/12 1 158 35 179 0 0 125 2 7 2 8 5 1 0 0 0 0 1 0 GER
8805 1960 Graduation Single $48,904.00 0 1 12/2/12 1 283 10 38 0 13 27 4 7 2 4 8 0 0 0 0 0 0 0 US
1241 1984 2n Cycle Married $14,796.00 1 0 9/17/12 1 13 3 8 7 4 16 2 1 0 3 9 0 0 0 0 0 1 0 GER
1402 1954 Master Married $66,991.00 0 0 9/11/12 1 496 36 460 189 60 12 3 4 8 6 3 0 0 0 0 0 0 0 GER
7264 1978 2n Cycle Single $52,195.00 2 1 5/12/14 2 12 0 4 0 0 1 1 1 0 2 8 0 0 0 0 0 0 0 SA
1619 1956 Graduation Married $90,369.00 0 0 4/28/14 2 292 51 981 224 23 17 1 4 6 6 1 0 0 0 0 0 1 0 SP
6398 1974 Basic Married $18,393.00 1 0 3/29/14 2 7 10 13 16 0 4 2 3 0 3 8 0 0 0 0 0 0 0 SP
1857 1952 Graduation Single $47,139.00 1 1 3/6/14 2 46 0 12 0 2 23 2 2 1 2 7 0 0 0 0 0 1 0 SP
4877 1973 Master Married $38,576.00 0 1 3/4/14 2 34 0 7 0 0 0 1 1 0 3 7 0 0 0 0 0 0 0 IND
3066 1975 PhD Together $61,905.00 0 1 2/4/14 2 167 0 43 6 2 13 2 4 2 4 5 0 0 0 0 0 0 0 SA
10286 1962 Graduation Married $83,715.00 0 0 2/3/14 2 318 8 407 150 35 8 1 2 8 13 0 0 0 0 0 0 0 0 SA
1992 1964 Graduation Married $60,597.00 0 1 1/1/14 2 522 0 257 32 16 66 4 2 2 8 7 0 0 0 1 0 1 0 SP
4246 1982 Master Single $6,560.00 0 0 12/12/13 2 67 11 26 4 3 262 0 1 0 1 17 0 0 0 0 0 0 0 SP
10623 1961 Master Together $48,330.00 0 1 11/15/13 2 28 0 4 0 0 0 1 1 0 3 5 0 0 0 0 0 0 0 SP
4867 1968 PhD Single $38,236.00 1 1 9/20/13 2 58 0 18 2 0 10 4 3 0 4 7 0 0 0 0 0 0 0 SP
3112 1977 Master Married $22,701.00 1 0 9/5/13 2 2 4 9 0 4 5 1 1 0 3 5 0 0 0 0 0 0 0 SP
4865 1974 Master Divorced $53,367.00 1 1 8/31/13 2 229 7 140 10 3 11 7 5 1 8 7 0 0 0 0 0 1 0 AUS
6287 1986 Graduation Together $34,728.00 1 0 7/30/13 2 14 0 16 0 0 6 1 1 1 2 6 0 0 0 0 0 1 0 SP
4405 1956 Master Married $63,915.00 0 2 7/30/13 2 622 7 115 30 0 15 2 6 3 12 5 0 0 0 0 0 0 0 AUS
5332 1960 2n Cycle Married $82,504.00 0 0 7/27/13 2 362 50 431 134 35 54 1 3 6 7 1 0 0 0 0 0 0 0 IND
1519 1972 PhD Single $38,578.00 1 1 6/22/13 2 38 4 22 3 3 3 3 3 0 3 8 0 0 0 0 0 1 0 SP
9080 1972 PhD Single $38,578.00 1 1 6/22/13 2 38 4 22 3 3 3 3 3 0 3 8 0 0 0 0 0 0 0 SP
1772 1975 PhD Married $79,174.00 0 0 1/11/13 2 1074 37 518 193 92 129 1 5 6 7 2 0 0 1 1 0 1 0 CA
5341 1962 2n Cycle Divorced $81,975.00 0 1 1/5/13 2 983 76 184 180 138 27 1 6 3 4 7 0 0 1 0 0 0 0 SA
5510 1977 Master Married $43,263.00 0 1 11/21/12 2 262 6 61 0 10 102 3 5 2 6 5 0 0 0 0 0 0 0 SP
3887 1970 Graduation Single $27,242.00 1 0 11/11/12 2 3 17 26 20 1 39 2 2 0 3 9 0 0 0 0 0 1 0 IND
7022 1971 Graduation Married $76,445.00 1 0 9/28/12 2 739 107 309 140 80 35 1 2 5 13 6 0 0 0 0 0 0 0 SA
9999 1965 Graduation Together $75,276.00 0 0 9/27/12 2 610 105 125 137 42 21 1 9 4 9 5 0 0 0 0 0 0 0 SP
10352 1963 Graduation Widow $34,213.00 1 1 9/7/12 2 50 4 28 6 3 26 3 3 1 2 9 0 0 0 0 0 1 0 SA
7919 1976 PhD Together $72,335.00 0 0 8/13/12 2 1285 105 653 28 21 0 1 10 4 8 8 0 0 0 0 0 1 0 SP
4114 1964 Master Married $79,143.00 0 0 8/11/12 2 650 37 780 27 167 32 1 6 9 13 3 0 0 0 0 0 0 0 AUS
7990 1947 Graduation Married $27,469.00 0 0 8/2/12 2 9 1 2 3 2 0 1 0 0 3 6 0 0 0 0 0 0 0 CA
9888 1969 Graduation Together $68,695.00 0 0 6/25/14 3 458 81 356 106 50 40 1 4 4 7 2 0 0 0 0 0 0 0 SP
4399 1969 Graduation Together $68,695.00 0 0 6/25/14 3 458 81 356 106 50 40 1 4 4 7 2 0 0 0 0 0 0 0 CA
4452 1957 Graduation Single $50,388.00 0 1 5/28/14 3 292 6 37 0 3 34 4 6 1 6 7 0 1 0 1 0 1 0 GER
4785 1970 PhD Together $77,622.00 0 2 4/14/14 3 520 7 154 19 0 14 2 6 3 11 3 0 0 0 0 0 0 0 SA
8461 1962 Graduation Divorced $46,102.00 2 1 3/10/14 3 14 0 1 0 0 1 1 1 0 2 7 0 0 0 0 0 0 0 SP
3878 1980 2n Cycle Single $31,859.00 1 0 2/27/14 3 3 4 7 15 8 11 1 1 0 3 7 0 0 0 0 0 0 0 SP
9612 1987 2n Cycle Single $23,830.00 0 0 2/7/14 3 1 8 6 4 8 16 1 1 0 3 7 0 0 0 0 0 0 0 SP
4098 1973 Graduation Married $24,639.00 1 1 1/28/14 3 20 3 16 0 4 1 3 2 0 4 6 0 0 0 0 0 0 0 AUS
158 1945 PhD Together $71,604.00 0 0 11/17/13 3 345 53 528 98 75 97 1 8 3 5 4 1 0 0 0 0 1 0 SP
3896 1984 Graduation Married $27,255.00 1 0 11/7/13 3 22 1 11 0 1 2 1 1 0 3 7 0 0 0 0 0 0 0 SP
9970 1977 Graduation Together $55,375.00 0 1 10/17/13 3 42 11 57 10 28 14 1 1 1 6 2 0 0 0 0 0 0 0 CA
4002 1960 PhD Married $77,037.00 0 1 10/13/13 3 463 96 333 168 53 10 1 7 7 12 3 0 0 0 0 0 0 0 SP
10914 1970 Graduation Single $24,163.00 1 1 10/12/13 3 4 1 7 2 1 2 2 1 0 3 4 0 0 0 0 0 0 0 SP
7279 1969 PhD Together $69,476.00 0 0 9/30/13 3 260 86 559 63 9 67 1 4 6 4 2 0 0 0 0 0 0 0 US
10582 1979 Graduation Married $72,063.00 0 1 7/3/13 3 180 32 348 76 32 90 2 5 2 12 2 0 0 0 0 0 0 0 GER
4470 1962 Master Married $58,646.00 0 1 6/10/13 3 62 1 44 6 5 22 1 2 1 4 4 0 0 0 0 0 0 0 SP
6183 1962 Master Married $58,646.00 0 1 6/10/13 3 62 1 44 6 5 22 1 2 1 4 4 0 0 0 0 0 0 0 GER
6379 1949 Master Widow $47,570.00 1 1 5/29/13 3 67 1 20 0 2 31 3 2 2 2 7 0 0 0 0 0 1 0 US
8601 1980 Graduation Married $80,011.00 0 1 4/29/13 3 421 76 536 82 178 102 2 8 6 5 4 0 0 0 0 0 0 0 AUS
4827 1956 PhD Single $54,998.00 0 1 3/10/13 3 154 22 202 39 30 8 5 4 2 9 4 0 0 0 0 0 1 0 SP

Column Guide

Variable: Description
ID: Customer's unique identifier
Year_Birth: Customer's birth year
Education: Customer's education level
Marital_Status: Customer's marital status
Income: Customer's yearly household income
Kidhome: Number of small children in the customer's household
Teenhome: Number of teenagers in the customer's household
Dt_Customer: Date of the customer's enrollment with the company
Recency: Number of days since the last purchase
MntWines: Amount spent on wine in the last 2 years
MntFruits: Amount spent on fruits in the last 2 years
MntMeatProducts: Amount spent on meat products in the last 2 years
MntFishProducts: Amount spent on fish products in the last 2 years
MntSweetProducts: Amount spent on sweet products in the last 2 years
MntGoldProds: Amount spent on gold products in the last 2 years
NumDealsPurchases: Number of purchases made with a discount
NumWebPurchases: Number of purchases made through the company's website
NumCatalogPurchases: Number of purchases made using a catalog
NumStorePurchases: Number of purchases made directly in stores
NumWebVisitsMonth: Number of visits to the company's website in the last month
AcceptedCmp3: 1 if the customer accepted the offer in the 3rd campaign, 0 otherwise
AcceptedCmp4: 1 if the customer accepted the offer in the 4th campaign, 0 otherwise
AcceptedCmp5: 1 if the customer accepted the offer in the 5th campaign, 0 otherwise
AcceptedCmp1: 1 if the customer accepted the offer in the 1st campaign, 0 otherwise
AcceptedCmp2: 1 if the customer accepted the offer in the 2nd campaign, 0 otherwise
Response: 1 if the customer accepted the offer in the last campaign, 0 otherwise
Complain: 1 if the customer complained in the last 2 years, 0 otherwise
Country: Customer's location

Data Import and Inspection

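After importing the data, check that variables such as Dt_Customer and Income were parsed correctly. As the output below shows, Income is read as text (dtype object) because of its dollar signs and thousands separators, so it must be cleaned and converted to a number; Dt_Customer must likewise be converted from a string to a date.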


  # Python Data Science Libraries we will use
  import pandas as pd
  import numpy as np
  import plotly.graph_objs as go
  import matplotlib.pyplot as plt
  import seaborn as sns
  from scipy.stats import ttest_ind
  import plotly.express as px
  import json
  import plotly.utils


  data = pd.read_csv('marketing_data.csv')
  data.columns = data.columns.str.strip()  # remove stray spaces from column names
  data.head()

  print(data['Income'].head(10))
  print(data['Income'].dtype)

  # Output:
  # 0    $84,835.00
  # 1    $57,091.00
  # 2    $67,267.00
  # 3    $32,474.00
  # 4    $21,474.00
  # 5    $71,691.00
  # 6    $63,564.00
  # 7    $44,931.00
  # 8    $65,324.00
  # 9    $65,324.00
  # Name: Income, dtype: object
  # object

  # Clean Income as text BEFORE numeric conversion: drop '$', thousands separators, and padding
  data['Income'] = (data['Income'].astype(str)
                    .str.replace('$', '', regex=False)
                    .str.replace(',', '', regex=False)
                    .str.strip())

  # Convert to numeric; anything unparseable (e.g. blank incomes) becomes NaN
  data['Income'] = pd.to_numeric(data['Income'], errors='coerce')

  # Parse enrollment dates written as month/day/two-digit year (e.g. 6/16/14)
  data['Dt_Customer'] = pd.to_datetime(data['Dt_Customer'], format='%m/%d/%y')

  print(data['Dt_Customer'].head())
  # Output:
  # 0   2014-06-16
  # 1   2014-06-15
  # 2   2014-05-13
  # 3   2014-05-11
  # 4   2014-04-08
  # Name: Dt_Customer, dtype: datetime64[ns]

  data[['Dt_Customer', 'Income']].head()
  # Output:
  #   Dt_Customer   Income
  # 0  2014-06-16  84835.0
  # 1  2014-06-15  57091.0
  # 2  2014-05-13  67267.0
  # 3  2014-05-11  32474.0
  # 4  2014-04-08  21474.0

  

Missing Value Imputation

There are missing income values for some customers. To address this, we assume customers with similar education and marital status tend to have comparable yearly incomes. We impute missing income values using the group mean based on these two variables. It's also necessary to ensure that `Education` and `Marital_Status` categories are cleaned before imputation.


    # Checking Education and Marital_Status unique values for cleaning
    print(data['Education'].unique(), data['Marital_Status'].unique())
    
    # Output should look like this:
    # array(['Graduation', 'PhD', '2n Cycle', 'Master', 'Basic'], dtype=object),
    # array(['Divorced', 'Single', 'Married', 'Together', 'Widow', 'YOLO',
    #        'Alone', 'Absurd'], dtype=object)
    
    # Imputing missing Income values based on Education and Marital_Status group mean
    data['Income'] = data.groupby(['Education', 'Marital_Status'])['Income'].transform(lambda x: x.fillna(x.mean()))
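
A minimal sketch of what the group-mean imputation does, run on a small hypothetical frame (the incomes are invented for illustration, not taken from the dataset): each missing Income is replaced by the mean Income of customers who share its Education and Marital_Status.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two Education/Marital_Status groups, each with one missing Income
toy = pd.DataFrame({
    "Education": ["PhD", "PhD", "PhD", "Master", "Master"],
    "Marital_Status": ["Married", "Married", "Married", "Single", "Single"],
    "Income": [60000.0, 80000.0, np.nan, 40000.0, np.nan],
})

# transform() returns a Series aligned with the original rows,
# so each NaN is filled with the mean of its own (Education, Marital_Status) group
toy["Income"] = (
    toy.groupby(["Education", "Marital_Status"])["Income"]
       .transform(lambda s: s.fillna(s.mean()))
)

print(toy["Income"].tolist())  # [60000.0, 80000.0, 70000.0, 40000.0, 40000.0]
```

A group whose incomes are all missing has no mean, so its rows would stay NaN; such rows would need a fallback, for example the overall median.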
    

Feature Engineering

This step derives new features that capture key behavioral and demographic patterns:

  • total_children: sum of children at home (Kidhome + Teenhome)
  • Age: derived from the customer's year of birth
  • total_spending: sum of all product category expenditures
  • total_purchases: total purchases across all purchase channels


  # Create total number of children
  data['total_children'] = data['Kidhome'] + data['Teenhome']

  # Derive customer's age
  data['Age'] = 2025 - data['Year_Birth']

  # Calculate total spending across product categories
  spending_cols = ['MntWines', 'MntFruits', 'MntMeatProducts', 
                  'MntFishProducts', 'MntSweetProducts', 'MntGoldProds']
  data['total_spending'] = data[spending_cols].sum(axis=1)

  # Calculate total purchases across channels
  purchase_cols = ['NumWebPurchases', 'NumCatalogPurchases', 'NumStorePurchases']
  data['total_purchases'] = data[purchase_cols].sum(axis=1)
  

Exploratory Data Analysis (Income) & Outlier Treatment

To better understand the distribution of income values and identify potential anomalies, we use boxplots and histograms. Outliers are detected using the interquartile range (IQR) method: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are treated as outliers and capped at those boundaries, which reduces skewness and the influence of extreme values.


  # Calculate IQR bounds
  Q1 = data['Income'].quantile(0.25)
  Q3 = data['Income'].quantile(0.75)
  IQR = Q3 - Q1
  lower = Q1 - 1.5 * IQR
  upper = Q3 + 1.5 * IQR

  # Save original income values
  data['Income_Original'] = data['Income']

  # Apply outlier capping
  data['Income'] = np.where(data['Income'] > upper, upper,
                  np.where(data['Income'] < lower, lower, data['Income']))
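
The nested `np.where` calls above can also be written with pandas' `Series.clip`, which bounds values to the interval [lower, upper]. A minimal sketch on a hypothetical income series (values invented for illustration), with the fences at Q1 - 1.5 × IQR and Q3 + 1.5 × IQR:

```python
import pandas as pd

# Hypothetical incomes with one extreme value at the end
income = pd.Series([20000.0, 30000.0, 40000.0, 50000.0, 60000.0, 666666.0])

# IQR fences (pandas uses linear interpolation for quantiles by default)
q1, q3 = income.quantile(0.25), income.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Cap values outside the fences at the fence values (same effect as the nested np.where)
capped = income.clip(lower=lower, upper=upper)
print(lower, upper)
print(capped.tolist())
```

Both versions leave NaN values unchanged.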
  

Outlier Treatment: Age

Similar to income, age can also contain extreme values that may distort the analysis. We apply outlier treatment using the IQR method to cap ages outside a reasonable range. This ensures more stable statistical modeling and visualization.


  # Save original Age values
  data['Age_Original'] = data['Age']

  # Calculate IQR for Age
  Q1 = data['Age'].quantile(0.25)
  Q3 = data['Age'].quantile(0.75)
  IQR = Q3 - Q1
  lower = Q1 - 1.5 * IQR
  upper = Q3 + 1.5 * IQR

  # Apply outlier capping
  data['Age'] = np.where(data['Age'] > upper, upper,
              np.where(data['Age'] < lower, lower, data['Age']))
  

To better illustrate the impact of IQR-based outlier treatment on the Age variable, we display separate histograms before and after capping. Extremely high values (e.g., ages above 100) no longer appear in the post-treatment view because they are capped at the upper IQR bound, not removed.

Feature Encoding & Correlation

We encode the categorical features to prepare the data for modeling. First, we encode Education on an ordinal scale. Then we group the rare Marital_Status values (Absurd, YOLO, Alone) under "Other" and apply one-hot encoding to convert this categorical field into binary indicator columns. Finally, we generate a correlation matrix to understand how features relate to each other.


  # Ordinal encoding for Education
  edu_order = {'Basic': 0, '2n Cycle': 1, 'Graduation': 2, 'Master': 3, 'PhD': 4}
  data['Education_encoded'] = data['Education'].map(edu_order)

  # Group rare marital status categories
  data['Marital_Status_Clean'] = data['Marital_Status'].replace({
      'Absurd': 'Other',
      'YOLO': 'Other',
      'Alone': 'Other'
  })

  # One-hot encoding
  data = pd.get_dummies(data, columns=['Marital_Status_Clean'], prefix='Marital')
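
To see the two encodings side by side, here is a minimal sketch on three hypothetical rows, using the same `edu_order` mapping and rare-label grouping as above:

```python
import pandas as pd

# Hypothetical rows covering a few Education levels and one rare marital status
toy = pd.DataFrame({
    "Education": ["Basic", "Graduation", "PhD"],
    "Marital_Status": ["Married", "YOLO", "Single"],
})

# Ordinal encoding: a higher number means a higher education level
edu_order = {"Basic": 0, "2n Cycle": 1, "Graduation": 2, "Master": 3, "PhD": 4}
toy["Education_encoded"] = toy["Education"].map(edu_order)

# Collapse rare labels into "Other", then one-hot encode:
# get_dummies drops the source column and adds one indicator column per category
toy["Marital_Status_Clean"] = toy["Marital_Status"].replace(
    {"Absurd": "Other", "YOLO": "Other", "Alone": "Other"}
)
toy = pd.get_dummies(toy, columns=["Marital_Status_Clean"], prefix="Marital")
print(toy)
```

`map` returns NaN for any label missing from the dictionary, so checking `data['Education_encoded'].isna().sum()` after encoding is a cheap guard against unexpected categories.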
  

Focused Correlation Heatmap

Rather than examining all 30+ variables at once, we focus on a curated set of key features such as age, income, education level, and purchase behavior. This cleaner correlation matrix makes it easier to identify important relationships, such as how income relates to total spending, or how age correlates with recency or number of children.


  # Select features most relevant to behavior and segmentation
  selected_features = [
      "Age", "Income", "Education_encoded", "total_spending",
      "total_purchases", "total_children", "Recency", "Complain"
  ]

  # Compute correlation matrix for just these variables
  corr = data[selected_features].corr()
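
The heatmap itself can be drawn from a correlation matrix with seaborn's `heatmap`. This sketch uses a random hypothetical frame so that it runs on its own; in the analysis, `data[selected_features]` would take its place. The colormap, annotation format, and figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
# Hypothetical stand-in for data[selected_features]
frame = pd.DataFrame(rng.normal(size=(200, 4)),
                     columns=["Age", "Income", "total_spending", "Recency"])
frame["total_spending"] += 2 * frame["Income"]  # induce one strong positive correlation

corr = frame.corr()

fig, ax = plt.subplots(figsize=(6, 5))
# vmin/vmax pin the color scale to the full [-1, 1] range of a correlation
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, square=True, ax=ax)
ax.set_title("Correlation matrix (selected features)")
fig.tight_layout()
```

Pinning `vmin=-1` and `vmax=1` keeps colors comparable across heatmaps, since the scale no longer stretches to each matrix's own minimum and maximum.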
  

Hypothesis A: Older Individuals Prefer In-Store Shopping

We test the idea that older individuals may have lower technological proficiency and therefore prefer to shop in-store. We analyze the correlation between age and the number of store purchases, and compare purchase behavior across age groups.


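  # Bin ages into groups for comparing store purchases across age bands
  data['AgeGroup'] = pd.cut(data['Age'], bins=[18, 30, 45, 60, 75, 100],
                            labels=['18–30', '31–45', '46–60', '61–75', '76+'])

  # Pearson correlation between Age and NumStorePurchases as a single number
  correlation = data['Age'].corr(data['NumStorePurchases'])
  print(f"Correlation between Age and Store Purchases: {correlation:.4f}")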
  

Correlation between Age and Store Purchases: 0.1344

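ℹ️ The correlation is weak and positive: older customers make slightly more store purchases, but age alone explains little of the variation, so the evidence for this hypothesis is weak.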
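
The section also proposes comparing purchase behavior across age groups, which a single correlation does not show. A minimal sketch of that comparison on hypothetical customers, reusing the same bins and labels as `AgeGroup`:

```python
import pandas as pd

# Hypothetical customers; in the analysis these columns come from `data`
toy = pd.DataFrame({
    "Age": [25, 28, 40, 44, 55, 58, 70, 72],
    "NumStorePurchases": [3, 4, 5, 5, 6, 7, 8, 6],
})

# Same bins and labels as AgeGroup above
toy["AgeGroup"] = pd.cut(toy["Age"], bins=[18, 30, 45, 60, 75, 100],
                         labels=["18–30", "31–45", "46–60", "61–75", "76+"])

# Mean store purchases per age group; observed=False keeps empty groups (their mean is NaN)
by_group = toy.groupby("AgeGroup", observed=False)["NumStorePurchases"].mean()
print(by_group)
```

`pd.cut` intervals are closed on the right by default, so the first bin is (18, 30] and the last is (75, 100].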

Hypothesis B: Parents prefer web purchases

Individuals with children may face time constraints and be more price-sensitive. To explore whether they prefer online shopping, we grouped customers by number of children and compared their web purchase activity. This is visualized using a boxplot and scatterplot, and evaluated statistically using correlation.


  # total_children was created in Feature Engineering; recomputing it keeps this block self-contained
  data['total_children'] = data['Kidhome'] + data['Teenhome']
  
  # Keep only the two fields being compared
  df_b = data[['total_children', 'NumWebPurchases']].dropna()
  
  # Pearson correlation between number of children and web purchases
  corr_children_web = df_b.corr().iloc[0, 1]
  print(f"Correlation: {corr_children_web:.4f}")
  
📈 Correlation: -0.1464
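The correlation between number of children and number of web purchases is weak and negative: customers with more children make slightly fewer web purchases, not more, so the data do not support the idea that parents prefer online shopping. The boxplots still show some variation in behavior among parents.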
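
The boxplot compares the distribution of NumWebPurchases for each value of total_children. A numeric counterpart is a per-group summary; a minimal sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical customers; in the analysis use data[['total_children', 'NumWebPurchases']]
toy = pd.DataFrame({
    "total_children": [0, 0, 0, 1, 1, 1, 2, 2, 3],
    "NumWebPurchases": [6, 7, 5, 4, 5, 3, 3, 2, 2],
})

# Count, median, and mean web purchases for each number of children at home
summary = toy.groupby("total_children")["NumWebPurchases"].agg(["count", "median", "mean"])
print(summary)
```

Reporting the count next to each statistic shows when a group is too small to compare reliably.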

Hypothesis C: Higher education leads to higher spending

Customers with higher education levels may have more purchasing power and therefore spend more on average. This analysis groups customers by education level and compares their total spending through a boxplot and bar chart.


  # total_spending was created in Feature Engineering; recomputed here so the block runs on its own
  spending_cols = ['MntWines', 'MntFruits', 'MntMeatProducts',
                   'MntFishProducts', 'MntSweetProducts', 'MntGoldProds']
  data['total_spending'] = data[spending_cols].sum(axis=1)

  # Mean total spending per education level
  edu_spending = data.groupby('Education')['total_spending'].mean()
  print(edu_spending)

Hypothesis D: Do U.S. customers make more purchases than the rest of the world?

This hypothesis explores whether customers from the United States significantly outperform international customers in terms of total purchases. A boxplot compares purchase volumes, and a t-test checks whether the difference is statistically significant.


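  from scipy.stats import ttest_ind

  # The dataset codes the United States as 'US' (see the sample table), not 'USA'
  us = data[data['Country'] == 'US']['total_purchases']
  non_us = data[data['Country'] != 'US']['total_purchases']

  # Welch's t-test (equal_var=False) does not assume equal variances in the two groups
  t_stat, p_value = ttest_ind(us, non_us, equal_var=False)
  print(f"US Average: {us.mean():.2f}")
  print(f"Non-US Average: {non_us.mean():.2f}")
  print(f"T-statistic: {t_stat:.4f}")
  print(f"P-value: {p_value:.4f}")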
US Average: nan
Non-US Average: 12.54
T-statistic: nan
P-value: nan
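These NaN values show that the US group was empty when this output was produced. The sample table codes the United States as `US`, so a filter on `'USA'` matches no rows; with an empty group, the mean, the t-statistic, and the p-value are all undefined. No conclusion about statistical significance can be drawn from this output, and the test needs to be rerun with the `US` code.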
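
A minimal sketch of the comparison with a guard that fails loudly when either group is too small. It assumes, as in the sample table, that the United States is coded `US` in the Country column; the purchase counts are hypothetical. `equal_var=False` makes `ttest_ind` run Welch's t-test, which does not assume equal variances in the two groups.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical rows; Country codes follow the sample table (US, SP, CA, GER, ...)
toy = pd.DataFrame({
    "Country": ["US", "US", "US", "SP", "SP", "CA", "CA", "GER"],
    "total_purchases": [14, 12, 16, 10, 13, 11, 12, 15],
})

is_us = toy["Country"] == "US"
us = toy.loc[is_us, "total_purchases"]
non_us = toy.loc[~is_us, "total_purchases"]

# An empty or single-row group leaves the t-test undefined (NaN), so stop with a clear message
if len(us) < 2 or len(non_us) < 2:
    raise ValueError(f"Too few rows: US={len(us)}, non-US={len(non_us)}; check the Country codes")

t_stat, p_value = ttest_ind(us, non_us, equal_var=False)  # Welch's t-test
print(f"US mean: {us.mean():.2f}, non-US mean: {non_us.mean():.2f}")
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

Checking group sizes before testing turns a silent NaN result into an explicit error that points at the filter.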

Business Insights Visualizations

This chart ranks products based on total customer spending. Wines lead revenue, indicating strong customer preference.
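
The ranking behind this chart amounts to summing each Mnt* column and sorting the totals. A minimal sketch, using the first three rows of the sample table as input (the full dataset gives different totals):

```python
import pandas as pd

spending_cols = ["MntWines", "MntFruits", "MntMeatProducts",
                 "MntFishProducts", "MntSweetProducts", "MntGoldProds"]

# First three rows of the sample table (IDs 1826, 1, 10476)
sample = pd.DataFrame([[189, 104, 379, 111, 189, 218],
                       [464, 5, 64, 7, 0, 37],
                       [134, 11, 59, 15, 2, 30]], columns=spending_cols)

# Total spending per product category, highest first
totals = sample.sum().sort_values(ascending=False)
print(totals)
```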

Does age influence how likely a person is to accept a marketing campaign? This scatterplot helps us visualize any patterns.
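
One way to quantify this, under the assumption that "accepting a campaign" means accepting any of the six offers (AcceptedCmp1 to AcceptedCmp5 and Response), is to compute the share of customers in each age band who accepted at least one. A minimal sketch on hypothetical customers; the three age bands are an illustrative choice:

```python
import pandas as pd

campaign_cols = ["AcceptedCmp1", "AcceptedCmp2", "AcceptedCmp3",
                 "AcceptedCmp4", "AcceptedCmp5", "Response"]

# Hypothetical customers: Age followed by the six 0/1 acceptance flags
rows = [
    (30, 0, 0, 1, 0, 0, 1),
    (35, 1, 0, 0, 0, 0, 1),
    (50, 0, 0, 0, 1, 0, 0),
    (55, 0, 0, 0, 0, 0, 0),
    (70, 0, 0, 0, 0, 0, 0),
    (75, 0, 0, 0, 0, 0, 1),
]
toy = pd.DataFrame(rows, columns=["Age"] + campaign_cols)

# Number of offers accepted (0 to 6) and a 0/1 flag for accepting at least one
toy["total_accepted"] = toy[campaign_cols].sum(axis=1)
toy["accepted_any"] = (toy["total_accepted"] > 0).astype(int)

# Share of customers in each age band who accepted at least one offer
toy["AgeBand"] = pd.cut(toy["Age"], bins=[18, 45, 60, 100])
acceptance_rate = toy.groupby("AgeBand", observed=False)["accepted_any"].mean()
print(acceptance_rate)
```

Because acceptance counts are small integers, a scatterplot of them against age tends to form horizontal stripes; a per-band rate is a complementary view.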

Knowing where campaigns are most accepted helps optimize international marketing strategy. This bar chart shows campaign acceptance volume by country.
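
A minimal sketch of the per-country aggregation, assuming acceptance volume means the total number of accepted offers across AcceptedCmp1 to AcceptedCmp5 and Response (the rows are hypothetical). Because raw volume grows with the number of customers in a country, a per-customer rate is computed alongside it:

```python
import pandas as pd

campaign_cols = ["AcceptedCmp1", "AcceptedCmp2", "AcceptedCmp3",
                 "AcceptedCmp4", "AcceptedCmp5", "Response"]

# Hypothetical customers: Country code followed by the six 0/1 acceptance flags
rows = [
    ("SP", 1, 0, 0, 0, 0, 1),
    ("SP", 0, 0, 1, 0, 0, 0),
    ("SP", 0, 0, 0, 1, 1, 1),
    ("CA", 0, 0, 0, 0, 0, 1),
    ("US", 0, 0, 0, 0, 0, 0),
]
toy = pd.DataFrame(rows, columns=["Country"] + campaign_cols)
toy["accepted"] = toy[campaign_cols].sum(axis=1)

# Acceptance volume and customer count per country, largest volume first
by_country = (toy.groupby("Country")["accepted"]
                 .agg(volume="sum", customers="count")
                 .sort_values("volume", ascending=False))
by_country["per_customer"] = by_country["volume"] / by_country["customers"]
print(by_country)
```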

Analyzing average spending by family size reveals how children impact household spending behavior.
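
Average spending by family size is a group mean of total_spending over total_children. A minimal sketch on hypothetical households:

```python
import pandas as pd

# Hypothetical households; in the analysis total_spending comes from Feature Engineering
toy = pd.DataFrame({
    "Kidhome":        [0, 0, 1, 1, 0, 2],
    "Teenhome":       [0, 0, 0, 1, 1, 1],
    "total_spending": [1200, 900, 300, 100, 500, 50],
})
toy["total_children"] = toy["Kidhome"] + toy["Teenhome"]

# Mean total spending for each number of children at home
avg_spending = toy.groupby("total_children")["total_spending"].mean()
print(avg_spending)
```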

Which education groups are most likely to lodge complaints? This visualization breaks down complaint rates by education level.
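
Because Complain is a 0/1 flag, the complaint rate of a group is simply its mean. A minimal sketch on hypothetical customers, with the group size shown next to each rate:

```python
import pandas as pd

# Hypothetical customers with a 0/1 Complain flag
toy = pd.DataFrame({
    "Education": ["Graduation", "Graduation", "PhD", "PhD",
                  "2n Cycle", "2n Cycle", "Master", "Basic"],
    "Complain":  [1, 0, 0, 0, 1, 0, 0, 0],
})

# Complaint rate = mean of the 0/1 flag; customers = number of rows in the group
complaints = (toy.groupby("Education")["Complain"]
                 .agg(rate="mean", customers="count")
                 .sort_values("rate", ascending=False))
print(complaints)
```

When complaints are rare, a rate based on a handful of customers can swing widely; the customers column makes that visible.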

Marketing Campaign Analysis Report

Executive Summary

This project uses interactive visual analytics to analyze customer behavior through the Four Ps of Marketing: Product, Price, Place, and Promotion. A combination of data cleaning, feature engineering, exploratory data analysis, and hypothesis testing was used to deliver data-driven marketing insights.

Problem Objective

Explore a structured marketing dataset and uncover how demographics, customer behavior, and campaign responses influence marketing performance. The ultimate goal is to help businesses personalize campaigns, target the right audience, and improve ROI.

Methodology

  • Data cleaning and missing value imputation for income
  • Feature engineering: age, total children, total purchases, total spending
  • Outlier detection and treatment with the IQR method, visualized with boxplots and histograms
  • Encoding categorical variables for modeling and analysis
  • Interactive correlation heatmaps and hypothesis testing (A–D)
  • Business insight visualizations embedded in a Flask web application

Key Insights

  • Product: Wines generate the most revenue; fruits and sweets rank lowest.
  • Price: Higher income strongly correlates with increased total spending.
  • Promotion: Younger users and US customers respond more, but not always with higher purchases.
  • Complaints: "2n Cycle" and Graduation education groups show higher complaint rates.
  • Geography: Spain leads in campaign acceptance volumes.

Hypothesis Testing Summary

  1. H1: Older customers prefer in-store shopping. Weakly supported: the correlation between age and store purchases is only 0.13.
  2. H2: More children → fewer online purchases. Weakly supported: the correlation between number of children and web purchases is -0.15.
  3. H3: Age impacts campaign response. Supported: acceptance drops with age.
  4. H4: US significantly outperforms others in responses. Rejected: no significant difference.

Business Recommendations

  • Double down on high-margin products like wine and gold items
  • Segment by age and income for personalized marketing strategies
  • Track and address complaints in education segments with higher dissatisfaction
  • Optimize multi-channel strategies: online and in-store coexist
  • Continue A/B testing on regional campaigns for effectiveness

Conclusion

Through interactive visualizations and hypothesis-driven exploration, this analysis empowers strategic marketing decisions. The combination of behavioral patterns and campaign responsiveness builds a solid foundation for customer-centric planning.