Marketing Campaigns Analysis with Python

Python for Data Science

Python is a top choice for data science due to its simplicity, readability, and vast ecosystem of powerful libraries. Tools like Pandas, NumPy, and scikit-learn make data manipulation, analysis, and machine learning accessible and efficient. Combined with visualization libraries like Matplotlib and Seaborn, and integration with big data platforms and deep learning frameworks, Python offers a flexible, end-to-end solution for data science tasks. Its large community also ensures strong support and continual development.

See the marketing mix report below and explore the different visualizations to understand how Python brings data-driven insights to life.

Problem scenario

The marketing mix is a widely used concept in executing marketing strategies. It covers the facets of a comprehensive marketing plan, centered on the four Ps of marketing: product, price, place, and promotion.

Problem objective

As a data scientist, you must conduct exploratory data analysis and hypothesis testing to enhance your comprehension of the diverse factors influencing customer acquisition.

Data description

The dataset aligns with the Four Ps of Marketing, categorizing variables to analyze consumer behavior. Product-related variables track spending across categories, while Price factors like income and deal-based purchases indicate affordability. Place covers shopping channels and web visits, reflecting purchase preferences. Promotion measures campaign engagement, complaints, and recency. Additionally, demographics support segmentation for personalized marketing. This structured approach helps businesses optimize products, pricing, distribution, and promotions for better customer engagement and market performance.

Explore the Data

Take a closer look at the dataset below to uncover patterns and insights firsthand. Use the interactive table to sort, search, and filter the data—it's a great way to understand how each variable plays a role in shaping marketing strategies and customer behavior. See Column Guide below.

Please note that the interactive table below shows only a sample of the data. To see the full dataset, download the sample CSV file.

ID	Year_Birth	Education	Marital_Status	Income	Kidhome	Teenhome	Dt_Customer	Recency	MntWines	MntFruits	MntMeatProducts	MntFishProducts	MntSweetProducts	MntGoldProds	NumDealsPurchases	NumWebPurchases	NumCatalogPurchases	NumStorePurchases	NumWebVisitsMonth	AcceptedCmp3	AcceptedCmp4	AcceptedCmp5	AcceptedCmp1	AcceptedCmp2	Response	Country
1826	1970	Graduation	Divorced	$84,835.00	0	0	6/16/14	0	189	104	379	111	189	218	1	4	4	6	1	0	0	0	0	0	1	SP
1	1961	Graduation	Single	$57,091.00	0	0	6/15/14	0	464	5	64	7	0	37	1	7	3	7	5	0	0	0	0	1	1	CA
10476	1958	Graduation	Married	$67,267.00	0	1	5/13/14	0	134	11	59	15	2	30	1	3	2	5	2	0	0	0	0	0	0	US
1386	1967	Graduation	Together	$32,474.00	1	1	5/11/14	0	10	0	1	0	0	0	1	1	0	2	7	0	0	0	0	0	0	AUS
5371	1989	Graduation	Single	$21,474.00	1	0	4/8/14	0	6	16	24	11	0	34	2	3	1	2	7	1	0	0	0	0	1	SP
7348	1958	PhD	Single	$71,691.00	0	0	3/17/14	0	336	130	411	240	32	43	1	4	7	5	2	0	0	0	0	0	1	SP
4073	1954	2n Cycle	Married	$63,564.00	0	0	1/29/14	0	769	80	252	15	34	65	1	10	10	7	6	1	0	0	0	0	1	GER
1991	1967	Graduation	Together	$44,931.00	0	1	1/18/14	0	78	0	11	0	0	7	1	2	1	3	5	0	0	0	0	0	0	SP
4047	1954	PhD	Married	$65,324.00	0	1	1/11/14	0	384	0	102	21	32	5	3	6	2	9	4	0	0	0	0	0	0	US
9477	1954	PhD	Married	$65,324.00	0	1	1/11/14	0	384	0	102	21	32	5	3	6	2	9	4	0	0	0	0	0	0	IND
2079	1947	2n Cycle	Married	$81,044.00	0	0	12/27/13	0	450	26	535	73	98	26	1	5	6	10	1	0	0	0	0	0	0	US
5642	1979	Master	Together	$62,499.00	1	0	12/9/13	0	140	4	61	0	13	4	2	3	1	6	4	0	0	0	0	0	0	SP
10530	1959	PhD	Widow	$67,786.00	0	0	12/7/13	0	431	82	441	80	20	102	1	3	6	6	1	0	0	0	0	0	1	IND
2964	1981	Graduation	Married	$26,872.00	0	0	10/16/13	0	3	10	8	3	16	32	1	1	1	2	6	0	0	0	0	0	0	CA
10311	1969	Graduation	Married	$4,428.00	0	1	10/5/13	0	16	4	12	2	4	321	0	25	0	0	1	0	0	0	0	0	0	SP
837	1977	Graduation	Married	$54,809.00	1	1	9/11/13	0	63	6	57	13	13	22	4	2	1	5	4	0	0	0	0	0	0	SP
10521	1977	Graduation	Married	$54,809.00	1	1	9/11/13	0	63	6	57	13	13	22	4	2	1	5	4	0	0	0	0	0	1	SP
10175	1958	PhD	Divorced	$32,173.00	0	1	8/1/13	0	18	0	2	0	0	2	1	1	0	3	4	0	0	0	0	0	0	SP
1473	1960	2n Cycle	Single	$47,823.00	0	1	7/23/13	0	53	1	5	2	1	10	2	2	0	3	8	0	0	0	0	0	0	CA
2795	1958	Master	Single	$30,523.00	2	1	7/1/13	0	5	0	3	0	0	5	1	1	0	2	7	0	0	0	0	0	0	CA
2285	1954	Master	Together	$36,634.00	0	1	5/28/13	0	213	9	76	4	3	30	3	5	2	5	7	0	0	0	0	0	0	SA
115	1966	Master	Single	$43,456.00	0	1	3/26/13	0	275	11	68	25	7	7	3	5	1	8	5	0	0	0	0	0	0	IND
10470	1979	Master	Married	$40,662.00	1	0	3/15/13	0	40	2	23	0	4	23	2	2	1	3	4	0	0	0	0	0	0	GER
4065	1976	PhD	Married	$49,544.00	1	0	2/12/13	0	308	0	73	0	0	23	2	5	1	8	7	0	0	0	0	0	0	SP
10968	1969	Graduation	Single	$57,731.00	0	1	11/23/12	0	266	21	300	65	8	44	4	8	8	6	6	0	0	0	0	0	0	IND
5985	1965	Master	Single	$33,168.00	0	1	10/13/12	0	80	1	37	0	1	3	3	2	1	4	7	0	0	0	0	0	0	SP
5430	1956	Graduation	Together	$54,450.00	1	1	9/14/12	0	454	0	171	8	19	32	12	9	2	8	8	0	0	0	0	0	0	SP
8432	1956	Graduation	Together	$54,450.00	1	1	9/14/12	0	454	0	171	8	19	32	12	9	2	8	8	0	0	0	0	0	0	SP
453	1956	PhD	Widow	$35,340.00	1	1	6/29/14	1	27	0	12	0	1	5	2	2	0	3	5	0	0	0	0	0	0	SP
9687	1975	Graduation	Single	$73,170.00	0	0	5/31/14	1	184	174	256	50	30	32	1	5	4	6	2	0	0	0	0	0	0	CA
8890	1971	PhD	Divorced	$65,808.00	1	1	5/30/14	1	155	7	80	13	7	10	3	5	1	5	6	0	0	0	0	0	0	SP
9264	1986	Graduation	Married	$79,529.00	0	0	4/27/14	1	423	42	706	73	197	197	1	4	8	9	2	0	0	0	0	0	0	CA
5824	1972	PhD	Together	$34,578.00	2	1	4/11/14	1	7	0	1	0	0	0	1	1	0	2	6	0	0	0	0	0	0	AUS
5794	1974	PhD	Married	$46,374.00	0	1	3/17/14	1	408	0	21	0	0	17	3	7	1	7	8	0	1	0	1	0	1	IND
3068	1990	Graduation	Married	$18,351.00	0	0	10/29/13	1	1	12	9	0	14	7	1	2	0	3	7	0	0	0	0	0	0	SP
7962	1987	PhD	Single	$95,169.00	0	0	10/9/13	1	1285	21	449	106	20	20	1	4	3	4	1	0	0	1	1	0	1	SP
2681	1984	2n Cycle	Married	$65,370.00	0	0	8/1/13	1	71	22	112	138	89	29	1	2	3	13	1	0	0	0	0	0	0	SP
10141	1960	Master	Divorced	$39,228.00	0	0	5/10/13	1	7	1	6	0	3	3	1	0	0	3	4	0	0	0	0	0	0	SA
3725	1961	PhD	Single	$84,865.00	0	0	5/9/13	1	1248	16	349	43	16	16	1	2	4	9	4	0	1	1	1	1	1	SP
3767	1968	Graduation	Married	$61,314.00	0	1	4/25/13	1	378	0	189	97	172	172	2	5	5	12	3	0	0	0	0	0	0	SP
5585	1972	Graduation	Single	$21,359.00	1	0	4/20/13	1	12	2	17	6	1	10	2	2	0	3	8	0	0	0	0	0	1	CA
7030	1955	PhD	Married	$66,465.00	0	1	3/30/13	1	1200	0	204	38	29	14	3	11	9	12	6	0	0	0	1	0	0	SP
1524	1983	2n Cycle	Single	$81,698.00	0	0	3/1/13	1	709	45	115	30	160	45	1	8	2	5	5	0	0	0	1	0	1	SP
3657	1986	Graduation	Single	$39,146.00	1	0	2/14/13	1	94	1	33	13	12	12	3	4	0	4	8	0	0	0	0	0	0	SP
5740	1970	2n Cycle	Divorced	$25,959.00	1	1	2/14/13	1	4	2	12	7	5	26	2	1	2	2	6	0	0	0	0	0	1	SP
9595	1961	Graduation	Together	$64,260.00	0	0	1/11/13	1	539	169	816	20	0	30	1	4	5	4	3	0	0	0	0	0	1	SP
3158	1973	Graduation	Married	$32,300.00	1	0	1/3/13	1	13	3	6	6	5	6	1	1	0	3	8	0	0	0	0	0	0	SP
5114	1965	Master	Married	$74,806.00	0	1	12/19/12	1	670	9	249	0	28	9	2	5	4	5	4	0	0	0	0	0	0	AUS
340	1970	Graduation	Divorced	$72,967.00	0	1	12/15/12	1	158	35	179	0	0	125	2	7	2	8	5	1	0	0	0	0	1	GER
8805	1960	Graduation	Single	$48,904.00	0	1	12/2/12	1	283	10	38	0	13	27	4	7	2	4	8	0	0	0	0	0	0	US
1241	1984	2n Cycle	Married	$14,796.00	1	0	9/17/12	1	13	3	8	7	4	16	2	1	0	3	9	0	0	0	0	0	1	GER
1402	1954	Master	Married	$66,991.00	0	0	9/11/12	1	496	36	460	189	60	12	3	4	8	6	3	0	0	0	0	0	0	GER
7264	1978	2n Cycle	Single	$52,195.00	2	1	5/12/14	2	12	0	4	0	0	1	1	1	0	2	8	0	0	0	0	0	0	SA
1619	1956	Graduation	Married	$90,369.00	0	0	4/28/14	2	292	51	981	224	23	17	1	4	6	6	1	0	0	0	0	0	1	SP
6398	1974	Basic	Married	$18,393.00	1	0	3/29/14	2	7	10	13	16	0	4	2	3	0	3	8	0	0	0	0	0	0	SP
1857	1952	Graduation	Single	$47,139.00	1	1	3/6/14	2	46	0	12	0	2	23	2	2	1	2	7	0	0	0	0	0	1	SP
4877	1973	Master	Married	$38,576.00	0	1	3/4/14	2	34	0	7	0	0	0	1	1	0	3	7	0	0	0	0	0	0	IND
3066	1975	PhD	Together	$61,905.00	0	1	2/4/14	2	167	0	43	6	2	13	2	4	2	4	5	0	0	0	0	0	0	SA
10286	1962	Graduation	Married	$83,715.00	0	0	2/3/14	2	318	8	407	150	35	8	1	2	8	13	0	0	0	0	0	0	0	SA
1992	1964	Graduation	Married	$60,597.00	0	1	1/1/14	2	522	0	257	32	16	66	4	2	2	8	7	0	0	0	1	0	1	SP
4246	1982	Master	Single	$6,560.00	0	0	12/12/13	2	67	11	26	4	3	262	0	1	0	1	17	0	0	0	0	0	0	SP
10623	1961	Master	Together	$48,330.00	0	1	11/15/13	2	28	0	4	0	0	0	1	1	0	3	5	0	0	0	0	0	0	SP
4867	1968	PhD	Single	$38,236.00	1	1	9/20/13	2	58	0	18	2	0	10	4	3	0	4	7	0	0	0	0	0	0	SP
3112	1977	Master	Married	$22,701.00	1	0	9/5/13	2	2	4	9	0	4	5	1	1	0	3	5	0	0	0	0	0	0	SP
4865	1974	Master	Divorced	$53,367.00	1	1	8/31/13	2	229	7	140	10	3	11	7	5	1	8	7	0	0	0	0	0	1	AUS
6287	1986	Graduation	Together	$34,728.00	1	0	7/30/13	2	14	0	16	0	0	6	1	1	1	2	6	0	0	0	0	0	1	SP
4405	1956	Master	Married	$63,915.00	0	2	7/30/13	2	622	7	115	30	0	15	2	6	3	12	5	0	0	0	0	0	0	AUS
5332	1960	2n Cycle	Married	$82,504.00	0	0	7/27/13	2	362	50	431	134	35	54	1	3	6	7	1	0	0	0	0	0	0	IND
1519	1972	PhD	Single	$38,578.00	1	1	6/22/13	2	38	4	22	3	3	3	3	3	0	3	8	0	0	0	0	0	1	SP
9080	1972	PhD	Single	$38,578.00	1	1	6/22/13	2	38	4	22	3	3	3	3	3	0	3	8	0	0	0	0	0	0	SP
1772	1975	PhD	Married	$79,174.00	0	0	1/11/13	2	1074	37	518	193	92	129	1	5	6	7	2	0	0	1	1	0	1	CA
5341	1962	2n Cycle	Divorced	$81,975.00	0	1	1/5/13	2	983	76	184	180	138	27	1	6	3	4	7	0	0	1	0	0	0	SA
5510	1977	Master	Married	$43,263.00	0	1	11/21/12	2	262	6	61	0	10	102	3	5	2	6	5	0	0	0	0	0	0	SP
3887	1970	Graduation	Single	$27,242.00	1	0	11/11/12	2	3	17	26	20	1	39	2	2	0	3	9	0	0	0	0	0	1	IND
7022	1971	Graduation	Married	$76,445.00	1	0	9/28/12	2	739	107	309	140	80	35	1	2	5	13	6	0	0	0	0	0	0	SA
9999	1965	Graduation	Together	$75,276.00	0	0	9/27/12	2	610	105	125	137	42	21	1	9	4	9	5	0	0	0	0	0	0	SP
10352	1963	Graduation	Widow	$34,213.00	1	1	9/7/12	2	50	4	28	6	3	26	3	3	1	2	9	0	0	0	0	0	1	SA
7919	1976	PhD	Together	$72,335.00	0	0	8/13/12	2	1285	105	653	28	21	0	1	10	4	8	8	0	0	0	0	0	1	SP
4114	1964	Master	Married	$79,143.00	0	0	8/11/12	2	650	37	780	27	167	32	1	6	9	13	3	0	0	0	0	0	0	AUS
7990	1947	Graduation	Married	$27,469.00	0	0	8/2/12	2	9	1	2	3	2	0	1	0	0	3	6	0	0	0	0	0	0	CA
9888	1969	Graduation	Together	$68,695.00	0	0	6/25/14	3	458	81	356	106	50	40	1	4	4	7	2	0	0	0	0	0	0	SP
4399	1969	Graduation	Together	$68,695.00	0	0	6/25/14	3	458	81	356	106	50	40	1	4	4	7	2	0	0	0	0	0	0	CA
4452	1957	Graduation	Single	$50,388.00	0	1	5/28/14	3	292	6	37	0	3	34	4	6	1	6	7	0	1	0	1	0	1	GER
4785	1970	PhD	Together	$77,622.00	0	2	4/14/14	3	520	7	154	19	0	14	2	6	3	11	3	0	0	0	0	0	0	SA
8461	1962	Graduation	Divorced	$46,102.00	2	1	3/10/14	3	14	0	1	0	0	1	1	1	0	2	7	0	0	0	0	0	0	SP
3878	1980	2n Cycle	Single	$31,859.00	1	0	2/27/14	3	3	4	7	15	8	11	1	1	0	3	7	0	0	0	0	0	0	SP
9612	1987	2n Cycle	Single	$23,830.00	0	0	2/7/14	3	1	8	6	4	8	16	1	1	0	3	7	0	0	0	0	0	0	SP
4098	1973	Graduation	Married	$24,639.00	1	1	1/28/14	3	20	3	16	0	4	1	3	2	0	4	6	0	0	0	0	0	0	AUS
158	1945	PhD	Together	$71,604.00	0	0	11/17/13	3	345	53	528	98	75	97	1	8	3	5	4	1	0	0	0	0	1	SP
3896	1984	Graduation	Married	$27,255.00	1	0	11/7/13	3	22	1	11	0	1	2	1	1	0	3	7	0	0	0	0	0	0	SP
9970	1977	Graduation	Together	$55,375.00	0	1	10/17/13	3	42	11	57	10	28	14	1	1	1	6	2	0	0	0	0	0	0	CA
4002	1960	PhD	Married	$77,037.00	0	1	10/13/13	3	463	96	333	168	53	10	1	7	7	12	3	0	0	0	0	0	0	SP
10914	1970	Graduation	Single	$24,163.00	1	1	10/12/13	3	4	1	7	2	1	2	2	1	0	3	4	0	0	0	0	0	0	SP
7279	1969	PhD	Together	$69,476.00	0	0	9/30/13	3	260	86	559	63	9	67	1	4	6	4	2	0	0	0	0	0	0	US
10582	1979	Graduation	Married	$72,063.00	0	1	7/3/13	3	180	32	348	76	32	90	2	5	2	12	2	0	0	0	0	0	0	GER
4470	1962	Master	Married	$58,646.00	0	1	6/10/13	3	62	1	44	6	5	22	1	2	1	4	4	0	0	0	0	0	0	SP
6183	1962	Master	Married	$58,646.00	0	1	6/10/13	3	62	1	44	6	5	22	1	2	1	4	4	0	0	0	0	0	0	GER
6379	1949	Master	Widow	$47,570.00	1	1	5/29/13	3	67	1	20	0	2	31	3	2	2	2	7	0	0	0	0	0	1	US
8601	1980	Graduation	Married	$80,011.00	0	1	4/29/13	3	421	76	536	82	178	102	2	8	6	5	4	0	0	0	0	0	0	AUS
4827	1956	PhD	Single	$54,998.00	0	1	3/10/13	3	154	22	202	39	30	8	5	4	2	9	4	0	0	0	0	0	1	SP

Column Guide

Variable	Description
ID	Customer's unique identifier
Year_Birth	Customer's birth year
Education	Customer's education level
Marital_Status	Customer's marital status
Income	Customer's yearly household income
Kidhome	Number of small children in the customer's household
Teenhome	Number of teenagers in the customer's household
Dt_Customer	Date of the customer's enrollment with the company
Recency	Number of days since the last purchase
MntWines	Amount spent on wine in the last 2 years
MntFruits	Amount spent on fruits in the last 2 years
MntMeatProducts	Amount spent on meat products in the last 2 years
MntFishProducts	Amount spent on fish products in the last 2 years
MntSweetProducts	Amount spent on sweet products in the last 2 years
MntGoldProds	Amount spent on gold products in the last 2 years
NumDealsPurchases	Number of purchases made with a discount
NumWebPurchases	Number of purchases made through the company's website
NumCatalogPurchases	Number of purchases made using a catalog
NumStorePurchases	Number of purchases made directly in stores
NumWebVisitsMonth	Number of visits to the company's website in the last month
AcceptedCmp3	1 if the customer accepted the offer in the 3rd campaign, 0 otherwise
AcceptedCmp4	1 if the customer accepted the offer in the 4th campaign, 0 otherwise
AcceptedCmp5	1 if the customer accepted the offer in the 5th campaign, 0 otherwise
AcceptedCmp1	1 if the customer accepted the offer in the first campaign, 0 otherwise
AcceptedCmp2	1 if the customer accepted the offer in the 2nd campaign, 0 otherwise
Response	1 if the customer accepted the offer in the last campaign, 0 otherwise
Complain	1 if the customer complained in the last 2 years, 0 otherwise
Country	Customer's location

Data Import and Inspection

After importing the data, inspect variables such as Dt_Customer and Income to verify that they were imported with the correct types.


  # Python Data Science Libraries we will use
  import pandas as pd
  import numpy as np
  import plotly.graph_objs as go
  import matplotlib.pyplot as plt
  import seaborn as sns
  from scipy.stats import ttest_ind
  import plotly.express as px
  import json
  import plotly.utils


  data = pd.read_csv('marketing_data.csv')
  data.columns = data.columns.str.strip() 
  data.head()

  print(data['Income'].head(10))
  print(data['Income'].dtype)

  0    $84,835.00 
  1    $57,091.00 
  2    $67,267.00 
  3    $32,474.00 
  4    $21,474.00 
  5    $71,691.00 
  6    $63,564.00 
  7    $44,931.00 
  8    $65,324.00 
  9    $65,324.00 
  Name: Income, dtype: object
  object

  # Convert Income to string and clean it BEFORE any numeric conversion
  data['Income'] = data['Income'].astype(str).str.replace('$', '', regex=False).str.replace(',', '', regex=False)

  # Convert to numeric (values that cannot be parsed become NaN)
  data['Income'] = pd.to_numeric(data['Income'], errors='coerce')

  data['Dt_Customer'] = pd.to_datetime(data['Dt_Customer'], format='%m/%d/%y')

  print(data['Dt_Customer'].head())
  0   2014-06-16
  1   2014-06-15
  2   2014-05-13
  3   2014-05-11
  4   2014-04-08
  Name: Dt_Customer, dtype: datetime64[ns]

  data[['Dt_Customer', 'Income']].head()
  Dt_Customer	Income
  0	2014-06-16	84835.0
  1	2014-06-15	57091.0
  2	2014-05-13	67267.0
  3	2014-05-11	32474.0
  4	2014-04-08	21474.0
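
The import checks above can also be expressed as a small, repeatable test. The following is a minimal sketch on a toy frame whose rows are made up in the same format as the sample table; `clean_income` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd

def clean_income(series: pd.Series) -> pd.Series:
    """Strip '$' and ',' from currency strings and convert to float (NaN on failure)."""
    cleaned = (
        series.astype(str)
        .str.replace("$", "", regex=False)
        .str.replace(",", "", regex=False)
        .str.strip()
    )
    return pd.to_numeric(cleaned, errors="coerce")

# Toy rows (made up); the third income is deliberately missing
toy = pd.DataFrame({
    "Income": ["$84,835.00 ", "$57,091.00 ", None],
    "Dt_Customer": ["6/16/14", "6/15/14", "5/13/14"],
})
toy["Income"] = clean_income(toy["Income"])
toy["Dt_Customer"] = pd.to_datetime(toy["Dt_Customer"], format="%m/%d/%y")

# Inspect the resulting dtypes and count values that failed to parse
print(toy.dtypes)
print("Missing incomes:", toy["Income"].isna().sum())
```

Counting the NaN values after conversion shows how many incomes will need imputation in the next step.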

Missing Value Imputation

There are missing income values for some customers. To address this, we assume customers with similar education and marital status tend to have comparable yearly incomes. We impute missing income values using the group mean based on these two variables. It's also necessary to ensure that `Education` and `Marital_Status` categories are cleaned before imputation.


    # Checking Education and Marital_Status unique values for cleaning
    print(data['Education'].unique(), data['Marital_Status'].unique())
    
    # Output should look like this:
    # array(['Graduation', 'PhD', '2n Cycle', 'Master', 'Basic'], dtype=object),
    # array(['Divorced', 'Single', 'Married', 'Together', 'Widow', 'YOLO',
    #        'Alone', 'Absurd'], dtype=object)
    
    # Imputing missing Income values based on Education and Marital_Status group mean
    data['Income'] = data.groupby(['Education', 'Marital_Status'])['Income'].transform(lambda x: x.fillna(x.mean()))
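
One way to clean the categories before imputing is sketched below on made-up toy data. Two choices here are assumptions rather than part of the original analysis: the rare marital statuses are folded into a single 'Other' label, and any income still missing after the group-mean step (because its whole group has no observed income) falls back to the overall mean.

```python
import numpy as np
import pandas as pd

# Toy data (made up) with unusual marital statuses and missing incomes
toy = pd.DataFrame({
    "Education": ["PhD", "PhD", "PhD", "Master", "Master", "Basic"],
    "Marital_Status": ["Married", "Married", "Married", "YOLO", "Alone", "Single"],
    "Income": [60000.0, 70000.0, np.nan, 40000.0, np.nan, np.nan],
})

# Assumption: fold the rare labels into a single 'Other' category
rare_labels = {"YOLO": "Other", "Alone": "Other", "Absurd": "Other"}
toy["Marital_Status"] = toy["Marital_Status"].replace(rare_labels)

# Group-mean imputation; a group with no observed income yields NaN here
group_mean = toy.groupby(["Education", "Marital_Status"])["Income"].transform("mean")
toy["Income"] = toy["Income"].fillna(group_mean)

# Fallback (an assumption): use the overall mean for anything still missing
toy["Income"] = toy["Income"].fillna(toy["Income"].mean())
print(toy)
```

Without the relabeling step, the 'YOLO' and 'Alone' rows would sit in separate one-row groups, and a missing income in such a group could not be filled from its group mean.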

Feature Engineering

This step derives new features that capture key behavioral and demographic patterns:

total_children: sum of children at home (Kidhome + Teenhome)
Age: derived from the customer's year of birth
total_spending: sum of all product category expenditures
total_purchases: total purchases across all purchase channels


  # Create total number of children
  data['total_children'] = data['Kidhome'] + data['Teenhome']

  # Derive customer's age
  data['Age'] = 2025 - data['Year_Birth']

  # Calculate total spending across product categories
  spending_cols = ['MntWines', 'MntFruits', 'MntMeatProducts', 
                  'MntFishProducts', 'MntSweetProducts', 'MntGoldProds']
  data['total_spending'] = data[spending_cols].sum(axis=1)

  # Calculate total purchases across channels
  purchase_cols = ['NumWebPurchases', 'NumCatalogPurchases', 'NumStorePurchases']
  data['total_purchases'] = data[purchase_cols].sum(axis=1)

Exploratory Data Analysis (Income) & Outlier Treatment

To better understand the distribution of income values and identify potential anomalies, we use boxplots and histograms. Outliers are detected using the interquartile range (IQR) method. Income values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are capped to those limits, which reduces the influence of extreme values on later statistics.


  # Calculate IQR bounds
  Q1 = data['Income'].quantile(0.25)
  Q3 = data['Income'].quantile(0.75)
  IQR = Q3 - Q1
  lower = Q1 - 1.5 * IQR
  upper = Q3 + 1.5 * IQR

  # Save original income values
  data['Income_Original'] = data['Income']

  # Apply outlier capping
  data['Income'] = np.where(data['Income'] > upper, upper,
                  np.where(data['Income'] < lower, lower, data['Income']))
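
Because the same capping is applied to more than one column, it can be wrapped in a helper. This is a minimal sketch with made-up values; `cap_outliers_iqr` is a hypothetical name, and it uses `Series.clip` in place of the nested `np.where`, which gives the same result for non-missing values.

```python
import pandas as pd

def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] to those limits."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Toy incomes (made up) with one extreme value
incomes = pd.Series([30000, 40000, 50000, 60000, 70000, 600000], dtype=float)
capped = cap_outliers_iqr(incomes)
print(capped.tolist())
```

Only the extreme value changes; every observation inside the limits is left as it was, and no rows are dropped.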

Outlier Treatment: Age

Similar to income, age can also contain extreme values that may distort the analysis. We apply outlier treatment using the IQR method to cap ages outside a reasonable range. This ensures more stable statistical modeling and visualization.


  # Save original Age values
  data['Age_Original'] = data['Age']

  # Calculate IQR for Age
  Q1 = data['Age'].quantile(0.25)
  Q3 = data['Age'].quantile(0.75)
  IQR = Q3 - Q1
  lower = Q1 - 1.5 * IQR
  upper = Q3 + 1.5 * IQR

  # Apply outlier capping
  data['Age'] = np.where(data['Age'] > upper, upper,
              np.where(data['Age'] < lower, lower, data['Age']))

To better illustrate the impact of IQR-based outlier treatment on the Age variable, we display separate histograms before and after capping. Notice that extremely high values (e.g., age 100+) are pulled down to the upper bound in the post-treatment view rather than dropped from the data.

Feature Encoding & Correlation

We apply feature engineering to prepare the data for modeling. First, we encode Education using an ordinal scale. Then, we group unusual values in Marital_Status under "Other", and apply one-hot encoding to convert this categorical field into binary indicators. Finally, we generate a correlation matrix to understand how features relate to each other.


  # Ordinal encoding for Education
  edu_order = {'Basic': 0, '2n Cycle': 1, 'Graduation': 2, 'Master': 3, 'PhD': 4}
  data['Education_encoded'] = data['Education'].map(edu_order)

  # Group rare marital status categories
  data['Marital_Status_Clean'] = data['Marital_Status'].replace({
      'Absurd': 'Other',
      'YOLO': 'Other',
      'Alone': 'Other'
  })

  # One-hot encoding
  data = pd.get_dummies(data, columns=['Marital_Status_Clean'], prefix='Marital')

Focused Correlation Heatmap

Rather than examining all 30+ variables at once, we focus on a curated set of key features such as age, income, education level, and purchase behavior. This cleaner correlation matrix makes it easier to identify important relationships, such as how income relates to total spending, or how age correlates with recency or number of children.


  # Select features most relevant to behavior and segmentation
  selected_features = [
      "Age", "Income", "Education_encoded", "total_spending",
      "total_purchases", "total_children", "Recency", "Complain"
  ]

  # Compute correlation matrix for just these variables
  corr = data[selected_features].corr()
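
To read a correlation matrix without a heatmap, the strongest pairs can be listed directly. The sketch below uses made-up toy values, and `top_correlations` is a hypothetical helper name; it keeps only the upper triangle so that each pair appears once and the diagonal is excluded.

```python
import numpy as np
import pandas as pd

def top_correlations(corr: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n feature pairs with the largest absolute correlation."""
    # True above the diagonal only: each pair once, no self-correlations
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).unstack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Toy data (made up) using three of the selected feature names
toy = pd.DataFrame({
    "Income": [20, 40, 60, 80, 100],
    "total_spending": [5, 9, 15, 22, 24],
    "total_children": [3, 2, 2, 1, 0],
})
result = top_correlations(toy.corr(), n=2)
print(result)
```

Sorting by absolute value keeps strong negative relationships alongside strong positive ones.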

Hypothesis A: Older Individuals Prefer In-Store Shopping

We test the idea that older individuals may have lower technological proficiency and therefore prefer to shop in-store. We analyze the correlation between age and the number of store purchases, and compare purchase behavior across age groups.


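  # Bin ages into groups for comparison
  data['AgeGroup'] = pd.cut(data['Age'], bins=[18, 30, 45, 60, 75, 100], labels=['18–30', '31–45', '46–60', '61–75', '76+'])

  # Correlation between Age and NumStorePurchases
  correlation = data['Age'].corr(data['NumStorePurchases'])
  print(f"Correlation between Age and Store Purchases: {correlation:.4f}")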

Correlation between Age and Store Purchases: 0.1344

ℹ️ The correlation is positive but weak, so it offers only limited support for the idea that older customers shop in-store more often.
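
A correlation coefficient is easier to interpret alongside its p-value and the group means. The following is a minimal sketch on made-up toy data; using `scipy.stats.pearsonr` is an assumption about how the test could be run, not necessarily what the original notebook did.

```python
import pandas as pd
from scipy.stats import pearsonr

# Toy data (made up): ages and store purchase counts
toy = pd.DataFrame({
    "Age": [28, 35, 42, 51, 58, 63, 70, 77],
    "NumStorePurchases": [3, 4, 4, 6, 5, 7, 6, 8],
})

# Pearson correlation with a two-sided p-value
r, p_value = pearsonr(toy["Age"], toy["NumStorePurchases"])
print(f"r = {r:.4f}, p = {p_value:.4f}")

# Mean store purchases per age group, using the same bins as above
toy["AgeGroup"] = pd.cut(
    toy["Age"],
    bins=[18, 30, 45, 60, 75, 100],
    labels=["18–30", "31–45", "46–60", "61–75", "76+"],
)
group_means = toy.groupby("AgeGroup", observed=True)["NumStorePurchases"].mean()
print(group_means)
```

The group means show whether any trend is steady across age bands or driven by a single group.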

Hypothesis B: Parents prefer web purchases

Individuals with children may face time constraints and be more price-sensitive. To explore whether they prefer online shopping, we grouped customers by number of children and compared their web purchase activity. This is visualized using a boxplot and scatterplot, and evaluated statistically using correlation.


  # Create total_children feature (same definition as in Feature Engineering)
  data['total_children'] = data['Kidhome'] + data['Teenhome']
  
  # Filter relevant fields
  df_b = data[['total_children', 'NumWebPurchases']].dropna()
  
  # Correlation between children and web purchases
  corr = df_b.corr().iloc[0, 1]
  print(f"Correlation: {corr:.4f}")

📈 Correlation: -0.1464
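The correlation between number of children and number of web purchases is weak and negative, so the data do not support the idea that parents prefer web purchases; if anything, customers with more children make slightly fewer web purchases. Boxplots still show some variation in behavior among parents.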
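
The grouped comparison described for this hypothesis can be summarized in a small table. This is a minimal sketch on made-up toy data, intended only to show the shape of the calculation.

```python
import pandas as pd

# Toy data (made up)
toy = pd.DataFrame({
    "Kidhome": [0, 0, 1, 1, 2, 0, 1, 2],
    "Teenhome": [0, 1, 0, 1, 1, 0, 0, 0],
    "NumWebPurchases": [6, 5, 4, 3, 2, 8, 5, 3],
})
toy["total_children"] = toy["Kidhome"] + toy["Teenhome"]

# Count, mean and median web purchases for each number of children
summary = toy.groupby("total_children")["NumWebPurchases"].agg(["count", "mean", "median"])
print(summary)
```

Showing the group sizes next to the means helps avoid over-reading differences that rest on a handful of customers.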

Hypothesis C: Higher education leads to higher spending

Customers with higher education levels may have more purchasing power and therefore spend more on average. This analysis groups customers by education level and compares their total spending through a boxplot and bar chart.


  # Ensure 'total_spending' exists
  data['total_spending'] = data[['MntWines', 'MntFruits', 'MntMeatProducts',
                                'MntFishProducts', 'MntSweetProducts', 'MntGoldProds']].sum(axis=1)

  # Group by education and calculate mean
  edu_spending = data.groupby('Education')['total_spending'].mean()
  # print(edu_spending)
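
Group means alone do not show whether the differences between education levels exceed what chance would produce. One option is a Kruskal-Wallis test, sketched below on made-up toy data. The choice of test is an assumption made here because spending amounts tend to be skewed; the original analysis compares group means only.

```python
import pandas as pd
from scipy.stats import kruskal

# Toy data (made up): total spending for three education levels
toy = pd.DataFrame({
    "Education": ["Basic"] * 4 + ["Graduation"] * 4 + ["PhD"] * 4,
    "total_spending": [40, 55, 60, 80, 300, 450, 620, 700, 350, 500, 640, 900],
})

# One array of spending values per education level
groups = [g["total_spending"].to_numpy() for _, g in toy.groupby("Education")]

# Kruskal-Wallis H-test: do the groups come from the same distribution?
h_stat, p_value = kruskal(*groups)
print(f"H = {h_stat:.4f}, p = {p_value:.4f}")
```

A small p-value would indicate that at least one education level differs in spending, without identifying which one.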

Hypothesis D: Do U.S. customers make more purchases than the rest of the world?

This hypothesis explores whether customers from the United States significantly outperform international customers in terms of total purchases. A boxplot compares purchase volumes, and a t-test checks whether the difference is statistically significant.


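  from scipy.stats import ttest_ind

  # The Country column codes the United States as 'US' (not 'USA')
  us = data[data['Country'] == 'US']['total_purchases']
  non_us = data[data['Country'] != 'US']['total_purchases']

  # Welch's t-test (does not assume equal variances)
  t_stat, p_value = ttest_ind(us, non_us, equal_var=False)
  # print(f"T-stat: {t_stat:.4f}, P-value: {p_value:.4f}")

US Average: nan
Non-US Average: 12.54
T-statistic: nan
P-value: nan
The nan values above come from a run that filtered on 'USA', a code that does not appear in the Country column, so the US group was empty and 12.54 is the average over all customers. These figures therefore give no evidence either way; the test needs to be rerun with the 'US' code before concluding whether US customers purchase more than international ones.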
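
A guard against empty groups makes this kind of comparison safer, since a country code that matches no rows (the sample table uses 'US', not 'USA') otherwise yields nan silently. The sketch below uses made-up toy data, and `compare_country` is a hypothetical helper name.

```python
import pandas as pd
from scipy.stats import ttest_ind

def compare_country(df: pd.DataFrame, country_code: str,
                    value_col: str = "total_purchases") -> dict:
    """Welch's t-test of one country against all others, refusing tiny groups."""
    in_group = df.loc[df["Country"] == country_code, value_col]
    out_group = df.loc[df["Country"] != country_code, value_col]
    if len(in_group) < 2 or len(out_group) < 2:
        raise ValueError(
            f"Not enough rows for {country_code!r}: "
            f"{len(in_group)} in group, {len(out_group)} outside"
        )
    t_stat, p_value = ttest_ind(in_group, out_group, equal_var=False)
    return {
        "group_mean": in_group.mean(),
        "rest_mean": out_group.mean(),
        "t_stat": t_stat,
        "p_value": p_value,
    }

# Toy data (made up), using country codes from the sample table
toy = pd.DataFrame({
    "Country": ["US", "US", "US", "SP", "SP", "CA", "GER", "SP"],
    "total_purchases": [14, 18, 16, 12, 15, 10, 13, 11],
})
result = compare_country(toy, "US")
print(result)
```

Calling `compare_country(toy, "USA")` raises a `ValueError` instead of returning nan, which surfaces a mismatched country code immediately.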

Business Insights Visualizations

This chart ranks products based on total customer spending. Wines lead revenue, indicating strong customer preference.
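
The ranking behind such a chart is a column sum followed by a sort. The sketch below uses only the first three rows of the sample table above, so the totals illustrate the calculation and are not results for the full dataset.

```python
import pandas as pd

spending_cols = ["MntWines", "MntFruits", "MntMeatProducts",
                 "MntFishProducts", "MntSweetProducts", "MntGoldProds"]

# First three customers from the sample table (IDs 1826, 1, 10476)
sample = pd.DataFrame(
    [[189, 104, 379, 111, 189, 218],
     [464, 5, 64, 7, 0, 37],
     [134, 11, 59, 15, 2, 30]],
    columns=spending_cols,
)

# Total spending per product category, largest first
ranking = sample[spending_cols].sum().sort_values(ascending=False)
print(ranking)
```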

Does age influence how likely a person is to accept a marketing campaign? This scatterplot helps us visualize any patterns.

Knowing where campaigns are most accepted helps optimize international marketing strategy. This bar chart shows campaign acceptance volume by country.

Analyzing average spending by family size reveals how children impact household spending behavior.

Which education groups are most likely to lodge complaints? This visualization breaks down complaint rates by education level.

Marketing Campaign Analysis Report

Executive Summary

This project analyzes customer behavior using the Four Ps of Marketing: Product, Price, Place, and Promotion through interactive visual analytics. A combination of data cleaning, feature engineering, exploratory data analysis, and hypothesis testing was used to deliver data-driven marketing insights.

Problem Objective

Explore a structured marketing dataset and uncover how demographics, customer behavior, and campaign responses influence marketing performance. The ultimate goal is to help businesses personalize campaigns, target the right audience, and improve ROI.

Methodology

Data cleaning and missing value imputation for income
Feature engineering: age, total children, total purchases, total spending
Outlier detection and treatment using boxplots and histograms
Encoding categorical variables for modeling and analysis
Interactive correlation heatmaps and hypothesis testing (A–D)
Business insight visualizations embedded in a Flask web application

Key Insights

Product: Wines generate the most revenue; fruits and sweets rank lowest.
Price: Higher income strongly correlates with increased total spending.
Promotion: Younger users and US customers respond more, but not always with higher purchases.
Complaints: "2n Cycle" and Graduation education groups show higher complaint rates.
Geography: Spain leads in campaign acceptance volumes.

Hypothesis Testing Summary

H1: Older customers prefer in-store shopping. Weakly supported; the correlation between age and store purchases is 0.1344.
H2: More children → fewer online purchases. Weakly supported; the correlation is negative but small (-0.1464).
H3: Age impacts campaign response. Supported; acceptance drops with age.
H4: US customers make more purchases than the rest of the world. Inconclusive; the reported t-test returned nan because the 'USA' filter matched no rows (the dataset uses the code 'US').

Business Recommendations

Double down on high-margin products like wine and gold items
Segment by age and income for personalized marketing strategies
Track and address complaints in education segments with higher dissatisfaction
Optimize multi-channel strategies: online and in-store coexist
Continue A/B testing on regional campaigns for effectiveness

Conclusion

Through interactive visualizations and hypothesis-driven exploration, this analysis empowers strategic marketing decisions. The combination of behavioral patterns and campaign responsiveness builds a solid foundation for customer-centric planning.
