Table of Contents

  1. Introduction
  2. Data Quality Analysis
  3. Results

Introduction

Language used: Python

Libraries used

Library | Description
pandas | Data manipulation and analysis of structured data
numpy | Numerical operations on large, multi-dimensional arrays and matrices
matplotlib.pyplot | Plotting library for creating visualizations
seaborn | Statistical data visualization library for informative statistical graphics

Goal: audit the generation and compliance of the information contained in the dataset.

Observations:

  • Presence of missing values in the dataset
  • Low success rate in marketing campaigns
  • Risk of default among borrowers
  • Non-compliance with Law 26.951

Dataset: Bank-full.csv, https://www.kaggle.com/code/vinicius150987/bank-full-machine-learning/input

The dataset used in this study on banking marketing was generated from direct marketing campaigns conducted by a banking institution in Portugal. These campaigns primarily involved telephone calls to customers with the aim of determining their interest in subscribing to a term deposit offered by the bank. The classification goal of this dataset is to predict whether a customer will subscribe to a term deposit or not (output variable “y”).

The dataset comprises a total of 16 input variables providing information about the bank customers and the last contact of the current campaign. These variables include the customer’s age, employment type, marital status, education, credit default status, average annual balance, existence of mortgage or personal loans, type of contact used, day and month of the last contact, duration of the last contact, the number of contacts made during this campaign, the number of days since the last contact from a previous campaign, the number of contacts made previously, and the outcome of the previous marketing campaign.
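A minimal loading sketch, assuming the Kaggle file keeps the column names and the semicolon separator of the original UCI "bank-full.csv" release (pass `sep=","` if the copy is comma-separated). `load_bank_data` and `EXPECTED_COLUMNS` are illustrative names, not part of the original analysis.

```python
import pandas as pd

# Column names as in the UCI "bank-full" release (assumed to match the Kaggle copy):
# 16 input variables followed by the target "y".
EXPECTED_COLUMNS = [
    "age", "job", "marital", "education", "default", "balance",
    "housing", "loan", "contact", "day", "month", "duration",
    "campaign", "pdays", "previous", "poutcome", "y",
]


def load_bank_data(path_or_buffer, sep=";"):
    """Read the dataset and fail early if any expected column is absent."""
    df = pd.read_csv(path_or_buffer, sep=sep)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Unexpected layout, missing columns: {missing}")
    return df


# Usage (hypothetical local path):
# df = load_bank_data("bank-full.csv")
```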

Tasks to be carried out:

  • Compilation of applicable regulations
  • Analysis of data quality
  • Verification of the existence of loans associated with individuals who may currently lack sufficient funds to meet them. This analysis considers factors such as employment status, age, and current balance.
  • Analysis of records to identify contacted customers whose calls may have violated the provisions of Law 26.951*.

*Law 26.951 in Argentina establishes a “Do Not Call Registry” to protect individuals from unwanted telemarketing calls. Companies must check and respect the registry, facing penalties for non-compliance.

Data Quality Analysis

This dataset comprises 45,211 records and 17 variables, with no explicit null values. Each variable is checked for a valid type and a plausible range. For instance, it is verified that there are no ages below 18 and no calls with a duration of 0 seconds.

table
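The checks described above can be sketched as follows. The thresholds mirror the two examples in the text; `validity_report` is an illustrative name, and further range rules would be added in the same way.

```python
import pandas as pd


def validity_report(df):
    """Count explicit nulls and records that fail simple plausibility rules."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # Explicit NaN/None cells only; placeholder strings such as "unknown" are not counted.
        "explicit_nulls": int(df.isna().sum().sum()),
        # Customers younger than 18 would be implausible for a bank product.
        "age_below_18": int((df["age"] < 18).sum()),
        # A last-contact duration of 0 seconds (or less) suggests a call that never happened.
        "non_positive_duration": int((df["duration"] <= 0).sum()),
    }


# Usage, given a loaded DataFrame `df`:
# print(validity_report(df))
```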

Missing Values:

Several variables contain the value “unknown”, which has been replaced with NaN to mark it explicitly as missing data.

Suggestion: complete these missing values, since having as much information about customers as possible is essential. Priority should go to the “contact” column, as it is needed to establish contact with clients.

Variable | % of missing data
Job | 0.64%
Education | 4.11%
Poutcome | 81.75%
Contact | 28.80%
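A short sketch of the replacement and the percentage calculation, assuming the placeholder is exactly the lowercase string "unknown" as in the UCI release; the helper names are illustrative.

```python
import numpy as np
import pandas as pd


def mark_unknown_as_missing(df, placeholder="unknown"):
    """Return a copy in which the placeholder string is replaced by NaN."""
    return df.replace(placeholder, np.nan)


def missing_percentages(df):
    """Percentage of missing values per column, keeping only affected columns."""
    pct = df.isna().mean().mul(100).round(2)
    return pct[pct > 0].sort_values()


# Usage, given a loaded DataFrame `df`:
# df = mark_unknown_as_missing(df)
# print(missing_percentages(df))
```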

Possible duplicated cases

A total of 2,576 cases have been identified in which customers share the same age, occupation, marital status, education level, bank balance, and loan and credit situation. The dataset contains no identifying data (such as a customer ID) that would allow these repeated cases to be examined in more detail to confirm whether they are duplicates. The records are therefore assumed to be unique.
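One way to reproduce such a count, as a sketch: the column list below is an assumed mapping of "age, occupation, marital status, educational level, bank account balance, and loan and credit situation" onto the UCI column names. It is not stated whether the 2,576 figure counts every record in a repeated group or only the extra copies, so both are returned.

```python
import pandas as pd

# Assumed mapping of the profile described in the text to dataset columns.
PROFILE_COLUMNS = [
    "age", "job", "marital", "education", "balance",
    "default", "housing", "loan",
]


def count_profile_matches(df, columns=PROFILE_COLUMNS):
    """Return (rows in any repeated group, rows beyond the first of each group)."""
    # keep=False flags every member of a repeated group.
    all_members = int(df.duplicated(subset=columns, keep=False).sum())
    # keep="first" flags only the second and later occurrences.
    extra_copies = int(df.duplicated(subset=columns, keep="first").sum())
    return all_members, extra_copies
```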

Outliers in the previous marketing campaign (Graph 1) and the current marketing campaign (Graph 2)

graph_1

Regarding the previous marketing campaign, it is worth noting that most calls were made to customers with a low annual balance, specifically below 10,000. One case stands out, however: a married customer received more than 250 calls, far more than any other client.

Married customers generally have higher balances in their bank accounts, yet they have not been contacted as often as other customer groups. This suggests an opportunity to improve the contact strategy, as customers with higher balances might be more inclined to invest in the bank’s proposals.

Graph 2 plots bank balance against the number of contacts during the current marketing campaign, broken down by the customers’ education level. Here the distribution is more even, although there are outliers in the number of calls to customers with moderate balances, and clients with higher balances were contacted fewer times.
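The comparisons behind Graph 1 and Graph 2 can also be checked numerically. A minimal sketch, assuming "calls" means the `previous` column for the earlier campaign and the `campaign` column for the current one; `contact_summary` is an illustrative helper, not the code that produced the graphs.

```python
import pandas as pd


def contact_summary(df, group_col, contacts_col):
    """Median balance and contact statistics per group, highest median balance first."""
    return (
        df.groupby(group_col)
        .agg(
            customers=("balance", "size"),
            median_balance=("balance", "median"),
            mean_contacts=(contacts_col, "mean"),
            max_contacts=(contacts_col, "max"),
        )
        .sort_values("median_balance", ascending=False)
    )


# Usage, given a loaded DataFrame `df`:
# contact_summary(df, "marital", "previous")     # previous campaign (Graph 1)
# contact_summary(df, "education", "campaign")   # current campaign (Graph 2)
```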

Outcome of Calls by Job Type

graph_2

The group with the highest number of customers who subscribed to a term deposit is made up of those in managerial positions. Entrepreneurs, on the other hand, were the category with the lowest success rate.
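Since the number of subscriptions and the success rate can rank job types differently, here is a sketch that reports both side by side, assuming the target column `y` holds "yes"/"no" as in the UCI release; `subscription_by_job` is an illustrative name.

```python
import pandas as pd


def subscription_by_job(df):
    """Subscriptions and success rate (%) per job type, most subscriptions first."""
    subscribed = df["y"].eq("yes")
    summary = subscribed.groupby(df["job"]).agg(subscriptions="sum", success_rate="mean")
    summary["success_rate"] = (summary["success_rate"] * 100).round(2)
    return summary.sort_values("subscriptions", ascending=False)
```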

Age group | Success cases
18-30 | 1138
40-50 | 1019
50-60 | 811
60-70 | 284
70-100 | 218

The age groups with the most successful subscriptions were 18 to 30, followed by 40 to 50. Suggestion: direct future marketing campaigns towards these age groups.
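A sketch of the age grouping with `pd.cut`. The bin edges are assumed from the labels in the table, using left-closed intervals so that an 18-year-old falls in the first group; the original boundary convention is not stated.

```python
import pandas as pd

# Assumed bin edges; right=False gives [18, 30), [30, 40), ..., [70, 100).
AGE_BINS = [18, 30, 40, 50, 60, 70, 100]


def successes_by_age_group(df, bins=AGE_BINS):
    """Number of customers with y == "yes" in each age bin (empty bins kept as 0)."""
    age_group = pd.cut(df["age"], bins=bins, right=False)
    return df["y"].eq("yes").groupby(age_group, observed=False).sum()
```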

Loans

Share of customers with a loan or mortgage, by job type

Job type | % of borrowers
blue-collar | 27.15%
management | 18.67%
technician | 16.47%
admin. | 12.76%
services | 10.95%
entrepreneur | 3.65%
self-employed | 3.09%
retired | 2.50%
unemployed | 2.11%
housemaid | 1.75%
student | 0.92%

Blue-collar workers, managers, and technicians account for the largest shares of customers with loans or mortgages. These professions are generally considered to offer stable income and employment.

This suggests that customers in these occupational categories have a greater need for financing, whether to buy real estate or to cover other expenses. Given the job stability associated with these professions, these clients are also more likely to have the resources needed to meet loan payments.

Suggestion: target loan and mortgage offers at these occupational categories.

Share of customers with loans, by marital status

Marital status | % of borrowers
married | 60.73%
single | 27.38%
divorced | 11.89%

Of the clients with loans, 60.73% are married, 27.38% are single, and only 11.89% are divorced.
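Both tables above can be produced with one helper, sketched here under the assumption that "with loans" means a "yes" in either the `housing` (mortgage) or `loan` (personal loan) column; restrict it to `loan` alone if only personal loans were counted. Because the shares are computed among borrowers only, each table sums to roughly 100%.

```python
import pandas as pd


def share_among_borrowers(df, column):
    """Percentage breakdown of `column` among customers holding a mortgage or personal loan."""
    has_credit = df["housing"].eq("yes") | df["loan"].eq("yes")
    # value_counts drops NaN, so "unknown" values already replaced by NaN are left out.
    shares = df.loc[has_credit, column].value_counts(normalize=True) * 100
    return shares.round(2)


# Usage, given a loaded DataFrame `df`:
# share_among_borrowers(df, "job")
# share_among_borrowers(df, "marital")
```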

Customer Segmentation

Customers are divided into two groups according to job type and the financial stability associated with each occupational category.

  • Group 1 is composed of the following jobs: ‘management’, ‘technician’, ‘blue-collar’, ‘admin.’, ‘services’. These categories are considered more financially stable.

  • Group 2 is composed of the following jobs: ‘entrepreneur’, ‘retired’, ‘self-employed’, ‘unemployed’, ‘housemaid’, ‘student’. Professions in this group may represent a higher financial risk.
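The segmentation amounts to a lookup from job to group. A sketch, where `JOB_GROUP` and the `job_group` column are illustrative names; jobs that are missing or not listed in either group are left as NaN.

```python
import pandas as pd

GROUP_1 = ["management", "technician", "blue-collar", "admin.", "services"]
GROUP_2 = ["entrepreneur", "retired", "self-employed", "unemployed", "housemaid", "student"]

# Lookup table job -> group label.
JOB_GROUP = {**dict.fromkeys(GROUP_1, "Group 1"), **dict.fromkeys(GROUP_2, "Group 2")}


def add_job_group(df):
    """Return a copy with a `job_group` column; unmapped or missing jobs become NaN."""
    return df.assign(job_group=df["job"].map(JOB_GROUP))
```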

Loans by Age Group and Job Type

Customers between 30 and 50 years old have a higher percentage of loans, suggesting a greater need for credit than other age groups. Additionally, clients in occupational Group 1 are more likely to have loans than those in Group 2, except in the 60 to 70 age range, where the trend is reversed.

Loans by Marital status, Age Group and Job Type

By marital status, married customers generally have a higher share of loans than divorced or single customers, except in the 60 to 70 age range, where divorced customers in occupational Group 2 have a higher percentage of loans.
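The breakdowns in the two paragraphs above can be sketched as a pivot table of loan shares. This assumes the frame already carries an age-group column and the Group 1/Group 2 label (the names `age_group` and `job_group` are illustrative), and that holding a loan means a "yes" in `housing` or `loan`.

```python
import pandas as pd


def loan_share_table(df, index="age_group", columns="job_group"):
    """Percentage of customers with a mortgage or personal loan in each cell."""
    has_loan = (df["housing"].eq("yes") | df["loan"].eq("yes")).astype(float)
    table = df.assign(has_loan=has_loan).pivot_table(
        values="has_loan", index=index, columns=columns, aggfunc="mean", observed=False
    )
    return table.mul(100).round(2)


# Usage:
# loan_share_table(df)                                  # age group x job group
# loan_share_table(df, index=["marital", "age_group"])  # adds marital status
```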

Marital status, Balance and Job Type

graph_3

A significant portion of clients with active loans have a bank balance below 10,000 (7,204 cases). Given their limited liquidity, this group could represent a risk factor. Some clients also have negative balances.
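A sketch of the liquidity check, assuming "active loans" refers to the `loan` (personal loan) column and using the 10,000 threshold from the text; `include_mortgage=True` also counts `housing` loans. The function name is illustrative.

```python
import pandas as pd


def low_balance_borrowers(df, threshold=10_000, include_mortgage=False):
    """Count borrowers below a balance threshold and those with a negative balance."""
    has_loan = df["loan"].eq("yes")
    if include_mortgage:
        has_loan = has_loan | df["housing"].eq("yes")
    balance = df.loc[has_loan, "balance"]
    return {
        "borrowers": int(has_loan.sum()),
        "below_threshold": int((balance < threshold).sum()),
        "negative_balance": int((balance < 0).sum()),
    }
```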

Looking at the percentage of loans in default by age group and job type, younger customers show a higher proportion of loans in default, and this proportion decreases with age.

When the data are segmented by occupational group, job type does not appear to have a significant effect on the likelihood of default. Overall, the percentages of loans in default in the table are low, mostly below 1%, which suggests that most customers are repaying their loans on time regardless of age or job type.
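The default percentages can be sketched as the share of borrowers whose `default` flag is "yes" in each group. In the UCI release, `default` records whether the customer has credit in default, not which specific loan is affected; the `age_group` and `job_group` columns are assumed to exist already, and their names are illustrative.

```python
import pandas as pd


def default_share(df, by=("age_group", "job_group")):
    """Percentage of borrowers flagged as having credit in default, per group."""
    borrowers = df[df["housing"].eq("yes") | df["loan"].eq("yes")]
    in_default = borrowers["default"].eq("yes")
    keys = [borrowers[col] for col in by]
    return in_default.groupby(keys, observed=False).mean().mul(100).round(2)
```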

Law 26.951

In the dataset, 7,864 records classified as “Do not call” had been contacted in a previous campaign, as indicated by a value other than -1 in the “pdays” column (-1 means the customer was not previously contacted).

Registry status | No contact | Contact | Total
Do not call | 35116 | 7864 | 42980
Call | 1838 | 393 | 2231
Total | 36954 | 8257 | 45211
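A sketch of how such a cross-tabulation could be built. The 17 dataset variables include no registry flag, so the `registry_status` column (with values "Do not call" and "Call") is a hypothetical field assumed to have been joined from registry data; the contact split follows the `pdays` convention that -1 means no previous contact.

```python
import pandas as pd


def registry_contact_table(df, registry_col="registry_status"):
    """Cross-tabulate registry status against previous contact, with totals."""
    # pdays == -1 means the customer was not contacted in a previous campaign.
    previous_contact = (
        df["pdays"].ne(-1).map({False: "No contact", True: "Contact"}).rename("previous_contact")
    )
    return pd.crosstab(df[registry_col], previous_contact, margins=True, margins_name="Total")


def flagged_records(df, registry_col="registry_status"):
    """Records on the 'Do not call' list that were nevertheless contacted previously."""
    return df[df[registry_col].eq("Do not call") & df["pdays"].ne(-1)]
```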

This points to a potential violation of Law 26.951 in 7,864 cases, in which customers on the “Do not call” registry received calls from the bank.

Suggestion: It is necessary to conduct a review to determine whether these customers were enrolled in the registry at the beginning of the campaign during which they received such contacts, considering that the registry is updated monthly. This review will assess whether there was any violation of the law by making calls to these registered customers. To avoid any future non-compliance, it is recommended to conduct regular checks to ensure that the database is always up-to-date with the latest registry data. This will ensure that marketing communications are conducted in accordance with established regulations, and potential breaches of the law are avoided.

To confirm compliance with the law, a sample of these records was taken to verify whether the customers were enrolled in the registry before the marketing campaign. The verification found that none of the sampled customers were registered at that time.
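The sample check could be sketched as below. Because the dataset has no customer identifier or registration dates, everything beyond the flagged records is hypothetical: a `customer_id` column, a mapping from customer to registry enrolment date, and the campaign start date would all have to come from bank and registry records. The sample size and seed are placeholders.

```python
import pandas as pd


def sample_registry_check(flagged, enrolment_dates, campaign_start, n=100, seed=0):
    """Sample flagged records and mark whether each was enrolled before the campaign.

    flagged: DataFrame with a hypothetical 'customer_id' column.
    enrolment_dates: hypothetical dict customer_id -> enrolment date string.
    """
    sample = flagged.sample(n=min(n, len(flagged)), random_state=seed)
    enrolled_on = pd.to_datetime(sample["customer_id"].map(enrolment_dates))
    # Customers with no enrolment date (NaT) compare as False, i.e. not enrolled.
    before = enrolled_on < pd.Timestamp(campaign_start)
    return sample.assign(enrolled_before_campaign=before)
```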

Results

Presence of Missing Values in the Dataset

Observations:

  • Missing values have been identified in the dataset, most notably in the “contact” variable, where they account for approximately 28.8% of cases.

Risk Level: Moderate Risk

Effects:

  • Lack of contact information hinders direct communication with clients, potentially limiting promotional opportunities and personalized service offerings.
  • Incomplete data may impact the quality and reliability of analyses and models applied to the data.

Recommendation:

  • Establish a process to contact clients and collect missing information, either through phone calls or other forms of communication.
  • If contact data is unavailable, consider using notifications via the application or platform used, requesting clients to complete their personal details.

Low Success Rate in Marketing Campaigns

Observations:

  • Clients with higher average balances have received fewer contacts compared to those with lower balances.
  • It is hypothesized that clients with higher average balances may be more receptive to new banking initiatives.

Risk Level: Low Risk

Effects:

  • Lack of contact with clients having higher average balances may limit the success levels of marketing campaigns, overlooking the investment potential of this segment.

Recommendation:

  • Increase the number of calls specifically targeting clients with higher bank balances.
  • Clients with higher average balances may be more open to considering proposals from the bank, enhancing the effectiveness of marketing campaigns.
  • Strategically focus efforts on this segment, leveraging its investment potential and strengthening the relationship with the bank.

Risk of Default Among Borrowers

Observations:

  • Of the loans granted, 27.38% went to single clients and 11.89% to divorced clients. This poses a potential risk, as these clients are more likely to depend on a single income and may have difficulty making payments if they face employment issues.
  • A smaller share of loans has been granted to students, unemployed individuals, and housemaids; these categories may lack a fixed income and therefore carry a higher level of risk.

Risk Level: High Risk

Effects:

  • There is a possibility of not receiving the full loan repayment due to a lack of funds from these clients.

Recommendation:

  • To mitigate risk in these categories, it is recommended to request a guarantor when granting a loan to individuals belonging to any category considered “high risk.” This will provide greater security and support in case of financial difficulties on the part of the borrower.

Non-Compliance with Law 26.951

Observations:

  • It was identified that a total of 7,864 clients were contacted despite being registered on the “Do Not Call” list.

Risk Level: High Risk

Effects:

  • A sample check of registry data prior to the campaign found that the sampled clients were not yet registered on the list at that time. This underlines the need for regular checks to comply with current legislation.

Recommendation:

  • To avoid this risk, it is recommended to check the “Do Not Call” registry regularly. These periodic checks keep the list current and accurate, preventing potential legal violations and ensuring regulatory compliance.