Posts

Bank Marketing Audit

Table of Contents: Introduction, Data Quality Analysis, Loans, Customers, Segmentation, Law 26.951, Results, Presence of Missing Values in the Dataset, Low Success Rate in Marketing Campaigns, Risk of Default Among Borrowers, Non-Compliance with Law 26.951.
Introduction. Language used: Python. Libraries used: pandas: data manipulation and analysis of structured data; numpy: numerical operations on large, multi-dimensional arrays and matrices; matplotlib....

Panel data analysis for Happiness reports (2015-2020)

Table of Contents: Introduction, Exploratory Data Analysis, Preprocessing Data, Modelling and Testing, Conclusion.
Introduction. Language used: R. For a description of the dataset and variables, see http://jfeggio.github.io/posts/whr1/
Goal: identify whether there is a positive correlation between the variable log_gdp and the reported happiness score (life_ladder).
Exploratory Data Analysis. Variable Life ladder across all years: the histogram shows a symmetrical distribution centered around 6.7. The majority of observations fall within the range of 6....

House Price Prediction with Regression Modelling and Feature Selection

Table of Contents: Introduction, Data Wrangling, Model.
Introduction. Language used: Python. Goal: predict house prices. Data used: a dataset from the Properati website (https://www.properati.com.ar/). Two datasets are used: one for training (dataframe: dfef) and one for testing (dataframe: dfp). Link to the dataset (1 GB): https://www.kaggle.com/datasets/jluza92/argentina-properati-listings-dataset-20202021/data
Libraries used: pandas: data manipulation and analysis of structured data; numpy: numerical operations on large, multi-dimensional arrays and matrices; sklearn: machine learning algorithms for classification, regression, clustering and dimensionality reduction; matplotlib....

Use of Anglicisms on Spanish-speaking Latin American subreddits on the social media platform Reddit

Table of Contents: Introduction, Data Wrangling, Descriptive Analysis, Frequency of English Words by Year (2016-2023), Named-Entity Recognition (NER) and Topic Modelling, NER, Topic Modelling, Cluster Analysis, K-Means, Influence of the US by Year.
Introduction. Language used: Python. Goal: analyse the frequency, type and related topics of English words in the Latin American Spanish-speaking subreddits of the social media platform Reddit for the period 2016-2023. Finally, the analysis takes into account the influence of economic relations with the United States and the number of tourists (>1....

Linear Programming - Investment funds

Table of Contents: Introduction, Mathematical Model, Solution.
Introduction. Language used: Excel (Solver add-in).
Task to solve: invest $5,000,000 across the following options (annual return to be achieved; investment limit in millions):
Consumer credit: 7%; limit 1
Corporate bonds: 11%; limit 1.5
Gold deposits: 19%; limit 2.5
Housing loans: 12%; limit 1.7
Additional limits (as a share of all funds): consumer credit at most 15%; housing loans & gold deposits at least 5%.
Mathematical Model. Decision Variables:...
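The investment problem above can be sketched as a linear program in Python with SciPy's `linprog`. The post itself solves it with Excel Solver, so this is not the author's model but a minimal sketch under stated assumptions: amounts are in millions, the full $5,000,000 must be invested, the 15% cap applies to consumer credit alone, and the 5% floor applies to housing loans and gold deposits combined.

```python
# Minimal LP sketch of the investment-fund problem (amounts in millions of $).
# Assumptions (not confirmed by the post): the full 5M must be invested, the
# 15% cap applies to consumer credit alone, and the 5% floor applies to the
# combined housing-loan + gold-deposit allocation.
from scipy.optimize import linprog

names = ["Consumer credit", "Corporate bonds", "Gold deposits", "Housing loans"]
returns = [0.07, 0.11, 0.19, 0.12]   # annual return per $ invested
limits = [1.0, 1.5, 2.5, 1.7]        # per-investment upper limits (millions)
total = 5.0                          # funds available (millions)

# linprog minimizes, so negate the returns to maximize total earnings.
c = [-r for r in returns]

# Inequality constraints in A_ub @ x <= b_ub form.
A_ub = [
    [1, 0, 0, 0],    # consumer credit <= 15% of total funds
    [0, 0, -1, -1],  # -(gold + housing) <= -5% of total, i.e. >= 5%
]
b_ub = [0.15 * total, -0.05 * total]

# Equality constraint: all funds are invested.
A_eq = [[1, 1, 1, 1]]
b_eq = [total]

# Each investment is between 0 and its individual limit.
bounds = [(0, lim) for lim in limits]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")

if res.success:
    for name, amount in zip(names, res.x):
        print(f"{name}: {amount:.2f}M")
    print(f"Annual earnings: {-res.fun:.3f}M")
```

Under these assumptions the optimum fills gold deposits (2.5M) and housing loans (1.7M) to their limits and puts the remaining 0.8M into corporate bonds, for about 0.767M per year; neither percentage constraint is binding.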

Markov Chains - Collection Strategy

Table of Contents: Introduction, Mathematical Model, Solution.
Introduction. Language used: R (Markov Chains package).
Task to solve: analyze the situation of bad payers using a transition matrix with these components:
Group A (good payers): 30% stay in this category, 30% become bad payers (Group B), and the rest (40%) pay all overdue amounts.
Group B (bad payers): 10% move to the default stage, 20% pay the overdue amount, 40% stay in Group B, and 30% move to Group A.
D (default): Default...
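The transition matrix above can be analysed as an absorbing Markov chain. The following Python/NumPy sketch is a swapped-in equivalent of the post's R workflow, not its actual code. It assumes that paying all overdue amounts leads to a hypothetical absorbing state labelled P, and that D (default) is absorbing; it then computes the fundamental matrix N = (I - Q)^-1, the absorption probabilities N R and the expected number of steps before absorption.

```python
# Absorbing Markov chain sketch for the collection strategy.
# States: A (good payers), B (bad payers), P (paid; hypothetical label for
# "pays all overdue"), D (default). P and D are assumed to be absorbing.
import numpy as np

states = ["A", "B", "P", "D"]
P = np.array([
    # A    B    P    D
    [0.3, 0.3, 0.4, 0.0],  # A: 30% stay, 30% -> B, the rest pay all overdue
    [0.3, 0.4, 0.2, 0.1],  # B: 30% -> A, 40% stay, 20% pay, 10% default
    [0.0, 0.0, 1.0, 0.0],  # P: absorbing (assumption)
    [0.0, 0.0, 0.0, 1.0],  # D: absorbing
])
assert np.allclose(P.sum(axis=1), 1.0)  # every row is a probability distribution

transient, absorbing = [0, 1], [2, 3]
Q = P[np.ix_(transient, transient)]  # transient -> transient block
R = P[np.ix_(transient, absorbing)]  # transient -> absorbing block

# Fundamental matrix: expected visits to each transient state.
N = np.linalg.inv(np.eye(len(transient)) - Q)
# Absorption probabilities: rows A, B; columns P (paid), D (default).
B = N @ R
# Expected number of steps before absorption from each transient state.
t = N.sum(axis=1)

for i in transient:
    print(f"From {states[i]}: P(paid)={B[i, 0]:.3f}, "
          f"P(default)={B[i, 1]:.3f}, expected steps={t[i]:.2f}")
```

Under these assumptions a good payer (A) ends up paying with probability 10/11 (about 0.909) and defaulting with probability 1/11, while a bad payer (B) defaults with probability 7/33 (about 0.212).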

A Descriptive Analysis of Airbnb accommodations in Bologna

Language used: R. Database used: “listing.csv” from the website http://insideairbnb.com/get-the-data/, composed of 18 qualitative and quantitative variables and 3896 cases.
Packages used: dplyr: to obtain a general descriptive analysis of the data; ggplot2: to create graphs; ggmap: to display maps; RColorBrewer: to assign colors to the heatmap visualization; osmdata: to obtain the geographic coordinates of the city of Bologna; hrbrthemes: to obtain additional ggplot2 themes.
Variables description:...

World Happiness Report - A Cluster Analysis [1/2]

Table of Contents: Introduction, Data Wrangling, Descriptive Analysis, Outliers, Cluster Analysis, Number of Clusters, Hierarchical Analysis, Non-Hierarchical Analysis, ANOVA, Conclusion (PCA).
Introduction. Language used: R. Database used: the 2021 World Happiness Report, a global survey reporting how people in more than 150 countries evaluate their own lives.
Packages used: dplyr: to obtain a general descriptive analysis of the data; corrplot: to create the correlation matrix; readxl: to import the database (....

World Happiness Report - A Principal Components Analysis [2/2]

Table of Contents: Introduction, Descriptive Analysis, Outliers, Principal Components Analysis, Eigenvalues, Loadings, Correlation between Components, Components 1 (Quality of Life) and 2 (Economy/Corruption), Components 1 (Quality of Life) and 3 (Social Support/Corruption), Components 2 (Economy/Corruption) and 3 (Social Support/Corruption), Conclusion.
Introduction. Please refer to the Introduction of “World Happiness Report - A Cluster Analysis [1/2]”, as this is the second part of this work.
Descriptive Analysis. Please refer to the Descriptive Analysis of “World Happiness Report - A Cluster Analysis [1/2]”, as this is the second part of this work....