python

Bank Marketing Audit

Table of Contents: Introduction, Data Quality Analysis, Loans, Customers, Segmentation, Law 26.951, Results, Presence of Missing Values in the Dataset, Low Success Rate in Marketing Campaigns, Risk of Default Among Borrowers, Non-Compliance with Law 26.951.

Introduction. Language used: Python. Libraries used: pandas (data manipulation and analysis of structured data); numpy (numerical operations on large, multi-dimensional arrays and matrices); matplotlib....

Panel data analysis for Happiness reports (2015-2020)

Table of Contents: Introduction, Exploratory Data Analysis, Preprocessing Data, Modelling and Testing, Conclusion.

Introduction. Language used: R. For a description of the dataset and variables, see http://jfeggio.github.io/posts/whr1/. Goal: identify whether there is a positive correlation between the variable log_gdp and the reported happiness score (life_ladder).

Exploratory Data Analysis. Variable life_ladder across all years: the histogram has a symmetrical distribution centered around 6.7. The majority of observations fall within the range of 6....
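As a rough illustration of the stated goal (checking for a positive correlation between log_gdp and life_ladder), here is a minimal Python sketch of a Pearson correlation. The post itself uses R; the function name `pearson_r` and the sample values are assumptions made up for this example and are not taken from the World Happiness Report data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values (hypothetical), NOT real survey data.
log_gdp = [8.0, 9.0, 10.0, 11.0]
life_ladder = [4.5, 5.5, 6.0, 7.5]

r = pearson_r(log_gdp, life_ladder)
print(f"Pearson r = {r:.3f}")
```

A pooled correlation like this is only a first check: it ignores the country and year structure that a panel data model accounts for.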

House Prices Prediction with Regression Modelling and Features Selection

Table of Contents: Introduction, Data Wrangling, Model.

Introduction. Language used: Python. Goal: prediction of house prices. Data used: dataset from the Properati website (https://www.properati.com.ar/). Two datasets: one for training (dataframe: dfef) and one for testing (dataframe: dfp). Link to the dataset: https://www.kaggle.com/datasets/jluza92/argentina-properati-listings-dataset-20202021/data (1 GB). Libraries used: pandas (data manipulation and analysis of structured data); numpy (numerical operations on large, multi-dimensional arrays and matrices); sklearn (machine learning algorithms for classification, regression, clustering, and dimensionality reduction); matplotlib....

Use of Anglicisms on Spanish-speaking Latin American subreddits on the social media platform Reddit

Table of Contents: Introduction, Data Wrangling, Descriptive Analysis, Frequency of English words by year (2016-2023), Named-entity recognition (NER) and Topic Modelling, NER, Topic Modelling, Cluster Analysis, K-Means, Influence of US by year.

Introduction. Language used: Python. Goal: analyse the frequency, type, and related topics of English words in the Latin American Spanish-speaking subreddits of the social media platform Reddit for the period 2016-2023. Finally, the analysis takes into account the influence of economic relations with the United States and the number of tourists (>1....
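To make the "frequency of English words by year" idea concrete, here is a minimal sketch using only the Python standard library. The word list `ENGLISH_WORDS`, the function `english_counts_by_year`, and the sample posts are all hypothetical and invented for illustration; the post may well identify anglicisms differently.

```python
import re
from collections import Counter, defaultdict

# Hypothetical word list; a real analysis would need a proper English lexicon.
ENGLISH_WORDS = {"online", "streaming", "like"}


def english_counts_by_year(posts, english_words):
    """Count occurrences of the given English words per year.

    `posts` is an iterable of (year, text) pairs.
    Returns a mapping: year -> Counter of matched words.
    """
    counts = defaultdict(Counter)
    for year, text in posts:
        # Lowercase and keep alphabetic tokens, including Spanish accented letters.
        tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
        for token in tokens:
            if token in english_words:
                counts[year][token] += 1
    return counts


# Made-up sample posts (hypothetical), not real Reddit data.
sample_posts = [
    (2016, "Vi el streaming online ayer"),
    (2016, "Dale like al video"),
    (2023, "El streaming estuvo bueno, puro streaming"),
]

counts = english_counts_by_year(sample_posts, ENGLISH_WORDS)
for year in sorted(counts):
    print(year, dict(counts[year]))
```

Matching against a fixed word list is the simplest option; it cannot tell an anglicism from a Spanish word with the same spelling, so results from a sketch like this would need manual review.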