data wrangling

Panel data analysis for Happiness reports (2015-2020)

Table of Contents: Introduction; Exploratory Data Analysis; Preprocessing Data; Modelling and Testing; Conclusion. Introduction. Language used: R. Dataset and variable descriptions: http://jfeggio.github.io/posts/whr1/. Goal: identify whether there is a positive correlation between the variable log_gdp and the reported happiness score (life_ladder). Exploratory Data Analysis. Variable life ladder across all years: the histogram has a symmetrical distribution centered around 6.7. The majority of observations fall within the range of 6....

House Prices Prediction with Regression Modelling and Features Selection

Table of Contents: Introduction; Data Wrangling; Model. Introduction. Language used: Python. Goal: predict house prices. Data used: listings from the Properati website (https://www.properati.com.ar/). Two datasets are used: one for training (dataframe: dfef) and one for testing (dataframe: dfp). Link to the dataset: https://www.kaggle.com/datasets/jluza92/argentina-properati-listings-dataset-20202021/data (1 GB). Libraries used: pandas: data manipulation and analysis for structured data; numpy: numerical operations on large, multi-dimensional arrays and matrices; sklearn: machine learning algorithms for classification, regression, clustering, and dimensionality reduction; matplotlib....

Use of Anglicisms on Spanish-speaking Latin American subreddits on the social media platform Reddit

Table of Contents: Introduction; Data Wrangling; Descriptive Analysis; Frequency of English words by year (2016-2023); Named-entity recognition (NER) and Topic Modelling; NER; Topic Modelling; Cluster Analysis; K-Means; Influence of US by year. Introduction. Language used: Python. Goal: analyse the frequency, type, and related topics of English words in the Latin American Spanish-speaking subreddits of the social media platform Reddit for the period 2016-2023. Finally, the analysis takes into account the influence of economic relations with the United States and the number of tourists (>1....

World Happiness Report - A Cluster Analysis [1/2]

Table of Contents: Introduction; Data Wrangling; Descriptive Analysis; Outliers; Cluster Analysis; Number of Clusters; Hierarchical Analysis; Non-Hierarchical Analysis; ANOVA; Conclusion (PCA). Introduction. Language used: R. Database used: the World Happiness Report from 2021, a global survey reporting how people evaluate their own lives in more than 150 countries worldwide. Packages used: dplyr: to obtain a general descriptive analysis of the data; corrplot: to create the correlation matrix; readxl: to import the database (....