unsupervised machine learning

Use of Anglicisms on Spanish-speaking Latin American subreddits on the social media platform Reddit

Table of Contents: Introduction; Data Wrangling; Descriptive Analysis; Frequency of English words by year (2016-2023); Named-entity recognition (NER) and Topic Modelling; NER; Topic Modelling; Cluster Analysis; K-Means; Influence of US by year.
Introduction
Language used: Python
Goal: Analyse the frequency, type and related topics of English words in the Spanish-speaking Latin American subreddits of the social media platform Reddit for the period 2016-2023. Finally, the analysis takes into account the influence of economic relations with the United States and the number of tourists (>1....

World Happiness Report - A Cluster Analysis [1/2]

Table of Contents: Introduction; Data Wrangling; Descriptive Analysis; Outliers; Cluster Analysis; Number of Clusters; Hierarchical Analysis; Non-Hierarchical Analysis; ANOVA; Conclusion (PCA).
Introduction
Language used: R
Database used: The World Happiness Report 2021, a global survey of how people evaluate their own lives in more than 150 countries worldwide.
Packages used: dplyr: to obtain a general descriptive analysis of the data. corrplot: to create a correlation matrix. readxl: to import the database (....

World Happiness Report - A Principal Components Analysis [2/2]

Table of Contents: Introduction; Descriptive Analysis; Outliers; Principal Components Analysis; Eigenvalues; Loadings; Correlation between components; Components 1 (Quality of Life) and 2 (Economy/Corruption); Components 1 (Quality of Life) and 3 (Social Support/Corruption); Components 2 (Economy/Corruption) and 3 (Social Support/Corruption); Conclusion.
Introduction: Please refer to the Introduction of “World Happiness Report - A Cluster Analysis [1/2]”, as this is the second part of this work.
Descriptive Analysis: Please refer to the Descriptive Analysis of “World Happiness Report - A Cluster Analysis [1/2]”, as this is the second part of this work....