The following analysis is part of Udacity’s Data Analyst Nanodegree program, which requires students to use Python’s pandas and NumPy libraries to perform basic data analysis on Kaggle’s Titanic passenger dataset.
My analysis focused first on the general statistics you would expect for a dataset like this (distribution of age, gender, class, etc.) and then compared survival rates across different groups of passengers. This project was illuminating: it gave me insight into the data analysis process, forced me to understand the limitations of my analysis, and taught me to be careful not to make too many assumptions about the data or about my conclusions.
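As a rough illustration of that kind of group comparison, here is a minimal pandas sketch. It is not the code from the analysis itself: the rows are made up for the example and are not real passenger records, the helper `survival_rate_by` is a hypothetical name, and the column names (`Survived`, `Pclass`, `Sex`, `Age`) are assumed to follow Kaggle's `train.csv`.

```python
import pandas as pd

# Hypothetical toy rows, NOT real Titanic records.
# Column names are assumed to match Kaggle's train.csv.
passengers = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Pclass": [1, 3, 2, 3, 1, 2],
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Age": [29.0, 35.0, None, 22.0, 54.0, 41.0],
})


def survival_rate_by(df, column):
    """Return the fraction of survivors within each group of `column`."""
    # Survived is coded 0/1, so the group mean is the survival rate.
    return df.groupby(column)["Survived"].mean()


# General statistics: describe() skips the missing age automatically.
age_summary = passengers["Age"].describe()

# Survival rates compared across groups.
rate_by_sex = survival_rate_by(passengers, "Sex")
rate_by_class = survival_rate_by(passengers, "Pclass")

print(age_summary)
print(rate_by_sex)
print(rate_by_class)
```

Because `describe()` and `mean()` silently skip missing values, it is worth checking how many rows actually contribute to each figure before drawing conclusions from it, which is one of the limitations mentioned above.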
In the future, I would like to apply classification algorithms to predict which groups were more likely to have survived, based on a set of passenger characteristics.
The analysis can be found here in the repository.