I embarked on this project to gain experience with financial data and to get more practice with R. The dataset I use for this analysis is a stripped-down version of LendingClub’s complete loan data for loans issued between 2007 and 2015. LendingClub, a lending firm started in 2007, manages crowdfunding between disparate borrowers and investors. In this analysis, I propose that useful risk and return estimates can be derived from LendingClub’s public historical data.
This dataset turned out to be far more challenging than I had expected. The initial problem was deciding how to clean and transform the data based on what I wanted to know. Of course, whenever I wanted to know something new, I would often go back and transform the data again; it was a cyclical process rather than a linear one. Another part of the challenge was wrapping my head around the large number of variables at my disposal. It’s tempting to look at every possible combination of relationships among them, but I had to narrow my scope and accept that I might not understand everything in one fell swoop.
I did enjoy learning about the lending process and some of its intricacies under the hood. This type of knowledge might be materially useful in the future. I also enjoyed using the R language and RStudio.
For future work, I could look more closely at how geography factors into the risk-versus-reward trade-off. Eventually, understanding how to boost profitability from the investor side would be very useful.
Link to the analysis: http://rpubs.com/culight/310410