Conclusion to the Course
Rina Deka 2023-07-24
What are your current thoughts on using R for data science? Do you think you’ll continue to use R going forward? Why or why not?
Absolutely! I tend to think in R more than I think in Python, even though I use Python much more often than R. R is a must-have language for statisticians: it is really useful for plotting, and many of its packages are well documented (such as the BAS package, if you’re a Bayesian).
What things are you going to do differently in practice now that you’ve had this course?
Documentation, documentation, documentation! This course has definitely helped me in my actual work. I’ve learned how important it is to write readable, well-documented code so that collaborators can actually use it, especially for anything Git-related (GitHub, GitLab, and so on). I’m also learning to debug code more carefully, so issues are a little easier to resolve. Overall, the course has helped me think more systematically about how I write and communicate my code. I also plan on using GitHub a lot more, since I’m already using it more now than I have in any other work situation.
What areas of statistics/data science are you thinking about exploring further?
The area of statistics/data science that I’m most interested in exploring further right now is functional data analysis! We talked a little in this course about PCA (an unsupervised learning technique), and recently at work I discovered functional PCA (FPCA), a method for investigating the dominant modes of variation in functional data. With FPCA we’re still concerned with dimension reduction, except that instead of dealing with vectors, we’re dealing with functions. As a result, the principal components are also functions (i.e., curves). This is really useful for some projects I’m doing on spectral signature detection. I’m also interested in exploring hierarchical modelling, and I’m currently doing a lot with Gaussian processes, which are a Bayesian method and kind of a “hot topic” in statistical learning, although they’re rarely covered in classes on statistical (machine/deep) learning.
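As a rough illustration of the FPCA idea, here is a minimal sketch in plain Python (standard library only). It is not taken from the course or from my work projects. It assumes the curves are observed without noise on a common, evenly spaced grid, so FPCA reduces to an eigen-decomposition of the sample covariance evaluated on that grid. The helper names `simulate_curves` and `first_fpc` are hypothetical, the data are simulated, and a real analysis would usually add smoothing or a basis expansion first.

```python
import math
import random


def simulate_curves(n=100, m=25, seed=1):
    """Simulate n toy curves on an evenly spaced grid of m points in [0, 1].

    Hypothetical example data: each curve mixes a dominant mode sin(pi * t)
    with a much weaker mode sin(2 * pi * t).
    """
    rng = random.Random(seed)
    grid = [j / (m - 1) for j in range(m)]
    curves = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)  # score on the dominant mode
        b = rng.gauss(0.0, 0.2)  # score on the minor mode
        curves.append([a * math.sin(math.pi * t) + b * math.sin(2 * math.pi * t)
                       for t in grid])
    return grid, curves


def first_fpc(curves, n_iter=100):
    """Estimate the first functional principal component on the grid.

    Returns the mean curve, the leading component (unit Euclidean norm),
    and the share of total variance that the component explains.
    """
    n, m = len(curves), len(curves[0])
    # Mean function, evaluated pointwise on the grid.
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centered = [[c[j] - mean[j] for j in range(m)] for c in curves]
    # Sample covariance surface C(s, t), evaluated on the grid.
    cov = [[sum(r[j] * r[k] for r in centered) / (n - 1) for k in range(m)]
           for j in range(m)]
    # Power iteration for the leading eigenvector of the covariance matrix.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(n_iter):
        w = [sum(cov[j][k] * v[k] for k in range(m)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue (v has unit norm).
    eigenvalue = sum(v[j] * sum(cov[j][k] * v[k] for k in range(m))
                     for j in range(m))
    total_variance = sum(cov[j][j] for j in range(m))
    return mean, v, eigenvalue / total_variance


grid, curves = simulate_curves()
mean, component, share = first_fpc(curves)
print("share of variance explained by the first component:", round(share, 3))
```

The returned `component` is itself a curve over `grid`, which is the point made above: the principal components of functional data are functions. In this toy setup it should closely follow the shape of sin(pi * t), up to sign and scale, because that mode was simulated with the largest variance.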