This is my first blog post
Data Science Articles
Q: What do you think being a data scientist is about? $~$ A: I view data science as the intersection of mathematics/statistics and computer science. Specifically, I believe that data science is an interdisciplinary field that uses processes, methods, and systems to gain knowledge from data that is noisy, large, and either structured or unstructured.
Q: What do you see as the major duties and/or knowledge areas? $~$ A: I once heard a joke about data scientists being the janitors of the computing world, because of how much time they spend “cleaning” data. Because data scientists work with high-volume data sets, they often run large-scale data ingests and then clean and transform the data into a form that is usable for analysis.
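To give a rough feel for what that cleaning looks like, here is a minimal Python sketch using only the standard library. The inline CSV sample, the column names, and the `clean_rows` helper are all made up for illustration, and real pipelines would have many more rules than these.

```python
import csv
import io

# Hypothetical raw data: stray whitespace, inconsistent casing,
# missing values, a non-numeric age, and a duplicate row.
RAW = """name,age,city
 Alice ,34,Raleigh
bob,,Durham
CAROL,29,
Alice,34,Raleigh
dave,not available,Cary
"""

def clean_rows(text):
    """Normalize fields, drop rows without a usable age, and de-duplicate."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip().title()
        city = row["city"].strip() or None      # empty string becomes None
        age_text = row["age"].strip()
        age = int(age_text) if age_text.isdigit() else None
        record = (name, age, city)
        if age is None or record in seen:       # skip unusable or repeated rows
            continue
        seen.add(record)
        cleaned.append({"name": name, "age": age, "city": city})
    return cleaned

if __name__ == "__main__":
    for record in clean_rows(RAW):
        print(record)
```

Dropping rows with a missing age is only one possible choice here; depending on the analysis, imputing the value or keeping the row could be more appropriate.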
Q: What differences/similarities do you see between data scientists and statisticians? $~$ A: Both are concerned with modelling and creating accurate statistical models. However, the approach is slightly different. I think data scientists are a bit more “applied”: they fit a range of different models to the data, evaluate the predictive accuracy of each, and see which one is the “best”. A statistician relies more on theory to create the model and then uses data to verify whether or not that model was accurate. An analogy I’d use is that data scientists are like engineers, whereas statisticians are more like physicists.
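The "try several models and keep the most predictive one" workflow can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the data are synthetic, the two candidate models (a constant and a least-squares line) are arbitrary choices, and the helper names `fit_constant`, `fit_line`, and `mse` are hypothetical.

```python
from statistics import mean

def fit_constant(xs, ys):
    """Candidate 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: simple least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

# Synthetic data: a linear trend plus a small, fixed noise pattern.
noise = [0.5, -0.3, 0.1, -0.4, 0.2]
xs = list(range(20))
ys = [2 * x + 1 + noise[x % len(noise)] for x in xs]

# Hold out every other point so accuracy is measured on unseen data.
train_x, train_y = xs[0::2], ys[0::2]
test_x, test_y = xs[1::2], ys[1::2]

candidates = {"constant": fit_constant, "line": fit_line}
scores = {
    name: mse(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best_name = min(scores, key=scores.get)

if __name__ == "__main__":
    print(scores, "best:", best_name)
```

The important design choice is that each model is scored on held-out points rather than on the data it was fitted to, which is what makes the comparison about predictive accuracy rather than goodness of fit.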
Q: How do you view yourself in relation to these two areas? $~$ A: I view myself primarily as a statistician, but I often do “data science-y” tasks for my job. I’m working on Bayesian inference in modelling and model evaluation. I spend a lot of time writing proofs and then translating them into pseudocode, from which I can then write actual code. However, I often have to clean data when I build models.