I started by learning the programming language Python, beginning with something easy:
and of course talking with friends who use Python.
‘Data Science’, ‘Machine Learning’ and ‘Data Mining’ may seem like big, confusing terms to a biologist. But once you read into them a little, it turns out that we biologists actually use ‘Data Science’ on a daily basis, albeit in a rather rudimentary form.
For example, when we want to do a Western blot, first we need to determine the protein concentration of each of our samples. We do that by finding a standard curve:
This is a very simplified version of ‘Machine Learning’, also called ‘Statistical Learning’: you start with a ‘training’ data set of known predictor values (color intensity) and known response values (protein concentration). The standard curve (protein concentration versus color intensity) is a model fitted to that training data, which can then be used to predict the protein concentration of unknown samples!
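To make the analogy concrete, here is a minimal sketch of a standard curve as a straight-line fit in Python. It assumes the curve is linear over the measured range, which is not always true of real assays, and every number in it is made up for illustration rather than taken from an experiment. The function names `fit_standard_curve` and `predict_concentration` are my own, not from any library.

```python
def fit_standard_curve(intensities, concentrations):
    """Fit concentration = slope * intensity + intercept by least squares."""
    n = len(intensities)
    mean_x = sum(intensities) / n
    mean_y = sum(concentrations) / n
    # Spread of the predictor, and how predictor and response vary together
    sxx = sum((x - mean_x) ** 2 for x in intensities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(intensities, concentrations))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_concentration(intensity, slope, intercept):
    """Use the fitted line to estimate the concentration of an unknown sample."""
    return slope * intensity + intercept


# Hypothetical 'training' data: standards with known protein concentration.
# These values are invented for illustration only.
standard_intensities = [0.05, 0.15, 0.25, 0.35, 0.45]   # color intensity (absorbance)
standard_concentrations = [0.0, 0.5, 1.0, 1.5, 2.0]     # protein concentration (mg/mL)

slope, intercept = fit_standard_curve(standard_intensities, standard_concentrations)

# An 'unknown' sample: we measured its color intensity and predict its concentration.
unknown_intensity = 0.30
estimate = predict_concentration(unknown_intensity, slope, intercept)
print(f"Estimated concentration: {estimate:.2f} mg/mL")
```

The standards play the role of the training set, the fitted slope and intercept are the model, and the last call is the prediction step. Fancier methods mostly change how the model is fitted and how many predictors go into it.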
The fun part, then, is learning about all the other mathematical and statistical models that can be used on all sorts of data, including cases where there is not just one predictor but tens of them, all used together to predict one outcome.