This online book is about the data management, storage, and manipulation tools common in data science. Applying them to real scenarios will help you understand the deeper concepts of data science.
Data science is an emerging field of study concerned with the collection, storage, preparation, analysis, visualization, analytics, reporting, and management of large collections of data. Much of the data in the world is non-numeric and unstructured, which means that it is not arranged in rows and columns. Words, lists, photographs, sounds, and other kinds of information are fundamental data that need to be analysed in order to build predictive models.
Data is being collected from many different sources, such as connected devices (watches, wristbands, electronic scales, ...), bank cards, shopping center loyalty cards, smartphones, mobile apps, POS systems, e-commerce sites, medical imaging, etc. With the ability to collect and analyze more data, and more types of data, than ever before, businesses are in an unprecedented position to quantify and predict what works, what does not work, and why. Data science is an interdisciplinary field broader than statistics. Although much of its theoretical basis comes from statistics, it also draws on other skills such as programming, business intelligence, and the sciences.
Typical predictive analytics goals span very different subjects: predicting which medical treatment will succeed against a cancer, which part of the brain will be active during different tasks or should be trained to fight specific pathologies, which products and services will be best aligned with customer needs and desires, who will win the next election, which advertisements will be clicked on the most, etc.
The data scientist is responsible for acquiring the data, managing the data, choosing the modeling technique, writing the code, and verifying the results. This online book focuses on the practical skills required of a data scientist.
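To make these responsibilities concrete, here is a minimal sketch of the acquire, manage, model, and verify steps in Python, using only the standard library. Everything in it is an assumption made for illustration: the toy data set is invented, the function names (`acquire`, `manage`, `fit_line`, `verify`) are hypothetical, and a least-squares line stands in for whatever modeling technique a real project would call for.

```python
import csv
import io

# Hypothetical toy data, made up for illustration: hours studied vs. exam score.
# One record is deliberately incomplete to show the "managing" step.
RAW_CSV = """hours,score
1,52
2,54
3,
4,58
5,60
6,62
"""


def acquire(text):
    """Acquire: read raw records from a CSV source (here, an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text)))


def manage(rows):
    """Manage: drop incomplete records and convert text fields to numbers."""
    return [
        (float(r["hours"]), float(r["score"]))
        for r in rows
        if r["hours"] and r["score"]
    ]


def fit_line(points):
    """Model: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def verify(model, points):
    """Verify: mean absolute error of the model on held-out points."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)


rows = acquire(RAW_CSV)
clean = manage(rows)
train, test = clean[:-1], clean[-1:]  # hold out the last record for checking
model = fit_line(train)
error = verify(model, test)
print("slope, intercept:", model)
print("held-out mean absolute error:", error)
```

Holding out a record that the model never saw during fitting is the simplest form of verification; real projects typically use larger test sets and more careful validation schemes.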