Big Data without analysis is hardly anything but dead weight. But how do we analyse it? Finding suitable algorithms is one of the data scientist's jobs. However, we want not only to explore our data but also to automate the process by building systems that analyse the data for us. A solution should enable research, meet industry demands and support continuous delivery in the technology transfer from research to operations.
For this we need a Big Data Science Architecture. Why? Because in projects, Big Data and Data Science cannot be handled separately, since they influence each other. Their complexities do not simply add up but multiply, so to speak: Big Data Science ≠ Big Data + Data Science, but rather Big Data Science = Big Data · Data Science. (Luckily, the gain multiplies as well.) This boost in complexity arises from the clash of two different worlds: scientific research programming (Data Science) and enterprise software engineering (Big Data). The former thrives on exploratory experiments, which are often messy, ad hoc and uncertain in their findings. The latter, however, requires code quality and fail-safe operation. In industrial settings, these are achieved through well-defined processes that emphasize access control as well as automated testing and deployment.
We present a blueprint for a Big Data Science Architecture. It covers data cleaning, feature derivation and machine learning, using both batch and real-time engines. It spans the entire lifecycle with three environments (experiments, close-to-live tests and live operations), enabling creativity while ensuring fail-safe operation. It takes into account the needs of all three groups involved: data scientists, software engineers and operations administrators.
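To make the processing chain more concrete, here is a minimal sketch of the three stages (cleaning, feature derivation, machine learning) shared by a batch path and a real-time path. It is not the architecture presented in the talk: the `Reading` record, the function names and the toy "model" (a mean baseline with a fixed anomaly tolerance) are all hypothetical, chosen only to show the design idea of reusing one cleaning and feature function in both engines.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Iterator, List, Optional


# Hypothetical record type: one raw sensor reading, possibly incomplete.
@dataclass
class Reading:
    device_id: str
    value: Optional[float]


def clean(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Data cleaning: drop readings with missing values."""
    for r in readings:
        if r.value is not None:
            yield r


def derive_features(r: Reading) -> dict:
    """Feature derivation: the same function serves batch and real-time paths."""
    return {"device_id": r.device_id, "value": r.value}


def train(features: List[dict]) -> float:
    """Toy 'model' (assumption): learn the mean value as an anomaly baseline."""
    return mean(f["value"] for f in features)


def score(model: float, features: dict, tolerance: float = 10.0) -> bool:
    """Flag a reading as anomalous if it deviates too far from the baseline."""
    return abs(features["value"] - model) > tolerance


def batch_train(history: Iterable[Reading]) -> float:
    """Batch engine: clean and featurize historical data, then train."""
    return train([derive_features(r) for r in clean(history)])


def realtime_score(model: float, event: Reading) -> Optional[bool]:
    """Real-time engine: apply the same cleaning and features to one event."""
    cleaned = list(clean([event]))
    if not cleaned:
        return None  # event rejected by the cleaning stage
    return score(model, derive_features(cleaned[0]))
```

Sharing `clean` and `derive_features` between both engines is one way to keep batch-trained models and real-time scoring consistent; a production system would of course run these stages on dedicated batch and streaming engines rather than plain Python iterators.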
In the experimental environment, data can be explored creatively. Thanks to strict read governance, no critical systems are endangered, even if things get messy. Once algorithms have been developed, they are transferred to the test environment, which is built the same way as the live-operations environment. There, each algorithm is adapted to run in automated operation and is tested thoroughly. Once accepted, the algorithms are deployed to live operations.
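The lifecycle described above can be read as a small state machine with gates between environments. The following sketch is purely illustrative: the `Env` and `Algorithm` types, the `promote` function and the rule that only the live environment writes to production systems are assumptions made for this example, not details taken from the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Env(Enum):
    EXPERIMENT = 1
    TEST = 2  # close-to-live: built the same way as LIVE
    LIVE = 3


# Assumption: only LIVE may write to production systems, so messy
# exploration in EXPERIMENT cannot endanger critical systems.
def may_write_production(env: Env) -> bool:
    return env is Env.LIVE


@dataclass
class Algorithm:
    name: str
    env: Env = Env.EXPERIMENT
    tests_passed: bool = False
    accepted: bool = False


def promote(algo: Algorithm) -> Algorithm:
    """Move an algorithm one stage forward, enforcing the quality gates."""
    if algo.env is Env.EXPERIMENT:
        # Technology transfer: adapt the algorithm for automated operation.
        algo.env = Env.TEST
    elif algo.env is Env.TEST:
        if not (algo.tests_passed and algo.accepted):
            raise ValueError(f"{algo.name}: tests and acceptance required before going live")
        algo.env = Env.LIVE
    else:
        raise ValueError(f"{algo.name} is already live")
    return algo
```

In this sketch an algorithm starts in `EXPERIMENT`, moves to `TEST` on transfer, and reaches `LIVE` only after both thorough testing and acceptance; any attempt to skip the gate raises an error instead of deploying.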
Speaker: Dr. Rick Moritz, WidasConcepts Unternehmensberatung GmbH
After completing his doctorate at France Télécom Orange in Meylan (France), Dr. Rick Moritz joined WidasConcepts in 2014 as a Big Data Consultant and quickly advanced to Senior Consultant Big Data with a focus on Data Science. His responsibilities include the analysis and virtualization of large datasets as well as the development of Data Science concepts. As an innovative IT consulting company, WidasConcepts helps its customers shape their business processes successfully. Whether Big Data, Internet of Things or mobile and web solutions, WidasConcepts delivers modern, future-proof concepts and solutions!