Chapter 1

Data Scientist: The Sexiest Job of the 21st Century https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
Top Trends in the Gartner Hype Cycle for Emerging Technologies 2017 http://www.gartner.com/smarterwithgartner/top-trends-in-the-gartner-hype-cycle-for-emerging-technologies-2017

Chapter 2

Roles

Close look at Data Scientist vs Data Engineer http://www.techiexpert.com/close-look-data-scientist-vs-data-engineer/
The Data Science Venn Diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
How Do I Become a Data Scientist? https://advanceddataanalytics.net/2015/05/12/how-do-i-become-a-data-scientist/
The New Data Scientist Venn Diagram https://whatsthebigdata.com/2016/07/08/the-new-data-scientist-venn-diagram/

Online Courses and Resources

Online courses: Coursera https://www.coursera.org/
Online courses: edX https://www.edx.org/
Online courses and many books published by O’Reilly https://www.safaribooksonline.com/
Online courses: Udacity https://eu.udacity.com/
https://www.udacity.com/
Online courses: Udemy https://www.udemy.com/

Further Information

SystemML (now SystemDS) https://systemds.apache.org/
DevOps, und alle ziehen an einem Strang https://www.cloudcomputing-insider.de/devops-und-alle-ziehen-an-einem-strang-a-501139/
DevOps: Schluss mit den Grenzen zwischen Entwicklung und Operations https://de.atlassian.com/devops

Chapter 3

Challenges

Weshalb die meisten Big-Data-Projekte scheitern https://www.datacenter-insider.de/weshalb-die-meisten-big-data-projekte-scheitern-a-417085/
Woran Big-Data-Analysen wirklich scheitern https://www.bigdata-insider.de/woran-big-data-analysen-wirklich-scheitern-a-677594/
Berater scheitern an Data Analytics https://www.cio.de/a/berater-scheitern-an-data-analytics,3580190
BARC-Studie: Data-Preparation-Initiativen scheitern oft an Fachkräftemangel und fehlender Management-Unterstützung https://barc.de/news/barc-studie-data-preparation-initiativen-scheitern-oft-an-fachkraftemangel-und-fehlender-management-unterstutzung
Why Silicon Valley’s ‘Fail Fast’ Mantra Is Just Hype https://www.forbes.com/sites/robasghar/2014/07/14/why-silicon-valleys-fail-fast-mantra-is-just-hype/#170c1a6d24bc

CRISP-DM

KDD, SEMMA and CRISP-DM: A parallel overview http://recipp.ipp.pt/bitstream/10400.22/136/3/KDD-CRISP-SEMMA.pdf
CRISP-DM 1.0 – Step-by-step data mining guide https://www.the-modeling-agency.com/crisp-dm.pdf
The CRISP-DM User Guide https://s2.smu.edu/~mhd/8331f03/crisp.pdf
Why Continuous Learning is the key towards Machine Intelligence https://medium.com/@vlomonaco/why-continuous-learning-is-the-key-towards-machine-intelligence-1851cb57c308

SCRUM, KANBAN, Machine Learning Canvas

Scrum-Einführung http://scrum-master.de/Scrum-Einfuehrung
Scrum – Ein kurzer Blick auf die Verwendung des Scrum-Frameworks in der Softwareentwicklung https://de.atlassian.com/agile/scrum
Sind wir schon da? – Die Definition of Done (DOD) https://www.scrum.de/sind-wir-schon-da-die-definition-of-done-dod/
KANBAN Board Simulation http://www.kanbansim.org/

Cloud

Data Lakes and Analytics on AWS https://aws.amazon.com/de/products/analytics/
Azure Analysis Services https://azure.microsoft.com/en-us/services/analysis-services/
Azure Big data and analytics https://azure.microsoft.com/en-us/solutions/big-data/
IBM Analytics Services https://www.ibm.com/cloud/analytics
IBM Analytics Engine https://www.ibm.com/cloud/analytics-engine
Data and analytics services on IBM cloud https://www.ibm.com/cloud/data
IBM Watson Studio https://www.ibm.com/de-de/cloud/watson-studio
Comparing Machine Learning as a Service: Amazon, Microsoft Azure, Google Cloud AI https://www.kdnuggets.com/2018/01/mlaas-amazon-microsoft-azure-google-cloud-ai.html
Simple Monthly Calculator http://calculator.s3.amazonaws.com/index.html

Further Information

Was ist Business Intelligence – BI? https://www.bigdata-insider.de/was-ist-business-intelligence-bi-a-563185/
Data science platform Kaggle https://www.kaggle.com/

Chapter 4

Die 9 V von Big Data https://blog.qsc.de/2016/08/die-9-v-von-big-data/
13 V’s in Big Data http://www.godatafy.com/tag/13-vs-in-big-data/
Alternative: tdwi – The 10 Vs of Big Data https://tdwi.org/articles/2017/02/08/10-vs-of-big-data.aspx
Was ist Big Data? – Eine Definition mit fünf V https://blog.unbelievable-machine.com/was-ist-big-data-definition-f%C3%BCnf-v
Smart Data Newsletter https://www.digitale-technologien.de/DT/Redaktion/DE/Downloads/Publikation/SmartData_NL1.pdf?__blob=publicationFile&v=5
Attacking Machine Learning with Adversarial Examples https://blog.openai.com/adversarial-example-research/
Aktuelles Schlagwort “Semi-strukturierte Daten” https://www.en.pms.ifi.lmu.de/publications/PMS-FB/PMS-FB-2001-9.pdf
The world’s most valuable resource is no longer oil, but data https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data
Is Your Company’s Data Actually Valuable in the AI Era? https://hbr.org/2018/01/is-your-companys-data-actually-valuable-in-the-ai-era

Chapter 5

Methods

Top Data Science and Machine Learning Methods Used in 2017 https://www.kdnuggets.com/2017/12/top-data-science-machine-learning-methods.html?utm_content=buffer8610e&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
Regular Expressions 101 https://regex101.com/
RegExr https://regexr.com/
Python Data Preparation Case https://www.kdnuggets.com/2017/09/python-data-preparation-case-files-group-based-imputation.html
Principal Component Analysis http://setosa.io/ev/principal-component-analysis/
Introduction to Principal Components and Factor Analysis ftp://statgen.ncsu.edu/pub/thorne/molevoclass/AtchleyOct19.pdf
A One-Stop Shop for Principal Component Analysis https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c
The Random Forest Algorithm https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76
Tankerkönig https://creativecommons.tankerkoenig.de/
StatsModels – Statistics in Python http://www.statsmodels.org/
Ein Jahr Markttransparenzstelle für Kraftstoffe (MTS-K): Eine erste Zwischenbilanz https://www.bundeskartellamt.de/SharedDocs/Publikation/DE/Berichte/Ein_Jahr_MTS-K_Marginalsp.pdf?__blob=publicationFile&v=10
Forecasting: Principles and Practice https://www.otexts.org/fpp/
Predicting house value using regression analysis https://towardsdatascience.com/regression-analysis-model-used-in-machine-learning-318f7656108a
Least-Squares Regression https://faculty.elgin.edu/dkernler/statistics/ch04/4-2.html
Coursera – Deep Learning Specialization https://www.coursera.org/specializations/deep-learning
Linear Regression: Implementation, Hyperparameters and their Optimizations http://pavelbazin.com/post/linear-regression-hyperparameters/#linear-regression-implementation-hyperparameters-and-their-optimizations
scikit-learn – Logistic Regression http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
What are kernels in machine learning and SVM and why do we need them? https://www.quora.com/What-are-kernels-in-machine-learning-and-SVM-and-why-do-we-need-them
Mahalanobis-Distanz http://www.statistics4u.com/fundstat_germ/ee_mahalanobis_distance.html
Distance and Similarity Measures Effect on the Performance of K-Nearest Neighbor Classifier – A Review https://arxiv.org/pdf/1708.04321.pdf
The distance function effect on k-nearest neighbor classification for medical datasets https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978658/
Introduction to k-Nearest Neighbors: Simplified (with implementation in Python) https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/
6 Easy Steps to Learn Naive Bayes Algorithm (with codes in Python and R) https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
Introduction to One-class Support Vector Machines http://rvlasveld.github.io/blog/2013/07/12/introduction-to-one-class-support-vector-machines/
Python for Image Understanding: Deep Learning with Convolutional Neural Nets https://www.slideshare.net/roelofp/python-for-image-understanding-deep-learning-with-convolutional-neural-nets
10 Ways Machine Learning Is Revolutionizing Manufacturing In 2018 https://www.forbes.com/sites/louiscolumbus/2018/03/11/10-ways-machine-learning-is-revolutionizing-manufacturing-in-2018/#2a267fa023ac
Machine Learning in Manufacturing – Present and Future Use-Cases https://emerj.com/ai-sector-overviews/machine-learning-in-manufacturing/
The Neural Network Zoo http://www.asimovinstitute.org/neural-network-zoo/
Top 8 Free Must-Read Books on Deep Learning https://www.kdnuggets.com/2018/04/top-free-books-deep-learning.html
Deep Learning – An MIT Press book http://www.deeplearningbook.org
Neural Networks and Deep Learning http://neuralnetworksanddeeplearning.com/
Neural Networks and Learning Machines (Third Edition) https://cours.etsmtl.ca/sys843/REFS/Books/ebook_Haykin09.pdf
Selecting the number of clusters with silhouette analysis on KMeans clustering https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html

Databases and ETL Tools

w3schools.com – SQL Tutorial https://www.w3schools.com/sql/
Einführung in SQL – Datenbanken bearbeiten https://upload.wikimedia.org/wikibooks/de/d/d3/Einf%C3%BChrung_in_SQL.pdf
List of NoSQL Databases http://nosql-database.org/
The history of Hadoop: From 4 nodes to the future of data https://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/
The history of Hadoop https://medium.com/@markobonaci/the-history-of-hadoop-68984a11704
Hadoop Tutorial: All you need to know about Hadoop! https://www.edureka.co/blog/hadoop-tutorial/
Download Cloudera Enterprise https://www.cloudera.com/downloads.html
Downloads für Connected-Data-Plattformen von Hortonworks https://de.hortonworks.com/downloads/
-> Hortonworks has since become part of Cloudera
Safari Books Online https://www.safaribooksonline.com/
Talend https://de.talend.com/
Informatica https://www.informatica.com/de
Apache Kafka http://kafka.apache.org/
Confluent https://www.confluent.io/
NiFi https://nifi.apache.org/
Hortonworks Data Platform https://de.hortonworks.com/products/data-platforms/hdf/
-> Hortonworks has since become part of Cloudera
Apache Spark Streaming https://spark.apache.org/streaming/
Lambda Architecture http://lambda-architecture.net/

Analytics Tools

KNIME https://www.knime.com/
RapidMiner https://rapidminer.com/
Now Altair RapidMiner https://altair.com/altair-rapidminer
Wikipedia – RapidMiner https://de.wikipedia.org/wiki/RapidMiner
Wikipedia – KNIME https://de.wikipedia.org/wiki/KNIME
Anaconda https://www.anaconda.com/distribution/
WinPython http://winpython.sourceforge.net/
Jetbrains PyCharm https://www.jetbrains.com/pycharm/
PyDev http://www.pydev.org/
Visual Studio Code https://code.visualstudio.com
Notepad++ https://notepad-plus-plus.org/
Matplotlib https://matplotlib.org/
Text Mining Online http://textminingonline.com/category/nltk
RStudio https://www.rstudio.com/
https://posit.co/download/rstudio-desktop/
The Comprehensive R Archive Network https://cran.r-project.org/
swirl http://swirlstats.com/
RStudio Cheat Sheets https://www.rstudio.com/resources/cheatsheets/
https://posit.co/resources/cheatsheets/
20 Most Popular R packages http://makemeanalyst.com/20-most-popular-r-packages/
RStudio Cheat Sheets – Data Visualization https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf
RStudio Cheat Sheets – Strings https://github.com/rstudio/cheatsheets/blob/master/strings.pdf
Project Jupyter http://jupyter.org/
Apache Zeppelin https://zeppelin.apache.org/
Tableau https://www.tableau.com/
Apache Superset https://superset.incubator.apache.org/

Further Reading

KDNuggets https://www.kdnuggets.com/
Data Science Central https://www.datasciencecentral.com/
Towards Data Science https://towardsdatascience.com/

Chapter 6

Process Mining

Interview – Process Mining ist ein wichtiger Treiber der Prozessautomatisierung https://data-science-blog.com/blog/2017/10/19/interview-prof-scheer-process-mining-automation/
Celonis https://www.celonis.com/de/
Fluxicon https://fluxicon.com/disco/
Process Mining http://www.processmining.org/tools/start
https://www.processmining.org/software.html#software
Dataset – Production Analysis with Process Mining Technology https://data.4tu.nl/articles/dataset/Production_Analysis_with_Process_Mining_Technology/12697997
ProM Tools http://www.promtools.org/doku.php
https://promtools.org/prom-documentation/
Online Course: Introduction to Process Mining with ProM https://www.futurelearn.com/courses/process-mining
Alpha Miner https://www.futurelearn.com/courses/process-mining/0/steps/15637
Fuzzy Miner http://processmining.org/online/fuzzyminer

Reports

Data.gov – Consumer Complaint Database https://catalog.data.gov/dataset/consumer-complaint-database
German Stopwords https://github.com/solariz/german_stopwords/blob/master/german_stopwords_full.txt
nltk.stem package http://www.nltk.org/api/nltk.stem.html
German stemming algorithm http://snowball.tartarus.org/algorithms/german/stemmer.html
Stemming and Lemmatization with Python NLTK http://text-processing.com/demo/stem/
corpus: Text Corpus Analysis https://cran.r-project.org/web/packages/corpus/
https://cran.r-project.org/src/contrib/Archive/corpus/
scikit-learn – Logistic Regression http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
scikit-learn – Working with text data https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
https://scikit-learn.org/stable/auto_examples/text/index.html
Machine Learning, NLP: Text Classification using scikit-learn, python and NLTK. https://towardsdatascience.com/machine-learning-nlp-text-classification-using-scikit-learn-python-and-nltk-c52b92a7c73a

Maintenance

automotiveIT: Predictive Maintenance enttäuscht Erwartungen https://www.automotiveit.eu/predictive-maintenance-enttaeuscht-erwartungen/news/id-0060652
automotiveIT: Predictive Maintenance fristet Schattendasein https://www.automotiveit.eu/predictive-maintenance-fristet-schattendasein/news/id-0060169
Industrie 4.0 Index: Predictive Maintenance bleibt noch deutlich hinter den Erwartungen https://www.staufen.ag/de/unternehmen/news-events/news/newsdetail/2018/02/industrie-40-index-predictive-maintenance-bleibt-noch-deutlich-hinter-den-erwartungen/
6. Turbofan Engine Degradation Simulation Data Set https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/#turbofan
https://www.nasa.gov/intelligent-systems-division/discovery-and-systems-health/pcoe/pcoe-data-set-repository/
Modular Aero-Propulsion System Simulations – MAPSS, C-MAPSS, C-MAPSS40k https://www.grc.nasa.gov/www/cdtb/software/mapss.html
Getting Started with Predictive Maintenance Models https://www.svds.com/getting-started-predictive-maintenance-models/
Predictive Maintenance for IoT https://www.svds.com/predictive-maintenance-iot/
Data analysis and processing techniques for remaining useful life estimations https://rdw.rowan.edu/cgi/viewcontent.cgi?article=3433&context=etd
GitHub – Apache Spark – Turbofan Engine Degradation Simulation Data Set example in Apache Spark https://github.com/oluies/tedsds

Transport

Bureau Of Transportation Statistics: Freight Analysis Framework https://www.bts.gov/faf
Seaborn: statistical data visualization https://seaborn.pydata.org/
Geopy https://geopy.readthedocs.io/en/stable/
Great Circle Maps for Python https://github.com/paulgb/gcmap

Last checked: 21.09.2025