
Here you will find links to all online resources referenced or otherwise mentioned in the book.

Last updated: 19 July 2019

 


Chapter 1

Data Scientist: The Sexiest Job of the 21st Century  https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
Top Trends in the Gartner Hype Cycle for Emerging Technologies 2017 http://www.gartner.com/smarterwithgartner/top-trends-in-the-gartner-hype-cycle-for-emerging-technologies-2017/

Chapter 2

Roles

Close look at Data Scientist vs Data Engineer http://www.techiexpert.com/close-look-data-scientist-vs-data-engineer/
The Data Science Venn Diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
How Do I Become a Data Scientist? https://advanceddataanalytics.net/2015/05/12/how-do-i-become-a-data-scientist/
The New Data Scientist Venn Diagram  https://whatsthebigdata.com/2016/07/08/the-new-data-scientist-venn-diagram/

  
Online courses and resources

Online courses https://www.coursera.org/
Online courses https://www.edx.org/
Online courses and many books published by O'Reilly https://www.safaribooksonline.com/
Online courses https://eu.udacity.com/
Online courses https://www.udemy.com/

 

Further information

SystemML (now SystemDS) https://systemds.apache.org/
DevOps, und alle ziehen an einem Strang https://www.cloudcomputing-insider.de/devops-und-alle-ziehen-an-einem-strang-a-501139/
DevOps: Schluss mit den Grenzen zwischen Entwicklung und Operations  https://de.atlassian.com/devops

 

 


Chapter 3

Challenges

Weshalb die meisten Big-Data-Projekte scheitern https://www.datacenter-insider.de/weshalb-die-meisten-big-data-projekte-scheitern-a-417085/
Woran Big-Data-Analysen wirklich scheitern https://www.bigdata-insider.de/woran-big-data-analysen-wirklich-scheitern-a-677594/
Berater scheitern an Data Analytics https://www.cio.de/a/berater-scheitern-an-data-analytics,3580190
BARC-Studie: Data-Preparation-Initiativen scheitern oft an Fachkräftemangel und fehlender Management-Unterstützung https://barc.de/news/barc-studie-data-preparation-initiativen-scheitern-oft-an-fachkraftemangel-und-fehlender-management-unterstutzung
Why Silicon Valley's 'Fail Fast' Mantra Is Just Hype https://www.forbes.com/sites/robasghar/2014/07/14/why-silicon-valleys-fail-fast-mantra-is-just-hype/#170c1a6d24bc

 

CRISP-DM

KDD, SEMMA and CRISP-DM: A parallel overview http://recipp.ipp.pt/bitstream/10400.22/136/3/KDD-CRISP-SEMMA.pdf
CRISP-DM 1.0 - Step-by-step data mining guide https://www.the-modeling-agency.com/crisp-dm.pdf
The CRISP-DM User Guide https://s2.smu.edu/~mhd/8331f03/crisp.pdf
Why Continuous Learning is the key towards Machine Intelligence https://medium.com/@vlomonaco/why-continuous-learning-is-the-key-towards-machine-intelligence-1851cb57c308

 

Scrum, Kanban, Machine Learning Canvas

Scrum-Einführung http://scrum-master.de/Scrum-Einfuehrung
Scrum - Ein kurzer Blick auf die Verwendung des Scrum-Frameworks in der Softwareentwicklung https://de.atlassian.com/agile/scrum
Sind wir schon da? – Die Definition of Done (DOD) https://www.scrum.de/sind-wir-schon-da-die-definition-of-done-dod/
Kanban Board Simulation http://www.kanbansim.org/

 

Cloud 

Data Lakes and Analytics on AWS https://aws.amazon.com/de/products/analytics/
Azure Analysis Services https://azure.microsoft.com/en-us/services/analysis-services/
Azure Big data and analytics https://azure.microsoft.com/en-us/solutions/big-data/
IBM Analytics Services https://www.ibm.com/cloud/analytics
IBM Analytics Engine https://www.ibm.com/cloud/analytics-engine
Data and analytics services on IBM cloud https://www.ibm.com/cloud/data
IBM Watson Studio (updated) https://www.ibm.com/de-de/cloud/watson-studio
Comparing Machine Learning as a Service: Amazon, Microsoft Azure, Google Cloud AI https://www.kdnuggets.com/2018/01/mlaas-amazon-microsoft-azure-google-cloud-ai.html
AWS Simple Monthly Calculator http://calculator.s3.amazonaws.com/index.html

 

Further information

Was ist Business Intelligence – BI? https://www.bigdata-insider.de/was-ist-business-intelligence-bi-a-563185/
Kaggle data science platform https://www.kaggle.com/

 


Chapter 4

Die 9 V von Big Data  https://blog.qsc.de/2016/08/die-9-v-von-big-data/ 

13 V’s in Big Data (dead link) http://www.godatafy.com/tag/13-vs-in-big-data/
Alternative: TDWI - The 10 Vs of Big Data https://tdwi.org/articles/2017/02/08/10-vs-of-big-data.aspx

Was ist Big Data? – Eine Definition mit fünf V https://blog.unbelievable-machine.com/was-ist-big-data-definition-f%C3%BCnf-v
Smart Data Newsletter https://www.digitale-technologien.de/DT/Redaktion/DE/Downloads/Publikation/SmartData_NL1.pdf?__blob=publicationFile&v=5
Attacking Machine Learning with Adversarial Examples https://blog.openai.com/adversarial-example-research/
Aktuelles Schlagwort “Semi-strukturierte Daten” https://www.en.pms.ifi.lmu.de/publications/PMS-FB/PMS-FB-2001-9.pdf
The world’s most valuable resource is no longer oil, but data https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data
Is Your Company’s Data Actually Valuable in the AI Era? https://hbr.org/2018/01/is-your-companys-data-actually-valuable-in-the-ai-era

 


Chapter 5

Methods

Top Data Science and Machine Learning Methods Used in 2017 https://www.kdnuggets.com/2017/12/top-data-science-machine-learning-methods.html?utm_content=buffer8610e&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer 
Regular Expressions 101 https://regex101.com/
RegExr https://regexr.com/
Python Data Preparation Case (updated) https://www.kdnuggets.com/2017/09/python-data-preparation-case-files-group-based-imputation.html
Principal Component Analysis http://setosa.io/ev/principal-component-analysis/
Introduction to Principal Components and Factor Analysis ftp://statgen.ncsu.edu/pub/thorne/molevoclass/AtchleyOct19.pdf
A One-Stop Shop for Principal Component Analysis https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c
The Random Forest Algorithm (updated) https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76

Tankerkönig https://creativecommons.tankerkoenig.de/
StatsModels - Statistics in Python http://www.statsmodels.org/
Ein Jahr Markttransparenzstelle für Kraftstoffe (MTS-K): Eine erste Zwischenbilanz https://www.bundeskartellamt.de/SharedDocs/Publikation/DE/Berichte/Ein_Jahr_MTS-K_Marginalsp.pdf?__blob=publicationFile&v=10
Forecasting: Principles and Practice https://www.otexts.org/fpp/

Predicting house value using regression analysis https://towardsdatascience.com/regression-analysis-model-used-in-machine-learning-318f7656108a
Least-Squares Regression https://faculty.elgin.edu/dkernler/statistics/ch04/4-2.html
Coursera - Deep Learning Specialization https://www.coursera.org/specializations/deep-learning
Linear Regression: Implementation, Hyperparameters and their Optimizations http://pavelbazin.com/post/linear-regression-hyperparameters/#linear-regression-implementation-hyperparameters-and-their-optimizations
scikit-learn - Logistic Regression http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
What are kernels in machine learning and SVM and why do we need them? https://www.quora.com/What-are-kernels-in-machine-learning-and-SVM-and-why-do-we-need-them
Mahalanobis-Distanz http://www.statistics4u.com/fundstat_germ/ee_mahalanobis_distance.html
Distance and Similarity Measures Effect on the Performance of K-Nearest Neighbor Classifier – A Review https://arxiv.org/pdf/1708.04321.pdf
The distance function effect on k-nearest neighbor classification for medical datasets https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978658/
Introduction to k-Nearest Neighbors: Simplified (with implementation in Python) https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/
6 Easy Steps to Learn Naive Bayes Algorithm (with codes in Python and R) https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
Introduction to One-class Support Vector Machines http://rvlasveld.github.io/blog/2013/07/12/introduction-to-one-class-support-vector-machines/
Python for Image Understanding: Deep Learning with Convolutional Neural Nets https://www.slideshare.net/roelofp/python-for-image-understanding-deep-learning-with-convolutional-neural-nets
10 Ways Machine Learning Is Revolutionizing Manufacturing In 2018  https://www.forbes.com/sites/louiscolumbus/2018/03/11/10-ways-machine-learning-is-revolutionizing-manufacturing-in-2018/#2a267fa023ac
Machine Learning in Manufacturing – Present and Future Use-Cases https://emerj.com/ai-sector-overviews/machine-learning-in-manufacturing/
The Neural Network Zoo http://www.asimovinstitute.org/neural-network-zoo/
Top 8 Free Must-Read Books on Deep Learning https://www.kdnuggets.com/2018/04/top-free-books-deep-learning.html
Deep Learning - An MIT Press book http://www.deeplearningbook.org/
Neural Networks and Deep Learning http://neuralnetworksanddeeplearning.com/
Neural Networks and Learning Machines (Third Edition) https://cours.etsmtl.ca/sys843/REFS/Books/ebook_Haykin09.pdf
Selecting the number of clusters with silhouette analysis on KMeans clustering https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html

 

Databases and ETL tools

w3schools.com - SQL Tutorial https://www.w3schools.com/sql/
Einführung in SQL - Datenbanken bearbeiten https://upload.wikimedia.org/wikibooks/de/d/d3/Einf%C3%BChrung_in_SQL.pdf
List of NoSQL Databases http://nosql-database.org/
The history of Hadoop: From 4 nodes to the future of data https://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/
The history of Hadoop https://medium.com/@markobonaci/the-history-of-hadoop-68984a11704
Hadoop Tutorial: All you need to know about Hadoop! https://www.edureka.co/blog/hadoop-tutorial/
Download Cloudera Enterprise https://www.cloudera.com/downloads.html
Downloads für Connected-Data-Plattformen von Hortonworks https://de.hortonworks.com/downloads/
Safari Books Online https://www.safaribooksonline.com/
Talend https://de.talend.com/
Informatica https://www.informatica.com/de/
Apache Kafka http://kafka.apache.org/
Confluent https://www.confluent.io/
Apache NiFi https://nifi.apache.org/
Hortonworks DataFlow (HDF) https://de.hortonworks.com/products/data-platforms/hdf/
Apache Spark Streaming https://spark.apache.org/streaming/
Lambda Architecture http://lambda-architecture.net/

 

 

Analytics tools

KNIME https://www.knime.com/
Rapidminer https://rapidminer.com/
Wikipedia - RapidMiner https://de.wikipedia.org/wiki/RapidMiner
Wikipedia - KNIME https://de.wikipedia.org/wiki/KNIME
Anaconda https://www.anaconda.com/distribution/
WinPython http://winpython.sourceforge.net/
Jetbrains PyCharm https://www.jetbrains.com/pycharm/
PyDEV http://www.pydev.org/
Visual Studio Code https://code.visualstudio.com/
Notepad++ https://notepad-plus-plus.org/
scikit-learn http://scikit-learn.org/
Matplotlib https://matplotlib.org/
Text Mining Online http://textminingonline.com/category/nltk
RStudio https://www.rstudio.com/
The Comprehensive R Archive Network https://cran.r-project.org/
swirl http://swirlstats.com/
RStudio Cheat Sheets https://www.rstudio.com/resources/cheatsheets/
20 Most Popular R packages http://makemeanalyst.com/20-most-popular-r-packages/
RStudio Cheat Sheets - Data Visualization https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf
RStudio Cheat Sheets - Strings https://github.com/rstudio/cheatsheets/blob/master/strings.pdf
Project Jupyter http://jupyter.org/
Apache Zeppelin https://zeppelin.apache.org/
Tableau https://www.tableau.com/
Apache Superset https://superset.incubator.apache.org/

 

 

Further reading

KDNuggets https://www.kdnuggets.com/
Data Science Central https://www.datasciencecentral.com/
Towards Data Science https://towardsdatascience.com/

Chapter 6

Process Mining

Interview – Process Mining ist ein wichtiger Treiber der Prozessautomatisierung https://data-science-blog.com/blog/2017/10/19/interview-prof-scheer-process-mining-automation/ 
Celonis https://www.celonis.com/de/
Fluxicon https://fluxicon.com/disco/
Process Mining http://www.processmining.org/tools/start
Dataset - Production Analysis with Process Mining Technology (updated) https://data.4tu.nl/articles/dataset/Production_Analysis_with_Process_Mining_Technology/12697997
ProM Tools http://www.promtools.org/doku.php
Online Course: Introduction to Process Mining with ProM https://www.futurelearn.com/courses/process-mining
Alpha Miner https://www.futurelearn.com/courses/process-mining/0/steps/15637
Fuzzy Miner (updated) http://processmining.org/online/fuzzyminer

 

Reports

Data.gov - Consumer Complaint Database https://catalog.data.gov/dataset/consumer-complaint-database
German Stopwords https://github.com/solariz/german_stopwords/blob/master/german_stopwords_full.txt
nltk.stem package http://www.nltk.org/api/nltk.stem.html
German stemming algorithm http://snowball.tartarus.org/algorithms/german/stemmer.html
Stemming and Lemmatization with Python NLTK http://text-processing.com/demo/stem/
corpus: Text Corpus Analysis https://cran.r-project.org/web/packages/corpus/
scikit-learn - Logistic Regression http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
scikit-learn - Working With Text Data https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
Machine Learning, NLP: Text Classification using scikit-learn, python and NLTK. https://towardsdatascience.com/machine-learning-nlp-text-classification-using-scikit-learn-python-and-nltk-c52b92a7c73a

 

Maintenance

automotiveIT: Predictive Maintenance enttäuscht Erwartungen https://www.automotiveit.eu/predictive-maintenance-enttaeuscht-erwartungen/news/id-0060652 
automotiveIT: Predictive Maintenance fristet Schattendasein https://www.automotiveit.eu/predictive-maintenance-fristet-schattendasein/news/id-0060169
Industrie 4.0 Index: Predictive Maintenance bleibt noch deutlich hinter den Erwartungen https://www.staufen.ag/de/unternehmen/news-events/news/newsdetail/2018/02/industrie-40-index-predictive-maintenance-bleibt-noch-deutlich-hinter-den-erwartungen/
Turbofan Engine Degradation Simulation Data Set https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/#turbofan
Modular Aero-Propulsion System Simulations - MAPSS, C-MAPSS, C-MAPSS40k https://www.grc.nasa.gov/www/cdtb/software/mapss.html
Getting Started with Predictive Maintenance Models https://www.svds.com/getting-started-predictive-maintenance-models/
Predictive Maintenance for IoT https://www.svds.com/predictive-maintenance-iot/
Data analysis and processing techniques for remaining useful life estimations https://rdw.rowan.edu/cgi/viewcontent.cgi?article=3433&context=etd
GitHub - Apache Spark - Turbofan Engine Degradation Simulation Data Set example in Apache Spark https://github.com/oluies/tedsds

 

Transport

Bureau of Transportation Statistics: Freight Analysis Framework https://www.bts.gov/faf
Seaborn: statistical data visualization https://seaborn.pydata.org/
Geopy https://geopy.readthedocs.io/en/stable/
Great Circle Maps for Python https://github.com/paulgb/gcmap