Chapter 5

Methods

Top Data Science and Machine Learning Methods Used in 2017 https://www.kdnuggets.com/2017/12/top-data-science-machine-learning-methods.html?utm_content=buffer8610e&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer 
Regular Expressions 101 https://regex101.com/
RegExr https://regexr.com/
Python Data Preparation Case (updated) https://www.kdnuggets.com/2017/09/python-data-preparation-case-files-group-based-imputation.html
Principal Component Analysis http://setosa.io/ev/principal-component-analysis/
Introduction to Principal Components and Factor Analysis ftp://statgen.ncsu.edu/pub/thorne/molevoclass/AtchleyOct19.pdf
A One-Stop Shop for Principal Component Analysis https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c
The Random Forest Algorithm (updated) https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76

Tankerkönig https://creativecommons.tankerkoenig.de/
StatsModels - Statistics in Python http://www.statsmodels.org/
Ein Jahr Markttransparenzstelle für Kraftstoffe (MTS-K): Eine erste Zwischenbilanz (One Year of the Market Transparency Unit for Fuels: A First Interim Assessment) https://www.bundeskartellamt.de/SharedDocs/Publikation/DE/Berichte/Ein_Jahr_MTS-K_Marginalsp.pdf?__blob=publicationFile&v=10
Forecasting: Principles and Practice https://www.otexts.org/fpp/

Predicting house value using regression analysis https://towardsdatascience.com/regression-analysis-model-used-in-machine-learning-318f7656108a
Least-Squares Regression https://faculty.elgin.edu/dkernler/statistics/ch04/4-2.html
Coursera - Deep Learning Specialization https://www.coursera.org/specializations/deep-learning
Linear Regression: Implementation, Hyperparameters and their Optimizations http://pavelbazin.com/post/linear-regression-hyperparameters/#linear-regression-implementation-hyperparameters-and-their-optimizations
scikit-learn - Logistic Regression http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
What are kernels in machine learning and SVM and why do we need them? https://www.quora.com/What-are-kernels-in-machine-learning-and-SVM-and-why-do-we-need-them
Mahalanobis Distance http://www.statistics4u.com/fundstat_germ/ee_mahalanobis_distance.html
Distance and Similarity Measures Effect on the Performance of K-Nearest Neighbor Classifier – A Review https://arxiv.org/pdf/1708.04321.pdf
The distance function effect on k-nearest neighbor classification for medical datasets https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978658/
Introduction to k-Nearest Neighbors: Simplified (with implementation in Python) https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/
6 Easy Steps to Learn Naive Bayes Algorithm (with codes in Python and R) https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
Introduction to One-class Support Vector Machines http://rvlasveld.github.io/blog/2013/07/12/introduction-to-one-class-support-vector-machines/
Python for Image Understanding: Deep Learning with Convolutional Neural Nets https://www.slideshare.net/roelofp/python-for-image-understanding-deep-learning-with-convolutional-neural-nets
10 Ways Machine Learning Is Revolutionizing Manufacturing In 2018 https://www.forbes.com/sites/louiscolumbus/2018/03/11/10-ways-machine-learning-is-revolutionizing-manufacturing-in-2018/#2a267fa023ac
Machine Learning in Manufacturing – Present and Future Use-Cases https://emerj.com/ai-sector-overviews/machine-learning-in-manufacturing/
The Neural Network Zoo http://www.asimovinstitute.org/neural-network-zoo/
Top 8 Free Must-Read Books on Deep Learning https://www.kdnuggets.com/2018/04/top-free-books-deep-learning.html
Deep Learning - An MIT Press book http://www.deeplearningbook.org/
Neural Networks and Deep Learning http://neuralnetworksanddeeplearning.com/
Neural Networks and Learning Machines (Third Edition) https://cours.etsmtl.ca/sys843/REFS/Books/ebook_Haykin09.pdf
Selecting the number of clusters with silhouette analysis on KMeans clustering https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html

Databases and ETL Tools

w3schools.com - SQL Tutorial https://www.w3schools.com/sql/
Einführung in SQL - Datenbanken bearbeiten (Introduction to SQL - Working with Databases) https://upload.wikimedia.org/wikibooks/de/d/d3/Einf%C3%BChrung_in_SQL.pdf
List of NoSQL Databases http://nosql-database.org/
The history of Hadoop: From 4 nodes to the future of data https://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/
The history of Hadoop https://medium.com/@markobonaci/the-history-of-hadoop-68984a11704
Hadoop Tutorial: All you need to know about Hadoop! https://www.edureka.co/blog/hadoop-tutorial/
Download Cloudera Enterprise https://www.cloudera.com/downloads.html
Downloads for Connected Data Platforms from Hortonworks https://de.hortonworks.com/downloads/
Safari Books Online https://www.safaribooksonline.com/
Talend https://de.talend.com/
Informatica https://www.informatica.com/de/
Apache Kafka http://kafka.apache.org/
Confluent https://www.confluent.io/
Apache NiFi https://nifi.apache.org/
Hortonworks DataFlow (HDF) https://de.hortonworks.com/products/data-platforms/hdf/
Apache Spark Streaming https://spark.apache.org/streaming/
Lambda Architecture http://lambda-architecture.net/

Analytics Tools

KNIME https://www.knime.com/
RapidMiner https://rapidminer.com/
Wikipedia - RapidMiner https://de.wikipedia.org/wiki/RapidMiner
Wikipedia - KNIME https://de.wikipedia.org/wiki/KNIME
Anaconda https://www.anaconda.com/distribution/
WinPython http://winpython.sourceforge.net/
JetBrains PyCharm https://www.jetbrains.com/pycharm/
PyDev http://www.pydev.org/
Visual Studio Code https://code.visualstudio.com/
Notepad++ https://notepad-plus-plus.org/
scikit-learn http://scikit-learn.org/
Matplotlib https://matplotlib.org/
Text Mining Online http://textminingonline.com/category/nltk
RStudio https://www.rstudio.com/
The Comprehensive R Archive Network https://cran.r-project.org/
swirl http://swirlstats.com/
RStudio Cheat Sheets https://www.rstudio.com/resources/cheatsheets/
20 Most Popular R packages http://makemeanalyst.com/20-most-popular-r-packages/
RStudio Cheat Sheets - Data Visualization https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf
RStudio Cheat Sheets - Strings https://github.com/rstudio/cheatsheets/blob/master/strings.pdf
Project Jupyter http://jupyter.org/
Apache Zeppelin https://zeppelin.apache.org/
Tableau https://www.tableau.com/
Apache Superset https://superset.incubator.apache.org/

Further Information

KDnuggets https://www.kdnuggets.com/
Data Science Central https://www.datasciencecentral.com/
Towards Data Science https://towardsdatascience.com/