Practical Data Science with Cortana Intelligence: Azure Machine Learning, SQL Data Mining and Microsoft R

Course description

A great deal of information is readily available about algorithms, deep-learning frameworks, and statistical software packages, but how do you put it all together to solve a real-world problem with data science? This 5-day course will teach you about the tools you need but, above all, it will carefully explain the working methods and processes that successful data scientists use. Not only will you know the algorithms, but you will also know how and when to start and finish your projects, and which ones are likely to succeed only with significant extra effort.



  • You will learn machine learning, data mining, some statistics, data preparation, and how to interpret the results.

  • You will see how to formulate business questions in terms of data science hypotheses and experiments, and how to prepare inputs to answer those questions.

  • We will cover common issues and mistakes, such as overtraining, and how to resolve them, as well as how to cope with rare events, such as fraud.


At the end of this course you will be able to plan and run data science projects.


Course contents

Module 1: Data Science Fundamentals



  • Introduction to data science and its components

  • Machine learning vs data mining vs artificial intelligence

  • Tools landscape

  • Statistics

  • Big data

  • Data wrangling

  • Teamwork


Module 2.1: Tools (SQL & R)



  • Getting started with and using SSAS DM and SQL R

  • Structures, models, data flows

  • Configuration concerns and pricing

  • Using Rattle with R and RStudio

  • Using SQL Server 2016 R Server and Services

  • Getting a feel for the data: interpreting notched boxplots in R


Module 2.2: Tools (R & Azure ML)



  • Overview of Cortana Intelligence Suite

  • Getting started with and using Azure ML and Cortana R

  • Azure requirements and dependencies

  • Provisioning workspaces

  • Uploading and connecting to SQL Azure data

  • Creating and running Azure ML experiments (programs)

  • Embedding R in Azure ML


Module 3: Data



  • Inputs and outputs, features and labels

  • Data formats, discretization vs continuous

  • Cases, observations, signatures

  • Feature engineering

  • Azure ML data preparation and manipulation modules

  • Preparing unstructured text for text analysis

  • Feature hashing

  • Moving data around and its storage

  • Briefly: other Cortana Intelligence Suite tools for data management and storage, including data lakes, BLOBs, and other Hadoop


Module 4: Process



  • Stating business questions in data science terms

  • CRISP-DM

  • Scientific method of reasoning

  • Hypothesis testing and experiments

  • Student’s t-test

  • Pearson chi-squared test

  • Iterative hypothesis refinement
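
To illustrate the hypothesis-testing topics above, here is a minimal Python sketch of the two-sample Student's t statistic with pooled variance. It is not taken from the course materials: the function name and the sample values are hypothetical, and it stops at the statistic and degrees of freedom rather than computing a p-value.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a two-sample Student's t-test.

    Assumes both samples come from populations with equal variance,
    which is the classic (pooled) form of the test.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df


# Hypothetical measurements, made up purely for illustration.
control = [1, 2, 3, 4, 5]
treatment = [2, 3, 4, 5, 6]
t, df = pooled_t_statistic(control, treatment)
print(f"t = {t:.3f}, df = {df}")
```

The resulting statistic would then be compared against the t distribution with the reported degrees of freedom, for example with a statistics package such as R.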


Module 5: Algorithms



  • What does data mining do?

  • Algorithm classes in Azure ML, R, and SSAS

  • Supervised vs Unsupervised learning

  • Classifiers

  • Clustering

  • Regression

  • Similarity Matching

  • Recommenders


Module 6: Clustering, Segmentation, and Anomaly Detection and Prediction



  • Introduction to segmentation

  • Clustering algorithms (k-means, EM, and others)

  • Interpreting clusters

  • Cluster characteristics

  • Discrimination

  • Tornado charts

  • Using clustering for text analysis

  • Anomaly detection with clustering, PCA and SVMs


Module 7: Classification



  • Introduction to classifiers

  • Two-class (binary) vs multi-class

  • Decision trees, forests, and boosting

  • Decision jungles

  • Neural networks and logistic regression

  • Overfitting (overtraining) concerns

  • Using classifiers for text analysis

  • Associative decision trees


Module 8: Basic Statistics



  • Basic concepts of statistics: population vs sample, measure types, means and dispersion, distributions

  • Confidence intervals, p-values

  • Correlation

  • Descriptive statistics with R

  • Basic concepts of probability

  • Finding important features using p-values, linear regression and ANOVA


Module 9: Model Validation



  • Testing accuracy

  • Lift charts

  • Testing reliability

  • Testing usefulness


Module 10: Classifier Precision



  • Testing classifiers

  • False positives vs. false negatives

  • Classification (confusion) matrix

  • Precision

  • Recall

  • Balancing precision with recall vs business goals and constraints

  • Charting precision-recall (sensitivity-specificity)

  • ROC curves

  • Other measures of accuracy

  • Cross-validation

  • Optimising binary classifier thresholds for a known business goal of prediction quality

  • Refining models to improve accuracy and reliability

  • Hyperparameter tuning

  • Class imbalance problem (fraud analytics and rare event prediction)
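
As an illustration of the precision and recall topics listed above, the following minimal Python sketch derives both measures from confusion-matrix counts. It is not part of the course materials: the function name and the fraud-detection counts are hypothetical.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from confusion-matrix counts.

    precision = TP / (TP + FP): how many flagged cases were really positive.
    recall    = TP / (TP + FN): how many real positives were flagged.
    Returns 0.0 for a measure whose denominator is zero.
    """
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for a fraud classifier, made up for illustration.
precision, recall = precision_recall(true_pos=30, false_pos=10, false_neg=20)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Raising the classifier threshold typically trades recall for precision, which is why the two are balanced against a business goal rather than maximised independently.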


Module 11: Regressions



  • Introduction to simple regressions

  • Linear regression (classic)

  • Regression decision trees and other ensemble regression algorithms

  • Relationship to ANOVA

  • Measuring linear regression quality (R-squared, predictor p-values, RMSE, MAE, RAE, RSE, and additional testing using R)


Module 12: Similarity Matching & Recommenders



  • Introduction to recommender concepts

  • Model-based, similarity-based, and hybrid recommenders

  • Association rules

  • Understanding itemsets and rules

  • Rule importance vs. rule probability

  • Data structures for association rules

  • Market Basket Analysis

  • Collaborative filtering

  • Matchbox recommenders

  • Validating recommenders
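
The association-rule topics above can be sketched with three common measures: support, confidence (often called rule probability), and lift. This Python sketch is an assumption-laden illustration, not course material: the function name and baskets are hypothetical, and lift is shown as one widely used interestingness measure, which is not necessarily the same as the "importance" score of any particular tool.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    baskets is a list of sets of items; antecedent and consequent are
    iterables of items.
    """
    lhs, rhs = set(antecedent), set(consequent)
    total = len(baskets)
    count_lhs = sum(1 for basket in baskets if lhs <= basket)
    count_rhs = sum(1 for basket in baskets if rhs <= basket)
    count_both = sum(1 for basket in baskets if (lhs | rhs) <= basket)
    support = count_both / total
    confidence = count_both / count_lhs if count_lhs else 0.0
    # Lift compares the rule's confidence with the baseline rate of the consequent.
    lift = confidence / (count_rhs / total) if count_rhs else 0.0
    return support, confidence, lift


# Hypothetical market baskets, made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, ["bread"], ["milk"])
print(f"support = {support:.2f}, confidence = {confidence:.2f}, lift = {lift:.2f}")
```

A lift below 1, as in this toy data, suggests the antecedent makes the consequent slightly less likely than its baseline rate, so a high-confidence rule is not automatically a useful one.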


Module 13: Other Algorithms (Brief Overview)



  • Sequence clustering and Markov chains

  • SVM (Support Vector Machines)

  • Time series

  • Image recognition

  • Text analysis


Module 14: Production & Model Maintenance



  • Deploying models to production

  • SSAS models and DMX queries

  • Azure ML web services: preparation and publishing

  • REST APIs: request/response vs batch

  • On-going maintenance and model updates

Study materials

In English
Selected date:

8 March 2021, ONLINE