Data Science Online Training

Data Science Online Training Curriculum

Unit 1: Introduction to Data Science

Introduction to Big Data
Roles played by a Data Scientist
Analysing Big Data using Hadoop and R
Different Methodologies used for analysis in Data Science
The Architecture and Methodologies used to solve Big Data problems, for example:
Data acquisition from various sources
Data preparation
Data transformation using MapReduce (RMR)
Application of Machine Learning techniques
Data visualization
Problem statements for a few data science problems that we will solve during the course

Unit 2: Basic Data Manipulation using R in Data Science

Understanding vectors in R
Reading Data
Combining Data
Subsetting Data
Sorting Data and some basic data generation functions

Unit 3: Machine Learning Techniques Using R Part-1

Machine Learning Overview
ML Common Use Cases and techniques
Clustering and Similarity Metrics
Distance Measure Types: Euclidean and Cosine Measures
Creating predictive models
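
As a rough illustration of the two distance measures named in this unit, the sketch below computes Euclidean distance and cosine similarity for two small vectors. It is written in Python purely for brevity (the course itself works in R). The function names and sample vectors are illustrative assumptions, not course material.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical sample vectors: v points the same way as u but is twice as long.
u = [3.0, 4.0]
v = [6.0, 8.0]
print(euclidean_distance(u, v))  # the vectors are far apart in space
print(cosine_similarity(u, v))   # yet they point in the same direction
```

The example shows why the choice of measure matters for clustering: Euclidean distance is sensitive to magnitude, while cosine similarity compares direction only.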

Unit 4: Machine Learning Techniques Using R Part-2

Understanding K-Means Clustering in Data Science
Understanding TF-IDF and Cosine Similarity and their application to Vector Space Model
Implementing Association Rule Mining in R
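
To make the TF-IDF and cosine similarity topic in this unit more concrete, here is a minimal sketch of a vector space model over three toy documents. It uses plain Python rather than R, a simple tf * log(N / df) weighting (one of several common variants), and whitespace tokenization. The documents and function names are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy corpus.
docs = [
    "hadoop stores big data",
    "hadoop processes big data",
    "random forest classifier",
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # documents share weighted terms
print(cosine(vecs[0], vecs[2]))  # documents share no terms
```

Terms that appear in every document receive a weight of zero under this weighting, which is the intended effect: they carry no information for telling documents apart.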

Unit 5: Machine Learning Techniques Using R Part-3

Understanding Process flow of Supervised Learning Techniques
Decision Tree Classifier
How to build Decision trees
Random Forest Classifier
The Random Forest concept in data science
Features of Random Forest
Out-of-Bag Error Estimate and Variable Importance
Naive Bayes Classifier

Unit 6: Introduction to Hadoop Architecture

Hadoop Architecture
Common Hadoop commands
MapReduce and data loading techniques (directly in R, and in Hadoop using Sqoop, Flume, and other data loading techniques)
Removing anomalies from the data

Unit 7: Integrating R with Hadoop

Integrating R with Hadoop using the RHadoop and RMR packages
Exploring RHIPE (R Hadoop Integrated Programming Environment)
Writing MapReduce Jobs in R and executing them on Hadoop
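
The MapReduce jobs covered in this unit follow a map, shuffle, reduce pattern. The sketch below simulates that pattern in a single Python process with a word count. It is not Hadoop, RMR, or RHIPE code, and it does not use their APIs. All function names and input lines are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)

# Hypothetical input split into lines, standing in for file blocks.
lines = ["big data with hadoop", "big data with r"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

On a real cluster the map and reduce functions run on different nodes and the framework performs the shuffle. Only the two user-written functions carry over conceptually.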

Unit 8: Data Science Mahout Introduction and Algorithm Implementation

Implementing Machine Learning Algorithms on larger Data Sets with Apache Mahout

Unit 9: Additional Mahout Algorithms and Parallel Processing using R

Implementation of different Mahout algorithms
Random Forest Classifier with parallel processing Library in R

Unit 10: Project

Project Discussion
Problem Statement and Analysis
Various approaches to solve a Data Science Problem
Pros and Cons of different approaches and algorithms

Lessons

  1. Introduction to Big Data

    Roles played by a Data Scientist

    Analysing Big Data using Hadoop and R

    Different Methodologies used for analysis in Data Science

    The Architecture and Methodologies used to solve Big Data problems, for example:

    Data acquisition from various sources

    Data preparation

    Data transformation using MapReduce (RMR)

    Application of Machine Learning techniques

    Data visualization

    Problem statements for a few data science problems that we will solve during the course

  2. Machine Learning Overview

    ML Common Use Cases and techniques

    Clustering and Similarity Metrics

    Distance Measure Types: Euclidean and Cosine Measures

    Creating predictive models

  3. Understanding K-Means Clustering in Data Science

    Understanding TF-IDF and Cosine Similarity and their application to Vector Space Model

    Implementing Association Rule Mining in R

  4. Understanding Process flow of Supervised Learning Techniques

    Decision Tree Classifier

    How to build Decision trees

    Random Forest Classifier

    The Random Forest concept in data science

    Features of Random Forest

    Out-of-Bag Error Estimate and Variable Importance

    Naive Bayes Classifier

  5. Hadoop Architecture

    Common Hadoop commands

    MapReduce and data loading techniques (directly in R, and in Hadoop using Sqoop, Flume, and other data loading techniques)

    Removing anomalies from the data

  6. Integrating R with Hadoop using the RHadoop and RMR packages

    Exploring RHIPE (R Hadoop Integrated Programming Environment)

    Writing MapReduce Jobs in R and executing them on Hadoop

  7. Project Discussion

    Problem Statement and Analysis

    Various approaches to solve a Data Science Problem

    Pros and Cons of different approaches and algorithms