Hadoop Spark & Scala

Upcoming Batches

New SAP online & classroom training starting shortly

New Tableau online & classroom training starting shortly

New ServiceNow online & classroom training starting shortly

New Angular js online & classroom training starting shortly

New Salesforce online & classroom training starting shortly

New Hadoop online & classroom training starting shortly

New Data Science online training starting shortly

New Microsoft Dynamics CRM training starting shortly

New Linux online & classroom training starting shortly

Introduction to Big Data, Hadoop & Spark Architecture

  • Introduction to Course
  • What is covered and not covered
  • Data Explosion, Data Sources, Data types
  • What is Big Data, Benefits & Big Data Problem
  • Limitations of Traditional Parallel Systems
  • Solution using Hadoop Framework
  • Characteristics and Types of Big Data Systems
  • What is Hadoop, History of Hadoop
  • Hadoop Architecture, NameNode, JobTracker
  • HDFS and MapReduce, MapReduce example
  • Limitations of Hadoop 1.0 and MapReduce
  • Hadoop 2.0 and YARN Architecture
  • What is Apache Spark?
  • Apache Spark and MapReduce differences
  • Spark Stack Architecture and Advantages
  • Spark History and Releases
  • Spark for Data science & Data processing tasks

Learning Scala – Functional Programming

  • Functions, Methods & Procedures
  • Function Literals / Anonymous Functions
  • Higher Order Functions – Function as a variable
  • Higher Order Functions – Passing function as parameter
  • Higher Order Functions – Returning a function
  • Higher Order Functions – Closures
  • Higher Order Functions – Partially Applied functions
  • Higher Order Functions – Call by Name, Call by value
  • Regular expressions and Pattern Matching
  • Case classes and Pattern Matching

Learning Scala – Basic & Object Oriented Programming

  • Scala Installation & Scala REPL Interpreter
  • First Scala Program, Scala Scripts
  • Scala Basics – Variables, Types, Control Structures, Loops
  • Scala Basics – Strings & String interpolations
  • Scala Basics – Functions without Parameters
  • Scala Basics – Functions with parameters
  • Scala Basics – Arrays, Lists, Ranges and Tuples
  • Classes, Objects and Apply method
  • Constructors and Parameters
  • Method Declaration, Call by Name
  • Singleton Objects, Packaging
  • Inheritance, Extending a class, Overriding
  • Traits, Case classes

Hands-on Scala Programming Labs

  • Creating Strings, String equality & splitting
  • Finding and replacing patterns in strings
  • Looping with Foreach, Embedded if statements
  • Using If construct as a Ternary Operator
  • Using Match expressions and assigning the result to a variable
  • Using Pattern matching in Match expressions
  • Using classes, Objects, Methods and Traits
  • Using Function Literals
  • Working with Higher Order Functions
  • Creating Collections
  • Using Map, FlatMap, Filter on Collections
  • Hands on Lab – Using Foreach and reduce on Collections

Spark Essentials

  • Getting started with Spark
  • Spark Python and Scala Shells
  • Spark Context
  • Spark Runtime Architecture – Workers and Cluster Managers
  • Spark Runtime Architecture – Driver Programs, Executors and Tasks
  • How a Spark Application works
  • Data sources for loading data into Spark
  • Understanding Hadoop Input and Output Formats
  • Understanding Data Serialization Formats – Avro and Sequence files
  • Understanding Columnar file formats – RCFile, ORC and Parquet

Advanced Spark Programming

  • Data Partitioning in Spark
  • Operations that benefit from partitioning
  • Operations that affect partitioning
  • Saving RDDs
  • Caching RDDs and Persistence
  • Word Count program using Spark
  • Spark Program Lifecycle
  • Spark Variables
  • Spark Broadcast Variables
  • Spark Accumulators and Fault Tolerance

Spark Core Programming – Understanding RDDs

  • Resilient Distributed Datasets (RDD)
  • Data sources for creating RDDs
  • Creating RDDs from text, csv and tsv files.
  • Creating RDDs from JSON files & Sequence files
  • Creating RDDs from Hadoop InputFormat
  • Creating RDDs from HDFS and Amazon S3 files
  • Creating RDDs from NOSQL Databases
  • RDD Operations – Transformations and Actions
  • Lazy evaluations
  • Loading and Saving RDDs
  • Passing functions to Spark, Spark Closures
  • Spark Key Value RDDs, Creating Pair RDDs
  • Pair RDD Transformations – Aggregations, Grouping, Joins & Sorting
  • Actions on Pair RDDs

Building and Running a Spark Scala program

  • Spark Scala API, Spark JAR files
  • Running a Spark program using spark-submit
  • Running a spark program on Standalone Cluster
  • Running a spark program on YARN
  • Launching Spark jobs from Java and Scala
  • Building a Spark application with Eclipse/Scala IDE and Maven, Maven Dependencies
  • Building a Spark application with Eclipse/Scala IDE and SBT
  • Building a Spark Fat JAR

Tuning and Debugging Spark for Performance

  • Configuring Spark with SparkConf
  • Components of a Spark program – Jobs, Tasks and Stages
  • Spark Web UI Deep Dive
  • Spark RDD Lineage
  • Spark Logs
  • Serialization and Memory Management to improve performance
  • Project Tungsten
  • Hardware Provisioning and Performance Management
  • Monitoring and Debugging a Spark Application

Spark SQL and Dataframes Programming I

  • Spark SQL and Hive Interoperability, Spark SQL Performance Advantages
  • ETL and Data warehousing with Spark SQL
  • Initializing Spark SQL using SQLContext
  • Dataframes Introduction, Caching Dataframes
  • Creating Dataframe from RDD using case class and toDF method to infer schema
  • Creating Dataframe from RDD using StructType and createDataFrame to specify schema
  • Creating Dataframes from Scala Collections
  • Creating Dataframes from text files, csv and tsv files
  • Creating Dataframes from JSON files, Parquet files & Hive Tables
  • Loading & Saving Dataframes

Hands-on Projects using Spark RDDs

  • Project 1 – Word Count using Spark Shell
  • Project 2 – Building and Running Spark Word Count Application with Eclipse/Scala IDE and Maven (using Fat JAR)
  • Project 3 – Data Exploration and Log analysis using Spark Shell
  • Project 4 – Building and Running a Log Analysis Application in Spark with Eclipse/Scala IDE and Maven

Spark SQL and Dataframes Programming II

  • Operations on Dataframes – Basic Operations, Language Integrated Query Methods
  • Operations on Dataframes – RDD Operations, Actions, Output Operations
  • Dataframe Built-in Functions – Aggregate, Collection, Date/Time, Math, String and Window functions
  • Saving Dataframes to Temporary Tables
  • Using SQL & Hive Queries in a Spark SQL Program
  • SQL Joins and Operations on Spark Dataframes using SQL queries and Scala
  • UDFs and UDAFs
  • Spark SQL JDBC/ODBC Server
  • Spark SQL Performance Tuning, Spark Catalyst Optimizer

Using Spark with Apache Zeppelin

  • Apache Zeppelin Notebook
  • Installing Zeppelin on Windows
  • Installing Zeppelin on Linux
  • Interpreters in Zeppelin
  • Using Zeppelin
  • Dependency Management in Zeppelin

Programming with Python

  • Python 2 vs Python 3
  • Python Setup, Python Interpreter, Python Scripts, Python IDEs
  • Python Identifiers, Indentation, Text input and output
  • Variables, Boolean and If statements
  • Files and Loops
  • List, Tuples and Dictionaries
  • Functions
  • Modules
  • Classes
  • Exception Handling
  • Strings and Regular expressions
  • Python Scripting
  • Object Oriented Programming
  • Lambda Functions
  • Parallel Processing in Python
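
Several topics in this module (lambda functions, lists and dictionaries, functions, exception handling) can be illustrated with a minimal sketch. The sample data and the `safe_divide` helper are hypothetical, chosen only for illustration; they are not taken from the course material.

```python
# Hypothetical sample data: (course name, duration in hours).
courses = [("Scala", 20), ("Spark", 35), ("Python", 15)]

# A lambda used as the sort key: order the courses by duration.
by_duration = sorted(courses, key=lambda c: c[1])

# Lambdas passed to filter and map: names of courses lasting 20 hours or more.
long_names = list(map(lambda c: c[0], filter(lambda c: c[1] >= 20, courses)))

# A dictionary built from the list of tuples.
hours = {name: h for name, h in courses}


def safe_divide(a, b):
    """Return a / b, or None when b is zero (basic exception handling)."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


print(by_duration[0][0])                # shortest course
print(long_names)                       # courses of 20 hours or more
print(safe_divide(70, hours["Spark"]))  # 70 / 35
print(safe_divide(1, 0))                # division by zero is caught
```

The same idea of passing small functions as arguments carries over to Spark, where lambdas are handed to transformations such as `map` and `filter`.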

Hands-on Projects using Spark SQL and Dataframes

  • Project 1 – Interactive Analysis of JSON dataset using Spark SQL Dataframes
  • Project 2 – Build and Run a Spark SQL application in Eclipse to explore a tabular auction dataset
  • Project 3 – Fuzzy String Matching using Spark SQL Dataframes and Zeppelin

Programming Spark with Python

  • PySpark shell
  • Create RDD from Collections, text files
  • Create RDD from HDFS files, Sequence files
  • Python Lambda expressions, passing functions to Spark and Closures
  • Loading & Saving RDDs
  • RDD Transformations and Actions
  • RDD cache and Persistence
  • Broadcast variables and Accumulators
  • Building Spark Python Application
  • Running and Deploying Spark Python Applications
  • Dataframes using Python
  • Spark SQL in Python
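
The word-count pattern that recurs in this syllabus can be imitated in plain Python, without a Spark installation. The sketch below is not PySpark code: `reduce_by_key` is a hypothetical single-process helper that mimics what the RDD `reduceByKey` transformation does across partitions, and the input lines are made up for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input; in Spark this would typically come from a text file.
lines = ["spark makes big data simple", "big data needs spark"]

# "flatMap" step: split every line and flatten the result into one list of words.
words = [word for line in lines for word in line.split()]

# "map" step: pair each word with the count 1.
pairs = [(word, 1) for word in words]


def reduce_by_key(pairs, func):
    """Group values by key, then combine each group's values with func."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


# "reduceByKey" step: add up the 1s for each word.
counts = reduce_by_key(pairs, lambda a, b: a + b)

print(counts["spark"], counts["big"], counts["simple"])
```

In a real PySpark program the three steps would be chained as transformations on an RDD and evaluated lazily; here each step runs immediately on ordinary Python lists.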

Spark Datasets and Spark 2.0

  • What’s new in Spark 2.0?
  • Datasets and Dataframes
  • SparkSession
  • Dataset Operations
  • Running SQL Queries Programmatically
  • Convert Dataframe to Dataset
  • Creating Datasets from RDDs
  • Creating Datasets from different Data Sources
  • DataFrame and Dataset unified API in Spark 2.0
  • Components of Spark Stack – An overview
    • GraphX and GraphFrames
    • Spark Streaming and Structured Streaming
    • Spark MLlib and Spark ML Pipelines
  • Word count program using Datasets
  • Hands-on Labs using Datasets
  • Hands-on Project using Datasets

What Next – Spark Careers

  • Demand for Spark
  • Spark as a High-Paying Career
  • Different types of Spark careers
  • Big Data Developer
    • Career as a Big Data Developer
    • Skills of a Big Data Developer
    • Learning Path and Certification
  • Big Data Engineer
    • Career as a Big Data Engineer
    • Skills of a Big Data Engineer
    • Learning Path and Certification
  • Data Scientist
    • Career as a Data Scientist
    • Skills of a Data Scientist
    • Data Science MOOCs
    • Top Data Science certifications
    • Data Science Certification courses
    • Ultimate Guide to learn Data Science

About Us

KMRSoft offers a wide range of training programs for corporates and graduates through online and classroom training. The program provides online tools to track learning progress, support the learning process, administer continuous assessments, and give access to learning irrespective of time and location. We excel at providing conventional training solutions and tailor-made solutions for your development and maintenance support projects.

Contact Us

Flat # 202, Plot # 575, Sudharshan Plaza Apartment, 6th Phase Road, 6th Phase, KPHB Colony, Kukatpally, Hyderabad, Telangana-500072.

+91-7032598380, 7032598320

+1-9083252273

info@kmrsoft.com / training@kmrsoft.com

KMR Software Services Pvt. Ltd. is NOT affiliated or associated with SAP® AG. KMR Software Services Pvt. Ltd. does NOT sell or distribute any SAP® copyrighted materials or software. KMR Software Services Pvt. Ltd. does NOT provide SAP IDES access. KMR Software Services Pvt. Ltd. is NOT an official Training Partner of SAP® AG.