Online Orientation Session on Spark Structured Streaming on the Cloud: Introduction to Internals

  • About the Webinar

    Duration: 60 mins

    Date: January 31, 2018

    Time: 3:00 pm

    Who Is the Speaker?

    Vikram Agrawal

    Vikram currently leads the Streaming team at Qubole. He completed his bachelor's and master's degrees in Computer Science and Engineering at IIT Delhi. He started his career at Lehman Brothers and lived through the testing times of its bankruptcy and the financial crisis. After a short stint working on correlation financial products and their risk analysis, he began his technology start-up journey, first as a co-founder of a web-based conferencing solution and then as an early engineer at Qubole.

    In the last five years at Qubole, he has worn multiple hats and worked across stacks to provide data solutions in the cloud.

    Key Takeaways
    • Understanding of Data Processing Architecture
    • Why and When to Use Spark’s Structured Streaming
    • Spark’s Structured Streaming Programming Paradigm
    • Internals of Spark’s Structured Streaming
    • Spark Structured Streaming in the Real World – examples of how customers of Qubole use it

    Session Agenda

    Apache Spark has been gaining steam rapidly, both in the headlines and in real-world adoption. Spark was developed in 2009 and open-sourced in 2010. Since then, it has grown into one of the largest open-source communities in big data, with over 200 contributors from more than 50 organizations. This open-source analytics engine stands out for its ability to process large volumes of data significantly faster than predecessors such as MapReduce, largely because it can keep intermediate data in memory instead of writing it to disk between processing steps. Beyond batch processing, one of the top real-world industry use cases for Apache Spark is its ability to process streaming data.

    With so much data being generated every day, it has become essential for companies to stream and analyze it in real time, and Spark Streaming is built to handle this additional workload. Some experts even suggest that Spark could become the go-to platform for stream-computing applications of every type. Their reasoning is that Spark unifies disparate data processing capabilities, such as batch and stream processing, so developers can use a single framework for all their processing needs. Common ways businesses use Spark Streaming today include streaming ETL, data enrichment, trigger event detection, and complex session analysis. In this webinar, we will cover an introduction to Structured Streaming in Spark, its internals, and its industry use cases.

    Who Should Attend?

    • Students
    • Computer Science Graduates
    • Aspiring Machine Learning Engineers
    • CTOs
    • Aspiring Data Analysts
    • Aspiring Data Scientists
    • Software Engineers