Topic: Spark and Impala on S3: Experiences, issues, and lessons learned
As more businesses move their big data infrastructure onto AWS, they face several challenges. Hadoop clusters on cloud infrastructure can be built to take advantage of separating storage from compute. Many tools in the big data ecosystem support running operations on top of S3, Spark and Impala chief among them, but a feature being supported does not mean it works as expected. Navigating the various options and understanding the pros and cons of each can be daunting. There are a number of things that most reference architectures do not talk about.
In this talk, we will go over the issues and limitations we have faced when using Impala and Spark with S3, and examine how these tools behave when working with data in S3.
Presenter: Dinesh Jasti, Big Data Engineer, Clairvoyant
Dinesh has worked with big data technologies for more than two years, since graduating from Illinois Tech. He designs and develops complex big data ETL pipelines at Clairvoyant. He is passionate about solving complex problems that involve massive data sets. He is highly skilled in Hadoop, Spark, Impala, Airflow, S3, and many other big data technologies.
Location: Room #206/207 - The University of Advancing Technology