What to expect at Big Data Tech

Engaging presentations for all skill levels

Big Data Tech promises to be a highly informative and engaging event, whether you are a NoSQL newbie or an open-source pro. There will be introductory presentations on Cassandra and NoSQL, as well as Apache Hadoop 101 and Apache Spark 101. Below are abstracts of just some of the presentations you will see at Big Data Tech. More details and the full lineup of speakers can be found on the event page. The schedule can be viewed online here.

Conference Preview

Delivering Enterprise-Ready Hadoop
Jamie Engesser

Enterprises are turning to Hadoop to build an enterprise data lake designed to offer new value streams and optimize existing business. This transformation not only requires a new approach and new tools, but also needs to be fully integrated across the data center. The challenge is now one of enterprise readiness. A data lake requires trusted governance, comprehensive security and consistent operations to provide the controls and agility to ensure data is appropriate, accurate and ready for use by the business. Jamie Engesser discusses improving user experience, accelerating Hadoop adoption, the typical adoption journey to deliver an enterprise data lake and common use cases from clickstream to IoT.

Enabling Search Applications Using Cassandra
Rachel Pedreschi

Wait! Back away from the Cassandra secondary index. It’s OK for some use cases, but it’s not an easy button. “But I need to search through a bunch of columns to look for the data… and I can’t model that in C*, even after watching all of Patrick McFadin’s data modeling videos. What do I do?” The answer, dear developer, is in DSE Search. With its easy Solr API, Lucene indexes (and fault tolerance) you can search data stored in your Cassandra database to your heart’s content. Take my hand. I will show you how.

HBase Application Archetypes
Matteo Bertozzi

Today, there are hundreds of production HBase clusters running a multitude of applications and use cases. Many well-known implementations exercise opposite ends of HBase’s capabilities, emphasizing either entity-centric schemas or event-based schemas. This talk will give you an overview of HBase and a series of archetypes based on a use-case survey of clusters conducted by Cloudera’s development, product and services teams. By analyzing the data from the nearly 20,000 HBase cluster nodes Cloudera has under management, we’ll categorize HBase users and their use cases into a few simple archetypes, describe workload patterns and quantify the usage of advanced features. We’ll also explain what an HBase user can do to alleviate pressure points from these fundamentally different workloads, and use these results to provide insight into what lies in HBase’s future.

SQL for NoSQL? Querying CSV files, JSON, HBase, MapR-DB, Hive, Mongo and more through one high performance, schema free query engine
Keys Botzum

Query any relational and non-relational datastore (well, almost…). Keys Botzum of MapR will provide an overview of the newest 1.0 member of the Apache Software Foundation, Apache Drill. Apache Drill is a schema-free, full ANSI SQL query engine for NoSQL and Hadoop. Keys will take you through a deeper overview of Apache Drill and speak to Drill’s architecture, scale and ability to query HBase, Hadoop and NoSQL files all at the same time.

Internet of Things: Managing Unstructured Data
Frank Catrine & Mike Mulligan

Sensor, machine-to-machine and network data are expected to play a larger role in analytics as the Internet of Things becomes a reality. However, these data types present significant challenges related to data volume, variety and predictive modeling. Blending operational data with data from IT systems helps deliver business-critical analytics to end users.

Continuous Integration and Delivery of Containerized Databases
Greg Hoelzer & Michael Heldebrant

As you deploy horizontally scalable database workloads, how can you manage the patching and deployment of these cloud-capable workloads? Using CI tools such as Jenkins to build containers will enable you to run a container factory that builds databases that can run on any infrastructure: physical, virtual or public cloud.

View event page

View full schedule

#DataTechMN