Spark Streaming Alternatives

Which Spark Streaming alternative is right for you depends on your use case. Are you processing streaming data or batch data? Do you prefer an open source or a closed-source (proprietary) alternative? Do you need SQL support?

With that in mind, let’s look at ten closed-source alternatives to Spark Streaming first:

  1. Amazon Kinesis – Collect, process, and analyze real-time, streaming data such as video, audio, application logs, website clickstreams, and IoT telemetry. See also Amazon Managed Streaming for Apache Kafka (Amazon MSK).
  2. Google Cloud Dataflow – A fully managed service for transforming and enriching streaming and batch data.
  3. Confluent – The leading streaming data platform. Built on Apache Kafka. 
  4. Aiven for Apache Kafka – A fully managed streaming platform, deployable in the cloud of your choice.
  5. IBM Event Streams – A high-throughput, fault-tolerant, event streaming platform. Built on Kafka.
  6. Striim – A streaming data integration and operational intelligence platform designed to enable continuous query, processing, and streaming analytics.
  7. Spring Cloud Data Flow – Tools to create complex topologies for streaming and batch data pipelines. Features graphical stream visualizations.
  8. Lenses – The data streaming platform that simplifies your streams with Kafka and Kubernetes.
  9. StreamSets – Brings continuous data to every part of your business, delivering speed, flexibility, resilience and reliability to analytics.
  10. Solace – A complete event streaming and management platform for the real-time enterprise. 

Here are five open source alternatives to Spark Streaming:

  • Apache Flink
  • Apache Apex
  • Apache Beam
  • Apache Samza
  • Apache Storm

Details about each alternative:

  1. Apache Flink – Often considered one of the best Apache Spark alternatives, Apache Flink is an open source platform for both stream and batch processing at scale. It provides a fault-tolerant, operator-based model for streaming computation rather than the micro-batch model of Apache Spark.
  2. Apache Beam – A unified programming model for defining batch and streaming data processing pipelines. Pipelines can be executed on multiple execution environments (runners), such as Apache Flink, Apache Spark, and Google Cloud Dataflow.
  3. Apache Apex – An enterprise-grade unified stream and batch processing engine.
  4. Apache Samza – A distributed stream processing framework.
  5. Apache Storm – A distributed real-time computation system.
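
The contrast drawn in the Flink entry, between per-record operator processing and micro-batching, can be illustrated with a small, library-free sketch. This is a toy model in plain Python, not the API of Flink or Spark: the function names `per_record` and `micro_batch` are hypothetical, and batches here are cut by record count purely for simplicity, whereas Spark Streaming cuts them by time interval.

```python
from typing import Callable, Iterable, Iterator, List


def per_record(events: Iterable[int], fn: Callable[[int], int]) -> Iterator[int]:
    """Operator-style processing: emit a result for each event as it arrives."""
    for event in events:
        yield fn(event)


def micro_batch(
    events: Iterable[int], fn: Callable[[int], int], batch_size: int
) -> Iterator[List[int]]:
    """Micro-batch processing: buffer events, then process each buffer as a unit.

    Assumption: batches are cut by count here; real micro-batch engines
    typically cut them by a time interval instead.
    """
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield [fn(e) for e in batch]
            batch = []
    if batch:  # flush the final, possibly partial, batch
        yield [fn(e) for e in batch]


def double(x: int) -> int:
    return 2 * x


events = [1, 2, 3, 4, 5]
per_record_out = list(per_record(events, double))
micro_batch_out = list(micro_batch(events, double, batch_size=2))
print(per_record_out)   # one result per event
print(micro_batch_out)  # results grouped by batch
```

In the per-record version a result is available as soon as each event is seen, while in the micro-batch version no result is produced until a batch fills, which is the source of the extra latency usually attributed to micro-batching.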

So there you have it. Hopefully you can now find a suitable alternative to Spark Streaming. Learn more about Spark SQL vs Presto in our comparison article, or learn about invoking the Spark engine from Presto.