Spark Streaming Alternatives
The right Spark alternative depends on your use case. Are you processing streaming data or batch data? Do you prefer an open source or a closed-source (proprietary) alternative? Do you need SQL support?
With that in mind, let’s look at ten closed-source alternatives to Spark Streaming first:
- Amazon Kinesis – Collect, process, and analyze real-time, streaming data such as video, audio, application logs, website clickstreams, and IoT telemetry. See also Amazon Managed Streaming for Apache Kafka (Amazon MSK).
- Google Cloud Dataflow – A fully managed service for transforming and enriching streaming and batch data.
- Confluent – A streaming data platform built on Apache Kafka.
- Aiven for Apache Kafka – A fully managed streaming platform, deployable in the cloud of your choice.
- IBM Event Streams – A high-throughput, fault-tolerant, event streaming platform. Built on Kafka.
- Striim – A streaming data integration and operational intelligence platform designed for continuous queries, continuous processing, and streaming analytics.
- Spring Cloud Data Flow – Tools to create complex topologies for streaming and batch data pipelines, with graphical stream visualizations. Note that the project itself is open source under the Apache 2.0 license.
- Lenses – The data streaming platform that simplifies your streams with Kafka and Kubernetes.
- StreamSets – Brings continuous data to every part of your business, delivering speed, flexibility, resilience and reliability to analytics.
- Solace – A complete event streaming and management platform for the real-time enterprise.
And here are five open source alternatives to Spark Streaming:
- Apache Flink – Often considered one of the best Apache Spark alternatives, Apache Flink is an open source platform for both stream and batch processing at scale. It provides a fault-tolerant, operator-based model for streaming computation, in which each operator processes events as they arrive, rather than the micro-batch model of Apache Spark, which groups incoming events into small batches before processing them.
- Apache Beam – A unified programming model for batch and streaming data processing jobs. Pipelines are written once and executed on any supported execution engine (runner), such as Apache Flink, Apache Spark, or Google Cloud Dataflow.
- Apache Apex – An enterprise-grade unified stream and batch processing engine. The project has since been retired to the Apache Attic.
- Apache Samza – A distributed stream processing framework.
- Apache Storm – A distributed real-time computation system.
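To make the operator-based versus micro-batch distinction from the Flink entry more concrete, here is a minimal sketch in plain Python. It does not use the Flink or Spark APIs; the function names `per_event` and `micro_batch` are hypothetical, and batching by event count is a simplification (Spark Streaming forms micro-batches by time interval).

```python
from typing import Iterable, Iterator, List


def per_event(events: Iterable[int]) -> Iterator[int]:
    """Operator style: emit an updated running sum as each event arrives."""
    total = 0
    for value in events:
        total += value
        yield total


def micro_batch(events: Iterable[int], batch_size: int) -> Iterator[int]:
    """Micro-batch style: buffer events and emit one running sum per batch.

    Assumption: batches are cut by count here purely for illustration.
    """
    total = 0
    batch: List[int] = []
    for value in events:
        batch.append(value)
        if len(batch) == batch_size:
            total += sum(batch)
            batch.clear()
            yield total
    if batch:  # flush the final, possibly smaller, batch
        total += sum(batch)
        yield total


events = [1, 2, 3, 4, 5]
print(list(per_event(events)))       # [1, 3, 6, 10, 15]
print(list(micro_batch(events, 2)))  # [3, 10, 15]
```

Both functions reach the same final total, but the per-event version produces a result after every input, while the micro-batch version only produces results at batch boundaries. That gap is the latency trade-off to weigh when comparing these engines.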
So there you have it. Hopefully you can now find a suitable alternative to Spark Streaming. Learn more about Spark SQL vs Presto in our comparison article, or learn about invoking the Spark engine from Presto.