Do I have to use AWS Lambda to connect to data sources with Athena?

The Athena Federated Query Journey

AWS announced the public preview of Athena federated query in November 2019 and moved the service to General Availability (GA) in November 2020. As of this writing, Athena supports two versions of Athena – engine 1 based on Presto 0.172, and engine 2 based on Presto 0.217. Among other features, the federated query functionality is only supported on engine 2 of Athena. 

What connectors are supported?

Athena currently supports a number of data sources connectors

  • Amazon Athena CloudWatch Connector
  • Amazon Athena CloudWatch Metrics Connector
  • Athena AWS CMDB Connector
  • Amazon Athena DocumentDB Connector
  • Amazon Athena DynamoDB Connector
  • Amazon Athena Elasticsearch Connector
  • Amazon Athena HBase Connector
  • Amazon Athena Connector for JDBC-Compliant Data Sources (PostgreSQL, MySQL, and Amazon Redshift)
  • Amazon Athena Neptune Connector
  • Amazon Athena Redis Connector
  • Amazon Athena Timestream Connector
  • Amazon Athena TPC Benchmark DS (TPC-DS) Connector

In line with the serverless model of AWS Athena, in order to maintain an impedance match between Athena and the Connectors, the connectors are also Serverless based on AWS Lambda and run within the region. These connectors are open-sourced under the Apache 2.0 license and are available on GitHub or can be accessed from the AWS Serverless Application Repository. Third-party connectors are also available in the serverless application repository.

Athena is based on Presto, Can I use the open-source Presto data source connectors with Athena?

Presto has an impressive set of connectors right out of the box, these connectors however cannot be used as-is with Athena. The Presto service provider interface (SPI) required by the Presto connectors is different from AWS Athena’s Lambda-based implementation which is based on the Athena Query Federation SDK.

You can use the Athena Query Federation SDK to write your own connector using Lamba or to customize one of the prebuilt connectors that Amazon Athena provides and maintains. The Athena connectors use Apache Arrow format for returning data requested in the query. An example Athena connector can be found here.

If you’re looking for a managed service approach for PrestoDB, Ahana Cloud offers that based on open source Presto. Ahana’s managed deployment is 100% compatible with the open source Presto connectors, and you can get started with a click of a button.

Check out the case study from ad tech company Carbon on why they moved from AWS Athena to Ahana Cloud for better query performance and more control over their deployment.