
Securing Azure Databricks Secrets with Akeyless

In modern data engineering and machine learning (MLOps) workflows, secure and scalable secrets management is crucial. Azure Databricks, a powerful cloud-based analytics platform, offers native options for managing secrets, including secret scopes and integration with Azure Key Vault. However, organizations seeking a cloud-agnostic and centralized approach benefit from integrating Akeyless for secrets management.

Akeyless enables secure, seamless management of secrets across cloud environments, eliminating cloud-provider lock-in and simplifying secrets access in Azure Databricks environments. This blog post explores how Akeyless enhances security and operational efficiency for DataOps and MLOps workloads running in Azure Databricks.

Why Use Akeyless for Secrets in Azure Databricks?

1. Avoiding Secret Scattering and Cloud Lock-in

  • Native Databricks secret scopes are limited to the Databricks workspace, and Azure Key Vault ties secrets management to Microsoft’s cloud ecosystem.
  • Akeyless provides a centralized repository that supports Databricks while also allowing seamless portability between AWS and Azure without code modifications.

2. Comprehensive Language Support

  • Databricks supports multiple programming languages: Python, Scala, R, and SQL.
  • Akeyless provides a Python SDK, enabling secrets retrieval within Python notebooks. For Scala and R, secrets can be retrieved via a Python cell and then passed using Databricks utilities like dbutils or Spark configuration.

3. Enhanced Security via Azure Managed Identity

  • On Azure, Databricks workloads can authenticate with a managed identity, removing the need for static credentials or cross-account IAM roles.
  • Akeyless integrates with Azure AD, authenticating workloads via tenant and object IDs, ensuring secure access.

Implementing Akeyless in Azure Databricks

Step 1: Authenticate Azure Databricks with Akeyless

  1. Set up Azure AD authentication in Akeyless using the Databricks managed identity assigned to a specific Databricks workspace in Azure. Alternatively, you can use an API key and secret for authentication.
  2. Install the required dependencies in a Databricks Python notebook:
%pip install akeyless
%pip install akeyless_cloud_id
%restart_python

3. Generate an Azure Cloud ID to authenticate:

from akeyless_cloud_id import CloudId
import akeyless

configuration = akeyless.Configuration(
    host="https://api.akeyless.io"
)
api_client = akeyless.ApiClient(configuration)
api = akeyless.V2Api(api_client)

cloud_id_generator = CloudId()
cloud_id = cloud_id_generator.generateAzure()

access_id = 'p-64dkb0pjpal7zm'
token_body = akeyless.Auth(access_id=access_id, access_type='azure_ad', cloud_id=cloud_id)
res = api.auth(token_body)
token = res.token
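
Step 1 mentions an API key as an alternative to the managed identity. The sketch below contrasts the two sets of arguments passed to akeyless.Auth; `build_auth_body` and its placeholder values are hypothetical helpers for illustration, not part of the Akeyless SDK:

```python
# Hypothetical helper (not part of the Akeyless SDK) showing which
# keyword arguments akeyless.Auth takes for each authentication mode.
def build_auth_body(access_id, cloud_id=None, access_key=None):
    """Return kwargs for akeyless.Auth: Azure AD when a cloud ID is
    available, API-key authentication otherwise."""
    if cloud_id is not None:
        return {"access_id": access_id,
                "access_type": "azure_ad",
                "cloud_id": cloud_id}
    if access_key is not None:
        return {"access_id": access_id,
                "access_type": "api_key",
                "access_key": access_key}
    raise ValueError("Provide either cloud_id or access_key")

# Managed identity, as in the notebook above:
azure_kwargs = build_auth_body("p-64dkb0pjpal7zm", cloud_id="<cloud-id>")
# API-key fallback mentioned in step 1:
apikey_kwargs = build_auth_body("p-64dkb0pjpal7zm", access_key="<access-key>")
```

Either dictionary can then be unpacked into akeyless.Auth(**kwargs) before calling api.auth().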

Step 2: Retrieve and Use Secrets in Databricks

  1. Store a value in the Spark configuration from a Python cell (shown first with a placeholder value):
API_KEY = "your_secret_api_key"

spark.conf.set("api.key", API_KEY)
  2. Pass the secret to Scala using the Spark configuration:
val API_KEY = spark.conf.get("api.key")

println(s"API Key from Python: $API_KEY")
  3. Retrieve the real secret from Akeyless and store it in the Spark configuration:
data_gov_api_key = '/devops/data_gov_api_key'

secret_body = akeyless.GetSecretValue(names=[data_gov_api_key], token=token)

secret_response = api.get_secret_value(secret_body)

API_KEY = secret_response[data_gov_api_key]

spark.conf.set("api.key", API_KEY)
  4. Use the secret in an API request and store the result in a Databricks table:
import requests
import pandas as pd
from io import StringIO

url = "https://health.data.ny.gov/api/views/jxy9-yhdk/rows.csv"
params = {"api_key": API_KEY, "per_page": 5}

response = requests.get(url, params=params)

if response.status_code == 200:
    pdf = pd.read_csv(StringIO(response.text))
    pdf.columns = pdf.columns.str.replace(" ", "_")
    pdf = pdf.fillna("")

    df = spark.createDataFrame(pdf)
    df.write.mode("overwrite").saveAsTable("default.baby_names_by_non_dlt")
else:
    print(f"❌ API Request Failed! Status {response.status_code}: {response.text}")
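
External API calls can fail transiently. A small retry wrapper, sketched below as a hypothetical helper rather than part of the notebook above, makes the request step more robust; the fetch callable is injected so the same logic works with requests.get or any other client:

```python
import time

# Hypothetical retry wrapper for hardening external API calls
# against transient failures (timeouts, 5xx responses).
def fetch_with_retry(fetch, attempts=3, backoff_seconds=1.0):
    """Call `fetch` until it returns a 200 response, backing off
    exponentially between attempts; raise after the last failure."""
    last = None
    for attempt in range(attempts):
        last = fetch()
        if last.status_code == 200:
            return last
        time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError(f"API request failed: status {last.status_code}")

# Usage with the request above (assumes `url` and `params` exist):
# response = fetch_with_retry(lambda: requests.get(url, params=params))
```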

Step 3: Query the Stored Data

To validate the stored data, run the following SQL query in a Databricks notebook:

SELECT * FROM default.baby_names_by_non_dlt;

Conclusion

By integrating Akeyless with Azure Databricks, organizations gain a secure, cloud-agnostic, and scalable solution for managing secrets in DataOps and MLOps workflows. This approach reduces cloud lock-in, enhances security via managed identities, and simplifies secret retrieval across different programming environments.

With the rapid expansion of AI-driven workloads, ensuring secure secrets management in Databricks is more critical than ever. Akeyless provides a future-proof solution that supports both current and emerging use cases.

Get Started with Akeyless

Interested in implementing Akeyless for your Azure Databricks environment? Contact us today to learn more!
