AWS Big Data Blog: Official Big Data Blog of Amazon Web Services
- Getting started with Apache Iceberg write support in Amazon Redshift – Part 2 by Sanket Hase on April 15, 2026 at 9:29 pm
Amazon Redshift now supports DELETE, UPDATE, and MERGE operations for Apache Iceberg tables stored in Amazon S3 and Amazon S3 table buckets. With these operations, you can modify data at the row level, implement upsert patterns, and manage the data lifecycle while maintaining transactional consistency using familiar SQL syntax. You can run complex transformations in Amazon Redshift and write results to Apache Iceberg tables that other analytics engines like Amazon EMR or Amazon Athena can immediately query. In this post, you work with datasets to demonstrate these capabilities in a data synchronization scenario.
- Get to insights faster using Notebooks in Amazon SageMaker Unified Studio by Praveen Kumar on April 15, 2026 at 9:24 pm
In this post, we demonstrate how Notebooks in Amazon SageMaker Unified Studio help you get to insights faster by simplifying infrastructure configuration. You’ll see how to analyze housing price data, create scalable data tables, run distributed profiling, and train machine learning (ML) models within a single notebook environment.
- How to use Parquet Column Indexes with Amazon Athena by Matt Wong on April 13, 2026 at 3:57 pm
In this blog post, we use Athena and Amazon SageMaker Unified Studio to explore Parquet Column Indexes and demonstrate how they can improve Iceberg query performance. We explain what Parquet Column Indexes are, demonstrate their performance benefits, and show you how to use them in your applications.
- Implementing Kerberos authentication for Apache Spark jobs on Amazon EMR on EKS to access a Kerberos-enabled Hive Metastore by Krishna Kumar Venkateswaran on April 13, 2026 at 3:51 pm
In this post, we show how to configure Kerberos authentication for Spark jobs on Amazon EMR on EKS, authenticating against a Kerberos-enabled HMS so you can run both Amazon EMR on EC2 and Amazon EMR on EKS workloads against a single, secure HMS deployment.
- Introducing Amazon MSK Express Broker power for Kiro by Stephan Schiller on April 9, 2026 at 2:40 pm
In this post, we’ll show you how to use Kiro powers, a new capability that equips Kiro with contextual knowledge and tooling. You can simplify your MSK cluster management, from initial setup to diagnosing common issues, all through natural language conversations.
- Introducing workload simulation workbench for Amazon MSK Express broker by Manu Mishra on April 7, 2026 at 4:49 pm
In this post, we introduce the workload simulation workbench for Amazon Managed Streaming for Apache Kafka (Amazon MSK) Express Broker. The simulation workbench is a tool that you can use to safely validate your streaming configurations through realistic testing scenarios.
- Proactive monitoring for Amazon Redshift Serverless using AWS Lambda and Slack alerts by Cristian Restrepo Lopez on April 7, 2026 at 4:27 pm
In this post, we show you how to build a serverless, low-cost monitoring solution for Amazon Redshift Serverless that proactively detects performance anomalies and sends actionable alerts directly to your selected Slack channels.
- Modernize business intelligence workloads using Amazon Quick by Satesh Sonti on April 6, 2026 at 5:56 pm
In this post, we provide implementation guidance for building integrated analytics solutions that combine the generative BI features of Amazon Quick with Amazon Redshift and Amazon Athena SQL analytics capabilities.
- Agentic AI for observability and troubleshooting with Amazon OpenSearch Service by Muthu Pitchaimani on April 2, 2026 at 9:44 pm
Now, Amazon OpenSearch Service brings three new agentic AI features to OpenSearch UI. In this post, we show how these capabilities work together to help engineers go from alert to root cause in minutes. We also walk through a sample scenario where the Investigation Agent automatically correlates data across multiple indices to surface a root cause hypothesis.
- Streamline Apache Kafka topic management with Amazon MSK by Swapna Bandla on April 2, 2026 at 3:32 pm
In this post, we show you how to use the new topic management capabilities of Amazon MSK to streamline your Apache Kafka operations. We demonstrate how to manage topics through the console, control access with AWS Identity and Access Management (IAM), and bring topic provisioning into your continuous integration and continuous delivery (CI/CD) pipelines.
- How to set up a network-isolated VPC for Amazon SageMaker Unified Studio by Rohit Vashishtha on April 2, 2026 at 3:28 pm
In this post, we explore scenarios where customers need more control over their network infrastructure when building their unified data and analytics strategic layer. We’ll show how you can bring your own Amazon Virtual Private Cloud (Amazon VPC) and set up Amazon SageMaker Unified Studio for strict network control.
- Navigating multi-account deployments in Amazon SageMaker Unified Studio: a governance-first approach by Ben Shafabakhsh on April 1, 2026 at 7:12 pm
In this post, we explore SageMaker Unified Studio multi-account deployments in depth: what they entail, why they matter, and how to implement them effectively. We examine architecture patterns and evaluate their trade-offs in terms of security boundaries, operational overhead, and team autonomy. We also provide practical guidance to help you design a deployment that balances centralized control with distributed ownership across your organization.
- Improve the discoverability of your unstructured data in Amazon SageMaker Catalog using generative AI by Nishchai JM on April 1, 2026 at 7:09 pm
This is the first post in a two-part series. In this part, we walk you through how to set up automated processing for unstructured documents, extract and enrich metadata using AI, and make your data discoverable through SageMaker Catalog. The second part, currently in the works, will show you how to discover and access the enriched unstructured data assets as a data consumer. By the end of this post, you will understand how to combine Amazon Textract and Anthropic Claude through Amazon Bedrock to extract key business terms, and how to enrich metadata in Amazon SageMaker Catalog to turn unstructured data into a governed, discoverable asset.
- Automated tag-based DAG permission management in Amazon MWAA by Amey Ramakant Mhadgut on March 31, 2026 at 3:57 pm
In this post, we show you how to use Apache Airflow tags to systematically manage DAG permissions, reducing operational burden while maintaining robust security controls that complement infrastructure-level security measures.
- Secure multi-warehouse Amazon Redshift access behind a Network Load Balancer using Microsoft Entra ID by Raghu Kuppala on March 30, 2026 at 5:46 pm
In this post, we show you how to configure a native identity provider (IdP) federation for Amazon Redshift Serverless using Network Load Balancer. You will learn how to enable secure connections from tools like DBeaver and Power BI while maintaining your enterprise security standards.
- Securely connect Kafka client applications to your Amazon MSK Serverless cluster from different VPCs and AWS accounts by Subham Rakshit on March 30, 2026 at 5:44 pm
In this post, we show you how Kafka clients can use Zilla Plus to securely access your MSK Serverless clusters through AWS Identity and Access Management (IAM) authentication over AWS PrivateLink, from as many different AWS accounts or VPCs as needed. We also show you how the solution supports a custom domain name for your MSK Serverless cluster.
- Build AWS Glue Data Quality pipeline using Terraform by Viquar Khan on March 26, 2026 at 3:35 pm
AWS Glue Data Quality is a feature of AWS Glue that helps maintain trust in your data and supports better decision-making and analytics across your organization. You can use Terraform to deploy AWS Glue Data Quality pipelines. Deploying them with Terraform applies infrastructure as code (IaC) best practices. This gives you consistent, version-controlled, and repeatable deployments across multiple environments, encourages collaboration, and reduces errors caused by manual configuration. In this post, we explore two complementary methods for implementing AWS Glue Data Quality using Terraform.
- Automating data classification in Amazon SageMaker Catalog using an AI agent by Ramesh H Singh on March 24, 2026 at 9:44 pm
If you’re struggling with manual data classification in your organization, the new Amazon SageMaker Catalog AI agent can automate this process for you. Most large organizations face challenges with the manual tagging of data assets, which doesn’t scale and is unreliable. In some cases, business terms aren’t applied consistently across teams. Different groups name and tag data assets based on local conventions. This creates a fragmented catalog where discovery becomes unreliable and governance teams spend more time normalizing metadata than governing. In this post, we show you how to implement this automated classification to help reduce the manual tagging effort and improve metadata consistency across your organization.
- Designing centralized and distributed network connectivity patterns for Amazon OpenSearch Serverless – Part 2 by Ankush Goyal on March 24, 2026 at 9:42 pm
(Continued from Part 1) In this post, we show how you can give on-premises clients and spoke account resources private access to OpenSearch Serverless collections distributed across multiple business unit accounts.
- Designing centralized and distributed network connectivity patterns for Amazon OpenSearch Serverless – Part 1 by Ankush Goyal on March 24, 2026 at 9:42 pm
In this post, we show how organizations can provide secure, private access to multiple Amazon OpenSearch Serverless collections from both on-premises environments and distributed AWS accounts using a single centralized interface VPC endpoint and Route 53 Profiles.