AWS Big Data Blog | Website Cyber Security ☁️

AWS Big Data Blog Official Big Data Blog of Amazon Web Services

Migrate from Apache Solr to Amazon OpenSearch Serverless
by Jon Handler on July 16, 2026 at 3:52 pm
In this post, you will learn why now is the time to take advantage of the ease of operations and native AI capabilities of OpenSearch Serverless, and migrate from Solr.
High-performance Remote Shuffle Service on Amazon EMR with Apache Celeborn
by Suvojit Dasgupta on July 15, 2026 at 3:56 pm
In this post, we show how Apache Celeborn resolves this trade-off for Amazon EMR on EKS and Amazon EMR on EC2, improving job reliability while unlocking additional cost savings.
Zero Copy access to Apache Iceberg tables in Amazon S3 from Salesforce Data 360 using the Iceberg REST endpoint from AWS Glue Data Catalog
by Avijit Goswami on July 15, 2026 at 3:51 pm
In this post, we demonstrate how AWS and Salesforce customers can access their enterprise data lakes on AWS from Salesforce Data 360 using zero-copy file federation.
Patch perfect: Automating Amazon Redshift patch testing
by Eva Donaldson on July 14, 2026 at 4:37 pm
In this post, we demonstrate an automated test suite that validates your Amazon Redshift cluster automatically after any patch, reboot, or modification. It uses standard drivers against real workload patterns to provide a verified gate between a patch landing and that patch reaching production.
Multi-cloud lakehouse architecture on AWS for Agentic AI, Part 1: Architecture and best practices
by Sakti Mishra on July 13, 2026 at 5:02 pm
This post explains the architectural approach to building an open lakehouse on AWS, unifying the metadata catalog across providers so that AI agents can access it. It also highlights the architecture trade-offs and best practices.
How Razorpay Built Real-Time Anomaly Detection with Amazon MSK
by Narendra Kumar on July 13, 2026 at 4:59 pm
In this post, we explore Razorpay’s anomaly detection and alerting platform (ADA) architecture using Amazon Managed Streaming for Apache Kafka (Amazon MSK) and other AWS services. According to Razorpay, the system detects transaction anomalies in under 30 seconds, supports thousands of merchant-level alerts, and has reduced monitoring costs by approximately 80 percent. The platform maintains 99.99 percent uptime for over 500 million transactions per month.
Cut costs and simplify operations with writable warm storage in Amazon OpenSearch Service
by Bharav Patel on July 8, 2026 at 3:52 pm
In this post, I show you how writable warm storage removes the costly migration cycle. You can reduce your infrastructure costs by up to 48 percent and update historical data in seconds instead of hours. I walk through a real-world cost comparison and performance benchmarks, and help you decide when to use writable warm versus UltraWarm.
Introducing Apache Spark Connect support in AWS Glue interactive sessions
by Zach Mitchell on July 7, 2026 at 4:38 pm
Apache Spark Connect bridges the gap between these two worlds: you develop in local Python, but execute on AWS Glue against actual data. Today, AWS Glue interactive sessions support Spark Connect natively. You can connect from any environment that supports the PySpark remote() API, including VS Code, PyCharm, Amazon SageMaker Unified Studio notebooks, and standalone Python applications. You don’t need to install specialized kernels or manage cluster infrastructure.
How BigBasket uses the Iceberg based lakehouse architecture on AWS to power lightning-fast grocery delivery across India
by Annie Mattoo on July 6, 2026 at 4:50 pm
In this post, we demonstrate how BigBasket implemented the lakehouse architecture on AWS, including their architecture decisions, implementation approach, and the measurable business results you can expect from a similar modernization. Whether you’re facing scalability challenges or planning your own lakehouse implementation, this blueprint provides actionable insights you can adapt for your organization.
Accelerating log analytics at scale with AWS Glue and Apache Iceberg materialized views
by Shinu Tharol on July 2, 2026 at 5:46 pm
In this post, you learn how to build an application log pipeline for production use with Amazon CloudWatch Logs, AWS Lambda, Amazon Data Firehose, AWS Glue, and Apache Iceberg materialized tables. You then use materialized views to accelerate query performance. This solution helps you achieve faster query response times on large-scale log data without requiring you to manage continuous data lake refresh.
Serverless analytics pipelines using the Apache Spark engine in Amazon Athena
by Avichay Marciano on July 2, 2026 at 4:27 pm
This post shows how developers, data engineers, and analysts can connect to a secure Spark Connect endpoint in Athena with Apache Spark. You can use your preferred tools, such as Jupyter notebooks, VS Code, or dbt with Apache Airflow, without managing cluster lifecycle or scaling.
Deploy modern data platforms in minutes with MDAA
by Sudeshna Dash on July 2, 2026 at 4:26 pm
In this post, we explore how MDAA transforms data architecture development from months of manual coding to production-ready deployment through configuration-driven infrastructure and embedded governance, examine a real customer transformation, and provide a clear implementation pathway for your own data modernization journey.
Amazon Redshift RG: Faster and lower cost, Graviton-powered
by Stefan Gromoll on July 2, 2026 at 4:21 pm
In this post, we describe the innovations that make RG instances so much faster. We also share benchmark results showing that RG delivers up to 4.2x better price-performance than other leading data warehouses.
Run log analytics for a fraction of the cost with the new engine for Amazon OpenSearch Service
by Jagadish Kumar on July 1, 2026 at 8:16 pm
We’re introducing a purpose-built log analytics engine for Amazon OpenSearch Service. This new engine delivers up to 4x price performance, 2x faster data ingestion, up to 2x faster analytical queries, and up to 70 percent lower storage costs. You get all of this without sacrificing search capabilities on the same data. In this post, you learn how to take advantage of these benefits, see how to get started, and review benchmark results at billion-document scale.
AI-powered performance recommendations for Amazon Redshift
by Steve Phillips on July 1, 2026 at 6:39 pm
In this post, you learn how to build an AI-powered solution that collects the telemetry, pre-computes performance signals, correlates them with CloudWatch, and uses Amazon Bedrock to generate prioritized recommendations.
Scale analytics with Amazon Redshift multi-warehouse enhancements
by Raza Hafeez on June 29, 2026 at 7:59 pm
In this post, we introduce new Amazon Redshift features that enhance its multi-warehouse and scaling capabilities: remote materialized view (MV) operations, remote table DDL support, and concurrency scaling enhancements for zero-ETL and S3 event integration. These features help you build more scalable, performant decentralized analytics architectures on Amazon Redshift.
Amazon Redshift delivers faster performance for BI dashboards and real-time analytics
by Stefan Gromoll on June 29, 2026 at 5:05 pm
Today, we’re excited to announce a new performance optimization in Amazon Redshift that improves the response times of low-latency SQL queries, such as those used in real-time analytics applications or generated by BI dashboards. This enhancement lowers query latency by reducing the time Amazon Redshift spends preparing SQL queries for execution. SQL queries start faster, so they return results sooner.
Optimize your Tableau integration with Amazon Redshift Serverless
by Nidhi Nayak on June 29, 2026 at 5:00 pm
In this post, we provide a guide to help you use Tableau’s Relationships and Amazon Redshift Serverless architecture to deliver sub-second insights while maximizing every Redshift Processing Unit (RPU). We also provide guidance on five key areas: data model architecture for optimal query performance, security configuration and access control, performance optimization through smart configuration, cost management strategies, and query and join optimization techniques.
Implement multi-tenant search with Amazon OpenSearch Serverless next generation
by Jon Handler on June 24, 2026 at 6:31 pm
In this post, we show how the next-generation OpenSearch Serverless architecture makes the collection-per-tenant model practical for multi-tenant search.
Multi-Region identity-based access to Amazon Redshift and S3 Tables
by Maneesh Sharma on June 24, 2026 at 6:15 pm
In Part 1 of this series, we showed how to simplify enterprise data access using the Amazon Redshift integration with Amazon S3 Access Grants. In this post, we extend that solution across AWS Regions. We introduce a fictional company, AnyCompany Global, to illustrate how organizations with global operations can use AWS IAM Identity Center Multi-Region to set up consistent, identity-based access to Amazon Redshift and Amazon S3 Tables across Regions.