The moment AI meets the cloud demands a fundamental reassessment. At AWS re:Invent 2025, the keynote began by posing a critical question:
“What is this AI transformation going to mean for the cloud?”
Peter DeSantis, Senior Vice President of Utility Computing at AWS, steered the audience toward the AWS cloud’s core guiding attributes. He spoke of the foundational principles behind every technical decision and service at AWS. DeSantis underscored their importance, stating:
“We think about these things a lot. And we make big, deep, long-term investments to support these attributes.”

Peter explained these core attributes in the context of AWS and the AI transformation.
- Security: Essential due to the increased attack surface from AI tools.
- Performance: Critical for the scale and speed demanded by modern AI applications.
- Elasticity: Extending the same capacity-planning freedom to volatile AI workloads.
- Cost: Addressing the high expense of building and running AI.
- Agility: The ability to “launch, optimize, and pivot quickly.”
He emphasized that while AI transformation is reshaping nearly every aspect of our lives, the fundamental attributes defining AWS remain unchanged. Only the techniques and innovations required to deliver them will evolve.
The Big Reveal: AWS Graviton5 Processors
Before Peter discussed the future of AI for the cloud, he took us back to the foundation that makes any of this possible. The year was 2010, and demanding workloads on Amazon EC2 suffered from outlier latencies, or “jitter”: occasional microsecond delays caused by the virtualization layer that disrupted applications. Virtualization was essential for Amazon EC2, but conventional wisdom said it could never match bare-metal performance.
The breakthrough came with the AWS Nitro System: custom silicon that moves virtualization off the server onto dedicated hardware. The result was eliminated jitter, better-than-bare-metal performance, stronger security, and support for more instance types. Peter used the Nitro System as the perfect example of AWS’s deep, multi-year investment strategy.
Fifteen years later, Nitro remains a core part of AWS’s infrastructure, and its deep investment set the stage for something even bigger: the advent of Graviton processors and Trainium AI chips.
Dave Brown took the stage and recalled the genesis of Graviton.
“If custom silicon could optimize networking and storage, why not compute?”
The answer was the original Graviton processor, built from the ground up to deliver higher performance and lower cost for cloud workloads. Years of continuous, customer-driven performance improvements followed its release, culminating in Dave’s announcement of AWS Graviton5. It’s not surprising that this is AWS’s most powerful CPU ever!

Key Features
- Delivers up to 25% better compute performance.
- Features 192 CPU cores per chip in a single package.
- Inter-core communication latency is up to 33% lower.
- L3 cache is 5x larger than Graviton4’s, reducing data delays.
- Network bandwidth is up to 15% higher.
- Amazon EBS bandwidth is up to 20% higher on average across instance sizes.
- Up to 25% better performance than Graviton4-based M8g instances.
- Ideal for application servers, microservices, gaming servers, midsize data stores, and caching fleets.
- Built on the AWS Nitro System with dedicated hardware and a lightweight hypervisor.
- Provides isolated multitenancy, private networking, and fast local storage.
Amazon EC2 M9g Instances
M9g instances are powered by Graviton5 and deliver significant improvements over M8g, offering the best price-performance in Amazon EC2 to date.

Key performance gains compared to M8g include:
- Up to 25% better compute performance.
- Higher networking and Amazon EBS bandwidth.
- Up to 30% faster for databases.
- Up to 35% faster for web applications and machine learning.
AWS Lambda Managed Instances: Serverless + Amazon EC2
Heading down memory lane to 2013, Dave Brown recalled a thought that led to a gigantic breakthrough in the AWS world:
“What if a developer could just hand their code to AWS and have it run? No servers, no provisioning, no capacity.”
To realize this vision, the Amazon S3 team attached compute directly to the storage layer, allowing a small function to execute the moment a new object arrived. This core idea evolved into AWS Lambda in 2014.
The Evolution of Serverless: AWS Lambda
With AWS Lambda, developers provide code while AWS handles execution, scaling, and availability. Customers would now only pay for the compute they actually use. While AWS Lambda grew rapidly, increasing customer demand eventually led to another critical question posed by Dave Brown:
“Could we preserve everything that customers love about serverless, the simplicity, the automatic scaling, the operational model, while giving them the performance choice of EC2?”
To address this requirement, AWS merged the Lambda and EC2 teams, leading to the creation of AWS Lambda Managed Instances. The goal? To provide the performance choice of Amazon EC2 without sacrificing serverless simplicity.

Key Features
- AWS Lambda functions run on Amazon EC2 instances inside your account.
- You choose the instance type and hardware.
- AWS Lambda manages provisioning, patching, availability, and scaling.
- Opened entirely new use cases for AWS Lambda, including:
  - Video and media processing
  - Machine learning pre-processing
  - High-volume data and analytics pipelines
  - Any workload that historically needed dedicated Amazon EC2 performance
However, Dave stressed that AWS Lambda Managed Instances isn’t a departure from serverless; it simply expands the existing model. Developers get the precise performance control of Amazon EC2 while keeping the event-driven Lambda experience they already love. With this update, builders never have to choose between convenience and capability.

Project Mantle: A New Inference Engine for Amazon Bedrock
“Why is Inference such a hard problem?”
Dave addressed this challenge, pointing out that inference is central to AI yet behaves differently from the traditional compute the cloud has spent two decades optimizing. In particular, cloud elasticity doesn’t automatically apply to these AI workloads.
To solve this, AWS is focused on delivering the same flexibility and efficiency customers expect from the cloud, including resilience against sudden demand spikes.
Inference is a four-stage pipeline:
- Tokenization breaks the prompts into tokens the model can understand.
- Prefill processes the full prompt and builds the Key-Value (KV) Cache for the subsequent Decode stage.
- Decode generates the response, creating one token at a time, guided by that KV Cache.
- Detokenization converts the output tokens back into human-understandable language, leading to a final response.
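The four stages above can be sketched as a toy pipeline. This is an illustration only, not how Amazon Bedrock or any real model works: the “model” simply fabricates placeholder tokens, and the KV cache is a plain dictionary.

```python
# Toy sketch of the four inference stages: tokenize, prefill, decode, detokenize.

def tokenize(prompt: str) -> list[str]:
    """Tokenization: break the prompt into tokens the model can consume."""
    return prompt.split()

def prefill(tokens: list[str]) -> dict:
    """Prefill: process the full prompt once and build the KV cache."""
    return {"kv": {i: tok for i, tok in enumerate(tokens)}, "pos": len(tokens)}

def decode(cache: dict, max_new_tokens: int) -> list[str]:
    """Decode: generate one token at a time, guided by the KV cache."""
    output = []
    for step in range(max_new_tokens):
        # A real model would run a forward pass here; we fabricate a token.
        token = f"token{cache['pos'] + step}"
        output.append(token)
        cache["kv"][cache["pos"] + step] = token  # the cache grows each step
    return output

def detokenize(tokens: list[str]) -> str:
    """Detokenization: convert output tokens back into readable text."""
    return " ".join(tokens)

cache = prefill(tokenize("What is the cloud?"))
print(detokenize(decode(cache, max_new_tokens=3)))
```

Even this toy version shows why the stages stress the system differently: prefill touches the whole prompt at once, while decode is a sequential loop whose cache grows with every generated token.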
So, what seems to be the problem with inference? Dave explained that each stage stresses different parts of the system. The requests vary widely in size, urgency, and resource needs. Scaling all of this globally across many models is a very different challenge.
To combat this issue, AWS has introduced Project Mantle. Think of it as a new inference engine powering many Amazon Bedrock models, designed to run real customer workloads at massive scale efficiently and reliably.

Key Features
- Service Tiers for Prioritization: AWS has launched Amazon Bedrock service tiers so that customers can now assign inference requests to dedicated lanes:
  - Priority: Real-time, latency-sensitive tasks
  - Standard: Steady, predictable workloads
- Fairness & Predictable Performance: Each customer gets their own queue, so spikes from other users don’t impact performance. Amazon Bedrock also learns usage patterns to pre-allocate capacity.
- New Durable Journal for Reliability: AWS has also added a “Journal” to Amazon Bedrock. Through this new feature, long-running requests are continuously tracked. If a failure occurs, jobs resume exactly where they left off, reducing wasted compute and improving fault tolerance.
- Efficient Fine-Tuning: Fine-tuning jobs now pause and resume automatically during traffic spikes, eliminating the need for separate training clusters.
- Enhanced Security: Amazon Bedrock integrates confidential computing to encrypt data and model weights during inference, giving customers cryptographic proof that computations run in a trusted environment.
- Deep AWS Integrations: Includes AWS Lambda tool calls, OpenAI Responses API support, and integration with AWS IAM and Amazon CloudWatch for enterprise-grade observability.
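The durable journal idea can be illustrated with a small sketch. This is not Bedrock’s implementation: the real service persists progress to durable storage, while this toy version uses an in-memory dictionary.

```python
# Toy sketch of journaled progress: record completed work after each unit
# so a restarted job resumes where it left off instead of starting over.

journal: dict[str, int] = {}  # job_id -> index of next item to process

def run_job(job_id: str, items: list[str]) -> list[str]:
    """Process items, journaling progress so a retry skips finished work."""
    start = journal.get(job_id, 0)          # resume point; 0 on a fresh run
    completed = []
    for i in range(start, len(items)):
        completed.append(items[i].upper())  # the "work" for one item
        journal[job_id] = i + 1             # journal progress after each item
    return completed

# Simulate a first attempt that failed after finishing two items:
journal["job-1"] = 2
# The retry resumes at item index 2 instead of redoing everything.
print(run_job("job-1", ["a", "b", "c", "d"]))  # ['C', 'D']
```

The payoff is the one shown in the final line: a retry after failure only pays for the work not yet journaled, which is exactly the wasted-compute reduction the feature targets.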
Amazon Bedrock is now a fully managed, production-ready Generative AI platform that combines fairness, performance, fault tolerance, and security. It allows developers to focus on building without worrying about infrastructure or operational overhead.
Amazon Nova Multimodal Embeddings
AWS drives innovation by embedding new capabilities, such as vector search, directly into its services. Peter emphasized the core problem: a company’s institutional knowledge is locked up in unstructured data like videos and documents. Traditional systems cannot handle this because the vectors created by specialized AI models cannot be understood or searched together.
This is precisely the challenge AWS set out to solve with the new Amazon Nova Multimodal Embeddings model.

Key Features
- This state-of-the-art embedding model supports text, documents, images, video, and audio.
- By converting every modality into a shared vector space, the model delivers a unified understanding of your data.
The Amazon Nova Multimodal Embeddings model does all of this with industry-leading precision. A powerful embedding model is just the beginning, though; the real magic happens when vector search is available on all your data, wherever it lives.
AWS is integrating vector capabilities across its services: Amazon OpenSearch Service has been notably transformed, and Amazon S3 Vectors has been introduced for massive-scale AI search.
- Amazon OpenSearch Service: This service has evolved into a vector-driven intelligence engine. It now features Hybrid Search, which eliminates the trade-off between traditional keyword search and semantic search.
- Amazon S3 Vectors: Amazon S3 Vectors brings the cost, structure, and massive scale of Amazon S3 directly to vector storage. Creating a vector index is as simple as creating an Amazon S3 bucket: there is no provisioning, and pricing is pay-as-you-go.
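What makes a shared vector space powerful is that one similarity metric compares any two modalities. The sketch below shows the idea with made-up three-dimensional vectors; in practice the embeddings would come from a model such as Amazon Nova Multimodal Embeddings and have hundreds of dimensions.

```python
# Illustrative only: the vectors below are invented, not real embeddings.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_text    = np.array([0.9, 0.1, 0.3])  # embedding of a text query
video_clip    = np.array([0.8, 0.2, 0.4])  # embedding of a video segment
audio_snippet = np.array([0.1, 0.9, 0.2])  # embedding of an audio clip

# Because all three live in the same space, a text query can rank a video
# above an audio clip with no modality-specific logic.
print(cosine_similarity(query_text, video_clip) >
      cosine_similarity(query_text, audio_snippet))  # True
```

This is the property that separate, specialized models lack: their vectors live in incompatible spaces, so a comparison like the one above is meaningless across modalities.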
AWS Trainium3: Redefining AI Infrastructure
In the final segment, Peter brought attention to another of AWS’s deep infrastructure investments supporting AI. He stated the core challenge that led to its development:
“AI workloads are growing explosively, and running these workloads is expensive, power-hungry, and capacity-constrained.”

Luckily, AWS Trainium3 is optimized for a wide range of AI workloads, from dense models to Mixture of Experts (MoE). Text, image, and video modalities are all fully supported.
Key Features
1. Cost and Performance Breakthroughs
A closer look at what makes Trainium3 so powerful:
- Cost Savings: AWS Trainium3 delivers up to 40% lower costs than the previous generation for even the most demanding AI workloads.
- Performance Metrics: The chip offers 5x higher output tokens per megawatt (tokens/MW) at the same latency compared to AWS Trainium2, demonstrating superior efficiency.
2. Amazon EC2 Trainium3 UltraServer Architecture
The Trainium3 UltraServer is a single AI supercomputer composed of:
- Scale: 144 Trainium3 chips across two racks.
- Performance: 360 PetaFLOPS of FP8 compute and 20TB of High Bandwidth Memory (HBM). This is 4.4x higher compute and 3.9x more bandwidth than the previous generation.
- Interconnect: Custom Neuron Switches provide full bisection bandwidth at extremely low latencies.
- Integrated Design: The server sleds combine Trainium3, Graviton, and Nitro chips.
3. Developer Tools and Optimization for AWS Trainium3
AWS is making AWS Trainium3 easy to use and deeply optimizable. Here’s how:
- Ease of Use: AWS Trainium3 is becoming PyTorch native, allowing developers to run models by simply changing one line of code.

- Deep Optimization:
- Neuron Kernel Interface (NKI): NKI is a programming language providing deep performance engineers with direct, instruction-level access to Trainium’s hardware within Python. It allows for custom kernel creation and fine-grained control over memory. AWS is open-sourcing the stack for easier optimization.
- Neuron Profiler: A specialized tool designed to give AI developers deep insight into what their code is doing on AWS Trainium chips. Some standout features include:
  1. Trainium includes dedicated hardware for profiling.
  2. Profiling occurs without impacting the performance of the actual AI workload.
  3. The profiler provides precise, instruction-level details for optimization.
- Neuron Explorer: AWS has taken the Neuron Profiler’s detailed data a step further with Neuron Explorer. This tool takes all that complex, low-level profiling data and presents it in an easy-to-understand interface. The Neuron Explorer automatically detects bottlenecks and suggests optimizations, saving developers significant time in troubleshooting.
What Will You Build Next?
The keynote delivered a clear, compelling message. The future of AI transformation is rooted in the core attributes of the AWS cloud. Every major announcement served the single purpose of removing customer constraints and empowering them to master AI adoption.
As Peter DeSantis reminded the audience, the journey is just beginning:
“It’s still day one with AI. AWS will be here, just like we’ve been for the past 20 years. Removing constraints, providing building blocks, and helping you navigate whatever’s next.”
This is where Cloudelligent accelerates your progress. We master the AWS building blocks and translate raw capability into engineered business outcomes. Our experts help you move beyond “Can I build this?” to rapidly deploy the “What shall I build next?” solutions that deliver a competitive edge in the evolving AI landscape.
Ready to turn your AI vision into reality? Book a FREE AI/ML assessment with us today.




