Rohit Bhardwaj

Director of Architecture, Expert in cloud-native solutions

Rohit Bhardwaj is a Director of Architecture at Salesforce. He has extensive experience architecting multi-tenant, cloud-native solutions built on resilient microservices and service-oriented architectures using the AWS stack. In addition, Rohit has a proven record of designing solutions and executing and delivering transformational programs that reduce costs and increase efficiencies.

As a trusted advisor, leader, and collaborator, Rohit applies problem-resolution, analytical, and operational skills to every initiative, developing strategic requirements and solution analysis through all stages of the project life cycle, from product readiness to execution.
Rohit excels in designing scalable cloud microservice architectures using Spring Boot and Netflix OSS technologies on AWS and Google Cloud. As a security ninja, he looks for ways to resolve application security vulnerabilities using ethical hacking and threat modeling. Rohit is excited about architecting cloud technologies using Docker, Redis, NGINX, RightScale, RabbitMQ, Apigee, Azul Zing, Actuate BIRT reporting, Chef, Splunk, Rest-Assured, SoapUI, Dynatrace, and EnterpriseDB. In addition, he has developed lambda-architecture solutions using Apache Spark, Cassandra, and Camel for real-time analytics and integration projects.

Rohit holds an MBA in Corporate Entrepreneurship from Babson College and a Master's in Computer Science from Boston University and Harvard University. He is a regular speaker at No Fluff Just Stuff, UberConf, RichWeb, GIDS, and other international conferences.

Rohit loves to connect at http://www.productivecloudinnovation.com, on LinkedIn at http://linkedin.com/in/rohit-bhardwaj-cloud, or on Twitter at rbhardwaj1.

Presentations

AI agents don’t behave like humans. A single prompt can trigger thousands of parallel API calls, retries, and tool chains—creating bursty load, cache-miss storms, and runaway costs. This talk unpacks how to design and operate APIs that stay fast, reliable, and affordable under AI workloads. We’ll cover agent-aware rate limiting, backpressure & load shedding, deterministic-result caching, idempotency & deduplication, async/event-driven patterns, and autoscaling without bill shock. You’ll learn how to tag and trace agent traffic, set SLOs that survive tail latency, and build graceful-degradation playbooks that keep experiences usable when the graph goes wild.

Why scaling is different with AI

  • Bursty, spiky traffic from tool-chaining and agent loops
  • High fan-out per request → N downstream calls per prompt
  • Non-stationary patterns (time-of-day + product launches + model changes)
  • Cost correlates with requests × context × retries, not just QPS
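
To make the last point concrete, here is a back-of-envelope cost model in Python; every number in it is an illustrative assumption, not a benchmark:

```python
# Back-of-envelope cost model: spend tracks requests x context x retries,
# not raw QPS. All numbers below are illustrative assumptions.
requests_per_day = 2_000_000
avg_tokens_per_call = 3_000            # prompt + completion context
price_per_1k_tokens = 0.002            # assumed model pricing, USD
retry_multiplier = 1.4                 # agents + gateways + SDKs all retry

daily_cost = (requests_per_day * (avg_tokens_per_call / 1000)
              * price_per_1k_tokens * retry_multiplier)
print(f"${daily_cost:,.0f}/day")       # ~$16,800/day under these assumptions
```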

Failure modes to expect (and design for)

  • Cache-miss storms after deploy/flush; thundering herds on hot keys
  • Retry amplification (agents + gateways + SDKs all retry)
  • Unbounded concurrency → DB saturation, queue buildup, 99.9th pct tail spikes
  • “Version drift” between agents and APIs → malformed or expensive calls

Traffic control & fairness

  • Multi-dimensional rate limits: per-tenant, per-agent, per-tool, per-chain
  • Budget-aware throttling: cap by token/$ budget, not just requests
  • Adaptive backpressure: shed or downgrade when saturation signals trip
  • Fair queuing: prevent “noisy” agents from starving others
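
A minimal Python sketch of the ideas above, combining multi-dimensional (tenant/agent/tool) rate limits with a per-tenant spend budget; all keys, limits, and costs are illustrative assumptions:

```python
# Sketch: multi-dimensional rate limits plus a per-tenant cost budget.
import time
from collections import defaultdict

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: float):
        self.rate, self.burst = rate_per_sec, burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

class AgentAwareLimiter:
    """Rate-limit per (tenant, agent, tool) and enforce a daily $ budget per tenant."""
    def __init__(self, rate=5.0, burst=20.0, daily_budget_usd=50.0):
        self.buckets = defaultdict(lambda: TokenBucket(rate, burst))
        self.spend = defaultdict(float)
        self.daily_budget_usd = daily_budget_usd

    def admit(self, tenant: str, agent: str, tool: str, est_cost_usd: float) -> bool:
        if self.spend[tenant] + est_cost_usd > self.daily_budget_usd:
            return False                      # budget exhausted: shed or downgrade
        if not self.buckets[(tenant, agent, tool)].allow():
            return False                      # rate limit hit on this dimension
        self.spend[tenant] += est_cost_usd
        return True

limiter = AgentAwareLimiter()
print(limiter.admit("tenant-a", "agent-7", "search", est_cost_usd=0.002))
```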

Resilience patterns

  • Idempotency keys + deduplication for writes & retries
  • Circuit breakers & bulkheads around fragile dependencies
  • Timeouts with jitter + bounded retries (server hints for clients)
  • Graceful degradation: return partials, cached/stale, queued-async receipts
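
As a concrete illustration of the first bullet, a minimal sketch of idempotency-key deduplication; the in-memory store and handler name are made up, so treat it as a shape, not an implementation (production use would back the store with Redis or a database and a TTL dedupe window):

```python
# Sketch: replay the stored response when the same idempotency key is retried.
import threading

class IdempotencyStore:
    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def run_once(self, key: str, fn, *args, **kwargs):
        with self._lock:
            if key in self._results:          # retry: replay the stored response
                return self._results[key]
        result = fn(*args, **kwargs)          # first execution for this key
        with self._lock:
            self._results.setdefault(key, result)
        return self._results[key]

def create_order(order):                      # illustrative "write" being protected
    print("charging card for", order["sku"])
    return {"status": "created", "sku": order["sku"]}

store = IdempotencyStore()
resp1 = store.run_once("idem-123", create_order, {"sku": "A1"})
resp2 = store.run_once("idem-123", create_order, {"sku": "A1"})  # dedup: no double charge
assert resp1 == resp2
```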

Caching that actually works for AI

  • Deterministic-result caching (prompt+params hash)
  • Shard & tier caches (memory → Redis → CDN/edge) + TTL tuned to freshness
  • Negative caching to suppress repeated failures
  • Stale-while-revalidate to tame cache-miss storms
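
A minimal sketch of deterministic-result caching with stale-while-revalidate, assuming a single-process in-memory cache and illustrative TTLs:

```python
# Sketch: cache keyed by a hash of prompt + params, with stale-while-revalidate.
import hashlib, json, threading, time

TTL_FRESH, TTL_STALE = 60, 600                 # fresh for 60 s, serve stale up to 10 min
_cache = {}                                    # key -> (value, stored_at)

def cache_key(prompt: str, params: dict) -> str:
    payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def get_or_compute(prompt: str, params: dict, compute):
    key, now = cache_key(prompt, params), time.time()
    hit = _cache.get(key)
    if hit:
        value, stored_at = hit
        if now - stored_at < TTL_FRESH:
            return value                       # fresh hit
        if now - stored_at < TTL_STALE:
            # stale-while-revalidate: serve the stale value, refresh in the background
            threading.Thread(
                target=lambda: _cache.update({key: (compute(), time.time())})
            ).start()
            return value
    value = compute()                          # miss (or too stale): compute synchronously
    _cache[key] = (value, now)
    return value

print(get_or_compute("summarize Q3 report", {"temperature": 0}, lambda: "cached result"))
```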

Async & event-driven designs

  • Queue first for heavy/long-running tasks (workflows > request/response)
  • Outbox/Saga patterns for consistency across services
  • Streaming APIs for incremental results; webhooks/callbacks for completion
  • Backlogs with priorities (gold/silver/bronze) and dead-letter policies
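
The queue-first idea from the first bullet, sketched with Python's standard library; a real deployment would use a durable queue (SQS, Kafka, etc.) and a job store rather than in-memory structures:

```python
# Sketch: accept the request, return a receipt immediately, let a worker drain the queue.
import queue, threading, time, uuid

jobs = {}                                      # job_id -> status/result
work_q = queue.Queue()

def submit(task: dict) -> str:
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued"}
    work_q.put((job_id, task))
    return job_id                              # the "202 Accepted" receipt

def worker():
    while True:
        job_id, task = work_q.get()
        jobs[job_id] = {"status": "running"}
        time.sleep(0.1)                        # stand-in for heavy work (model call, ETL, ...)
        jobs[job_id] = {"status": "done", "result": f"processed {task['name']}"}
        work_q.task_done()

threading.Thread(target=worker, daemon=True).start()
receipt = submit({"name": "summarize-report"})
work_q.join()                                  # client would normally poll or get a webhook
print(receipt, jobs[receipt])
```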

Autoscaling without bill shock

  • Pick the right compute: provisioned concurrency for cold-start-sensitive paths; on-demand for bursty tools
  • KEDA/HPA on meaningful signals (RPS, lag, token usage, queue depth)
  • Guardrails: max concurrency per tenant, per region; budget limits with kill-switches
  • Multi-region strategy: active-active for reads; controlled writes with leader/follower or per-tenant pinning
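
One of the guardrails above, a per-tenant max-concurrency cap, sketched in Python; the limit and error handling are illustrative, and in practice this sits at the gateway:

```python
# Sketch: shed load when a tenant exceeds its concurrency allowance.
import threading
from collections import defaultdict
from contextlib import contextmanager

MAX_CONCURRENT_PER_TENANT = 8                  # illustrative limit
_semaphores = defaultdict(lambda: threading.BoundedSemaphore(MAX_CONCURRENT_PER_TENANT))

@contextmanager
def tenant_slot(tenant: str):
    sem = _semaphores[tenant]
    if not sem.acquire(blocking=False):
        raise RuntimeError("429: tenant concurrency limit reached")  # shed, don't queue
    try:
        yield
    finally:
        sem.release()

def handle_request(tenant: str):
    with tenant_slot(tenant):
        return "ok"                            # downstream work happens here

print(handle_request("tenant-a"))
```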

Observability & cost governance

  • Tag human vs agent traffic; propagate chain-ID / tool-ID across spans
  • Golden signals + tail-latency SLOs (p95/p99), not just averages
  • Attribution: per-tenant/per-agent cost & cache hit rate; anomaly alerts on $/request
  • Workload forensics: detect loops, entropy spikes, unusual tool mixes
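
A minimal sketch of tagging agent traffic and propagating a chain ID across calls; the header names (X-Caller-Type, X-Chain-Id, X-Tool-Id) are assumed conventions, not a standard:

```python
# Sketch: annotate each request and carry the same chain ID downstream
# so spans stitch together and cost can be attributed per tenant/agent.
import json, time, uuid

def annotate_request(headers: dict) -> dict:
    return {
        "caller_type": headers.get("X-Caller-Type", "human"),   # human | agent
        "chain_id": headers.get("X-Chain-Id") or str(uuid.uuid4()),
        "tool_id": headers.get("X-Tool-Id", "unknown"),
        "ts": time.time(),
    }

def outgoing_headers(ctx: dict) -> dict:
    # propagate the same chain ID to downstream calls
    return {"X-Caller-Type": ctx["caller_type"], "X-Chain-Id": ctx["chain_id"]}

ctx = annotate_request({"X-Caller-Type": "agent", "X-Tool-Id": "search"})
print(json.dumps(ctx))                          # feed into logs/metrics for attribution
print(outgoing_headers(ctx))
```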

Testing & readiness

  • Property-based & fuzz tests for tool payloads
  • Replay traffic with elevated fan-out to validate limits & caches
  • Chaos & load testing at dependency edges (DB, vector store, model API)
  • Stepped rollouts with automatic rollback on SLO breach
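
A minimal property-based test for a tool payload handler, using the Hypothesis library (assumed to be installed); the handler and its contract are illustrative:

```python
# Sketch: assert the handler never crashes and always honors its output contract,
# whatever payload an agent throws at it.
from hypothesis import given, strategies as st

def parse_tool_payload(payload: dict) -> dict:
    # contract: never raise; always return {"ok": bool, "items": list[str]}
    items = payload.get("items")
    if not isinstance(items, list):
        return {"ok": False, "items": []}
    return {"ok": True, "items": [str(i) for i in items]}

@given(st.dictionaries(st.text(), st.one_of(st.none(), st.integers(), st.text(),
                                            st.lists(st.integers(), max_size=5))))
def test_parser_never_crashes(payload):
    result = parse_tool_payload(payload)
    assert isinstance(result["ok"], bool)
    assert isinstance(result["items"], list)
```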

Runbooks & playbooks

  • Cache-miss storm → warmers + SWR + temporary TTL bump
  • Retry storm → clamp retries, raise backoff, enable dedupe window
  • Cost spike → lower budgets, switch to cheaper tier/model, enable result reuse
  • Dependency brownout → feature flags to serve partials or stubbed results

Deliverables for attendees

  • Idempotency & retry checklist
  • Rate-limit/budget policy template (per-tenant/per-chain)
  • Cache-key & SWR guide for deterministic responses
  • Incident playbooks (cache storm, retry storm, dependency brownout)

Learning Objectives (Takeaways)

  1. Design for bursty AI traffic with budget-aware rate limits, fair queuing, and adaptive backpressure.
  2. Harden reliability using idempotency, deduplication, circuit breakers, timeouts, and bulkheads.
  3. Cut latency & cost via deterministic-result caching, SWR, and shard/tiered cache strategies.
  4. Operate with confidence by tagging agent traffic, tracing chain-IDs, and enforcing tail-latency SLOs.
  5. Adopt async/event-driven patterns (queues, workflows, streaming) to keep UX snappy under heavy AI load.
  6. Ship safe with realistic load/chaos tests, stepped rollouts, and incident playbooks ready to go.

APIs built for humans often fail when consumed by AI agents.
They rely on documentation instead of contracts, return unpredictable structures, and break silently when upgraded. Large Language Models (LLMs) and autonomous agents need something different: machine-discoverable, deterministic, idempotent, and lifecycle-managed APIs.
This session introduces a five-phase API readiness framework—from discovery to deprecation—so you can systematically evolve your APIs for safe, predictable AI consumption.
You’ll learn how to assess current APIs, prioritize the ones that matter, and apply modern readiness practices: function/tool calling, schema validation, idempotency, version sunset headers, and agent-aware monitoring.

Problems Solved

  • LLMs fail due to polymorphic or unpredictable API responses
  • Agents retry or loop because APIs aren’t idempotent
  • Ambiguous error messages block autonomous remediation
  • Silent breaking changes halt long-lived agent integrations
  • Lack of lifecycle management creates risk and rework

What “AI-Readiness” Means

  • Machine-Discoverable: APIs described in OpenAPI 3.1 + JSON Schema; self-describing operations and data types.
  • Deterministic: Same input → same output shape; no hidden conditional payloads.
  • Idempotent: Safe retries using Idempotency-Key or request signature patterns.
  • Guardrailed: Strict schema validation, quota enforcement, and prompt-injection defense.
  • Lifecycle Managed: Semantic versioning, Deprecation/Sunset headers, contract testing, and migration guides.
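
For the "Guardrailed" point above, a minimal sketch of strict response-schema validation using the jsonschema package (assumed available); the schema itself is an illustrative assumption:

```python
# Sketch: reject any response that drifts from the declared contract.
from jsonschema import validate, ValidationError

RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "status": {"type": "string", "enum": ["created", "pending", "failed"]},
        "amount_cents": {"type": "integer"},
    },
    "required": ["order_id", "status", "amount_cents"],
    "additionalProperties": False,             # no hidden conditional payloads
}

def check_response(body: dict) -> bool:
    try:
        validate(instance=body, schema=RESPONSE_SCHEMA)
        return True
    except ValidationError as exc:
        print("contract violation:", exc.message)
        return False

print(check_response({"order_id": "o-1", "status": "created", "amount_cents": 1999}))
print(check_response({"order_id": "o-1", "status": "created", "extra": "?"}))   # fails
```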

Common Failure Modes Today

  • Polymorphic responses that confuse function-calling agents.
  • Ambiguous errors without remediation guidance.
  • Non-idempotent endpoints causing duplicate orders or charges.
  • Hidden side effects undocumented or triggered by retries.
  • Breaking changes without warning → agents silently fail.

Agenda
Introduction: The Shift from Human → Machine Consumption
Why LLMs and agents fundamentally change API design expectations.
Examples of human-centric patterns that break agent workflows.
Pattern 1: Assessment & Readiness Scorecard
How to audit existing APIs for AI-readiness.
Scoring dimensions: discoverability, determinism, idempotency, guardrails, lifecycle maturity.
Sample scorecard matrix and benchmark scoring.
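
An illustrative (made-up) scorecard in Python to show how the dimensions could be weighted and scored:

```python
# Sketch: score each readiness dimension 0-5 and weight them.
# Weights and scores are illustrative examples, not benchmarks.
WEIGHTS = {"discoverability": 0.20, "determinism": 0.25, "idempotency": 0.25,
           "guardrails": 0.15, "lifecycle": 0.15}

def readiness_score(scores: dict) -> float:
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)   # 0 (not ready) .. 5 (agent-ready)

orders_api = {"discoverability": 4, "determinism": 2, "idempotency": 1,
              "guardrails": 3, "lifecycle": 2}
print(readiness_score(orders_api))      # ~2.3 -> prioritize determinism & idempotency
```
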
Pattern 2: Prioritization Strategy
How to choose where to start:

  • High traffic + high risk first (payments, claims, healthcare, orders)
  • Partner/customer-facing before internal
  • Regulated domains (HIPAA, PCI, SOX) before unregulated
  • Consolidate schema, security, and idempotency changes together
Pattern 3: Five-Phase Readiness Roadmap
  • Discovery: Audit specs, tag agent traffic, document gaps.
  • Redesign: Harden schemas, fix errors, add idempotency keys and prompt-injection defenses.
  • Versioning: Adopt SemVer, support multiple versions, and emit Deprecation/Sunset headers.
  • Monitoring: Track agent vs human usage, retries, anomalies, cost attribution.
  • Deprecation: Communicate timelines, throttle old versions, enable fallback modes.
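
A minimal sketch of the Versioning phase's Deprecation/Sunset headers on an old endpoint, using Flask (assumed); the dates, paths, and header values are illustrative:

```python
# Sketch: advertise deprecation and sunset on a v1 endpoint so agents can migrate.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/v1/orders")
def orders_v1():
    resp = jsonify({"orders": [], "api_version": "1.0"})
    # Sunset header (RFC 8594): the date after which this version goes away
    resp.headers["Sunset"] = "Sat, 01 Aug 2026 00:00:00 GMT"
    # Deprecation header: signals the version is already deprecated
    # (value format varies by spec revision; shown here as a simple flag)
    resp.headers["Deprecation"] = "true"
    # Point agents at the successor version so they can migrate themselves
    resp.headers["Link"] = '</v2/orders>; rel="successor-version"'
    return resp

if __name__ == "__main__":
    app.run(port=8080)
```
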
Pattern 4: Security & Guardrails
Inject prompt-defense filters at the edge.
Schema validation and rate-limiting.
Automated regression testing against contract schemas to ensure safety.
Pattern 5: Case Studies
  • Stripe Idempotency: Eliminating duplicate charges with the Idempotency-Key pattern.
  • Deprecation Done Right: APIs that use Sunset headers for graceful agent migration.
  • Agent Tool Example: Mapping operationId=ReserveInventory directly to an LLM tool schema.
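
The client side of the Stripe-style Idempotency-Key pattern from the first case study, sketched with the requests library; the endpoint URL and retry policy are illustrative:

```python
# Sketch: retries reuse the same Idempotency-Key so the server can dedupe the write.
import uuid
import requests

def create_charge_with_retries(payload: dict, max_attempts: int = 3):
    idem_key = str(uuid.uuid4())               # one key for the whole logical operation
    for attempt in range(max_attempts):
        try:
            resp = requests.post(
                "https://api.example.com/v1/charges",    # illustrative endpoint
                json=payload,
                headers={"Idempotency-Key": idem_key},
                timeout=5,
            )
            if resp.status_code < 500:
                return resp.json()             # success or a non-retryable client error
        except requests.RequestException:
            pass                               # network error: safe to retry with same key
    raise RuntimeError("charge failed after retries")
```
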
Wrap-Up & Discussion
Recap of framework and quick wins.
Using the Readiness Scorecard and KPI checklist to measure progress from human-centric APIs → agent-ready APIs.
Discussion on embedding readiness audits in CI/CD governance.

Key Framework References

  • OpenAPI 3.1 + JSON Schema: Machine-readable API contracts
  • FinOps + AI Cost Governance: Tagging and metering agent usage
  • OWASP LLM Top 10: Prompt-injection and misuse defenses
  • API Lifecycle Standards: RFC 8594 (Sunset header), RFC 9457 (Problem Details for HTTP APIs), and the Deprecation header field
  • ISO/IEC 38507: Governance implications for AI-integrated systems

Takeaways

  • API Readiness Scorecard to evaluate current maturity
  • 5-phase modernization roadmap: Discovery → Redesign → Versioning → Monitoring → Deprecation
  • Checklist + KPIs to align API modernization with AI readiness
  • Case patterns demonstrating resilient, agent-safe API evolution