Rohit Bhardwaj

Director of Architecture, Expert in cloud-native solutions

Rohit Bhardwaj is a Director of Architecture at Salesforce. He has extensive experience architecting multi-tenant, cloud-native solutions built on resilient microservices and service-oriented architectures using the AWS stack. In addition, Rohit has a proven record of designing solutions and executing and delivering transformational programs that reduce costs and increase efficiencies.

As a trusted advisor, leader, and collaborator, Rohit applies problem-resolution, analytical, and operational skills to every initiative, developing strategic requirements and solution analysis through all stages of the project life cycle, from product readiness to execution.
Rohit excels in designing scalable cloud microservice architectures using Spring Boot and Netflix OSS technologies on AWS and Google Cloud. As a security ninja, he looks for ways to resolve application security vulnerabilities using ethical hacking and threat modeling. Rohit is excited about architecting cloud technologies using Docker, Redis, NGINX, RightScale, RabbitMQ, Apigee, Azul Zing, Actuate BIRT reporting, Chef, Splunk, Rest-Assured, SoapUI, Dynatrace, and EnterpriseDB. In addition, he has developed lambda-architecture solutions using Apache Spark, Cassandra, and Camel for real-time analytics and integration projects.

Rohit holds an MBA in Corporate Entrepreneurship from Babson College and a Master's in Computer Science from Boston University and Harvard University. He is a regular speaker at No Fluff Just Stuff, UberConf, RichWeb, GIDS, and other international conferences.

Rohit loves to connect at http://www.productivecloudinnovation.com, on LinkedIn at http://linkedin.com/in/rohit-bhardwaj-cloud, or on Twitter at rbhardwaj1.

Presentations

AI agents don’t behave like humans. A single prompt can trigger thousands of parallel API calls, retries, and tool chains—creating bursty load, cache-miss storms, and runaway costs. This talk unpacks how to design and operate APIs that stay fast, reliable, and affordable under AI workloads. We’ll cover agent-aware rate limiting, backpressure & load shedding, deterministic-result caching, idempotency & deduplication, async/event-driven patterns, and autoscaling without bill shock. You’ll learn how to tag and trace agent traffic, set SLOs that survive tail latency, and build graceful-degradation playbooks that keep experiences usable when the graph goes wild.

Why scaling is different with AI

  • Bursty, spiky traffic from tool-chaining and agent loops
  • High fan-out per request → N downstream calls per prompt
  • Non-stationary patterns (time-of-day + product launches + model changes)
  • Cost correlates with requests × context × retries, not just QPS
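
To make the last point concrete, here is a back-of-envelope cost model in Python; every number in it is an illustrative assumption, not a benchmark:

```python
# Back-of-envelope cost model: spend tracks requests x context x retries,
# not raw QPS. All numbers below are illustrative assumptions.
requests_per_day = 2_000_000
avg_tokens_per_call = 3_000            # prompt + completion context
price_per_1k_tokens = 0.002            # assumed model pricing, USD
retry_multiplier = 1.4                 # agents + gateways + SDKs all retry

daily_cost = (requests_per_day * (avg_tokens_per_call / 1000)
              * price_per_1k_tokens * retry_multiplier)
print(f"${daily_cost:,.0f}/day")       # ~$16,800/day under these assumptions
```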

Failure modes to expect (and design for)

  • Cache-miss storms after deploy/flush; thundering herds on hot keys
  • Retry amplification (agents + gateways + SDKs all retry)
  • Unbounded concurrency → DB saturation, queue buildup, 99.9th pct tail spikes
  • “Version drift” between agents and APIs → malformed or expensive calls

Traffic control & fairness

  • Multi-dimensional rate limits: per-tenant, per-agent, per-tool, per-chain
  • Budget-aware throttling: cap by token/$ budget, not just requests
  • Adaptive backpressure: shed or downgrade when saturation signals trip
  • Fair queuing: prevent “noisy” agents from starving others
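
A minimal Python sketch of the ideas above, combining multi-dimensional (tenant/agent/tool) rate limits with a per-tenant spend budget; all keys, limits, and costs are illustrative assumptions:

```python
# Sketch: multi-dimensional rate limits plus a per-tenant cost budget.
import time
from collections import defaultdict

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: float):
        self.rate, self.burst = rate_per_sec, burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

class AgentAwareLimiter:
    """Rate-limit per (tenant, agent, tool) and enforce a daily $ budget per tenant."""
    def __init__(self, rate=5.0, burst=20.0, daily_budget_usd=50.0):
        self.buckets = defaultdict(lambda: TokenBucket(rate, burst))
        self.spend = defaultdict(float)
        self.daily_budget_usd = daily_budget_usd

    def admit(self, tenant: str, agent: str, tool: str, est_cost_usd: float) -> bool:
        if self.spend[tenant] + est_cost_usd > self.daily_budget_usd:
            return False                      # budget exhausted: shed or downgrade
        if not self.buckets[(tenant, agent, tool)].allow():
            return False                      # rate limit hit on this dimension
        self.spend[tenant] += est_cost_usd
        return True

limiter = AgentAwareLimiter()
print(limiter.admit("tenant-a", "agent-7", "search", est_cost_usd=0.002))
```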

Resilience patterns

  • Idempotency keys + deduplication for writes & retries
  • Circuit breakers & bulkheads around fragile dependencies
  • Timeouts with jitter + bounded retries (server hints for clients)
  • Graceful degradation: return partials, cached/stale, queued-async receipts
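
As a concrete illustration of the first bullet, a minimal sketch of idempotency-key deduplication; the in-memory store and handler name are made up, so treat it as a shape, not an implementation (production use would back the store with Redis or a database and a TTL dedupe window):

```python
# Sketch: replay the stored response when the same idempotency key is retried.
import threading

class IdempotencyStore:
    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def run_once(self, key: str, fn, *args, **kwargs):
        with self._lock:
            if key in self._results:          # retry: replay the stored response
                return self._results[key]
        result = fn(*args, **kwargs)          # first execution for this key
        with self._lock:
            self._results.setdefault(key, result)
        return self._results[key]

def create_order(order):                      # illustrative "write" being protected
    print("charging card for", order["sku"])
    return {"status": "created", "sku": order["sku"]}

store = IdempotencyStore()
resp1 = store.run_once("idem-123", create_order, {"sku": "A1"})
resp2 = store.run_once("idem-123", create_order, {"sku": "A1"})  # dedup: no double charge
assert resp1 == resp2
```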

Caching that actually works for AI

  • Deterministic-result caching (prompt+params hash)
  • Shard & tier caches (memory → Redis → CDN/edge) + TTL tuned to freshness
  • Negative caching to suppress repeated failures
  • Stale-while-revalidate to tame cache-miss storms
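
A minimal sketch of deterministic-result caching with stale-while-revalidate, assuming a single-process in-memory cache and illustrative TTLs:

```python
# Sketch: cache keyed by a hash of prompt + params, with stale-while-revalidate.
import hashlib, json, threading, time

TTL_FRESH, TTL_STALE = 60, 600                 # fresh for 60 s, serve stale up to 10 min
_cache = {}                                    # key -> (value, stored_at)

def cache_key(prompt: str, params: dict) -> str:
    payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def get_or_compute(prompt: str, params: dict, compute):
    key, now = cache_key(prompt, params), time.time()
    hit = _cache.get(key)
    if hit:
        value, stored_at = hit
        if now - stored_at < TTL_FRESH:
            return value                       # fresh hit
        if now - stored_at < TTL_STALE:
            # stale-while-revalidate: serve the stale value, refresh in the background
            threading.Thread(
                target=lambda: _cache.update({key: (compute(), time.time())})
            ).start()
            return value
    value = compute()                          # miss (or too stale): compute synchronously
    _cache[key] = (value, now)
    return value

print(get_or_compute("summarize Q3 report", {"temperature": 0}, lambda: "cached result"))
```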

Async & event-driven designs

  • Queue first for heavy/long-running tasks (workflows > request/response)
  • Outbox/Saga patterns for consistency across services
  • Streaming APIs for incremental results; webhooks/callbacks for completion
  • Backlogs with priorities (gold/silver/bronze) and dead-letter policies
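
The queue-first idea from the first bullet, sketched with Python's standard library; a real deployment would use a durable queue (SQS, Kafka, etc.) and a job store rather than in-memory structures:

```python
# Sketch: accept the request, return a receipt immediately, let a worker drain the queue.
import queue, threading, time, uuid

jobs = {}                                      # job_id -> status/result
work_q = queue.Queue()

def submit(task: dict) -> str:
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued"}
    work_q.put((job_id, task))
    return job_id                              # the "202 Accepted" receipt

def worker():
    while True:
        job_id, task = work_q.get()
        jobs[job_id] = {"status": "running"}
        time.sleep(0.1)                        # stand-in for heavy work (model call, ETL, ...)
        jobs[job_id] = {"status": "done", "result": f"processed {task['name']}"}
        work_q.task_done()

threading.Thread(target=worker, daemon=True).start()
receipt = submit({"name": "summarize-report"})
work_q.join()                                  # client would normally poll or get a webhook
print(receipt, jobs[receipt])
```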

Autoscaling without bill shock

  • Pick the right compute: provisioned concurrency for cold-start-sensitive paths; on-demand for bursty tools
  • KEDA/HPA on meaningful signals (RPS, lag, token usage, queue depth)
  • Guardrails: max concurrency per tenant, per region; budget limits with kill-switches
  • Multi-region strategy: active-active for reads; controlled writes with leader/follower or per-tenant pinning
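
One of the guardrails above, a per-tenant max-concurrency cap, sketched in Python; the limit and error handling are illustrative, and in practice this sits at the gateway:

```python
# Sketch: shed load when a tenant exceeds its concurrency allowance.
import threading
from collections import defaultdict
from contextlib import contextmanager

MAX_CONCURRENT_PER_TENANT = 8                  # illustrative limit
_semaphores = defaultdict(lambda: threading.BoundedSemaphore(MAX_CONCURRENT_PER_TENANT))

@contextmanager
def tenant_slot(tenant: str):
    sem = _semaphores[tenant]
    if not sem.acquire(blocking=False):
        raise RuntimeError("429: tenant concurrency limit reached")  # shed, don't queue
    try:
        yield
    finally:
        sem.release()

def handle_request(tenant: str):
    with tenant_slot(tenant):
        return "ok"                            # downstream work happens here

print(handle_request("tenant-a"))
```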

Observability & cost governance

  • Tag human vs agent traffic; propagate chain-ID / tool-ID across spans
  • Golden signals + tail-latency SLOs (p95/p99), not just averages
  • Attribution: per-tenant/per-agent cost & cache hit rate; anomaly alerts on $/request
  • Workload forensics: detect loops, entropy spikes, unusual tool mixes
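
A minimal sketch of tagging agent traffic and propagating a chain ID across calls; the header names (X-Caller-Type, X-Chain-Id, X-Tool-Id) are assumed conventions, not a standard:

```python
# Sketch: annotate each request and carry the same chain ID downstream
# so spans stitch together and cost can be attributed per tenant/agent.
import json, time, uuid

def annotate_request(headers: dict) -> dict:
    return {
        "caller_type": headers.get("X-Caller-Type", "human"),   # human | agent
        "chain_id": headers.get("X-Chain-Id") or str(uuid.uuid4()),
        "tool_id": headers.get("X-Tool-Id", "unknown"),
        "ts": time.time(),
    }

def outgoing_headers(ctx: dict) -> dict:
    # propagate the same chain ID to downstream calls
    return {"X-Caller-Type": ctx["caller_type"], "X-Chain-Id": ctx["chain_id"]}

ctx = annotate_request({"X-Caller-Type": "agent", "X-Tool-Id": "search"})
print(json.dumps(ctx))                          # feed into logs/metrics for attribution
print(outgoing_headers(ctx))
```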

Testing & readiness

  • Property-based & fuzz tests for tool payloads
  • Replay traffic with elevated fan-out to validate limits & caches
  • Chaos & load testing at dependency edges (DB, vector store, model API)
  • Stepped rollouts with automatic rollback on SLO breach
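
A minimal property-based test for a tool payload handler, using the Hypothesis library (assumed to be installed); the handler and its contract are illustrative:

```python
# Sketch: assert the handler never crashes and always honors its output contract,
# whatever payload an agent throws at it.
from hypothesis import given, strategies as st

def parse_tool_payload(payload: dict) -> dict:
    # contract: never raise; always return {"ok": bool, "items": list[str]}
    items = payload.get("items")
    if not isinstance(items, list):
        return {"ok": False, "items": []}
    return {"ok": True, "items": [str(i) for i in items]}

@given(st.dictionaries(st.text(), st.one_of(st.none(), st.integers(), st.text(),
                                            st.lists(st.integers(), max_size=5))))
def test_parser_never_crashes(payload):
    result = parse_tool_payload(payload)
    assert isinstance(result["ok"], bool)
    assert isinstance(result["items"], list)
```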

Runbooks & playbooks

  • Cache-miss storm → warmers + SWR + temporary TTL bump
  • Retry storm → clamp retries, raise backoff, enable dedupe window
  • Cost spike → lower budgets, switch to cheaper tier/model, enable result reuse
  • Dependency brownout → feature flags to serve partials or stubbed results

Deliverables for attendees

  • Idempotency & retry checklist
  • Rate-limit/budget policy template (per-tenant/per-chain)
  • Cache-key & SWR guide for deterministic responses
  • Incident playbooks (cache storm, retry storm, dependency brownout)

Learning Objectives (Takeaways)

  1. Design for bursty AI traffic with budget-aware rate limits, fair queuing, and adaptive backpressure.
  2. Harden reliability using idempotency, deduplication, circuit breakers, timeouts, and bulkheads.
  3. Cut latency & cost via deterministic-result caching, SWR, and shard/tiered cache strategies.
  4. Operate with confidence by tagging agent traffic, tracing chain-IDs, and enforcing tail-latency SLOs.
  5. Adopt async/event-driven patterns (queues, workflows, streaming) to keep UX snappy under heavy AI load.
  6. Ship safe with realistic load/chaos tests, stepped rollouts, and incident playbooks ready to go.

APIs built for humans often fail when consumed by AI agents.
They rely on documentation instead of contracts, return unpredictable structures, and break silently when upgraded. Large Language Models (LLMs) and autonomous agents need something different: machine-discoverable, deterministic, idempotent, and lifecycle-managed APIs.
This session introduces a five-phase API readiness framework—from discovery to deprecation—so you can systematically evolve your APIs for safe, predictable AI consumption.
You’ll learn how to assess current APIs, prioritize the ones that matter, and apply modern readiness practices: function/tool calling, schema validation, idempotency, version sunset headers, and agent-aware monitoring.

Problems Solved

  • LLMs fail due to polymorphic or unpredictable API responses
  • Agents retry or loop because APIs aren’t idempotent
  • Ambiguous error messages block autonomous remediation
  • Silent breaking changes halt long-lived agent integrations
  • Lack of lifecycle management creates risk and rework

What “AI-Readiness” Means

  • Machine-Discoverable: APIs described in OpenAPI 3.1 + JSON Schema; self-describing operations and data types.
  • Deterministic: Same input → same output shape; no hidden conditional payloads.
  • Idempotent: Safe retries using Idempotency-Key or request signature patterns.
  • Guardrailed: Strict schema validation, quota enforcement, and prompt-injection defense.
  • Lifecycle Managed: Semantic versioning, Deprecation/Sunset headers, contract testing, and migration guides.
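
For the "Guardrailed" point above, a minimal sketch of strict response-schema validation using the jsonschema package (assumed available); the schema itself is an illustrative assumption:

```python
# Sketch: reject any response that drifts from the declared contract.
from jsonschema import validate, ValidationError

RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "status": {"type": "string", "enum": ["created", "pending", "failed"]},
        "amount_cents": {"type": "integer"},
    },
    "required": ["order_id", "status", "amount_cents"],
    "additionalProperties": False,             # no hidden conditional payloads
}

def check_response(body: dict) -> bool:
    try:
        validate(instance=body, schema=RESPONSE_SCHEMA)
        return True
    except ValidationError as exc:
        print("contract violation:", exc.message)
        return False

print(check_response({"order_id": "o-1", "status": "created", "amount_cents": 1999}))
print(check_response({"order_id": "o-1", "status": "created", "extra": "?"}))   # fails
```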

Common Failure Modes Today

  • Polymorphic responses that confuse function-calling agents.
  • Ambiguous errors without remediation guidance.
  • Non-idempotent endpoints causing duplicate orders or charges.
  • Hidden side effects undocumented or triggered by retries.
  • Breaking changes without warning → agents silently fail.

Agenda
Introduction: The Shift from Human → Machine Consumption
Why LLMs and agents fundamentally change API design expectations.
Examples of human-centric patterns that break agent workflows.
Pattern 1: Assessment & Readiness Scorecard
How to audit existing APIs for AI-readiness.
Scoring dimensions: discoverability, determinism, idempotency, guardrails, lifecycle maturity.
Sample scorecard matrix and benchmark scoring.
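
An illustrative (made-up) scorecard in Python to show how the dimensions could be weighted and scored:

```python
# Sketch: score each readiness dimension 0-5 and weight them.
# Weights and scores are illustrative examples, not benchmarks.
WEIGHTS = {"discoverability": 0.20, "determinism": 0.25, "idempotency": 0.25,
           "guardrails": 0.15, "lifecycle": 0.15}

def readiness_score(scores: dict) -> float:
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)   # 0 (not ready) .. 5 (agent-ready)

orders_api = {"discoverability": 4, "determinism": 2, "idempotency": 1,
              "guardrails": 3, "lifecycle": 2}
print(readiness_score(orders_api))      # ~2.3 -> prioritize determinism & idempotency
```
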
Pattern 2: Prioritization Strategy
How to choose where to start:

  • High traffic + high risk first (payments, claims, healthcare, orders)
  • Partner/customer-facing before internal
  • Regulated domains (HIPAA, PCI, SOX) before unregulated
  • Consolidate schema, security, and idempotency changes together
Pattern 3: Five-Phase Readiness Roadmap
  • Discovery: Audit specs, tag agent traffic, document gaps.
  • Redesign: Harden schemas, fix errors, add idempotency keys and prompt-injection defenses.
  • Versioning: Adopt SemVer, support multiple versions, and emit Deprecation/Sunset headers.
  • Monitoring: Track agent vs human usage, retries, anomalies, cost attribution.
  • Deprecation: Communicate timelines, throttle old versions, enable fallback modes.
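
A minimal sketch of the Versioning phase's Deprecation/Sunset headers on an old endpoint, using Flask (assumed); the dates, paths, and header values are illustrative:

```python
# Sketch: advertise deprecation and sunset on a v1 endpoint so agents can migrate.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/v1/orders")
def orders_v1():
    resp = jsonify({"orders": [], "api_version": "1.0"})
    # Sunset header (RFC 8594): the date after which this version goes away
    resp.headers["Sunset"] = "Sat, 01 Aug 2026 00:00:00 GMT"
    # Deprecation header: signals the version is already deprecated
    # (value format varies by spec revision; shown here as a simple flag)
    resp.headers["Deprecation"] = "true"
    # Point agents at the successor version so they can migrate themselves
    resp.headers["Link"] = '</v2/orders>; rel="successor-version"'
    return resp

if __name__ == "__main__":
    app.run(port=8080)
```
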
Pattern 4: Security & Guardrails
Inject prompt-defense filters at the edge.
Schema validation and rate-limiting.
Automated regression testing against contract schemas to ensure safety.
Pattern 5: Case Studies
  • Stripe Idempotency: Eliminating duplicate charges with the Idempotency-Key pattern.
  • Deprecation Done Right: APIs that use Sunset headers for graceful agent migration.
  • Agent Tool Example: Mapping operationId=ReserveInventory directly to an LLM tool schema.
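
The client side of the Stripe-style Idempotency-Key pattern from the first case study, sketched with the requests library; the endpoint URL and retry policy are illustrative:

```python
# Sketch: retries reuse the same Idempotency-Key so the server can dedupe the write.
import uuid
import requests

def create_charge_with_retries(payload: dict, max_attempts: int = 3):
    idem_key = str(uuid.uuid4())               # one key for the whole logical operation
    for attempt in range(max_attempts):
        try:
            resp = requests.post(
                "https://api.example.com/v1/charges",    # illustrative endpoint
                json=payload,
                headers={"Idempotency-Key": idem_key},
                timeout=5,
            )
            if resp.status_code < 500:
                return resp.json()             # success or a non-retryable client error
        except requests.RequestException:
            pass                               # network error: safe to retry with same key
    raise RuntimeError("charge failed after retries")
```
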
Wrap-Up & Discussion
Recap of framework and quick wins.
Using the Readiness Scorecard and KPI checklist to measure progress from human-centric APIs → agent-ready APIs.
Discussion on embedding readiness audits in CI/CD governance.

Key Framework References

  • OpenAPI 3.1 + JSON Schema: Machine-readable API contracts
  • FinOps + AI Cost Governance: Tagging and metering agent usage
  • OWASP LLM Top 10: Prompt-injection and misuse defenses
  • API Lifecycle Standards: RFC 8594 (Sunset header), RFC 9457 (Problem Details for HTTP APIs), and the Deprecation header field
  • ISO/IEC 38507: Governance implications for AI-integrated systems

Takeaways

  • API Readiness Scorecard to evaluate current maturity
  • 5-phase modernization roadmap: Discovery → Redesign → Versioning → Monitoring → Deprecation
  • Checklist + KPIs to align API modernization with AI readiness
  • Case patterns demonstrating resilient, agent-safe API evolution