Token-Based Rate Limiting for AI APIs in Next.js (Production Guide)

Source: DEV Community
If you're building with Claude, GPT-4o, or any other LLM API, you need rate limiting. Without it, one viral moment -- or one buggy loop -- can burn through your entire month's API budget in hours. Here's a production-grade rate limiting setup for Next.js AI routes, with real code you can drop in.

## Why AI Routes Are Different

Standard rate limiting (by IP, by user) is well-understood. AI routes have a harder problem: token consumption varies wildly. A user who sends "hi" costs you $0.0001. A user who sends a 10,000-token document costs you $0.03. If you rate limit by requests, you're not actually limiting cost. You need to limit by tokens, not requests.

## The Implementation

### 1. Install Upstash Redis

Upstash has a free tier and a Next.js SDK -- perfect for serverless.

```bash
npm install @upstash/redis @upstash/ratelimit
```

Add to `.env.local`:

```bash
UPSTASH_REDIS_REST_URL=your_url
UPSTASH_REDIS_REST_TOKEN=your_token
```

### 2. Create the Rate Limiter

```ts
// src/lib/rate-limit.ts
import { Ratelimit } from "@upstash/ratelimit";
```
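The source is cut off above, so here is one possible shape for the rest of `src/lib/rate-limit.ts` -- a sketch, not the article's original code. It implements the "limit by tokens, not requests" idea with a fixed-window counter built on two plain Redis commands (`INCRBY` + `EXPIRE`), which the `@upstash/redis` client exposes directly. The `TokenStore` interface, the `consumeTokens` name, and the budget/window values are all illustrative assumptions:

```typescript
// Sketch of a token-budget limiter (fixed window). All names and
// constants below are illustrative, not from the original article.

// Minimal interface covering the two Redis commands we need. The
// @upstash/redis client satisfies this shape; an in-memory stub can
// stand in for it during tests.
interface TokenStore {
  incrby(key: string, amount: number): Promise<number>;
  expire(key: string, seconds: number): Promise<number>;
}

const WINDOW_SECONDS = 60 * 60; // 1-hour window (illustrative)
const TOKEN_BUDGET = 100_000;   // tokens per user per window (illustrative)

export async function consumeTokens(
  store: TokenStore,
  userId: string,
  tokens: number,
): Promise<{ allowed: boolean; used: number; remaining: number }> {
  const key = `ratelimit:tokens:${userId}`;
  // INCRBY is atomic, so concurrent requests can't double-spend the budget.
  const used = await store.incrby(key, tokens);
  if (used === tokens) {
    // First request in this window: start the expiry clock.
    await store.expire(key, WINDOW_SECONDS);
  }
  return {
    allowed: used <= TOKEN_BUDGET,
    used,
    remaining: Math.max(0, TOKEN_BUDGET - used),
  };
}
```

In a route handler you would pass a real client (`new Redis({ url, token })` from `@upstash/redis`) as the store, call `consumeTokens` with the request's estimated token count before hitting the LLM API, and return a 429 when `allowed` is false. The trade-off versus a sliding window: fixed windows are simpler and one round trip, but allow up to 2x the budget across a window boundary.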