AI Prompt Market

[LIVE] [sergeykrin9/nvidia-nim-cascade] 2 endpoint(s)

ChatGPT API Proxy/ChatGPT

2,187 characters

Status: [LIVE] Endpoints found: https://integrate.api.nvidia.com/v1/chat/completions https://integrate.api.nvidia.com/v1/chat Source: https://github.com/sergeykrin9/nvidia-nim-cascade <img src="https://capsule-render.vercel.app/api?type=waving&color=0:1a1a1d,50:76B900,100:1a1a1d&height=200&section=header&text=NVIDIA%20NIM%20Cascade&fontSize=48&fontColor=f5f1e8&fontAlignY=38&desc=9-model%20fallback%20chain%20·%20~360%20RPM%20·%20$0/month%20·%20Replaces%20OpenAI%20fallback&descSize=14&descAlignY=62" alt="NVIDIA NIM Cascade" /> <div align="center"> <p> <a href="https://github.com/SergeyKrin9/nvidia-nim-cascade/stargazers"><img src="https://img.shields.io/github/stars/SergeyKrin9/nvidia-nim-cascade?style=for-the-badge&color=76B900&logo=github&labelColor=1a1a1d"></a> <img src="https://img.shields.io/badge/license-MIT-76B900?style=for-the-badge&labelColor=1a1a1d"> <img src="https://img.shields.io/badge/Python-3.10%2B-76B900?style=for-the-badge&logo=python&logoColor=white&labelColor=1a1a1d"> <img src="https://img.shields.io/badge/NVIDIA%20NIM-free%20tier-76B900?style=for-the-badge&logo=nvidia&logoColor=white&labelColor=1a1a1d"> </p> <sub>🐍 Python · ⚡ ~360 RPM · 💸 $0/mo · 🔁 9-model fallback · ✅ OpenAI-compatible API</sub> </div> > [!IMPORTANT] > **Key insight that saved us $1,500/month.** > NVIDIA NIM rate-limits **per MODEL, not per ACCOUNT**. Same endpoint, different `model` field = independent 40 RPM buckets. 9 different models → ~360 RPM at zero cost. Replaces OpenAI gpt-4o-mini fallback in production with zero downtime. ## The insight Most teams that hit `meta/llama-3.1-70b-instruct` quickly run into 429 rate limits. They assume the limit is **per account**. It isn't. **NVIDIA NIM enforces rate limits per-model.** Hitting the same OpenAI-compatible endpoint with a different `model` field gives you a fresh 40 RPM bucket. Less-popular slugs (Nemotron, gpt-oss-20b, mixtral) almost never 429. So: a sequential cascade across 9 model slugs gives you **~360 RPM of free LLM capacity** with identical OpenAI-compatible code. ## Verified working models (May 2026) API endpoint: `https://integrate.api.nvidia.com/v1` (Open

Download .txt