
How to Get Your Groq API Key: Fast Inference Setup Guide

Groq runs LLM inference on custom LPU hardware that delivers 276 tokens per second for Llama 3.3 70B, according to independent benchmarks by Artificial Analysis. That is roughly 3-5x faster than most GPU-based providers. Getting a Groq API key takes about two minutes, and the free tier requires no credit card. This guide covers account setup, key generation, pricing, supported models, and how to connect the key to tools like Openclaw.

One thing to clarify before we start: Groq and Grok are different products. Groq (with a Q) is an inference hardware company at console.groq.com. Grok (with a K) is xAI’s chatbot. The names cause confusion, and search results regularly mix them up. This guide is about Groq, the fast inference platform.


Step 1: Create Your Groq Console Account

Go to console.groq.com and click Sign up. You can register with an email address or use Google single sign-on.

After signing up, Groq asks you to verify your email. Complete the verification and you land on the GroqCloud dashboard. This is where you manage API keys, monitor usage, and check rate limits.

No payment information is required. The free tier is immediately active.


Step 2: Generate Your API Key

Navigate to API Keys in the left sidebar, or go directly to console.groq.com/keys.

Click Create API Key. A dialog asks you to name the key. Use something descriptive: “Openclaw Agent” or “Dev Testing” works well. If you are part of a team, note that only team owners or users with the developer role can create keys.

Click Submit and your key appears.

Copy it immediately. Groq displays the full key exactly once. After you close this dialog, the key is masked and cannot be retrieved. If you lose it, you need to create a new one.

Store the key in a password manager, a .env file, or your platform’s secrets manager. Never paste it into code that gets committed to a repository.
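
If you go the .env route, loading the key at runtime takes two lines. A minimal sketch, assuming the python-dotenv package is installed and the .env file sits next to your script:

# .env (listed in .gitignore, never committed)
GROQ_API_KEY=your-groq-api-key

# app.py
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current directory into the process environment
api_key = os.environ["GROQ_API_KEY"]  # raises KeyError if the key is missing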


Step 3: Verify Your Key Works

Open a terminal and run this curl command (replace your-key-here with the actual key):

curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer your-key-here" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.3-70b-versatile", "messages": [{"role": "user", "content": "Say hello"}]}'

If you get a JSON response with a message, the key works. Notice the base URL: api.groq.com/openai/v1. Groq uses an OpenAI-compatible API, which means any tool or SDK that works with OpenAI can point to Groq by changing the base URL and API key.

A common gotcha: if you get a 429 Too Many Requests error on a brand-new key, you have hit the free tier rate limit. Wait a minute and try again. The free tier caps at 30 requests per minute.
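
If you are scripting against the API, it is worth backing off and retrying on 429 instead of letting the run die. A rough sketch using the requests library; the retry count and wait time are arbitrary choices for illustration, not Groq recommendations:

import os
import time
import requests

def chat(prompt, retries=3):
    """Send one chat completion to Groq, retrying when the free tier rate limit hits."""
    for attempt in range(retries):
        resp = requests.post(
            "https://api.groq.com/openai/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
            json={
                "model": "llama-3.3-70b-versatile",
                "messages": [{"role": "user", "content": prompt}],
            },
            timeout=60,
        )
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()["choices"][0]["message"]["content"]
        time.sleep(60)  # the free tier limit resets per minute, so wait before retrying
    raise RuntimeError("Still rate limited after retries")

print(chat("Say hello"))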


What Models Can You Run on Groq?

Groq supports a range of open-weight models running on their LPU inference engine. Here are the main options with current pricing:

Model | Input Cost | Output Cost | Speed | Context
Llama 3.1 8B Instant | $0.05/M tokens | $0.08/M tokens | 840 tok/s | 128K
Llama 4 Scout (17Bx16E) | $0.11/M tokens | $0.34/M tokens | 594 tok/s | 128K
Qwen3 32B | $0.29/M tokens | $0.59/M tokens | 662 tok/s | 131K
Llama 3.3 70B Versatile | $0.59/M tokens | $0.79/M tokens | 394 tok/s | 128K
GPT OSS 20B | $0.075/M tokens | $0.30/M tokens | 1,000 tok/s | 128K

Groq also supports Whisper v3 for speech-to-text ($0.111/hour) and Whisper v3 Turbo ($0.04/hour).
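
Transcription goes through the same OpenAI-compatible surface as chat. A minimal sketch; the model ID whisper-large-v3 and the file name are assumptions here, so check the Groq model list for the current names:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"]
)

# Transcribe a local audio file; "whisper-large-v3" is an assumed model ID, confirm in the Groq docs
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )
print(transcript.text)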

To put the pricing in perspective: a full million input tokens through Llama 3.1 8B costs $0.05, so even a batch of a thousand prompts adds up to pennies. For personal projects and prototyping, the free tier covers most use cases without spending anything.
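
If you want to estimate spend before picking a model, the arithmetic is simple enough to script. A quick sketch using the Llama 3.1 8B prices from the table above; the token counts are hypothetical:

# Prices per million tokens for Llama 3.1 8B Instant, from the table above
INPUT_PRICE_PER_M = 0.05
OUTPUT_PRICE_PER_M = 0.08

def estimate_cost(input_tokens, output_tokens):
    """Return the estimated cost in dollars for a workload."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical batch: 1,000 prompts averaging 500 input and 300 output tokens each
print(f"${estimate_cost(1_000 * 500, 1_000 * 300):.4f}")  # about $0.0490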


Free Tier Limits and What You Can Build

The free tier gives you access to every model with no credit card required. The constraint is rate limits, not model access.

Model | Requests/Min | Tokens/Min | Requests/Day | Tokens/Day
Llama 3.1 8B Instant | 30 | 6,000 | 14,400 | 500,000
Llama 3.3 70B Versatile | 30 | 12,000 | 1,000 | 100,000

At 30 requests per minute, you can build a personal chatbot, run batch analysis scripts (with pacing), or power an AI agent that handles a few dozen tasks per hour. The daily cap of 14,400 requests on Llama 3.1 8B is generous for development and personal use.
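
Pacing is the main thing a batch script needs on the free tier. A minimal sketch that stays under the 30-requests-per-minute cap by spacing calls two seconds apart; the prompts and model choice are just placeholders:

import os
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"]
)

prompts = [f"Summarize item {i}" for i in range(100)]  # hypothetical batch
for prompt in prompts:
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
    time.sleep(2)  # 30 requests per minute works out to one request every 2 seconds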

For production workloads, the Developer tier removes most of these limits and offers a 25% cost discount. There is no fixed monthly fee; you pay per token consumed.

We have used Groq’s free tier during development to validate prompt chains before switching to paid models for production. The speed makes iteration cycles noticeably faster, because you spend less time waiting for responses.


Why Groq Is Fast: The LPU Advantage

Most cloud inference runs on NVIDIA GPUs. Groq built a custom chip called the Language Processing Unit (LPU) specifically for sequential token generation. The result is deterministic latency and throughput that GPU clusters struggle to match.

Artificial Analysis independently benchmarked Groq at 276 tokens per second for Llama 3.3 70B. For smaller models, Groq hits 840+ tokens per second on Llama 3.1 8B. That is not a theoretical peak; it is what the API delivers in production.

The practical impact: a 500-word response from Llama 3.3 70B generates in roughly 2 seconds. On a typical GPU provider, the same response takes 6-10 seconds. For interactive use cases like chatbots or agent loops, that difference compounds fast.
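
The speed is easiest to feel with streaming, where tokens print as they arrive instead of all at once at the end. A short sketch with the OpenAI SDK pointed at Groq:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"]
)

# Stream the response so tokens print as they are generated
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Write a 500-word overview of LPU inference"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)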


What to Do With Your Key Next

Use Groq With Any OpenAI-Compatible Tool

Because Groq’s API is OpenAI-compatible, you can use it with the OpenAI Python SDK:

import os
from openai import OpenAI

# Read the key from the environment instead of hardcoding it
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"]
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Explain LPU inference in one sentence"}]
)
print(response.choices[0].message.content)

The same pattern works with LangChain, LiteLLM, and any framework that accepts a custom base URL.
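
As one concrete example, LiteLLM routes to Groq with a model prefix and picks up GROQ_API_KEY from the environment. A sketch, assuming litellm is installed:

# pip install litellm
from litellm import completion

# The "groq/" prefix tells LiteLLM to send the request to Groq using GROQ_API_KEY
response = completion(
    model="groq/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Explain LPU inference in one sentence"}],
)
print(response.choices[0].message.content)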

Connect Groq to Openclaw

Openclaw is a personal AI agent that runs on your machine and handles tasks through Telegram or WhatsApp. Groq works well as a fast fallback model provider in Openclaw, particularly for tasks where speed matters more than maximum reasoning depth.

To connect Groq to Openclaw, add your key to the environment configuration:

GROQ_API_KEY=your-groq-api-key

Then configure Openclaw to use a Groq model for specific task types. We cover the full configuration in our Openclaw Setup Guide.

If you also need keys for other providers, see our guides for OpenAI and Anthropic.


Keeping Your Key Secure

Three rules that prevent most problems:

  1. Never commit your key to version control. Add .env to your .gitignore file. If you accidentally push a key, revoke it at console.groq.com/keys and generate a new one.

  2. Use environment variables, not hardcoded strings. Store the key in a .env file or export it in your shell and load it at runtime. The Python SDK example above reads the key from GROQ_API_KEY rather than hardcoding it.

  3. Create separate keys for separate projects. If one key gets compromised, you can revoke it without breaking everything else. Groq lets you create multiple keys at no cost.


Frequently Asked Questions

Is the Groq API free to use?

Creating an account and generating a key is free with no credit card required. You get access to all models on the free tier, but with rate limits (30 requests per minute, daily token caps). For production workloads, the Developer tier charges per token with a 25% discount and higher limits.

What are the rate limits on Groq’s free tier?

The free tier allows 30 requests per minute across most models. Token limits vary by model: Llama 3.1 8B gets 6,000 tokens per minute and 500,000 per day, while Llama 3.3 70B gets 12,000 tokens per minute but only 1,000 requests and 100,000 tokens per day.

How fast is Groq compared to OpenAI or other providers?

Groq’s LPU hardware generates tokens 3-5x faster than typical GPU providers. Artificial Analysis independently measured 276 tokens per second for Llama 3.3 70B and 840+ tokens per second for Llama 3.1 8B. For comparison, most GPU-based providers deliver 50-150 tokens per second on similar models.

What is the difference between Groq and Grok?

Groq (with a Q) is an inference hardware company that runs open-weight models on custom LPU chips. You access it at console.groq.com. Grok (with a K) is xAI’s conversational AI chatbot, associated with Elon Musk. They are completely separate companies and products.

Can I use the OpenAI Python SDK with Groq?

Yes. Set the base URL to https://api.groq.com/openai/v1 and pass your Groq API key. The API is OpenAI-compatible, so any tool or library that supports a custom base URL works with Groq without code changes beyond the URL and key.

Does Groq support vision or audio models?

Groq supports Whisper v3 and Whisper v3 Turbo for speech-to-text transcription. The platform also lists OCR and vision capabilities. Check the Groq docs for the latest model availability.

What happens if I lose my API key?

You cannot retrieve a key after closing the creation dialog. Generate a new key at console.groq.com/keys, update your environment variables, and delete the old key to prevent unauthorized use.


Key Takeaways

  • Groq API keys are free to create at console.groq.com with no credit card required. The free tier includes access to all models.
  • Copy your key the moment it appears. Groq shows it exactly once.
  • Groq’s LPU hardware delivers 276-840+ tokens per second depending on the model, making it one of the fastest inference providers available.
  • The API is OpenAI-compatible. Change the base URL and key, and existing code works with Groq.
  • Connect your key to Openclaw as a fast fallback model provider for agent workflows.

Last Updated: Apr 11, 2026
