AI Token Counter
Count tokens in any text with real-time cost estimates for GPT-5, Claude and Gemini models. See what percentage of each model's context window your text uses. Colour-coded bars show green (fits), amber (tight) and red (exceeds). Copy the full analysis with cost breakdowns.
How to Use the AI Token Counter
Paste any text into the input area. The tool instantly counts approximate tokens, words, characters and estimated pages. It also shows the cost of sending that text as input to popular AI models, and context window bars show what percentage of each model's limit your text occupies.
- Paste text: Enter your prompt, system message, document or code.
- Read token count: See approximate tokens, words, characters and page count.
- Check cost: View input cost estimates for Haiku, Sonnet, GPT-5.2 and GPT-5 Mini.
- Check context fit: Colour-coded bars show fit against 9 model context windows.
- Copy analysis: Copy the full token analysis with cost estimates.
What Are Tokens?
A token is the smallest unit of text that an AI model processes. Tokens are not words: the word "tokenisation" might be split into "token" and "isation" as two separate tokens. On average, one token equals approximately 0.75 English words or 4 characters. Punctuation, spaces and special characters often consume their own tokens.
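The two rules of thumb above can be turned into a rough estimator. The following is a minimal sketch, not the method this tool necessarily uses: the function name `estimate_tokens` and the choice to average the two estimates are assumptions made for illustration.

```python
import math


def estimate_tokens(text: str) -> int:
    """Approximate a token count from the ~4 characters and ~0.75 words per token rules."""
    if not text.strip():
        return 0
    by_chars = len(text) / 4              # ~4 characters per token
    by_words = len(text.split()) / 0.75   # ~0.75 English words per token
    # Averaging the two estimates is an assumption; no real tokeniser works this way.
    return math.ceil((by_chars + by_words) / 2)


print(estimate_tokens("The quick brown fox jumps over the lazy dog."))
```

An estimate like this is only suitable for budgeting; exact counts require the provider's own tokeniser.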
Different providers use different tokeniser algorithms. OpenAI publishes its tokeniser as the tiktoken library, with encodings such as cl100k_base for GPT-4-era models and o200k_base for newer ones. Anthropic uses a custom byte-pair encoding (BPE) tokeniser. Google uses SentencePiece. As a result, the same text can produce different token counts across providers, typically varying by 5 to 15 percent.
Sources: OpenAI Tokeniser Tool · Anthropic Token Counting
Context Windows by Model
The context window is the maximum number of tokens (input plus output) a model can process in one request. Larger context windows accommodate longer documents, but filling them costs more because input is billed per token. Gemini models offer up to 1 million tokens, Claude models support 200K tokens and GPT-5.2 supports 128K tokens.
| Model | Context window (tokens) | Input price ($ per 1M tokens) | Pages (~250 words each) |
|---|---|---|---|
| Gemini 2.5 Flash | 1,000,000 | $0.30 | ~3,000 |
| Claude Sonnet 4.6 | 200,000 | $3.00 | ~600 |
| Claude Opus 4.6 | 200,000 | $5.00 | ~600 |
| GPT-5.2 | 128,000 | $1.75 | ~384 |
| GPT-5 Mini | 128,000 | $0.25 | ~384 |
| GPT-5 Nano | 16,000 | $0.05 | ~48 |
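The context-fit bars and input cost estimates can be sketched from the table above. This is an illustrative sketch only: the window sizes and prices are copied from the table, while the 75 percent amber threshold and the function names are assumptions, since the page does not state where "tight" begins.

```python
# Context windows (tokens) and input prices ($ per 1M tokens) copied from the table above.
MODELS = {
    "Gemini 2.5 Flash": (1_000_000, 0.30),
    "Claude Sonnet 4.6": (200_000, 3.00),
    "Claude Opus 4.6": (200_000, 5.00),
    "GPT-5.2": (128_000, 1.75),
    "GPT-5 Mini": (128_000, 0.25),
    "GPT-5 Nano": (16_000, 0.05),
}

AMBER_THRESHOLD = 0.75  # assumption: share of the window at which the bar turns amber


def context_fit(tokens: int, window: int) -> tuple[float, str]:
    """Return the percentage of the window used and a colour label."""
    share = tokens / window
    if share > 1.0:
        colour = "red"      # exceeds the window
    elif share >= AMBER_THRESHOLD:
        colour = "amber"    # fits, but tight
    else:
        colour = "green"    # fits comfortably
    return share * 100, colour


def input_cost(tokens: int, price_per_million: float) -> float:
    """Input cost in dollars for sending `tokens` tokens."""
    return tokens / 1_000_000 * price_per_million


def analyse(tokens: int) -> dict[str, tuple[float, str, float]]:
    """Percentage used, colour and input cost for every model in the table."""
    return {
        name: (*context_fit(tokens, window), input_cost(tokens, price))
        for name, (window, price) in MODELS.items()
    }


for name, (percent, colour, cost) in analyse(20_000).items():
    print(f"{name}: {percent:.1f}% ({colour}), ${cost:.4f}")
```

Because the context window covers input plus output, a production check would also subtract the tokens reserved for the response before comparing.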
How to Reduce Token Count
Shorter prompts cost less. Removing unnecessary context, filler words and redundant instructions can reduce token counts by 20 to 40 percent. Use concise system prompts, and add few-shot examples only when they measurably improve output quality.
Prompt caching is often the most effective cost-reduction technique for prompts that repeat. OpenAI offers 50 to 90 percent discounts on cached input tokens, and Anthropic offers 90 percent discounts on cache reads. A 2,000-token system prompt that repeats on every request can therefore cost up to 90 percent less after the first call.
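The saving from caching a repeated system prompt can be estimated with simple arithmetic. The sketch below is a simplification under stated assumptions: it charges full price for the first call and a flat discount on every later call, and it ignores cache-write surcharges, cache expiry and minimum cacheable lengths, which vary by provider. The function name and the $3.00 price in the usage lines are illustrative.

```python
def cached_prompt_cost(
    prompt_tokens: int,
    requests: int,
    price_per_million: float,
    cache_discount: float = 0.90,
) -> float:
    """Input cost of a repeated prompt: the first call at full price, later calls discounted."""
    if requests <= 0:
        return 0.0
    full_price = prompt_tokens / 1_000_000 * price_per_million
    discounted = full_price * (1 - cache_discount)
    return full_price + (requests - 1) * discounted


# A 2,000-token system prompt sent 10,000 times at an assumed $3.00 per 1M input tokens.
uncached = cached_prompt_cost(2_000, 10_000, 3.00, cache_discount=0.0)
cached = cached_prompt_cost(2_000, 10_000, 3.00)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
```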
Tokens in Different Content Types
| Content type | Tokens per 1,000 words | Notes |
|---|---|---|
| English prose | ~1,333 | Standard ratio |
| Python code | ~1,600 | Syntax characters add tokens |
| JSON data | ~1,800 | Brackets, quotes, colons |
| HTML/XML | ~2,000 | Tags consume many tokens |
| Minified code | ~2,200 | No whitespace, dense syntax |
| Non-Latin scripts | ~2,000–3,000 | CJK characters use more tokens |
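The ratios in the table above can be used as a lookup for quick estimates when the content type is known. This is a minimal sketch: the dictionary keys and the function name are hypothetical, and the non-Latin row is left out because the table gives a range rather than a single figure.

```python
import math

# Approximate tokens per 1,000 words, copied from the table above.
TOKENS_PER_1000_WORDS = {
    "english_prose": 1_333,
    "python_code": 1_600,
    "json": 1_800,
    "html_xml": 2_000,
    "minified_code": 2_200,
}


def estimate_tokens_by_type(word_count: int, content_type: str) -> int:
    """Scale a word count by the table's ratio for the given content type."""
    ratio = TOKENS_PER_1000_WORDS[content_type]
    return math.ceil(word_count * ratio / 1_000)


print(estimate_tokens_by_type(500, "english_prose"))
print(estimate_tokens_by_type(500, "json"))
```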
Token Budgeting for Production Applications
Production AI applications require careful token budgeting. Allocate your context window across three zones: system prompt (fixed overhead), user context (variable, grows with conversation history) and output headroom (reserved for the model's response). A common split is 20 percent system, 60 percent context and 20 percent output.
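The 20/60/20 split described above can be computed per model. A minimal sketch, assuming whole-number percentages and a hypothetical function name `split_budget`; integer arithmetic is used so the zone sizes are exact.

```python
def split_budget(
    window: int,
    system_pct: int = 20,
    context_pct: int = 60,
    output_pct: int = 20,
) -> dict[str, int]:
    """Divide a context window into system, context and output zones (in tokens)."""
    if system_pct + context_pct + output_pct != 100:
        raise ValueError("zone percentages must sum to 100")
    return {
        "system": window * system_pct // 100,
        "context": window * context_pct // 100,
        "output": window * output_pct // 100,
    }


print(split_budget(200_000))
```

The split is a starting point: an application with a long fixed system prompt or long expected responses would shift the percentages accordingly.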
Monitor token usage per request in production and set alerts when average tokens exceed your budget. Track the ratio of input to output tokens, because output tokens are typically priced 3 to 8 times higher than input tokens. Log token counts daily to identify usage spikes before they become billing surprises.
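The monitoring advice above can be sketched as two small checks: one that flags when the average tokens per request exceed a budget, and one that shows how output pricing dominates a request's cost. The function names and the default output multiplier of 5 (a value inside the 3 to 8 range mentioned above) are assumptions for illustration.

```python
from statistics import mean


def average_exceeds_budget(token_counts: list[int], budget_per_request: int) -> bool:
    """True when the mean tokens per request is above the budget."""
    return bool(token_counts) and mean(token_counts) > budget_per_request


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_multiplier: float = 5.0,  # assumption: output priced at 5x input
) -> float:
    """Dollar cost of one request, with prices expressed per 1M tokens."""
    output_price = input_price * output_multiplier
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


recent = [900, 1_100, 1_300]  # hypothetical per-request token counts from a log
if average_exceeds_budget(recent, budget_per_request=1_000):
    print("alert: average tokens per request is over budget")
print(f"${request_cost(1_000, 1_000, input_price=3.00):.3f}")
```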
Implement token guardrails. Truncate conversation history when it approaches the context limit, and use summarisation to compress older messages into fewer tokens. Remove low-value context (greetings, acknowledgements) from the conversation history to free space for substantive content.
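Two of the guardrails above, dropping low-value messages and truncating older history, can be sketched as follows. This is an illustrative sketch: the `LOW_VALUE` phrase list and the `rough_tokens` counter (4 characters per token) are assumptions, and summarising older messages is left out because it needs a model call.

```python
import math

# Hypothetical list of low-value messages; a real application would tune this.
LOW_VALUE = {"hi", "hello", "thanks", "thank you", "ok", "okay"}


def rough_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule."""
    return math.ceil(len(text) / 4)


def trim_history(messages: list[str], budget_tokens: int) -> list[str]:
    """Drop low-value messages, then keep the newest messages that fit the budget."""
    substantive = [
        m for m in messages if m.strip().lower().rstrip("!.") not in LOW_VALUE
    ]
    kept: list[str] = []
    used = 0
    for message in reversed(substantive):  # walk from newest to oldest
        cost = rough_tokens(message)
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "Hello!",
    "My order 1234 has not arrived yet.",
    "Thanks",
    "Can you check the delivery status?",
]
print(trim_history(history, budget_tokens=10))
```

Keeping the newest messages first is a design choice that suits chat applications; a system that depends on early instructions would pin those messages before trimming.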
Tokens and Multilingual Content
English is the most token-efficient language for current AI models. Chinese, Japanese and Korean text typically uses 2 to 3 times more tokens per word because BPE vocabularies are English-heavy. Arabic and Hindi fall in between, using approximately 1.5 to 2 times the tokens. Mixed-language content (for example, code with Chinese comments) produces unpredictable token counts.
This has direct cost implications for international applications. A customer support chatbot serving Chinese users can cost 2 to 3 times more in token fees than an identical English-language bot. Consider this when selecting models for multilingual deployments; some providers offer language-optimised tokenisers that reduce this gap.
When budgeting for multilingual projects, use this token counter to measure actual token counts in your target languages. Paste representative samples in each language and note the tokens-per-word ratio. Divide each language's ratio by the English ratio, then multiply your English cost estimates by the result to get more accurate multilingual projections. This simple step prevents budget overruns that catch teams off guard after launch.
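The projection step above can be sketched by comparing token counts for equivalent sample content in English and in the target language. The function names and the sample counts below are hypothetical; real counts would come from pasting representative samples into the counter.

```python
def language_multiplier(english_tokens: int, target_tokens: int) -> float:
    """Ratio of target-language tokens to English tokens for equivalent content."""
    if english_tokens <= 0:
        raise ValueError("english_tokens must be positive")
    return target_tokens / english_tokens


def project_cost(english_cost: float, multiplier: float) -> float:
    """Scale an English cost estimate by the measured language multiplier."""
    return english_cost * multiplier


# Hypothetical measurements: the same support reply in English and in Chinese.
multiplier = language_multiplier(english_tokens=400, target_tokens=1_000)
print(multiplier)
print(project_cost(20.0, multiplier))
```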
References
1. OpenAI Tokeniser — official tiktoken tool.
2. Anthropic: Token Counting.
3. OpenAI API Pricing, June 2026.
4. Anthropic Claude Pricing, June 2026.
Why Token Counting Matters for AI Development
Token counting is essential for three reasons: cost control, context window management and prompt engineering. API billing is based on tokens consumed, so a team sending 50,000 requests per day with unnecessarily verbose prompts can waste thousands of dollars monthly. Knowing your token count before sending also prevents context window overflow errors.
Prompt engineers use token counters to optimise system prompts. A 500-word system prompt consumes approximately 667 tokens on every request. At 10,000 requests per day on Claude Sonnet 4.6 ($3.00 per million input tokens), that system prompt alone costs about $20 daily. Reducing it by 30 percent saves about $6 per day, or roughly $2,190 per year. Token counters also help developers stay within budget limits set by project managers.
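The arithmetic in the paragraph above can be reproduced in a few lines. This sketch assumes the 0.75 words-per-token rule and the $3.00 per million input price listed for Claude Sonnet 4.6 earlier on this page; the function name is hypothetical.

```python
def system_prompt_cost(
    words: int,
    requests_per_day: int,
    price_per_million: float,
    words_per_token: float = 0.75,
) -> tuple[int, float]:
    """Return (approximate tokens, daily input cost in dollars) for a system prompt."""
    tokens = round(words / words_per_token)
    daily_cost = tokens * requests_per_day * price_per_million / 1_000_000
    return tokens, daily_cost


tokens, daily = system_prompt_cost(500, 10_000, 3.00)
saving_per_day = daily * 0.30  # a 30 percent shorter prompt
print(tokens, round(daily, 2), round(saving_per_day, 2), round(saving_per_day * 365))
```

The exact figures come out slightly above the rounded values in the text because 667 tokens at 10,000 requests per day is 6.67 million tokens, not exactly 6.67.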
Competitor Gap Analysis
Most token counters show a single number. No free tool combines token count, multi-model cost estimates, context window fit bars and copy-to-clipboard analysis in one interface.
| Feature | Most competitors | LazyTools |
|---|---|---|
| Token count | Yes (single model) | Universal approximation |
| Multi-model cost estimates | No competitor | 4 models (Haiku, Sonnet, GPT-5.2, GPT-5 Mini) |
| Context window bars | No competitor | 9 models, colour-coded |
| Real-time (no click) | Some | Instant on keystroke |
| Word + char + pages | Some | All four metrics |
| Copy analysis | No competitor | Full report to clipboard |
How Tokenisers Work
Most modern AI tokenisers use Byte-Pair Encoding (BPE) or a close variant. BPE starts with individual characters (or bytes) and iteratively merges the most frequent adjacent pairs into single tokens. Common English words like "the" become single tokens. Rare words and technical terms are split into sub-word pieces.
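The merge loop described above can be illustrated with a toy trainer. This is a teaching sketch, not any provider's tokeniser: it works on whole characters of a tiny word list, breaks ties by first occurrence, and omits the byte-level handling and pre-tokenisation that production tokenisers use. The names `train_bpe` and `merge_pair` are hypothetical.

```python
from collections import Counter


def merge_pair(symbols: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def train_bpe(words: list[str], num_merges: int):
    """Learn up to `num_merges` merges; return the merges and the segmented words."""
    corpus = [list(word) for word in words]  # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))  # count adjacent pairs
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair, first seen wins ties
        merges.append(best)
        corpus = [merge_pair(symbols, best) for symbols in corpus]
    return merges, corpus


merges, corpus = train_bpe(["the", "the", "the", "then", "tokenisation"], num_merges=2)
print(merges)
print(corpus)
```

After two merges the frequent word "the" has become a single symbol, while the rare word "tokenisation" is still split into individual characters, which mirrors why rare terms consume more tokens.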
This explains why common words are cheap (one token each) while rare terms cost more: the word "cryptocurrency" might be two or three tokens. Non-Latin scripts (Chinese, Arabic, Hindi) use significantly more tokens per word because BPE vocabularies are trained primarily on English text. As a result, the same content in Chinese can cost 2 to 3 times more tokens than in English.
Optimising Prompts to Reduce Tokens
Replace verbose instructions with concise directives. For example, "Please provide a detailed analysis of the following text, making sure to include all relevant information" (about 19 tokens) can become "Analyse this text thoroughly" (about 5 tokens). In most cases the model handles both equally well.
Use structured output formats like JSON schemas. Specifying the exact output structure reduces output tokens because the model follows the template rather than generating verbose prose. Setting max_tokens in the API call also caps the response length, which prevents runaway responses that consume unnecessary output tokens.