Word Frequency Analyser
Count word occurrences and analyse vocabulary patterns. Filter stop words, switch between unigrams, bigrams and trigrams, check keyword density and export results as CSV. Includes Type-Token Ratio and hapax legomena metrics for vocabulary richness analysis.
How to Use the Word Frequency Analyser
Paste or type your text into the input field, set your filtering options and click Analyse. The tool counts every word and displays the results in a sortable table with frequency counts, density percentages and visual distribution bars. Switch between the Words, Bigrams and Trigrams tabs to analyse single words and multi-word phrases, then export the results as CSV or copy the table to your clipboard.
- Paste your text: Enter the text you want to analyse. The tool handles texts of 100,000 words or more.
- Set filtering options: Toggle stop word exclusion, case sensitivity, minimum word length and minimum frequency.
- Click Analyse: View the results in a sortable table. Click a column header to sort by word, count or density.
- Switch n-gram tabs: View single words, bigrams (two-word phrases) or trigrams (three-word phrases).
- Export results: Copy the table or download it as CSV for spreadsheet analysis and reporting.
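The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own implementation: the regular-expression tokenizer, the tiny `STOP_WORDS` set and the function name `word_frequencies` are assumptions, and density is measured against all tokens, stop words included (an assumption about the denominator).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "of", "is"}

def word_frequencies(text, exclude_stop_words=True, case_sensitive=False,
                     min_length=1, min_count=1):
    """Return (word, count, density %) rows sorted by count, then alphabetically."""
    if not case_sensitive:
        text = text.lower()
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    total = len(tokens)  # density denominator: every token, before filtering
    counts = Counter(
        t for t in tokens
        if len(t) >= min_length
        # compare in lower case so "The" is still a stop word in case-sensitive mode
        and not (exclude_stop_words and t.lower() in STOP_WORDS)
    )
    rows = [(w, c, 100 * c / total) for w, c in counts.items() if c >= min_count]
    rows.sort(key=lambda r: (-r[1], r[0]))  # highest count first, ties A to Z
    return rows

print(word_frequencies("The cat saw the other cat."))
```

Sorting ties alphabetically keeps the output order stable, which makes tables from different runs easier to compare.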
What Is Word Frequency Analysis?
Word frequency analysis counts how often each word appears in a text. It is a foundational technique in computational linguistics, natural language processing and content optimisation, and it reveals vocabulary patterns that casual reading misses. Writers spot overused words, SEO professionals check keyword density and researchers analyse how terminology is distributed across documents.
The technique dates back to the early 20th century, when linguists counted word occurrences in printed texts by hand. George Kingsley Zipf published his seminal work on word frequency distributions in 1935. His observation that word frequency follows a power law (the nth most common word appears roughly 1/n as often as the most common word) remains one of the most robust empirical findings in linguistics.
Understanding N-Gram Analysis
An n-gram is a contiguous sequence of n words from a text. Unigrams are single words, bigrams are two-word phrases and trigrams are three-word phrases. N-gram analysis reveals meaningful phrases and collocations that single-word frequency misses entirely.
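Generating n-grams amounts to sliding a window of n tokens along the text. The sketch below uses hypothetical helper names (`ngrams`, `ngram_counts`) and, as a simplification, lets n-grams run across sentence boundaries, which a production tool may choose to avoid.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous n-token sequence as a space-joined string."""
    # range() is empty when there are fewer than n tokens, so no n-grams result
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(text, n):
    """Count n-grams in lower-cased text using a simple regex tokenizer."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(ngrams(tokens, n))

print(ngram_counts("Best running shoes for best running", 2).most_common(2))
```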
Bigram analysis is especially valuable for SEO content analysis. Because search queries are often multi-word phrases, checking whether target phrases such as "mortgage calculator" or "best running shoes" appear with appropriate frequency helps ensure natural keyword placement.
Stop Words and Why They Matter
Stop words are function words that carry grammatical meaning but little semantic content. Articles (the, a, an), prepositions (in, on, at, to) and conjunctions (and, but, or) are the most common stop words. Together they typically account for roughly 40 to 60 percent of all words in English text, so filtering them out reveals the content-carrying vocabulary.
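To check the stop-word share of your own text, divide the number of stop-word tokens by the total token count. The small `STOP_WORDS` set and the `stop_word_share` name below are hypothetical; with a fuller list, such as the tool's, the measured share will usually be higher.

```python
import re

# Hypothetical, abbreviated stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "in", "on", "at", "to", "of",
              "and", "but", "or", "is", "was", "it"}

def stop_word_share(text):
    """Fraction of tokens that are stop words (0.0 for empty text)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in STOP_WORDS for t in tokens) / len(tokens)

print(stop_word_share("The cat sat on the mat and it was happy"))
```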
This tool includes 175+ English stop words. Toggle the filter on for content analysis and SEO keyword checking, and off for linguistic research, readability analysis and authorship attribution. Stop word distributions also reveal writing style: academic writers use function words in different proportions from journalists or novelists.
Vocabulary Richness Metrics
Type-Token Ratio (TTR)
Type-Token Ratio divides the number of unique words (types) by the total number of words (tokens). A 500-word text with 250 unique words has a TTR of 0.500. A higher TTR indicates greater vocabulary diversity. However, TTR falls as text length increases, because longer texts inevitably repeat words, so compare TTR only across texts of similar length.
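The TTR formula translates directly into code. As a hedge against the length effect just described, the sketch also includes moving-average TTR (MATTR), a separate, established technique not used by this page: it averages TTR over fixed-size sliding windows so texts of different lengths become more comparable. The window size of 100 and all function names are illustrative assumptions.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def type_token_ratio(tokens):
    """Unique tokens (types) divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def moving_average_ttr(tokens, window=100):
    """Mean TTR over every window of `window` consecutive tokens (MATTR)."""
    if len(tokens) <= window:
        return type_token_ratio(tokens)  # too short to slide: fall back to plain TTR
    ratios = [type_token_ratio(tokens[i:i + window])
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

# 250 distinct words, each used twice: 500 tokens, TTR = 0.5
tokens = tokenize(" ".join([f"w{i}" for i in range(250)] * 2))
print(type_token_ratio(tokens), moving_average_ttr(tokens))
```

In this example, plain TTR is 0.5 while MATTR is 1.0, because no word repeats within any 100-word window; the gap shows how strongly plain TTR depends on text length.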
Hapax Legomena
Hapax legomena are words that appear exactly once in a text. The term comes from Greek and means "said once". The share of unique words that are hapax legomena indicates vocabulary richness: a high share means many words are used once rather than repeated. As a rough guide, literary fiction often has 40 to 60 percent hapax legomena, while technical documentation has 20 to 35 percent because it repeats specialised terminology frequently.
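Counting hapax legomena only requires a frequency table and a filter for count == 1. This sketch uses unique words as the denominator, matching the metrics table below; some studies divide by total tokens instead, which gives smaller percentages. The function name `hapax_stats` is hypothetical.

```python
import re
from collections import Counter

def hapax_stats(text):
    """Return (sorted hapax legomena, hapax share of unique words)."""
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    hapax = sorted(w for w, c in counts.items() if c == 1)
    share = len(hapax) / len(counts) if counts else 0.0
    return hapax, share

print(hapax_stats("the cat and the dog and the bird"))
```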
| Metric | Formula | Typical range | Interpretation |
|---|---|---|---|
| Type-Token Ratio | Unique words / Total words | 0.30 – 0.70 | Higher = more diverse vocabulary |
| Hapax Legomena % | Words appearing once / Unique words × 100 | 20% – 60% | Higher = richer, less repetitive vocabulary |
| Keyword Density | Keyword occurrences / Total words × 100 | 0.5% – 3% | SEO target range for primary keywords |
Word Frequency for SEO
Word frequency analysis doubles as a keyword density checker. Paste your content, exclude stop words and check whether your target keyword appears with appropriate frequency. Common SEO guidance puts primary keyword density between 1 and 3 percent; density well above 3 percent risks being treated as keyword stuffing by search engines.
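Keyword density for a single word or a multi-word phrase can be computed by matching the keyword's tokens against every window of the text. This is a sketch under stated assumptions: `keyword_density` is a hypothetical name, and it uses occurrences divided by total words × 100. Some SEO tools instead multiply occurrences by the number of words in the phrase, which yields higher figures for bigrams and trigrams.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def keyword_density(text, keyword):
    """Occurrences of `keyword` (one or more words) per 100 words of text."""
    tokens, target = tokenize(text), tokenize(keyword)
    n = len(target)
    if not tokens or n == 0:
        return 0.0
    # compare each n-token window of the text with the keyword's tokens
    hits = sum(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))
    return 100 * hits / len(tokens)

text = "mortgage calculator " * 3 + "filler " * 194  # 200 words in total
print(keyword_density(text, "mortgage calculator"))
```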
Bigram and trigram analysis is more valuable than single-word frequency for SEO, because users search with phrases, not isolated words. Checking that your target phrases appear naturally in headings, opening paragraphs and body text supports content relevance without keyword stuffing. Comparing your frequency table against those of top-ranking competitors also reveals vocabulary gaps.
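One way to find vocabulary gaps is to list the terms that several competitor pages use but your page never mentions. The sketch below is an illustration with hypothetical names (`term_counts`, `vocabulary_gaps`) and a hypothetical `min_docs` threshold, which sets how many competitors must use a term before it counts as a gap.

```python
import re
from collections import Counter

def term_counts(text, stop_words=frozenset()):
    """Lower-cased word counts, optionally skipping stop words."""
    return Counter(t for t in re.findall(r"[a-z0-9']+", text.lower())
                   if t not in stop_words)

def vocabulary_gaps(own_text, competitor_texts, min_docs=2, stop_words=frozenset()):
    """Terms used by at least `min_docs` competitors but absent from own_text."""
    own = term_counts(own_text, stop_words)
    doc_freq = Counter()
    for text in competitor_texts:
        # a set, so each competitor counts at most once per term
        doc_freq.update(set(term_counts(text, stop_words)))
    return sorted(t for t, df in doc_freq.items() if df >= min_docs and t not in own)

competitors = ["running shoes cushioning review",
               "trail running cushioning drop",
               "shoes drop weight"]
print(vocabulary_gaps("running shoes review", competitors))
```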
Word Frequency for Writers
Writers use frequency analysis to identify overused words and repetitive patterns. If a particular adjective appears 15 times in a 2,000-word article, the frequency table makes this immediately obvious. Comparing frequency tables across drafts also shows how your vocabulary changes over time.
Fiction writers use frequency analysis to check the consistency of character voices: distinct characters usually have distinct vocabularies, so a frequency table for each character's dialogue shows whether their voices stay separate. Detecting unintentional repetition of sentence starters, transition words and filler phrases also improves prose quality. Academic writers, meanwhile, check whether technical terms are used consistently throughout a paper.
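Repeated sentence starters are easy to surface by counting the first word of each sentence. This minimal sketch uses a hypothetical `sentence_starters` function and splits naively on `.`, `!` and `?`, so abbreviations such as "Dr." will produce false sentence breaks.

```python
import re
from collections import Counter

def sentence_starters(text, top=5):
    """Most common first words of sentences (naive split on . ! ?)."""
    starters = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        if words:  # skip the empty fragment after the final punctuation
            starters[words[0]] += 1
    return starters.most_common(top)

print(sentence_starters("Then she left. Then he followed. Suddenly it rained. Then they ran!"))
```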
Word Frequency in Research
Computational linguists use word frequency distributions for authorship attribution, genre classification and studies of language change. Zipf's Law predicts that the frequency of any word is inversely proportional to its rank, and deviations from it can indicate unusual texts, machine-generated content or genre-specific vocabulary patterns.
Corpus linguists compare word frequencies across large text collections to study language variation. British and American English, for example, differ in function word frequencies, and historical corpora reveal vocabulary shifts over decades. Forensic linguists use frequency profiles to identify anonymous authors by comparing their word usage patterns against known writing samples.
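The idea of comparing frequency profiles can be illustrated with a deliberately simplified approach: represent each text by the relative frequencies of a fixed set of function words, then pick the known author whose profile has the highest cosine similarity. This is a swapped-in teaching example, not a forensic method; established stylometric measures such as Burrows's Delta are more robust. The `FUNCTION_WORDS` list and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical marker set; real studies use larger, carefully chosen lists.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "but", "upon"]

def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def closest_author(anonymous, samples):
    """samples maps author -> known text; returns the most similar author."""
    target = profile(anonymous)
    return max(samples, key=lambda author: cosine(target, profile(samples[author])))

samples = {"A": "upon the hill upon the road upon it",
           "B": "and to a in and to a in"}
print(closest_author("upon the river upon the sea", samples))
```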
Zipf's Law Explained
Zipf's Law states that, in natural language text, the frequency of a word is approximately inversely proportional to its rank. The most common word appears roughly twice as often as the second most common word, three times as often as the third, and so on. This power-law pattern holds approximately across a wide range of human languages, and similar rank-size distributions have been observed for city populations and incomes.
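In formula form, Zipf's Law predicts f(r) ≈ f(1) / r, where f(r) is the count of the word at rank r. A quick way to check a text against it is to place each observed count next to that prediction. The `zipf_table` function below is a hypothetical helper, and the toy token list is constructed to follow the law exactly; real texts only approximate it.

```python
from collections import Counter

def zipf_table(tokens, top=10):
    """Rows of (rank, word, observed count, predicted count f(1) / rank)."""
    ranked = Counter(tokens).most_common(top)
    if not ranked:
        return []
    f1 = ranked[0][1]  # count of the most frequent word
    return [(rank, word, count, f1 / rank)
            for rank, (word, count) in enumerate(ranked, start=1)]

tokens = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
for row in zipf_table(tokens):
    print(row)
```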
Best Practices for Frequency Analysis
Always exclude stop words when analysing content vocabulary, because stop words dominate frequency tables and obscure meaningful patterns. Use case-insensitive mode unless you specifically need to distinguish proper nouns from common words, and set the minimum word length to 2 or 3 to filter out single-letter artifacts.
Compare frequency tables across multiple texts for the most valuable insights. A single frequency table tells you which words appear most often; comparing tables across drafts, competitors or time periods reveals what changed and what is missing. For deeper analysis, export CSV files and use spreadsheet features such as pivot tables and conditional formatting.
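If you compute frequency rows yourself, Python's standard `csv` module produces a file that spreadsheets open directly and quotes fields that contain commas. The column names and the `rows_to_csv` helper below are hypothetical; the tool's own CSV layout may differ.

```python
import csv
import io

def rows_to_csv(rows):
    """rows: iterable of (word, count, density %). Returns CSV text with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["rank", "word", "count", "density_pct"])
    for rank, (word, count, density) in enumerate(rows, start=1):
        writer.writerow([rank, word, count, f"{density:.2f}"])
    return buf.getvalue()

print(rows_to_csv([("cat", 2, 33.333), ("saw", 1, 16.6667)]))
```

To write to disk instead, open the file with `open(path, "w", newline="", encoding="utf-8")` and pass it to `csv.writer`; `newline=""` prevents blank lines between rows on Windows.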