Word Frequency Analyser — Free Online Checker | LazyTools

Free Text Tool · Word Frequency · N-Grams · Keyword Density · Stop Words · CSV Export

Word Frequency Analyser

Count word occurrences and analyse vocabulary patterns. Filter stop words, switch between unigrams, bigrams and trigrams, check keyword density and export results as CSV. Includes Type-Token Ratio and hapax legomena metrics for vocabulary richness analysis.


How to Use the Word Frequency Analyser

Paste or type your text into the input field, set your filtering options and click Analyse. The tool counts every word and displays the results in a sortable table with frequency counts, density percentages and visual distribution bars. Switch between the Words, Bigrams and Trigrams tabs to analyse single words or multi-word phrases, then export the results as CSV or copy the table to your clipboard.

  1. Paste your text: Enter the text you want to analyse. The tool handles texts of 100,000 words or more.
  2. Set filtering options: Toggle stop word exclusion, case sensitivity, minimum word length and minimum frequency.
  3. Click Analyse: View results in a sortable table. Click column headers to sort by word, count or density.
  4. Switch n-gram tabs: View single words, bigrams (two-word phrases) or trigrams (three-word phrases).
  5. Export results: Copy the table or download it as CSV for spreadsheet analysis and reporting.

What Is Word Frequency Analysis?

Word frequency analysis counts how often each word appears in a text. It is a foundational technique in computational linguistics, natural language processing and content optimisation, and it reveals vocabulary patterns that are invisible to casual reading. Writers spot overused words. SEO professionals check keyword density. Researchers analyse terminology distribution across documents.
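The counting step itself is small. The following is a minimal sketch in Python, not the tool's actual implementation; the function name `word_frequencies` and the tokenising rule (runs of word characters, with internal apostrophes kept) are assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(text: str, case_sensitive: bool = False) -> Counter:
    """Count how often each word occurs in the given text."""
    if not case_sensitive:
        text = text.lower()
    # Assumption: a "word" is a run of word characters, optionally
    # containing internal apostrophes (so "don't" stays one token).
    tokens = re.findall(r"\w+(?:'\w+)*", text)
    return Counter(tokens)


freqs = word_frequencies("The cat sat on the mat")
print(freqs["the"])  # 2, because "The" and "the" are merged
```

Passing `case_sensitive=True` keeps "The" and "the" as separate entries, which mirrors the case sensitivity toggle described above.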

The technique dates back to the early 20th century, when linguists counted word occurrences in printed texts by hand. George Kingsley Zipf published his seminal work on word frequency distributions in 1935. His observation that word frequency follows a power law (the nth most common word appears roughly 1/n times as often as the most common word) remains one of the most robust empirical findings in linguistics.

Understanding N-Gram Analysis

An n-gram is a contiguous sequence of n words from a text. Unigrams are single words. Bigrams are two-word phrases. Trigrams are three-word phrases. N-gram analysis reveals meaningful phrases and collocations that single-word frequency misses entirely.

Unigrams: "machine", "learning", "model"
Bigrams: "machine learning", "learning model"
Trigrams: "machine learning model"

Example text: "The cat sat on the mat"
Unigrams: the(2), cat(1), sat(1), on(1), mat(1)
Bigrams: "the cat"(1), "cat sat"(1), "sat on"(1), "on the"(1), "the mat"(1)
Trigrams: "the cat sat"(1), "cat sat on"(1), "sat on the"(1), "on the mat"(1)
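The counts in the example can be reproduced with a sliding window over the token list. This is a sketch under simple assumptions (lowercased, whitespace-split tokens; the helper name `ngrams` is hypothetical), and the tool may treat punctuation or sentence boundaries differently.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return every contiguous run of n tokens, joined by spaces."""
    # A text shorter than n tokens yields an empty list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the cat sat on the mat".split()
bigrams = Counter(ngrams(tokens, 2))   # 5 distinct bigrams, each seen once
trigrams = Counter(ngrams(tokens, 3))  # 4 distinct trigrams, each seen once
```

A text of k tokens always yields k - n + 1 n-grams, which is why the six-word example produces five bigrams and four trigrams.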

Bigram analysis is especially valuable for SEO content analysis because search queries are often multi-word phrases. Checking whether target phrases such as "mortgage calculator" or "best running shoes" appear with appropriate frequency helps confirm that keywords are placed naturally.

Stop Words and Why They Matter

Stop words are function words that carry grammatical meaning but little semantic content. Articles (the, a, an), prepositions (in, on, at, to) and conjunctions (and, but, or) are the most common stop words. They typically account for 40 to 60 percent of all words in English text, so filtering them reveals the content-carrying vocabulary.

This tool includes 175+ English stop words. Toggle the filter on for content analysis and SEO keyword checking. Toggle it off for linguistic research, readability analysis and authorship attribution, where stop word distributions help identify writing style. Academic writers, for example, use different function word patterns than journalists or novelists.
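Stop word filtering amounts to a set lookup per token. The sketch below uses a deliberately tiny, illustrative stop word set; it is not the tool's 175+ word list, and the names `STOP_WORDS` and `remove_stop_words` are assumptions.

```python
# Illustrative subset only; the tool's own 175+ word list is not reproduced here.
STOP_WORDS = {"the", "a", "an", "in", "on", "at", "to", "and", "but", "or", "of"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only tokens that are not in the stop word set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]


tokens = "The cat sat on the mat".split()
content = remove_stop_words(tokens)               # ['cat', 'sat', 'mat']
stop_word_share = 1 - len(content) / len(tokens)  # 0.5 for this sentence
```

In this six-word sentence, half of the tokens are stop words, which falls inside the 40 to 60 percent range quoted above.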

Vocabulary Richness Metrics

Type-Token Ratio (TTR)

Type-Token Ratio divides unique words (types) by total words (tokens). A 500-word text with 250 unique words has a TTR of 0.500. A higher TTR indicates greater vocabulary diversity. TTR decreases as text length increases because longer texts inevitably repeat words, so compare TTR only across texts of similar length.
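As a minimal sketch of the calculation (the function name `type_token_ratio` is hypothetical, and tokens are assumed to be already normalised, for example lowercased):

```python
def type_token_ratio(tokens):
    """Unique words (types) divided by total words (tokens)."""
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)


# Synthetic 500-token text built from 250 distinct words, each used twice.
sample = [f"w{i % 250}" for i in range(500)]
print(type_token_ratio(sample))  # 0.5
```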

Hapax Legomena

Hapax legomena are words that appear exactly once in a text. The term comes from Greek and means "said once." The percentage of hapax legomena indicates vocabulary richness and specificity. Literary fiction typically has 40 to 60 percent hapax legomena. Technical documentation has 20 to 35 percent because it repeats specialised terminology frequently.
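A short sketch of the hapax percentage, assuming the denominator is the number of unique words (the function name `hapax_percentage` is hypothetical):

```python
from collections import Counter


def hapax_percentage(tokens):
    """Percentage of unique words that occur exactly once."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = [word for word, count in counts.items() if count == 1]
    return 100 * len(hapaxes) / len(counts)


tokens = "the cat sat on the mat".split()
print(hapax_percentage(tokens))  # 80.0: four of the five unique words occur once
```

Very short texts score high on this metric because few words have a chance to repeat, so the typical ranges above apply to longer documents.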

Metric | Formula | Typical range | Interpretation
Type-Token Ratio | Unique words / Total words | 0.30 – 0.70 | Higher = more diverse vocabulary
Hapax Legomena % | Words appearing once / Unique words × 100 | 20% – 60% | Higher = richer, less repetitive vocabulary
Keyword Density | Word count / Total words × 100 | 0.5% – 3% | SEO target range for primary keywords

Word Frequency for SEO

Word frequency analysis serves as a keyword density checker. Paste your content, exclude stop words and check whether your target keyword appears with appropriate frequency. SEO guidelines suggest a primary keyword density between 1 and 3 percent. Density above 3 percent may trigger over-optimisation penalties from search engines.
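The density figure is a plain percentage. The sketch below divides by all tokens in the list it is given; whether the tool's denominator includes stop words when the filter is on is not stated here, so treat that as an assumption. The function name `keyword_density` is hypothetical.

```python
def keyword_density(tokens, keyword):
    """Occurrences of keyword as a percentage of all tokens."""
    if not tokens:
        return 0.0
    return 100 * tokens.count(keyword) / len(tokens)


# Synthetic 500-word text in which the keyword appears 5 times.
tokens = ["mortgage"] * 5 + ["filler"] * 495
print(keyword_density(tokens, "mortgage"))  # 1.0
```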

Bigram and trigram analysis is often more valuable than single-word frequency for SEO because users search with phrases, not isolated words. Checking that your target phrases appear naturally in headings, opening paragraphs and body text supports content relevance without keyword stuffing. Comparing your frequency table against those of top-ranking competitors also reveals vocabulary gaps.

Word Frequency for Writers

Writers use frequency analysis to identify overused words and repetitive patterns. If a particular adjective appears 15 times in a 2,000-word article, the frequency table makes this immediately obvious. Comparing frequency tables across drafts tracks vocabulary improvement over time.
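Comparing two drafts can be sketched as a per-word difference between two frequency tables. The function name `frequency_changes` is hypothetical, and the tool itself is not described as offering this comparison; it is something you could do with exported counts.

```python
from collections import Counter


def frequency_changes(old: Counter, new: Counter) -> dict:
    """Map each word whose count changed to (new count - old count)."""
    words = set(old) | set(new)
    # A Counter returns 0 for missing words, so added and removed words work too.
    return {w: new[w] - old[w] for w in words if new[w] != old[w]}


draft_1 = Counter("very very very good".split())
draft_2 = Counter("very good excellent".split())
changes = frequency_changes(draft_1, draft_2)
# "very" dropped by 2, "excellent" is new, "good" is unchanged and omitted
```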

Fiction writers use frequency analysis to check character voice consistency, since each character should have a distinct vocabulary. Detecting unintentional repetition of sentence starters, transition words and filler phrases improves prose quality. Academic writers check whether technical terms are used consistently throughout a paper.

Word Frequency in Research

Computational linguists use word frequency distributions for authorship attribution, genre classification and language change studies. Zipf's Law predicts that the frequency of any word is inversely proportional to its rank. Deviations from Zipf's Law can indicate unusual texts, machine-generated content or genre-specific vocabulary patterns.

Corpus linguists compare word frequencies across large text collections to study language variation. British and American English differ in function word frequencies, and historical corpora reveal vocabulary shifts over decades. Forensic linguists use frequency profiles to identify anonymous authors by comparing their word usage patterns against known writing samples.

Zipf's Law Explained

Zipf's Law states that in natural language text, the frequency of a word is approximately inversely proportional to its rank. The most common word appears approximately twice as often as the second most common word, three times as often as the third, and so on. This power-law distribution holds across virtually all human languages, and similar patterns appear in city populations and income distributions.

Zipf's Law: f(r) = C / r^a

f(r) = frequency of the word at rank r
C = constant (frequency of the most common word)
a = exponent (approximately 1.0 for most languages)

Example (typical English text):
Rank 1: "the" ~7.0% of all words
Rank 2: "of" ~3.5% (approx. half of rank 1)
Rank 3: "and" ~2.8%
Rank 10: "in" ~0.7%
Rank 100: varies ~0.07%
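To see how closely the example figures follow the formula, the predicted shares can be computed directly. This sketch takes C as the 7.0% share of the top-ranked word and a = 1.0; the function name `zipf_share` is hypothetical.

```python
def zipf_share(rank: int, top_share: float, exponent: float = 1.0) -> float:
    """Predicted share of all words for a given rank: f(r) = C / r^a."""
    return top_share / rank ** exponent


for rank in (1, 2, 3, 10, 100):
    print(rank, round(zipf_share(rank, 7.0), 2))
# Predicted shares: 7.0, 3.5, 2.33, 0.7 and 0.07 percent
```

The prediction for rank 3 is about 2.33%, somewhat below the ~2.8% listed for "and", which illustrates that real text follows the law only approximately.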

Best Practices for Frequency Analysis

Always exclude stop words when analysing content vocabulary. Stop words dominate frequency tables and obscure meaningful patterns. Use case-insensitive mode unless you specifically need to distinguish proper nouns from common words. Set the minimum word length to 2 or 3 to filter out single-letter artifacts.
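The minimum length and minimum frequency settings can be thought of as a filter over the finished frequency table. A minimal sketch, with the hypothetical function name `filter_counts` and parameter names chosen for illustration:

```python
from collections import Counter


def filter_counts(counts: Counter, min_length: int = 1, min_frequency: int = 1) -> Counter:
    """Keep words that are long enough and frequent enough."""
    return Counter({
        word: count
        for word, count in counts.items()
        if len(word) >= min_length and count >= min_frequency
    })


counts = Counter({"a": 9, "cat": 3, "mat": 1})
filtered = filter_counts(counts, min_length=2, min_frequency=2)
# Only "cat" remains: "a" is too short and "mat" is too rare
```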

Compare frequency tables across multiple texts for the most valuable insights. A single frequency table tells you which words appear most often. Comparing tables across drafts, competitors or time periods reveals what changed and what is missing. Export CSV files and use spreadsheet tools for advanced analysis, including pivot tables and conditional formatting.
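If you want to produce a comparable CSV from your own counts, Python's standard `csv` module is enough. The column names below (`word`, `count`, `density_percent`) are assumptions for illustration and may not match the headers in the tool's export.

```python
import csv
import io
from collections import Counter


def frequencies_to_csv(counts: Counter) -> str:
    """Serialise a frequency table as CSV text, most frequent word first."""
    total = sum(counts.values())
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "count", "density_percent"])
    for word, count in counts.most_common():
        writer.writerow([word, count, f"{100 * count / total:.2f}"])
    return buffer.getvalue()


csv_text = frequencies_to_csv(Counter("the cat sat on the mat".split()))
print(csv_text.splitlines()[1])  # the,2,33.33
```

Using `csv.writer` rather than joining strings by hand means n-grams or words containing commas or quotes are escaped correctly.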

Frequently Asked Questions

What is word frequency analysis?
Word frequency analysis counts how often each word appears in a text. It reveals vocabulary patterns, overused words, keyword density and writing style. It is a foundational technique in computational linguistics, natural language processing and content optimisation.

What are stop words?
Stop words are common function words like 'the', 'and', 'in', 'to' and 'of' that appear frequently but carry little meaning. Filtering them reveals the content-carrying words that matter. This tool includes 175+ English stop words with a one-click toggle.

What are bigrams and trigrams?
Bigrams are two-word phrases. Trigrams are three-word phrases. Together with single words (unigrams), they are called n-grams. N-gram analysis reveals common phrases and collocations that single-word frequency misses. For example, 'machine learning' is a meaningful bigram.

What is keyword density?
Keyword density is the percentage of total words that a specific word represents. A word appearing 5 times in a 500-word text has 1.0% density. SEO guidelines suggest keeping primary keyword density between 1% and 3% to avoid over-optimisation penalties.

What is Type-Token Ratio?
Type-Token Ratio (TTR) divides unique words (types) by total words (tokens). A TTR of 0.60 means the number of distinct words is 60% of the total word count. A higher TTR indicates greater vocabulary diversity. Academic writing typically has a TTR between 0.40 and 0.60.

What are hapax legomena?
Hapax legomena are words that appear exactly once in a text. The percentage of hapax legomena indicates vocabulary richness. Literary texts typically have 40 to 60 percent hapax legomena, while technical manuals have 20 to 35 percent.

How does word frequency analysis help with SEO?
Word frequency reveals whether your target keywords appear with appropriate density. It also identifies unintentional keyword stuffing. Bigram and trigram analysis shows whether important phrases like 'best mortgage calculator' appear naturally in your content.

How do writers use word frequency analysis?
Writers use frequency analysis to spot overused words, diversify vocabulary and identify repetitive patterns. Comparing frequency tables across drafts tracks improvement in vocabulary variety and writing quality.

Is my text sent to a server?
No. All analysis runs locally in your browser. No text is transmitted to any server. You can verify this by disconnecting from the internet before analysing.

Does the tool work with languages other than English?
The tool counts word frequencies in any language that uses space-separated words. However, the stop word list is English-only, so disable stop word filtering when analysing non-English text.

