String Similarity

Quick reference (six algorithms)

  • Levenshtein distance. Example: 3. Range: 0 to max(|A|, |B|). Best for: typo detection at the character level.
  • Damerau–Levenshtein distance. Example: 1. Range: 0 to max(|A|, |B|). Best for: typo detection including adjacent-character transpositions.
  • Sørensen–Dice coefficient. Example: ~0.3. Range: 0 to 1. Best for: fuzzy duplicate detection and near-duplicate identification.
  • Soundex code. Example: S530 / S530. Format: 4-character LDDD code. Best for: American English surname matching in legacy systems.
  • Metaphone code. Example: 0MPSN / TMSN. Format: variable-length letter code. Best for: general English phonetic matching; sharper than Soundex.
  • Cologne phonetics (Kölner Phonetik). Example: 657 / 657. Format: variable-length numeric code. Best for: German name and address matching; handles umlauts and sharp-s (ß).
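
To make the 4-character LDDD format listed for Soundex concrete, here is a minimal Python sketch of the commonly described American Soundex rules. It is not the tool's own implementation: the function name `soundex` and the choice to ignore every non-ASCII letter are assumptions made for this example.

```python
# Digit classes used by American Soundex. Vowels, Y, H and W have no digit.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return a 4-character LDDD code, or "" if the input has no A-Z letters."""
    # Assumption for this sketch: only ASCII letters are considered.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                      # L: the first letter is kept as-is
    prev = _CODES.get(letters[0], "")      # digit class of the previous letter
    for ch in letters[1:]:
        if ch in "HW":
            continue                       # H and W do not separate equal digits
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit                  # new consonant class: emit its digit
        prev = digit                       # a vowel resets prev to ""
    return (code + "000")[:4]              # pad with zeros, truncate to LDDD


print(soundex("Smith"), soundex("Smyth"))  # S530 S530
```

Two spellings count as a Soundex match when their codes are equal, as with the `S530 / S530` example above.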

About String Similarity

String Similarity compares two strings using six classical algorithms and tells you when they disagree. Use it for typo detection, name matching, duplicate detection, or any time you need to ask "how close are these two strings — by letters, by edits, or by sound?"

The six algorithms

  • Levenshtein distance: minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one string into the other.
  • Damerau–Levenshtein distance: Levenshtein plus adjacent-character transpositions, each counted as a single edit (good for "teh" vs "the").
  • Sørensen–Dice coefficient: character-bigram overlap in [0, 1] (the workhorse of fuzzy duplicate detection).
  • Soundex: classic American English phonetic code in LDDD form (one letter followed by three digits), used to index US census records.
  • Metaphone: sharper English phonetic code, designed by Lawrence Philips in 1990.
  • Cologne phonetics: German phonetic code (Kölner Phonetik), widely used for German name matching.
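
The two edit distances in the list above can be sketched with the standard dynamic-programming recurrences. This is an illustrative Python sketch, not the tool's code. The function names are assumptions, and `osa_distance` implements the restricted "optimal string alignment" variant of Damerau–Levenshtein, which never edits the same substring twice; whether the tool uses this variant or the unrestricted one is not stated here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))         # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + cost,        # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Levenshtein plus adjacent transpositions (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Swap of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(levenshtein("teh", "the"))   # 2
print(osa_distance("teh", "the"))  # 1
```

The "teh" vs "the" pair shows where the two distances disagree: plain Levenshtein needs two substitutions, while the transposition-aware variant needs a single swap.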

When the algorithms disagree, the tool tells you

Each result row is paired with a plain-language explanation, a "best for" hint, and a longer description. The insight panel surfaces interesting cases: "Sounds the same, spelled differently" (use a phonetic algorithm), "Close at both levels" (likely a true near-duplicate), "Phonetic match despite spelling differences" (a phonetic algorithm is the right choice), and "Phonetic codes may be uninformative for this input" (when the input is non-Latin).
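
How the insight panel picks a message is not documented here, but the idea of comparing a spelling-level signal with a sound-level signal can be sketched as follows. Everything about the decision rule is hypothetical: the function name `pick_insight`, the `close_ratio` threshold of 0.25, and the order of the checks are assumptions for illustration. Only the message strings come from the description above.

```python
from typing import Optional


def pick_insight(
    a: str,
    b: str,
    edit_distance: int,
    code_a: str,
    code_b: str,
    close_ratio: float = 0.25,   # hypothetical threshold, not the tool's value
) -> Optional[str]:
    """Choose an insight message from precomputed edit distance and phonetic codes."""
    # An empty phonetic code suggests the encoder found nothing to encode,
    # for example because the input is not in Latin script.
    if not code_a or not code_b:
        return "Phonetic codes may be uninformative for this input"

    longest = max(len(a), len(b))
    spelling_close = longest > 0 and edit_distance / longest <= close_ratio
    sounds_same = code_a == code_b

    if sounds_same and spelling_close:
        return "Close at both levels"
    if sounds_same:
        return "Sounds the same, spelled differently"
    return None                  # no noteworthy disagreement in this sketch


print(pick_insight("Smith", "Smyth", 1, "S530", "S530"))  # Close at both levels
```

The caller supplies the edit distance and the two phonetic codes, so the sketch stays independent of which distance or phonetic algorithm produced them.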

Common use cases

  • Detecting typos and misspellings in user input.
  • Matching customer records by name (especially with the Soundex / Metaphone / Cologne phonetic codes).
  • Identifying near-duplicate entries in a database.
  • Choosing the right similarity algorithm for a downstream task.
  • Verifying that two spellings of a German or English name map to the same phonetic code.
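
For the near-duplicate use case, a Sørensen–Dice comparison over character bigrams can be sketched in a few lines of Python. This is a rough illustration rather than the tool's implementation: the names `dice` and `near_duplicates`, the multiset (counted) treatment of repeated bigrams, the case-sensitive comparison, and the 0.8 threshold are all assumptions.

```python
from collections import Counter
from itertools import combinations


def bigrams(s: str) -> Counter:
    """Multiset of adjacent character pairs in s."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice(a: str, b: str) -> float:
    """Sørensen–Dice coefficient: 2 * |overlap| / (|bigrams(a)| + |bigrams(b)|)."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        # Both strings are shorter than two characters: fall back to equality.
        return 1.0 if a == b else 0.0
    overlap = sum((ba & bb).values())      # multiset intersection
    return 2 * overlap / total


def near_duplicates(records, threshold=0.8):
    """Return (a, b, score) for every pair scoring at or above the threshold."""
    pairs = []
    for a, b in combinations(records, 2):
        score = dice(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


print(dice("night", "nacht"))  # 0.25
```

Comparing every pair is quadratic in the number of records, so a real deduplication job over a large table would normally group candidates first and score only within each group.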

All six algorithms run in the browser. No string, result, or preset description ever leaves your machine.
