Levenshtein Distance (Edit Distance)

Calculator to compute the minimal edit operations with formulas and examples

Levenshtein Distance Calculator

What is calculated?

The Levenshtein distance (also called edit distance) is the minimal number of operations (insert, delete, substitute) required to transform one string into another.

Input strings

Source string (can be empty)

Target string (can be empty)
Result
Levenshtein distance:
Minimal number of edit operations

Levenshtein Info

Operations

Allowed operations:

  • Insert a character
  • Delete a character
  • Substitute a character

Applications: Spell checking, DNA sequence analysis, plagiarism detection and version control.

Special cases
Identical strings:
LD("ABC", "ABC") = 0
Empty string:
LD("", "ABC") = 3 (3× insertions)
Completely different:
LD("CAT", "DOG") = 3
Related distances

→ Hamming distance
Damerau-Levenshtein: + transposition
Jaro-Winkler: For similar strings


Formulas for Levenshtein distance

Recursive definition
\[LD(i,j) = \begin{cases} j & \text{if } i = 0 \\ i & \text{if } j = 0 \\ LD(i-1,j-1) & \text{if } s_i = t_j \\ 1 + \min \begin{cases} LD(i-1,j) \\ LD(i,j-1) \\ LD(i-1,j-1) \end{cases} & \text{otherwise} \end{cases}\]
Matrix form
\[D[i,j] = \min \begin{cases} D[i-1,j] + 1 & \text{(deletion)} \\ D[i,j-1] + 1 & \text{(insertion)} \\ D[i-1,j-1] + c & \text{(substitution)} \end{cases}\] \[c = \begin{cases} 0 & \text{if } s_i = t_j \\ 1 & \text{otherwise} \end{cases}\]
Initialization
\[D[i,0] = i \text{ for all } i\] \[D[0,j] = j \text{ for all } j\] Base: empty strings
Complexity
\[\text{Time: } O(m \cdot n)\] \[\text{Space: } O(m \cdot n)\] m, n = string lengths
Normalized form
\[LD_n(s,t) = \frac{LD(s,t)}{\max(|s|,|t|)}\] Values between 0 and 1
Similarity
\[\text{Sim}(s,t) = 1 - LD_n(s,t)\] Similarity measure (0-1)

Detailed calculation example

Example: Levenshtein("Das Tier im Zoo", "Das Tor am Zoo")

Given:

  • String 1: "Das Tier im Zoo"
  • String 2: "Das Tor am Zoo"

Required operations:

  1. Substitute io in "Tier"
  2. Delete e from "Tier"
  3. Substitute ia in "im"

Transformation:

"Das Tier im Zoo""Das Tor im Zoo""Das Tor m Zoo""Das Tor am Zoo"

Result: Levenshtein distance = 3

Dynamic programming matrix

Example: computation for "KITTEN" → "SITTING"

DP matrix:

"" S I T T I N G
""01234567
K11234567
I22123456
T33212345
T44321234
E55432234
N66543323

Operations:

  1. S for K
  2. I for I (match)
  3. T for T (match)
  4. T for T (match)
  5. I for E
  6. N for N (match)
  7. Insert G

Distance: 3

Practical applications

Text processing
  • Spell checking
  • Autocorrect
  • Search suggestions
  • Plagiarism detection
Bioinformatics
  • DNA sequence alignment
  • Protein comparisons
  • Genetic distances
  • Evolutionary analysis
Software development
  • Version control (git diff)
  • Code similarity
  • Merge conflicts
  • Refactoring tools

Algorithm & complexity

Dynamic programming vs. recursion

Naive recursion:

  • Time complexity: O(3^min(m,n))
  • Space complexity: O(min(m,n))
  • Problem: Exponential growth
  • Use: Only for very short strings

Dynamic programming:

  • Time complexity: O(m×n)
  • Space complexity: O(m×n)
  • Advantage: Practical for long strings
  • Optimization: O(min(m,n)) space possible
Optimizations:

Space optimization: Store only two rows of the matrix

Early termination: Abort early for large distances

Real-world examples

Search suggestions

Scenario: User types "Pariz" instead of "Paris"

Candidates:
• Paris: LD = 1 (z→s)
• Berlin: LD = 5
• London: LD = 6
→ Suggestion: Paris
DNA mutation

Scenario: Comparing gene sequences

Original: ATCGATCG
Mutated: ATGGATCG
LD = 1: C→G at position 3
→ Point mutation detected

Variants and extensions

Related algorithms

Damerau-Levenshtein:

+ transposition (swap)
"ab" → "ba" costs 1 instead of 2

Weighted edit distance:

Different costs for different operations
Takes keyboard layout into account

Longest Common Subsequence:

Only insertions/deletions
LD = m + n - 2×LCS