Levenshtein Distance (Edit Distance)
Calculator to compute the minimal edit operations with formulas and examples
Levenshtein Distance Calculator
What is calculated?
The Levenshtein distance (also called edit distance) is the minimal number of operations (insert, delete, substitute) required to transform one string into another.
Levenshtein Info
Operations
Allowed operations:
- Insert a character
- Delete a character
- Substitute a character
Applications: Spell checking, DNA sequence analysis, plagiarism detection and version control.
Special cases
LD("ABC", "ABC") = 0
LD("", "ABC") = 3 (3× insertions)
LD("CAT", "DOG") = 3
Related distances
→ Hamming distance
Damerau-Levenshtein: + transposition
Jaro-Winkler: For similar strings
|
Formulas for Levenshtein distance
Recursive definition
Matrix form
Initialization
Complexity
Normalized form
Similarity
Detailed calculation example
Example: Levenshtein("Das Tier im Zoo", "Das Tor am Zoo")
Given:
- String 1: "Das Tier im Zoo"
- String 2: "Das Tor am Zoo"
Required operations:
- Substitute i → o in "Tier"
- Delete e from "Tier"
- Substitute i → a in "im"
Transformation:
Result: Levenshtein distance = 3
Dynamic programming matrix
Example: computation for "KITTEN" → "SITTING"
DP matrix:
"" | S | I | T | T | I | N | G | |
---|---|---|---|---|---|---|---|---|
"" | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
K | 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
I | 2 | 2 | 1 | 2 | 3 | 4 | 5 | 6 |
T | 3 | 3 | 2 | 1 | 2 | 3 | 4 | 5 |
T | 4 | 4 | 3 | 2 | 1 | 2 | 3 | 4 |
E | 5 | 5 | 4 | 3 | 2 | 2 | 3 | 4 |
N | 6 | 6 | 5 | 4 | 3 | 3 | 2 | 3 |
Operations:
- S for K
- I for I (match)
- T for T (match)
- T for T (match)
- I for E
- N for N (match)
- Insert G
Distance: 3
Practical applications
Text processing
- Spell checking
- Autocorrect
- Search suggestions
- Plagiarism detection
Bioinformatics
- DNA sequence alignment
- Protein comparisons
- Genetic distances
- Evolutionary analysis
Software development
- Version control (git diff)
- Code similarity
- Merge conflicts
- Refactoring tools
Algorithm & complexity
Dynamic programming vs. recursion
Naive recursion:
- Time complexity: O(3^min(m,n))
- Space complexity: O(min(m,n))
- Problem: Exponential growth
- Use: Only for very short strings
Dynamic programming:
- Time complexity: O(m×n)
- Space complexity: O(m×n)
- Advantage: Practical for long strings
- Optimization: O(min(m,n)) space possible
Optimizations:
Space optimization: Store only two rows of the matrix
Early termination: Abort early for large distances
Real-world examples
Search suggestions
Scenario: User types "Pariz" instead of "Paris"
• Paris: LD = 1 (z→s)
• Berlin: LD = 5
• London: LD = 6
→ Suggestion: Paris
DNA mutation
Scenario: Comparing gene sequences
Mutated: ATGGATCG
LD = 1: C→G at position 3
→ Point mutation detected
Variants and extensions
Related algorithms
Damerau-Levenshtein:
+ transposition (swap)
"ab" → "ba" costs 1 instead of 2
Weighted edit distance:
Different costs for different operations
Takes keyboard layout into account
Longest Common Subsequence:
Only insertions/deletions
LD = m + n - 2×LCS