Canberra Distance
Calculator to compute the weighted Canberra distance with formulas and examples
Canberra Distance Calculator
What is calculated?
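The Canberra distance is a weighted version of the Manhattan distance. It divides each component's absolute difference |xi - yi| by the sum of absolute values |xi| + |yi|, so no single component can contribute more than 1. This limits the influence of outliers.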
Canberra Info
Properties
Canberra distance:
- Range: [0, n] (n = number of dimensions)
- Weighted Manhattan distance
- Per-component normalization
- Robust to outliers
Advantage: Less sensitive to large values than Euclidean distance because each component is normalized individually.
Special cases
If |xi| + |yi| = 0, the component is ignored
Differences between large values are dampened by normalization
Differences between small values receive higher weight
Related distances
→ Manhattan distance
→ Bray-Curtis distance
→ Minkowski distance
Formulas for Canberra distance
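Basic formula
$$d_C(x, y) = \sum_{i=1}^{n} \frac{|x_i - y_i|}{|x_i| + |y_i|}$$
Weighted form
$$d_C(x, y) = \sum_{i=1}^{n} w_i \, |x_i - y_i|, \qquad w_i = \frac{1}{|x_i| + |y_i|}$$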
Normalized form
Limit case (xi + yi ≠ 0)
Symmetry
$$d_C(x, y) = d_C(y, x)$$
Range
$$0 \le d_C(x, y) \le n$$
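The standard definition sums |xi - yi| / (|xi| + |yi|) over all components. A minimal Python sketch of that definition follows. The function name `canberra` and the choice to skip components whose denominator is zero are illustrative assumptions, not the calculator's actual code.

```python
from typing import Sequence


def canberra(x: Sequence[float], y: Sequence[float]) -> float:
    """Canberra distance: sum over i of |x_i - y_i| / (|x_i| + |y_i|)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0.0
    for xi, yi in zip(x, y):
        denom = abs(xi) + abs(yi)
        if denom == 0:
            # Both entries are 0: skip the component (assumed 0/0 = 0 convention).
            continue
        total += abs(xi - yi) / denom
    return total


print(canberra([3, 4, 5], [2, 3, 6]))  # about 0.4338
```

Skipping the zero-denominator components keeps the result inside the range [0, n] and avoids a division error for sparse vectors.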
Detailed calculation example
Example: compute Canberra([3,4,5], [2,3,6])
Given:
- x = [3, 4, 5]
- y = [2, 3, 6]
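Step 1 - Component 1: |3 - 2| / (|3| + |2|) = 1/5 = 0.2
Step 2 - Component 2: |4 - 3| / (|4| + |3|) = 1/7 ≈ 0.1429
Step 3 - Component 3: |5 - 6| / (|5| + |6|) = 1/11 ≈ 0.0909
Step 4 - Total sum: 0.2 + 0.1429 + 0.0909 ≈ 0.4338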
Interpretation: Each component is weighted individually based on the sum of absolute values.
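The four steps of this example can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic by hand; the variable names `terms` and `total` are chosen here for illustration.

```python
x = [3, 4, 5]
y = [2, 3, 6]

# One term per component: |x_i - y_i| / (|x_i| + |y_i|)
terms = [abs(a - b) / (abs(a) + abs(b)) for a, b in zip(x, y)]
total = sum(terms)

for i, t in enumerate(terms, start=1):
    print(f"component {i}: {t:.4f}")
print(f"total: {total:.4f}")
```

The three terms are 1/5, 1/7 and 1/11, which add up to roughly 0.4338.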
Robustness to outliers
Example: comparison with and without outlier
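Normal values:
x = [1, 2, 3], y = [1, 3, 2]
Canberra distance normal: 0 + 1/5 + 1/5 = 0.4
With outlier:
x = [1, 2, 100], y = [1, 3, 2]
Canberra distance with outlier: 0 + 1/5 + 98/102 ≈ 1.161
Euclidean distance normal: √(0 + 1 + 1) = √2 ≈ 1.414
Euclidean distance with outlier: √(0 + 1 + 98²) = √9605 ≈ 98.005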
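Conclusion: The outlier raises the Canberra distance from 0.4 to about 1.161 (factor 2.9) but raises the Euclidean distance from about 1.414 to about 98.005 (factor 69.3), so the Canberra distance is far less influenced by it.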
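The growth factors can be checked with a short script. The helpers `canberra` and `euclidean` below are minimal illustrative implementations written for this comparison, not library functions.

```python
import math


def canberra(x, y):
    # Components with |a| + |b| == 0 are skipped (assumed convention).
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) != 0)


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


y = [1, 3, 2]
x_normal = [1, 2, 3]
x_outlier = [1, 2, 100]

# How much each distance grows when the third component becomes an outlier
canberra_factor = canberra(x_outlier, y) / canberra(x_normal, y)
euclidean_factor = euclidean(x_outlier, y) / euclidean(x_normal, y)

print(f"Canberra grows by a factor of {canberra_factor:.1f}")
print(f"Euclidean grows by a factor of {euclidean_factor:.1f}")
```

With these inputs the Canberra factor is roughly 2.9 and the Euclidean factor roughly 69.3, because the outlier component can add at most 1 to the Canberra sum.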
Practical applications
Data Mining
- Dataset similarity
- Clustering with outliers
- Anomaly detection
- Dimensionality reduction
Information retrieval
- Document similarity
- Text analysis
- Search engine ranking
- Recommendation systems
Time series analysis
- Comparing time series
- Pattern recognition
- Trend analysis
- Financial market analysis
Mathematical properties
Metric properties
- Non-negativity: d_C(x,y) ≥ 0
- Symmetry: d_C(x,y) = d_C(y,x)
- Identity: d_C(x,x) = 0
- Triangle inequality: Satisfied, d_C(x,z) ≤ d_C(x,y) + d_C(y,z), with 0/0 components treated as 0 (the Canberra distance is a true metric)
Special properties
- Weighting: Per-component normalization
- Robustness: Less sensitive to outliers
- Scaling: Components are individually scaled
- Range: [0, n] for n dimensions
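Non-negativity, symmetry, identity and the range [0, n] can be spot-checked numerically. The sketch below uses random vectors; the function `check_properties`, its parameters and the sampling interval [-10, 10] are assumptions made for this illustration, and a random test is evidence rather than a proof.

```python
import random


def canberra(x, y):
    # Components with |a| + |b| == 0 are skipped (assumed convention).
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) != 0)


def check_properties(n_dims=5, trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(n_dims)]
        y = [rng.uniform(-10, 10) for _ in range(n_dims)]
        d = canberra(x, y)
        assert 0.0 <= d <= n_dims + 1e-9        # non-negativity and range [0, n]
        assert abs(d - canberra(y, x)) < 1e-12  # symmetry
        assert canberra(x, x) == 0.0            # identity
    return True


print(check_properties())
```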
Important notes
Division by zero: If |xi| + |yi| = 0 the component is ignored or treated as 0
Interpretability: Each component contributes at most 1 to the total distance
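Both notes can be made concrete by looking at the individual terms. The helper `canberra_terms` below is a hypothetical function written for this illustration; it reports a 0/0 component as 0.0, which is one of the two conventions mentioned above.

```python
def canberra_terms(x, y):
    """Per-component contributions; a 0/0 component is reported as 0.0."""
    terms = []
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        terms.append(abs(a - b) / denom if denom != 0 else 0.0)
    return terms


# Components: both zero, zero vs. non-zero, opposite signs, nearly equal values
example_terms = canberra_terms([0, 0, 5, 100], [0, 3, -5, 101])
print(example_terms)
```

A component reaches the maximum of 1 whenever exactly one of the two values is zero or the values have opposite signs, while nearly equal large values contribute close to 0.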
Comparison with other distance measures
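For the vectors x = [1, 2, 10] and y = [2, 1, 1]:
Canberra: 1/3 + 1/3 + 9/11 ≈ 1.485 (weighted normalization)
Euclidean: √(1 + 1 + 81) = √83 ≈ 9.110 (highly affected by outlier)
Manhattan: 1 + 1 + 9 = 11 (sum of absolute differences)
Bray-Curtis: 11 / 17 ≈ 0.647 (global normalization)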
Observation: Canberra distance dampens the influence of the large component (10 vs 1) through individual normalization.
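The four measures can be computed side by side for x = [1, 2, 10] and y = [2, 1, 1]. This is a plain-Python sketch; the Bray-Curtis line assumes the usual form for non-negative data, the sum of absolute differences divided by the sum of all values.

```python
import math

x = [1, 2, 10]
y = [2, 1, 1]

abs_diffs = [abs(a - b) for a, b in zip(x, y)]

manhattan = sum(abs_diffs)
euclidean = math.sqrt(sum(d * d for d in abs_diffs))
# Per-component normalization: each difference is divided by |a| + |b|
canberra = sum(d / (abs(a) + abs(b)) for d, a, b in zip(abs_diffs, x, y))
# Global normalization: one denominator for the whole vector pair
bray_curtis = sum(abs_diffs) / sum(a + b for a, b in zip(x, y))

for name, value in [("Canberra", canberra), ("Euclidean", euclidean),
                    ("Manhattan", manhattan), ("Bray-Curtis", bray_curtis)]:
    print(f"{name:12s} {value:.4f}")
```

The third component accounts for 9 of the 11 units of Manhattan distance, but for only 9/11 of a maximum of 1 in the Canberra sum.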