Canberra Distance

Calculator to compute the weighted Canberra distance with formulas and examples

Canberra Distance Calculator

What is calculated?

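The Canberra distance is a weighted version of the Manhattan distance. It divides the absolute difference in each component by the sum of the absolute values of that component, so every term lies in [0, 1] and a single large component cannot dominate the result.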


Canberra Info

Properties

Canberra distance:

  • Range: [0, n] (n = number of dimensions)
  • Weighted Manhattan distance
  • Per-component normalization
  • Robust to outliers

Advantage: Less sensitive to large values than Euclidean distance because each component is normalized individually.

Special cases
Zero components:
If |xi| + |yi| = 0 (both values are zero) the component is ignored
Large differences:
Are dampened by normalization
Small values:
Receive higher weight
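
The special cases above can be illustrated with a minimal Python sketch. The helper name canberra_term is hypothetical, and returning 0 when both values are zero is the convention assumed here.

```python
def canberra_term(xi: float, yi: float) -> float:
    """One Canberra component; returns 0.0 when both values are zero (assumed convention)."""
    denom = abs(xi) + abs(yi)
    if denom == 0:
        return 0.0  # zero component: contributes nothing to the sum
    return abs(xi - yi) / denom

# Zero component: both entries are 0, so the term is dropped.
print(canberra_term(0, 0))       # 0.0
# Large difference: the term stays below 1 however large the gap is.
print(canberra_term(1, 1000))    # about 0.998
# Small values: the same absolute gap of 1 weighs more near zero.
print(canberra_term(1, 2))       # about 0.333
print(canberra_term(100, 101))   # about 0.005
```

Note that values of opposite sign, such as 1 and -1, are not a zero component: the denominator is 2 and the term equals 1.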

Formulas for Canberra distance

Basic formula
\[d_C(x,y) = \sum_{i=1}^n \frac{|x_i - y_i|}{|x_i| + |y_i|}\] Standard Canberra distance
Weighted form
\[d_C(x,y) = \sum_{i=1}^n w_i \frac{|x_i - y_i|}{|x_i| + |y_i|}\] With weights wi
Normalized form
\[d_C(x,y) = \frac{1}{n} \sum_{i=1}^n \frac{|x_i - y_i|}{|x_i| + |y_i|}\] Average distance
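Zero convention (xi = yi = 0)
\[\frac{|x_i - y_i|}{|x_i| + |y_i|} := 0 \quad \text{if } x_i = y_i = 0\] Set by convention; the limit for xi, yi → 0 does not exist, because the ratio depends on the direction of approach (for example, it equals 1 whenever yi = 0 and xi ≠ 0)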
Symmetry
\[d_C(x,y) = d_C(y,x)\] Symmetric property
Range
\[0 \leq d_C(x,y) \leq n\] n = number of dimensions
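
The basic, weighted and normalized forms above might be combined into one function, as in the following sketch. The function name canberra and the parameters weights and normalized are illustrative assumptions, not part of the calculator; components with |xi| + |yi| = 0 are skipped.

```python
from typing import Optional, Sequence

def canberra(x: Sequence[float], y: Sequence[float],
             weights: Optional[Sequence[float]] = None,
             normalized: bool = False) -> float:
    """Canberra distance with optional weights w_i and optional division by n."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if weights is None:
        weights = [1.0] * n          # basic formula: all weights equal 1
    elif len(weights) != n:
        raise ValueError("weights must have the same length as x")
    total = 0.0
    for xi, yi, wi in zip(x, y, weights):
        denom = abs(xi) + abs(yi)
        if denom == 0:
            continue                 # both values zero: component is ignored
        total += wi * abs(xi - yi) / denom
    if normalized:
        return total / n if n else 0.0   # average over all n dimensions
    return total

print(canberra([3, 4, 5], [2, 3, 6]))                   # basic form
print(canberra([3, 4, 5], [2, 3, 6], normalized=True))  # normalized form
print(canberra([3, 4, 5], [2, 3, 6], weights=[2, 1, 1]))  # weighted form
```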

Detailed calculation example

Example: compute Canberra([3,4,5], [2,3,6])

Given:

  • x = [3, 4, 5]
  • y = [2, 3, 6]

Step 1 - Component 1:

\[\frac{|3-2|}{|3|+|2|} = \frac{1}{5} = 0.2\]

Step 2 - Component 2:

\[\frac{|4-3|}{|4|+|3|} = \frac{1}{7} \approx 0.143\]

Step 3 - Component 3:

\[\frac{|5-6|}{|5|+|6|} = \frac{1}{11} \approx 0.091\]

Step 4 - Total sum:

\[d_C = \frac{1}{5} + \frac{1}{7} + \frac{1}{11} \approx 0.2 + 0.143 + 0.091 = 0.434\]

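Interpretation: Each difference is divided by the sum of the absolute values of its own component. All three absolute differences equal 1, yet they contribute 0.2, 0.143 and 0.091 because the component magnitudes grow.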
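
The four steps can be reproduced with exact fractions, as a minimal sketch. It assumes that no component is zero in both vectors, since Fraction raises an error for a zero denominator.

```python
from fractions import Fraction

x = [3, 4, 5]
y = [2, 3, 6]

# One exact term |xi - yi| / (|xi| + |yi|) per component.
terms = [Fraction(abs(a - b), abs(a) + abs(b)) for a, b in zip(x, y)]
for i, t in enumerate(terms, start=1):
    print(f"component {i}: {t} = {float(t):.3f}")

total = sum(terms)
print(f"total: {total} = {float(total):.3f}")  # total: 167/385 = 0.434
```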

Robustness to outliers

Example: comparison with and without outlier

Normal values:

x = [1, 2, 3], y = [1, 3, 2]

\[d_C = \frac{0}{2} + \frac{1}{5} + \frac{1}{5} = 0.4\]

With outlier:

x = [1, 2, 100], y = [1, 3, 2]

\[d_C = \frac{0}{2} + \frac{1}{5} + \frac{98}{102} \approx 1.16\]

Euclidean distance normal:

\[\sqrt{0^2 + 1^2 + 1^2} = \sqrt{2} \approx 1.41\]

Euclidean distance with outlier:

\[\sqrt{0^2 + 1^2 + 98^2} = \sqrt{9605} \approx 98.01\]

Conclusion: The Canberra distance is less influenced by the outlier: it grows by a factor of about 2.9 (1.16 / 0.4), whereas the Euclidean distance grows by a factor of about 69.3 (√9605 / √2).
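
The growth factors of both distances can be recomputed with a short sketch. The helper names canberra and euclidean are illustrative, not part of the calculator.

```python
import math

def canberra(x, y):
    """Canberra distance; components with |xi| + |yi| = 0 are skipped."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) != 0)

def euclidean(x, y):
    """Ordinary Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

y = [1, 3, 2]
x_normal = [1, 2, 3]
x_outlier = [1, 2, 100]

# Ratio of the distance with the outlier to the distance without it.
canberra_factor = canberra(x_outlier, y) / canberra(x_normal, y)
euclid_factor = euclidean(x_outlier, y) / euclidean(x_normal, y)

print(f"Canberra factor:  {canberra_factor:.1f}")
print(f"Euclidean factor: {euclid_factor:.1f}")
```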

Practical applications

Data Mining
  • Dataset similarity
  • Clustering with outliers
  • Anomaly detection
  • Dimensionality reduction
Information retrieval
  • Document similarity
  • Text analysis
  • Search engine ranking
  • Recommendation systems
Time series analysis
  • Comparing time series
  • Pattern recognition
  • Trend analysis
  • Financial market analysis

Mathematical properties

Metric properties
  • Non-negativity: d_C(x,y) ≥ 0
  • Symmetry: d_C(x,y) = d_C(y,x)
  • Identity: d_C(x,x) = 0
  • Triangle inequality: Satisfied, d_C(x,z) ≤ d_C(x,y) + d_C(y,z), so the Canberra distance is a metric
Special properties
  • Weighting: Per-component normalization
  • Robustness: Less sensitive to outliers
  • Scaling: Components are individually scaled
  • Range: [0, n] for n dimensions
Important notes

Division by zero: If |xi| + |yi| = 0 the component is ignored or treated as 0

Interpretability: Each component contributes at most 1 to the total distance
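
Non-negativity, symmetry, identity and the range [0, n] can be spot-checked on random vectors. This is a sketch, not a proof; the function names and the sampling range [-10, 10] are assumptions, and components with |xi| + |yi| = 0 are treated as 0.

```python
import random

def canberra(x, y):
    """Canberra distance; components with |xi| + |yi| = 0 contribute 0."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom != 0:
            total += abs(a - b) / denom
    return total

def check_properties(trials=1000, n=5, seed=0):
    """Spot-check the listed properties on random pairs; returns the number of pairs tested."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(n)]
        y = [rng.uniform(-10, 10) for _ in range(n)]
        d = canberra(x, y)
        assert 0.0 <= d <= n + 1e-12              # range: each term is at most 1
        assert abs(d - canberra(y, x)) < 1e-12    # symmetry
        assert canberra(x, x) == 0.0              # identity
    return trials

print(check_properties(), "random pairs passed")
print(canberra([0, 0], [0, 0]))  # all-zero vectors: no division by zero, result 0.0
```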

Comparison with other distance measures

For vectors x = [1, 2, 10] and y = [2, 1, 1]:

  • Canberra: 1.485 (per-component normalization)
  • Euclidean: 9.110 (dominated by the large component)
  • Manhattan: 11.000 (sum of absolute differences)
  • Bray-Curtis: 0.647 (global normalization)

Observation: Canberra distance dampens the influence of the large component (10 vs 1) through individual normalization.
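
The four measures can be recomputed for this pair of vectors with a short sketch. The helper names are illustrative, and the Bray-Curtis form used here, sum |xi - yi| / sum |xi + yi|, is an assumed definition that is appropriate for non-negative data.

```python
import math

def canberra(x, y):
    """Per-component normalization; components with |xi| + |yi| = 0 are skipped."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) != 0)

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def bray_curtis(x, y):
    """Global normalization: one shared denominator for all components."""
    return manhattan(x, y) / sum(abs(a + b) for a, b in zip(x, y))

x = [1, 2, 10]
y = [2, 1, 1]

results = {
    "Canberra": canberra(x, y),
    "Euclidean": euclidean(x, y),
    "Manhattan": manhattan(x, y),
    "Bray-Curtis": bray_curtis(x, y),
}
for name, value in results.items():
    print(f"{name:12s} {value:.3f}")
```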