Cosine Similarity

Calculator to compute cosine similarity with formulas and examples

Cosine Similarity Calculator

What is calculated?

Cosine similarity measures how similar two vectors are by the cosine of the angle between them. Values near 1 indicate nearly identical direction, values near 0 indicate near-orthogonality, and values near -1 indicate nearly opposite direction.

Input vectors

Vector X: Values separated by spaces

Vector Y: Same number of values as Vector X

Result
Cosine similarity: Based on the angle between vectors (direction-based)

Cosine Info

Properties

Cosine similarity:

  • Range: [-1, 1]
  • 1 = identical direction
  • 0 = orthogonal vectors
  • -1 = opposite direction

Direction-based: Vector magnitude is ignored; only direction matters.

Special cases
  • Parallel vectors: cos(0°) = 1 (maximum similarity)
  • Orthogonal vectors: cos(90°) = 0 (no similarity)
  • Antiparallel vectors: cos(180°) = -1 (opposite)
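
The three special cases can be checked numerically. The following is a minimal sketch in plain Python; the helper name `cosine_similarity` and the example vectors are assumptions chosen for illustration, not part of the calculator itself.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (both must be nonzero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

base = [2.0, 1.0]
parallel = [4.0, 2.0]        # same direction, twice the length
orthogonal = [-1.0, 2.0]     # dot product with base is 0
antiparallel = [-2.0, -1.0]  # exactly opposite direction

for name, v in [("parallel", parallel),
                ("orthogonal", orthogonal),
                ("antiparallel", antiparallel)]:
    # Round to hide floating-point noise in the last digits.
    print(name, round(cosine_similarity(base, v), 6))
```

Up to floating-point rounding, the three results are 1, 0 and -1, matching the cases listed above.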

Formulas for cosine similarity

Similarity formula
\[\text{sim}(x,y) = \frac{x \cdot y}{||x||_2 \cdot ||y||_2}\] Cosine of the angle
Distance formula
\[d_{\cos}(x,y) = 1 - \frac{x \cdot y}{||x||_2 \cdot ||y||_2}\] Cosine distance, with range [0, 2]
Dot product
\[x \cdot y = \sum_{i=1}^n x_i y_i\] Dot product
Euclidean norm
\[||x||_2 = \sqrt{\sum_{i=1}^n x_i^2}\] L₂-norm (magnitude)
Angle relation
\[\cos(\theta) = \frac{x \cdot y}{||x||_2 \cdot ||y||_2}\] Angle θ between vectors
Normalized vectors
\[\cos(\theta) = \hat{x} \cdot \hat{y}\] For unit vectors \(\hat{x} = x / ||x||_2\) and \(\hat{y} = y / ||y||_2\)
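
The formulas above translate almost line by line into code. This is a minimal sketch using only the Python standard library; the function names (`dot`, `l2_norm`, `cosine_similarity`, `cosine_distance`) are illustrative assumptions, and raising `ValueError` for a zero vector is one possible design choice among several.

```python
import math

def dot(x, y):
    """Dot product: sum of element-wise products."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of values")
    return sum(a * b for a, b in zip(x, y))

def l2_norm(x):
    """Euclidean (L2) norm: square root of the sum of squares."""
    return math.sqrt(sum(a * a for a in x))

def cosine_similarity(x, y):
    """Dot product divided by the product of the two L2 norms."""
    denom = l2_norm(x) * l2_norm(y)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(x, y) / denom

def cosine_distance(x, y):
    """Cosine distance: 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(x, y)

# Two vectors pointing the same way: similarity close to 1, distance close to 0.
print(cosine_similarity([1, 2], [2, 4]))
print(cosine_distance([1, 2], [2, 4]))
```

Because of floating-point rounding, results for exactly parallel vectors may differ from 1 in the last digit, so comparisons should use a small tolerance rather than exact equality.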

Detailed calculation example

Example: compute Cosine([3,5], [0,3])

Given:

  • x = [3, 5]
  • y = [0, 3]

Step 1 - Dot product:

\[x \cdot y = 3 \cdot 0 + 5 \cdot 3 = 15\]

Step 2 - Norms:

\[||x||_2 = \sqrt{3^2 + 5^2} = \sqrt{34}\] \[||y||_2 = \sqrt{0^2 + 3^2} = 3\]

Step 3 - Similarity:

\[\text{sim} = \frac{15}{\sqrt{34} \cdot 3} = \frac{15}{3\sqrt{34}} = \frac{5}{\sqrt{34}}\]

Step 4 - Cosine distance:

\[d_{\cos} = 1 - \frac{5}{\sqrt{34}} \approx 1 - 0.8575 = 0.1425\]

Interpretation: The angle between the vectors is about 31° (θ = arccos(5/√34)), so they point in broadly similar directions and the cosine distance is small.
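
The four steps can be reproduced in a few lines of Python. This sketch uses only the standard library; the variable names are illustrative assumptions.

```python
import math

x = [3, 5]
y = [0, 3]

# Step 1: dot product
dot_xy = sum(a * b for a, b in zip(x, y))

# Step 2: Euclidean norms
norm_x = math.sqrt(sum(a * a for a in x))
norm_y = math.sqrt(sum(b * b for b in y))

# Step 3: cosine similarity
sim = dot_xy / (norm_x * norm_y)

# Step 4: cosine distance
dist = 1.0 - sim

# Angle between the vectors, in degrees
angle_deg = math.degrees(math.acos(sim))

print(dot_xy, round(sim, 4), round(dist, 4), round(angle_deg, 1))
```

The similarity comes out at about 0.8575, the distance at about 0.1425, and the angle at about 31°.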

Text analysis example

Example: Document similarity with TF-IDF

Document A:

"Cat sits on mat"
TF-IDF: [0.5, 0.3, 0.2, 0.0, 0.0]

Document B:

"Dog lies on sofa"
TF-IDF: [0.0, 0.0, 0.3, 0.4, 0.3]

Calculation:

\[\text{sim} = \frac{0.5 \cdot 0 + 0.3 \cdot 0 + 0.2 \cdot 0.3 + 0 \cdot 0.4 + 0 \cdot 0.3}{||(0.5,0.3,0.2,0,0)|| \cdot ||(0,0,0.3,0.4,0.3)||}\] \[= \frac{0.06}{\sqrt{0.38} \cdot \sqrt{0.34}} \approx \frac{0.06}{0.36} \approx 0.167\]

Result: Low similarity due to few shared terms (only "on").
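
The same calculation can be sketched in Python using the two TF-IDF vectors given above. The helper `cosine_similarity` and the `shared` list are assumptions added for illustration; computing the TF-IDF weights themselves is outside the scope of this sketch.

```python
import math

def cosine_similarity(x, y):
    """Dot product divided by the product of the L2 norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

doc_a = [0.5, 0.3, 0.2, 0.0, 0.0]  # TF-IDF vector for Document A
doc_b = [0.0, 0.0, 0.3, 0.4, 0.3]  # TF-IDF vector for Document B

sim = cosine_similarity(doc_a, doc_b)

# Positions where both documents have a nonzero weight (shared terms).
shared = [i for i, (a, b) in enumerate(zip(doc_a, doc_b)) if a > 0 and b > 0]

print(f"similarity: {sim:.3f}")
print("shared term positions:", shared)
```

Only one position is nonzero in both vectors, so a single product contributes to the dot product and the similarity stays low at roughly 0.167.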

Geometric interpretation

Angle and similarity

  • 0° (parallel): cos = 1.0, identical direction
  • 45°: cos = 0.707, high similarity
  • 90° (orthogonal): cos = 0.0, no similarity
  • 180° (antiparallel): cos = -1.0, opposite

Note: Cosine similarity ignores vector length and focuses only on direction.
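
The relation between angle and similarity works in both directions. The sketch below first prints the cosine for the angles listed above and then recovers an angle from two vectors; the function name `angle_between_deg` is an illustrative assumption, and the clamping step is a common precaution against rounding error rather than part of the definition.

```python
import math

def angle_between_deg(x, y):
    """Angle between two nonzero vectors, in degrees."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    sim = dot / (norm_x * norm_y)
    # Clamp to [-1, 1]: rounding error can push the ratio slightly
    # outside the domain of acos.
    sim = max(-1.0, min(1.0, sim))
    return math.degrees(math.acos(sim))

# Angle -> cosine
for angle in (0, 45, 90, 180):
    print(angle, round(math.cos(math.radians(angle)), 3))

# Vectors -> angle
print(round(angle_between_deg([1, 0], [1, 1]), 1))
```

The vectors [1, 0] and [1, 1] enclose an angle of 45°, which corresponds to the cos = 0.707 entry.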

Practical applications

Information Retrieval
  • Document similarity
  • Search engine ranking
  • TF-IDF comparisons
  • Semantic search
Recommender systems
  • User-item matrices
  • Collaborative filtering
  • Product recommendations
  • Netflix-style algorithms
Machine Learning
  • Feature comparisons
  • Clustering algorithms
  • Neural networks
  • Similarity learning

Mathematical properties

Similarity properties
  • Range: [-1, 1]
  • Symmetry: sim(x,y) = sim(y,x)
  • Self-similarity: sim(x,x) = 1 for any nonzero x
  • Direction-based: Ignores magnitude
Geometric properties
  • Angle measure: Cosine of the enclosed angle
  • Projection-based: Uses the dot product
  • Normalization-invariant: Independent of vector lengths
  • Linearity: The numerator (dot product) is linear in each vector; the similarity itself is not, because of the division by the norms
Important notes

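Zero vectors: Cosine similarity is undefined if either vector is the zero vector, because the denominator is then 0

Scaling: Multiplying a vector by a positive scalar does not change the similarity; multiplying by a negative scalar flips its sign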
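
Both notes can be illustrated with a short Python sketch. The function name `cosine_similarity_or_none` is hypothetical, and returning `None` for a zero vector is only one way to signal the undefined case; raising an error would be equally reasonable.

```python
import math

def cosine_similarity_or_none(x, y):
    """Return the cosine similarity, or None if either vector has zero length."""
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return None  # undefined: division by zero
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (norm_x * norm_y)

x = [3, 5]
y = [0, 3]

scaled = [10 * a for a in x]   # positive scalar: same direction
flipped = [-2 * a for a in x]  # negative scalar: opposite direction

print(cosine_similarity_or_none(x, y))
print(cosine_similarity_or_none(scaled, y))   # same value up to rounding
print(cosine_similarity_or_none(flipped, y))  # same magnitude, opposite sign
print(cosine_similarity_or_none([0, 0], y))   # None
```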

Comparison: Cosine vs. Pearson correlation

For vectors [1,2,3] and [2,4,6]

Cosine similarity:

\[\text{sim} = \frac{(1 \cdot 2 + 2 \cdot 4 + 3 \cdot 6)}{\sqrt{14} \cdot \sqrt{56}} = \frac{28}{28} = 1.0\]

Identical direction (perfect similarity)

Pearson correlation:

\[r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2}\sqrt{\sum(y_i - \bar{y})^2}} = 1.0\]

Perfect linear correlation

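Difference: Cosine similarity uses the raw values, while Pearson correlation first subtracts each vector's mean; Pearson is the cosine similarity of the mean-centered vectors. The two agree here because y = 2x, but adding a constant to one vector changes the cosine similarity and leaves the Pearson correlation unchanged.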
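
The relationship can be checked numerically. In this sketch, Pearson correlation is computed as the cosine similarity of the mean-centered vectors; the function names and the extra vector `shifted` are assumptions introduced for illustration.

```python
import math

def cosine_similarity(x, y):
    """Dot product divided by the product of the L2 norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation as cosine similarity of mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)

x = [1, 2, 3]
y = [2, 4, 6]        # y = 2x: same direction as x
shifted = [3, 4, 5]  # x + 2: same pattern, different direction

print(round(cosine_similarity(x, y), 4), round(pearson(x, y), 4))
print(round(cosine_similarity(x, shifted), 4), round(pearson(x, shifted), 4))
```

For the pair from the example both measures are 1. For `shifted`, the Pearson correlation is still 1 while the cosine similarity drops to about 0.983, because the added constant changes the direction of the vector.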