Pearson Correlation Coefficient

Calculator for the linear relationship between two variables, with formulas and examples

Correlation Coefficient Calculator

What is calculated?

The Pearson correlation coefficient measures the strength and direction of the linear relationship between two variables. Values range from -1 (perfect negative correlation) through 0 (no linear correlation) to +1 (perfect positive correlation).
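The same computation in Python, as a minimal sketch using SciPy (not part of this calculator, but it reproduces what the tool reports):

```python
# Minimal sketch: Pearson r for two equally long lists, using scipy.stats.pearsonr.
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

r, p_value = pearsonr(x, y)  # coefficient and two-sided p-value
print(f"r = {r:.3f}, p = {p_value:.3g}")  # r = 1.000 for this perfectly linear data
```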

Correlation Info

Properties

Pearson correlation:

  • Range: [-1, +1]
  • +1 = perfect positive correlation
  • 0 = no linear correlation
  • -1 = perfect negative correlation

Linear: Measures only linear relationships, not curved or other nonlinear associations.
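To illustrate the linearity caveat, the following sketch (assuming NumPy is available) shows that a perfect but purely quadratic relationship yields r = 0:

```python
import numpy as np

# y depends perfectly on x, but the relationship is quadratic, not linear.
x = np.array([-2, -1, 0, 1, 2], dtype=float)
y = x ** 2  # [4, 1, 0, 1, 4]

r = np.corrcoef(x, y)[0, 1]
print(round(r, 10))  # 0.0, Pearson r misses this nonlinear association entirely
```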

Interpretation
|r| ≥ 0.7: Strong correlation
0.3 ≤ |r| < 0.7: Moderate correlation
0.1 ≤ |r| < 0.3: Weak correlation
|r| < 0.1: Practically no linear correlation
Related measures

Cosine similarity: Angle-based similarity between vectors
Spearman rank: For monotonic, not necessarily linear, relationships
Kendall tau: Rank-based and more robust to outliers
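For comparison, SciPy also implements the rank-based measures listed above; a quick sketch with monotonic but nonlinear data (chosen only for illustration):

```python
from scipy.stats import pearsonr, spearmanr, kendalltau

# Monotonic but nonlinear data: y grows as x cubed.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print("Pearson :", pearsonr(x, y)[0])    # ≈ 0.94, linearity is imperfect
print("Spearman:", spearmanr(x, y)[0])   # 1.0, the ranks agree perfectly
print("Kendall :", kendalltau(x, y)[0])  # 1.0, all pairs are concordant
```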


Formulas for the Pearson correlation coefficient

Basic formula
\[r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2 \sum(y_i - \bar{y})^2}}\] Standard Pearson correlation
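A direct translation of the basic formula into Python (a sketch using only the standard library; the function name is illustrative):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r via the deviation (basic) formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mean_x) ** 2 for xi in x) * sum((yi - mean_y) ** 2 for yi in y))
    return num / den

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0
```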
Covariance form
\[r = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}\] With covariance and standard deviations
Computational formula
\[r = \frac{n\sum xy - \sum x \sum y}{\sqrt{(n\sum x^2 - (\sum x)^2)(n\sum y^2 - (\sum y)^2)}}\] Single-pass computation from raw sums (can lose precision when values are large relative to their spread)
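The single-pass form translates directly into code as well (a sketch; as noted, it can lose precision for large values):

```python
from math import sqrt

def pearson_r_onepass(x, y):
    """Pearson r from raw sums (computational formula), accumulated in one pass."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

print(pearson_r_onepass([20, 22, 25, 28, 30], [150, 180, 220, 280, 320]))  # ≈ 0.996
```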
Z-score form
\[r = \frac{1}{n-1}\sum_{i=1}^n z_{x_i} z_{y_i}\] Using standardized values
Coefficient of determination
\[R^2 = r^2\] Proportion of explained variance
Fisher's Z-transform
\[z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)\] For significance testing
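A sketch of how Fisher's transform is typically used for inference, here building an approximate 95% confidence interval for r (the 1/√(n − 3) standard error is the standard large-sample result; the function name is illustrative):

```python
from math import atanh, tanh, sqrt

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for r via Fisher's z-transform."""
    z = atanh(r)               # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / sqrt(n - 3)     # large-sample standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return tanh(lo), tanh(hi)  # transform the interval back to the r scale

print(fisher_ci(r=0.8, n=30))  # roughly (0.62, 0.90)
```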

Detailed calculation example

Example: Correlation([1,2,3,4,5], [2,4,6,8,10])

Given:

  • X = [1, 2, 3, 4, 5]
  • Y = [2, 4, 6, 8, 10]
  • n = 5

Step 1 - Means:

\[\bar{x} = \frac{1+2+3+4+5}{5} = 3\] \[\bar{y} = \frac{2+4+6+8+10}{5} = 6\]

Step 2 - Deviations:

\[\sum(x_i - \bar{x})(y_i - \bar{y}) = 20\] \[\sum(x_i - \bar{x})^2 = 10\] \[\sum(y_i - \bar{y})^2 = 40\]

Step 3 - Correlation:

\[r = \frac{20}{\sqrt{10 \cdot 40}} = \frac{20}{20} = 1.0\]

Interpretation: Perfect positive correlation (r = 1.0), because Y = 2X for all data points.
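The same steps reproduced in Python (a sketch that mirrors the hand calculation above):

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
n = len(x)

mean_x = sum(x) / n  # 3.0
mean_y = sum(y) / n  # 6.0

sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # 20.0
sxx = sum((a - mean_x) ** 2 for a in x)                       # 10.0
syy = sum((b - mean_y) ** 2 for b in y)                       # 40.0

r = sxy / (sxx * syy) ** 0.5
print(r)  # 1.0
```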

Realistic example

Example: Temperature vs. ice cream sales

Data:

Temperature (°C): [20, 22, 25, 28, 30]
Ice cream sales (€): [150, 180, 220, 280, 320]

Calculation:

\[\bar{x} = 25°C, \bar{y} = 230€\] \[\sum(x_i - \bar{x})(y_i - \bar{y}) = 1150, \quad \sum(x_i - \bar{x})^2 = 68, \quad \sum(y_i - \bar{y})^2 = 19600\] \[r = \frac{1150}{\sqrt{68 \cdot 19600}} \approx 0.996\]

Interpretation:

Very strong positive correlation (r ≈ 0.996)
R² ≈ 0.99 → about 99% of the variance in ice cream sales is explained by temperature
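Checking the result with NumPy (a sketch; the arrays are the example data above):

```python
import numpy as np

temperature = np.array([20, 22, 25, 28, 30])   # °C
sales = np.array([150, 180, 220, 280, 320])    # €

r = np.corrcoef(temperature, sales)[0, 1]
print(f"r  = {r:.3f}")     # ≈ 0.996
print(f"R² = {r**2:.3f}")  # ≈ 0.992
```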

Correlation ≠ Causation

Important note: Correlation is not causation

Example - spurious correlation:

Variable A: Number of storks
Variable B: Birth rate
Correlation: r = 0.62 (moderate positive)

Explanation:

Third variable: Rural vs. urban areas
Storks and higher birth rates both occur more often in rural areas.

Conclusion: High correlation does not automatically imply causation. Always consider potential confounders or alternative explanations!

Practical applications

Statistics & research
  • Hypothesis validation
  • Exploratory data analysis
  • Variable selection
  • Check multicollinearity
Finance
  • Portfolio diversification
  • Asset correlations
  • Risk management
  • Hedging strategies
Machine Learning
  • Feature selection
  • Dimensionality reduction
  • Preprocessing step
  • Model evaluation

Mathematical properties

Basic properties
  • Range: -1 ≤ r ≤ +1
  • Symmetry: r(X,Y) = r(Y,X)
  • Linear transformation: |r| is invariant under affine transformations aX + b; a negative scale factor flips the sign (see the sketch after this list)
  • Unitless: Independent of measurement units
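A quick check of the affine-transformation property (a sketch; the synthetic data is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)  # linearly related with noise

r_xy      = np.corrcoef(x, y)[0, 1]
r_scaled  = np.corrcoef(5 * x + 3, y)[0, 1]   # positive scale factor: r unchanged
r_flipped = np.corrcoef(-5 * x + 3, y)[0, 1]  # negative scale factor: sign flips

print(np.isclose(r_xy, r_scaled))    # True
print(np.isclose(r_xy, -r_flipped))  # True
```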
Statistical properties
  • Linearity: Only linear relationships
  • Sensitive to outliers: Affected by extreme values
  • Normality: Significance tests assume bivariate normality
  • Effect size: Measure of practical significance
Assumptions

Data type: At least interval-scaled data

Distribution: Bivariate normality is required for significance tests

Interpretation guide

Correlation strength by Cohen (1988)

Positive correlations:

r ≥ 0.7: Strong positive correlation
0.3 ≤ r < 0.7: Moderate positive correlation
0.1 ≤ r < 0.3: Weak positive correlation

Negative correlations:

r ≤ -0.7: Strong negative correlation
-0.7 < r ≤ -0.3: Moderate negative correlation
-0.3 < r ≤ -0.1: Weak negative correlation
|r| < 0.1: Practically no linear correlation

Note: This categorization is context-dependent. Different fields (e.g. psychology) may use other standards.
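For convenience, the thresholds above can be wrapped in a small helper (a sketch; the labels mirror this guide and are not part of any standard library):

```python
def correlation_strength(r: float) -> str:
    """Classify r using the Cohen-style thresholds listed above."""
    magnitude = abs(r)
    if magnitude < 0.1:
        return "practically no linear correlation"
    if magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

print(correlation_strength(0.98))   # strong positive correlation
print(correlation_strength(-0.45))  # moderate negative correlation
print(correlation_strength(0.05))   # practically no linear correlation
```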