Statistics & Random Numbers

Random number generation

Use rng(seed) at the start of any script that needs reproducible output.

Function	Description
`rand()`	scalar uniform in [0, 1)
`rand(n)`	n×n uniform matrix
`rand(m, n)`	m×n uniform matrix
`randn()`	scalar standard-normal sample
`randn(n)` / `randn(m, n)`	standard-normal matrix
`randi(max)`	random integer in [1, max]
`randi(max, n)` / `randi(max, m, n)`	matrix of random integers
`randi([lo hi], ...)`	integers from [lo, hi]
`rng(seed)`	seed RNG — same seed → same sequence
`rng('shuffle')`	reseed from OS entropy

rng(42)
x = randn(1, 5)         % reproducible 5-element sequence
d = randi(6, 1, 10)     % ten dice rolls
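
The "same seed → same sequence" guarantee can be illustrated outside ccalc. The following is a minimal Python sketch, not ccalc code: `seeded_normals` is a hypothetical helper, and Python's generator differs from ccalc's, so the actual numbers will not match those produced by `rng(42)`.

```python
import random

def seeded_normals(seed, count):
    """Return `count` standard-normal draws from a generator seeded with `seed`."""
    gen = random.Random(seed)   # private generator, analogous to calling rng(seed) first
    return [gen.gauss(0.0, 1.0) for _ in range(count)]

first = seeded_normals(42, 5)
second = seeded_normals(42, 5)   # same seed: identical sequence
other = seeded_normals(43, 5)    # different seed: a different sequence

print(first == second)
print(first == other)
```

Seeding once at the top of a script, rather than before every draw, keeps successive draws distinct while still making the whole run repeatable.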

Descriptive statistics

Except for cov(A), which returns a matrix, these functions operate column-wise on M×N matrices and collapse to a scalar for vectors.

Function	Description
`std(v)`	sample standard deviation (n-1 denominator)
`std(v, 1)`	population standard deviation (n denominator)
`var(v)` / `var(v, 1)`	sample / population variance
`median(v)`	median (mean of the two middle values for even length)
`mode(v)`	most frequent value; smallest wins on ties
`cov(v)`	variance of a vector
`cov(A)`	N×N covariance matrix of an M×N data matrix

v = [2 4 4 4 5 5 7 9];
mean(v)      % 5.0
std(v)       % sample std ≈ 2.138
std(v, 1)    % population std = 2.0
median(v)    % 4.5
mode(v)      % 4
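
The values in the example above can be cross-checked independently. This is a small Python sketch using the standard `statistics` module, offered only as a check of the arithmetic, not as a description of how ccalc computes these results.

```python
import statistics

v = [2, 4, 4, 4, 5, 5, 7, 9]

sample_std = statistics.stdev(v)        # n-1 denominator, like std(v)
population_std = statistics.pstdev(v)   # n denominator, like std(v, 1)
middle = statistics.median(v)           # even length: mean of the two middle values
most_common = statistics.mode(v)        # 4 occurs three times, no tie in this data

print(round(sample_std, 3), population_std, middle, most_common)
```

The sum of squared deviations from the mean is 32, so the sample variance is 32/7 and the population variance is 32/8 = 4, which is why the two standard deviations differ.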

Shape statistics

These functions measure the shape of a distribution (symmetry and tail weight). Both use population (biased) central moments: m2, m3 and m4 are the averages of (x − mean)^2, (x − mean)^3 and (x − mean)^4 over all n elements.

Function	Description
`skewness(v)`	`m3 / m2^(3/2)` — zero for symmetric data, positive for long right tail
`kurtosis(v)`	`m4 / m2^2` — ≈ 1.8 for uniform, ≈ 3 for normal, > 3 for heavy tails

For a scalar or constant vector, skewness returns 0; for constant data, kurtosis returns NaN. Both operate column-wise on M×N matrices, same as std / var.

v = [2 4 4 4 5 5 7 9];
skewness(v)    % 0.656  (slight right skew)
kurtosis(v)    % 2.781  (slightly platykurtic)

% Symmetric data → skewness exactly 0:
skewness(1:10)   % 0
kurtosis(1:10)   % 1.776
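
The two moment ratios can be written out directly. The sketch below is a plain Python rendering of `m3 / m2^(3/2)` and `m4 / m2^2` with an n denominator; the helper `central_moment` is a hypothetical name, and the handling of constant data simply mirrors the return values stated above rather than ccalc's actual source.

```python
def central_moment(values, order):
    """Population central moment: average of (x - mean)**order over all n values."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n

def skewness(values):
    m2 = central_moment(values, 2)
    if m2 == 0:
        return 0.0                      # scalar or constant input
    return central_moment(values, 3) / m2 ** 1.5

def kurtosis(values):
    m2 = central_moment(values, 2)
    if m2 == 0:
        return float("nan")             # constant data: ratio is undefined
    return central_moment(values, 4) / m2 ** 2

v = [2, 4, 4, 4, 5, 5, 7, 9]
print(skewness(v))   # m3 = 5.25, m2 = 4, so 5.25 / 8
print(kurtosis(v))   # m4 = 44.5, so 44.5 / 16
```

For `v`, the ratios are exactly 0.65625 and 2.78125, which round to the 0.656 and 2.781 shown in the example.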

Percentiles and spread

Function	Description
`prctile(v, p)`	p-th percentile; `p` can be a vector
`iqr(v)`	interquartile range: `prctile(v, 75) - prctile(v, 25)`
`zscore(v)`	standardise: `(v - mean(v)) / std(v)`, same shape as `v`

v = [1 2 3 4 5 6 7 8];
prctile(v, 50)          % 4.5  (median)
prctile(v, [25 75])     % [2.75  6.25]  (quartiles)
iqr(v)                  % 3.5

z = zscore([2 4 6]);    % z = [-1  0  1]
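
The quartiles shown above are consistent with linear interpolation between sorted values at the fractional position (n − 1) · p / 100. The Python sketch below assumes that rule; it reproduces the documented outputs, but whether ccalc uses exactly this rule internally is an assumption. The z-score helper uses the sample standard deviation, which matches the `[-1 0 1]` result.

```python
import math
import statistics

def prctile(values, p):
    """p-th percentile by linear interpolation (assumed position rule: (n-1)*p/100)."""
    data = sorted(values)
    pos = (len(data) - 1) * p / 100.0   # fractional 0-based index into sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)     # clamp so p = 100 stays in range
    return data[lo] + (pos - lo) * (data[hi] - data[lo])

def zscore(values):
    """Standardise with the sample (n-1) standard deviation."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    return [(x - mean) / spread for x in values]

v = [1, 2, 3, 4, 5, 6, 7, 8]
print([prctile(v, p) for p in (25, 50, 75)])
print(prctile(v, 75) - prctile(v, 25))
print(zscore([2, 4, 6]))
```

Other interpolation conventions exist and give slightly different quartiles for small samples, so results from other tools may not agree with the values above.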

Outlier detection (1.5 × IQR rule)

q1 = prctile(data, 25);              % first quartile
q3 = prctile(data, 75);              % third quartile
fence_lo = q1 - 1.5 * iqr(data);     % lower fence
fence_hi = q3 + 1.5 * iqr(data);     % upper fence
outliers = data(data < fence_lo | data > fence_hi);   % values outside either fence
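
A worked instance of the fence rule, as a Python sketch. The `readings` data and the helper `iqr_outliers` are hypothetical, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between sorted values; that choice is an assumption made to stay close to the percentile values shown earlier on this page.

```python
import statistics

def iqr_outliers(data):
    """Return (lower fence, upper fence, outliers) using the 1.5 x IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = q3 - q1                    # interquartile range
    fence_lo = q1 - 1.5 * spread
    fence_hi = q3 + 1.5 * spread
    outliers = [x for x in data if x < fence_lo or x > fence_hi]
    return fence_lo, fence_hi, outliers

# Hypothetical readings with one obviously bad value.
readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(iqr_outliers(readings))
```

Here the quartiles are 12 and 13.75, so the fences sit at 9.375 and 16.375 and only the value 102 is flagged.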

Histogram

hist prints an ASCII bar chart to stdout and returns Void. histc returns a count vector for user-supplied bin edges.

hist(data)           % 10 bins (default)
hist(data, 20)       % 20 bins

edges = [0 10 20 30 40 50];
counts = histc(data, edges)

histc bin semantics: bin i counts elements where edges(i) <= x < edges(i+1); the last bin counts x == edges(end) exactly.
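
The bin rule can be sketched in a few lines of Python. This is an illustration of the stated semantics only: it assumes one count per edge (so the result has as many entries as `edges`) and that values outside the edge range are not counted, neither of which is confirmed by the text above.

```python
def histc(values, edges):
    """Count values per bin: edges[i] <= x < edges[i+1]; last bin counts x == edges[-1]."""
    counts = [0] * len(edges)
    last = len(edges) - 1
    for x in values:
        if x == edges[last]:
            counts[last] += 1           # exact match on the final edge
            continue
        for i in range(last):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break                   # each value lands in at most one bin
    return counts

edges = [0, 10, 20, 30, 40, 50]
print(histc([0, 5, 10, 49, 50, 51, -1], edges))
```

In this run 0 and 5 fall in the first bin, 10 opens the second bin because the left edge is inclusive, 50 lands in the final bin, and 51 and -1 are left out under the assumption stated above.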

Normal distribution

Function	Description
`normcdf(x)`	P(Z ≤ x), Z ~ N(0, 1)
`normcdf(x, mu, s)`	P(X ≤ x), X ~ N(mu, s²)
`normpdf(x)`	standard normal PDF
`normpdf(x, mu, s)`	general normal PDF
`erf(x)`	Gauss error function
`erfc(x)`	1 − erf(x)

All six functions work element-wise on scalars and matrices.

normcdf(0)                        % 0.5
normcdf(1) - normcdf(-1)          % 0.6827  (68% rule)
normcdf(2) - normcdf(-2)          % 0.9545  (95% rule)
normcdf(3) - normcdf(-3)          % 0.9973  (99.7% rule)

% Probability that X ~ N(50, 10²) falls between 40 and 60:
normcdf(60, 50, 10) - normcdf(40, 50, 10)   % ≈ 0.6827

The relationship between normcdf and erf:

normcdf(x) = 0.5 * (1 + erf(x / sqrt(2)))
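
This identity is enough to build the CDF from an error function alone. The Python sketch below uses `math.erf` from the standard library and extends the identity to a general mean and standard deviation by standardising first; the default arguments are a convenience of the sketch, not a statement about ccalc's signature handling.

```python
import math

def normcdf(x, mu=0.0, s=1.0):
    """P(X <= x) for X ~ N(mu, s^2), via the error function."""
    z = (x - mu) / s                    # standardise to N(0, 1)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(normcdf(0))                                   # half the mass lies below the mean
print(normcdf(1) - normcdf(-1))                     # mass within one standard deviation
print(normcdf(60, 50, 10) - normcdf(40, 50, 10))    # same interval after standardising
```

The last two lines agree because 40 and 60 are exactly one standard deviation either side of the mean 50.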

Full example

% Generate 200 samples from N(50, 10²) (mean 50, standard deviation 10) and analyse them.
rng(7)
n    = 200;
data = 50 + 10 * randn(1, n);

fprintf('mean     = %.4f\n', mean(data))
fprintf('std      = %.4f\n', std(data))
fprintf('median   = %.4f\n', median(data))
fprintf('IQR      = %.4f\n', iqr(data))
fprintf('skewness = %.4f\n', skewness(data))
fprintf('kurtosis = %.4f\n', kurtosis(data))

% Percentile table
pct = prctile(data, [5 25 50 75 95]);
fprintf('P5/P25/P50/P75/P95 = %.1f  %.1f  %.1f  %.1f  %.1f\n', ...
  pct(1), pct(2), pct(3), pct(4), pct(5))

% ASCII histogram
hist(data, 12)

See the full demo at examples/statistics.calc.
