Generate Lognormal distribution
Suppose we have lognormal returns:
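import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import shapiro, lognorm
from pingouin import qqplot
from seaborn import histplot
# Frozen lognormal: shape s=1, loc=0.0001, scale=0.001
logn = lognorm(1,.0001,.001)
ret = logn.rvs(1000000)  # one million simulated daily returns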
Lognormal is not normal!
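A lognormal sample does not pass the Shapiro-Wilk test (keep the number of samples reasonably small, because with very large samples the test rejects normality for even negligible deviations; see https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless):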
T = 252
np.random.seed(42)
return_sample = np.random.choice(ret,T)
shapiro(return_sample)
ShapiroResult(statistic=0.6952812671661377, pvalue=4.193903852562234e-21)
The Q-Q plot is nowhere near normal either:
figure,axes = plt.subplots(1,2)
figure.set_size_inches((10,5))
histplot(return_sample, ax=axes[0])
qqplot(return_sample, ax=axes[1]);
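For contrast, the logarithm of such a sample (after subtracting the location shift) should look normal. The sketch below is an illustration and not part of the original notebook: it draws a fresh sample with its own seeded generator, and the helper name `shapiro_raw_vs_log` and its default arguments are assumptions chosen to mirror the distribution defined above.

```python
import numpy as np
from scipy.stats import lognorm, shapiro

def shapiro_raw_vs_log(n=252, s=1.0, loc=0.0001, scale=0.001, seed=42):
    """Return Shapiro-Wilk results for a lognormal sample and for its logarithm."""
    rng = np.random.default_rng(seed)
    sample = lognorm(s, loc, scale).rvs(n, random_state=rng)
    raw = shapiro(sample)
    # Subtracting `loc` before taking logs recovers the underlying normal variable.
    logged = shapiro(np.log(sample - loc))
    return raw, logged

raw, logged = shapiro_raw_vs_log()
print("raw   :", raw.statistic, raw.pvalue)
print("logged:", logged.statistic, logged.pvalue)
```

The statistic for the log-transformed sample should sit much closer to 1 than the statistic for the raw sample.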

Single day expected return
Based on lognormal assumption
Single-day estimates via the properties of the lognormal distribution:
- the means of random samples drawn from a lognormal distribution should lie within 2σ of the distribution's estimated mean
m, v = logn.stats(moments="mv")
s = np.sqrt(v)
N = 1000
i=0
for _ in range(N):
    m_ = np.random.choice(ret,252).mean()
    if np.abs(m_-m)>2*s: # sample mean falls outside 2σ of the lognormal mean
        i+=1
print(i/N)
print(m,s)
0.0
0.0017487212707001283 0.0021611974158950876
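One way to read the zero: the mean of n independent draws has standard deviation σ/√n, so a 2σ band around the mean is far wider than the spread of 252-day sample means. A minimal sketch, assuming independent draws (the helper `mean_band_in_standard_errors` is a hypothetical name, not from the original):

```python
import numpy as np
from scipy.stats import lognorm

def mean_band_in_standard_errors(dist, n, k=2.0):
    """Express a k-sigma band around the mean in units of the standard error of an n-draw mean."""
    m, v = dist.stats(moments="mv")
    sigma = float(np.sqrt(v))
    standard_error = sigma / np.sqrt(n)  # std of the mean of n i.i.d. draws
    return float(m), sigma, standard_error, k * sigma / standard_error

logn = lognorm(1, .0001, .001)
m, sigma, se, width = mean_band_in_standard_errors(logn, 252)
print(m, sigma, se, width)
```

Here the band is 2·√252, roughly 32 standard errors on each side, which is consistent with the 0.0 printed above.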
Based on normality assumption
Single day estimates are insensitive to distribution assumptions:
m = ret.mean()
s = ret.std()
N = 1000
i=0
for _ in range(N):
    m_ = np.random.choice(ret,252).mean()
    if np.abs(m_-m)>2*s: # sample mean falls outside 2σ of the sample-based mean
        i+=1
print(i/N)
print(m,s)
0.0
0.001750051880319707 0.0021519994287162256
Compounded return
Based on lognormal assumption
For compounded returns, however, only the μ and σ derived from the lognormal distribution give the right bounds:
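log_ret = np.log(1+ret)  # daily log returns
m_log = log_ret.mean()*T  # mean log return scaled to T days
s_log = log_ret.std()*np.sqrt(T)  # std of log returns scaled to T days
# positional arguments map to shape s=1, loc=m_log, scale=s_log
m,v= lognorm.stats(1,m_log,s_log, moments="mv")
s = np.sqrt(v)
N = 1000
i=0
for _ in range(N):
    m_ = np.prod(1+np.random.choice(ret,T)) - 1  # compounded return over T days
    if np.abs(m_-m)>2*s:  # outside the 2σ band
        i+=1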
print(i/N)
print(m,s)
0.054
0.4959416758483652 0.07326363013031845
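Log returns are convenient for compounding because the product of gross returns becomes a sum of log returns. A small sketch of that identity, using a synthetic sample rather than the `ret` array above (the seed, the sample parameters and the two function names are arbitrary assumptions):

```python
import numpy as np

def compound_direct(returns):
    """Compounded return as the product of gross returns minus one."""
    return np.prod(1.0 + returns) - 1.0

def compound_via_logs(returns):
    """Same quantity computed by summing log returns and exponentiating."""
    return np.expm1(np.sum(np.log1p(returns)))

rng = np.random.default_rng(0)
# Hypothetical daily returns, lognormal with median 0.001
sample = rng.lognormal(mean=np.log(0.001), sigma=1.0, size=252)
print(compound_direct(sample), compound_via_logs(sample))
```

Both functions should print the same value up to floating-point rounding.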
If normality assumed
Note that if we estimate μ of the compounded return under the normality assumption, an unacceptable share of the simulated compounded returns falls outside the confidence interval μ ± 2σ:
m = (1+ret.mean())**T -1
s = ret.std()*np.sqrt(T)
N = 1000
i=0
for _ in range(N):
    m_ = np.prod(1+np.random.choice(ret,T)) - 1  # compounded return over T days
    if np.abs(m_-m)>2*s:  # outside the 2σ band
        i+=1
print(i/N)
print(m,s)
0.186
0.5536820431890146 0.034161931859617224
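The simulation loops above differ only in the centre `m` and the width `s` of the band. As a sketch (the function name `fraction_outside` and its parameters are assumptions, not the author's code), the compounded-return version can be vectorised by drawing all `N` paths at once:

```python
import numpy as np

def fraction_outside(ret, m, s, T=252, N=1000, k=2.0, seed=None):
    """Fraction of N resampled T-day compounded returns lying outside m ± k*s."""
    rng = np.random.default_rng(seed)
    paths = rng.choice(ret, size=(N, T))  # N resampled paths of T daily returns
    compounded = np.prod(1.0 + paths, axis=1) - 1.0
    return float(np.mean(np.abs(compounded - m) > k * s))

# Tiny synthetic demo; in the notebook, `ret`, `m` and `s` would come from the cells above.
demo_ret = np.full(10, 0.001)
demo_m = 1.001**252 - 1
print(fraction_outside(demo_ret, demo_m, s=0.01, seed=0))
```

Because it uses its own generator, this helper will not reproduce the exact fractions printed above, only the same kind of estimate.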
Conclusions
- Single-day estimates are insensitive to distributional assumptions
- Estimates of compounded returns should be based on log returns, which are approximately normal
- There are two ways to estimate μ and σ of a lognormal distribution:
  - through the 3-parameter scipy.stats.lognorm.stats(s, loc, scale, moments = 'mv'), or
  - through the 2-parameter lognstat(mu, sigma):
def lognstat(mu, sigma):
    """Calculate the mean and variance of the lognormal distribution given
    the mean (`mu`) and standard deviation (`sigma`) of the associated normal
    distribution."""
    m = np.exp(mu + sigma**2 / 2.0)
    v = np.exp(2 * mu + sigma**2) * (np.exp(sigma**2) - 1)
    return m, v
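- Note that estimating the properties of log returns does not rely on a normality assumption: by the central limit theorem, the sum of many independent, identically distributed random variables with finite variance is approximately normal even when the individual variables are not, and the properties of the normal distribution are well known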
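The two routes agree when the parameters are mapped consistently. Assuming the standard SciPy parameterisation, in which `s` is the σ of the underlying normal, `scale` is exp(μ) and `loc` is 0, a quick check might look like the following (the values of `mu` and `sigma` are arbitrary illustrative choices):

```python
import numpy as np
from scipy.stats import lognorm

def lognstat(mu, sigma):
    """Mean and variance of a lognormal whose logarithm is N(mu, sigma**2)."""
    m = np.exp(mu + sigma**2 / 2.0)
    v = np.exp(2 * mu + sigma**2) * (np.exp(sigma**2) - 1)
    return m, v

mu, sigma = 0.44, 0.034  # arbitrary illustrative values
m_closed, v_closed = lognstat(mu, sigma)
# SciPy mapping assumed here: s = sigma, loc = 0, scale = exp(mu)
m_scipy, v_scipy = lognorm.stats(s=sigma, loc=0, scale=np.exp(mu), moments="mv")
print(m_closed, v_closed)
print(float(m_scipy), float(v_scipy))
```

When `mu` and `sigma` describe log(1 + R), these moments refer to the gross return 1 + R, so subtract 1 from the mean to get the compounded return itself.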