None

# Generate Lognormal distribution

Suppose we have lognormal returns:

```from scipy.stats import shapiro, lognorm
from pingouin import qqplot
from seaborn import histplot
logn = lognorm(1,.0001,.001)
ret = logn.rvs(1000000)
```

# Lognormal is not normal!

Lognormal doesn’t pass Shapiro test (keep number of samples reasonably small due to https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless):

```T = 252
np.random.seed(42)
return_sample = np.random.choice(ret,T)
shapiro(return_sample)
```
`ShapiroResult(statistic=0.6952812671661377, pvalue=4.193903852562234e-21)`

Neither is Q-Q Plot any near to normality:

```figure,axes = plt.subplots(1,2)
figure.set_size_inches((10,5))
histplot(return_sample, ax=axes)
qqplot(return_sample, ax=axes);
``` # Single day expected return

## Based on lognormal assumption

Single day estimates via properties of lognormal distribution properties:

• means of random samples drawn from lognormal distribution should lie within 2σ of estimated mean of lognormal
```m, v = logn.stats(moments="mv")
s = np.sqrt(v)
N = 1000
i=0
for _ in range(N):
m_ = np.random.choice(ret,252).mean()
if np.abs(m_-m)>2*s: # within 2σ of lognormal
i+=1
print(i/N)
print(m,s)
```
```0.0
0.0017487212707001283 0.0021611974158950876
```

## Based on normality assumption

Single day estimates are insensitive to distribution assumptions:

```m = ret.mean()
s = ret.std()
N = 1000
i=0
for _ in range(N):
m_ = np.random.choice(ret,252).mean()
if np.abs(m_-m)>2*s: # within 2σ of normal
i+=1
print(i/N)
print(m,s)
```
```0.0
0.001750051880319707 0.0021519994287162256
```

# Compounded return

## Based on lognormal assumption

Though if we try for compounding returns only μ and σ for lognormal ensure the right brackets:

```log_ret = np.log(1+ret)
m_log = log_ret.mean()*T
s_log = log_ret.std()*np.sqrt(T)
m,v= lognorm.stats(1,m_log,s_log, moments="mv")
s = np.sqrt(v)
N = 1000
i=0
for _ in range(N):
m_ = np.prod(1+np.random.choice(ret,T)) - 1
if np.abs(m_-m)>2*s:
i+=1
print(i/N)
print(m,s)
```
```0.054
0.4959416758483652 0.07326363013031845
```

## If normality assumed

Note, if we try to estimate μ of compounding returns based on normality assumption, there will be unacceptable number of means out of confidense interval µ±2σ:

```m = (1+ret.mean())**T -1
s = ret.std()*np.sqrt(T)
N = 1000
i=0
for _ in range(N):
m_ = np.prod(1+np.random.choice(ret,T)) - 1
if np.abs(m_-m)>2*s:
i+=1
print(i/N)
print(m,s)
```
```0.186
0.5536820431890146 0.034161931859617224
```

# Conclusions

• Single day estimates are insensitive to probability assumptions
• Compounded returns estimates should be based on log returns (normal)
• There are two ways to estimate μ and σ of lognormal distribution:
• through 3 parameter `scipy.stats.lognorm.stats(s, loc, scale, moments = 'mv')`, or
• `lognstat(mu, sigma)`
``````def lognstat(mu, sigma):
"""Calculate the mean of and variance of the lognormal distribution given
the mean (`mu`) and standard deviation (`sigma`), of the associated normal
distribution."""
m = np.exp(mu + sigma**2 / 2.0)
v = np.exp(2 * mu + sigma**2) * (np.exp(sigma**2) - 1)
return m, v``````
• Note, estimating properties of log returns does not rely on normality assumption, because adding several random non-normal distributions results in a normal distribution, propeerties of which are well known
##### 1 Comment
Write
1. ###### 14.03.2021

Understanding the formula used to calculate CAGR is an introduction to many other ways investors evaluate past returns or estimate future profits. The formula can be manipulated algebraically into a formula to find the present value or future value of money, or to calculate a hurdle rate of return.