2018-11-02

The distribution of price changes (4)

R USD/JPY FX Statistics

denovor.hatenablog.com

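As a summary of the posts so far, compute the basic summary statistics of the USD/JPY price changes.
The values below are, in order, the mean, standard deviation, skewness, and kurtosis.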

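library(moments) # assumption: skewness() and kurtosis() come from moments (its kurtosis() is non-excess, normal = 3)

d <- read.csv("~/Documents/FX/USDJPY/USDJPYD.csv")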

co <- d$closing-d$opening
ho <-d$high-d$opening
ol <- d$low-d$opening
oo <- diff(d$opening, lag = 1)

# Replace zeros with 0.01 (-0.01 for ol, which is non-positive) so that a lognormal distribution can be fitted
ho <- replace(ho, which(ho == 0), 0.01)
ol <- replace(ol, which(ol == 0), -0.01)

# Returns, in order: mean, standard deviation, skewness, kurtosis
new_summary <- function(x) {
  c(mean(x), sd(x), skewness(x), kurtosis(x))
}

new_summary(co)
new_summary(ho)
new_summary(ol)
new_summary(oo)

The results are as follows.

> new_summary(co)
[1] -0.001664452  0.655583367 -0.117969245  7.450300434
> new_summary(ho)
[1]  0.4350133  0.4404715  2.5514705 15.4644506
> new_summary(ol)
[1] -0.4936977  0.5329145 -3.6730693 30.3727462
> new_summary(oo)
[1] -0.001831173  0.669821599 -0.153337597  7.581648452

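(close - open) and (next day's open - today's open) were each taken to follow a t distribution with 6 degrees of freedom, and (high - open) and (open - low) to follow an exponential distribution.
The standard deviation of a t distribution with $v$ degrees of freedom is $\sqrt{\frac{v}{v-2}}$ for $v > 2$. The skewness is $0$ whenever $v > 3$, and the excess kurtosis is $\frac{6}{v-4}$ for $v > 4$.
The kurtosis values reported above appear to be non-excess (the simulated t values in part (2) approach $3$, not $0$, as the degrees of freedom grow), so the excess kurtosis is about $7.5 - 3 = 4.5$. Fixing $v$ from the kurtosis first, $\frac{6}{v-4}=4.5$ gives $v \approx 5.3$, and matching the standard deviation then gives a scale of $0.66 / \sqrt{\frac{v}{v-2}} \approx 0.52$, that is, roughly $\frac{1}{2}$.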

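For the high and the low, the exponential distribution $\beta\exp(-\beta y)$ has mean $\frac{1}{\beta}$, so $\beta$ is the reciprocal of the sample mean: $\beta \approx 1/0.435 \approx 2.3$ for (high - open) and $\beta \approx 1/0.494 \approx 2.0$ for (open - low).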
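
The moment matching above can be sketched in base R. This is a minimal sketch, not the author's code: `t_params` is a hypothetical helper, and because the result depends on whether the reported kurtosis is Pearson (normal = 3) or excess (normal = 0), both cases are computed as labeled assumptions. The two conventions give roughly 5.3 and 4.8 degrees of freedom respectively, with a scale near 1/2 either way.

```r
# Method-of-moments parameters for a scaled t distribution (hypothetical helper).
# excess_kurt must be the EXCESS kurtosis (0 for a normal distribution).
t_params <- function(sample_sd, excess_kurt) {
  v <- 4 + 6 / excess_kurt               # solves 6 / (v - 4) = excess_kurt
  scale <- sample_sd / sqrt(v / (v - 2)) # a standard t has sd sqrt(v / (v - 2))
  c(df = v, scale = scale)
}

sample_sd   <- 0.655583367  # sd of (close - open) reported above
sample_kurt <- 7.450300434  # kurtosis of (close - open) reported above

# Case 1 (assumption): the reported value is Pearson kurtosis (normal = 3).
pearson_case <- t_params(sample_sd, sample_kurt - 3)
# Case 2 (assumption): the reported value is already excess kurtosis.
excess_case  <- t_params(sample_sd, sample_kurt)

# Exponential rate from the reported means: mean = 1 / beta.
beta_high <- 1 / 0.4350133        # (high - open)
beta_low  <- 1 / abs(-0.4936977)  # (open - low); sign flipped to be positive

round(rbind(pearson_case, excess_case), 2)
round(c(beta_high = beta_high, beta_low = beta_low), 2)
```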

2018-10-31

The distribution of price changes (3)

R FX USD/JPY

denovor.hatenablog.com

Continuing from the previous post. The Cullen and Frey plot suggested that USD/JPY (high - open) and (open - low) fit a lognormal distribution reasonably well.
Check the distributions with fitdist.

library(fitdistrplus) # provides fitdist() and gofstat()

d <- read.csv("~/Documents/FX/USDJPY/USDJPYD.csv")

co <- d$closing-d$opening
ho <-d$high-d$opening
ol <- d$low-d$opening
oo <- diff(d$opening, lag = 1)

# Replace zeros with 0.01 (-0.01 for ol, which is non-positive) so that a lognormal distribution can be fitted
ho <- replace(ho, which(ho == 0), 0.01)
ol <- replace(ol, which(ol == 0), -0.01)

fit1 <- fitdist(ho, "norm")
fit2 <- fitdist(ho, "exp")
fit3 <- fitdist(ho, "lnorm")
gofstat(list(fit1, fit2, fit3), fitnames = c("norm", "exp", "Lognormal"))

fit4 <- fitdist(-ol, "norm")
fit5 <- fitdist(-ol, "exp")
fit6 <- fitdist(-ol, "lnorm")
gofstat(list(fit4, fit5, fit6), fitnames = c("norm", "exp", "Lognormal"))

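The important point is that, because a lognormal distribution is being fitted, the zeros in the data are replaced with 0.01 (1 sen for USD/JPY), a value small enough not to affect the result.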
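
Why the zeros have to go can be seen in a few lines of base R. This is an illustrative sketch on made-up values (the vector `x` is not the real data): the lognormal density is 0 at zero, so a single zero drives the log-likelihood to -Inf.

```r
# Toy (high - open) values, including one day where the high equals the open.
x <- c(0.35, 0, 0.12, 0.80)

# Lognormal log-likelihood at arbitrary parameters: -Inf because of the zero.
loglik_raw <- sum(dlnorm(x, meanlog = 0, sdlog = 1, log = TRUE))

# Same replacement as in the post: zeros become 0.01.
x_adj <- replace(x, which(x == 0), 0.01)
loglik_adj <- sum(dlnorm(x_adj, meanlog = 0, sdlog = 1, log = TRUE))

c(raw = loglik_raw, adjusted = loglik_adj)
```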
The results are shown below. The first block is (high - open) and the second is (open - low).

Goodness-of-fit statistics
                                   norm       exp   Lognormal
Kolmogorov-Smirnov statistic  0.1672559 0.0227256  0.07605878
Cramer-von Mises statistic   24.0586246 0.2497171  5.26195064
Anderson-Darling statistic          Inf 2.4998433 35.49801968

Goodness-of-fit criteria
                                   norm     exp Lognormal
Akaike's Information Criterion 3609.154 1011.08  1356.718
Bayesian Information Criterion 3621.173 1017.09  1368.737

Goodness-of-fit statistics
                                   norm        exp   Lognormal
Kolmogorov-Smirnov statistic  0.1819928 0.02441634  0.07951322
Cramer-von Mises statistic   27.5439694 0.37319902  5.55319056
Anderson-Darling statistic          Inf 3.17527521 36.01890155

Goodness-of-fit criteria
                                   norm      exp Lognormal
Akaike's Information Criterion 4756.056 1772.892  2081.480
Bayesian Information Criterion 4768.076 1778.901  2093.499

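In both cases, the test statistics and the AIC indicate that the exponential distribution fits better than the lognormal.
The plots are shown below. The left column is the exponential distribution and the right column is the lognormal; the top row is the high and the bottom row is the low.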

f:id:denovor:20181031225350p:plain
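
What the AIC comparison above measures can be sketched by hand in base R. This is a sketch on simulated data (an exponential sample with an assumed rate of 2.3 stands in for the real series) and it uses closed-form maximum likelihood estimates rather than fitdist, so the numbers will not match the tables above.

```r
set.seed(1)
x <- rexp(3000, rate = 2.3)  # simulated stand-in, not the real USD/JPY data

# Exponential: one parameter, MLE rate = 1 / mean.
rate_hat <- 1 / mean(x)
aic_exp  <- 2 * 1 - 2 * sum(dexp(x, rate = rate_hat, log = TRUE))

# Lognormal: two parameters, MLEs are the mean and (population) sd of log(x).
mu_hat    <- mean(log(x))
sigma_hat <- sqrt(mean((log(x) - mu_hat)^2))
aic_lnorm <- 2 * 2 - 2 * sum(dlnorm(x, meanlog = mu_hat, sdlog = sigma_hat, log = TRUE))

# The smaller AIC indicates the better trade-off between fit and parameter count.
c(exp = aic_exp, lnorm = aic_lnorm)
```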

2018-10-31

The distribution of price changes (2)

R FX USD/JPY

denovor.hatenablog.com

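The post linked above examined how well the daily USD/JPY moves fit the normal, Cauchy, and exponential distributions.
This time, consider which distribution gives the best fit.
Use descdist from the fitdistrplus package.
(next day's open - today's open) is also defined here (in theory today's close equals the next day's open, but gaps can occur, for example over a weekend).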
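
How (next day's open - today's open) relates to (close - open) can be checked on a toy table. This is a sketch with invented prices (`d_toy` is hypothetical, using the same column names as the CSV): the open-to-open change equals the intraday change plus the overnight gap.

```r
# Invented daily opens and closes; the last two rows include overnight gaps.
d_toy <- data.frame(
  opening = c(112.0, 112.5, 112.3, 113.0),
  closing = c(112.5, 112.4, 112.8, 113.1)
)
n <- nrow(d_toy)

co  <- d_toy$closing - d_toy$opening       # intraday change, length n
oo  <- diff(d_toy$opening, lag = 1)        # open-to-open change, length n - 1
gap <- d_toy$opening[-1] - d_toy$closing[-n] # next open minus today's close

# oo is the intraday change plus the gap; with no gaps it would equal co[-n].
isTRUE(all.equal(oo, co[-n] + gap))
```

Note that diff() returns one element fewer than the input, so oo is one observation shorter than co.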

library(fitdistrplus) # provides descdist()

d <- read.csv("~/Documents/FX/USDJPY/USDJPYD.csv")

co <- d$closing-d$opening
ho <-d$high-d$opening
ol <- d$low-d$opening
oo <- diff(d$opening, lag = 1)

descdist(co, boot = 1000)
descdist(ho, boot = 1000)
descdist(ol, boot = 1000)
descdist(oo, boot = 1000)

The Cullen and Frey graphs are shown here.

f:id:denovor:20181031194711p:plain

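These suggest that (high - open) and (open - low) fit a lognormal distribution reasonably well.
For (close - open) and (next day's open - today's open), none of the distributions in the chart looks like a good fit.
So consider whether a t distribution can stand in. A t distribution is symmetric (its skewness is 0 for more than 3 degrees of freedom), and its kurtosis falls toward that of the normal distribution as the degrees of freedom increase, so the degrees of freedom can be chosen to match the observed kurtosis.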

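library(moments) # assumption: skewness() and kurtosis() come from moments (its kurtosis() is non-excess, normal = 3)

k <- 30 # maximum degrees of freedom
x <- numeric(k) # sample skewness for each degree of freedom
y <- numeric(k) # sample kurtosis for each degree of freedom

for (i in 3:k) {
  random_data <- rt(1000000, i) # 1,000,000 random draws from a t distribution with i degrees of freedom
  x[i] <- skewness(random_data) # store the skewness
  y[i] <- kurtosis(random_data) # store the kurtosis
}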

x
y

> x
 [1]  0.0000000000  0.0000000000 -0.2250361592 -0.1583386859
 [5]  0.0289653880 -0.0102002463  0.0005842145 -0.0011316503
 [9]  0.0038441844  0.0060417318 -0.0009067284  0.0019035279
[13] -0.0046501525 -0.0014046954 -0.0005866513  0.0017743852
[17]  0.0007997743  0.0063177695  0.0020007963  0.0051682809
[21] -0.0006535019  0.0001786988 -0.0044503054  0.0053827387
[25] -0.0024208226 -0.0024442228  0.0056115286 -0.0044400234
[29]  0.0061474931 -0.0045200204
> y
 [1]   0.000000   0.000000 235.979663  19.873776   8.214593
 [6]   6.071697   4.989947   4.556066   4.159974   3.972367
[11]   3.854875   3.735797   3.659998   3.583152   3.541952
[16]   3.501044   3.466755   3.422839   3.400555   3.356011
[21]   3.346898   3.328669   3.312828   3.302193   3.285795
[26]   3.276926   3.265179   3.264603   3.245709   3.237856

The results are shown above; 5 or 6 degrees of freedom looks like a good choice.
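
The simulated kurtosis values can be compared against the closed form. This is a small base R sketch (`t_kurtosis` is a hypothetical helper): for $v > 4$ the Pearson kurtosis of a t distribution is $3 + \frac{6}{v-4}$, and it does not exist for $v \le 4$, which is why the simulated values at 3 and 4 degrees of freedom are unstable.

```r
# Theoretical (non-excess) kurtosis of a t distribution with v degrees of freedom.
# Only defined for v > 4.
t_kurtosis <- function(v) {
  stopifnot(all(v > 4))
  3 + 6 / (v - 4)
}

df <- 5:10
data.frame(df = df, kurtosis = t_kurtosis(df))
```

The observed kurtosis of about 7.5 falls between the theoretical values for 5 and 6 degrees of freedom.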

2018-10-31

The distribution of price changes

USD/JPY FX R

The hypothesis that USD/JPY (close - open) follows a normal distribution was rejected.

denovor.hatenablog.com

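So what probability density function do the USD/JPY changes follow?
Here, look at the shape of the distribution not only of (close - open) but also of (high - open) and (open - low).
Use the fitdistrplus package.
Start by visualizing the distributions.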

library(ggplot2)      # histograms
library(fitdistrplus) # fitdist() and gofstat(), used further below

d <- read.csv("~/Documents/FX/USDJPY/USDJPYD.csv")

co <- d$closing-d$opening
ho <-d$high-d$opening
ol <- d$low-d$opening
new_data <- data.frame(ho, ol)

theme_set(theme_bw(base_size = 14)) # set the default theme for later plots
g <- ggplot(new_data)
g <- g + geom_histogram(aes(x = ho, fill = "high-opening"), binwidth = 0.1)
g <- g + geom_histogram(aes(x = ol, fill = "opening-low"), binwidth = 0.1)
g <- g + labs(fill = "high_low")
plot(g)

f:id:denovor:20181031152909p:plain

They do not look normally distributed, but run a Shapiro-Wilk test to be sure.

shapiro.test(ho)
shapiro.test(ol)

The results are below. The hypothesis of normality is rejected.

Shapiro-Wilk normality test

data:  ho
W = 0.78896, p-value < 2.2e-16

Shapiro-Wilk normality test

data:  ol
W = 0.72506, p-value < 2.2e-16

So check how well the Cauchy and exponential distributions fit, starting with (high - open).

fit1 <- fitdist(ho, "norm")
fit2 <- fitdist(ho, "cauchy")
fit3 <- fitdist(ho, "exp")
gofstat(list(fit1, fit2, fit3), fitnames = c("norm", "cauchy", "exp"))

The results are as follows.

Goodness-of-fit statistics
                                   norm      cauchy        exp
Kolmogorov-Smirnov statistic  0.1619078   0.1914885 0.02524917
Cramer-von Mises statistic   23.8853432  21.7252403 0.25699176
Anderson-Darling statistic          Inf 145.2124529        Inf

Goodness-of-fit criteria
                                   norm   cauchy      exp
Akaike's Information Criterion 3612.521 3067.788 1007.585
Bayesian Information Criterion 3624.540 3079.807 1013.595

The exponential distribution clearly fits well.
Do the same for (open - low), remembering to multiply the values by -1 first.
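
The reason for the sign flip is that the exponential distribution only has support on non-negative values. This is a minimal base R sketch on invented values (`ol_toy` is hypothetical): the density is 0 for every negative input, while the flipped values give a usable rate estimate of 1 / mean.

```r
# Invented (low - open) values; these are never positive.
ol_toy <- c(-0.40, -0.10, -0.75, -0.25)

# The exponential density is 0 everywhere below zero, so these cannot be fitted as-is.
density_raw <- dexp(ol_toy, rate = 2)

# After the sign flip the values are (open - low) and the MLE of the rate is 1 / mean.
rate_hat <- 1 / mean(-ol_toy)

list(density_raw = density_raw, rate_hat = rate_hat)
```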

fit4 <- fitdist(-ol, "norm")
fit5 <- fitdist(-ol, "cauchy")
fit6 <- fitdist(-ol, "exp")
gofstat(list(fit4, fit5, fit6), fitnames = c("norm", "cauchy", "exp"))

Goodness-of-fit statistics
                                   norm      cauchy        exp
Kolmogorov-Smirnov statistic  0.1772351   0.1884551 0.02448435
Cramer-von Mises statistic   27.4343857  20.7339731 0.37108041
Anderson-Darling statistic          Inf 139.0491252        Inf

Goodness-of-fit criteria
                                   norm   cauchy      exp
Akaike's Information Criterion 4757.880 3722.788 1770.744
Bayesian Information Criterion 4769.899 3734.808 1776.754

This one also appears to fit the exponential distribution well. Both are plotted below; the first is (high - open).

f:id:denovor:20181031154122p:plain f:id:denovor:20181031154143p:plain

Finally, the analysis of (close - open). Here the Cauchy distribution is taken as the candidate.

fit7 <- fitdist(co, "norm")
fit8 <- fitdist(co, "cauchy")
gofstat(list(fit7, fit8), fitnames = c("norm", "cauchy"))

The results and the plot are shown here.

f:id:denovor:20181031155850p:plain

Goodness-of-fit statistics
                                   norm      cauchy
Kolmogorov-Smirnov statistic  0.0667175  0.05095561
Cramer-von Mises statistic    4.6610269  1.42581370
Anderson-Darling statistic   26.6377431 20.49125071

Goodness-of-fit criteria
                                   norm   cauchy
Akaike's Information Criterion 6003.186 6115.888
Bayesian Information Criterion 6015.206 6127.907

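Judging by the AIC (6115.888 against 6003.186), the Cauchy distribution actually fits worse than the normal, even though its Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling statistics are smaller.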

2018-10-29

USD/JPY price changes

Plot the distribution of the USD/JPY (close - open) changes.
Test for normality with the Shapiro-Wilk test.

library(ggplot2) # histogram

d <- read.csv("~/Documents/FX/USDJPY/USDJPYD.csv")

co <- d$closing-d$opening
new_data <- data.frame(co)

theme_set(theme_bw(base_size = 14)) # set the default theme for later plots
g <- ggplot(new_data, aes(x = co))
g <- g + geom_histogram(binwidth = 0.1)
g <- g + xlab("closing-opening")
plot(g)

The result is as follows.

f:id:denovor:20181029222807p:plain

The Shapiro-Wilk test takes a single line of code.

shapiro.test(co)

Shapiro-Wilk normality test

data:  co
W = 0.95404, p-value < 2.2e-16

From the above, USD/JPY (close - open) cannot be said to follow a normal distribution.
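
To see how shapiro.test reacts to heavy tails, it can be tried on simulated data. This is a sketch, not part of the original analysis: the sample size of 3000 and the t distribution with 3 degrees of freedom are arbitrary choices for illustration. shapiro.test accepts between 3 and 5000 observations.

```r
set.seed(42)
normal_sample <- rnorm(3000)       # normally distributed reference sample
heavy_sample  <- rt(3000, df = 3)  # heavy-tailed sample (assumed for illustration)

res_normal <- shapiro.test(normal_sample)
res_heavy  <- shapiro.test(heavy_sample)

# A small p-value rejects the null hypothesis that the sample is normal.
c(normal = res_normal$p.value, heavy = res_heavy$p.value)
```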

2018-10-27

Unit root process (part 2)

FX

The conclusion is the same as last time, but run an ADF test as well.

denovor.hatenablog.com

> summary(ur.df(d$closing, type = "none"))

############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression none 


Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6601 -0.3179  0.0021  0.3462  5.1917 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
z.lag.1    -4.629e-05  1.189e-04  -0.389   0.6971  
z.diff.lag -3.363e-02  1.823e-02  -1.844   0.0652 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6638 on 3002 degrees of freedom
Multiple R-squared:  0.001182,	Adjusted R-squared:  0.000517 
F-statistic: 1.777 on 2 and 3002 DF,  p-value: 0.1693


Value of test-statistic is: -0.3893 

Critical values for test statistics: 
      1pct  5pct 10pct
tau1 -2.58 -1.95 -1.62
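
The output above can be read with the usual ADF decision rule, sketched here in base R with the numbers copied from the output. The ADF test is left-tailed, so the null hypothesis of a unit root is rejected only when the test statistic is below the critical value.

```r
# Values copied from the ur.df output above.
test_stat <- -0.3893
crit <- c("1pct" = -2.58, "5pct" = -1.95, "10pct" = -1.62)

# TRUE would mean the unit-root null is rejected at that level.
reject_unit_root <- test_stat < crit
reject_unit_root
```

All three comparisons are FALSE, so the unit-root hypothesis for the closing price is not rejected at any of the listed levels.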