Learning the game 2010 -

Friday, May 2, 2014

Social security feuille de soin - address

When the carte vital is not accepted at a hospital (ex. american hospital), we have to send the document "La feuille de soin" to the social security office by ourselves to get refunded.

General instruction of La feuille de soin
http://www.ameli.fr/assures/soins-et-remboursements/comment-etre-rembourse/la-feuille-de-soins_paris.php

Where to send?
http://www.ameli.fr/assures/votre-caisse-paris/nous-contacter/envoi-de-vos-feuilles-de-soins-et-courriers_paris.php

Give old clothes - Le Relais

Le Relais is a group of companies, which deals with the recycle/reuse of clothes.
We can put the clothes which we don't wear any more in a box installed in several places in France. Then Le Relais separates the clothes based on their condition in order to sell at shops (Ding Fring), export to Africa, wiping cloth, material, or dispose as waste.

http://fr.wikipedia.org/wiki/Le_Relais
http://www.lerelais.org/webdocu.php?page=relais-images

Thursday, May 1, 2014

Free and quick disposition of big trash in Paris

Free and quick disposition of big trash in Paris

I had a mattress to dispose, Mairie de Paris has a service to pick up big trashes. It is free and very quick. I could order a pick up in the evening on the day.

http://www.paris.fr/pratique/environnement/ordures-menageres-tri/p5430
From formulaire in Enlèvement d'un objet encombrant: mode d'emploi

Sunday, October 6, 2013

Change the font of table of contents (TOC)

Here's how to change the font of TOC in Word
- select TOC, right click
- Edit field
- Choose TOC for "Field names", click on "Table of contents" button.
- Click on "Modify" button
- Choose "TOC1" "TOC2" "TOC3" and set font to Arial. DONE.

Thursday, January 3, 2013

Google Drive

1. Set up language of the application to ENGLISH (if using laptop bought in Japan)

1. Open your control panel
2. Go to System and Security
3. Click on System
4. Open Advanced system settings (in the left panel)
5. Click on Environment Variables
6. Click either the top (user) or bottom (system) New... button.
7. For Variable name input LANG and for Variable value enter en_US.

2. Download and install

Download the installer from google drive site. NOTE that customize the folder location "advanced setup".

Then you'll have the folder in the designated area and also in the laptop.

Unable to change language on Google Drive desktop application

Google drive

Sunday, December 30, 2012

statistic modeling: EM algorithm example with 1D mixed normal distribution

EMアルゴリズムの実装。

PRMLでは2次元の正規分布を例として扱っている。

この式は $\frac{1}{(\sqrt{2\pi})^m \sqrt{|S|}}\exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}S^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$

が、ここではよりシンプルにするためにx軸のみ、つまり1次元のものを使う。

この式は $f(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2} \right)$

コードは以下のようになった。

# -------------------------------------------------
#  descripton
#     name : 1d normal distribution
# -------------------------------------------------

# ***********************
#   sub functions
# ***********************

# normal distribution
# same as dnorm() but using var instead of sd
normDist <- function(X,mu,vari) {
 1/sqrt(2*pi*vari)  *  exp( - (X-mu)^2 / (2*vari) )
}

# log liklifood (posterior probability)
logLikelihood <- function(X, mu, vari, Pi, K)
{
  Sum <- 0.0
  N <- length(X) # number of data
  for(n in 1:N)
  {  
    tmp <- 0.0
    for(k in 1:K)
    {
      tmp <- tmp + ( Pi[k] * normDist(X[n],mu[k],vari[k]) )
      # Note: "log" computes natural logarithms
    }
    Sum <- Sum + log(tmp)
  }

  return(Sum)
}


# ***********************
#   main computation
# ***********************

# -----------------------
#   Prepare data
# -----------------------
# load Old Faithful dataset, and normalize
data("faithful");
xx <- scale(faithful, apply(faithful, 2, mean), apply(faithful, 2, sd));
X <- xx[,1] # take only the first column this time.
X <- scale(X) # normalize, so that mean is 0 and var is 1. (why necessary?)
N = length(X)

# nb of distribution
K <- 2

# initialize the parameters
mu <- 0
# Note: vari <- mat.or.vec(1,K) can make zero matrix but in R 
# increasing the array length in loops is OK, so it is not necessary 
# to write like this.
increment <- 2/(K-1) # 2 is from 1 - (-1)
for(k in 1:K)
{
  # separately distribute between (-1, 1)
  mu[k] <- -1 + increment*(k-1)
}

vari <- 0
for(k in 1:K)
{
  # set to 1 for all distribution
  vari[k] <- 1
}

Pi <- 0
for(k in 1:K)
{
  Pi[k] <- 1/K
}

# for analysis
mu_initial <- mu
Pi_initial <- Pi

# prepare responsibility
resp <- mat.or.vec(N,K)

# compute initial log likelihood
like <- logLikelihood(X, mu, vari, Pi, K)

# initialize
likearray <- 0

nbIteration <- 1
while(1)
{
# -----------------------
# E-step:
#   compute the responsibility with the current parameters
# -----------------------
for(n in 1:N)
{
    # prepare denominator
    denominator = 0
    for(k in 1:K)
    {
     denominator <- denominator + Pi[k] * normDist(X[n], mu[k], vari[k])
    }

    # compute responsibility
    for(k in 1:K)
    {
     resp[n,k] <-  Pi[k] * normDist(X[n], mu[k], vari[k]) / denominator
    }
}

# -----------------------
# M-step:
# recompute the parameters with the computed responsibility.
# -----------------------

# initialize
mu_new <- 0
vari_new <- 0 
Pi_new <- 0

for(k in 1:K)
{
  # compute Nk
  Nk <- 0.0
  for(n in 1:N)
  {
    Nk <- Nk + resp[n,k]
  }
                                          
  # recompute mean
  mu_new[k] <- 0.0
  for(n in 1:N)
  {
    mu_new[k] <- mu_new[k] + resp[n,k] * X[n]
  }
  mu_new[k] <- mu_new[k]/Nk
  
  # recompute variance
  vari_new[k] <- 0.0
  for(n in 1:N)
  {
   vari_new[k] <- vari_new[k] + resp[n, k] * (X[n]-mu_new[k])^2
  }
  vari_new[k] <- vari_new[k]/Nk                                        
  
  # recompute mix ratio
  Pi_new[k] <- Nk/N
}   


# -----------------------
# judge of convergence
# -----------------------
like_new <- logLikelihood(X, mu_new, vari_new, Pi_new, K)
Diff <- like_new - like

# debug
likearray[nbIteration] <- like 

like <- like_new
mu <- mu_new
vari <- vari_new
Pi <- Pi_new

#if(nbIteration == 20)
if(Diff < 0.01)
{ 
  # print
  cat("nb of iteration", nbIteration, "\n")

  for(k in 1:K)
  {
  cat( "**** distribution", k, "****", "\n") # paste() is not necessary
  cat( "mean: ", mu[k], "\n")
  cat( "variance: ", vari[k], "\n")
  cat( "pi: ", Pi[k], "\n", "\n")   
  }
  break
}

nbIteration <- nbIteration + 1
} # close "while(1)"



# ***********************
#   Dispaly
# ***********************

# -----------------------------------
#   likelihood conversion bihavoir
# -----------------------------------
windows()
plot(likearray, type="o")
title("likelihood bihavoir")

# -------------------------------------------------------
#   original data and the estimated mixed distribution
# -------------------------------------------------------
windows()
hist(X, breaks=seq(-3,3,0.2), xlim=c(-3,3), ylim=c(0,1), freq=FALSE, 
      main="", xlab="", ylab="")
a <- seq(-5,5,length=1000)
for(k in 1:K)
{
par(new=T)
plot( a, dnorm(a, mu[k], sqrt(vari[k]))*Pi[k], xlab="", ylab="",
      xlim=c(-3,3), ylim=c(0,1), type="l", col=k+1)
}
title(main="data and estimation", xlab="x", ylab="density")

（Ｒのコンソールに直接張り付ければ良い）

結果は

nb of iteration 14
**** distribution 1 ****
mean: -1.286364
variance: 0.04316987
pi: 0.3488136

**** distribution 2 ****
mean: 0.6890519
variance: 0.1457242
pi: 0.6511864

log likelihood収束のようす

初期値の影響

Piの初期値が悪い（見た限りたして1以上になる場合）と、最初のiterationでは尤度は下がり、収束条件に引っかかってしまう。

muの初期条件で、十分に離れているものをとると、尤度は単調に増加し、早く収束する。（ここではPiは0.5, 0.5としている）

左: 0.979, 0.011

右: 0.516, 0.861

わかるように、muは均等に配分することで、ずっと早く収束する（コード参照）。

c( rnorm(400, 10, 2), rnorm(100, 2,1) )

参考サイト

Pattern Recognition and Machine Learning: Data Sets
Example 8.41: Scatterplot with marginal histograms (function of the plot with histogram)
R: 24. apply() ファミリー（このせつめい十分）
多変量正規分布（pdf）
混合ガウスモデルとEM（コードはこれをもとにした）
EMアルゴリズム答え合わせ（こちらも少し参考に）
EMアルゴリズムによる混合分布のパラメーター推定の解析計算＆実装例 from 「Rによるモンテカルロ法入門」（下のリンクとスライドを参考にした）
都市生活学・ネットワーク行動学研究室のPRML合宿2009（良く見ていないがまとまっていそう）
PRML 読書会 #12 ９章ＥＭアルゴリズム＆１０章変分ベイズ（下に英語pdfリンク）

Friday, December 28, 2012

Statistic modeling : EM algorithm

モデルの推定方法－EM algorithmの部分についてのまとめ。（サイトからのつぎはぎ）

ある確率変数 X = {Y , Z} があり，その一部 Y のみが観測でき，残り Z （＝潜在変数。下に説明）は観測できない状況を考える．観測データ {y1, y2, . . ., yT } が得られたときに，確率モデル p(x; θ) = p(y, z; θ)のパラメタ θ を推定したいとする（取り扱っている分布が、混合ガウス分布であればθはμ,Σ,πの3つ）。パラメータは、Xの分布からして、最もよさげなもの、というふに考える（最尤法）。p(y; θ) =∫p(y, z; θ)dz

最尤推定値を数値計算で求める代表的な方法
1．ニュートン・ラプソン法
2．Fisherのスコア法
3．EM algorithm

とあるが、Zの存在のために、これが簡単に解けず、EMアルゴリズムを使う（1や2では無理？）。モデルが混合分布でない場合は、このサイトで示しているように、反復なしで解がもとまる。

ステップ0．尤度関数をつくる

最初に、現象を式で表す必要がある。

観測された現象の起こる確率、イコール、観測されたデータについての尤度関数Lをつくる。

Given a statistical model consisting of a set $\mathbf{X}$ of observed data, a set of unobserved latent data or missing values $\mathbf{Z}$ , and a vector of unknown parameters $\boldsymbol\theta$ , along with a likelihood function $L(\boldsymbol\theta; \mathbf{X}, \mathbf{Z}) = p(\mathbf{X}, \mathbf{Z}|\boldsymbol\theta)$ , the maximum likelihood estimate (MLE) of the unknown parameters is determined by the marginal likelihood of the observed data

$L(\boldsymbol\theta; \mathbf{X}) = p(\mathbf{X}|\boldsymbol\theta) = \sum_{\mathbf{Z}} p(\mathbf{X}, \mathbf{Z}|\boldsymbol\theta)$

However, this quantity is often intractable(解決困難な)^{[citation needed]}.

The EM algorithm seeks to find the MLE of the marginal likelihood by iteratively applying the following two steps:

ステップ1．E step

Eステップでは、完全データの対数尤度関数ln p(X, Z|θ)の期待値を計算する。最初の試行ではθに初期値を与える必要がある。また、この期待値を計算するために、潜在変数の事後分布p(Z|X, θold)を用いる。

対数尤度関数の期待値、という概念が分かりにくいので、追加説明。

確率論において、確率変数の期待値とは確率と確率変数を掛けた総和を取ったものである（高校数学）。ここでは、確率変数がln p(X, Z|θ)であり、その確率変数が出る確率が、p(Z|X, θold)となる。

wikiでは、
1/ Expectation step (E step): Calculate the expected value of the log likelihood function, with respect to the conditional distribution of $\mathbf{Z}$ given $\mathbf{X}$ under the current estimate of the parameters $\boldsymbol\theta^{(t)}$ :

$Q(\boldsymbol\theta|\boldsymbol\theta^{(t)}) = \operatorname{E}_{\mathbf{Z}|\mathbf{X},\boldsymbol\theta^{(t)}}\left[ \log L (\boldsymbol\theta;\mathbf{X},\mathbf{Z}) \right] \,$

となっているが、

この表現よりもPRML(equation 9.30, p156)の方が分かりやすい。つまり、

Q(θ, θold) = Σ p(Z|X, θold) ln p(X, Z|θ)

ステップ2．M step

Mステップでは、E ステップで求まった尤度の期待値を最大化するようなパラメータを求める。 M ステップで求まったパラメータは、次の E ステップで使われる潜在変数の分布を決定するために用いられる。

具体的には、θで偏微分した式イコール0となる等式を解く。

2/ Maximization step (M step): Find the parameter that maximizes this quantity

$\boldsymbol\theta^{(t+1)} = \underset{\boldsymbol\theta}{\operatorname{arg\,max}} \ Q(\boldsymbol\theta|\boldsymbol\theta^{(t)}) \,$ 　

*不完全データ（または隠れ変数、潜在変数）

通常Zで表す。
一般的にZとは、観測できている確率変数Xと区別される、観測できてない確率変数（XとZは対等）。とくに混合分布では、あるデータがどの分布に属するかという情報（ここではとりあえずこの意味に限る）。この場合、zは各xに与えられるK次元ベクトル（Kを分布の数）で、データxがどのガウス分布から生成されたかを表す。属する分布番号のみ1、他は0。

例1．2 次元の正規分布が 6 つ重なった正規混合分布を例に考えよう．この確率分布からデータが得られているとする．この場合，出力されるデータのみからはどの正規分布によって出力されたかが分らない．すなわち，どの正規分布から出力されたかという情報は観測できない．この「どの正規分布から発生したか」という情報が隠れ変数となる．正規混合分布の場合は隠れている確率変数は離散的な確率変数である．

例2．今、何カ国かの男性の身長データのサンプルがN個混在してあります。ただし、N個のデータはすべて、どの国の男性の身長データなのか分かっていません(不完全データ)。このとき、EMアルゴリズムを用いて、「何カ国のデータが混在しているのか※1」と、「各国の身長の平均値と分散※2」を同時に求めたいのですが、・・・（省略）