normal_ function(x,c=log(10),k1=200,k2=100,tol=1e-4,M=1e6,pi1=0.5,pi2=0.5,mu1=-0.5,mu2=0.5) { # Test for homogeneity in a two-component normal location mixture, # in which each component has unit variance. Results are given # for the likelihood ratio test, the modified likelihood ratio test # (Chen, Chen, and Kalbfleisch), the D-Test based on unpenalized maximum # likelihood estimation, and the D-Test based on penalized maximum # likelihood estimation. # # Scenario: # H_0: There exists mu_0 such that # p_1 f(x | mu_1, sigma^2_1 = 1) + p_2 f(x | mu_2, sigma^2_2 = 1) # may be written as f(x | mu_0, sigma^2_0 = 1). # H_1: There is no such mu_0. # # Required Input: # x is a vector of data, rescaled (if necessary) so that the presumption # of unit variance within each component is tenable. This # rescaling can be done using scaleunit.S or newscaleunit.S . # # Optional Inputs: # c is a penalty parameter governing the strength of # a log(4 p_1 p_2) term that is attached to # the log likelihood function in a penalized maximum # likelihood estimation scheme. # k1 is the maximum number of iterations for computing # unpenalized maximum likelihood estimates using the # expectation maximization algorithm. # k2 is the maximum number of iterations for computing # penalized maximum likelihood estimates using the # expectation maximization algorithm, with the # unpenalized maximum likelihood estimates taken as # the starting values. # tol is the largest sum of relative changes in parameter # estimates for which the expectation maximization # algorithm will terminate short of the maximum # number of iterations. # M defines the parameter space for mu as [-M, M]. # Finite M is a technical requirement, but in practice # M can be taken so large as to be essentially infinite. # pi1, pi2, mu1, mu2 are initial values for the expectation # maximization algorithm used to obtain # unpenalized maximum likelihood estimates. # These initial values can be obtained # by method of moments estimation or # by visual inspection of an undersmoothed # histogram of the data. # # Outputs: # lrt (Unpenalized) Likelihood Ratio Test: # critical values must be estimated via simulation # mlrt Modified (Penalized) Likelihood Ratio Test # developed by Chen, Chen, and Kalbfleisch: # for a large-sample level alpha test, compare to # the upper 2*alpha quantile of a chi square distribution # on one degree of freedom # pmlrt large-sample p-value for Modified Likelihood Ratio Test # ('LARGE' = 0.30 or more) # dk D-Test based on unpenalized maximum likelihood estimation: # critical values must be estimated via simulation; # if you are content with a simulation size of 10000 # (typical relative standard error between two and five percent), # you can access our simulation results through normalc.S # dkpen D-Test based on penalized maximum likelihood estimation: # for large-sample level alpha test, compare to the upper # 2*alpha quantile of a chi square distribution on one degree # of freedom multiplied by (3/16) pi^{-1/2} / length(x) # pdkpen large-sample p-value for D-Test based on # penalized maximum likelihood estimation # ('LARGE' = 0.30 or more) # nit1 number of expectation-maximization iterations to obtain # the unpenalized maximum likelihood estimates of the # mixture parameters # mu1, mu2, pi1, pi2 unpenalized maximum likelihood estimates # nit2 number of expectation-maximization iterations to obtain # the penalized maximum likelihood estimates of the # mixture parameters # mu1pen, mu2pen, pi1pen, pi2pen penalized maximum likelihood estimates n_ length(x) # n is the "sample size" mu0_ mean(x) mu1pen_ mu1 mu2pen_ mu2 pi1pen_ pi1 pi2pen_ pi2 lrt_ 0 mlrt_ 0 nit1_ 0 nit2_ 0 storage.mode(x)_"double" storage.mode(mu1)_"double" storage.mode(mu2)_"double" storage.mode(pi1)_"double" storage.mode(pi2)_"double" storage.mode(mu1pen)_"double" storage.mode(mu2pen)_"double" storage.mode(pi1pen)_"double" storage.mode(pi2pen)_"double" storage.mode(c)_"double" storage.mode(M)_"double" storage.mode(tol)_"double" storage.mode(mu0)_"double" storage.mode(lrt)_"double" storage.mode(mlrt)_"double" # Here we compute parameter estimates under both # unpenalized and penalized maximum likelihood schemes. w_.Fortran("normal", x=as.double(x), n=as.integer(n), nit1=as.integer(nit1), nit2=as.integer(nit2), c=as.double(c), M=as.double(M), k1=as.integer(k1), k2=as.integer(k2), tol=as.double(tol), pi1=as.double(pi1), pi2=as.double(pi2), mu1=as.double(mu1), mu2=as.double(mu2), pi1pen=as.double(pi1pen), pi2pen=as.double(pi2pen), mu1pen=as.double(mu1pen), mu2pen=as.double(mu2pen), mu0=as.double(mu0), lrt=as.double(lrt), mlrt=as.double(mlrt)) pi1_ w$pi1 pi2_ w$pi2 mu1_ w$mu1 mu2_ w$mu2 lrt_ w$lrt pi1pen_ w$pi1pen pi2pen_ w$pi2pen mu1pen_ w$mu1pen mu2pen_ w$mu2pen mlrt_ w$mlrt nit1_ w$nit1 nit2_ w$nit2 # Here we compute the D-test statistics based on # the parameter estimates we have. # unpenalized maximum likelihood dkp1_ (pi1^2 / (2 * sqrt(pi) ) ) + ( pi2^2 / (2 * sqrt(pi) ) ) + 1 / (2 * sqrt(pi)) dkp2a_ 2 * pi1 * exp(-0.5*((mu1^2)+(mu0^2)) + 0.5*((mu1+mu0)^2/(2))) dkp2b_ 2 * pi2 * exp(-0.5*((mu2^2)+(mu0^2)) + 0.5*((mu2+mu0)^2/(2))) dkp3_ 2 * pi1 * pi2 * exp(-0.5*((mu2^2)+(mu1^2)) + 0.5*((mu2+mu1)^2/(2))) dk_ dkp1 - dkp2a/sqrt(4*pi) - dkp2b/sqrt(4*pi) + dkp3/sqrt(4*pi) # penalized maximum likelihood dkp1_ (pi1pen^2 / (2 * sqrt(pi) ) ) + ( pi2pen^2 / (2 * sqrt(pi) ) ) + 1 / (2 * sqrt(pi)) dkp2a_ 2 * pi1pen * exp(-0.5*((mu1pen^2)+(mu0^2)) + 0.5*((mu1pen+mu0)^2/(2))) dkp2b_ 2 * pi2pen * exp(-0.5*((mu2pen^2)+(mu0^2)) + 0.5*((mu2pen+mu0)^2/(2))) dkp3_ 2 * pi1pen * pi2pen * exp(-0.5*((mu2pen^2)+(mu1pen^2)) + 0.5*((mu2pen+mu1pen)^2/(2))) dkpen_ dkp1 - dkp2a/sqrt(4*pi) - dkp2b/sqrt(4*pi) + dkp3/sqrt(4*pi) pmlrt_ 'LARGE' if (mlrt > .2749959) pmlrt_ 0.5-0.5*pchisq(mlrt,1) pdkpen_ 'LARGE' sdkpen_ (16/3)*sqrt(pi)*n*dkpen if (sdkpen > .2749959) pdkpen_ 0.5-0.5*pchisq(sdkpen,1) return(lrt,mlrt,pmlrt,dk,dkpen,pdkpen,nit1,mu1,mu2,pi1,pi2,nit2,mu1pen,mu2pen,pi1pen,pi2pen) result }