Inferences on Relative Failure Rates in Stratified Mark-Specific Proportional Hazards Models with Missing Marks, with Application to Human Immunodeficiency Virus Vaccine Efficacy Trials (2024)

Yanqing Sun

University of North Carolina at Charlotte

USA

Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 64, Issue 1, January 2015, Pages 49–73, https://doi.org/10.1111/rssc.12067

Published:

03 July 2014

Article history

Received:

01 February 2013

Accepted:

01 March 2014

Peter B. Gilbert, Yanqing Sun, Inferences on Relative Failure Rates in Stratified Mark-Specific Proportional Hazards Models with Missing Marks, with Application to Human Immunodeficiency Virus Vaccine Efficacy Trials, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 64, Issue 1, January 2015, Pages 49–73, https://doi.org/10.1111/rssc.12067

Summary

The paper develops hypothesis testing procedures for the stratified mark-specific proportional hazards model in the presence of missing marks. The motivating application is preventive human immunodeficiency virus (HIV) vaccine efficacy trials, where the mark is the genetic distance of an infecting HIV sequence to an HIV sequence represented inside the vaccine. The test statistics are constructed on the basis of two-stage efficient estimators, which utilize auxiliary predictors of the missing marks. The asymptotic properties and finite sample performances of the testing procedures are investigated, demonstrating double robustness and effectiveness of the predictive auxiliaries to recover efficiency. The methods are applied to the RV144 vaccine trial.

Augmented inverse probability weighting, Auxiliary marks, Competing risks failure time data, Genetic data, Proportional hazards model, Semiparametric model

1. Introduction

The primary objective of a preventive human immunodeficiency virus (HIV) vaccine efficacy trial is to assess vaccine efficacy VE to prevent HIV infection, where typically VE is defined as 1 minus the hazard ratio (vaccine/placebo) of HIV infection diagnosis. However, the great genetic variability of HIV poses a central challenge to developing a highly efficacious vaccine (Fauci et al., 2008). The trial population is exposed to many HIV genotypes but the vaccine contains only a few, and the vaccine is less likely to protect against HIVs with greater genetic distance from the sequences inside the vaccine (Gilbert et al., 1999). The trial has objectives to assess whether and how the vaccine impacts the infection rate with any HIV genotype and whether and how the vaccine effect varies by HIV genotype; assessment of this objective has been named ‘sieve analysis’ (Gilbert et al., 1998). Gilbert et al. (2008), Sun et al. (2009) and Sun and Gilbert (2012) developed sieve analysis methods using the competing risks failure time framework (Prentice et al., 1978), which attach a continuous ‘mark’ variable to HIV-infected subjects that measures the genetic distance of an infecting HIV sequence to a sequence inside the vaccine. The goal of the sieve analysis methods is evaluation of mark-specific vaccine efficacy, which here is defined as 1 minus the mark-specific hazard ratio (vaccine/placebo) of infection. Beyond HIV, the methods apply generally to any preventative vaccine efficacy trial for which the pathogen targeted by the vaccine is genetically diverse, which includes influenza, malaria, tuberculosis, dengue, streptococcus pneumoniae, human papilloma virus and hepatitis C virus.

Gilbert et al. (2008) and Sun et al. (2009) assumed no missing mark data in infected subjects, whereas Sun and Gilbert (2012) allowed marks missing at random. In practice there are missing marks; for example in the Vax004 trial 32 of 368 infected subjects had no HIV sequence data (Gilbert et al., 2008), owing to drop-out or to inability of the HIV sequencing technology to measure the infecting HIV sequence, and in the ‘Step’ trial 22 of 88 infected subjects had no HIV sequence data (Rolland et al., 2011). Whereas it is of scientific interest to evaluate a mark defined on the basis of the earliest available HIV sequence, a mark of particular scientific interest is defined on the basis of an HIV sequence measured near the time of acquisition, which is missing in a much larger fraction of infected subjects owing to the periodic (typically 6-monthly) diagnostic tests for HIV infection. Specifically, HIV sequences are measured from the earliest available post-infection blood sample, and a ‘near acquisition’ or ‘early’ sample may be defined as a sample that has been documented to be sufficiently near acquisition. In the Step trial, only 23 of the 66 infected subjects with sequence data had an early mark measured, defined as sampling within 3 weeks. Sun and Gilbert (2012) have provided details on the HIV testing algorithm that is used to define an early mark.

Sun and Gilbert (2012) is the only reference on sieve analysis that accommodates missing continuous marks. It develops two valid estimation approaches based on the stratified mark-specific proportional hazards model. The first uses inverse probability weighting (IPW) of the complete-case estimator, which leverages auxiliary predictors of whether the mark is observed, whereas the second, adapting Robins et al. (1994), augments the IPW complete-case estimator with auxiliary predictors of the missing marks. Sun and Gilbert (2012) restricted attention to estimation methods, and this paper is a sequel that develops corresponding inferential or hypothesis testing methods based on the augmented IPW estimator. An important new component of this work compared with the previous work is to centre it on the sieve analysis of the RV144 Thai trial, which recently delivered the landmark result that a prime boost HIV vaccine appeared to provide partial protection against HIV infection (estimated VE =31%; 95% confidence interval 1–51%; p = 0.04; Rerks-Ngarm et al. (2009)). This result has stimulated intense interest in sieve analysis, for two reasons. First, there is controversy about whether the vaccine is really partially working versus a false positive result (Gilbert et al., 2011), and the sieve analysis of HIV sequences can help to resolve this question. In particular, if evidence is found that the vaccine efficacy declines with genetic distance, and the distance is defined on the basis of known parts of HIV that contain putatively protective antibody epitopes, then an interpretation of real vaccine efficacy is supported. Secondly, the HIV vaccine field is grappling with how to modify the tested vaccine to increase its potential vaccine efficacy for the next efficacy trial, and understanding the relationship between vaccine efficacy and the genetic distance provides direct guidance on which HIV sequences to put inside the next generation vaccines.

This paper is organized as follows. Notation, assumptions and the stratified mark-specific proportional hazards model are introduced in Section 2. Background on the estimation procedures that are needed for the testing procedures is given in Section 3. The testing procedures are developed, and their asymptotic properties described, in Section 4. The finite sample performances of the tests are evaluated via simulations in Section 5. The application to the Thai trial is given in Section 6, and the asymptotic results and their proofs are placed in Appendix A.

The programs that were used to analyse the data can be obtained from

https://academic.oup.com/jrsssc/issue/

2. Model and missing mark data

2.1. Stratified mark-specific proportional hazards model

Let T be the failure time, V a continuous mark variable with bounded support [0,1] and Z(t) a possibly time-dependent p-dimensional covariate. The mark V is observable only when T is observed. Suppose that the conditional mark-specific hazard function at time t given the covariate history Z(s), for s≤t, depends on the current value Z(t) only. We consider the stratified mark-specific proportional hazards model

$λ_{k} \{t, v | z (t)\} = λ_{0 k} (t, v) \exp \{β (v)^{T} z (t)\}, \quad k = 1, \dots, K,$

(1)

where $λ_{k} \{t, v | z (t)\}$ is the conditional mark-specific hazard function given covariate z(t) for an individual in the kth stratum, $λ_{0 k} (\cdot, v) = λ_{k} \{t, v | z (t) = 0\}$ is the unspecified baseline hazard function for the kth stratum, β(v) is the p-dimensional unknown regression coefficient function of v and K is the number of strata. Model (1) allows different baseline functions for different strata and flexibly allows for arbitrary mark-specific infection hazards over time in the placebo group. In practice, different key subgroups (e.g. men and women in the Thai trial) are assigned different baseline mark-specific hazards of HIV infection.

Arranging $β (v) = (β_{1} (v), β_{2}^{T} (v))^{T}$, so that β₁(v) is the coefficient for vaccination status and β₂(v) is the coefficient vector for the other covariates, the covariate- and stratum-adjusted mark-specific vaccine efficacy is VE(v)=1− exp {β₁(v)}. Sun et al. (2009) developed statistical procedures for model (1) under K = 1 based on observations of the random variables (X,Z(·),V) for δ = 1 and (X,Z(·)) for δ = 0, where X = min{T,C}, δ = I(T≤C) and C is a censoring random variable. Sun and Gilbert (2012) developed estimation procedures for model (1) allowing V to be missing for some subjects with δ = 1; these methods incorporate auxiliary covariates and/or auxiliary mark variables that inform about the probability that V is observed and about the distribution of V. This paper develops parallel hypothesis testing procedures for assessing VE(v). As summarized in Section 1, the two objectives are to assess whether the vaccine efficacy ever deviates from 0 (i.e. test VE(v)=0) and to assess whether the vaccine efficacy changes with the mark (i.e. test VE(v)=VE).

2.2. Missing data assumptions

Let R be the indicator of whether all possible data are observed for a subject; R = 1 if either δ = 0 (right censored) or if δ = 1 and V is observed; and R = 0 otherwise. Auxiliary variables A may be helpful for predicting missing marks. Since the mark can only be missing for failures, supplemental information is potentially useful only for failures, for predicting missingness and for informing about the distribution of missing marks. For example, if V is defined on the basis of the early virus, then V^*, the auxiliary mark information, may include sequences of later sampled viruses and can be considered a subset of A. In general, A could include multiple viral sequences per infected subject at multiple time points, giving information on intrasubject HIV evolution. The relationship between A and V can be modelled to help to predict V (see Section 5 for a simulated example).

We assume that C is conditionally independent of (T,V) given Z(·) and the stratum. We also assume that V is missing at random (Rubin, 1976), i.e., given δ = 1 and W=(T,Z(T),A), the probability that V is missing depends only on the observed W, not on the value of V; this assumption is expressed as

$r_{k} (W) \equiv P (R = 1 | δ = 1, W) = P (R = 1 | V, δ = 1, W) .$

(2)

Let $π_{k} (Q) = P (R = 1 | Q)$ where Q=(δ,W). Then π_k(Q)=δr_k(W)+1−δ. The missingness at random (MAR) assumption (2) also implies that V is independent of R given Q:

$ρ_{k} (v, W) \equiv P (V ⩽ v | δ = 1, W) = P (V ⩽ v | R = 1, δ = 1, W) .$

(3)

Define $r_{k} (w) = P (R = 1 | δ = 1, W = w)$ and $ρ_{k} (v, w) = P (V ⩽ v | δ = 1, W = w)$ ⁠. The stratum-specific definitions of r_k(w) and ρ_k(v,w) allow the models of the probability of complete-case and of the mark distribution to differ across strata.

Let τ be the end of the follow-up period, and $n_{k}$ be the number of subjects in the kth stratum; the total sample size is $n = Σ_{k = 1}^{K} n_{k}$. Let $\{X_{k i}, Z_{k i} (\cdot), δ_{k i}, R_{k i}, V_{k i}, A_{k i}; i = 1, \dots, n_{k}\}$ be independent and identically distributed replicates of {X,Z(·),δ,R,V,A} from the kth stratum. The observed data are $\{O_{k i}; i = 1, \dots, n_{k}, k = 1, \dots, K\}$, where O_ki={X_ki,Z_ki(·),R_ki,R_kiV_ki,A_ki} for δ_ki=1 and O_ki={X_ki,Z_ki(·),R_ki=1} for δ_ki=0. We assume that the O_ki are independent for all subjects.

2.3. Hypotheses to test

We develop procedures for testing the following two sets of hypotheses. Let [a,b]⊂(0,1). The first set of hypotheses is

$\begin{array}{ll} & H_{10} : VE (v) = 0 \text{ for } v \in [a, b] \\ \text{versus} & H_{1 a} : VE (v) \neq 0 \text{ for some } v \text{ (general alternative)} \\ \text{or} & H_{1 m} : VE (v) ⩾ 0 \text{ with strict inequality for some } v \text{ (monotone alternative)} . \end{array}$

The second set of hypotheses is

$\begin{array}{ll} & H_{20} : VE (v) \text{ does not depend on } v \in [a, b] \\ \text{versus} & H_{2 a} : VE (v) \text{ depends on } v \text{ (general alternative)} \\ \text{or} & H_{2 m} : VE (v) \text{ decreases as } v \text{ increases (monotone alternative)} . \end{array}$

The null hypothesis H₁₀ implies that the vaccine affords no protection (nor increased risk) against any HIV genotype. The ordered alternative H_1m indicates that the vaccine provides protection for at least some of the HIV genotypes, whereas H_1a indicates that the vaccine provides protection and/or increased risk for some HIV genotypes. The null hypothesis H₂₀ implies that there is no difference in vaccine protection against different HIV genotypes. The ordered alternative H_2m indicates that vaccine efficacy decreases with v and H_2a indicates that the vaccine efficacy changes with v. With β₁(v) the first component of β(v), the first set of hypotheses is equivalent to H₁₀:β₁(v)=0 for v ∈ [a,b] versus H_1a:β₁(v)≠0 for some v or H_1m:β₁(v)≤0 with strict inequality for some v. The second set of hypotheses is equivalent to H₂₀:β₁(v) does not depend on v ∈ [a,b] versus H_2a:β₁(v) depends on v or H_2m:β₁(v) increases as v increases. We develop testing procedures for detecting departures from H₁₀ in the direction of H_1m and H_1a and for detecting departures from H₂₀ in the direction of H_2m and H_2a. The procedures are developed on the basis of the augmented IPW complete-case estimator that was developed by Sun and Gilbert (2012).

3. Estimation procedure with missing marks

The augmented IPW (AIPW) estimator for model (1) is obtained in two stages. First the IPW complete-case estimator is derived and second the AIPW estimator is obtained, which improves efficiency by accounting for information in the conditional distribution of V given the auxiliaries.

Let r_k(W_ki, ψ_k) be the parametric model for the probability of the complete case, with r_k(W_ki) defined in equation (2), where W_ki=(T_ki,Z_ki(T_ki),A_ki) and ψ_k is a q-dimensional parameter. For example, we can assume the logistic model with $\text{logit} \{r_{k} (W_{k i}, ψ_{k})\} = ψ_{k}^{T} W_{k i}$ for those with δ_ki=1. By equation (2), the maximum likelihood estimator $\hat{ψ} = ({\hat{ψ}}_{1}, \dots, {\hat{ψ}}_{K})$ of ψ=(ψ₁,…,ψ_K) is obtained by maximizing the observed data likelihood

$\prod_{k, i} r_{k} (W_{k i}, ψ_{k})^{R_{k i} δ_{k i}} \{1 - r_{k} (W_{k i}, ψ_{k})\}^{(1 - R_{k i}) δ_{k i}} .$

(4)
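As a rough illustration of how likelihood (4) might be maximized for a single stratum, the sketch below fits the logistic model by Newton-Raphson and then forms π_k(Q)=δr_k(W)+1−δ. The function names, the simulated data and the choice of an intercept plus treatment indicator as W are assumptions made for illustration only; they are not the authors' programs.

```python
import numpy as np

def fit_missingness_model(W, R, delta, n_iter=25):
    """Maximise likelihood (4) for one stratum under a logistic model.

    W     : (n, q) design matrix (include a column of ones for an intercept)
    R     : (n,) complete-case indicators
    delta : (n,) failure indicators; only rows with delta == 1 contribute
    """
    Wf, Rf = W[delta == 1], R[delta == 1]
    psi = np.zeros(W.shape[1])
    for _ in range(n_iter):  # Newton-Raphson on the Bernoulli log-likelihood
        r = 1.0 / (1.0 + np.exp(-Wf @ psi))
        score = Wf.T @ (Rf - r)
        info = (Wf * (r * (1.0 - r))[:, None]).T @ Wf
        psi = psi + np.linalg.solve(info, score)
    return psi

def complete_case_probability(W, delta, psi):
    """pi_k(Q) = delta * r_k(W, psi) + 1 - delta."""
    r = 1.0 / (1.0 + np.exp(-W @ psi))
    return delta * r + 1 - delta

# Hypothetical data: intercept and a binary covariate, marks missing only for failures.
rng = np.random.default_rng(0)
n = 4000
z = rng.integers(0, 2, n)
W = np.column_stack([np.ones(n), z])
delta = rng.integers(0, 2, n)
true_psi = np.array([0.2, -0.2])
r_true = 1.0 / (1.0 + np.exp(-W @ true_psi))
R = np.where(delta == 1, rng.random(n) < r_true, 1).astype(float)

psi_hat = fit_missingness_model(W, R, delta)
pi_hat = complete_case_probability(W, delta, psi_hat)
```

Censored subjects receive weight 1 by construction, so only failures with observed marks are reweighted by 1/r_k.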

Let K(x) be a kernel function with support [−1,1] and let h = h_n be a bandwidth. Let N_ki(t,v)=I(X_ki≤t,δ_ki=1,V_ki≤v) and Y_ki(t)=I(X_ki≥t). Let Q_ki=(δ_ki,W_ki) and π_k(Q_ki, ψ_k)=δ_kir_k(W_ki, ψ_k)+1−δ_ki. The first-stage estimator is the IPW estimator ${\hat{β}}^{ipw} (v)$ ⁠, which solves the estimating equation for β $U_{ipw} (v, β, \hat{ψ}) = 0$ ⁠, where

$U_{ipw} (v, β, \hat{ψ}) = \sum_{k = 1}^{K} \sum_{i = 1}^{n_{k}} \int_{0}^{1} \int_{0}^{τ} K_{h} (u - v) \{Z_{k i} (t) - {\tilde{Z}}_{k} (t, β, {\hat{ψ}}_{k})\} \frac{R_{k i}}{π_{k} (Q_{k i}, {\hat{ψ}}_{k})} N_{k i} (d t, d u) .$

(5)

Here

$K_{h} (x) = \frac{K (x / h)}{h},$

${\tilde{Z}}_{k} (t, β, ψ_{k}) = \frac{{\tilde{S}}_{k}^{(1)} (t, β, ψ_{k})}{{\tilde{S}}_{k}^{(0)} (t, β, ψ_{k})}$

and

${\tilde{S}}_{k}^{(j)} (t, β, ψ_{k}) = n_{k}^{- 1} \sum_{i = 1}^{n_{k}} R_{k i} \{π_{k} (Q_{k i}, ψ_{k})\}^{- 1} Y_{k i} (t) \exp \{β^{T} Z_{k i} (t)\} Z_{k i} (t)^{\otimes j}$

for j = 0,1, where z^⊗0=1 and z^⊗1=z for any $z \in R^{p}$ ⁠. The score function (5) can be viewed as an extension of the score function that is used for the cause-specific Cox model (Prentice et al., 1978) for a particular failure cause J = j, for which the counting process counts only events of type j. It borrows strength from observations having marks in the neighbourhood of v. The kernel function is designed to give greater weight to observations with marks near v than those further away.
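To make the role of the kernel weights K_h(u−v) concrete, here is a minimal sketch. The text only requires a kernel with support [−1,1]; the Epanechnikov kernel, the bandwidth and the mark values used here are illustrative assumptions.

```python
import numpy as np

def epanechnikov(x):
    """A kernel with support [-1, 1]; an illustrative choice, not mandated by the text."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)

def scaled_kernel(x, h):
    """K_h(x) = K(x / h) / h."""
    return epanechnikov(np.asarray(x, dtype=float) / h) / h

# Hypothetical observed marks and a target mark v = 0.30 with bandwidth h = 0.1.
marks = np.array([0.10, 0.28, 0.30, 0.35, 0.60])
weights = scaled_kernel(marks - 0.30, h=0.1)
```

Marks further than h from v receive weight 0, so only failures with marks in (v−h, v+h) contribute to the estimating equation at v.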

The baseline function λ_0k(t,v) can be estimated by ${\hat{λ}}_{0 k}^{ipw} (t, v)$ ⁠, obtained by smoothing the increments of the following estimator of the doubly cumulative baseline function $Λ_{0 k} (t, v) = \int_{0}^{t} \int_{0}^{v} λ_{0 k} (s, u) d s d u$ ⁠:

${\hat{Λ}}_{0 k}^{ipw} (t, v) = \sum_{i = 1}^{n_{k}} \int_{0}^{t} \int_{0}^{v} \frac{R_{k i}}{π_{k} (Q_{k i}, {\hat{ψ}}_{k})} \frac{N_{k i} (d s, d u)}{n_{k} {\tilde{S}}_{k}^{(0)} \{s, {\hat{β}}^{ipw} (u), {\hat{ψ}}_{k}\}} .$

(6)

For example, one can use the following kernel smoothing:

${\hat{λ}}_{0 k}^{ipw} (t, v) = \int_{0}^{τ} \int_{0}^{1} K_{h_{1}}^{(1)} (t - s) K_{h_{2}}^{(2)} (v - u) {\hat{Λ}}_{0 k}^{ipw} (d s, d u),$

(7)

where $K_{h_{1}}^{(1)} (x) = K^{(1)} (x / h_{1}) / h_{1}$ and $K_{h_{2}}^{(2)} (x) = K^{(2)} (x / h_{2}) / h_{2}$ ⁠, with K⁽¹⁾(·) and K⁽²⁾(·) the kernel functions and $h_{1}$ and $h_{2}$ the bandwidths.

Following Robins et al. (1994), Sun and Gilbert (2012) proposed a more efficient procedure for estimating β(v) in model (1) by incorporating knowledge of ρ_k(w,v) in the estimation procedure. Let w=(t,z,a) and $g_{k} (a | t, v, z) = P (A_{k i} = a | T_{k i} = t, V_{k i} = v, Z_{k i} = z, δ_{k i} = 1)$. Then

$ρ_{k} (w, v) = \int_{0}^{v} λ_{k} (t, u | z) g_{k} (a | t, u, z) d u / \int_{0}^{1} λ_{k} (t, u | z) g_{k} (a | t, u, z) d u .$

(8)

If no auxiliary variables are available or if A_ki is conditionally independent of V_ki given (T_ki,Z_ki, δ_ki), then

$ρ_{k} (w, v) = \int_{0}^{v} λ_{k} (t, u | z) d u / \int_{0}^{1} λ_{k} (t, u | z) d u .$

In this case, ρ_k(w,v) can be estimated by

${\hat{ρ}}_{k}^{ipw} (w, v) = \int_{0}^{v} {\hat{λ}}_{k}^{ipw} (t, u | z) d u / \int_{0}^{1} {\hat{λ}}_{k}^{ipw} (t, u | z) d u,$

where ${\hat{λ}}_{k}^{ipw} (t, u | z) = {\hat{λ}}_{0 k}^{ipw} (t, u) \exp \{{\hat{β}}^{ipw} (u)^{T} z\}$. When the auxiliary marks A_ki are correlated with V_ki conditional on T_ki, Z_ki and δ_ki=1, the conditional distribution ρ_k(w,v) involves the function $g_{k} (a | t, u, z)$, for which a parametric or semiparametric model may be developed to describe the dependence between A_ki and V_ki. Let ${\hat{g}}_{k} (a | t, u, z)$ be an estimator of $g_{k} (a | t, u, z)$ with a convergence rate of at least $(n h)^{- 1 / 2}$. Then ρ_k(w,v) can be estimated by

${\hat{ρ}}_{k}^{ipw} (w, v) = \int_{0}^{v} {\hat{λ}}_{k}^{ipw} (t, u | z) {\hat{g}}_{k} (a | t, u, z) d u / \int_{0}^{1} {\hat{λ}}_{k}^{ipw} (t, u | z) {\hat{g}}_{k} (a | t, u, z) d u .$

(9)

Let $N_{k i}^{x} (t) = I (X_{k i} ⩽ t, δ_{k i} = 1)$ and $N_{k i}^{v} (v) = I (V_{k i} ⩽ v)$. The AIPW estimating equation for β is $U_{aug} \{v, β, \hat{ψ}, \hat{ρ} (\cdot)\} = 0$, where

$\begin{array}{rcl} U_{aug} \{v, β, \hat{ψ}, \hat{ρ} (\cdot)\} & = & \sum_{k = 1}^{K} \sum_{i = 1}^{n_{k}} \int_{0}^{1} \int_{0}^{τ} K_{h} (u - v) \{Z_{k i} (t) - {\bar{Z}}_{k} (t, β)\} \\ & & \times \left[ \frac{R_{k i}}{π_{k} (Q_{k i}, {\hat{ψ}}_{k})} N_{k i} (d t, d u) + \left\{ 1 - \frac{R_{k i}}{π_{k} (Q_{k i}, {\hat{ψ}}_{k})} \right\} N_{k i}^{x} (d t) \, d {\hat{ρ}}_{k}^{ipw} (W_{k i}, u) \right], \end{array}$

(10)

and

${\bar{Z}}_{k} (t, β) = S_{k}^{(1)} (t, β) / S_{k}^{(0)} (t, β)$

and

$S_{k}^{(j)} (t, β) = n_{k}^{- 1} \sum_{i = 1}^{n_{k}} Y_{k i} (t) exp {β^{T} Z_{k i} (t)} Z_{k i} {(t)}^{\otimes j}$

for j = 0,1. The AIPW estimator of β(v) solves equation (10) and is denoted by ${\hat{β}}^{aug} (v)$ ⁠. The estimator of the cumulative function $B (v) = \int_{0}^{v} β (u) d u$ is given by ${\hat{B}}^{aug} (v) = \int_{0}^{v} {\hat{β}}^{aug} (u) d u$ ⁠. Note that there is no ${\hat{ψ}}_{k}$ in ${\bar{Z}}_{k} (t, β)$ ⁠; this is a difference between the IPW and AIPW estimators.
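The cumulative estimator is an integral of the pointwise estimator, so in practice it would be computed numerically on a grid of marks. The sketch below uses the trapezoidal rule; the grid and the pointwise values are hypothetical stand-ins for ${\hat{β}}^{aug}$ evaluated at grid points, and the integral starts at the first grid point.

```python
import numpy as np

def cumulative_coefficient(v_grid, beta_hat):
    """Trapezoidal approximation to the integral of beta from v_grid[0] up to each grid point."""
    steps = np.diff(v_grid) * 0.5 * (beta_hat[1:] + beta_hat[:-1])
    return np.concatenate([[0.0], np.cumsum(steps)])

# Hypothetical pointwise estimates of the vaccine coefficient on a grid of marks.
v_grid = np.linspace(0.0, 1.0, 201)
beta_hat = -0.6 + 0.6 * v_grid
B_hat = cumulative_coefficient(v_grid, beta_hat)
```

Differences such as B_hat[j] − B_hat[i] then approximate B(v_j)−B(v_i), which is the quantity that enters the test processes of Section 4.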

To implement the estimation procedures in practice, one can use arbitrary auxiliaries for estimating ${\hat{ψ}}_{k}$ ⁠; these auxiliaries may include covariates and marks at multiple time points pre infection and post infection respectively. In contrast, although in principle arbitrary auxiliaries may also be used for the terms ${\hat{g}}_{k} (a | t, u, z)$ in equation (9), owing to the curse of dimensionality the method is expected to perform best in practice with a univariate auxiliary, where semiparametric or fully parametric models for $g_{k} (a | t, u, z)$ would be required to include multivariate auxiliaries.

Sun and Gilbert (2012) proved that the estimators ${\hat{β}}^{ipw} (v)$ and ${\hat{β}}^{aug} (v)$ are consistent and that ${\hat{β}}^{aug} (v)$ is more efficient than ${\hat{β}}^{ipw} (v)$. In the next section, we develop hypothesis testing procedures for assessing mark-specific vaccine efficacy based on ${\hat{B}}^{aug} (v)$.

4. Testing of mark-specific vaccine efficacy

The covariate-adjusted vaccine efficacy VE(v) is defined through the first component of β(v). Let $B_{1} (v)$ be the first component of the cumulative coefficient function B(v). The hypothesis tests concerning VE(v) are constructed on the basis of the first component ${\hat{B}}_{1}^{aug} (v)$ of the AIPW estimator ${\hat{B}}^{aug} (v)$ ⁠. The cumulative estimator ${\hat{B}}^{aug} (v)$ has more stable large sample behaviour and a faster convergence rate than ${\hat{β}}^{aug} (v)$ ⁠.

Let $W_{B} (v) = n^{1 / 2} \{{\hat{B}}^{aug} (v) - {\hat{B}}^{aug} (a)\} - n^{1 / 2} \{B (v) - B (a)\}$ for v ∈ [a,b]. In Appendix A we show that W_B(v), v ∈ [a,b], converges weakly to a p-dimensional mean 0 Gaussian process with continuous sample paths on v ∈ [a,b]. Further, the distribution of W_B(v), for v ∈ [a,b], can be approximated by using the Gaussian multipliers resampling method of Lin et al. (1993), based on

$W_{B}^{*} (v) = n^{- 1 / 2} \sum_{k = 1}^{K} \sum_{i = 1}^{n_{k}} ξ_{k i} {\hat{H}}_{k i} (v), v \in [a, b],$

where $\{ξ_{k i}, i = 1, \dots, n_{k}, k = 1, \dots, K\}$ are independent and identically distributed standard normal random variables and ${\hat{H}}_{k i} (v)$ is defined in expression (22) in Appendix A. Let $W_{B_{1}} (v)$ and $W_{B_{1}}^{*} (v)$ be the first components of W_B(v) and $W_{B}^{*} (v)$ respectively. With the Gaussian multipliers method, the variance $\text{var} \{{\hat{B}}_{1}^{aug} (v) - {\hat{B}}_{1}^{aug} (a)\}$ can be consistently estimated by $\widehat{\text{var}} \{{\hat{B}}_{1}^{aug} (v) - {\hat{B}}_{1}^{aug} (a)\} = n^{- 1} \text{var}^{*} \{W_{B_{1}}^{*} (v)\}$, where $\text{var}^{*} \{W_{B_{1}}^{*} (v)\}$ is the first component on the diagonal of the covariance given in expression (23) in Appendix A.

4.1. Testing the null hypothesis H₁₀

Consider the test process $Q^{(1)} (v) = n^{1 / 2} \{{\hat{B}}_{1}^{aug} (v) - {\hat{B}}_{1}^{aug} (a)\}$, v ∈ [a,b]. Then $Q^{(1)} (v) = W_{B_{1}} (v) + n^{1 / 2} \{B_{1} (v) - B_{1} (a)\}$, v ∈ [a,b]. Under H₁₀, $B_{1} (v) - B_{1} (a) = 0$ for v ∈ [a,b], which motivates the following test statistics for testing H₁₀:

$\begin{array}{l} T_{a 1}^{(1)} = \sup_{v \in [a, b]} | Q^{(1)} (v) |, \\ T_{a 2}^{(1)} = \int_{a}^{b} \{Q^{(1)} (v)\}^{2} \, d \, \text{var}^{*} \{W_{B_{1}}^{*} (v)\}, \\ T_{m 1}^{(1)} = \inf_{v \in [a, b]} Q^{(1)} (v), \\ T_{m 2}^{(1)} = \int_{a}^{b} Q^{(1)} (v) \, d \, \text{var}^{*} \{W_{B_{1}}^{*} (v)\} . \end{array}$

The test statistics $T_{a 1}^{(1)}$ and $T_{a 2}^{(1)}$ capture general departures H_1a, whereas the test statistics $T_{m 1}^{(1)}$ and $T_{m 2}^{(1)}$ are sensitive to the monotone departures H_1m. It is easy to derive that all the test statistics $T_{a 1}^{(1)}$ ⁠, $T_{a 2}^{(1)}$ ⁠, $T_{m 1}^{(1)}$ and $T_{m 2}^{(1)}$ are consistent against their respective alternative hypotheses, and Appendix A derives their limiting distributions under H₁₀.

Under H₁₀, the distribution of Q⁽¹⁾(v), v ∈ [a,b], can be approximated by the conditional distribution of $W_{B_{1}}^{*} (v)$, v ∈ [a,b], given the observed data sequence. Hence, the distributions of $T_{a 1}^{(1)}$, $T_{a 2}^{(1)}$, $T_{m 1}^{(1)}$ and $T_{m 2}^{(1)}$ under H₁₀ can be approximated by the conditional distributions of $T_{a 1}^{* (1)} = \sup_{v \in [a, b]} | W_{B_{1}}^{*} (v) |$, $T_{a 2}^{* (1)} = \int_{a}^{b} \{W_{B_{1}}^{*} (v)\}^{2} \, d \, \text{var}^{*} \{W_{B_{1}}^{*} (v)\}$, $T_{m 1}^{* (1)} = \inf_{v \in [a, b]} W_{B_{1}}^{*} (v)$ and $T_{m 2}^{* (1)} = \int_{a}^{b} W_{B_{1}}^{*} (v) \, d \, \text{var}^{*} \{W_{B_{1}}^{*} (v)\}$, given the observed data sequence, respectively. The critical values $c_{a 1}^{(1)}$ and $c_{a 2}^{(1)}$ of the test statistics $T_{a 1}^{(1)}$ and $T_{a 2}^{(1)}$ can be approximated by the (1−α)-quantiles of $T_{a 1}^{* (1)}$ and $T_{a 2}^{* (1)}$, which can be obtained by repeatedly generating a large number, say 500, of independent sets of normal samples $\{ξ_{k i}, i = 1, \dots, n_{k}, k = 1, \dots, K\}$ while holding the observed data sequence fixed. Similarly, the critical values $c_{m 1}^{(1)}$ and $c_{m 2}^{(1)}$ of the test statistics $T_{m 1}^{(1)}$ and $T_{m 2}^{(1)}$ can be approximated by the α-quantiles of $T_{m 1}^{* (1)}$ and $T_{m 2}^{* (1)}$, which again can be obtained by repeatedly generating independent sets of normal samples $\{ξ_{k i}, i = 1, \dots, n_{k}, k = 1, \dots, K\}$. At the significance level α, the tests based on $T_{a 1}^{(1)}$ and $T_{a 2}^{(1)}$ reject H₁₀ in favour of H_1a if $T_{a 1}^{(1)} > c_{a 1}^{(1)}$ and $T_{a 2}^{(1)} > c_{a 2}^{(1)}$ respectively, and the tests based on $T_{m 1}^{(1)}$ and $T_{m 2}^{(1)}$ reject H₁₀ in favour of H_1m if $T_{m 1}^{(1)} < c_{m 1}^{(1)}$ and $T_{m 2}^{(1)} < c_{m 2}^{(1)}$ respectively.
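The resampling recipe for the supremum and infimum statistics can be sketched as follows. The array `H1` is assumed to hold the first component of ${\hat{H}}_{k i} (v)$ evaluated on a grid of [a,b], one row per subject; computing it requires expression (22) in Appendix A, which is not reproduced here, so the example treats it as a given input. The function names are hypothetical and the integral-type statistics are omitted for brevity.

```python
import numpy as np

def multiplier_critical_values(H1, alpha=0.05, n_draws=500, seed=0):
    """Approximate critical values for the sup- and inf-type statistics.

    H1 : (n, m) array; row i is the first component of H_hat for subject i
         on an m-point grid of [a, b] (assumed to be supplied by the user).
    """
    rng = np.random.default_rng(seed)
    n = H1.shape[0]
    xi = rng.standard_normal((n_draws, n))      # Gaussian multipliers, data held fixed
    W_star = xi @ H1 / np.sqrt(n)               # (n_draws, m) draws of W*_{B1}(v)
    sup_stats = np.abs(W_star).max(axis=1)      # draws of T*_{a1}
    inf_stats = W_star.min(axis=1)              # draws of T*_{m1}
    c_a1 = np.quantile(sup_stats, 1.0 - alpha)  # (1 - alpha)-quantile
    c_m1 = np.quantile(inf_stats, alpha)        # alpha-quantile
    return c_a1, c_m1

def h10_decisions(Q1, H1, alpha=0.05):
    """Compare the observed test process Q1 (on the same grid) with the critical values."""
    c_a1, c_m1 = multiplier_critical_values(H1, alpha)
    return {
        "reject_general": bool(np.abs(Q1).max() > c_a1),
        "reject_monotone": bool(Q1.min() < c_m1),
    }
```

The default of 500 draws follows the number suggested in the text; a larger number would reduce Monte Carlo error in the estimated quantiles.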

4.2. Testing the null hypothesis H₂₀

Let

$Q^{(2)} (v) = (v - a)^{- 1} n^{1 / 2} \{{\hat{B}}_{1}^{aug} (v) - {\hat{B}}_{1}^{aug} (a)\} - (b - a)^{- 1} n^{1 / 2} \{{\hat{B}}_{1}^{aug} (b) - {\hat{B}}_{1}^{aug} (a)\} .$

Then

$Q^{(2)} (v) = Γ (v, W_{B_{1}}) + n^{1 / 2} Γ (v, B_{1}) \quad \text{for } a < v ⩽ b,$

(11)

where

$Γ (v, F_{1}) = (v - a)^{- 1} \{F_{1} (v) - F_{1} (a)\} - (b - a)^{- 1} \{F_{1} (b) - F_{1} (a)\}$

is a transformation of F₁(·). We note that $Γ (\cdot, B_{1}) = 0$ under H₂₀ and $Γ (\cdot, B_{1}) \neq 0$ under the alternatives, motivating Q⁽²⁾(v) as the test process and the following test statistics for testing H₂₀:

$\begin{array}{l} T_{a 1}^{(2)} = \sup_{v \in [a^{'}, b]} | Q^{(2)} (v) |, \\ T_{a 2}^{(2)} = \int_{a^{'}}^{b} \{Q^{(2)} (v)\}^{2} \, d \, \text{var}^{*} \{W_{B_{1}}^{*} (v)\}, \\ T_{m 1}^{(2)} = \inf_{v \in [a^{'}, b]} Q^{(2)} (v), \\ T_{m 2}^{(2)} = \int_{a^{'}}^{b} Q^{(2)} (v) \, d \, \text{var}^{*} \{W_{B_{1}}^{*} (v)\}, \end{array}$

where $a < a^{'} < b$ ⁠. We choose $a^{'} > a$ to avoid 0 in the denominator of Q⁽²⁾(v). In practice, one can choose $a^{'}$ close to a to make use of available data and to ensure that the tests are consistent.

By the asymptotic results shown in Appendix A and the continuous mapping theorem, under H₂₀ the distribution of Q⁽²⁾(v), v ∈ [a,b], can be approximated by the conditional distribution of $Γ (v, W_{B_{1}}^{*})$ ⁠, v ∈ [a,b], given the observed data sequence. Hence, the distributions of $T_{a 1}^{(2)}$ ⁠, $T_{a 2}^{(2)}$ ⁠, $T_{m 1}^{(2)}$ and $T_{m 2}^{(2)}$ under H₂₀ can be approximated by the conditional distributions of

$T_{a 1}^{* (2)} = \sup_{v \in [a^{'}, b]} | Γ (v, W_{B_{1}}^{*}) |,$

$T_{a 2}^{* (2)} = \int_{a^{'}}^{b} \{Γ (v, W_{B_{1}}^{*})\}^{2} \, d \, \text{var}^{*} \{W_{B_{1}}^{*} (v)\},$

$T_{m 1}^{* (2)} = \inf_{v \in [a^{'}, b]} Γ (v, W_{B_{1}}^{*})$

and

$T_{m 2}^{* (2)} = \int_{a^{'}}^{b} Γ (v, W_{B_{1}}^{*}) \, d \, \text{var}^{*} \{W_{B_{1}}^{*} (v)\},$

given the observed data sequence, respectively. Similarly to Section 4.1, the respective critical values $c_{a 1}^{(2)}$ and $c_{a 2}^{(2)}$ of the test statistics $T_{a 1}^{(2)}$ and $T_{a 2}^{(2)}$ can be approximated by the (1−α)-quantiles of the conditional distributions of $T_{a 1}^{* (2)}$ and $T_{a 2}^{* (2)}$ obtained through repeatedly generating independent sets of normal samples $\{ξ_{k i}, i = 1, \dots, n_{k}, k = 1, \dots, K\}$ while holding the observed data sequence fixed. The critical values $c_{m 1}^{(2)}$ and $c_{m 2}^{(2)}$ for $T_{m 1}^{(2)}$ and $T_{m 2}^{(2)}$ can be approximated similarly by the corresponding α-quantiles. At the significance level α, the tests based on $T_{a 1}^{(2)}$ and $T_{a 2}^{(2)}$ reject H₂₀ in favour of H_2a if $T_{a 1}^{(2)} > c_{a 1}^{(2)}$ and $T_{a 2}^{(2)} > c_{a 2}^{(2)}$ respectively, and the tests based on $T_{m 1}^{(2)}$ and $T_{m 2}^{(2)}$ reject H₂₀ in favour of H_2m if $T_{m 1}^{(2)} < c_{m 1}^{(2)}$ and $T_{m 2}^{(2)} < c_{m 2}^{(2)}$ respectively.

The tests $T_{a 1}^{(2)}$ and $T_{a 2}^{(2)}$ capture general departures H_2a whereas the tests $T_{m 1}^{(2)}$ and $T_{m 2}^{(2)}$ are sensitive to the monotone departure H_2m. Note that the derivative $d Γ (v, B_{1}) / d v = (v - a)^{- 1} [β_{1} (v) - (v - a)^{- 1} \{B_{1} (v) - B_{1} (a)\}] ⩾ 0$ under H_2m with strict inequality for at least some v ∈ [a,b], because an increasing β₁(v) is at least as large as its average over [a,v]. Hence $Γ (v, B_{1})$ is non-decreasing in v with $Γ (b, B_{1}) = 0$, so that $Γ (v, B_{1}) ⩽ 0$ on (a,b]. This leads to the results that the tests based on $T_{m 1}^{(2)}$ and $T_{m 2}^{(2)}$ are consistent against H_2m and the tests based on $T_{a 1}^{(2)}$ and $T_{a 2}^{(2)}$ are consistent against H_2a. The proofs are given in the second paragraph following theorem 1 in Appendix A.
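A small numerical check of the behaviour of the transformation Γ may help. The sketch evaluates Γ(v,F₁) on a grid for two hypothetical cumulative coefficient functions: one with constant β₁(v), where H₂₀ holds, and one with β₁(v)=−0.6+0.6v increasing in v, where H_2m holds. The interval [0.1,0.9] and both functions are assumptions chosen for illustration.

```python
import numpy as np

def gamma_transform(v, F1):
    """Gamma(v, F1) at the grid points to the right of a = v[0]; b = v[-1].

    v  : increasing grid on [a, b]
    F1 : values of F1 on that grid
    """
    a, b = v[0], v[-1]
    slope_to_v = (F1[1:] - F1[0]) / (v[1:] - a)   # average of beta_1 over [a, v]
    slope_to_b = (F1[-1] - F1[0]) / (b - a)       # average of beta_1 over [a, b]
    return slope_to_v - slope_to_b

v = np.linspace(0.1, 0.9, 81)
B1_null = -0.69 * v                  # beta_1(v) = -0.69, constant in v
B1_mono = -0.6 * v + 0.3 * v**2      # beta_1(v) = -0.6 + 0.6 v, increasing in v
g_null = gamma_transform(v, B1_null)
g_mono = gamma_transform(v, B1_mono)
```

In the second case Γ(v,B₁)=0.3(v−b), which is negative, increasing in v and equal to 0 at v=b, so the monotone tests reject for small values of the test process.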

In Sections 4.1 and 4.2, we considered two types of test statistics, namely the integration-based test statistics and the supremum-based test statistics, for each pair of hypotheses. The former are generalizations of the Cramér–von Mises test statistic and involve integration of deviations over the whole range of the mark, whereas the latter are extensions of the classic Kolmogorov–Smirnov test statistic for testing the goodness of fit of a distribution function, and take the supremum of such deviations. As demonstrated in a comprehensive analysis of the relative powers of the classic Kolmogorov–Smirnov test and the Cramér–von Mises test by Stephens (1974), we expect that the two types of test statistic have different powers for different true alternative distributions. The integration-based test statistics are best suited for situations where the true alternative distribution deviates a little over the whole support of the mark and the supremum-based test statistics may have more power against situations where the true alternative has large deviations over a small section of the support. For example, for testing differential VE(v), H₂₀, the supremum-based tests will tend to be relatively more powerful if $\hat{VE} (v)$ is very high for a small range of marks near a and declines sharply to 0 and is constant at 0 for all other marks.

5. Simulation study

5.1. Numerical assessment of the tests under correctly specified models

We conduct a simulation study to evaluate the finite sample performance of the testing procedures proposed. The empirical sizes and powers of the test statistics are assessed for various models, sample sizes (500 and 800) and choices of bandwidths. The powers of the tests are evaluated in both situations where a correlated auxiliary variable is used and where it is absent.

We consider K = 1 stratum. Let Z_ki be the treatment indicator with P(Z_ki=1)=0.5. The (T_ki, V_ki) are generated from the following mark-specific proportional hazards model:

$λ (t, v | z) = \exp \{γ v + (α + β v) z\}, \quad t ⩾ 0, \; 0 ⩽ v ⩽ 1,$

(12)

where α, β and γ are constants. Under model (12), λ₀(t,v)= exp (γv) and VE(v)=1− exp (α+βv). For α = 0 and β = 0, VE(v)=0, indicating no vaccine efficacy, and, for β = 0, VE(v)=VE, indicating mark invariant vaccine efficacy, whereas β>0 indicates VE(v) decreasing in v. We examine the hypothesis testing procedures for the following specific models M1–M5 respectively:

M1: $(α, β, γ) = (0, 0, 0.3),$ implying that VE(v)=0;

M2: $(α, β, γ) = (- 0.69, 0, 0.3),$ implying that VE(v) does not depend on v;

M3: $(α, β, γ) = (- 0.6, 0.6, 0.3),$ implying that VE(v) decreases;

M4: $(α, β, γ) = (- 1.2, 1.2, 0.3),$ implying that VE(v) decreases;

M5: $(α, β, γ) = (- 1.5, 1.5, 0.3),$ implying that VE(v) decreases.
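As a rough illustration of model (12), the following sketch evaluates VE(v)=1− exp (α+βv) and draws one (T, V) pair. It relies on an observation that follows from model (12) but is not stated in the text: the hazard does not depend on t, so T is exponential with rate equal to the integral of the mark-specific hazard over v in [0,1], and V given z has density proportional to exp {(γ+βz)v}. The function names and the inverse-CDF sampler are illustrative assumptions, not the authors' simulation code.

```python
import math
import random

# (alpha, beta, gamma) for the five settings listed above.
MODELS = {
    "M1": (0.0, 0.0, 0.3),
    "M2": (-0.69, 0.0, 0.3),
    "M3": (-0.6, 0.6, 0.3),
    "M4": (-1.2, 1.2, 0.3),
    "M5": (-1.5, 1.5, 0.3),
}


def vaccine_efficacy(v, alpha, beta):
    """VE(v) = 1 - exp(alpha + beta * v) under model (12)."""
    return 1.0 - math.exp(alpha + beta * v)


def _integral_exp(c):
    """Integral of exp(c * v) over v in [0, 1]."""
    return 1.0 if c == 0.0 else (math.exp(c) - 1.0) / c


def sample_time_and_mark(z, alpha, beta, gamma, rng):
    """Draw (T, V) for treatment indicator z, before any censoring."""
    c = gamma + beta * z
    # Overall hazard: exp(alpha * z) * integral of exp(c * v) dv, constant in t.
    total_hazard = math.exp(alpha * z) * _integral_exp(c)
    t = rng.expovariate(total_hazard)
    # Inverse-CDF draw from the density proportional to exp(c * v) on [0, 1].
    u = rng.random()
    v = u if c == 0.0 else math.log1p(u * (math.exp(c) - 1.0)) / c
    return t, v
```

For example, `vaccine_efficacy(0.5, -0.69, 0.0)` is about 0.50 under M2, and under M3 the efficacy falls from about 0.45 at v = 0 to 0 at v = 1.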

We generate the censoring times from an exponential distribution, independent of (T,V), with censoring rates ranging from 20% to 30%. We take τ = 2.0. The complete-case indicator R_ki is generated with conditional probability $r_{k} (W_{k i}) = P (R_{k i} = 1 | δ_{k i} = 1, W_{k i})$ ⁠, where

$logit {r_{k} (W_{k i})} = ψ_{k 0} + ψ_{k 1} Z_{k i}, i = 1, \dots, n_{k}, k = 1, \dots, K .$

(13)

With ψ_k0=0.2 and ψ_k1=−0.2 about 50% of observed failures are missing marks.
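The "about 50%" figure can be checked directly from model (13). The sketch below is a minimal calculation under one added assumption: `p_vaccine`, the share of vaccine recipients among observed failures, is a hypothetical input that the text does not report.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def missing_fraction(psi0, psi1, p_vaccine):
    """Expected share of observed failures with a missing mark under model (13).

    p_vaccine is the (assumed) proportion of failures with Z = 1.
    """
    observed = (1.0 - p_vaccine) * expit(psi0) + p_vaccine * expit(psi0 + psi1)
    return 1.0 - observed


# With psi0 = 0.2 and psi1 = -0.2 the mark is observed with probability
# expit(0.2) (about 0.55) for Z = 0 and expit(0.0) = 0.5 for Z = 1.
frac_all_placebo = missing_fraction(0.2, -0.2, 0.0)
frac_all_vaccine = missing_fraction(0.2, -0.2, 1.0)
```

Whatever the treatment mix among failures, the missing fraction lies between about 45% and 50%, consistent with the statement above.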

Conditionally on (T_ki, Z_ki, V_ki), we assume that the auxiliary marks follow the model

$A_{k i} = {(θ + 1)}^{- 1} (V_{k i} + θ U_{k i}), θ > 0,$

(14)

for $i = 1, \dots, n_{k},$ k = 1,…,K, where V_ki are the possibly missing marks, U_ki is uniformly distributed on [0,1] independent of V_ki and θ>0 is an association parameter between A_ki and V_ki. The correlation coefficient ρ between A_ki and V_ki is 1 for θ = 0. Since A_ki is observed for all observed failure times, the AIPW estimator in this case is the full data estimator. The A_ki and V_ki are independent for θ = ∞, yielding ρ = 0. In addition, the θ-values of 0.8, 0.4 and 0.2 correspond to ρ = 0.78, 0.92, 0.98.
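The link between θ and ρ can be approximated in closed form. Under model (14), with U independent of V, ρ = 1/√{1 + θ² var(U)/var(V)}. The sketch below makes the simplifying assumption that V is uniform on [0,1], so that var(V) = var(U) and ρ = 1/√(1 + θ²); in the simulation V is not exactly uniform, so the values quoted above need not match to the last digit.

```python
import math
import random


def corr_uniform_mark(theta):
    """Correlation of A and V under model (14) if V were uniform on [0, 1]."""
    return 1.0 / math.sqrt(1.0 + theta ** 2)


def simulated_corr(theta, n, rng):
    """Monte Carlo check of the same quantity (V uniform by assumption)."""
    v = [rng.random() for _ in range(n)]
    a = [(vi + theta * rng.random()) / (theta + 1.0) for vi in v]
    mean_v = sum(v) / n
    mean_a = sum(a) / n
    cov = sum((x - mean_v) * (y - mean_a) for x, y in zip(v, a))
    var_v = sum((x - mean_v) ** 2 for x in v)
    var_a = sum((y - mean_a) ** 2 for y in a)
    return cov / math.sqrt(var_v * var_a)
```

Under this approximation θ = 0.8, 0.4 and 0.2 give ρ of roughly 0.78, 0.93 and 0.98.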

Under model (14), the conditional density of A_ki given (T_ki, Z_ki, V_ki) is

$g_{k} (a ∣ t, v, z; θ) = \frac{1 + θ}{θ} I (\frac{v}{1 + θ} ⩽ a ⩽ \frac{v + θ}{1 + θ}), 0 ⩽ a ⩽ 1, 0 ⩽ v ⩽ 1.$

(15)

The likelihood function for θ is

$L (θ) = \prod_{δ_{k i} = 1, R_{k i} = 1} \frac{1 + θ}{θ} I (\frac{V_{k i}}{1 + θ} ⩽ A_{k i} ⩽ \frac{V_{k i} + θ}{1 + θ}) for θ > 0.$

It is easy to show that the maximum likelihood estimator equals

$\hat{θ} = max_{δ_{k i} = 1, R_{k i} = 1} {V_{k i} / A_{k i}, (1 - V_{k i}) / (1 - A_{k i})} - 1 .$
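The reasoning behind this estimator is that the likelihood is decreasing in θ wherever every indicator equals 1, so the maximizer is the smallest θ satisfying all the support constraints. A minimal sketch follows, using simulated complete cases with V drawn uniformly purely for illustration (an assumption; the function name is hypothetical).

```python
import random


def theta_mle(marks, auxiliaries):
    """Smallest theta compatible with V/(1+theta) <= A <= (V+theta)/(1+theta).

    Each complete case requires 1 + theta >= V/A and
    1 + theta >= (1 - V)/(1 - A); the likelihood decreases in theta,
    so the estimate is the largest of these lower bounds minus 1.
    """
    bound = max(
        max(v / a, (1.0 - v) / (1.0 - a))
        for v, a in zip(marks, auxiliaries)
    )
    return bound - 1.0


rng = random.Random(2024)
true_theta = 0.4
marks = [rng.random() for _ in range(2000)]
auxiliaries = [(v + true_theta * rng.random()) / (true_theta + 1.0) for v in marks]
theta_hat = theta_mle(marks, auxiliaries)
```

Because the true θ always satisfies the constraints, the estimate cannot exceed it; it approaches the true value from below as the number of complete cases grows.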

The density estimator $g_{k} (a | t, v, z; \hat{θ})$ is plugged into equation (9) to obtain ${\hat{ρ}}_{k}^{ipw} (w, v)$ ⁠, which is used to construct the AIPW estimator of β in equation (10).

The performances of the test procedures proposed are evaluated through simulations for the models described in expressions (12), (13) and (14) under the settings M1–M5, where M1 is a setting under the null hypothesis H₁₀ and M2 is a setting under the null hypothesis H₂₀. We consider the situations where no auxiliary information is provided and where the correlation between the auxiliary mark and the mark of interest is ρ = 0.92 (under model (14) with θ = 0.4). Table 1 presents the empirical sizes and powers of the tests $T_{a 1}^{(1)}$ ⁠, $T_{a 2}^{(1)}$ ⁠, $T_{m 1}^{(1)}$ and $T_{m 2}^{(1)}$ for testing H₁₀ at the nominal level 0.05. Table 2 presents the empirical sizes and powers of the tests $T_{a 1}^{(2)}$ ⁠, $T_{a 2}^{(2)}$ ⁠, $T_{m 1}^{(2)}$ and $T_{m 2}^{(2)}$ for testing H₂₀ at the nominal level 0.05. The results are presented for n = 500 with $h_{1} = 0.1$ and $h = h_{2} = 0.15$ ⁠, 0.2, and for n = 800 with $h_{1} = 0.1$ and $h = h_{2} = 0.1$ ⁠, 0.15. We take a = 0, b = 1 and $a^{'} = 0.5$ for the tests. The Epanechnikov kernel $K (x) = 0.75 (1 - x^{2}) I (| x | ⩽ 1)$ is used throughout the numerical analysis.
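For reference, the Epanechnikov kernel stated above can be coded in a few lines. The bandwidth-scaled form K(x/h)/h used in `scaled_kernel` is the usual convention and is an assumption here, since the smoothing formulas themselves appear in earlier sections.

```python
def epanechnikov(x):
    """K(x) = 0.75 * (1 - x^2) for |x| <= 1 and 0 otherwise."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def scaled_kernel(x, h):
    """Bandwidth-scaled kernel K(x / h) / h (assumed convention)."""
    return epanechnikov(x / h) / h


def integrate(f, lo, hi, n=20000):
    """Midpoint-rule integral of f over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))


# The scaled kernel still integrates to 1 and vanishes outside [-h, h].
mass = integrate(lambda x: scaled_kernel(x, 0.15), -0.15, 0.15)
```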

Table 1

Empirical sizes and powers of the tests $T_{a 1}^{(1)}$, $T_{a 2}^{(1)}$, $T_{m 1}^{(1)}$ and $T_{m 2}^{(1)}$ for testing H₁₀ at the nominal level 0.05 for ρ = 0 and 0.92 when 50% of the marks are missing†

Model	(α,β,γ)	n	h	Size/power
				ρ = 0				ρ = 0.92
				$T_{a 1}^{(1)}$	$T_{a 2}^{(1)}$	$T_{m 1}^{(1)}$	$T_{m 2}^{(1)}$	$T_{a 1}^{(1)}$	$T_{a 2}^{(1)}$	$T_{m 1}^{(1)}$	$T_{m 2}^{(1)}$
M1	(0,0,0.3)	500	0.15	5.4	4.0	4.0	5.0	4.6	4.2	3.8	4.2
		500	0.20	5.0	4.4	4.6	5.2	4.8	4.0	4.2	3.6
		800	0.10	3.8	3.6	4.2	4.2	3.8	3.8	5.4	4.8
		800	0.15	4.0	3.8	4.6	4.6	5.0	4.4	5.4	5.6
M3	(−0.6,0.6,0.3)	500	0.15	68.2	67.0	79.4	76.0	73.2	74.6	83.2	85.4
		500	0.20	63.2	65.0	75.8	74.2	69.2	71.4	79.8	82.6
		800	0.10	88.2	86.2	94.6	90.4	92.0	93.0	95.0	97.2
		800	0.15	87.4	86.6	92.8	90.8	89.2	90.6	93.4	95.2
M4	(−1.2,1.2,0.3)	500	0.15	99.6	99.4	99.8	99.8	99.8	100	99.8	100
		500	0.20	99.4	99.0	99.6	99.8	99.6	99.8	99.8	100
		800	0.10	100	100	100	100	100	100	100	100
		800	0.15	100	100	100	100	100	100	100	100
M2	(−0.69,0,0.3)	500	0.15	100	100	100	99.8	100	100	100	100
		500	0.20	100	100	100	100	100	100	100	100
		800	0.10	100	99.8	100	100	99.8	99.8	99.8	99.8
		800	0.15	100	100	100	100	100	100	100	100

† The bandwidths are $h_{1} = 0.1$ and $h_{2} = h$. Each entry is based on 500 Gaussian multiplier samples and 500 repetitions.

Table 2

Empirical sizes and powers of the tests $T_{a 1}^{(2)}$, $T_{a 2}^{(2)}$, $T_{m 1}^{(2)}$ and $T_{m 2}^{(2)}$ for testing H₂₀ at the nominal level 0.05 for ρ = 0 and 0.92 when 50% of the marks are missing†

Model	(α,β,γ)	n	h	Size/power
				ρ = 0				ρ = 0.92
				$T_{a 1}^{(2)}$	$T_{a 2}^{(2)}$	$T_{m 1}^{(2)}$	$T_{m 2}^{(2)}$	$T_{a 1}^{(2)}$	$T_{a 2}^{(2)}$	$T_{m 1}^{(2)}$	$T_{m 2}^{(2)}$
M2	(−0.69,0,0.3)	500	0.15	5.6	4.8	5.8	5.8	7.6	7.2	7.4	7.0
		500	0.20	5.8	4.8	5.4	5.2	6.6	6.6	6.6	7.4
		800	0.10	6.4	5.0	5.6	5.8	6.2	5.8	7.2	7.0
		800	0.15	6.6	5.2	5.8	5.6	6.0	5.6	6.0	6.6
M3	(−0.6,0.6,0.3)	500	0.15	16.8	17.0	22.4	25.2	20.6	25.8	32.6	37.4
		500	0.20	14.2	15.8	22.2	24.8	19.4	24.2	31.8	34.6
		800	0.10	26.0	25.8	35.2	36.4	36.0	38.0	46.0	49.2
		800	0.15	25.4	25.8	34.8	35.6	34.0	36.0	45.4	47.4
M4	(−1.2,1.2,0.3)	500	0.15	44.4	46.2	59.0	63.2	63.6	68.4	76.4	80.2
		500	0.20	42.2	44.0	57.2	59.6	61.4	65.8	73.2	75.8
		800	0.10	66.2	67.6	75.2	78.0	82.8	86.6	90.6	91.8
		800	0.15	64.6	66.2	74.0	77.0	80.6	84.4	88.4	91.2
M5	(−1.5,1.5,0.3)	500	0.15	64.5	66.5	75.0	76.5	81.0	85.6	88.8	90.4
		500	0.20	61.0	62.6	72.2	72.2	77.8	82.4	86.8	89.4
		800	0.10	80.8	85.6	87.6	91.4	94.6	96.2	97.6	98.4
		800	0.15	78.6	84.8	87.8	91.4	94.4	95.6	95.8	97.8

† The bandwidths are $h_{1} = 0.1$ and $h_{2} = h$. Each entry is based on 500 Gaussian multiplier samples and 500 repetitions.

Tables 1 and 2 show that all the tests have satisfactory empirical sizes close to the nominal level 0.05. The powers of the tests increase with sample size and they are not overly sensitive to the bandwidths selected. The powers of the tests for testing H₁₀ increase as the model moves in the direction M1→M3→M4→M2, representing increased departure from the null hypothesis H₁₀. The powers of the tests for testing H₂₀ increase as the model moves in the direction M2→M3→M4→M5, representing increased departure from the null hypothesis H₂₀. The tests utilizing the auxiliary marks have higher power than those without using the auxiliary marks.

As with any non-parametric smoothing procedure, one needs to select bandwidths carefully. In practice, the appropriate bandwidth selection can be based on a $K$ -fold cross-validation method (e.g. Efron and Tibshirani (1993), Hoover et al. (1998), Cai et al. (2000) and Tian et al. (2005)).

The testing procedures proposed properly handle missing marks under MAR with asymptotically correct significance levels. However, if only the observations with complete information are used, i.e. the complete-case analysis, then the testing procedures are often expected not to provide correct type I error control. We conduct a simulation study to evaluate the observed sizes of the proposed tests using the complete cases under two different models for the complete-case indicator R_ki: model (13) and the model

$logit {r_{k} (W_{k i})} = 0.8 - Z_{k i} - 0.3 T_{k i}, i = 1, \dots, n_{k}, k = 1, \dots, K .$

(16)

For K = 1 both model (13) and model (16) yield about 50% missing marks among the observed failures. The sizes of $T_{a 1}^{(1)}$, $T_{a 2}^{(1)}$, $T_{m 1}^{(1)}$ and $T_{m 2}^{(1)}$ for testing H₁₀ are evaluated under model M1 and the sizes of $T_{a 1}^{(2)}$, $T_{a 2}^{(2)}$, $T_{m 1}^{(2)}$ and $T_{m 2}^{(2)}$ for testing H₂₀ are evaluated under model M2 (Table 3). Under model (13), the observed sizes for testing H₁₀ are elevated (around 7–15%), whereas those for testing H₂₀ stay closer to the nominal level (4–10%). Under model (16), the observed sizes for testing H₁₀ are 37% or higher for all tests (up to 63%), whereas those for testing H₂₀ reach 12% and 14% for tests $T_{m 1}^{(2)}$ and $T_{m 2}^{(2)}$ when n = 800.

Table 3

Empirical sizes of the tests for H₁₀ and H₂₀ at the nominal level 0.05 using the complete cases under MAR when 50% of the marks are missing†

Model	Missingness model	n	h	Size
				$T_{a 1}^{(1)}$	$T_{a 2}^{(1)}$	$T_{m 1}^{(1)}$	$T_{m 2}^{(1)}$
Testing H₁₀
M1	(13)	500	0.20	0.14	0.10	0.12	0.15
	(13)	800	0.15	0.10	0.07	0.11	0.11
	(16)	500	0.20	0.39	0.37	0.50	0.42
	(16)	800	0.15	0.50	0.46	0.63	0.55
				$T_{a 1}^{(2)}$	$T_{a 2}^{(2)}$	$T_{m 1}^{(2)}$	$T_{m 2}^{(2)}$
Testing H₂₀
M2	(13)	500	0.20	0.08	0.04	0.08	0.05
	(13)	800	0.15	0.06	0.09	0.06	0.10
	(16)	500	0.20	0.08	0.07	0.08	0.05
	(16)	800	0.15	0.07	0.06	0.12	0.14

† The bandwidths are $h_{1} = 0.1$ and $h_{2} = h$. Each entry is based on 100 Gaussian multiplier samples and 100 repetitions.

These simulation results verify that the testing procedures applied to complete cases generally do not have nominal size, although for some of the scenarios the sizes are nominal. To explain this, it can be shown that, under MAR, $λ_{k} (t, v | z, R_{k i} = 1) = λ_{k} (t, v | z) h_{k} (t, z)$ ⁠, where $h_{k} (t, z) = P (R_{k i} = 1 | T_{k i} = t, Z_{k i} = z) / P (R_{k i} = 1 | T_{k i} ⩾ t, Z_{k i} = z)$ ⁠. If h_k(t,z) does not depend on z and MAR holds, then the observations for individuals with the observed marks only can be viewed as a random sample from a mark-specific proportional hazards model with a different baseline hazard function but the same regression function β(v). In this case, the tests for both H₁₀ and H₂₀ based on the complete cases are valid. If h_k(t,z) depends on z but not on t and MAR holds, then h_k(t,z) can be expressed as $h_{k} (t, z) = exp (ϑ_{k}^{'} z)$ (the scenario under model (13)), and the tests of H₁₀ based on the complete cases will be biased. However, the tests of H₂₀ remain unbiased since the biases in the estimation of β(v) do not depend on v, such that the test process Q⁽²⁾(v) is still asymptotically a mean 0 process. In general, if h_k(t,z) depends on both z and t and MAR holds, which is the scenario under the missingness model (16), then the test process Q⁽²⁾(v) is not an asymptotically mean 0 process. The magnitude of departure of the asymptotic sizes of the test statistics of H₂₀ from the nominal level depends on h_k(t,z) in a complicated manner.

5.2. Numerical assessment of the tests under misspecified models

This subsection evaluates robustness of the proposed test procedures to misspecifications of r_k(w) and/or $g_{k} (a | t, v, z)$ ⁠, and to violation of the MAR assumption. The Z_ki, (T_ki, V_ki) and C_ki are generated by using the same models as above, again with approximately 30% censoring.

Robustness of the tests to misspecification of r_k(w) is examined by assuming model (13) whereas the actual complete-case indicator R_ki is generated with the conditional probability $r_{k} (W_{k i}) = P (R_{k i} = 1 | δ_{k i} = 1, W_{k i})$ ⁠, where

$logit {r_{k} (W_{k i})} = 1.1 + Z_{k i} - T_{k i}, i = 1, \dots, n_{k} .$

(17)

This model yields approximately 50% missing marks among observed failures under models M1–M5.

Robustness of the tests is also examined when $g_{k} (a | t, v, z)$ is misspecified. This is carried out by assuming model (14) for the auxiliary mark, or, equivalently, model (15) for $g_{k} (a | t, v, z)$, whereas the actual auxiliary mark for δ_ki=1 is generated from

$A_{k i} = {(1.4 + 2 τ)}^{- 1} (V_{k i} + 0.4 U_{k i} + 2 X_{k i}),$

(18)

for $i = 1, \dots, n_{k} .$ Here U_ki is uniformly distributed on [0,1] and is independent of V_ki.

Robustness of the tests to violation of the MAR assumption (2) is examined by assuming model (13), whereas the actual R_ki depends on V_ki through the model

$logit {r_{k} (W_{k i})} = 0.6 + Z_{k i} - 2 V_{k i}, i = 1, \dots, n_{k} .$

(19)

Model	(α,β,γ)	Size/power
r_k(w) is misspecified
M1	(0,0,0.3)	4.2	5.2	3.6	4.2
M3	(−0.6,0.6,0.3)	62.0	74.4	74.0	81.8
M4	(−1.2,1.2,0.3)	99.6	99.8	99.8	99.8
M2	(−0.69,0,0.3)	100	100	100	100
$g_{k} (a | t, v, z)$ is misspecified
M1	(0,0,0.3)	3.4	4.2	5.8	4.6
M3	(−0.6,0.6,0.3)	59.6	64.4	72.8	74.4
M4	(−1.2,1.2,0.3)	99.2	99.4	99.6	99.6
M2	(−0.69,0,0.3)	100	99.8	100	99.8
r_k(w) and $g_{k} (a | t, v, z)$ are misspecified
M1	(0,0,0.3)	4.0	4.0	3.8	3.4
M3	(−0.6,0.6,0.3)	61.8	61.8	71.8	73.8
M4	(−1.2,1.2,0.3)	99.6	98.6	99.8	99.8
M2	(−0.69,0,0.3)	100	100	100	100
MAR assumption is violated
M1	(0,0,0.3)	3.4	3.8	3.6	5.0
M3	(−0.6,0.6,0.3)	60.6	67.0	73.0	77.8
M4	(−1.2,1.2,0.3)	99.2	99.6	99.8	99.6
M2	(−0.69,0,0.3)	100	100	100	100

Model	(α,β,γ)	Size/power
r_k(w) is misspecified
M2	(−0.69,0,0.3)	5.0	3.8	5.4	6.2
M3	(−0.6,0.6,0.3)	24.0	25.2	34.2	36.0
M4	(−1.2,1.2,0.3)	60.8	66.6	72.8	78.4
M5	(−1.5,1.5,0.3)	76.6	82.0	85.8	88.8
$g_{k} (a | t, v, z)$ is misspecified
M2	(−0.69,0,0.3)	4.8	6.6	6.0	5.8
M3	(−0.6,0.6,0.3)	17.2	18.0	28.4	28.2
M4	(−1.2,1.2,0.3)	44.8	47.2	56.4	61.0
M5	(−1.5,1.5,0.3)	58.0	60.4	68.6	73.2
r_k(w) and $g_{k} (a | t, v, z)$ are misspecified
M2	(−0.69,0,0.3)	4.0	4.8	4.4	4.4
M3	(−0.6,0.6,0.3)	16.6	19.6	26.8	26.6
M4	(−1.2,1.2,0.3)	43.2	46.6	55.6	60.6
M5	(−1.5,1.5,0.3)	53.8	58.8	67.4	71.4
MAR assumption is violated
M2	(−0.69,0,0.3)	6.8	6.0	7.6	7.8
M3	(−0.6,0.6,0.3)	28.6	33.6	39.6	42.0
M4	(−1.2,1.2,0.3)	61.8	67.0	74.0	78.4
M5	(−1.5,1.5,0.3)	77.4	81.6	85.4	89.2

5.3. Simulation study for the Thai trial

We conduct a simulation modelled on the Thai trial to gain insight into the power that is available for this real trial. Specifically, we simulate data such that the numbers of infections are close to those observed (74 in the placebo group and 51 in the vaccine group), the overall vaccine efficacy from the proportional hazards model is about 31%, and the true VE(v) curve decreases with v from around 65–70% for v close to 0 to around 0% for v close to 1. The actual infection rate was only about 0.3% per year over 3.5 years of follow-up; to speed the simulations we use a 20% placebo infection rate and retain 74 infections on average.

Again with K = 1 stratum, the (T_ki, V_ki) are generated from the model

$λ (t, v | z) = γ exp {(α + β v) z}, t ⩾ 0, 0 ⩽ v ⩽ 1,$

(20)

where α, β and γ are constants. Under model (20), VE(v)=1− exp (α+βv), the marginal hazards are λ₀(t)=γ for z = 0 and λ₁(t)=γ exp (α){ exp (β)−1}/β for z = 1, and the Cox proportional hazards vaccine efficacy is VE_C=1−λ₁(t)/λ₀(t)=1− exp (α){ exp (β)−1}/β. We choose (α,β,γ)=(−1.1,1.3,0.068), yielding VE_C=0.32, VE(0)=0.67 and VE(0.85)=0. We study 400 subjects each in the vaccine and placebo groups. Matching the actual trial, the censoring rate before τ is kept very low, just under 5%. The missing mark indicator is generated from model (13), with (ψ_k0, ψ_k1) set to yield about 0%, 25% (1.2, −0.2), 50% (0.2, −0.2) and 75% (−1.0, −0.2) missing marks among observed failures. We assume that the auxiliary variable A_ki follows model (14) given in Section 5.1, where the θ-values of ∞, 0.8, 0.4 and 0.2 correspond to ρ = 0, 0.78, 0.92, 0.98 for the correlation coefficient between A_ki and V_ki.
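The quoted efficacy values follow directly from the formulas for model (20). The sketch below simply evaluates them at (α,β)=(−1.1,1.3); the function names are illustrative, and `ve_cox` assumes β ≠ 0.

```python
import math


def ve_mark(v, alpha, beta):
    """Mark-specific efficacy VE(v) = 1 - exp(alpha + beta * v)."""
    return 1.0 - math.exp(alpha + beta * v)


def ve_cox(alpha, beta):
    """Marginal efficacy 1 - exp(alpha) * {exp(beta) - 1} / beta (beta != 0)."""
    hazard_ratio = math.exp(alpha) * (math.exp(beta) - 1.0) / beta
    return 1.0 - hazard_ratio


alpha, beta = -1.1, 1.3
overall = ve_cox(alpha, beta)          # about 0.32
at_zero = ve_mark(0.0, alpha, beta)    # about 0.67
at_085 = ve_mark(0.85, alpha, beta)    # about 0 (slightly negative)
zero_crossing = -alpha / beta          # mark value at which VE(v) = 0
```

The zero crossing is at v = 1.1/1.3, roughly 0.85, so VE(v) is essentially 0 there and slightly negative for larger marks.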

Because of lost information on the mark, we choose larger bandwidths for higher percentages of missing marks. We use h = 0.4 for the case with 75% missing marks, h = 0.3 for the case with 50% missing marks, h = 0.2 for the case with 25% missing marks and h = 0.15 for the case with 0% missing marks. The bandwidths $h_{1}$ and $h_{2}$ in equation (7) in the estimation of ${\hat{λ}}_{0 k}^{ipw} (t, v)$ are taken to be $h_{1} = 0.50$ and $h_{2} = h$ in each case. Powers of the proposed tests $T_{a 1}^{(1)}$, $T_{a 2}^{(1)}$, $T_{m 1}^{(1)}$, $T_{m 2}^{(1)}$, $T_{a 1}^{(2)}$, $T_{a 2}^{(2)}$, $T_{m 1}^{(2)}$ and $T_{m 2}^{(2)}$ for the simulations based on the Thai trial at the nominal level 0.05 are reported in Table 6. The tests perform similarly to those in the simulation study of Section 5.1. As only 10% of infected subjects had missing marks in the RV144 trial and the auxiliary was very weakly predictive, we focus on the entries with 0% or 25% missing marks and ρ = 0. There is 67–95% power to reject H₁₀, and 33–60% power to reject H₂₀. These results show that a fairly strong sieve effect with VE(v) declining from 67% to 0% could readily be missed in the Thai trial because of limited power. The only slightly improved power with an excellent auxiliary (ρ = 0.98) shows that greater numbers of events would be needed to achieve high power for testing H₂₀.

Table 6

Power of the tests $T_{a 1}^{(1)}$, $T_{a 2}^{(1)}$, $T_{m 1}^{(1)}$, $T_{m 2}^{(1)}$, $T_{a 1}^{(2)}$, $T_{a 2}^{(2)}$, $T_{m 1}^{(2)}$ and $T_{m 2}^{(2)}$ for the Thai trial at the nominal level 0.05†

ρ	% missing marks	h	Power
			Testing H₁₀				Testing H₂₀
			$T_{a 1}^{(1)}$	$T_{a 2}^{(1)}$	$T_{m 1}^{(1)}$	$T_{m 2}^{(1)}$	$T_{a 1}^{(2)}$	$T_{a 2}^{(2)}$	$T_{m 1}^{(2)}$	$T_{m 2}^{(2)}$
	0	0.15	77	85	86	95	48	48	59	60
0	25	0.2	67	76	79	85	36	33	50	47
	50	0.3	63	71	71	82	29	27	37	42
	75	0.4	41	51	59	58	21	18	35	31
0.78	25	0.2	67	79	82	89	36	39	46	50
	50	0.3	60	71	74	84	28	28	41	39
	75	0.4	49	53	63	65	25	25	34	34
0.92	25	0.2	70	80	84	91	37	41	50	56
	50	0.3	61	71	73	87	35	39	50	51
	75	0.4	54	58	62	71	30	33	40	44
0.98	25	0.2	71	81	82	91	39	47	53	55
	50	0.3	66	76	75	86	44	42	50	52
	75	0.4	56	66	68	76	41	43	51	49

† Each entry is based on 100 Gaussian multiplier samples and 100 repetitions.

Table 6

Open in new tab

ρ	% missing marks	h	Power
			TestingH₁₀				TestingH₂₀
			$T_{a 1}^{(1)}$	$T_{a 2}^{(1)}$	$T_{m 1}^{(1)}$	$T_{m 2}^{(1)}$	$T_{a 1}^{(2)}$	$T_{a 2}^{(2)}$	$T_{m 1}^{(2)}$	$T_{m 2}^{(2)}$
	0	0.15	77	85	86	95	48	48	59	60
0	25	0.2	67	76	79	85	36	33	50	47
	50	0.3	63	71	71	82	29	27	37	42
	75	0.4	41	51	59	58	21	18	35	31
0.78	25	0.2	67	79	82	89	36	39	46	50
	50	0.3	60	71	74	84	28	28	41	39
	75	0.4	49	53	63	65	25	25	34	34
0.92	25	0.2	70	80	84	91	37	41	50	56
	50	0.3	61	71	73	87	35	39	50	51
	75	0.4	54	58	62	71	30	33	40	44
0.98	25	0.2	71	81	82	91	39	47	53	55
	50	0.3	66	76	75	86	44	42	50	52
	75	0.4	56	66	68	76	41	43	51	49

† Each entry is based on 100 Gaussian multiplier samples and 100 repetitions.

6. Analysis of the RV144 Thai trial

In the RV144 Thai trial, 125 subjects (51 of 8197 in the vaccine group and 74 of 8198 in the placebo group) were diagnosed with HIV infection over a 42-month follow-up period. Full length HIV genomes were measured from 121 of them; three had missing data because their HIV viral load was too low for the Sanger sequencing technology to work, and one dropped out (Rerks-Ngarm et al., 2009; Rolland et al., 2012). We focus on the gp120 region of the HIV Env protein, because this region stimulates anti-HIV antibody responses which are the putative cause of the observed partial vaccine efficacy. Three gp120 sequences were included in the vaccine: 92TH023 in the ALVAC canarypox vector prime component, and CM244 and MN in the AIDSVAX gp120 protein boost component. 92TH023 and CM244 are subtype E HIVs whereas MN is subtype B, and 110 of the 121 subjects were infected with subtype E sequences. The subtype E vaccine insert sequences are much closer genetically to the infecting (and regional circulating) sequences than MN and thus are more likely to stimulate protective immune responses. Accordingly, the analysis focuses on the 92TH023 and CM244 reference sequences, and right-censors the 15 subjects who were HIV infected with subtype B or with unknown subtype. One subject who acquired HIV infection during the trial was documented to have acquired HIV from another trial participant who had previously become HIV infected; the analysis excludes this subject because his or her inclusion would violate the independent observations assumption. In the context of our model set-up, T is the time to diagnosis of HIV infection with subtype E HIV. The time to diagnosis of HIV infection with subtype B or with unknown HIV subtype is treated as censoring.

We define V based on HIV sequence data measured from a blood sample drawn at or before the date of HIV diagnosis. (The trial documented acute phase or preseroconversion infection in only a few subjects, prohibiting defining the mark based on acute phase sequences.) 11 of the 109 (10%) infected subjects have sequences measured only from a post-diagnosis sample and hence are missing V. To maximize biological relevance and statistical power, we restrict the gp120 distances to the published set of gp120 sites in contact with known broadly neutralizing monoclonal antibodies (Moore et al., 2009; Wei et al., 2003). For each HIV sequence from a subject and each of the two reference vaccine sequences, the distance is computed as a weighted Hamming distance by using the point accepted mutation between scoring matrix (Nickle et al., 2007). Between two and 13 sequences (a total of 1030 sequences) were measured per infected subject, and V is defined as the distance for the subject's sequence closest to his or her consensus sequence (the consensus sequence comprises the majority amino acid at each site, one site at a time). Finally, the distances are rescaled to values between 0 and 1.
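The construction of the mark can be illustrated with a minimal Python sketch. It is not the code used for the trial analysis: the toy sequences, the function names, the unit mismatch weight (a stand-in for the point accepted mutation based weights of Nickle et al. (2007)) and the rescaling by the largest distance are all assumptions made for illustration, since the text states only that the distances are rescaled to lie between 0 and 1.

```python
from collections import Counter


def consensus(seqs):
    """Majority amino acid at each aligned site, taken one site at a time."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def unit_weight(a, b):
    """Placeholder mismatch weight (plain Hamming); not the PAM-based weights."""
    return 1.0


def weighted_hamming(seq, ref, weight=unit_weight):
    """Sum of mismatch weights over aligned sites where the residues differ."""
    return sum(weight(a, b) for a, b in zip(seq, ref) if a != b)


def subject_distance(seqs, ref, weight=unit_weight):
    """Distance to `ref` of the subject's sequence closest to the consensus."""
    cons = consensus(seqs)
    closest = min(seqs, key=lambda s: weighted_hamming(s, cons, weight))
    return weighted_hamming(closest, ref, weight)


def rescale(distances):
    """Divide by the largest distance (assumed positive) to land in [0, 1]."""
    top = max(distances)
    return [d / top for d in distances]


# Hypothetical aligned sequences, four sites each, for three subjects.
reference = "ACGE"
subjects = [
    ["ACDE", "ACDF", "ACDE"],
    ["TCGF", "TCGF"],
    ["ACGE", "ACGE"],
]
raw = [subject_distance(s, reference) for s in subjects]
marks = rescale(raw)
print(raw, marks)
```

Replacing `unit_weight` by a function that looks up a substitution score for each pair of amino acids would give a weighted version of the same calculation.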

In total, 109 infected subjects (43 vaccine; 66 placebo) are included in the analysis, of whom 98 (39 vaccine; 59 placebo) have an observed mark V; Fig. 1 displays the observed Vs.

Fig. 1.

Scatter plots of the marks V versus the HIV-infection time T for the 98 HIV-infected subjects in the Thai trial with an observed mark: the mark V is the HIV-specific point accepted mutation matrix (Nickle et al., 2007) weighted Hamming distance between a subject's HIV envelope gp120 amino acid sequence (nearest to his or her consensus sequence) and (a) the 92TH023 or (b) the CM244 vaccine reference sequence; the distances are restricted to the 172 amino acid sites in gp120 documented to contact broadly neutralizing monoclonal antibodies (LOWESS smooth fits (Cleveland, 1979) are shown separately for the vaccine and placebo groups)


To predict the probability of observing V among the 109 infected subjects, we use all-subsets logistic regression model selection considering demographics, host genetics and post-infection biomarker data. The best model by the Bayesian information criterion (BIC) includes only the years from entry until diagnosis of HIV infection (X₁), with model fit $\mathrm{logit}\{\hat{P}(R = 1 \mid \delta = 1, X_{1})\} = 1.17 + 0.70 X_{1}$ for the CM244 reference sequence. The model was very similar for the 92TH023 reference sequence (which is not shown). In addition, we consider linear and logistic regression models for relating the mean of various potential auxiliary variables (A) to V, X₁ and treatment indicator Z. Model selection did not reveal any significantly predictive auxiliary variables; we expect that HIV sequence information measured after V has been defined would be a good predictor, but these data were not collected. Nevertheless, to implement the AIPW method we select the best available auxiliary variable, gender (A = X₂; 1≡male; 0≡female), and use the logistic regression model that results; for CM244 the fitted model $\hat{g}(A = a \mid V, X_{1}, Z)$ is $\mathrm{logit}\{\hat{P}(X_{2} = 1 \mid \delta = 1, V, X_{1}, Z)\} = 0.24 - 0.33 V + 0.16 X_{1} + 0.38 Z$, and the model was very similar for 92TH023 (which is not shown).
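To give a sense of scale for the fitted missingness model, the following Python sketch converts the reported CM244 coefficients (1.17 and 0.70) into estimated probabilities of observing the mark at a few values of X₁. The function names and the chosen values of X₁ are illustrative assumptions; the inverse probability weight is shown only as the generic quantity that inverse-probability-weighted estimators attach to complete cases, not as a reproduction of the authors' implementation.

```python
import math

# Coefficients reported in the text for the CM244 reference sequence.
INTERCEPT, SLOPE = 1.17, 0.70


def prob_mark_observed(years_to_diagnosis):
    """Inverse logit of the fitted linear predictor 1.17 + 0.70 * X1."""
    eta = INTERCEPT + SLOPE * years_to_diagnosis
    return 1.0 / (1.0 + math.exp(-eta))


def inverse_probability_weight(years_to_diagnosis):
    """Generic complete-case weight 1 / P(R = 1 | X1); always at least 1."""
    return 1.0 / prob_mark_observed(years_to_diagnosis)


# Hypothetical diagnosis times (years) within the 42-month follow-up.
for x1 in (0.0, 1.0, 2.0, 3.0):
    p = prob_mark_observed(x1)
    w = inverse_probability_weight(x1)
    print(f"X1 = {x1:.1f}: P(mark observed) = {p:.3f}, weight = {w:.3f}")
```

Under this fit the estimated probability of an observed mark increases with the time from entry to diagnosis, so infections diagnosed early receive the largest weights.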

The AIPW estimation and testing procedures are applied to the Thai trial data set with bandwidths $h_{1} = 0.5$ and $h_{2} = h = 0.3$, a = 0.05, b = 1 and $a^{'} = a + 0.01$ (a and $a^{'}$ are near the minimum observed marks). As in the simulation study, 500 simulated Gaussian multipliers are used. Because the results are nearly identical with and without the auxiliary variable, only the latter results are presented. Fig. 2 shows the estimated VE(v) along with 95% pointwise confidence bands, indicating that vaccine efficacy appears to be high against HIVs near the 92TH023 reference sequence (estimated VE(0.01) = 56%) and declines to nearly 0 against HIVs farthest from the 92TH023 reference sequence (estimated VE(1.0) = 2.4%). The decline is similar for the CM244 reference sequence, with estimated VE(0.01) = 45% and estimated VE(0.95) = −9.1%.

Fig. 2.

AIPW estimation of VE(v) and 95% pointwise confidence bands without using auxiliary variables for the Thai trial with bandwidths $h_{1} = 0.5$ and $h_{2} = h = 0.3$ for the monoclonal antibody contact site distances to (a) the 92TH023 and (b) CM244 reference sequences


Figs 3(a) and 3(b) show the test processes Q⁽¹⁾(v) versus 20 realizations from the Gaussian multiplier process $W_{B_{1}}^{*} (v)$ given the observed data, and Figs 3(c) and 3(d) show the parallel results for the test process Q⁽²⁾(v), each suggesting departures from the null hypothesis H₁₀ and from the null hypothesis H₂₀ for each reference sequence. The p-values of the tests based on the test statistics $T_{m 1}^{(1)}$ and $T_{m 2}^{(1)}$ for testing H₁₀ against the monotone alternative over v ∈ [0,1] are 0.032 and 0.008 for 92TH023, and 0.014 and 0.010 for CM244. The p-values of the test statistics $T_{a 1}^{(1)}$ and $T_{a 2}^{(1)}$ for testing H₁₀ against the general alternative are 0.054 and 0.018 for 92TH023 and 0.030 and 0.010 for CM244. For testing H₂₀ over v ∈ [0,1], the p-values of the supremum-type tests based on the test statistics $T_{a 1}^{(2)}$ and $T_{m 1}^{(2)}$ are 0.53 and 0.27 for 92TH023 and 0.37 and 0.18 for CM244. The p-values of the integrated square type of tests based on the test statistics $T_{a 2}^{(2)}$ and $T_{m 2}^{(2)}$ are 0.35 and 0.14 for 92TH023 and 0.44 and 0.19 for CM244.
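The way in which a p-value is read off from simulated multiplier realizations can be sketched in Python as follows. This is a generic illustration under stated assumptions rather than the authors' procedure: the two-sided supremum and integrated-square functionals, the function names and the toy paths are hypothetical, and the statistics defined earlier in the paper (in particular those for monotone alternatives) may differ in form.

```python
def sup_stat(path):
    """Supremum-type functional: largest absolute value along the path."""
    return max(abs(x) for x in path)


def integrated_square_stat(path, dv):
    """Integrated-square functional, approximated by a Riemann sum."""
    return sum(x * x for x in path) * dv


def multiplier_p_value(observed, simulated):
    """Share of simulated statistics at least as large as the observed one."""
    return sum(s >= observed for s in simulated) / len(simulated)


# Hypothetical test process evaluated on a grid of three mark values.
observed_path = [0.1, -0.9, 0.4]

# Hypothetical multiplier realizations on the same grid.
simulated_paths = [
    [0.2, 0.3, -0.1],
    [1.0, -0.2, 0.0],
    [-0.5, 0.95, 0.1],
    [0.0, 0.1, 0.2],
]

t_obs = sup_stat(observed_path)
t_sim = [sup_stat(p) for p in simulated_paths]
print(multiplier_p_value(t_obs, t_sim))
```

In practice the number of realizations would be much larger than four (500 in the analysis reported here), and the grid would cover the mark interval used for testing.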

Fig. 3.

Diagnostic plots of the test processes for the Thai trial data set with bandwidths $h_{1} = 0.5$, $h_{2} = h = 0.3$ and a = 0.05, b = 1 and $a^{'} = a + 0.01$: (a) Q⁽¹⁾(v) versus 20 realizations from the Gaussian multiplier process $W_{B_{1}}^{*} (v)$ (92TH023, CM244 reference; H₁₀); (b) Q⁽¹⁾(v) versus 20 realizations from the Gaussian multiplier process $W_{B_{1}}^{*} (v)$ (92TH023, CM244 reference; H₂₀); (c) Q⁽²⁾(v) versus 20 realizations from the Gaussian multiplier process $Γ (v, W_{B_{1}}^{*})$ (92TH023, CM244 reference; H₁₀); (d) Q⁽²⁾(v) versus 20 realizations from the Gaussian multiplier process $Γ (v, W_{B_{1}}^{*})$ (92TH023, CM244 reference; H₂₀)


These analyses provide more evidence that the vaccine had some protective efficacy than the original primary analysis, which did not account for the mark information (Rerks-Ngarm et al., 2009): the primary analysis test for any vaccine efficacy yielded p = 0.04, whereas the tests for any vaccine efficacy against any mark reported here yielded a median p-value of 0.016 across the four test statistics and two reference sequences. The analyses also showed a non-significant trend (p-values of 0.14–0.27 for the tests based on $T_{m 1}^{(2)}$ and $T_{m 2}^{(2)}$) that the vaccine protected better against HIVs closely matched to the vaccine strain HIVs in the monoclonal antibody contact sites but had less or no protection against HIVs with many mismatches in these sites. Although the levels of significance are not compelling, the simulation study of the power available for detecting a vaccine sieve effect in the Thai trial, presented in Section 5.5.3, showed that the study is well powered only to detect large sieve effects (with greater decline of VE(v) in v than was observed in the estimated VE(v) curves); thus a moderate-to-large sieve effect is consistent with the observed results. These results may guide future vaccine research by suggesting modifications of future vaccine candidates to include HIV sequences more closely matched to circulating HIVs in the monoclonal antibody contact sites. They may also motivate the design of future experiments to understand functional effects of amino acid mutations at the monoclonal antibody contact sites.
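The median p-value quoted for the tests of H₁₀ can be checked directly from the eight p-values reported above for the four test statistics and two reference sequences. The short Python sketch below does this; the dictionary layout and key names are illustrative choices, and the values are copied from the text.

```python
from statistics import median

# p-values for testing H10, as reported in the text, keyed by
# (test statistic, reference sequence).
p_values_h10 = {
    ("T_m1", "92TH023"): 0.032,
    ("T_m2", "92TH023"): 0.008,
    ("T_m1", "CM244"): 0.014,
    ("T_m2", "CM244"): 0.010,
    ("T_a1", "92TH023"): 0.054,
    ("T_a2", "92TH023"): 0.018,
    ("T_a1", "CM244"): 0.030,
    ("T_a2", "CM244"): 0.010,
}

# With eight values the median is the mean of the fourth and fifth
# smallest, here 0.014 and 0.018.
median_p = median(p_values_h10.values())
print(round(median_p, 3))
```

Only one of the eight tests (p = 0.054) fails to reach the 0.05 level, whereas the single primary analysis test gave p = 0.04.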

Acknowledgements

The authors thank Hasan Ahmed and Paul Edlefsen for generating the HIV sequence distances, and they thank the participants, investigators and sponsors of the RV144 Thai trial, including the US Military HIV Research Program, US Army Medical Research and Materiel Command, National Institute of Allergy and Infectious Diseases US and Thai Components, Armed Forces Research Institute of Medical Science Ministry of Public Health, Thailand, Mahidol University, SanofiPasteur and Global Solutions for Infectious Diseases. The authors thank the Joint Editor, Associate Editor and two referees for their helpful suggestions. The research of Yanqing Sun was partially supported by National Science Foundation grants DMS-0905777 and DMS-1208978, and the research of Dr Sun and Dr Peter Gilbert was partially supported by National Institutes of Health National Institute of Allergy and Infectious Diseases grant R37AI054165. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References

Aalen, O. O. and Johansen (1978) An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand. J. Statist., 141–150.

Cai, Fan and (2000) Efficient estimation and inferences for varying-coefficient models. J. Am. Statist. Ass., 888–902.

Cleveland, W. S. (1979) Robust locally weighted regression and smoothing scatterplots. J. Am. Statist. Ass., 829–836.

Efron and Tibshirani, R. J. (1993) An Introduction to the Bootstrap. New York: Chapman and Hall.

Fauci, A. S., Johnston, M. I., Dieffenbach, C. W., Burton, D. R., Hammer, S. M., Hoxie, J. A., Martin, Overbaugh, Watkins, D. I., Mahmoud and Greene, W. C. (2008) HIV vaccine research: the way forward. Science, 321, 530–532.

Gilbert, P. B., Berger, J. O., Stablein, Becker, Essex, Hammer, S. M., Kim, J. H. and Degruttola, V. G. (2011) Statistical interpretation of the RV144 HIV vaccine efficacy trial in Thailand: a case study for statistical issues in efficacy trials. J. Infect. Dis., 203, 969–975.

Gilbert, P. B., Lele and Vardi (1999) Maximum likelihood estimation in semiparametric selection bias models with application to AIDS vaccine trials. Biometrika.

Gilbert, P. B., McKeague, I. W. and Sun (2008) The two-sample problem for failure rates depending on a continuous mark: an application to vaccine efficacy. Biostatistics, 263–276.

Gilbert, P. B., Self, S. G. and Ashby, M. A. (1998) Statistical methods for assessing differential vaccine protection against human immunodeficiency virus types. Biometrics, 799–814.

Hoover, D. R., Rice, J. A., C. O. and Yang, P.-L. (1998) Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika, 809–822.

Lin, D. Y., Wei, L. J. and Ying (1993) Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika, 557–572.

Moore, P. L., Ranchobe, Lambson, B. E., Gray, E. S., Cave, Abrahams, M.-R., Bandawe, Mlisana, Abdool Karim, S. S., Williamson, Morris, the CAPRISA 002 Study and the NIAID Center for HIV/AIDS Vaccine Immunology (CHAVI) (2009) Limited neutralizing antibody specificities drive neutralization escape in early HIV-1 subtype C infection. PLOS Path., article e1000598.

Nickle, D. C., Heath, Jensen, M. A., Gilbert, P. B., Mullins, J. I. and Kosakovsky Pond, S. L. (2007) HIV-specific probabilistic models of protein evolution. PLOS ONE, no. 6, article e503.

Prentice, R. L., Kalbfleisch, J. D., Peterson, Jr, A. V., Flournoy, Farewell, V. T. and Breslow, N. E. (1978) The analysis of failure times in the presence of competing risks. Biometrics, 541–554.

Rerks-Ngarm, Pitisuttithum, Nitayaphan, Kaewkungwal, Chiu, Paris, Premsri, Namwat, de Souza, Adams, Benenson, Gurunathan, Tartaglia, McNeil, J. G., Francis, D. P., Stablein, Birx, D. L., Chunsuttiwat, Khamboonruang, Thongcharoen, Robb, M. L., Michael, N. L., Kunasol and Kim, J. H. (2009) Vaccination with ALVAC and AIDSVAX to prevent HIV-1 infection in Thailand. New Engl. J. Med., 361, 2209–2220.

Robins, J. M., Rotnitzky and Zhao, L. P. (1994) Estimation of regression coefficients when some regressors are not always observed. J. Am. Statist. Ass., 846–866.

Rolland, Edlefsen, P. T., Larsen, B. B., Tovanabutra, Sanders-Buell, Hertz, deCamp, A. C., Carrico, Menis, Magaret, C. A., Ahmed, Juraska, Chen, Konopa, Nariya, Stoddard, J. N., Wong, Zhao, Deng, Maust, B. S., Bose, Howell, Bates, Lazzaro, O’Sullivan, Lei, Bradfield, Ibitamuno, Assawadarachai, O’Connell, R. J., deSouza, M. S., Nitayaphan, Rerks-Ngarm, Robb, M. L., McLellan, J. S., Georgiev, Kwong, P. D., Carlson, J. M., Michael, N. L., Schief, W. R., Gilbert, P. B., Mullins, J. I. and Kim, J. H. (2012) Increased HIV-1 vaccine efficacy against viruses with genetic signatures in Env V2. Nature, 490, 417–420.

Rolland, Tovanabutra, Decamp, A. C., Frahm, Gilbert, P. B., Sanders-Buell, Heath, Magaret, C. A., Bose, Bradfield, O’Sullivan, Crossler, Jones, Nau, Wong, Zhao, Raugi, D. N., Sorensen, Stoddard, J. N., Maust, B. S., Deng, Hural, Dubey, Michael, N. L., Shiver, Corey, Self, S. G., Kim, Buchbinder, Casimiro, D. R., Robertson, M. N., Duerr, McElrath, M. J., McCutchan, F. E. and Mullins, J. I. (2011) Genetic impact of vaccination on breakthrough HIV-1 sequences from the STEP trial. Nat. Med., 366–371.

Rubin, D. B. (1976) Inference and missing data. Biometrika, 581–592.

Stephens, M. A. (1974) EDF statistics for goodness of fit and some comparisons. J. Am. Statist. Ass., 730–737.

Sun and Gilbert, P. B. (2012) Estimation of stratified mark-specific proportional hazards models with missing marks. Scand. J. Statist.

Sun, Gilbert, P. B. and McKeague, I. W. (2009) Proportional hazards models with continuous marks. Ann. Statist., 394–426.

Sun and (2005) Semiparametric time-varying coefficients regression model for longitudinal data. Scand. J. Statist.

Tian, Zucker and Wei, L. J. (2005) On the Cox model with time-varying regression coefficients. J. Am. Statist. Ass., 100, 172–183.