| Title: | Multi-regime Marginal Structural Cox Model for Multi-way Treatment Switch in Oncology Clinical Trials with Survival Endpoint |
|---|---|
| Description: | Estimate the causal effect of sustained treatment strategies on overall survival in clinical trials with possible treatment crossover and switch to subsequent therapy. Simulate faithful longitudinal clinical trials data with survival endpoints and multi-way treatment switches allowing for time-dependent prognostic factors. For more on methodological background, please see: Keogh et al. (2021) <doi.org/10.1002/bimj.202000040> and Suarez et al. (2008) <doi.org/10.1016/j.jclinepi.2007.11.007>. |
| Authors: | Haobin Chen [aut, cre], Yuxuan Chen [aut], Philip He [aut] |
| Maintainer: | Haobin Chen <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.1 |
| Built: | 2026-06-01 10:19:45 UTC |
| Source: | https://github.com/tonyhbc/multiswc |
multimsm() fits a marginal structural Cox model with categorical treatment
for randomized clinical trials or trial-like longitudinal data with multi-way
treatment switching, assuming participants underwent regular longitudinal
follow-up visits. This method assumes a three-way treatment switch scenario:
control participants may crossover to experimental group and both control
and experimental participants may switch to a subsequent therapy.
multimsm() takes a longitudinal survival dataset with 3 treatment process
variables as primary inputs:
rand: baseline randomized treatment arm. 1 denotes randomized
experimental treatment and 0 denotes randomized control/SOC.
cross: time-varying control-to-experimental crossover indicator (1/0).
subseq: time-varying subsequent therapy initiation indicator (1/0).
Internally, multimsm() converts this triplet of treatment trajectory into
the five-level time-varying treatment regime:
where C and E denote sustained control and sustained experimental
treatment regime at t, CE denotes control-to-experimental crossover, CS
denotes on control-to-subsequent switch regime status at t, and ES denotes
on experimental-to-subsequent switch regime status at t.
The downstream estimator is a multinomial-regime stabilized-weight marginal structural Cox model with the time-varying regime as predictor:
The stabilized treatment-regime weight (S-IPTW) at each post-baseline interval is the probability of the observed current regime under a stabilizer model divided by the corresponding probability under a denominator model. The cumulative product of these row-specific ratios is used as the inverse probability weight. Optional ordinary censoring weights may also be multiplied in, if informative censoring is present.
multimsm(dat_long, id = "id", tstart = "t.start", tstop = "t.stop", event = "event", cens = NULL, rand = "rand", cross = "cross", subseq = "subseq", base_cov = NULL, iptw_num = NULL, iptw_den, ipcw_mod = NULL, wt_trunc = NULL, prob_bounds = c(1e-06, 1 - 1e-06), normalize_weights = TRUE, robust = TRUE, check_inputs = TRUE, trace_multinom = FALSE, maxit = 200)multimsm(dat_long, id = "id", tstart = "t.start", tstop = "t.stop", event = "event", cens = NULL, rand = "rand", cross = "cross", subseq = "subseq", base_cov = NULL, iptw_num = NULL, iptw_den, ipcw_mod = NULL, wt_trunc = NULL, prob_bounds = c(1e-06, 1 - 1e-06), normalize_weights = TRUE, robust = TRUE, check_inputs = TRUE, trace_multinom = FALSE, maxit = 200)
dat_long |
A |
id |
Character. Subject identifier variable name. |
tstart |
Character. Interval start-time variable name. |
tstop |
Character. Interval stop-time variable name. |
event |
Character. Event indicator variable name ( |
cens |
Optional character Ordinary right-censoring indicator used
only if |
rand |
Character. Baseline randomization variable name ( |
cross |
Character. Time-varying control-to-experimental crossover indicator. It may be one-time pulse at crossover time (...,0,1,0,0,...) or absorbing (...,0,1,1,1,...). It must be structurally 0 always for subjects randomized to experimental treatment (assumed no EC switch). |
subseq |
Character. Time-varying subsequent therapy initiation indicator. It may be pulse or absorbing. |
base_cov |
Optional character vector of baseline covariates to include in
the default numerator model when |
iptw_num |
Optional right-hand-side formula or character string for
the stabilized numerator multinomial regime model. If |
iptw_den |
Required right-hand-side formula or character string for
the denominator multinomial regime model. This should typically include
|
ipcw_mod |
Optional right-hand-side formula or character string for an
ordinary censoring model. If supplied, |
wt_trunc |
Optional numeric in |
prob_bounds |
Numeric length-2 vector of lower and upper probability bounds used to stabilize extreme predicted probabilities away from 0 and 1. |
normalize_weights |
Logical. If |
robust |
Logical. If |
check_inputs |
Logical. If |
trace_multinom |
Logical. Passed to |
maxit |
Integer maximum number of iterations for the multinomial models. |
The treatment-switching process is assumed to follow the package's five-regime structure G(t):
R(t) is rand randomized arm, Crs(t) is cross the absorbing crossover
status, and Subs(t) is subseq the absorbing subsequent initiation status.
Users may supply cross and subseq with either sustained absorbing
status indicator such as (0, 0, 1, 1, 1), or as switch-initiation action
indicators such as (0, 0, 1, 0, 0) - both acceptable.
The current method allows only one switch type per subject's follow up. Thus,
control-arm subjects may remain in C, transition to CE, or transition to CS;
experimental-arm subjects may remain in E or transition to ES.
The outcome model uses generated regime as the exposure. By default
C is the reference regime, so the coefficient for regimeE estimates the
marginal sustained experimental-versus-control contrast aimed by the package's
primary no-switch estimand. The remaining coefficients describe the distinct
switched-regime contrasts averaging switch times relative to sustained control.
An object of class "multimsm"; a list with components:
coef_table A data.frame of full estimated coefficient table.
coef Log-hazard ratio estimates for treatment regimes against control.
hr Hazard ratio estimates for treatment regimes against control.
hr_ci Wald 95% confidence intervals for the hazard ratios.
fit The fitted weighted Cox model from survival::coxph().
p.val P values of regime effect estimates.
dat_long Augmented long-format data containing the derived variables,
including switch indicators, and final estimated weights
(.w_use/.w_use_trunc).
regime_models A list containing the fitted numerator and denominator
mulinomial regime models.
censor_model The fitted censoring model when ipcw_mod is supplied;
othrwise NULL.
diagnostics A list containing untruncated final-weight quantiles,
final switch-regime counts/proportions, cohort size, and truncation bounds.
call The matched function call.
Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11(5):550-560.
Suarez D, Haro JM, Novick D, et al. Marginal structural models for multiple treatment comparisons: an application to antipsychotic treatment for schizophrenia. Journal of Clinical Epidemiology. 2008;61(6):525-530.
set.seed(123) sim_obj <- simswc( n = 300, n_visit = 8, base_cov = c("L1", "L3"), trt_prob = 0.5, param_tdcov = c(-1.5, 0.3, 0.3, -0.2, 0.5), param_tdcov2 = c(0.05, 0.2, 0.2, -0.2, 0.7), param_sw = c(-3, 0.4, 0.4, -0.3, 0.3, 0.8), param_haz = c(0.1, log(0.8), log(1.1), log(1.1), log(1.4), log(1.4), log(1.3), log(1), log(0.9)), param_cens = NULL, param_select = c(0, 0.5, 0.25), lagk = TRUE, true_hr = FALSE ) # Convert to the required treatment switch indicator triplet dat <- sim_obj$dat_long dat$rand <- stats::ave(dat$A, dat$id, FUN = function(x) rep(x[1], length(x))) dat$cross <- dat$S_CE dat$subseq <- pmax(dat$S_ES, dat$S_CS) fit <- multimsm( dat_long = dat, id = "id", tstart = "t.start", tstop = "t.stop", event = "event", rand = "rand", cross = "cross", subseq = "subseq", base_cov = c("L1", "L3"), iptw_num = ~ regime_lag + factor(visit) + L1 + L3, iptw_den = ~ regime_lag + factor(visit) + L1 + L3 + X + U + Alag1, wt_trunc = 0.95 ) fitset.seed(123) sim_obj <- simswc( n = 300, n_visit = 8, base_cov = c("L1", "L3"), trt_prob = 0.5, param_tdcov = c(-1.5, 0.3, 0.3, -0.2, 0.5), param_tdcov2 = c(0.05, 0.2, 0.2, -0.2, 0.7), param_sw = c(-3, 0.4, 0.4, -0.3, 0.3, 0.8), param_haz = c(0.1, log(0.8), log(1.1), log(1.1), log(1.4), log(1.4), log(1.3), log(1), log(0.9)), param_cens = NULL, param_select = c(0, 0.5, 0.25), lagk = TRUE, true_hr = FALSE ) # Convert to the required treatment switch indicator triplet dat <- sim_obj$dat_long dat$rand <- stats::ave(dat$A, dat$id, FUN = function(x) rep(x[1], length(x))) dat$cross <- dat$S_CE dat$subseq <- pmax(dat$S_ES, dat$S_CS) fit <- multimsm( dat_long = dat, id = "id", tstart = "t.start", tstop = "t.stop", event = "event", rand = "rand", cross = "cross", subseq = "subseq", base_cov = c("L1", "L3"), iptw_num = ~ regime_lag + factor(visit) + L1 + L3, iptw_den = ~ regime_lag + factor(visit) + L1 + L3 + X + U + Alag1, wt_trunc = 0.95 ) fit
Print a fitted multi-regime marginal structural Cox model
## S3 method for class 'multimsm' print(x, digits = 3, ...)## S3 method for class 'multimsm' print(x, digits = 3, ...)
x |
An object returned by |
digits |
Number of decimal places for coefficient and weight summaries. |
... |
Currently unused. |
Invisibly returns x.
simswc() simulates a two-arm randomized clinical-trial-like longitudinal
survival dataset with a three-way post-randomization treatment swich
process. It is intended to generate data for evaluating and demonstrating
multi-regime marginal structural Cox model multimsm().
Follow-up is represented on a discrete visit grid
, where K = n_visit. Each subject is
randomized at baseline to experimental treatment or control
treatment . During follow-up, subjects may experience at most
one absorbing treatment switch:
ES: randomized experimental subsequent therapy;
CE: randomized control experimental crossover;
CS: randomized control subsequent therapy.
The simulator generates baseline covariates, two time-varying confounders
(X and U), a generic switch process V, switch-type indicators
S_ES, S_CE, and S_CS, optional random censoring, and a survival outcome
from a piecewise exponential hazard model. The output includes both
subject-level wide-format data and counting-process long-format data suitable
for Cox modeling and for the package's multi-regime MSM workflow.
simswc(n, n_visit, base_cov = paste0("L", 1:6), trt_prob = 0.5, param_tdcov, param_tdcov2, param_sw, param_haz, param_cens = NULL, study_end = n_visit, lagk = FALSE, param_select = NULL, p_pick_CE = 0.5, true_hr = TRUE, truth_n = 1e+05)simswc(n, n_visit, base_cov = paste0("L", 1:6), trt_prob = 0.5, param_tdcov, param_tdcov2, param_sw, param_haz, param_cens = NULL, study_end = n_visit, lagk = FALSE, param_select = NULL, p_pick_CE = 0.5, true_hr = TRUE, truth_n = 1e+05)
n |
Integer sample size. The current implementation requires an integer greater than 49. |
n_visit |
Integer number of planned discrete visits, |
base_cov |
Character vector specifying which baseline covariates are
included in the simulator's linear predictors and used to define strata for
baseline randomization. Must be a non-empty subset of
|
trt_prob |
Numeric scalar in |
param_tdcov |
Numeric vector of length
|
param_tdcov2 |
Numeric vector of length
|
param_sw |
Numeric vector of length
|
param_haz |
Numeric vector of length
|
param_cens |
Optional numeric vector of length
|
study_end |
Numeric scalar giving the nominal administrative study-end
time. Must be between |
lagk |
Logical. If |
param_select |
Optional numeric vector of length 3 controlling the
If |
p_pick_CE |
Numeric scalar in |
true_hr |
Logical. If |
truth_n |
Non-negative integer Monte Carlo sample size used only when
|
Let denote the vector of selected baseline covariates named in
base_cov. The simulator first generates six possible baseline covariates:
L1: Binary age group indicator,
.
L2: Three-level risk group taking values 0, 1, and 2 with
probabilities 0.60, 0.35, and 0.05, respectively.
L3: Binary baseline metastasis/progression-like indicator,
.
L4: Continuous tumor-size-like marker,
.
L5: Binary biomarker indicator,
.
L6: Binary prior-treatment indicator,
.
Only the variables included in base_cov are used in the treatment,
confounder, switch, censoring, and hazard linear predictors, and only these
selected baseline covariates are returned in the simulated datasets.
Baseline treatment is assigned by stratified randomization within the
interaction strata formed by base_cov; within each stratum, approximately
trt_prob of subjects are assigned to . Because base_cov
defines both the model covariate vector and the randomization strata, using a
continuous variable such as L4 may create many small strata. In practical
simulation studies, it is often preferable to stratify on categorical or binary
baseline covariates, for example base_cov = c("L1", "L3").
The time-varying binary confounder X is initialized as and is
made absorbing by construction. For ,
and the stored value is
The continuous time-varying confounder U is initialized as
. For ,
The generic switch indicator V is initialized as . Among
subjects who have not yet switched, the probability of a new switch at visit
is
Once a subject switches, V remains equal to 1. The switch destination is then
assigned as follows:
if , the subject switches to ES; S_ES is set to 1
from that visit onward and A is set to 0 thereafter to represent being
off the original experimental-treatment indicator and on subsequent
therapy;
if , the subject switches either to CE or to CS;
under CE, S_CE is set to 1 and A is set to 1 thereafter; under CS,
S_CS is set to 1 and A remains 0 thereafter.
If param_select = NULL, the control-arm switch destination is chosen with
fixed probability p_pick_CE for CE and 1 - p_pick_CE for CS. If
param_select is supplied, the probability of CE among newly switching
control-arm subjects is
The event time is generated from a piecewise exponential model. For interval
, where , the internal column index is
, and the interval-specific hazard is
Conditional on the interval covariates, an exponential waiting time with rate
is drawn. If the waiting time is less than 1, the event is
placed inside that interval at exact continuous time ;
otherwise simulation proceeds to the next interval.
If param_cens is supplied, random right censoring is generated in discrete
time. At the end of interval , among subjects not yet randomly
censored,
If param_cens = NULL, no additional random censoring is generated. In all
cases, the simulator also applies an administrative censoring time
C.a = study_end - T_e, where T_e is sampled uniformly from the integers
0:floor(n_visit / 3). The observed time is the minimum of event time,
random censoring time, and administrative censoring time.
The long-format output uses start-stop counting-process intervals. The
terminal event time is kept as a continuous time when the event occurs inside
an interval. If lagk = TRUE, the long-format data also include
Alag1-Alag3, Xlag1-Xlag3, Ulag1-Ulag3, and one-step leads
Xnext1 and Unext1.
For the revised triplet-input multimsm() interface, the simulation output
can be converted by defining rand as the subject-level baseline treatment
A.0, cross = S_CE, and subseq = pmax(S_ES, S_CS).
A list with components:
dat: Subject-level wide-format data. It contains id, selected
baseline covariates, visit-indexed matrices expanded into columns for
X, U, A, V, S_ES, S_CE, and S_CS, observed event/censoring
information (T.obs, D.obs), observed switch time T.w, administrative
censoring time C.a, and observed switch type type_sw (1 = ES,
2 = CE, 3 = CS, NA = no observed switch before exit).
dat_long: Counting-process long-format data derived from dat,
with one row per subject-interval while the subject is under observation.
It includes t.start, t.stop, event, C, selected baseline
covariates, A, V, X, U, and S_ES/S_CE/S_CS. If lagk = TRUE,
it also includes the lag/lead variables described above.
C: An n by n_visit + 1 matrix containing the cumulative
censoring process after combining random and administrative censoring.
stats: Named summary statistics: init_trt, the proportion
randomized to experimental treatment; switched, the proportion with an
observed switch before exit; prop_ES, prop_CE, and prop_CS, the
proportions of switch types among switchers; and prop_event, the
observed event proportion.
true_sdiff: If true_hr = TRUE and truth_n > 0, a list containing
an approximate sustained-regime benchmark, including a fitted
survival::survfit object, time grid t, survival curves surv0 and
surv1, an approximate Cox hazard ratio cHR, survival difference
surv_diff, and the simulated benchmark datasets dat_A0 and dat_A1.
Otherwise NULL.
Keogh RH, Seaman SR, Gran JM, Vansteelandt S. Simulating longitudinal data from marginal structural models using the additive hazard model. Biometrical Journal. 2021;63:1526-1541.
set.seed(123) sim_obj <- simswc( n = 200, n_visit = 8, base_cov = c("L1", "L3"), trt_prob = 0.5, param_tdcov = c(-1.5, 0.3, 0.3, -0.2, 0.5), param_tdcov2 = c( 0.05, 0.2, 0.2, -0.2, 0.7), param_sw = c(-3.0, 0.4, 0.4, -0.3, 0.3, 0.8), param_haz = c(0.1, log(0.8), log(1.1), log(1.1), log(1.4), log(1.4), log(1.3), log(1), log(0.9)), param_cens = NULL, param_select = c(0, 0.5, 0.25), lagk = TRUE, true_hr = FALSE ) head(sim_obj$dat_long) sim_obj$stats ## Convert to the triplet interface used by multimsm(). dat <- sim_obj$dat_long dat$rand <- stats::ave(dat$A, dat$id, FUN = function(x) rep(x[1], length(x))) dat$cross <- dat$S_CE dat$subseq <- pmax(dat$S_ES, dat$S_CS) table(dat$rand, dat$cross, dat$subseq)set.seed(123) sim_obj <- simswc( n = 200, n_visit = 8, base_cov = c("L1", "L3"), trt_prob = 0.5, param_tdcov = c(-1.5, 0.3, 0.3, -0.2, 0.5), param_tdcov2 = c( 0.05, 0.2, 0.2, -0.2, 0.7), param_sw = c(-3.0, 0.4, 0.4, -0.3, 0.3, 0.8), param_haz = c(0.1, log(0.8), log(1.1), log(1.1), log(1.4), log(1.4), log(1.3), log(1), log(0.9)), param_cens = NULL, param_select = c(0, 0.5, 0.25), lagk = TRUE, true_hr = FALSE ) head(sim_obj$dat_long) sim_obj$stats ## Convert to the triplet interface used by multimsm(). dat <- sim_obj$dat_long dat$rand <- stats::ave(dat$A, dat$id, FUN = function(x) rep(x[1], length(x))) dat$cross <- dat$S_CE dat$subseq <- pmax(dat$S_ES, dat$S_CS) table(dat$rand, dat$cross, dat$subseq)
tcoarsen() harmonizes irregular longitudinal survival data in start-stop
form onto a user-specified discrete time grid. It maps observed update times
to grid landmarks using either floor or ceiling coarsening, rebuilds start-stop
intervals, and augments each individual's follow-up record by carrying the
last known covariate and treatment values forward to empty grid intervals.
The function is intended as a transparent preprocessing utility for real-world clinical trial datasets with sparse or irregular follow-up times.The grid width and coarsening direction are substantive analysis choices and should usually be prespecified and examined in sensitivity analysis.
For use before multimsm(), include the triplet variables such as rand,
cross, and subseq in covs, and set absorb_vars = c("cross", "subseq")
when crossover and subsequent therapy indicators should be interpreted as
absorbing current-status variables. The output remains an ordinary data frame
that can be passed directly to downstream modeling functions.
tcoarsen(data, id, start, stop, event, covs = NULL, bin_width, dir_coarsen = c("floor", "ceiling"), origin = NULL, absorb_vars = NULL, keep_terminal_time = TRUE, gap_action = c("stop", "warn", "ignore"), add_visit = TRUE, visit_name = "visit", diagnostics = TRUE, verbose = TRUE)tcoarsen(data, id, start, stop, event, covs = NULL, bin_width, dir_coarsen = c("floor", "ceiling"), origin = NULL, absorb_vars = NULL, keep_terminal_time = TRUE, gap_action = c("stop", "warn", "ignore"), add_visit = TRUE, visit_name = "visit", diagnostics = TRUE, verbose = TRUE)
data |
A |
id |
A character string for the subject identifier variable name. |
start |
A character string naming the interval start-time variable. |
stop |
A character string for the interval stop-time variable name. |
event |
A character string for the terminal event indicator.
The event variable must be coded |
covs |
Optional character vector of treatment or covariate variables
to document as carried-forward predictors. Both baseline-only and
time-varying variables are allowed. The function retains all original data
columns, but |
bin_width |
A positive numeric value giving the grid width in the
same time units as |
dir_coarsen |
Coarsening direction, either |
origin |
Optional numeric grid origin. If |
absorb_vars |
Optional character vector of variables to convert to
absorbing status using within-subject cumulative maxima after sorting by
time. These variables must be coded |
keep_terminal_time |
Logical. If |
gap_action |
How to handle gaps between adjacent within-subject
start-stop intervals before coarsening. One of |
add_visit |
Logical. If |
visit_name |
A single character string giving the name of the visit-index
variable to add when |
diagnostics |
Logical. If |
verbose |
Logical. If |
Suppose an individual has observed rows indexed by j, representing intervals
[start_ij, stop_ij) over which the row values are assumed to apply. Given
grid origin origin and bin width bin_width, an observed update time s is
mapped to
origin + floor((s - origin) / bin_width) * bin_width when
dir_coarsen = "floor";
origin + ceiling((s - origin) / bin_width) * bin_width when
dir_coarsen = "ceiling".
After snapping update times, tcoarsen() inserts one row for each missing grid
interval during which the subject remains under observation. Values of all
carried variables are filled by last-known-value-carried-forward (LKCF).
Terminal event/censoring times are preserved exactly by default, so an event
at day 173 can remain stop = 173 even if the coarsening grid is monthly.
If multiple observed rows map to the same coarsened update time, the last observed row within that coarsened bin is kept. This rule is simple and reproducible, but it necessarily discards within-bin ordering information.
A tcoarsen object with two components. The first component is a data.frame
called dat_coarsen the time-coarsened input data with updated interval times
and time-varying covariates under coarsened time grid; the second component is
diagnostics for diagnostics settings and basic diagnostics.
Guerra SF, Schnitzer ME, Forget A, Blais L. Impact of discretization of the timeline for longitudinal causal inference methods. Statistics in Medicine. 2020 Sept;39(27):0277-6715.
if (requireNamespace("survival", quietly = TRUE)) { data("heart", package = "survival") heart_30 <- tcoarsen( data = heart, id = "id", start = "start", stop = "stop", event = "event", covs = c("transplant", "age", "surgery"), bin_width = 30, dir_coarsen = "floor", origin = 0, absorb_vars = "transplant", verbose = FALSE ) utils::head(heart_30$dat_coarsen, 10) }if (requireNamespace("survival", quietly = TRUE)) { data("heart", package = "survival") heart_30 <- tcoarsen( data = heart, id = "id", start = "start", stop = "stop", event = "event", covs = c("transplant", "age", "surgery"), bin_width = 30, dir_coarsen = "floor", origin = 0, absorb_vars = "transplant", verbose = FALSE ) utils::head(heart_30$dat_coarsen, 10) }