February 19th 2013
Faculty of Economic and Business Sciences | Salón de Grados
FWDselect: An R package for selecting variables in regression models
2013/02/19 – 10:00 h | Marta Sestelo, University of Vigo
Abstract
In multiple regression models with a large number p of explanatory variables, which may or may not be relevant for predicting the response, it is useful to be able to reduce the model. To this end, it is necessary to determine the best subset or subsets of q (q <= p) predictors that yield the model or models with the best predictive capacity. Here, we present a new approach to this problem: the FWDselect package, which introduces a simple method for selecting the best model with different types of data (binary, Gaussian or Poisson) in different contexts (parametric or nonparametric). The methodology has two stages: i) selecting the best combination of q variables with a new forward stepwise-based selection procedure and, perhaps most importantly, ii) determining the number of covariates to include in the model using bootstrap resampling techniques. The software is illustrated using pollution data.
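To make stage i) concrete, here is a minimal sketch of generic greedy forward selection. It assumes a Gaussian linear model fitted by ordinary least squares and scores candidates by residual sum of squares. It is not FWDselect's own procedure, which the abstract describes only as forward stepwise-based. It does not reproduce the package's API or the bootstrap choice of q in stage ii). The helper names `rss` and `forward_select` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def rss(y: Sequence[float], cols: List[Sequence[float]]) -> float:
    """Residual sum of squares of the least-squares fit of y on an intercept plus cols.

    Uses modified Gram-Schmidt: orthonormalise the design columns, then
    subtract from y its projection onto each orthonormal basis vector.
    """
    n = len(y)
    basis: List[List[float]] = []
    for col in [[1.0] * n] + [list(c) for c in cols]:
        v = list(col)
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-10:  # skip columns that are (numerically) collinear
            basis.append([vi / norm for vi in v])
    r = list(y)
    for b in basis:
        dot = sum(ri * bi for ri, bi in zip(r, b))
        r = [ri - dot * bi for ri, bi in zip(r, b)]
    return sum(ri * ri for ri in r)


def forward_select(X: List[Sequence[float]], y: Sequence[float], q: int) -> List[int]:
    """Greedy forward selection of q predictors; X is a list of predictor columns.

    At each step, add the remaining predictor whose inclusion gives the
    smallest residual sum of squares (hypothetical criterion for this sketch).
    """
    selected: List[int] = []
    remaining = list(range(len(X)))
    for _ in range(q):
        best = min(remaining, key=lambda j: rss(y, [X[k] for k in selected + [j]]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up for illustration): y depends on columns 2 and 1 only.
x0 = [0, 1, 2, 3, 4, 5, 6, 7]
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
X = [x0, x1, x2]
y = [3 * a + 0.5 * b for a, b in zip(x2, x1)]
print(forward_select(X, y, 2))  # expected: [2, 1]
```

Each step only adds variables and never revisits earlier choices. That greedy behaviour is what keeps forward procedures cheap when p is large.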
Testing for functional cluster in grain size curves
2013/02/19 – 10:45 h | Nora M. Villanueva, University of Vigo
Abstract
In many practical situations, the main objective is to understand the effect of a continuous covariate, X, on a response variable, Y. In some instances, the relationship between Y and X can vary among subsets defined by the levels {1,…, k} of a given factor, F. The formulation of a factor-by-curve interaction model gives rise to several questions of interest, the first of which is whether all the curves are equal. If they are, it can be concluded that the factor F has no effect on the response. If the curves are not equal, however, a second question arises: do groups or clusters of curves exist and, if so, how many? To answer both questions, we propose bootstrap-based procedures. The first is a test of the equality of the k curves. The second classifies these curves into a number of clusters when the equality hypothesis of the first test is rejected. The proposed methods were applied to grain-size measurements. The work is completed with two simulation studies assessing the behaviour of these procedures.
Performance of Beta-Binomial SGoF multitesting method under dependence: a simulation study
2013/02/19 – 12:00 h | Irene Castro Conde, University of Vigo
Abstract
In a recent paper (de Uña-Álvarez, 2012), a correction of the SGoF multitesting method for possibly dependent tests was introduced. The SGoF methodology was initially restricted to the independent setting. This correction extended it to deciding which hypotheses should be rejected in multiple hypothesis testing problems involving dependence. In this work, we contribute to that topic through an extensive Monte Carlo simulation study of this correction, called BB-SGoF (from Beta-Binomial). The simulations consider several numbers of blocks, within-block correlation values, effect levels, and proportions of true effects. The allocation of the true effects is taken to be random. The false discovery rate (FDR), power, and conservativeness of the method are computed across the Monte Carlo trials. Conservativeness is measured with respect to the number of existing effects with p-values below the given significance threshold. A comparison with the original SGoF and the Benjamini-Hochberg adjustment is provided. Since FDR and power were not reported in de Uña-Álvarez (2012), these results are a new contribution to the study of the BB-SGoF procedure. Another contribution of this work is the development of R code implementing the BB-SGoF method. Part of this work is included in the forthcoming publication Castro Conde I and de Uña-Álvarez J (2013).
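For reference, the Benjamini-Hochberg adjustment used as a comparator can be sketched in a few lines. This is the standard step-up procedure, not BB-SGoF or the original SGoF. The function name `benjamini_hochberg` is hypothetical and is not taken from the authors' R code.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure.

    Sort the m p-values, find the largest rank k with p_(k) <= k * alpha / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])


print(benjamini_hochberg([0.01, 0.045, 0.025, 0.005, 0.20]))  # [0, 2, 3]
# Step-up in action: 0.03 fails its own threshold (0.025), but larger ranks pass,
# so all four hypotheses are rejected.
print(benjamini_hochberg([0.001, 0.03, 0.035, 0.039]))  # [0, 1, 2, 3]
```

The procedure controls the FDR at level alpha under independence (and under certain forms of positive dependence). That is why block-wise dependence, as in the simulations described above, is a natural stress test for comparing it with SGoF-type methods.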
Estimation of the transition probabilities in the illness-death model: the package TPmsm
2013/02/19 – 12:45 h | Luís Filipe Meira Machado & Artur Agostinho Araújo, University of Minho (Portugal)
Abstract
One major goal in clinical applications of multi-state models is the estimation of transition probabilities. The usual nonparametric estimator of the transition matrix for nonhomogeneous Markov processes is the Aalen-Johansen estimator (Aalen and Johansen, 1978). However, two problems may arise from using this estimator. First, its standard error may be large in heavily censored scenarios. Second, the estimator may be inconsistent if the process is non-Markov. Fortunately, several recent contributions address these problems. In this work, we consider the estimation of transition probabilities using TPmsm, a software package for R. We describe its capabilities for estimating these quantities with seven different approaches. In two of these approaches, the transition probabilities are estimated conditionally on current or past covariate measures. The software is illustrated using two real data sets.
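As a point of reference for the Markov baseline, here is a minimal sketch of the classical Aalen-Johansen estimator for an illness-death model. It computes P(s, t) as the product over observed transition times u in (s, t] of (I + dA(u)), where dA_hj(u) is the number of h to j transitions at u divided by the number at risk in state h just before u. The states are coded 0 = healthy, 1 = ill, 2 = dead. The record format, the names `aalen_johansen` and `matmul`, and the toy data are hypothetical. This is not the TPmsm interface and it covers none of the package's other estimators.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical record format: (entry time, exit time, state occupied, next state or None if censored)
Record = Tuple[float, float, int, Optional[int]]


def matmul(A: List[List[float]], B: List[List[float]]) -> List[List[float]]:
    """Plain matrix product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def aalen_johansen(records: Sequence[Record], s: float, t: float, n_states: int = 3) -> List[List[float]]:
    """Aalen-Johansen estimate of the transition matrix P(s, t) as a product integral."""
    identity = [[1.0 if i == j else 0.0 for j in range(n_states)] for i in range(n_states)]
    P = [row[:] for row in identity]
    event_times = sorted({b for (_, b, _, nxt) in records if nxt is not None and s < b <= t})
    for u in event_times:
        M = [row[:] for row in identity]  # builds I + dA(u)
        for h in range(n_states):
            at_risk = sum(1 for (a, b, st, _) in records if st == h and a < u <= b)
            if at_risk == 0:
                continue
            for j in range(n_states):
                if j == h:
                    continue
                d = sum(1 for (a, b, st, nxt) in records if st == h and b == u and nxt == j)
                M[h][j] += d / at_risk
                M[h][h] -= d / at_risk
        P = matmul(P, M)  # P(s, u) = P(s, u-) (I + dA(u))
    return P


# Made-up toy data: four subjects, all healthy at time 0.
records: List[Record] = [
    (0, 2, 0, 1), (2, 5, 1, 2),   # ill at 2, dies at 5
    (0, 3, 0, 2),                 # dies at 3 without illness
    (0, 4, 0, None),              # censored while healthy
    (0, 1, 0, 1), (1, 6, 1, None),  # ill at 1, censored at 6
]
P = aalen_johansen(records, 0, 6)
print(P[0])  # approximately [0.25, 0.25, 0.5], up to floating-point rounding
```

With only transitions out of state 0, the first entry of that row reduces to the Kaplan-Meier estimate of remaining healthy. This is a useful sanity check. The row sums stay at one because each factor I + dA(u) is a stochastic matrix.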