Titles and Abstracts

Titles/abstracts for the Fourth Workshop on Higher-Order Asymptotics and Post-Selection Inference (WHOA-PSI 4). Click here to go to the main conference page, where you can find more information. Contact: Todd Kuffner, email: kuffner@wustl.edu

Talks 

Rina Foygel Barber, University of Chicago
Title: Predictive inference with the jackknife+
Abstract: We introduce the jackknife+, a novel method for constructing predictive confidence intervals that is robust to the distribution of the data. The jackknife+ modifies the well-known jackknife (leave-one-out cross-validation) to account for the variability in the fitted regression function when we subsample the training data. Assuming exchangeable training samples, we prove that the jackknife+ permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically. Such guarantees are not possible for the original jackknife, and we demonstrate examples where its coverage rate may actually vanish. Our theoretical and empirical analysis reveals that the jackknife and jackknife+ intervals achieve nearly exact coverage and have similar lengths whenever the fitting algorithm obeys some form of stability. We also extend the method to the setting of K-fold cross-validation. Our methods are related to cross-conformal prediction proposed by Vovk [2015], and we discuss connections. This work is joint with Emmanuel Candes, Aaditya Ramdas, and Ryan Tibshirani.
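
For readers who want to experiment, here is a minimal sketch of the jackknife+ interval construction for a single test point, assuming numpy arrays and using scikit-learn's LinearRegression purely as an illustrative symmetric base learner; the function name and defaults below are ours, not the authors'.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def jackknife_plus_interval(X, y, x_new, alpha=0.1, learner=LinearRegression):
    """Minimal sketch of a jackknife+ predictive interval for one test point.

    Assumes exchangeable training data and a fitting algorithm that treats
    the training points symmetrically (plain least squares here, purely for
    illustration).
    """
    n = len(y)
    lo, hi = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = learner().fit(X[keep], y[keep])           # leave-one-out fit
        resid = abs(y[i] - model.predict(X[i:i + 1])[0])  # LOO residual
        pred = model.predict(np.asarray(x_new).reshape(1, -1))[0]
        lo[i], hi[i] = pred - resid, pred + resid
    # Finite-sample quantiles used by the jackknife+ (coverage >= 1 - 2*alpha).
    k_lo = int(np.floor(alpha * (n + 1))) - 1
    k_hi = int(np.ceil((1 - alpha) * (n + 1))) - 1
    lower = np.sort(lo)[max(k_lo, 0)]
    upper = np.sort(hi)[min(k_hi, n - 1)]
    return lower, upper
```

Any regressor with fit/predict methods can be swapped in for the base learner; the distribution-free guarantee (coverage at least 1 - 2*alpha) holds for any algorithm that treats the training points symmetrically.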

Pallavi Basu, Indian School of Business
Title: TBA
Abstract: TBA
 
Yuval Benjamini, Hebrew University of Jerusalem
Title: Extrapolating the accuracy of multi-class classification
Abstract: The difficulty of multi-class classification generally increases with the number of classes. This raises a natural question: using data from a subset of the classes, can we predict how well a classifier will scale as the number of classes increases? In other words, how should we extrapolate the accuracy from small pilot studies to larger problems? In this talk, I will present a framework that allows us to analyze this question. Assuming classes are sampled from a population (and some assumptions about the classifiers), we can identify how expected classification accuracy depends on the number of classes (k) via a specific cumulative distribution function. I will present a non-parametric method for estimating this function, which allows extrapolation to K>k. I will show relations with the ROC curve. Finally, I hope to discuss why the extrapolation problem may be important for neuroscientists, who are increasingly using multi-class classification accuracy as a proxy for richness of representation. This is joint work with Charles Zheng and Rakesh Achanta.

Florentina Bunea, Cornell University
Title: Essential regression
Abstract: Click here

Brian Caffo, Johns Hopkins University
Title: Statistical properties of measurement in resting state functional magnetic resonance imaging
Abstract: In this talk we discuss the statistical measurement properties of resting state functional magnetic resonance imaging data. Recent work has focused on measures of brain connectivity via resting state fMRI as a "fingerprint". We discuss the statistical properties of group fingerprint matching vis-à-vis the matching strategy and statistical assumptions. We further explore the utility of matching as a strategy for establishing measurement quality. Alternate strategies using ranking and measures of discriminability are also explored. Connections will be made to the use of higher-order asymptotics for estimating distributional properties of matching statistics. We further apply matching strategies to a group of subjects from the Human Connectome Project, comparing matching performance of subjects to themselves, monozygotic and dizygotic twins, non-twin siblings and non-relations. Furthermore, we investigate which brain connections are most and least idiosyncratic.

Emmanuel Candes, Stanford University
Title: To be announced
Abstract: TBA

Daniela De Angelis, University of Cambridge
Title: Value of information for evidence synthesis
Abstract: In a Bayesian model that combines evidence from several different sources, it is important to know which parameters most affect the estimate or decision from the model; which of the parameter uncertainties drive the decision uncertainty; and what further data should be collected to reduce such uncertainty. These questions can be addressed by Value of Information (VoI) analysis, allowing estimation of the expected gain from learning specific parameters or collecting data of a given design. In this talk, we introduce the concept of VoI for Bayesian evidence synthesis, using and extending ideas from health economics, computer modelling and Bayesian design. We then apply it to a model developed to estimate prevalence of HIV infection, which combines indirect information from surveys, registers, and expert beliefs. Results show which parameters contribute most to the uncertainty about each prevalence estimate, and the expected improvements in precision from specific amounts of additional data. These benefits can be traded with the costs of sampling to determine an optimal sample size. Joint work with: Chris Jackson, Anne Presanis and Stefano Conti.
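
As a concrete illustration of a variance-based VoI computation of the kind alluded to above, the sketch below estimates the expected reduction in the variance of an estimate (say, a prevalence) from learning a single parameter, using joint posterior draws and a regression approximation of the conditional expectation; the polynomial regression and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def evppi_variance_based(theta_draws, phi_draws, degree=3):
    """Expected reduction in Var(theta) from learning phi, estimated from joint
    posterior draws via Var(theta) - E[Var(theta | phi)] = Var(E[theta | phi]).
    E[theta | phi] is approximated by a polynomial regression of the theta
    draws on the phi draws (the degree is an arbitrary illustrative choice)."""
    coefs = np.polyfit(phi_draws, theta_draws, deg=degree)
    fitted = np.polyval(coefs, phi_draws)  # approximate E[theta | phi] at each draw
    return np.var(fitted)                  # approximate Var(E[theta | phi])
```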

Julia Fukuyama, Indiana University
Title: Phylogenetically-informed distance methods: their uses, properties, and potential
Abstract: Phylogenetically-informed distances are widely used in ecology, often in conjunction with multi-dimensional scaling, to describe the relationships between communities of organisms and the taxa they comprise. A large number of such distances have been developed, each leading to a different representation of the communities. The ecology literature often tries to interpret the differences between representations given by different distances, but without a good understanding of the properties of the distances it is unclear how useful these interpretations are. I give an overview of some of these distances, describe the interpretational challenges they pose, develop some interesting properties, and comment on opportunities for post-selection inference in this domain.

Irina Gaynanova, Texas A&M University
Title: Direct inference for sparse differential network analysis
Abstract: We consider the problem of constructing confidence intervals for the differential edges between two high-dimensional networks. The problem is motivated by the comparison of gene interactions between two molecular subtypes of colorectal cancer with distinct survival prognoses. Unlike the existing approaches for differential network inference that require sparsity of the individual precision matrices from both groups, we only require sparsity of the precision matrix difference. We discuss the method's theoretical properties, evaluate its performance in numerical studies and highlight directions for future research. This is joint work with Mladen Kolar and Byol Kim.

Ed George, University of Pennsylvania
Title: Multidimensional monotonicity discovery with MBART
Abstract: For the discovery of a regression relationship between y and x, a vector of p potential predictors, the flexible nonparametric nature of BART (Bayesian Additive Regression Trees) allows for a much richer set of possibilities than restrictive parametric approaches. To exploit the potential monotonicity of the predictors, we introduce mBART, a constrained version of BART that incorporates monotonicity with a multivariate basis of monotone trees, thereby avoiding the further confines of a full parametric form. Using mBART to estimate such effects yields (i) function estimates that are smoother and more interpretable, (ii) better out-of-sample predictive performance and (iii) less post-data uncertainty. By using mBART to simultaneously estimate both the increasing and the decreasing regions of a predictor, mBART opens up a new approach to the discovery and estimation of the decomposition of a function into its monotone components.  (This is joint work with H. Chipman, R. McCulloch and T. Shively).

Iain Johnstone, Stanford University
Title: HOA-PSI for top eigenvalues in spiked PCA models
Abstract:  The setting is principal components analysis with number of variables proportional to sample size, both large. The data are Gaussian with known spherical population covariance except for a fixed number of larger and distinct population eigenvalues, 'spikes'. If these spikes are large enough, i.e. 'supercritical', then to leading order the sample spike eigenvalues are known to be asymptotically independent Gaussian. We give the first order Edgeworth correction for this model (which is far from the usual smooth function of means setting) and note how repulsion of supercritical sample eigenvalues first becomes visible at this order.  If time allows, we outline implications for improved confidence intervals for the spike values, using a minimal conditioning strategy for post selection inference. This is joint work with Jeha Yang.

Mladen Kolar, University of Chicago
Title: TBA
Abstract: TBA
 
Vladimir Koltchinskii, Georgia Tech
Title: Bias reduction and efficiency in estimation of smooth functionals of high-dimensional parameters
Abstract: A problem of estimation of smooth functionals of high-dimensional parameters of statistical models will be discussed. The focus will be on a method of bias reduction based on approximate solutions of integral equations on the parameter space with respect to certain Markov kernels. It will be shown that, in the case of high-dimensional normal models, this approach yields estimators with optimal or nearly optimal mean squared error rates  (in particular, asymptotically efficient estimators) for all sufficiently smooth functionals. The proofs of these results rely on Gaussian concentration, representations of Markov chains as superpositions of smooth random maps and information-theoretic lower bounds. Possible extensions of this approach beyond normal models will be briefly discussed. The talk is based on a joint work with Mayya Zhilova.

Ioannis Kosmidis, University of Warwick
Title: Improved estimation of partially specified models
Abstract: This talk focuses on a new framework for reducing bias in estimation. Many bias reduction methods rely on an approximation of the bias function of the estimator under the assumption that the model is correct and fully-specified. Other bias reduction methods, like the bootstrap, the jackknife and indirect inference require fewer assumptions to operate but are typically computer-intensive. We present current research on a new framework for bias reduction that:
i) can deliver estimators with smaller bias than reference estimators even for partially specified models, as long as estimation is through unbiased estimating functions;
ii) always results in closed-form bias-reducing penalties to the objective function if estimation is through the maximisation of one, like maximum likelihood and maximum composite likelihood; and
iii) relies only on the estimating functions and their derivatives, greatly facilitating implementation through numerical or automatic differentiation techniques and standard numerical optimisation routines.

Joint work with: Nicola Lunardon, University of Milano-Bicocca, Italy

Arun Kumar Kuchibhotla, University of Pennsylvania
Title: Post-selection inference for all
Abstract: Inference after selection is currently available only in limited settings. PoSI, as meant in Berk et al. (2013), has been extended to general M-estimators only when the number of covariates is fixed (not depending on the sample size). Selective inference, as studied by Jonathan Taylor & Co., has only been rigorously justified for a fixed number of covariates. In this talk, I will introduce a randomness-free study of M-estimators which readily yields a uniform linear representation. This implies simultaneous, and hence post-selection, inference even with a diverging number of covariates for a large class of M-estimators, using the high-dimensional CLT. The talk is based on "Deterministic Inequalities for Smooth M-estimators", arXiv:1809.05172.

Stephen M.S. Lee, University of Hong Kong
Title: High-dimensional local polynomial regression with variable selection and dimension reduction
Abstract: Variable selection and dimension reduction have been considered in non-parametric regression for improving the precision of estimation, via the formulation of a semiparametric multiple index model. However, most existing methods are ill-equipped to cope with a high-dimensional setting where the number of variables may grow exponentially fast with sample size. We propose a new procedure for simultaneous variable selection and dimension reduction in high-dimensional nonparametric regression problems. It consists essentially of penalised local polynomial regression, with the bandwidth matrix regularised to facilitate variable selection, dimension reduction and optimal estimation at the oracle convergence rate, all in one go. Unlike most existing methods, the proposed procedure does not require explicit bandwidth selection or an additional step of dimension determination using techniques like cross validation or principal components. Empirical performance of the procedure is illustrated with both simulated and real data examples. Joint work with Kin Yap Cheung.

Xihong Lin, Harvard University
Title: Hypothesis testing for a large number of composite nulls in genome-wide causal mediation analysis
Abstract: In genome-wide epigenetic studies, it is often of scientific interest to assess whether the effect of an exposure on a clinical outcome is mediated through DNA methylation. Statistical inference for causal mediation effects is challenged by the fact that one needs to test a large number of composite null hypotheses across the genome. In this paper, we first study the theoretical properties of the commonly used methods for testing for causal mediation effects, Sobel's test and the joint significance test. We show the joint significance test is the likelihood ratio test for the composite null hypothesis of no mediation effect. Both Sobel's test and the joint significance test follow non-standard distributions; they are overly conservative for testing mediation effects and yield invalid inference in genome-wide epigenetic studies. We propose a novel Divide-Aggregate Composite-null Test (DACT) for the composite null hypothesis of no mediation effect in genome-wide analysis. We show that the DACT method provides valid statistical inference and boosts power for testing mediation effects across the genome. We propose a correction procedure to improve the DACT method using Efron's empirical null method when the exposure-mediator and/or the mediator-outcome association signals are not sparse. Our extensive simulation studies show that the DACT method properly controls type I error rates and outperforms Sobel's test and the joint significance test for genome-wide causal mediation analysis. We applied the DACT method to the Normative Aging Study to identify putative DNA methylation sites that mediate the effect of smoking on lung function. We also developed a computationally efficient R package, DACT, for public use.
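
For reference, the two classical component tests analyzed in the abstract are simple to state; the sketch below computes Sobel's p-value and the joint significance (max-P) p-value from the exposure-mediator and mediator-outcome fits. DACT itself additionally recalibrates these quantities by estimating the proportions of the composite-null sub-cases, which is not shown here; all names are ours.

```python
import numpy as np
from scipy.stats import norm

def sobel_p(a_hat, se_a, b_hat, se_b):
    """Sobel's test for the mediation (indirect) effect a*b via the delta method."""
    z = (a_hat * b_hat) / np.sqrt(a_hat**2 * se_b**2 + b_hat**2 * se_a**2)
    return 2 * norm.sf(abs(z))

def joint_significance_p(p_exposure_mediator, p_mediator_outcome):
    """Joint significance (max-P) test of the composite null of no mediation:
    significant only if both component associations are significant."""
    return np.maximum(p_exposure_mediator, p_mediator_outcome)
```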

Kristin Linn, University of Pennsylvania
Title: Interactive Q-learning
Abstract: Forming evidence-based rules for optimal treatment allocation over time is a priority in personalized medicine research. Such rules must be estimated from data collected in observational or randomized studies. Popular methods for estimating optimal sequential decision rules from data, such as Q-learning, are approximate dynamic programming algorithms that require modeling non-smooth transformations of the data. Postulating a simple, well-fitting model for the transformed data can be difficult, and under many simple generative models the most commonly employed working models (namely, linear models) are known to be misspecified. We propose an alternative strategy for estimating optimal sequential decision rules wherein all modeling takes place before applying non-smooth transformations of the data. This simple re-ordering of the modeling and transformation steps leads to high-quality estimated sequential decision rules because the proposed estimators involve only conditional mean and variance modeling of smooth functionals of the data. Consequently, standard statistical procedures can be used for exploratory analysis, model building, and model validation. We will also discuss extensions of Interactive Q-learning for optimizing non-mean summaries of an outcome distribution.

Miles Lopes, UC Davis
Title: Bootstrap methods in high dimensions: spectral statistics and max statistics
Abstract: Although bootstrap methods have an extensive literature, relatively little is known about their performance in high-dimensional problems. In this talk, I will discuss two classes of statistics for which bootstrap approximations can succeed in high dimensions. The first is the class of "spectral statistics," which are functions of the eigenvalues of sample covariance matrices. In this case, I will describe a new type of bootstrap method with consistency guarantees. The second class is based on the coordinate-wise maxima of high-dimensional sample averages, which have attracted recent interest in connection with the "multiplier bootstrap". In this case, I will explain how existing theoretical rates of bootstrap approximation can be improved to near-parametric rates under certain structural conditions. (Joint work with subsets of {Alexander Aue, Andrew Blandino, Zhenhua Lin, Hans Mueller}.)
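
To fix ideas for the second class, here is a minimal sketch of the Gaussian multiplier bootstrap for coordinate-wise max statistics; it illustrates the standard construction rather than the improved rates discussed in the talk, and the function name is ours.

```python
import numpy as np

def multiplier_bootstrap_max(X, n_boot=2000, seed=0):
    """Bootstrap the distribution of max_j sqrt(n)*|mean_j(X) - mu_j| by applying
    Gaussian multipliers to the centered rows of the (n x p) data matrix X."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    centered = X - X.mean(axis=0)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        g = rng.standard_normal(n)                       # i.i.d. N(0,1) multipliers
        stats[b] = np.abs(centered.T @ g).max() / np.sqrt(n)
    return stats  # e.g. np.quantile(stats, 0.95) gives a simultaneous critical value
```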

Xiao-Li Meng, Harvard University
Title: The Conditionality Principle is (still) safe and sound, but our large-p-small-n models are ill (defined)
Abstract: In recent years, a number of authors have questioned the applicability of the Conditionality Principle to high-dimensional problems, because they presented examples where certain model parameters cannot be estimated without using ancillary information. This talk points out that such questioning is meaningful only if both model parameters and ancillary statistics are defined in the ways required by the Conditionality Principle. The mathematical assumptions relating the number of parameters p and the sample size n, while very useful for approximation theory for high-dimensional problems, are typically at odds with statistical modeling both as a realizable process for generating data and as a coherent vehicle for inference. Furthermore, reaching consistency/estimability by marginalization comes at the necessary expense of solving a less relevant problem than the actual one we care about. All these issues reinforce the time-honored ``no-free-lunch" principle, with an additional reminder to read the menu carefully for dietary restrictions.

Art Owen, Stanford University
Title: Six percent power and barely selective inference
Abstract: Click here

Snigdha Panigrahi, University of Michigan
Title: Post-selective estimation of linear mediation effects
Abstract: In an attempt to understand the effect of exposures on an outcome variable, several mediators often make key contributions towards an indirect effect.  A priori it is not typically known which mediating pathways are of potential interest, out of a high dimensional set of candidates. Previous work in this domain has mainly focused on methods which identify likely mediators. However, a problem that has not received much attention to date is consistent estimation of the mediated associations between exposure and response, using the available samples and based upon these data-mined models. Specifically, the post-selective targets in linear mediation models take the form of adaptive linear combinations of model parameters. With the usual ``Polyhedral" machinery no longer applicable to construct pivotal inference, we deploy recently developed maximum likelihood techniques (Panigrahi and Taylor; 2019) for interval estimation. To showcase the merits of our approach, we will demonstrate in simulations an optimal tradeoff in power and inferential coherence. This is joint work with Yujia Pan.

Annie Qu, University of Illinois Urbana-Champaign
Title: Community detection with dependent connectivity
Abstract: In network analysis, within-community members are more likely to be connected than between-community members, which is reflected in the fact that edges within a community are correlated with one another. However, existing probabilistic models for community detection, such as the stochastic block model (SBM), are not designed to capture the dependence among edges. In this paper, we propose a new community detection approach that incorporates intra-community dependence of connectivities through the Bahadur representation. The proposed method does not require specifying the likelihood function, which could be intractable for correlated binary connectivities. In addition, the proposed method allows for heterogeneity among edges between different communities. In theory, we show that incorporating correlation information can achieve a faster convergence rate compared to the independent SBM, and the proposed algorithm has a lower estimation bias and accelerated convergence compared to the variational EM. Our simulation studies show that the proposed algorithm outperforms the popular variational EM algorithm assuming conditional independence among edges. We also demonstrate the application of the proposed method to agricultural product trading networks from different countries. This is joint work with Yubai Yuan.

Aaditya Ramdas, Carnegie Mellon University
Title: Online control of the false coverage rate and false sign rate
Abstract: The reproducibility debate has caused a renewed interest in changing how one reports uncertainty, from $p$-values for testing a null hypothesis to confidence intervals (CIs) for the corresponding parameter. When CIs for multiple selected parameters are being reported, the natural analog of the false discovery rate (FDR) is the false coverage rate (FCR), which is the expected ratio of the number of reported CIs that fail to cover their respective parameters to the total number of reported CIs. Here, we consider the general problem of FCR control in the online setting, where there is an infinite sequence of fixed unknown parameters $\theta_t$ ordered by time. At each step, we see independent data that is informative about $\theta_t$, and must immediately decide whether to report a CI for $\theta_t$ or not. If $\theta_t$ is selected for coverage, the task is to determine how to construct a CI for $\theta_t$ such that FCR $\leq \alpha$ for any $T\in \mathbb{N}$. While much progress has been made in online FDR control (testing $\theta_t \in \Theta_{0,t}$) starting from the seminal alpha-investing paper of Foster and Stine (JRSSB, 2008), the problem of online FCR control is wide open. In this paper, we devise a novel solution to the problem which only requires the statistician to be able to construct a marginal CI at any given level. If so desired, our framework also yields online FDR control as a special case, or even online sign-classification procedures that control the false sign rate (FSR). Last, all of our methodology applies equally well to prediction intervals, having particular implications for selective conformal inference. This is joint work with Asaf Weinstein (preprint at https://arxiv.org/abs/1905.01059).

Veronika Rockova, University of Chicago
Title: TBA
Abstract: TBA
 

Cynthia Rush, Columbia University
Title: Algorithmic analysis of SLOPE via approximate message passing
Abstract: SLOPE is a relatively new convex optimization procedure for high-dimensional linear regression via the sorted L1 penalty: the larger the rank of the fitted coefficient, the larger the penalty. This non-separable penalty renders many existing techniques invalid or inconclusive in analyzing the SLOPE solution. In this talk, we propose using approximate message passing or AMP to provably solve the SLOPE problem in the regime of linear sparsity under Gaussian random designs.  This algorithmic approach allows one to approximate the SLOPE solution via the much more amenable AMP iterates, and a consequence of this analysis is an asymptotically exact characterization of the SLOPE solution.  Explicitly, we demonstrate that one can characterize the asymptotic dynamics of the AMP iterates by employing a recently developed state evolution analysis for non-separable penalties, thereby overcoming the difficulty caused by the sorted L1 penalty.  This is joint work with Zhiqi Bu, Jason Klusowski, and Weijie Su.
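
The non-separable ingredient at the heart of this analysis is the proximal operator of the sorted L1 penalty, which the AMP iteration applies repeatedly together with an Onsager correction term. The sketch below computes that prox by one standard route, via isotonic regression; the function name and the use of scikit-learn's IsotonicRegression are our illustrative choices, not the authors' code.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def prox_sorted_l1(y, lam):
    """Prox of the sorted-L1 (SLOPE) penalty:
        argmin_b 0.5*||b - y||^2 + sum_i lam_i * |b|_(i),
    with lam nonnegative and nonincreasing.  One standard computation: sort |y|
    in decreasing order, subtract lam, project onto the nonincreasing cone via
    isotonic regression, clip at zero, then restore signs and order."""
    y = np.asarray(y, dtype=float)
    order = np.argsort(-np.abs(y))                     # indices sorting |y| descending
    z = np.abs(y)[order] - np.asarray(lam, dtype=float)
    iso = IsotonicRegression(increasing=False)
    x = np.clip(iso.fit_transform(np.arange(len(z)), z), 0.0, None)
    out = np.zeros_like(y)
    out[order] = x                                     # undo the sort
    return np.sign(y) * out
```

With a constant lam vector this reduces to ordinary soft thresholding, i.e. the LASSO special case mentioned in the abstract below.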

Richard Samworth, University of Cambridge
Title: High-dimensional principal component analysis with heterogeneous missingness
Abstract: We study the problem of high-dimensional Principal Component Analysis (PCA) with missing observations. In simple, homogeneous missingness settings with a noise level of constant order, we show that an existing inverse-probability weighted (IPW) estimator of the leading principal components can (nearly) attain the minimax optimal rate of convergence. However, deeper investigation reveals both that, particularly in more realistic settings where the missingness mechanism is heterogeneous, the empirical performance of the IPW estimator can be unsatisfactory, and moreover that, in the noiseless case, it fails to provide exact recovery of the principal components. We therefore introduce a new method for high-dimensional PCA, called `primePCA', that is designed to cope with situations where observations may be missing in a heterogeneous manner. Starting from the IPW estimator, primePCA iteratively projects the observed entries of the data matrix onto the column space of our current estimate to impute the missing entries, and then updates our estimate by computing the leading right singular space of the imputed data matrix. It turns out that the interaction between the heterogeneity of missingness and the low-dimensional structure is crucial in determining the feasibility of the problem. This leads us to impose an incoherence condition on the principal components and we prove that in the noiseless case, the error of primePCA converges to zero at a geometric rate when the signal strength is not too small. An important feature of our theoretical guarantees is that they depend on average, as opposed to worst-case, properties of the missingness mechanism. Joint work with Ziwei Zhu and Tengyao Wang.
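
A caricature of the refinement step described above is sketched here: each row's observed entries are regressed onto the current estimate of the right singular space to impute that row's missing entries, and the estimate is then refreshed from an SVD of the imputed matrix. All names are assumptions, and the actual primePCA algorithm contains further ingredients (the IPW initializer, safeguards on heavily missing rows) not reproduced.

```python
import numpy as np

def refine_pca_with_missing(Y, mask, V, n_iter=20):
    """Iterative impute-then-SVD refinement for PCA with missing entries.

    Y    : (n, d) data matrix (arbitrary values where mask is False)
    mask : (n, d) boolean array, True where the entry is observed
    V    : (d, k) current estimate of the leading right singular space
    """
    k = V.shape[1]
    Y_imp = np.where(mask, Y, 0.0)
    for _ in range(n_iter):
        for i in range(Y.shape[0]):
            obs = mask[i]
            if obs.sum() <= k:
                continue                    # too few observed entries to regress
            # Regress row i's observed entries on the corresponding rows of V,
            # then use the fit to fill in that row's missing entries.
            coef, *_ = np.linalg.lstsq(V[obs], Y[i, obs], rcond=None)
            Y_imp[i, obs] = Y[i, obs]
            Y_imp[i, ~obs] = V[~obs] @ coef
        # Refresh the estimate with the leading right singular vectors.
        _, _, Vt = np.linalg.svd(Y_imp, full_matrices=False)
        V = Vt[:k].T
    return V
```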

Ulrike Schneider, TU Wien
Title: Uniformly valid confidence sets based on the Lasso in low dimensions
Abstract: In a linear regression model of fixed dimension p ≤ n, we construct confidence regions for the unknown parameter vector based on the Lasso estimator that uniformly and exactly hold the prescribed coverage in finite samples (as well as in an asymptotic setup). We thereby quantify estimation uncertainty as well as the ``post-model selection error" of this estimator. More concretely, in finite samples with Gaussian errors (and asymptotically in the case where the Lasso estimator is tuned to perform conservative model selection), we derive exact formulas for the minimal coverage probability over the entire parameter space and for a large class of shapes for the confidence sets, thus enabling the construction of valid confidence regions based on the Lasso estimator in these settings. Our calculations are carried out without explicit knowledge of the finite-sample distribution of the estimator. Furthermore, we discuss the choice of shape for the confidence sets and the comparison with the confidence ellipse based on the least-squares estimator, along with some ideas for extensions. [Reference: K. Ewald and U. Schneider, Uniformly Valid Confidence Sets Based on the Lasso, Electronic Journal of Statistics 12 (2018), 1358-1387.]

Peter Song, University of Michigan
Title: Method of Contraction-Expansion (MOCE) for simultaneous inference in linear models
Abstract: Simultaneous inference after model selection is of critical importance for addressing scientific hypotheses involving a set of parameters. We consider a high-dimensional linear regression model in which a regularization procedure such as LASSO is applied to yield a sparse model. To establish simultaneous post-model-selection inference, we propose a method of contraction and expansion (MOCE) along the lines of debiased estimation that enables us to balance the bias-variance tradeoff so that the super-sparsity assumption may be relaxed. We establish key theoretical results for the proposed MOCE procedure, from which the expanded model can be selected with theoretical guarantees and simultaneous confidence regions can be constructed from the joint asymptotic normal distribution. In comparison with existing methods, our proposed method exhibits stable and reliable coverage at a nominal significance level with substantially less computational burden, and thus it is trustworthy for application to real-world problems. This is joint work with Wang, Zhou and Tang.

Weijie Su, University of Pennsylvania
Title: Gaussian differential privacy
Abstract: Privacy-preserving data analysis has been put on a firm mathematical foundation since the introduction of differential privacy (DP) in 2006, with successful deployments in iOS and Chrome recently. This privacy definition, however, has some well-known weaknesses: notably, it does not tightly handle composition. This weakness has inspired several recent relaxations of differential privacy based on Renyi divergences. We propose an alternative relaxation of differential privacy, which we term "f-DP", which has a number of nice properties and avoids some of the difficulties associated with divergence-based relaxations. First, it preserves the hypothesis testing interpretation of differential privacy, which makes its guarantees easily interpretable. It allows for lossless reasoning about composition and post-processing, and notably, a direct way to analyze privacy amplification by subsampling. We define a canonical single-parameter family of definitions within our class, termed "Gaussian Differential Privacy", based on the hypothesis testing region defined by two Gaussian distributions. We show that this family is focal by proving a central limit theorem, which shows that the privacy guarantees of any hypothesis-testing-based definition of privacy (including differential privacy) converge to Gaussian differential privacy in the limit under composition. This central limit theorem also gives a tractable analysis tool. We demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent. This is joint work with Jinshuo Dong and Aaron Roth.
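
The central object in Gaussian differential privacy is the trade-off function of two unit-variance Gaussians. The snippet below evaluates G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu), the smallest type II error achievable at type I error alpha when testing N(0,1) against N(mu,1), which is the curve a mu-GDP mechanism must dominate; the function name is ours.

```python
from scipy.stats import norm

def gaussian_tradeoff(alpha, mu):
    """G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu): the smallest type II error
    of any test of N(0,1) vs N(mu,1) at type I error alpha.  A mechanism is
    mu-GDP if its trade-off curve lies above this function."""
    return norm.cdf(norm.ppf(1 - alpha) - mu)

# Example: under 1-GDP, any level-0.05 test of one individual's presence
# has type II error of at least about 0.74.
print(gaussian_tradeoff(0.05, 1.0))
```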

Jonathan Taylor, Stanford University
Title: Inference after selection through a black box
Abstract: We consider the problem of inference for parameters selected for reporting only after some algorithm, the canonical example being inference for model parameters after a model selection procedure. The conditional correction for selection requires knowledge of how the selection is affected by changes in the underlying data, and much current research describes this selection explicitly. In this work, we assume 1) we have access, in silico, to the selection algorithm itself and 2) for parameters of interest, the data input into the algorithm satisfies (pre-selection) a central limit theorem jointly with an estimator of our parameter of interest. Under these assumptions, we recast the problem into a statistical learning problem which can be fit with off-the-shelf models for binary regression. We consider two examples previously out of reach of this conditional approach: stability selection and inference after multiple runs of Model-X knockoffs.

Rob Tibshirani, Stanford University
Title: Prediction and outlier detection: a distribution-free prediction set with a balanced objective
Abstract: We consider the multi-class classification problem in the unmatched case, where the training data and the out-of-sample data may have different distributions, and propose a method called BCOPS (balanced & conformal optimized prediction set) that constructs prediction sets $C(x)$ at each $x$ in the out-of-sample data. The method tries to optimize out-of-sample performance, aiming to include the correct class as often as possible, but also detecting outliers $x$, for which the method returns no prediction (corresponding to $C(x)$ equal to the empty set).
BCOPS combines supervised-learning algorithms with conformal prediction to minimize the misclassification loss over the distribution of the unlabeled out-of-sample data in the offline setting, and over a proxy of the out-of-sample distribution in the online setting. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also describe new methods for the evaluation of out-of-sample performance in this unmatched case. We prove asymptotic consistency and efficiency of the proposed methods under suitable assumptions and illustrate them in real data examples. Joint work with Leying Guan, Yale University.
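
For intuition about how per-class prediction sets of this kind are calibrated, here is a minimal split-conformal sketch built from an arbitrary classifier's scores; it illustrates only the finite-sample coverage mechanism, not the BCOPS objective, which in addition optimizes the score against the out-of-sample distribution. All names are ours.

```python
import numpy as np

def conformal_prediction_set(cal_scores, cal_labels, test_scores, alpha=0.1):
    """Split-conformal prediction set for a single test point.

    cal_scores  : (n_cal, K) classifier scores on a held-out calibration set
    cal_labels  : (n_cal,) integer labels in {0, ..., K-1}
    test_scores : (K,) classifier scores for the test point
    Returns classes whose conformal p-value exceeds alpha; an empty set
    flags the point as a potential outlier."""
    kept = []
    for k in range(len(test_scores)):
        ref = cal_scores[cal_labels == k, k]   # class-k scores on calibration data
        # Rank-based p-value: how typical is the test score among class k?
        p_val = (1 + np.sum(ref <= test_scores[k])) / (len(ref) + 1)
        if p_val > alpha:
            kept.append(k)
    return kept
```

A class is retained only if the test point looks sufficiently typical of that class's calibration scores, which is what yields distribution-free coverage.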


Ryan Tibshirani, Carnegie Mellon University
Title: What deep learning taught me about linear models
Abstract: Related to this paper: http://www.stat.cmu.edu/~ryantibs/papers/lsinter.pdf . Joint work with Trevor Hastie, Andrea Montanari and Saharon Rosset.

Jingshen Wang, UC Berkeley
Title: TBA
Abstract: TBA


Daniel Yekutieli, Tel Aviv University
Title: TBA
Abstract: TBA

Alastair Young, Imperial College London
Title: Challenges for (Bayesian) selective inference
Abstract: The `condition on selection' approach to selective inference is compelling, for both frequentist and Bayesian contexts, and strongly supported by classical, Fisherian, arguments. Yet, significant practical and conceptual challenges remain. Our purpose in this talk is to provide discussion of key issues, with the aim of providing a clear, pragmatic perspective on the selective inference problem, principally from a Bayesian angle. Assuming a framework in which selection is performed on a randomized version of sample data, several questions demand attention. How much should we condition? How can the computational challenge of Bayesian selective inference be met most effectively? What if the selection condition is imprecise? Should the selection condition be altered for application to randomised data? Joint work with Daniel Garcia Rasines.

Linda Zhao, University of Pennsylvania
Title: Nonparametric empirical Bayes methods for sparse, noisy signals
Abstract: We consider high-dimensional signal recovery problems. The goal is to identify the true signals from the noise with precision. Nonparametric empirical Bayes schemes are proposed and investigated. The method adapts well to varying degrees of sparsity. It not only recovers the signals well, but also provides credible intervals. A false discovery rate control method is introduced with our flexible nonparametric empirical Bayes schemes. The setup is built upon a normal distribution with heteroskedastic variance, but adapts well to exponential family distributions. The EM algorithm and other first-order optimization methods are used and studied. Simulations show that our method outperforms existing ones. Applications to microarray data as well as sports data, such as predicting batting averages as in L. Brown (2008), will be discussed. This is joint work with Junhui Cai.


Posters

Stephen Bates, Stanford University
Title: TBA
Abstract: TBA

Zhiqi Bu, University of Pennsylvania
Title: SLOPE is better than LASSO: estimation and inference of SLOPE via approximate message passing
Abstract: In the high-dimensional problem of reconstructing a sparse signal via sorted L1 penalized estimation, or SLOPE, we apply approximate message passing (AMP) to the SLOPE minimization problem. We derive the AMP algorithm and its state evolution. We then rigorously prove that the AMP iterates converge to the SLOPE solution as the number of iterations increases. We also use the state evolution for non-separable functions to asymptotically characterize the SLOPE solution. As a consequence, AMP and state evolution allow us to conduct inference on the SLOPE solution and demonstrate cases where SLOPE is better than LASSO (which is a special case of SLOPE). Our first result is the trade-off between false and true positive rates or, equivalently, between measures of type I and type II errors along the SLOPE path. In particular, LASSO is known to suffer from the Donoho-Tanner phase transition, where the TPP may be bounded away from 1. In contrast, SLOPE overcomes this phase transition, and part of the trade-off curve can be nicely characterized as a Mobius transformation. Our second result considers a fixed signal prior distribution and constructs a SLOPE path that achieves better TPP, FDP and MSE at the same time.

Hongyuan Cao, Florida State University
Title: TBA
Abstract: TBA

Paromita Dubey, UC Davis
Title: Frechet analysis of variance and change point detection for random objects
Abstract: With an increasing abundance of complex non-Euclidean data, settings where data objects are assumed to be random variables taking values in a metric space are more frequently encountered. We propose a k-sample test for samples of random objects using the Frechet mean and variance as generalizations of the notions of center and spread for metric space valued random variables. Our method is free of tuning parameters and is inspired by classical ANOVA, where traditionally groupwise variances are compared to draw inference regarding the mean. The proposed test is consistent and powerful against contiguous alternatives addressing both location and scale differences, which are captured using Frechet means and variances. Theoretical challenges are addressed using very mild assumptions on metric entropy, making our method applicable to a broad class of metric spaces, including networks, covariance matrices, probability distributions, etc. Inspired by the test, we develop a method for estimation and testing of a change point in the distribution of a sequence of independent data objects. Change points are viewed as locations in a data sequence where the distribution changes either in terms of the Frechet mean and/or variance. We obtain the asymptotic distribution of the test statistic under the null hypothesis of no change point. We provide theoretical guarantees for consistency of the test under contiguous alternatives when a change point exists and for consistency of the estimated location of the change point. We illustrate the new approach by detecting change points in sequences of maternal fertility distributions.
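
A minimal sketch of the empirical Frechet quantities underlying the test is given below; for simplicity it approximates the Frechet mean by the best sample point (a medoid) computed from a pairwise distance matrix, which is an illustrative shortcut rather than the estimator analyzed in the paper.

```python
import numpy as np

def frechet_mean_and_variance(D):
    """Empirical Frechet mean (restricted to the sample points, i.e. a medoid)
    and Frechet variance, from an (n, n) matrix D of pairwise distances."""
    mean_sq_dist = (D ** 2).mean(axis=1)  # average squared distance from each point
    idx = int(np.argmin(mean_sq_dist))    # index of the in-sample Frechet mean
    return idx, mean_sq_dist[idx]         # minimized value = Frechet variance
```

The k-sample comparison then contrasts these groupwise Frechet variances, in the spirit of classical ANOVA.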

Yinqiu He, University of Michigan
Title: Likelihood ratio test in multivariate linear regression: from low to high dimension
Abstract: When testing the structure of the regression coefficient matrix in multivariate linear regression, the likelihood ratio test (LRT) is one of the most popular approaches in practice. Despite its popularity, it is known that the classical chi-square approximations for LRTs often fail in high-dimensional settings, where the dimensions of responses and predictors (m, p) are allowed to grow with the sample size n. Though various corrected LRTs and other test statistics have been proposed in the literature, the fundamental question of when the classic LRT starts to fail is less studied. We first give the asymptotic boundary where the classic LRT fails and develop the corrected limiting distribution of the LRT for a general asymptotic regime. We then study the power of the LRT in the high-dimensional setting, and develop a power-enhanced LRT. Lastly, when p>n, where the LRT is not well-defined, we propose a two-step testing procedure that first performs dimension reduction and then applies the proposed LRT. Theoretical properties are developed to ensure the validity of the proposed method. Numerical studies are also presented to demonstrate its good performance.

David Hong, University of Michigan
Title: Asymptotic eigenstructure of weighted sample covariance matrices for large dimensional low-rank models with heteroscedastic noise
Abstract: TBA

Byol Kim, University of Chicago
Title: TBA
Abstract: TBA

John Kolassa, Rutgers University
Title: TBA
Abstract: TBA

Lihua Lei, UC Berkeley
Title: The Bag-Of-Null-Statistics procedure: an adaptive framework for selecting better test statistics
Abstract: Classical multiple testing procedures often suffer from the curse of dimensionality. As the dimension increases, a traditional method that uses an agnostic p-value transformation is likely to fail to distinguish between the null and alternative hypotheses. In this work, we propose the Bag-Of-Null-Statistics (BONuS) procedure, an adaptive procedure for multiple testing with multivariate data, which helps improve the testing power while controlling the false discovery rate (FDR). Contrary to procedures that start with a set of p-values, our procedure starts with the original data and adaptively finds a more powerful test statistic. It always controls FDR, works in a fairly general setting, and can gain higher power compared to agnostic tests under mild conditions. In addition, with certain implementation techniques (Double BONuS), we can guarantee in probability that its performance is at least as good as the agnostic test.

Cong Ma, Princeton University
Title: Inference and uncertainty quantification for noisy matrix completion
Abstract: Noisy matrix completion aims at estimating a low-rank matrix given only partial and corrupted entries. Despite substantial progress in designing efficient estimation algorithms, it remains largely unclear how to assess the uncertainty of the obtained estimates and how to perform statistical inference on the unknown matrix (e.g. constructing a valid and short confidence interval for an unseen entry). This work takes a step towards inference and uncertainty quantification for noisy matrix completion. We develop a simple procedure to compensate for the bias of the widely used convex and nonconvex estimators. The resulting de-biased estimators admit nearly precise non-asymptotic distributional characterizations, which in turn enable optimal construction of confidence intervals/regions for, say, the missing entries and the low-rank factors. Our inferential procedures do not rely on sample splitting, thus avoiding unnecessary loss of data efficiency. As a byproduct, we obtain a sharp characterization of the estimation accuracy of our de-biased estimators, which, to the best of our knowledge, are the first tractable algorithms that provably achieve full statistical efficiency (including the pre-constant). The analysis herein is built upon the intimate link between convex and nonconvex optimization -- an appealing feature recently discovered by [CCF+19].

Matteo Sesia, Stanford University
Title: Multi-resolution localization of causal variants across the genome
Abstract:  We present KnockoffZoom, a flexible method for the genetic mapping of complex traits at multiple resolutions. KnockoffZoom localizes causal variants precisely and provably controls the false discovery rate using artificial genotypes as negative controls. Our method is equally valid for quantitative and binary phenotypes, making no assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage  disequilibrium. We demonstrate that our method can detect more associations than mixed effects models and achieve fine-mapping precision, at comparable computational cost. Lastly, we apply KnockoffZoom to data from 350k subjects in the UK Biobank and report many new findings.

Nicholas Syring, Washington University in St. Louis
Title: TBA
Abstract: TBA

Armeen Taeb, Caltech
Title: TBA
Abstract: TBA

 
Hua Wang, University of Pennsylvania
Title: The simultaneous inference trade-off analysis on Lasso path
Abstract: In high-dimensional linear regression settings where explanatory variables have very low correlations and the true effective variables are sparse, each of large magnitude, it is expected that Lasso can find those true variables with few mistakes, if any. However, recent studies suggest this is not the case in a regime of linear sparsity, where the fraction of true effective variables tends to a constant, however small, even when the design is independent Gaussian. We further demonstrate that true features and null features are always inevitably interspersed on the Lasso path, and this effect can even get worse when the effect sizes are uniformly larger. We derive a complete diagram that reveals all possible trade-offs between false and true positive rates, or, equivalently, between measures of type I and type II errors along the Lasso path; the resulting upper and lower bounds are sharp in a global sense. We reveal that, even though the trade-off is inevitable, its finer structure is determined not by the absolute magnitude of the effect sizes but mainly by the relative closeness between the effect sizes of the true variables. The best case among these trade-offs occurs when the effective variables all have very distinct magnitudes, and there is always a price to pay when the effect sizes of the true signals are close to each other, which we interpret as ``the price of competition'', namely the cost due to the competition between comparable signals. Our analysis uses tools from approximate message passing (AMP) theory, as well as novel elements to deal with a possibly adaptive selection of the Lasso regularization parameter, and extensive conditioning techniques.

Yuling Yan, Princeton University
Title: Noisy matrix completion: understanding statistical guarantees for convex relaxation via nonconvex optimization
Abstract: This paper studies noisy low-rank matrix completion: given partial and corrupted entries of a large low-rank matrix, the goal is to estimate the underlying matrix faithfully and efficiently. Arguably one of the most popular paradigms to tackle this problem is convex relaxation, which achieves remarkable efficacy in practice. However, the theoretical support of this approach is still far from optimal in the noisy setting, falling short of explaining the empirical success. We make progress towards demystifying the practical efficacy of convex relaxation vis-à-vis random noise. When the rank of the unknown matrix is a constant, we demonstrate that the convex programming approach achieves near-optimal estimation errors --- in terms of the Euclidean loss, the entrywise loss, and the spectral norm loss --- for a wide range of noise levels. All of this is enabled by bridging convex relaxation with the nonconvex Burer--Monteiro approach, a seemingly distinct algorithmic paradigm that is provably robust against noise. More specifically, we show that an approximate critical point of the nonconvex formulation serves as an extremely tight approximation of the convex solution, allowing us to transfer the desired statistical guarantees of the nonconvex approach to its convex counterpart.

Yubai Yuan, University of Illinois Urbana-Champaign
Title: High-order embedding for hyperlink network prediction
Abstract: In this poster, we are interested in formulating multi-layer networks arising from multiple structured relationships among vertices. This type of network system has a unique feature in that the links connecting vertices from a subgroup, within or across layers of the network, might be correlated. We propose a novel hyperlink embedding that encodes the potential subgroup structure of vertices into a latent space to capture the local link dependency for the purpose of link inference. In addition, we utilize tensor decomposition to reduce the dimensionality of the high-order subgroup similarity modeling. Furthermore, to achieve hyperlink selection from a set of potential candidates, we adopt regularizations to reinforce local concordances among vertices for subgroup structure identification. The major advantage is that the proposed method is able to perform hyperlink prediction through observed pairwise links and the underlying high-order subgroup structure in latent space. This subgroup structure also enables pairwise link inference to borrow information through the within-subgroup dependency. Numerical studies indicate that the proposed method improves both hyperlink and pairwise link prediction accuracy compared to existing popular link prediction algorithms.

Xiaorui Zhu, University of Cincinnati
Title: Simultaneous confidence intervals using entire solution paths
Abstract: An ideal set of simultaneous confidence intervals for model selection should provide important insights into the variable selection results. In this paper, we propose a general approach to constructing simultaneous confidence intervals based on a variable selection method and residual bootstraps. Our simultaneous confidence intervals have two features that are hardly achievable by other methods: (1) among all available approaches, they are the tightest that achieve the nominal confidence level simultaneously; (2) they shrink the intervals of most regression coefficients to zero width. Because only a small set of coefficients have intervals of nonzero width, the simultaneous confidence intervals imply the inference of variable selection. In addition, we introduce a graphical tool (named the Simultaneous Confidence Tube, SCT) to intuitively display the estimation and variable selection information. The theoretical properties of the simultaneous confidence intervals are developed. We then conduct numerical studies and real data applications to illustrate the advantages of the simultaneous confidence intervals and the SCT proposed in this article.
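
As a rough illustration of the residual-bootstrap idea only (not the poster's specific construction, whose intervals can collapse to zero width), the sketch below calibrates equal-width simultaneous intervals around a Lasso fit using a max-deviation statistic; all names and tuning choices are ours.

```python
import numpy as np
from sklearn.linear_model import Lasso

def residual_bootstrap_simultaneous_ci(X, y, lasso_alpha=0.1, level=0.95,
                                       n_boot=500, seed=0):
    """Equal-width simultaneous intervals for Lasso coefficients, calibrated by
    a residual bootstrap of the max deviation from the original fit."""
    rng = np.random.default_rng(seed)
    base = Lasso(alpha=lasso_alpha).fit(X, y)
    beta_hat, fitted = base.coef_, base.predict(X)
    resid = y - fitted
    resid = resid - resid.mean()               # center the residuals
    max_dev = np.empty(n_boot)
    for b in range(n_boot):
        y_star = fitted + rng.choice(resid, size=len(y), replace=True)
        beta_star = Lasso(alpha=lasso_alpha).fit(X, y_star).coef_
        max_dev[b] = np.max(np.abs(beta_star - beta_hat))
    half_width = np.quantile(max_dev, level)   # simultaneous half-width
    return np.column_stack([beta_hat - half_width, beta_hat + half_width])
```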