Research in Statistics and Mathematics …

I have extensive interests and expertise in Statistics and Mathematics. I have conducted research on simultaneous inference for high-dimensional dependent data and for high-dimensional discrete data, and on exponential families of distributions, Ito diffusion processes, and time series models. I am shifting my research efforts to statistical modeling and inference on non-Euclidean data and to the mathematical foundations of neural network models. These research topics use a variety of tools in mathematics, including real/complex/functional/harmonic analysis, measure/probability theory, (stochastic) differential/integral equations, differential geometry, and topology.

Information on Some Projects

On this web page, papers marked "Preprint" can be downloaded from arXiv.org, those marked "Published" can be obtained from their publishing journals, those marked "Manuscript" are neither on arXiv nor published, and those marked "In preparation" are not finished.

A. Current Projects by Topics

  1. Interaction between algebra, geometry, topology and statistics: The Euclidean space is a topological space, an additive group, a metric space, a manifold, and a vector space, and these 5 structures are compatible with one another. This is why we can do nice statistics and probability theory in the Euclidean space. If we remove the vector space structure from the Euclidean space, then linear models can no longer be defined (globally); if we remove the additive group structure, then additive models and location-shift distributions can no longer be defined; if we remove the manifold structure, then the central limit theorem may no longer hold and the second pillar of statistics collapses; if we remove the metric structure, then the Glivenko–Cantelli theorem may no longer hold and the first pillar of statistics collapses. In fact, without a metric or group structure, very little statistics can be done. However, there are many real-world data sets whose underlying data spaces lack one or several of the 5 structures mentioned above. So, a natural question is "To conduct sensible statistical inference and/or modeling, what are the minimal requirements on the algebraic, topological and/or geometric structures of the data space?"
  2. Statistics for neural networks (and in particular, deep convolutional neural networks (CNNs)) and study of "big models": Deep CNNs have demonstrated remarkable precision in binary classification (and other prediction) tasks. However, they belong to a class of models, which I call "big models", whose complexity is far beyond that of the "small models" that have been dominant in statistics. Little is known about the non-asymptotic statistical properties of CNNs, e.g., their performance in terms of the false discovery rate (FDR). In general, statistics has just started studying "big models" in the era of "big data", and there is a lot to be done for them.
  3. The Tukey-Kramer conjecture for multiple testing: For some details on this conjecture, please read “John W. Tukey's contributions to multiple comparisons” by Yoav Benjamini and Henry Braun. It is about a claim that using the average of correlations among a set of test statistics in multiple testing for FDR control actually works (conservatively) and that we do not need to use or take into account the full dependence structure.
  4. Control of multiple error criteria in multiple testing: Most works in multiple testing control one error criterion, such as the k-FDR or k-FWER, without necessarily ensuring power at a prespecified level. However, in practice there are situations where we need to control multiple error criteria simultaneously, or control the same error criterion on different "layers" or "groups" of hypotheses when the structure of a set of hypotheses is taken into account, and at the same time ensure a prespecified power level. Needless to say, this is a very challenging task, since there are examples of procedures that optimize for an error criterion and a power criterion but behave unstably or insensibly. Further, this is related to multiple testing of structured hypotheses, which is a current trend in multiple testing.
  5. FDR control under dependence: To date, we are aware of only two types of dependence, namely, PRDS and reverse martingale, for which the famous Benjamini-Hochberg (BH) procedure is conservative. It is known that the BH procedure is not conservative when, e.g., PRDS is reversed. So, a natural question is "Can we identify another type of dependence for which the BH procedure is conservative?" or "How can we modify the step-up critical constants of the BH procedure nontrivially to account for dependence and maintain FDR control?" On the other hand, there is considerable numerical evidence that some adaptive FDR procedures are conservative under positive dependence, even though this has not been proven theoretically. So, a natural question is "Can we prove that they are actually conservative under such dependence?" or "Can we classify distributions or nontrivial dependence structures that satisfy conditional PRDS?"
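
For readers unfamiliar with the step-up critical constants mentioned in item 5, the following is a minimal sketch of the standard BH step-up procedure in Python. The function names `bh_critical_constants` and `benjamini_hochberg` are hypothetical and chosen only for illustration; this is not code from any of the papers or R packages listed on this page, and it makes no adjustment for dependence among the p-values.

```python
from typing import List, Sequence


def bh_critical_constants(m: int, alpha: float) -> List[float]:
    """Standard BH step-up critical constants alpha * i / m, i = 1, ..., m."""
    return [alpha * i / m for i in range(1, m + 1)]


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return rejection decisions, in the original order of p_values."""
    m = len(p_values)
    if m == 0:
        return []
    constants = bh_critical_constants(m, alpha)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= constants[rank - 1]:
            k = rank
    # Reject the hypotheses with the k smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Illustrative (made-up) p-values.
example_a = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_b = [0.02, 0.03, 0.035, 0.5]
decisions_a = benjamini_hochberg(example_a, alpha=0.05)
decisions_b = benjamini_hochberg(example_b, alpha=0.05)
print(sum(decisions_a), decisions_b)
```

In the second example, the two smallest p-values exceed their own critical constants, but the third does not, so all three are rejected; this is what "step-up" refers to. Modifying the procedure for dependence, as asked in item 5, would amount to replacing `bh_critical_constants` with a different sequence.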

B. Past Projects in Statistics by Topics

    1. Xiongzhi Chen (2025): Uniformly consistent proportion estimation for composite hypotheses via integral equations: "the case of Gamma random variables". (To appear)
    2. Xiongzhi Chen (2025): Uniformly consistent proportion estimation for composite hypotheses via integral equations: "the case of location-shift families". (Preprint. The manuscript in Item 5 has been split into Item 1 and Item 2)
    3. Xiongzhi Chen (2021+): Consistent estimation of the proportion of false nulls and FDR for adaptive multiple testing Normal means under weak dependence. (Preprint)
    4. Xiongzhi Chen (2019): Uniformly consistently estimating the proportion of false null hypotheses via Lebesgue-Stieltjes integral equations. (Published)
    5. Xiongzhi Chen (2019): Uniformly consistently estimating the proportion of false null hypotheses for composite null hypotheses via Lebesgue-Stieltjes integral equations. (Preprint)
    6. Xiongzhi Chen and R.W. Doerge (2014): A consistent estimator of the proportion of nonzero Normal means under certain strong covariance dependence. (Preprint)
    7. Xiongzhi Chen and John D. Storey (2014): Estimating the proportion of true null hypotheses via goodness of fit. (Manuscript)
    1. Xiongzhi Chen, R.W. Doerge and Sanat K. Sarkar (2020): A weighted FDR procedure under discrete and heterogeneous null distributions. (Published; R package "fdrDiscreteNull" on CRAN)
    2. Xiongzhi Chen, R.W. Doerge and Joseph F. Heyse (2018): Multiple testing with discrete data: proportion of true null hypotheses and two adaptive FDR procedures. (Published; R package "fdrDiscreteNull" on CRAN)
    3. Xiongzhi Chen, David G. Robinson and John D. Storey (2019): Functional false discovery rate with application to genomics. (Published)
    4. Xiongzhi Chen (2020): False discovery rate control for multiple testing based on discrete p-values. (Published)
    5. Xiongzhi Chen and Sanat K. Sarkar (2019): On Benjamini-Hochberg procedure applied to mid p-values. (Published)
    6. Shinjini Nandi, Sanat K. Sarkar and Xiongzhi Chen (2021): Adapting to one- and two-way classified structures of hypotheses while controlling the false discovery rate. (Published)
    7. Xiongzhi Chen and R.W. Doerge (2012): Towards better FDR procedures for discrete test statistics. (Published)
    1. Xiongzhi Chen and John D. Storey (2015): Consistent estimation of low-dimensional latent structure in high-dimensional data. (Preprint)
    2. John D. Storey, Keyur H. Desai and Xiongzhi Chen (2013): Empirical Bayes inference of dependent high-dimensional data. (Manuscript)
    3. Xiongzhi Chen and John D. Storey (2014): Nonparametric empirical Bayes estimation of the surrogate variable analysis model. (Manuscript)
    4. Xiongzhi Chen, Wei Hao and John D. Storey: Regression herding. (In preparation.)
    1. Xiongzhi Chen and R.W. Doerge (2020): A strong law of large numbers related to multiple testing normal means. (Published)
    2. Xiongzhi Chen (2020): A strong law of large numbers for simultaneously testing parameters of Lancaster bivariate distributions. (Published)
    3. Xiongzhi Chen and R.W. Doerge (2015): Stopping time property of thresholds of Storey-type FDR procedures. (Preprint)

C. Current Projects in Mathematics by Topics

    1. Xiongzhi Chen (2015+): On Samuel Karlin's problem of geometric probability and a variant. (In preparation.)

D. Past Projects in Mathematics by Topics

    1. Xiongzhi Chen (2016): Resolution of a conjecture on variance functions for one-parameter natural exponential family. (Published.)
    2. Xiongzhi Chen (2018): Reduction functions for the variance function of one-parameter natural exponential family. (Published.)
    1. Xiongzhi Chen (2015): Explicit solutions to a vector time series model and its induced model for business cycles. (Preprint.)
    1. Master's Thesis
    1. Xiongzhi Chen and Changlin Cai (2006): A new architecture for multilayer perceptrons as function approximators. Natural Science Edition, Journal of Sichuan University, Vol. 2.
    2. Changlin Cai, Zhongzhi Shi, Xiongzhi Chen (2006): The Fisher information matrix on neural manifolds of multilayer perceptrons. Natural Science Edition, Journal of Sichuan University, Accepted.
