We develop a significance testing method for the multinomial naive Bayes classifier with ultra-high-dimensional binary features. We propose a novel test statistic whose null distribution is asymptotically standard Gaussian. Under very mild assumptions, the power of the proposed test tends to 1 as the sample size tends to infinity. Building on this test, we develop a sequential testing procedure for variable screening. We apply the proposed methods in extensive numerical studies, including simulated examples and two real text classification examples. The results show that our methods have good finite-sample performance.
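Speaker bio: Baiguo An (安百国) is an Associate Professor at Capital University of Economics and Business. He graduated from Northeast Normal University in 2012 and was a postdoctoral researcher at the University of North Carolina at Chapel Hill from 2013 to 2015. Since 2016 he has worked at the School of Statistics, Capital University of Economics and Business. His research interests include machine learning, high-dimensional complex data analysis, text analysis, and image data analysis.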