My recent research focuses mainly on econometric theory, including causal inference with machine learning and large-dimensional portfolio theory. I also have ongoing projects in applied econometrics, with an emphasis on machine learning and big data.
Working papers
A Heteroskedasticity-Robust Overidentifying Restriction Test with High-Dimensional Covariates, with Ziwei Mei and Zijian Guo
Uniform Inference for Nonlinear Endogenous Treatment Effects with High-Dimensional Covariates, with Zijian Guo, Ziwei Mei, and Cun-Hui Zhang
Navigating Complexity: Constrained Portfolio Analysis in High Dimensions with Tracking Error and Weight Constraints, with Mehmet Caner and Yingying Li
Robust minimum variance portfolio for a large universe of assets, with Ruike Wu and Yanrong Yang
Robust Inference for Multiple Predictive Regressions with an Application on Bond Risk Premia, with Xiaosai Liao and Xinjue Li
Publications
On the instrumental variable estimation with many weak and invalid instruments, with Yiqi Lin, Frank Windmeijer and Xinyuan Song, Journal of the Royal Statistical Society: Series B (Statistical Methodology), forthcoming. Implementation codes on GitHub (In many applications of the IV method, researchers need to work with "non-ideal" IVs, that is, instruments that could be weak and/or invalid. We give identification results for the linear IV model with potentially invalid instruments among many weak IVs. The sparsest rule, conceptually equivalent to the plurality rule, is operational in computations and works with the non-convex penalty function that gives IV-validity selection consistency. The proposed weak and invalid IV robust treatment effect estimator (WIT) is useful when researchers face what is likely the "worst-case scenario," that is, unknown validity among weak signals of IVs. The method's robustness is demonstrated using an example of BMI on blood pressure, where genetic markers are used as IVs.)
Fan Q., Wu, R., Yang, Y. and Zhong, W., (2024) "Time-varying minimum variance portfolio." Journal of Econometrics. An earlier version. Implementation codes on GitHub (This paper proposes a new time-varying minimum variance portfolio (TV-MVP) in a large investment universe of assets. Our method extends the existing literature on minimum variance portfolios by allowing for time-varying factor loadings, which makes it possible to capture the dynamics of the covariance structure of asset returns and hence the optimal investment strategy in a dynamic setting. We show the oracle risk property of the proposed portfolio. Empirical studies show the new strategy is useful, and the flexible rebalancing works well considering transaction costs.)
Fan Q. and Wu Y., "Endogenous treatment effect estimation with a large and mixed set of instruments and control variables." The Review of Economics and Statistics, forthcoming. Implementation codes on GitHub page. An earlier version with nonparametric first stage. (Endogeneity is common in observational studies. With the increasing availability of large datasets such as administrative data and comprehensive survey data, one practical problem is deciding which variables can serve as instruments and which as controls. This paper guides practitioners facing a large dataset and model uncertainty.)
Cai, X, Fan Q. and Yuan C., (2022) "The impact of only child peers on students' cognitive and non-cognitive outcomes", Labour Economics. (According to recent US and EU surveys, a significant proportion of households with children have only one child. Utilizing representative and randomly assigned class data in China, we study how being integrated with classmates who are only children in a family affects students' academic and non-cognitive outcomes.)
Fan Q., Hsu, Y.-C., Lieli, R. and Y. Zhang, (2022) "Estimation of Conditional Average Treatment Effects with High-Dimensional Data", Journal of Business & Economic Statistics. The R package hdcate (CRAN version) provides implementations with general machine learning methods (Given the unconfoundedness assumption, we propose new nonparametric estimators for the reduced dimensional conditional average treatment effect (CATE) function. In the first stage, the nuisance functions necessary for identifying CATE are estimated by machine learning methods, allowing the number of covariates to be comparable to or larger than the sample size. The second stage consists of a low-dimensional local linear regression, reducing CATE to a function of the covariate(s) of interest. We derive functional limit theory for the estimators and provide an easy-to-implement procedure for uniform inference based on the multiplier bootstrap.)
CEO Early-Life Disaster Experience and Corporate Social Performance (with Don O'Sullivan and Leon Zolotoy). Strategic Management Journal, 2021 (Managerial Summary: We consider how traumatic experiences in childhood shape CEO cognition and values and, therefore, firm behavior. Our findings suggest that CEOs who have had to deal with traumatic early-life events may gain psychological strength from such experiences and that their psychological growth informs firm conduct. Specifically, our findings indicate that experience of trauma early in the CEO’s life is positively associated with corporate social performance. The implication is that boards aspiring to enhance this aspect of corporate performance may wish to consider the early-life experiences of prospective CEOs. While early-life experiences are unlikely to feature on a prospective CEO’s résumé, the typical selection process for senior executive appointments is well placed to unearth executives' life histories.)
Zhong, W., Zhou, W, Fan, Q. and Gao, Y., (2022) "Dummy Endogenous Treatment Effect Estimation Using High-Dimensional Instrumental Variables" (the R package naivereg implements the penalized logistic regression instrumental variables estimator (LIVE)), The Canadian Journal of Statistics, forthcoming (Computational social science and policy evaluation often encounter the problem of non-experimental data. Endogeneity, caused either by individual units selecting into a program they believe to be favorable or by unobserved confounders that affect both the outcome and treatment variables, results in biased estimation of the treatment effect. The instrumental variable method, developed in economics, is the signature solution to this issue and has seen applications in business administration, genetical genomics, and general causal inference. A good instrumental variable should explain the variation in the treatment decision and affect the outcome variable only through the channel of the treatment. In this paper, taking advantage of the availability of large datasets with rich individual-level features (such as large survey data, administrative data, and data crawled from Internet platforms), the authors develop a two-stage approach to estimate the treatment effects of dummy endogenous variables using high-dimensional instrumental variables.)
Chen, T., Fan Q., Liu, K. and Le, L., (2021) Identifying key factors in momentum in basketball games, Journal of Applied Statistics, (This paper quantitatively investigates the momentum phenomenon in professional sports. Two aspects (explosiveness and duration) of the momentum definition (Chen and Fan, 2018) are operationalized so that its practicality becomes evident. Both technical variables such as field goals, assists, rebounds, etc. and environmental variables such as the spectator attendance rate and player salary dispersion are considered, and the potential for useful real-time analyses is illustrated. The paper adds an extra tool to the empirical researchers' toolkit, and it provides some insights for practitioners, including coaches, team managers and sports data analysts.)
Fan, Q. and Yu, W., (2021) Adaptive k-class estimation in high-dimensional linear models, Communications in Statistics - Simulation and Computation, (simulation code) (This paper proposes to use a data-driven method to determine the endogeneity level of the regressors as well as variable selection in high-dimensional models.)
Chen, Y., Fan, Q., Yang, X. and L. Zolotoy, (2021) "CEO Early-Life Disaster Experience and Stock Price Crash Risk", Journal of Corporate Finance, (We study the impact of CEO early-life disaster experience on stock price crash risk. Using a longitudinal sample of U.S. firms, we document that firms led by CEOs with early-life disaster experience have higher stock price crash risk. Our findings are consistent with CEOs who experienced early-life disasters being more risk tolerant, and thus more willing to accept the risks associated with bad news hoarding, engendering formation of stock price crashes. In cross-sectional analyses, we investigate the linkages by exploring various scenarios, including the CEO’s equity compensation-based incentives and power over corporate board, CEO’s asymmetric response to bad versus good news disclosures, and other measures including cash-flow volatility and stock return volatility.)
Caner M., Fan Q. and T. Grennes (2021) "Partners in debt: An endogenous non-linear analysis of the effects of public and private debt on growth", International Review of Economics & Finance (This paper offers an empirical analysis of how public and private debt jointly influence economic growth. We consider the endogeneity and interlink of two debt variables which are subject to regime switch in a dynamic panel data model. Using data from 29 OECD countries, the threshold effect of the interaction of the public and private debt on economic growth is found to be negative and significant when the aggregate debt to GDP ratio reaches 220%, beyond which the marginal effect of public (or private) debt further increases on top of the non-interactive effect. It is shown that the true effect of individual debt is largely underestimated if the interactive effect is omitted.)
Liu K. and Fan Q., (2021) "Credit expansion, bank liberalization, and structural change in bank asset accounts", Journal of Economic Dynamics and Control, (This paper studies the links among credit supply expansion, commercial bank asset account structures, and the housing boom preceding the 2007-2009 financial crisis. We propose a real business cycle model with a housing market and financial intermediaries (banks) subject to leverage constraints. In our model, banks determine their asset account structures endogenously. We show that a credit supply expansion to banks can account for four key facts that characterize the housing boom: (1) an increase in real house prices; (2) an increase in the mortgage-to-GDP ratio; (3) a decrease in the real mortgage interest rate; and (4) an increase in the ratio of mortgages to firm loans in commercial bank asset accounts. In our model, a credit supply expansion to banks can also generate a boom-bust cycle through the collateral value channel via mortgage borrowers. An asset-side bank regulation policy is discussed along with welfare analysis.)
Zhong, W., Y. Gao, W. Zhou and Fan, Q., (2021) "Endogenous Treatment Effect Estimation Using High-Dimensional Instruments and Double Selection" (the R package naivereg now includes DS-IV), Statistics and Probability Letters, (In the big data era, the availability of large datasets (such as administrative data and large survey data) poses new challenges of model uncertainty for researchers who are interested in causal inference for policies. In this paper, we propose a double selection instrumental variable estimator for the endogenous treatment effects using both high-dimensional control variables and instrumental variables. It deals with the endogeneity of the treatment variable and reduces omitted variable bias due to imperfect model selection.)
Fan, Q., X. Han, G. Pan and B. Jiang, (2020) Large System of Seemingly Unrelated Regressions: A Penalized Quasi-Maximum Likelihood Estimation Perspective, Econometric Theory, (In this paper, using a shrinkage estimator, we propose a penalized quasi-maximum likelihood estimator (PQMLE) to estimate a large system of equations in seemingly unrelated regression models where the number of equations is large relative to the sample size. We develop the asymptotic properties of the PQMLE for both the error covariance matrix and model coefficients. With the increasing availability of large micro datasets such as price data, the proposed method is useful for handling a large number of equations in the SUR model.)
Bao, X. and Fan, Q., (2020), The Impact of Temperature on Gaming Productivity: Evidence from Online Games, Empirical Economics, (In this paper, we study the impact of temperature on productivity measured by the performance of online computer games. Certain skills required in the computer game are also regarded as human capital, such as patience and perseverance, forward thinking and strategic planning, leadership and socialization, mental and creative prowess, multitasking, etc. Since technology has fundamentally changed our working environment and skills in the 21st century, we believe our paper could shed some light on the impact of temperature on productivity measured by computer game attributes as well as human-computer interaction in a more general scope.)
Fan, Q., Y. Han and X-P Zhang, A Study of Cross Sectional Stock Returns Using High-dimensional SUR Model and Many Firm Level Characteristics, in Proc. of 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP) (In this paper, we propose to study the cross sectional stock returns using the high-dimensional seemingly unrelated regression (SUR) model with many common factors as well as observed firm level characteristics.)
Fan, Q., F. Hu and X-P Zhang, Double-Selection Based High-dimensional Factor Model with Application in Asset Pricing, in Proc. of 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP) (This paper proposes a principal component analysis (PCA) approach after a double-selection Lasso and applies it to both Chinese and US stock market data. Similar to the idea of Post-Lasso, we perform least squares regression on the principal component factors. To accommodate the nonlinear nature of the data, this paper compares the support vector regression (SVR) model with least squares regression model.)
Fan, Q. and Zhong, W., (2018), Nonparametric Additive Instrumental Variable Estimator: A Group Shrinkage Estimation Perspective, Journal of Business & Economic Statistics, (R package and a new user's manual with updated (2017) trade and growth data for NAIVE, Stata code) (This paper studies the instrumental variable selection problem with unknown functional form and number of valid instruments in the reduced form equation. Our empirical results show that in the 1980s, trade played a more important role in growth than the results in the celebrated work of Frankel and Romer (1999) suggest. The naivereg program in R implements the method.)
Fan, Q. and Wang, T., (2018), Game Day Effect on Stock Market: Evidence from Four Major Sports Leagues in U.S., Journal of Behavioral and Experimental Finance, (This paper investigates the relationship between sports sentiment and the stock market from several angles, most notably rivalry games and media attention. A winning effect is found in small-cap stocks that are held more by local investors and in rivalry games. The game day effect is mitigated for games with higher media exposure.)
Fan, Q. and W. Zhong, (2018), Variable Selection for Structural Equation with Endogeneity, Journal of Systems Science and Complexity (In this paper, we address the variable selection problem in structural equations using the adaptive lasso. In the empirical study, we revisit the famous Angrist and Krueger (1991) question of returns to education using China Census data.)
Chen, T. and Fan, Q., (2018), A Functional Data Analysis Approach to Model Score Difference in Professional Basketball Games, Journal of Applied Statistics (In this paper, we define momentum in sports and discuss its importance in determining game outcomes. The definition of momentum is an objective measure that uses only observable in-game scoreboard statistics, and hence it could provide a new avenue in modern sports analytics.)
Fan, Q., W. Lei and X-P Zhang, (2017), The Impact of Sports Sentiment on Stock Returns: A Case Study from Professional Sports Leagues, proceedings of the 2017 IEEE Global Conference on Signal & Information Processing (GlobalSIP), Symposium on Signal and Information Processing for Finance and Business, 2017 (This paper finds a strong 'loss effect' of playoff games on the fan base's local stock returns using NFL, NBA, MLB and NHL data.)
Fan, Q., X. Fu and S. Cai, (2017), Virtual World Versus Real World: An Economic Study of The Cyber Games Participation, The HCI International 2017 Conference Proceedings, Lecture Notes in Computer Science (LNCS), In book: HCI in Business, Government and Organizations, F.F.H. Nah and C. H. Tan (Eds.), HCIBGO 2017, Part I, LNCS 10293, pp.58-77, DOI: 10.1007/978-3-319-58481-2_6 (Many of us play computer games. It is fun, yet sometimes challenging and time consuming. Some games are actually expensive to play: think about the mounts, pets, skins, and other items that you wish to have while playing. In this study, we want to answer one question: how is real life affecting the virtual life in computer games, and vice versa?)
Fan, Q. and Wang, T., (2017), The Impact of Shanghai-Hong Kong Stock Connect Policy on A-H Share Price Premium (with new audio slides), Finance Research Letters (Contrary to the common consensus that the Shanghai-Hong Kong Stock Connect Policy fails to narrow the gap of cross-listed A-H share prices, this paper finds that the policy is indeed effective in reducing the price gap, after controlling for many market and firm level characteristics.)
Other Stata, R codes and computer programs for my papers
WIT: weak and invalid IV robust treatment effect estimator (codes)
TV-MVP: time-varying minimum variance portfolio (codes, user's manual)
hdcate: High-Dimensional Conditional Average Treatment Effects Estimation (R package, CRAN version, user's manual)
R2IVE: Robust IV Estimator to both the Irrelevant instrument and uncertain Included controls. Fan and Wu (2020) (R functions and user's manual)
NAIVE: Nonparametric additive instrumental variable estimator. Fan and Zhong (2016) (Stata code, R package, new user's manual)
FDA basketball: "A Functional Data Analysis Approach to Model Score Difference in Professional Basketball Games". Chen and Fan (2016) (data and code)