Abstract: Firm characteristics are ubiquitously used in economics. These characteristics are often
based on readily available information such as accounting data, but such data reflect only part
of investors’ information set. We show that useful information about firm characteristics is
embedded in investors’ holdings data and, via market clearing, in prices, returns, and trading
data. Based on insights from the recent artificial intelligence (AI) and machine learning (ML)
literature, in which unstructured data (e.g., words or speech) are represented as continuous
vectors in a potentially high-dimensional space, we propose to learn asset embeddings from
investors’ holdings data. Indeed, just as documents arrange words that can be used to uncover
word structures via embeddings, investors organize assets in portfolios that can be used to
uncover firm characteristics that investors deem important via asset embeddings. This broad
theme provides a natural bridge to connect recent advances in the fields of AI and ML to finance
and economics. Specifically, we show how language models, including transformer models that
feature prominently in large language models such as BERT and GPT, can handle numerical
information, and in particular holdings data, to estimate asset embeddings. We provide initial
evidence on the value added of asset embeddings through a series of applications in the context
of firm valuations, return comovement, and uncovering asset substitution patterns. As a
by-product, the models generate investor embeddings, which can be used to measure investor
similarity. We propose a programmatic list of potential applications of asset and investor
embeddings to finance and economics more generally.
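
To make the idea of learning embeddings from holdings concrete, the following is a minimal sketch, not the method of the paper. It swaps the language and transformer models described in the abstract for a plain truncated SVD of a portfolio-weight matrix, which is a simpler stand-in. The function names, the toy holdings matrix, and the choice of two dimensions are all assumptions made for illustration.

```python
import numpy as np


def embeddings_from_holdings(holdings, dim):
    """Toy asset and investor embeddings via truncated SVD (illustrative stand-in).

    holdings: array of shape (n_investors, n_assets) with non-negative positions.
    Returns (asset_emb, investor_emb) with shapes (n_assets, dim), (n_investors, dim).
    """
    # Convert each investor's positions into portfolio weights that sum to one.
    weights = holdings / holdings.sum(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    asset_emb = vt[:dim].T * s[:dim]
    investor_emb = u[:, :dim] * s[:dim]
    return asset_emb, investor_emb


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical holdings: 4 investors (rows) by 4 assets (columns).
# Assets 0 and 1 are held in identical amounts by every investor.
holdings = np.array([
    [10.0, 10.0, 0.0, 0.0],
    [5.0, 5.0, 1.0, 0.0],
    [0.0, 0.0, 4.0, 6.0],
    [1.0, 1.0, 3.0, 5.0],
])

asset_emb, investor_emb = embeddings_from_holdings(holdings, dim=2)
sim_01 = cosine_similarity(asset_emb[0], asset_emb[1])
sim_03 = cosine_similarity(asset_emb[0], asset_emb[3])
```

In this toy setup, assets that investors hold in the same proportions receive the same embedding, which mirrors the analogy in the abstract between words arranged in documents and assets arranged in portfolios. The investor embeddings come out of the same factorization and could be compared with the same similarity function.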
Christopher Hoegner, Technical University of Munich
Mihail Velikov, Pennsylvania State University
Abstract: This study assesses the expected returns of machine learning-based anomaly trading strategies, accounting for transaction costs, post-publication decay, and the post-decimalization era of high liquidity. Contrary to claims in prior literature, more sophisticated machine learning strategies are profitable, earning net out-of-sample monthly returns of up to 1.42%, despite having turnover rates exceeding 50% and selecting some difficult-to-arbitrage stocks. A trading strategy that employs a long short-term memory model to combine anomaly characteristics yields a six-factor generalized (net) alpha of 1.20% (t-stat of 3.46). While prevalent cost-mitigation techniques reduce turnover and costs, they do not improve net anomaly performance. Overall, we document return predictability from deep-learning models that cannot be explained by common risk factors or limits to arbitrage.
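
As a rough illustration of how turnover and trading costs enter a net return, the sketch below uses a simple proportional-cost assumption: net return equals gross return minus turnover times a one-way cost. This is not the cost model of the study, and the function names and all input numbers are hypothetical.

```python
def net_monthly_return(gross_return, turnover, one_way_cost):
    """Gross return minus proportional trading costs (all as monthly decimals)."""
    return gross_return - turnover * one_way_cost


def break_even_cost(gross_return, turnover):
    """One-way cost at which the net return falls to zero."""
    return gross_return / turnover


# Hypothetical inputs, not taken from the study:
# 2.0% gross monthly return, 60% monthly turnover, 0.5% one-way cost.
example_net = net_monthly_return(gross_return=0.020, turnover=0.60, one_way_cost=0.005)
example_break_even = break_even_cost(gross_return=0.020, turnover=0.60)
```

Under this simplification, a strategy with high turnover remains profitable after costs only if its gross return exceeds turnover times the per-trade cost, which is the comparison the abstract refers to when it reports net returns.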
Abstract: We use deep Bayesian neural networks to investigate the determinants of trading activity in a large sample of institutional equity portfolios. Our methodology allows us to evaluate hundreds of potentially relevant explanatory variables, estimate arbitrary nonlinear interactions among them, and aggregate them into interpretable categories. Deep learning models predict trading decisions with up to 86% accuracy out-of-sample, with macroeconomic conditions and market liquidity together accounting for most (66–91%) of the explained variance. Stock fundamentals, firm-specific corporate news, and analyst forecasts have comparatively low explanatory power. Our results suggest that macroeconomic risk and market microstructure considerations are the most crucial factors in understanding institutional trading patterns.