Backtesting is the cornerstone of systematic investing. It is what systematic funds do for a good part of their work. They rely heavily on history to tell them what works and what does not. But if you do a survey, you will find no lack of people who view backtest results with distrust. In particular, strong skeptics who believe firmly in their own analysis and skills to make investment decisions often scoff at the simplicity of such an approach.
If you have developed a good model and think you stand a good chance of securing funds or landing a job with it, you might be in for some disappointment. Apart from you, many will view your claims with at least a dose of cautious skepticism, if not outright mistrust. And this includes even people on the systematic side of investing like myself. But don’t get me wrong, I am all for the use of backtesting. After all, it is a critical element in my work. So why the distrust?
Is it because history is not a guarantee of future performance?
This is one of the most common arguments around. But NO, that is not the reason. To me, this argument holds little meaning. Why not turn things around and ask them to name one thing that can guarantee future performance? Would that be their impeccable sense of market timing? Perhaps it is their flawless stock-picking skills and adaptability? Or maybe it is their perfect grasp of the market and ability to foresee how things unfold. Anyone who thinks along such lines is highly delusional. Nothing guarantees performance. So, this should not even qualify as an argument in the first place. The only place this fits is in the legal disclaimer of the fund prospectus or marketing materials.
Those who understand the market do not look back to history for guarantees. Instead, they look back to learn. This is no different from a discretionary trader drawing upon historical numbers, charts, events, analysis, and past experience to help shape his idea. A well-crafted backtest gives you a lot of insights by translating a conceptual idea into something more concrete. And from the data collected, you can have a better picture of how it worked in the past. And why is that even important? Well, it may not be the only premise, but if something you conceived has never even worked in the past, then what is your basis for it to work in the future?
Alright, so this is not the reason. Then what really is?
No One Really Knows What Goes On Behind the Scenes
Systematic traders and investors take the inner details of their models to their graves. Those working for financial institutions can be contractually prohibited from divulging any details. But even a trader who is not bound by such constraints has no reason to share them freely with anyone else. Models are absolutely replicable. More people using a model only means more competition, leading to over-saturation in the space and an eventual dilution of the profits or alpha. Next, the models are your ideas and they represent what you are worth. Would you readily give that away to someone else? Most will not let go of their models without a strong case. Within the industry, this goes both ways. No fellow practitioner will give you their algorithm for no apparent reason, and neither will they go around probing and asking others for the exact details of their models.
But of course, there are also many exceptions. For example, some might share their models with people they trust or work with. Others may contribute to communities from which they have benefitted. Then there are others who might be doing it for academic purposes. Or perhaps, they are looking to monetize their investment strategy. But under most circumstances, we can safely assume that other than high-level details and the backtest outcomes, we are not going to see what goes on behind them. Unfortunately, this inevitably gives rise to many important questions about the backtests. And if they can’t be adequately addressed or verified, trust can be hard to come by.
And what are some of these questions? They mostly revolve around 3 types of integrity: Data, Model, and Human.
Data Integrity
The very first step in a backtest is to collect the necessary data. If you get this initial step wrong, the entire backtest can be trashed. There are many questions one can ask about it.
What data did you use? Is it exchange-traded? What is its granularity? Where did you get it from? If it is on stocks, are dividends taken into account? Are they reinvested? Did you adjust for splits? If it is futures, then did you adjust the adjoining contracts and factor in roll costs before you backtest? Is there bad data? Did you clean it up? Did you use any proxies or methods to extrapolate or interpolate any data to address gaps or data insufficiency? And were you testing your model on a large universe of stocks? And if that is the case, is your data free of survivorship bias? This can go on and on…
But all these questions ultimately lead to answering one thing: Is your data reliable? Because if this fails, then there is no point in proceeding further. So from the perspective of someone who does not have the benefit of such details, any doubts they may have are entirely reasonable.
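To make one of these checks concrete, here is a minimal sketch of a sanity scan over daily closing prices. The function name `find_suspect_jumps` and the 40% threshold are my own assumptions for illustration, not a standard. It only flags candidates, such as a possible unadjusted split or a bad tick, for a human to inspect.

```python
def find_suspect_jumps(closes, threshold=0.4):
    """Return the indices whose close looks suspicious versus the prior day.

    Hypothetical helper: `threshold` is an assumed cutoff on the absolute
    day-over-day change, expressed as a fraction (0.4 means 40%).
    """
    suspects = []
    for i in range(1, len(closes)):
        prev, curr = closes[i - 1], closes[i]
        if prev <= 0 or curr <= 0:
            # A non-positive price cannot be compared and is bad data in itself.
            suspects.append(i)
        elif abs(curr / prev - 1.0) > threshold:
            # A very large one-day move: possibly an unadjusted split or a bad tick.
            suspects.append(i)
    return suspects


# The halving between index 2 and index 3 looks like an unadjusted 2-for-1 split.
closes = [100.0, 101.0, 102.0, 51.0, 51.5]
print(find_suspect_jumps(closes))  # → [3]
```

A flagged index is not proof of an error. A genuine crash will trip the same check, which is why the output is a list to review rather than something to fix automatically.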
Model Integrity
Now, let’s suppose you nailed the data and managed to convince others about it. Sounds like good news? Indeed, it is. But the bad news is, we are just getting started.
Length of Data
How long was the backtest period? Is it long enough to capture different market conditions and cycles? While it might be unnecessary to stretch things as far back as 100, 50, or even 20 years, it is still preferable to put the model through different market cycles and conditions. This gives a better appreciation of how the strategy performs under distinct regimes. As an example, it is not really helpful to show me a long-only model that does spectacularly well during a bull market without giving me the other side of the picture.
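One simple way to show the other side of the picture is to report performance separately for rising and falling markets. The sketch below is illustrative only: the function names are hypothetical, and labeling a period "up" or "down" by the sign of a benchmark return is a crude assumption that a real study would replace with a proper regime definition.

```python
def compound(returns):
    """Compound a list of simple period returns into one total return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


def performance_by_regime(strategy_returns, benchmark_returns):
    """Split strategy returns by the sign of the benchmark return (assumed regime label)."""
    pairs = list(zip(strategy_returns, benchmark_returns))
    up = [s for s, b in pairs if b >= 0]
    down = [s for s, b in pairs if b < 0]
    return {"up": compound(up), "down": compound(down)}


# Made-up numbers for a long-only style strategy and its benchmark.
strategy = [0.02, 0.03, -0.04, 0.01, -0.05]
benchmark = [0.01, 0.02, -0.03, 0.01, -0.04]
print(performance_by_regime(strategy, benchmark))
```

With these made-up numbers the strategy gains about 6.1% across the up periods and loses about 8.8% across the down periods, which is exactly the kind of split a reviewer wants to see.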
Forward bias
This is a common mistake made by newbies and a fatal one if no further validation on the backtests is done. Forward bias, also known as look-ahead bias, happens when a backtest uses data that would not yet have been available at the time of the prediction. Such models look terrific until you operationalize them. Only then do you realize that you don’t have the data you need at the point you need it, because it is not available yet.
Let’s use an example. You want to build a system that predicts, at the end of each day, the level at which an index will close the following day. And in your backtest, you somehow ended up using the following day’s closing price as one of your inputs. Needless to say, you will get some jaw-dropping and unbelievable backtest results. But unless you have a time machine or a crystal ball, there is no way you can hack this. And if you do have a crystal ball, why would you even need a model? I know this sounds ridiculous and far-fetched, but I have seen it happen.
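The example above comes down to how inputs and targets are aligned in time. Here is a minimal sketch, assuming a plain list of daily closes and a hypothetical helper `make_samples`, in which every input window ends on the decision day and the target is the next day's close. Shifting the window one step later would silently include the target itself, which is the mistake described above.

```python
def make_samples(closes, lookback=3):
    """Pair each decision day t with inputs known at t and the close at t + 1.

    Hypothetical helper for illustration; `closes` is a list of daily closes.
    """
    samples = []
    # The last decision day is len(closes) - 2, because its target is the final close.
    for t in range(lookback - 1, len(closes) - 1):
        features = closes[t - lookback + 1 : t + 1]  # closes up to and including day t
        target = closes[t + 1]                       # the value we try to predict
        samples.append((t, features, target))
    return samples


closes = [10.0, 11.0, 12.0, 13.0, 14.0]
samples = make_samples(closes)
for t, features, target in samples:
    # The newest input must be the close of day t, never the close of the target day.
    assert features[-1] == closes[t]
    print(t, features, target)
# prints:
# 2 [10.0, 11.0, 12.0] 13.0
# 3 [11.0, 12.0, 13.0] 14.0
```

An explicit alignment check like the `assert` in the loop is cheap, and it is the sort of validation that catches this error before the results get anyone excited.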
Logic Error
Things can get rather unwieldy when you are working with complex backtests. You translated your concept into a model and you assume it is working according to what you want. But sometimes, you may be in for a nasty surprise. Unless you perform checks at different points and on selected scenarios, the unexpected can always occur. As an illustration, perhaps you thought your model would stay out of a trade for a period of time once it is stopped out. However, your model put the trade back on right after stopping out instead. That defeats the whole purpose of having a stop in the first place.
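A check for the stop-out scenario above can be written as a small test on the trade log. The toy strategy below is an assumption made purely for illustration: it goes long whenever it is flat and allowed to trade, exits on a percentage stop, and is supposed to wait `cooldown` bars before re-entering. The names `run_with_cooldown` and `check_cooldown` are hypothetical.

```python
def run_with_cooldown(prices, stop_pct=0.05, cooldown=3):
    """Toy long-only loop: enter when flat and allowed, exit on a percentage stop."""
    events = []
    in_trade = False
    entry_price = 0.0
    blocked_until = 0  # first bar index at which a new entry is allowed
    for i, price in enumerate(prices):
        if in_trade:
            if price <= entry_price * (1.0 - stop_pct):
                events.append((i, "stop"))
                in_trade = False
                blocked_until = i + 1 + cooldown  # skip the next `cooldown` bars
        elif i >= blocked_until:
            events.append((i, "enter"))
            in_trade = True
            entry_price = price
    return events


def check_cooldown(events, cooldown):
    """Return True if no entry happens within `cooldown` bars after a stop."""
    last_stop = None
    for i, kind in events:
        if kind == "stop":
            last_stop = i
        elif kind == "enter" and last_stop is not None and i - last_stop <= cooldown:
            return False
    return True


prices = [100.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0]
events = run_with_cooldown(prices)
print(events)  # → [(0, 'enter'), (1, 'stop'), (5, 'enter')]
print(check_cooldown(events, cooldown=3))  # → True
```

The point of the second function is that it inspects the output independently of the strategy logic. If the loop had re-entered on bar 2, the check would have returned False even though the backtest itself would have run without complaint.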
Curve fitting
Curve fitting is another well-known issue and one that is particularly difficult to tackle because, technically, it is not an error. And almost everyone will experience this. Backtesters often fall into the trap of perfection. Given the power of hindsight, some just can’t resist the temptation to play god. They keep on adding parameters and optimizing their models down to the precision of a decimal until their model fits the history with extraordinary accuracy. But how much use are such models in practice? If luck is on your side, it might last you a couple of months. But more often than not, the model will break down from the outset. I have come across competitions where backtest entries did incredibly well on historical data and then failed spectacularly once they were tested with unseen data.
Another more subtle form of curve fitting is often mistaken for adapting. Humans have a penchant for tweaking things, especially when they view what they do as an improvement. They re-optimize the parameters or introduce more variables whenever the model hits a wall. For instance, let’s say your model crashed 20% not long after it was deployed. This is not something you expected. So you start to tune your model until the crash is wiped out clean. And this repeats every time your model seems to be giving way. If you have to keep doing that every now and then, it might be a better option to shelve the model and just put it under observation. But to others who are not privy to all these ad-hoc discretionary “adaptive” changes you have made along the way, how can they be sure of the real performance of your model?
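One common guard against curve fitting is to choose parameters on one stretch of history and judge them on a later stretch the optimizer never saw. The sketch below is a toy illustration under stated assumptions: the returns are pure random noise generated with a fixed seed, the "strategy" is a made-up momentum rule, and the names `momentum_pnl` and `pick_and_validate` are hypothetical.

```python
import random


def momentum_pnl(returns, lookback):
    """Sum the returns earned on days that follow a positive trailing sum."""
    pnl = 0.0
    for t in range(lookback, len(returns)):
        if sum(returns[t - lookback : t]) > 0:  # uses only days before t
            pnl += returns[t]
    return pnl


def pick_and_validate(returns, lookbacks, in_sample_fraction=0.7):
    """Pick the best lookback in-sample, then score it on the held-out remainder."""
    cut = int(len(returns) * in_sample_fraction)
    in_sample, out_sample = returns[:cut], returns[cut:]
    best = max(lookbacks, key=lambda n: momentum_pnl(in_sample, n))
    return best, momentum_pnl(in_sample, best), momentum_pnl(out_sample, best)


random.seed(0)
noise = [random.gauss(0.0, 0.01) for _ in range(1000)]  # no real signal by construction
best, in_sample_pnl, out_sample_pnl = pick_and_validate(noise, range(2, 60))
print(best, round(in_sample_pnl, 4), round(out_sample_pnl, 4))
```

Because the data has no signal, any in-sample edge found here is the product of searching 58 lookbacks, and there is no reason to expect it to carry over to the held-out part. The same discipline applies to the "adaptive" tweaks described above: each re-tune after a loss quietly turns out-of-sample history back into in-sample history.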
Liquidity / Capacity
The liquidity and capacity of a strategy are often overlooked. All strategies have a limited capacity, that is, a limit on how far they can scale. And what do I mean by that? When you look to buy, you want as many sellers as possible, and when you want to sell, you hope to see a whole sea of ready buyers. This allows you to move quickly and efficiently in and out of a position. A liquid market is large enough to do that. An illiquid one, on the other hand, is the opposite. In the latter, a large buy order can push the price up and a large sell order can push it down, in both cases against you. This makes your transactions a lot more costly. To minimize the impact your orders have on the market, you might have to break an order up and execute the pieces over a period.
Let’s look at a simple case. You can deploy a billion dollars to build a portfolio of US stocks. Stocks listed on the New York Stock Exchange alone have a market capitalization of more than USD 30 trillion. A billion is just a small drop in the ocean. But you can’t pull the same thing off on the Cambodia stock market because its total market capitalization is less than half a billion.
This might be low on priority for retail investors who are less likely to run into such issues. But if you are trying to convince an institution about your strategy, then be fully prepared to give a number on the maximum asset size your strategy can take and back it up. If you pluck the number from the air, then you will end up with air.
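To avoid plucking the number from the air, you can at least start from a rule of thumb and state it openly. The sketch below assumes one such rule, which is my own simplification and not an industry standard: each position must be tradable within a given number of days while taking no more than a fixed share of that stock's average daily dollar volume (ADV). The function name and the figures are hypothetical.

```python
def estimate_capacity(holdings, participation=0.10, days_to_trade=1):
    """Rough maximum asset size under an assumed ADV participation limit.

    `holdings` is a list of (target_weight, average_daily_dollar_volume) pairs.
    The tightest position sets the capacity of the whole portfolio.
    """
    limits = []
    for weight, adv_dollars in holdings:
        if weight > 0:
            # Largest asset size at which this position still fits the limit.
            limits.append(participation * adv_dollars * days_to_trade / weight)
    if not limits:
        raise ValueError("no positions with a positive weight")
    return min(limits)


# Made-up portfolio: 50% in a liquid name, 30% mid, 20% in a thinly traded one.
holdings = [(0.5, 200_000_000), (0.3, 50_000_000), (0.2, 10_000_000)]
print(estimate_capacity(holdings))
```

Here the thinly traded 20% position caps the strategy at roughly USD 5 million, even though the other two names could absorb far more. A serious capacity study would also model market impact, but even this crude version gives you a number you can defend and a clear list of assumptions to debate.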
Transaction costs
Many people omit transaction costs such as commissions, slippage, bid-ask spreads, and market impact. The implicit assumption is that such costs are too small to impact the results significantly. If you are running a low-turnover strategy on a liquid market using a low-cost broker, then yes, this might hold. However, if you are trading illiquid stuff or doing shorter-term trading where you can turn over your entire portfolio a few times in a month, week, or day, then a backtest result that does not factor in transaction costs can be grossly misleading. Even a modest transaction cost of a few basis points can have a tremendous impact on the results of a high-frequency strategy. Imagine doing 100 round-trip trades a day at a cost of 5bps each. This adds up to 500bps, or around 5% per day. It is fully capable of flipping a seemingly excellent backtest result deep into the red.
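The arithmetic behind that last example is worth writing down. This is a minimal sketch with hypothetical function names, and it assumes that every round trip turns over the full portfolio so that the cost applies to the whole capital. The 1% gross daily return is a made-up figure.

```python
def daily_cost_fraction(round_trips_per_day, cost_bps_per_round_trip):
    """Total daily cost as a fraction of capital (1bp = 0.0001)."""
    return round_trips_per_day * cost_bps_per_round_trip / 10_000


def net_daily_return(gross_daily_return, round_trips_per_day, cost_bps_per_round_trip):
    """Gross daily return minus the daily cost drag."""
    return gross_daily_return - daily_cost_fraction(
        round_trips_per_day, cost_bps_per_round_trip
    )


cost = daily_cost_fraction(100, 5)
print(cost)  # → 0.05, i.e. 5% of capital per day

# A backtest showing a 1% gross daily return ends up about 4% in the red after costs.
print(round(net_daily_return(0.01, 100, 5), 4))  # → -0.04
```

The same two functions show why the low-turnover case is different: at one round trip a month, 5bps is a rounding error next to most strategies' expected returns.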
Human Integrity
We talked about data and model integrity, but there is a more sophisticated and problematic type of integrity issue – human integrity. With no independent checks and balances, backtests can be easily manipulated. I am not saying here that all or most people are dishonest. In fact, most mistakes are unintentional, arising out of unconscious biases, inexperience, or negligence. But there will always be a small group of people who will not hesitate to falsify results to steer things in their favor. An experienced person can always make the results look believable and spin a convincing story behind them. This can pose quite a challenge. While we can easily fix a corrupted model if we know where the problem lies, there is, unfortunately, no solution to a corrupted mind.
How Can We Bridge The Trust Gap?
There are ways to bridge the trust gap, at least partially, provided the other party is open to giving serious consideration to backtest results. Assuming you already provided as many details as you can except for the secret sauce, the rest is all about establishing credibility.
1. Put your money where your mouth is.
You can deploy real money on the model to build a live track record that further supports your backtest results. Depending on the strategy and how the market goes, this can mean a few months to years. For instance, a high-frequency strategy capitalizing on short-term market inefficiency might do with a shorter track record. A long-term strategy with fewer trades will need a much longer time. And of course, you will need to produce verifiable statements from a reliable broker. Don’t make things complicated or murky by using multiple accounts or different brokers, and then linking the results all over the place. If what you desire is trust, then keep things simple and relevant. You can also consider using an independent party like Fundseeder to verify and keep track of your performance.
2. Publish your signals in real-time for others to track.
You can provide the signals as you run forward with your backtested model whether on paper or live. This lets others keep track of the trades you make and see how your model performs over time. But it can be tricky to deliver timely information if you are running a short-term time-sensitive strategy.
3. Code your model on a third-party platform.
The previous 2 approaches do not prevent someone from meddling with the model whether by tweaking it or overriding it with discretionary inputs. Sometimes, the live or walk-forward results reveal clues about it e.g. a marked deviation from how the model used to perform. However, this is not always apparent. So to gain an even higher level of credibility, you can code your strategy on independent backtest platforms. But unfortunately, you have to accept that your code now resides somewhere else other than your own private PC.