Wednesday, 13 August 2008
WHAT IS A COMPLETE, MECHANICAL TRADING SYSTEM?
A complete, mechanical trading system, one that can be tested and employed in a totally objective fashion, without requiring human judgment, must provide both entries and exits. To be truly complete, a mechanical system must explicitly provide the following information:
1. When and how, and possibly at what price, to enter the market
2. When and how, and possibly at what price, to exit the market with a loss
3. When and how, and possibly at what price, to exit the market with a profit
WHAT ARE GOOD ENTRIES AND EXITS?
In other words, what constitutes a good entry or exit?
Notice we used the terms entry orders and exit orders, not entry or exit signals.
Why?
Because “signals” are too ambiguous.
Does a buy “signal” mean that one should buy at the open of the next bar, or buy using a stop or limit order? And if so, at what price? In response to a “signal” to exit a long position, does the exit occur at the close, on a profit target, or perhaps on a money management stop?
Each of these orders will have different consequences in terms of the results achieved.
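To make the distinction between a signal and an order concrete, here is a minimal sketch in Python. The names `OrderType` and `Order`, and the example prices, are hypothetical and chosen only for illustration; they do not come from any particular simulator.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OrderType(Enum):
    MARKET_ON_OPEN = "market at next open"
    MARKET_ON_CLOSE = "market at close"
    STOP = "stop"
    LIMIT = "limit"


@dataclass(frozen=True)
class Order:
    side: str                      # "buy" or "sell"
    order_type: OrderType
    price: Optional[float] = None  # needed for stop and limit orders only

    def __post_init__(self):
        if self.side not in ("buy", "sell"):
            raise ValueError("side must be 'buy' or 'sell'")
        needs_price = self.order_type in (OrderType.STOP, OrderType.LIMIT)
        if needs_price and self.price is None:
            raise ValueError("stop and limit orders need a price")


# One vague "buy signal" can map to several different explicit orders.
entry_at_open = Order("buy", OrderType.MARKET_ON_OPEN)
entry_on_stop = Order("buy", OrderType.STOP, price=955.00)
exit_at_target = Order("sell", OrderType.LIMIT, price=975.00)
```

An order object of this kind leaves nothing to interpretation: how and at what price the trade is attempted are both stated, which is what makes objective testing possible.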
THE SCIENTIFIC APPROACH TO SYSTEM DEVELOPMENT
The basic substance of the scientific approach as applied herein is as follows:
1. The object of study, in this case a trading system (or one or more of its elements), must be either directly or indirectly observable, preferably without dependence on subjective judgment, something easily achieved with proper testing and simulation software when working with complete mechanical trading systems.
2. An orderly means for assessing the behavior of the object of study must be available, which, in the case of trading systems, is back-testing over long periods of historical data, together with, if appropriate, the application of various models of statistical inference, the aim of the latter being to provide a fix or reckoning of how likely a system is to hold up in the future and on different samples of data.
TOOLS AND MATERIALS NEEDED FOR THE SCIENTIFIC APPROACH
First, a universe of reliable market data on which to perform back-testing and statistical analyses must be available. Since this book is focused on commodities trading, the market data used as the basis for our universe on an end-of-day time frame will be a subset of the diverse set of markets supplied by Pinnacle Data Corporation: these include agriculturals, metals, energy resources, bonds, currencies, and market indices.
Intraday time-frame trading is not addressed in this book, although it is one of our primary areas of interest that may be pursued in a subsequent volume.
TYPES OF DATA
Commodities pricing data is available for individual or continuous contracts. Individual contract data consists of quotations for individual commodities contracts.
At any given time, there may be several contracts actively trading.
Most speculators trade the front-month contracts, those that are most liquid and closest to expiration, but are not yet past first notice date. As each contract nears expiration, or passes first notice date, the trader “rolls over” any open position into the next contract.
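The rollover rule can be sketched as a small Python function. This is only an illustration under stated assumptions: the contract symbols and first notice dates below are made up, and the `roll_days` buffer (how many calendar days before first notice the trader rolls) is a hypothetical parameter, not something specified in the text. Liquidity is not modeled at all.

```python
from datetime import date

# Hypothetical contracts: (symbol, first notice date). Illustrative only.
contracts = [
    ("DEC08", date(2008, 11, 28)),
    ("FEB09", date(2009, 1, 30)),
    ("APR09", date(2009, 3, 31)),
]


def front_contract(contracts, today, roll_days=5):
    """Return the nearest contract whose first notice date is more than
    roll_days calendar days away, or None if no contract qualifies."""
    eligible = [c for c in contracts if (c[1] - today).days > roll_days]
    return min(eligible, key=lambda c: c[1]) if eligible else None
```

Calling `front_contract(contracts, date(2008, 11, 25))` skips the December contract, which is only three days from first notice, and returns the February one; that switch is the point at which an open position would be rolled over.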
DATA TIME FRAMES
Depending on the time frame being traded and on the nature of the trading system, individual ticks, 5-minute bars, 20-minute bars, or daily, weekly, fortnightly (every two weeks), monthly, quarterly, or even yearly data may be necessary.
A data source usually has a natural time frame.
For example, when collecting intraday data, the natural time frame is the tick. The tick is an elastic time frame: sometimes ticks come fast and furious, other times sporadically with long intervals between them.
The day is the natural time frame for end-of-day pricing data.
For other kinds of data, the natural time frame may be bimonthly, as is the case for the Commitment of Traders releases; or it may be quarterly, typical of company earnings reports.
DATA QUALITY
Some forecasting models, including those based on neural networks, can be exceedingly sensitive to a few errant data points; in such cases, the need for clean, error-free data is extremely important. Time spent finding good data, and then giving it a final scrubbing, is time well spent.
Data errors take many forms, some more innocuous than others. In real-time trading, for example, ticks are occasionally received that have extremely deviant, if not obviously impossible, prices.
The S&P 500 may appear to be trading at 952.00 one moment and at 250.50 the next! Is this the ultimate market crash?
No. A few seconds later, another tick will come along, indicating the S&P 500 is again trading at 952.00 or thereabouts.
What happened? A bad tick, a “noise spike,” occurred in the data.
This kind of data error, if not detected and eliminated, can skew the results produced by almost any mechanical trading model.
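One simple way to catch a noise spike of this kind is to compare each tick with the median of the last few accepted ticks. The sketch below is a minimal Python illustration, not a production-grade filter; the function name and the `window` and `max_move` settings are assumptions chosen for the example.

```python
from statistics import median


def filter_bad_ticks(prices, window=5, max_move=0.10):
    """Drop ticks that deviate from the median of the last `window`
    accepted ticks by more than `max_move` (as a fraction)."""
    clean = []
    for p in prices:
        recent = clean[-window:]
        if recent:
            ref = median(recent)
            if abs(p - ref) / ref > max_move:
                continue  # treat as a noise spike and skip it
        clean.append(p)
    return clean


ticks = [952.00, 952.25, 250.50, 952.00, 951.75]
print(filter_bad_ticks(ticks))  # [952.0, 952.25, 952.0, 951.75]
```

Note the limits of so simple a rule: it trusts the first tick unconditionally, and a genuine price move larger than `max_move` would be rejected as well, so any real scrubbing procedure needs a manual review step.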
DATA SOURCES AND VENDORS
Data may be purchased from value-added vendors, downloaded from any of several exchanges, and extracted from a wide variety of databases accessible over the Internet and on compact discs.
Value-added vendors, such as Tick Data and Pinnacle, whose data have been used extensively in this work, can supply the trader with relatively clean data in easy-to-use form.
They also provide convenient update services and, at least in the case of Pinnacle, error corrections that are handled automatically by the downloading software, which makes the task of maintaining a reliable, up-to-date database very straightforward. Popular suppliers of end-of-day commodities data include Pinnacle Data Corporation (800-724-4903), Prophet Financial Systems (650-322-4183), Commodities Systems Incorporated (CSI, 800-274-4727), and Technical Tools (800-231-8005).
Intraday historical data, which are needed for testing short time frame systems, may be purchased from Tick Data (800-822-8425) and Genesis Financial Data Services (800-621-2628).
Day traders should also look into Data Transmission Network (DTN, 800-485-4000), Data Broadcasting Corporation (DBC, 800-367-4670), Bonneville Market Information (BMI, 800-532-3400), and FutureSource-Bridge (800-621-2628); these data distributors
TYPES OF SIMULATORS
One form is the integrated, easy-to-use software application that provides some basic historical analysis and simulation along with data collection and charting.
The other form is the specialized software component or class library that can be incorporated into user-written software to provide system testing and evaluation functionality.
Software components and class libraries offer open architecture, advanced features, and high levels of performance, but require programming expertise and such additional elements as graphics, report generation, and data management to be useful.
Integrated applications packages, although generally offering less powerful simulation and testing capabilities, are much more accessible to the novice.
PROGRAMMING THE SIMULATOR
The language used may be either a generic programming language, such as C++ or FORTRAN, or a proprietary scripting language.
Without the aid of a formal language, it would be impossible to express a system’s trading rules with the precision required for an accurate simulation.
The need for programming of some kind should not be looked upon as a necessary evil.
Programming can actually benefit the trader by encouraging an explicit and disciplined expression of trading ideas.
For an example of how trading logic is programmed into a simulator, consider TradeStation, a popular integrated product from Omega Research that contains an interpreter for a basic system-writing language (called EasyLanguage) with historical simulation capabilities.
Omega’s EasyLanguage is a proprietary, trading-specific language based on Pascal (a generic programming language).
SIMULATOR OUTPUT
Better simulators provide figures for maximum run-up, average favorable and adverse excursion, inferential statistics, and more, not to mention highly detailed analyses of individual trades.
An extraordinary simulator might also include in its output some measure of risk relative to reward, such as the annualized risk-to-reward ratio (ARRR) or the Sharpe Ratio, an important and well-known measure used to compare the performances of different portfolios, systems, or funds (Sharpe, 1994).
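As an illustration of a risk-relative-to-reward measure, the Sharpe Ratio can be computed from a series of per-period returns. The Python sketch below uses one common convention, assumed here rather than taken from the text: mean excess return divided by its sample standard deviation, annualized by the square root of the number of periods per year. The function name and defaults are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.
    risk_free is the per-period risk-free return."""
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)  # sample standard deviation; needs 2+ returns
    if sd == 0:
        raise ValueError("returns have zero variance")
    return mean(excess) / sd * sqrt(periods_per_year)
```

With daily returns the default of 252 trading days is a typical choice; for monthly returns one would pass `periods_per_year=12`.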
The output from a trading simulator is typically presented to the user in the form of one or more reports.
Two basic kinds of reports are available from most trading simulators: the performance summary and the trade-by-trade, or “detail,” report.
The information contained in these reports can help the trader evaluate a system’s “trading style” and determine whether the system is worthy of real-money trading.
SIMULATOR PERFORMANCE
Trading simulators vary dramatically in such aspects of performance as speed, capacity, and power. Speed is important when there is a need to carry out many tests or perform complex optimizations, genetic or otherwise. It is also essential when developing systems on complete portfolios or using long, intraday data series involving thousands of trades and hundreds of thousands of data points.
In some instances, speed may determine whether certain explorations can even be attempted.
Simulator capacity involves problem size restrictions regarding the number of bars on which a simulation may be performed and the quantity of system code the simulator can handle.
Finally, the power a simulator gives the user to express and test complex trading ideas, and to run tests and even system optimizations on complete portfolios, can be significant to the serious, professional trader. A fairly powerful simulator is required, for example, to run many of the trading models examined in this book.
RELIABILITY OF SIMULATORS
This is true even for reputable vendors with great products.
Other problems pertain to the assumptions made regarding ambiguous situations in which any of several orders could be executed in any of several sequences during a bar.
Some of these items, e.g., the so-called bouncing tick (Ruggiero, 1998), can make it seem like the best system ever had been discovered when, in fact, it could bankrupt any trader.
WHAT OPTIMIZERS DO
What is meant by the best possible solution to a problem? Before attempting to define that phrase, let us first consider what constitutes a solution.
In trading, a solution is a particular set of trading rules and perhaps system parameters. All trading systems have at least two rules (an entry rule and an exit rule), and most have one or more parameters. Rules express the logic of the trading system, and generally appear as “if-then” clauses in whatever language the trading system has been written. Parameters determine the behavior of the logic expressed in the rules; they can include lengths of moving averages, connection weights in neural networks, thresholds used in comparisons, values that determine placements for stops and profit targets, and other similar items. The simple moving-average crossover system, used in the previous chapter to illustrate various trading simulators, had two rules: one for the buy order and one for the sell order. It also had a single parameter, the length of the moving average. Rules and parameters completely define a trading system and determine its performance.
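The split between rules and parameters is easy to see in code. The following Python sketch assumes the crossover in question is the close crossing its own simple moving average; the function names are hypothetical, and the listing is not the code from the chapter referred to above.

```python
def sma(prices, length):
    """Simple moving average; None until `length` bars are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < length:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - length:i + 1]) / length)
    return out


def crossover_orders(closes, length):
    """Return (bar_index, 'buy' | 'sell') for each crossover.
    `length` is the system's single parameter."""
    avg = sma(closes, length)
    orders = []
    for i in range(1, len(closes)):
        if avg[i - 1] is None:
            continue
        was_above = closes[i - 1] > avg[i - 1]
        is_above = closes[i] > avg[i]
        if is_above and not was_above:
            orders.append((i, "buy"))   # rule 1: close crosses above the average
        elif was_above and not is_above:
            orders.append((i, "sell"))  # rule 2: close crosses below the average
    return orders
```

The two `if` branches are the rules; `length` is the parameter. An optimizer searching for the best `length` leaves the rules untouched and changes only how they behave.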
HOW OPTIMIZERS ARE USED
Traders sometimes use optimizers to discover rule combinations that trade profitably. In Part II, we will demonstrate how a genetic optimizer can evolve profitable rule-based entry models. More commonly, traders call upon optimizers to determine the most appropriate values for system parameters; almost any kind of optimizer, except perhaps an analytic optimizer, may be employed for this purpose.
Various kinds of optimizers, including powerful genetic algorithms, are effective for training or evolving neural or fuzzy logic networks. Asset allocation problems yield to appropriate optimization strategies. Sometimes it seems as if the only limit on how optimizers may be employed is the user’s imagination, and therein lies a danger: It is easy to be seduced into “optimizer abuse” by the great and alluring power of this tool.
The correct and incorrect applications of optimizers are discussed later in this chapter.
TYPES OF OPTIMIZERS
Optimizers can be classified along such dimensions as human versus machine, complex versus simple, special purpose versus general purpose, and analytic versus stochastic.
All optimizers, regardless of kind, efficiency, or reliability, execute a search for the best of many potential solutions to a formally specified problem.
Implicit Optimizers
Eventually, the trader builds a system worthy of being traded with real money. Was this system an optimized one? Since no parameters were ever explicitly adjusted and no rules were ever rearranged by the software, it appears as if the trader has succeeded in creating an unoptimized system.
However, more than one solution from a set of many possible solutions was tested and the best solution was selected for use in trading or further study.
Brute Force Optimizers
Consider a case where there are four parameters to optimize and where each parameter can take on any of 50 values. Brute force optimization would require that 50^4 (about 6 million) tests or simulations be conducted before the optimal parameter set could be determined: if one simulation was executed every 1.62 seconds (typical for TradeStation), the optimization process would take about 4 months to complete. This approach is not very practical, especially when many systems need to be tested and optimized, when there are many parameters, when the parameters can take on many values, or when you have a life.
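The arithmetic behind the four-month estimate can be checked in a few lines of Python. The variable names are ours; the figures are those quoted above.

```python
values_per_parameter = 50
n_parameters = 4
seconds_per_test = 1.62  # figure quoted in the text

# Every combination of parameter values must be simulated once.
n_tests = values_per_parameter ** n_parameters
total_seconds = n_tests * seconds_per_test
total_days = total_seconds / 86_400  # seconds in a day

print(n_tests)               # 6250000
print(round(total_days, 1))  # 117.2
```

Roughly 117 days of continuous computing is a little under four months. Adding a fifth parameter with 50 values would multiply the figure by 50.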
Genetic Optimizers
Genetic optimizers endeavor to harness some of that incredible problem-solving power through a crude simulation of the evolutionary process. In terms of overall performance and the variety of problems that may be solved, there is no general-purpose optimizer more powerful than a properly crafted genetic one.
In addition to randomness, genetic optimizers employ selection and recombination.
The clever integration of random chance, selection, and recombination is responsible for the genetic optimizer’s great power. A full discussion of genetic algorithms, which are the basis for genetic optimizers, appears in Part II.
Genetic optimizers have many highly desirable characteristics.
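The interplay of random chance, selection, and recombination can be sketched in a few dozen lines of Python. This is a deliberately crude illustration, not the genetic optimizer used in this work: the function name, the population settings, and the toy fitness function (a stand-in for a full simulator run) are all assumptions.

```python
import random


def genetic_maximize(fitness, bounds, pop_size=30, generations=40,
                     mutation_rate=0.2, seed=0):
    """Maximize fitness(params) over integer parameters within bounds."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.randint(lo, hi) for lo, hi in bounds]

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            # Recombination: each gene comes from one parent or the other.
            child = [rng.choice(pair) for pair in zip(mom, dad)]
            # Random mutation: occasionally redraw a gene within its bounds.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] = rng.randint(lo, hi)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Toy fitness with a known peak at (20, 35); a stand-in for a simulator run.
def toy_fitness(p):
    return -((p[0] - 20) ** 2 + (p[1] - 35) ** 2)


best = genetic_maximize(toy_fitness, bounds=[(1, 50), (1, 50)])
```

Because the parents survive into the next generation, the best solution found so far is never lost. With these settings the search evaluates far fewer than the 2,500 combinations a brute force scan of the same grid would require, though it offers no guarantee of landing exactly on the peak.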
Optimization by Simulated Annealing
Optimizers based on annealing mimic the thermodynamic process by which liquids freeze and metals anneal. Starting out at a high temperature, the atoms of a liquid or molten metal bounce rapidly about in a random fashion.
Slowly cooled, they arrange themselves into an orderly configuration, a crystal, that represents a minimal energy state for the system.
Simulated in software, this thermodynamic process readily solves large-scale optimization problems.
As with genetic optimization, optimization by simulated annealing is a very powerful stochastic technique, modeled upon a natural phenomenon, that can find globally optimal solutions and handle ill-behaved fitness functions.
Simulated annealing has effectively solved significant combinatorial problems, including the famous “traveling salesman problem,” and the problem of how best to arrange the millions of circuit elements found on modern integrated circuit chips, such as those that power computers. Methods based on simulated annealing should not be construed as limited to combinatorial optimization; they can readily be adapted to the optimization of real-valued parameters.
Consequently, optimizers based on simulated annealing are applicable to a wide variety of problems, including those faced by traders.
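A bare-bones version of annealing applied to real-valued parameters might look like the Python sketch below. The geometric cooling schedule, the Gaussian proposal step, and every default value are assumptions made for illustration, and the toy fitness function again stands in for a simulator run.

```python
import math
import random


def anneal_maximize(fitness, start, step=1.0, t_start=10.0, t_end=0.01,
                    cooling=0.95, moves_per_temp=50, seed=0):
    """Maximize fitness(x) over real-valued parameter vectors."""
    rng = random.Random(seed)
    current, current_fit = list(start), fitness(start)
    best, best_fit = list(current), current_fit
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            # Propose a random nearby point.
            candidate = [x + rng.gauss(0.0, step) for x in current]
            cand_fit = fitness(candidate)
            delta = cand_fit - current_fit
            # Always accept improvements; accept worse moves with a
            # probability that shrinks as the temperature falls.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current, current_fit = candidate, cand_fit
                if current_fit > best_fit:
                    best, best_fit = list(current), current_fit
        temp *= cooling  # slow, geometric cooling
    return best, best_fit


# Toy fitness with its peak at (3, -1).
def toy_fitness(x):
    return -((x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2)


best, best_fit = anneal_maximize(toy_fitness, start=[0.0, 0.0])
```

The willingness to accept worse moves at high temperature is what lets the search escape local optima; as the temperature falls, the process behaves more and more like a simple hill climber.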
Analytic Optimizers
In some instances, analytic methods can yield a direct (noniterative) solution to an optimization problem.
This happens to be the case for multiple regression, where solutions can be obtained with a few matrix calculations. In multiple regression, the goal is to find a set of regression weights that minimize the sum of the squared prediction errors. In other cases, iterative techniques must be used.
The connection weights in a neural network, for example, cannot be directly determined. They must be estimated using an iterative procedure, such as back-propagation.
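Returning to the multiple regression case, the “few matrix calculations” amount to solving the normal equations (X'X) w = X'y for the weight vector w. The pure-Python sketch below is an illustration under assumptions: the function names are ours, and a real application would use a numerical library and guard against singular or ill-conditioned matrices.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def regression_weights(rows, targets):
    """Least-squares weights from the normal equations (X'X) w = X'y.
    A constant 1.0 is prepended to each row for the intercept."""
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

No search is involved: the weights that minimize the sum of squared prediction errors come out of a single pass of linear algebra, which is what makes the method analytic rather than iterative.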
Many iterative techniques used to solve multivariate optimization problems (those involving several variables or parameters) employ some variation on the theme of steepest ascent. In its most basic form, optimization by steepest ascent works as follows: A point in the domain of the fitness function (that is, a set of parameter values) is chosen by some means. The gradient vector at that point is evaluated by computing the derivatives of the fitness function with respect to each of the variables or parameters; this defines the direction in n-dimensional parameter space for which a fixed amount of movement will produce the greatest increase in fitness.
A small step is taken up the hill in fitness space, along the direction of the gradient.
The gradient is then recomputed at this new point, and another, perhaps smaller, step is taken. The process is repeated until convergence occurs.
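The procedure just described translates almost line for line into code. In the Python sketch below, the gradient is estimated numerically by central differences, and the step is halved whenever a move fails to improve fitness; both choices, along with the function names and defaults, are assumptions made for the illustration.

```python
def numeric_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def steepest_ascent(f, start, step=0.1, tol=1e-8, max_iter=10_000):
    """Climb the fitness surface of f from start along the gradient."""
    x = list(start)
    for _ in range(max_iter):
        grad = numeric_gradient(f, x)
        if sum(g * g for g in grad) ** 0.5 < tol:
            break  # gradient is essentially zero: converged
        candidate = [xi + step * gi for xi, gi in zip(x, grad)]
        if f(candidate) > f(x):
            x = candidate  # accept the uphill step
        else:
            step *= 0.5    # overshot: take a smaller step next time
    return x


# Smooth toy fitness with a single peak at (3, -1).
def toy_fitness(p):
    return -((p[0] - 3.0) ** 2 + 2.0 * (p[1] + 1.0) ** 2)


peak = steepest_ascent(toy_fitness, [0.0, 0.0])
```

On a smooth, single-peaked surface such as this one the method converges quickly. On a fitness surface with many local peaks, as is common with trading systems, it will stop at whichever peak is uphill from the starting point.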
Monday, 11 August 2008
Linear Programming
The techniques of linear programming are designed for optimization problems involving linear cost or fitness functions, and linear constraints on the parameters or input variables.
Linear programming is typically used to solve resource allocation problems.
In the world of trading, one use of linear programming might be to allocate capital among a set of investments to maximize net profit.
If risk-adjusted profit is to be optimized, linear programming methods cannot be used: Risk-adjusted profit is not a linear function of the amount of capital allocated to each of the investments; in such instances, other techniques (e.g., genetic algorithms) must be employed.
HOW TO FAIL WITH OPTIMIZATION
However, knowledge of the way failure is achieved can be of great benefit when seeking to avoid it.
Failure with an optimizer is easy to accomplish by following a few key rules. First, be sure to use a small data sample when running simulations: The smaller the sample, the greater the likelihood it will poorly represent the data on which the trading model will actually be traded. Next, make sure the trading system has a large number of parameters and rules to optimize: For a given data sample, the greater the number of variables that must be estimated, the easier it will be to obtain spurious results.
It would also be beneficial to employ only a single sample on which to run tests; annoying out-of-sample data sets have no place in the rose-colored world of the ardent loser.
Finally, do avoid the headache of inferential statistics. Follow these rules and failure is guaranteed.
What shape will failure take? Most likely, system performance will look great in tests, but terrible in real-time trading. Neural network developers call this phenomenon “poor generalization”; traders are acquainted with it through the experience of margin calls and a serious loss of trading capital.
Large Parameter Sets
As the number of elements undergoing optimization rises, a model’s ability to capitalize on idiosyncrasies in the development sample increases along with the proportion of the model’s fitness that can be attributed to mathematical artifact. The result of optimizing a large number of variables, whether rules, parameters, or both, will be a model that performs well on the development data, but poorly on out-of-sample test data and in actual trading.
It is not the absolute number of free parameters that should be of concern, but the number of parameters relative to the number of data points.
The shrinkage formula discussed in the context of small samples is also heuristically relevant here: It illustrates how the relationship between the number of data points and the number of parameters affects the outcome.
HOW TO SUCCEED WITH OPTIMIZATION
As a first step, optimize on the largest possible representative sample and make sure many simulated trades are available for analysis.
The second step is to keep the number of free parameters or rules small, especially in relation to sample size.
A third step involves running tests on out-of-sample data, that is, data not used or even seen during the optimization process.
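In code, out-of-sample testing comes down to keeping the two data segments strictly apart. The Python sketch below assumes a hypothetical `evaluate(bars, parameter)` function that returns a performance figure for a system run on the given bars; the 70/30 split is an arbitrary default, not a recommendation from the text.

```python
def split_sample(bars, in_sample_fraction=0.7):
    """Split chronologically ordered bars into optimization (in-sample)
    and verification (out-of-sample) segments."""
    cut = int(len(bars) * in_sample_fraction)
    return bars[:cut], bars[cut:]


def optimize_then_verify(bars, candidates, evaluate, in_sample_fraction=0.7):
    """Pick the best candidate parameter on in-sample data only, then
    report its performance on data the optimizer never saw."""
    in_sample, out_sample = split_sample(bars, in_sample_fraction)
    best = max(candidates, key=lambda p: evaluate(in_sample, p))
    return best, evaluate(in_sample, best), evaluate(out_sample, best)
```

A large gap between the second and third returned values is the signature of a model that has capitalized on idiosyncrasies of the development sample.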
As a fourth and final step, it may be worthwhile to statistically assess the results.
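One simple form of statistical assessment, chosen here as an example rather than prescribed by the text, is a one-sample t statistic testing whether the mean profit per trade differs from zero. The Python sketch assumes the trades are roughly independent, which real trades often are not.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(trade_profits):
    """One-sample t statistic for the hypothesis that the true mean
    profit per trade is zero."""
    n = len(trade_profits)
    if n < 2:
        raise ValueError("need at least two trades")
    sd = stdev(trade_profits)  # sample standard deviation
    if sd == 0:
        raise ValueError("trade profits have zero variance")
    return mean(trade_profits) / (sd / sqrt(n))
```

The statistic is compared against a t distribution with n - 1 degrees of freedom. A value near zero suggests the observed average profit could easily be the product of chance, however attractive the equity curve may look.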