Wednesday, 13 August 2008

WHAT IS A COMPLETE, MECHANICAL TRADING SYSTEM?

One of the problems with the trader's early trading was that his “system” provided only entry signals, leaving the determination of exits to subjective judgment; it was not, therefore, a complete, mechanical trading system.
A complete, mechanical trading system, one that can be tested and employed in a totally objective fashion, without requiring human judgment, must provide both entries and exits. To be truly complete, a mechanical system must explicitly provide the following information:
1. When and how, and possibly at what price, to enter the market
2. When and how, and possibly at what price, to exit the market with a loss
3. When and how, and possibly at what price, to exit the market with a profit

WHAT ARE GOOD ENTRIES AND EXITS?

Given a mechanical trading system that contains an entry model to generate entry orders and an exit model to generate exit orders (including those required for money management), how are the entries and exits evaluated to determine whether they are good?
In other words, what constitutes a good entry or exit?
Notice we used the terms entry orders and exit orders, not entry or exit signals.
Why?
Because “signals” are too ambiguous.
Does a buy “signal” mean that one should buy at the open of the next bar, or buy using a stop or limit order? And if so, at what price? In response to a “signal” to exit a long position, does the exit occur at the close, on a profit target, or perhaps on a money management stop?
Each of these orders will have different consequences in terms of the results achieved.
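To make the difference between a "signal" and an order concrete, here is a minimal Python sketch. The `Order` class, the `fill_price` function, and the fill rules (fills on the next bar, with gaps through a stop or limit filled at the open) are illustrative assumptions, not something specified in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    side: str                      # "buy" or "sell"
    kind: str                      # "market", "stop", or "limit"
    price: Optional[float] = None  # trigger price for stop and limit orders


def fill_price(order, bar_open, bar_high, bar_low):
    """Return the assumed fill price on the next bar, or None if not filled."""
    if order.kind == "market":
        return bar_open
    if order.side == "buy":
        if order.kind == "stop" and bar_high >= order.price:
            return max(bar_open, order.price)  # gap above the stop fills at the open
        if order.kind == "limit" and bar_low <= order.price:
            return min(bar_open, order.price)  # gap below the limit fills at the open
    else:
        if order.kind == "stop" and bar_low <= order.price:
            return min(bar_open, order.price)
        if order.kind == "limit" and bar_high >= order.price:
            return max(bar_open, order.price)
    return None
```

The same buy "signal" expressed as a market, stop, or limit order can fill at different prices on the same bar, or not fill at all, which is why the order type has to be stated explicitly.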

THE SCIENTIFIC APPROACH TO SYSTEM DEVELOPMENT

This is intended to accomplish a systematic and detailed analysis of the individual components that make up a complete trading system. We are proposing nothing less than a scientific study of entries, exits, and other trading system elements.
The basic substance of the scientific approach as applied herein is as follows:

1. The object of study, in this case a trading system (or one or more of its elements), must be either directly or indirectly observable, preferably without dependence on subjective judgment, something easily achieved with proper testing and simulation software when working with complete mechanical trading systems.

2. An orderly means for assessing the behavior of the object of study must be available, which, in the case of trading systems, is back-testing over long periods of historical data, together with, if appropriate, the application of various models of statistical inference, the aim of the latter being to provide a fix or reckoning of how likely a system is to hold up in the future and on different samples of data.

TOOLS AND MATERIALS NEEDED FOR THE SCIENTIFIC APPROACH

Before applying the scientific approach to the study of the markets, a number of things must be considered.
First, a universe of reliable market data on which to perform back-testing and statistical analyses must be available. Since this book is focused on commodities trading, the market data used as the basis for our universe on an end-of-day time frame will be a subset of the diverse set of markets supplied by Pinnacle Data Corporation: these include agricultural products, metals, energy resources, bonds, currencies, and market indices.
Intraday time-frame trading is not addressed in this book, although it is one of our primary areas of interest that may be pursued in a subsequent volume.

TYPES OF DATA


Commodities pricing data is available for individual or continuous contracts. Individual contract data consists of quotations for individual commodities contracts.
At any given time, there may be several contracts actively trading.
Most speculators trade the front-month contracts, those that are most liquid and closest to expiration, but are not yet past first notice date. As each contract nears expiration, or passes first notice date, the trader “rolls over” any open position into the next contract.

Working with individual contracts, therefore, can add a great deal of complexity to simulations and tests.

Not only must trades directly generated by the trading system be dealt with, but the system developer must also correctly handle rollovers and the selection of appropriate contracts.

DATA TIME FRAMES

Data may be used in its natural time frame or may need to be processed into a different time frame.
Depending on the time frame being traded and on the nature of the trading system, individual ticks, 5-minute bars, 20-minute bars, or daily, weekly, fortnightly (bimonthly), monthly, quarterly, or even yearly data may be necessary.
A data source usually has a natural time frame.
For example, when collecting intraday data, the natural time frame is the tick. The tick is an elastic time frame: sometimes ticks come fast and furious, other times sporadically with long intervals between them.

The day is the natural time frame for end-of-day pricing data.
For other kinds of data, the natural time frame may be bimonthly, as is the case for the Commitments of Traders releases; or it may be quarterly, typical of company earnings reports.
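Processing data from its natural time frame into another one can be sketched in a few lines of Python. The function name `ticks_to_bars`, the `(timestamp_seconds, price)` tick layout, and the dictionary bar format are assumptions made for this illustration only.

```python
def ticks_to_bars(ticks, bar_seconds=300):
    """Aggregate (timestamp_seconds, price) ticks into OHLC bars.

    Ticks must be sorted by timestamp. Intervals with no ticks produce no bar,
    which reflects the "elastic" nature of tick data.
    """
    bars = []
    for ts, price in ticks:
        start = ts - ts % bar_seconds  # start time of the bar containing this tick
        if bars and bars[-1]["start"] == start:
            bar = bars[-1]
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
        else:
            bars.append({"start": start, "open": price, "high": price,
                         "low": price, "close": price})
    return bars
```

With the default of 300 seconds this builds 5-minute bars; passing 1200 would build 20-minute bars from the same ticks.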

DATA QUALITY

Data quality varies from excellent to awful. Since bad data can wreak havoc with all forms of analysis, lead to misleading results, and waste precious time, it is important to use the best data that can be found when running tests and trading simulations.
Some forecasting models, including those based on neural networks, can be exceedingly sensitive to a few errant data points; in such cases, the need for clean, error-free data is extremely important. Time spent finding good data, and then giving it a final scrubbing, is time well spent.
Data errors take many forms, some more innocuous than others. In real-time trading, for example, ticks are occasionally received that have extremely deviant, if not obviously impossible, prices.

The S&P 500 may appear to be trading at 952.00 one moment and at 250.50 the next! Is this the ultimate market crash?
No: a few seconds later, another tick will come along, indicating the S&P 500 is again trading at 952.00 or thereabouts.

What happened? A bad tick, a “noise spike,” occurred in the data.
This kind of data error, if not detected and eliminated, can skew the results produced by almost any mechanical trading model.
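One simple way to catch such a noise spike is to reject any tick that jumps too far from the last accepted price. The sketch below is only an illustration: the function name `filter_bad_ticks` and the 10% threshold are assumptions, and a real scrubbing routine would need more care (for example, when the first tick is bad or when a large move is genuine).

```python
def filter_bad_ticks(prices, max_jump=0.10):
    """Drop ticks that deviate from the last accepted price by more than max_jump.

    max_jump is a fraction of the last accepted price (0.10 means 10%).
    """
    clean = []
    for p in prices:
        if clean and abs(p - clean[-1]) / clean[-1] > max_jump:
            continue  # treat as a noise spike and skip it
        clean.append(p)
    return clean
```

Applied to the example above, the 250.50 tick is dropped and the 952.00 ticks on either side are kept.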

DATA SOURCES AND VENDORS

Today there are a great many sources from which data may be acquired.
Data may be purchased from value-added vendors, downloaded from any of several exchanges, and extracted from a wide variety of databases accessible over the Internet and on compact discs.
Value-added vendors, such as Tick Data and Pinnacle, whose data have been used extensively in this work, can supply the trader with relatively clean data in easy-to-use form.
They also provide convenient update services and, at least in the case of Pinnacle, error corrections that are handled automatically by the downloading software, which makes the task of maintaining a reliable, up-to-date database very straightforward. Popular suppliers of end-of-day commodities data include Pinnacle Data Corporation (800-724-4903), Prophet Financial Systems (650-322-4183), Commodities Systems Incorporated (CSI, 800-274-4727), and Technical Tools (800-231-8005).

Intraday historical data, which are needed for testing short time frame systems, may be purchased from Tick Data (800-822-8425) and Genesis Financial Data Services (800-621-2628).
Day traders should also look into Data Transmission Network (DTN, 800-485-4000), Data Broadcasting Corporation (DBC, 800-367-4670), Bonneville Market Information (BMI, 800-532-3400), and Future Source-Bridge (800-621-2628); these data distributors

TYPES OF SIMULATORS

There are two major forms of trading simulators.
One form is the integrated, easy-to-use software application that provides some basic historical analysis and simulation along with data collection and charting.
The other form is the specialized software component or class library that can be incorporated into user-written software to provide system testing and evaluation functionality.
Software components and class libraries offer open architecture, advanced features, and high levels of performance, but require programming expertise and such additional elements as graphics, report generation, and data management to be useful.
Integrated applications packages, although generally offering less powerful simulation and testing capabilities, are much more accessible to the novice.

PROGRAMMING THE SIMULATOR

Regardless of whether an integrated or component-based simulator is employed, the trading logic of the user’s system must be programmed into it using some computer language.
The language used may be either a generic programming language, such as C++ or FORTRAN, or a proprietary scripting language.
Without the aid of a formal language, it would be impossible to express a system’s trading rules with the precision required for an accurate simulation.
The need for programming of some kind should not be looked upon as a necessary evil.

Programming can actually benefit the trader by encouraging an explicit and disciplined expression of trading ideas.
For an example of how trading logic is programmed into a simulator, consider TradeStation, a popular integrated product from Omega Research that contains an interpreter for a basic system writing language (called EasyLanguage) with historical simulation capabilities.

Omega’s EasyLanguage is a proprietary, trading-specific language based on Pascal (a generic programming language).

SIMULATOR OUTPUT

All good trading simulators generate output containing a wealth of information about the performance of the user’s simulated account. Expect to obtain data on gross and net profit, number of winning and losing trades, worst-case drawdown, and related system characteristics, from even the most basic simulators.
Better simulators provide figures for maximum run-up, average favorable and adverse excursion, inferential statistics, and more, not to mention highly detailed analyses of individual trades.

An extraordinary simulator might also include in its output some measure of risk relative to reward, such as the annualized risk-to-reward ratio (ARRR) or the Sharpe Ratio, an important and well-known measure used to compare the performances of different portfolios, systems, or funds (Sharpe, 1994).
The output from a trading simulator is typically presented to the user in the form of one or more reports.
Two basic kinds of reports are available from most trading simulators: the performance summary and the trade-by-trade, or “detail,” report.
The information contained in these reports can help the trader evaluate a system’s “trading style” and determine whether the system is worthy of real-money trading.
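As a rough illustration of what a performance summary contains, the Python sketch below computes a few of the figures mentioned above from a list of per-trade profits. The function names, the dictionary keys, and the choice of an unannualized Sharpe Ratio using the sample standard deviation are assumptions for this example; actual simulators define and report these quantities in their own ways.

```python
import math


def performance_summary(trade_profits):
    """Summarize a list of per-trade profits (losses are negative)."""
    wins = [p for p in trade_profits if p > 0]
    losses = [p for p in trade_profits if p < 0]
    equity = peak = max_drawdown = 0.0
    for p in trade_profits:
        equity += p
        peak = max(peak, equity)                         # highest equity so far
        max_drawdown = max(max_drawdown, peak - equity)  # worst peak-to-valley drop
    return {
        "net_profit": sum(trade_profits),
        "gross_profit": sum(wins),
        "gross_loss": sum(losses),
        "winning_trades": len(wins),
        "losing_trades": len(losses),
        "max_drawdown": max_drawdown,
    }


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation (not annualized)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return (mean - risk_free) / math.sqrt(var)
```

A trade-by-trade "detail" report would simply list the individual entries, exits, and profits from which these summary figures are computed.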

SIMULATOR PERFORMANCE


Trading simulators vary dramatically in such aspects of performance as speed, capacity, and power. Speed is important when there is a need to carry out many tests or perform complex optimizations, genetic or otherwise. It is also essential when developing systems on complete portfolios or using long, intraday data series involving thousands of trades and hundreds of thousands of data points.
In some instances, speed may determine whether certain explorations can even be attempted.

Some problems are simply not practical to study unless the analyses can be accomplished in a reasonable length of time.
Simulator capacity involves problem size restrictions regarding the number of bars on which a simulation may be performed and the quantity of system code the simulator can handle.
Finally, the power a simulator gives the user to express and test complex trading ideas, and to run tests and even system optimizations on complete portfolios, can be significant to the serious, professional trader. A fairly powerful simulator is required, for example, to run many of the trading models examined in this book.

RELIABILITY OF SIMULATORS

Trading simulators vary in their reliability and trustworthiness. No complex software, and that includes trading simulation software, is completely bug-free.
This is true even for reputable vendors with great products.
Other problems pertain to the assumptions made regarding ambiguous situations in which any of several orders could be executed in any of several sequences during a bar.
Some of these items, e.g., the so-called bouncing tick (Ruggiero, 1998), can make it seem like the best system ever had been discovered when, in fact, it could bankrupt any trader.

WHAT OPTIMIZERS DO

Optimizers exist to find the best possible solution to a problem.
What is meant by the best possible solution to a problem? Before attempting to define that phrase, let us first consider what constitutes a solution.
In trading, a solution is a particular set of trading rules and perhaps system parameters. All trading systems have at least two rules (an entry rule and an exit rule), and most have one or more parameters. Rules express the logic of the trading system, and generally appear as “if-then” clauses in whatever language the trading system has been written. Parameters determine the behavior of the logic expressed in the rules; they can include lengths of moving averages, connection weights in neural networks, thresholds used in comparisons, values that determine placements for stops and profit targets, and other similar items. The simple moving-average crossover system, used in the previous chapter to illustrate various trading simulators, had two rules: one for the buy order and one for the sell order. It also had a single parameter, the length of the moving average. Rules and parameters completely define a trading system and determine its performance.
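The split between rules and parameters can be seen in a small Python sketch of a moving-average crossover system. This is an illustration under assumptions, not the code from the book: it assumes the crossover is between the closing price and its simple moving average, and the function names are made up for the example.

```python
def sma(values, length):
    """Simple moving average; None until `length` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < length:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - length:i + 1]) / length)
    return out


def crossover_orders(closes, length):
    """Return (bar_index, 'buy' | 'sell') orders for a price/average crossover."""
    avg = sma(closes, length)
    orders = []
    for i in range(1, len(closes)):
        if avg[i - 1] is None:
            continue  # not enough history yet
        was_above = closes[i - 1] > avg[i - 1]
        is_above = closes[i] > avg[i]
        if is_above and not was_above:
            orders.append((i, "buy"))   # rule 1: close crosses above the average
        elif was_above and not is_above:
            orders.append((i, "sell"))  # rule 2: close crosses below the average
    return orders
```

The two `if` branches are the rules; `length` is the single parameter an optimizer would adjust.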

HOW OPTIMIZERS ARE USED

Optimizers are wonderful tools that can be used in a myriad of ways. They help shape the aircraft we fly, design the cars we drive, and even select delivery routes for our mail.
Traders sometimes use optimizers to discover rule combinations that trade profitably. In Part II, we will demonstrate how a genetic optimizer can evolve profitable rule-based entry models. More commonly, traders call upon optimizers to determine the most appropriate values for system parameters; almost any kind of optimizer, except perhaps an analytic optimizer, may be employed for this purpose.
Various kinds of optimizers, including powerful genetic algorithms, are effective for training or evolving neural or fuzzy logic networks. Asset allocation problems yield to appropriate optimization strategies. Sometimes it seems as if the only limit on how optimizers may be employed is the user’s imagination, and therein lies a danger: It is easy to be seduced into “optimizer abuse” by the great and alluring power of this tool.

The correct and incorrect applications of optimizers are discussed later in this chapter.

TYPES OF OPTIMIZERS

There are many kinds of optimizers, each with its own special strengths and weaknesses, advantages and disadvantages.

Optimizers can be classified along such dimensions as human versus machine, complex versus simple, special purpose versus general purpose, and analytic versus stochastic.

All optimizers, regardless of kind, efficiency, or reliability, execute a search for the best of many potential solutions to a formally specified problem.


Implicit Optimizers

A mouse cannot be used to click on a button that says “optimize.” There is no special command to enter. In fact, there is no special software or even machine in sight. Does this mean there is no optimizer? No. Even when there is no optimizer apparent, and it seems as though no optimization is going on, there is. It is known as implicit optimization and works as follows: The trader tests a set of rules based upon some ideas regarding the market. Performance of the system is poor, and so the trader reworks the ideas, modifies the system’s rules, and runs another simulation. Better performance is observed. The trader repeats this process a few times, each time making changes based on what has been learned along the way.
Eventually, the trader builds a system worthy of being traded with real money. Was this system an optimized one? Since no parameters were ever explicitly adjusted and no rules were ever rearranged by the software, it appears as if the trader has succeeded in creating an unoptimized system.

However, more than one solution from a set of many possible solutions was tested and the best solution was selected for use in trading or further study. In that sense, the system was optimized, even though no software optimizer was involved.

Brute Force Optimizers

A brute force optimizer searches for the best possible solution by systematically testing all potential solutions, i.e., all definable combinations of rules, parameters, or both. Because every possible combination must be tested, brute force optimization can be very slow. Lack of speed becomes a serious issue as the number of combinations to be examined grows. Consequently, brute force optimization is subject to the law of “combinatorial explosion.” Just how slow is brute force optimization?
Consider a case where there are four parameters to optimize and where each parameter can take on any of 50 values. Brute force optimization would require that 50^4 (about 6 million) tests or simulations be conducted before the optimal parameter set could be determined; if one simulation was executed every 1.62 seconds (typical for TradeStation), the optimization process would take about 4 months to complete. This approach is not very practical, especially when many systems need to be tested and optimized, when there are many parameters, when the parameters can take on many values, or when you have a life.
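The arithmetic behind that estimate, and the exhaustive search itself, can be sketched in Python. The variable names and the `brute_force` helper are illustrative assumptions; the numbers are the ones given above.

```python
from itertools import product

values_per_parameter = 50
parameters = 4
seconds_per_test = 1.62

tests = values_per_parameter ** parameters  # 50**4 = 6,250,000 combinations
total_seconds = tests * seconds_per_test
days = total_seconds / 86_400               # seconds per day

print(tests)           # 6250000
print(round(days, 1))  # 117.2, i.e. roughly 4 months


def brute_force(fitness, grids):
    """Test every combination of parameter values and return the fittest one."""
    return max(product(*grids), key=fitness)
```

Adding a fifth parameter with 50 values would multiply the run time by 50, which is the combinatorial explosion in action.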

Genetic Optimizers

Imagine something powerful enough to solve all the problems inherent in the creation of a human being. That something surely represents the ultimate in problem solving and optimization. What is it? It is the familiar process of evolution.
Genetic optimizers endeavor to harness some of that incredible problem-solving power through a crude simulation of the evolutionary process. In terms of overall performance and the variety of problems that may be solved, there is no general-purpose optimizer more powerful than a properly crafted genetic one.


Genetic optimizers are stochastic optimizers in the sense that they take advantage of random chance in their operation. It may not seem believable that tossing dice can be a great way to solve problems, but, done correctly, it can be!
In addition to randomness, genetic optimizers employ selection and recombination.

The clever integration of random chance, selection, and recombination is responsible for the genetic optimizer’s great power. A full discussion of genetic algorithms, which are the basis for genetic optimizers, appears in Part II.
Genetic optimizers have many highly desirable characteristics.
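The interplay of random chance, selection, and recombination can be illustrated with a deliberately crude Python sketch. It is not the genetic optimizer used in the book: the function name, the integer parameter encoding, the "keep the fitter half" selection scheme, uniform crossover, and all default settings are assumptions chosen for brevity.

```python
import random


def genetic_optimize(fitness, bounds, pop_size=30, generations=40,
                     mutation_rate=0.2, seed=0):
    """Maximize fitness over integer parameter tuples within inclusive bounds."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(lo, hi) for lo, hi in bounds)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]  # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Recombination: each parameter is copied from one parent or the other.
            child = [rng.choice(pair) for pair in zip(a, b)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:  # random mutation
                    child[i] = rng.randint(lo, hi)
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=fitness)
```

With the defaults, at most 30 x 40 = 1,200 candidate evaluations are needed per run (ignoring the repeated calls made while sorting), compared with the millions required by an exhaustive search over the same four 50-value parameters; the trade-off is that the result is not guaranteed to be the global optimum.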

Optimization by Simulated Annealing



Optimizers based on annealing mimic the thermodynamic process by which liquids freeze and metals anneal. Starting out at a high temperature, the atoms of a liquid or molten metal bounce rapidly about in a random fashion.

Slowly cooled, they arrange themselves into an orderly configuration, a crystal, that represents a minimal energy state for the system.

Simulated in software, this thermodynamic process readily solves large-scale optimization problems.



As with genetic optimization, optimization by simulated annealing is a very powerful stochastic technique, modeled upon a natural phenomenon, that can find globally optimal solutions and handle ill-behaved fitness functions.
Simulated annealing has effectively solved significant combinatorial problems, including the famous “traveling salesman problem,” and the problem of how best to arrange the millions of circuit elements found on modern integrated circuit chips, such as those that power computers. Methods based on simulated annealing should not be construed as limited to combinatorial optimization; they can readily be adapted to the optimization of real-valued parameters.
Consequently, optimizers based on simulated annealing are applicable to a wide variety of problems, including those faced by traders.
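A minimal Python sketch of the idea follows. The function signature, the geometric cooling schedule, and the default temperatures are assumptions for illustration; the acceptance rule (always accept improvements, accept worse moves with a probability that shrinks as the temperature falls) is the standard Metropolis criterion.

```python
import math
import random


def simulated_annealing(fitness, start, neighbor, t_start=10.0, t_end=0.01,
                        cooling=0.95, steps_per_temp=20, seed=0):
    """Maximize fitness by simulated annealing; returns the best solution seen."""
    rng = random.Random(seed)
    current = best = start
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = neighbor(current, rng)
            delta = fitness(candidate) - fitness(current)
            # Accept improvements always; accept worse moves with
            # probability exp(delta / t), which falls as t cools.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current = candidate
                if fitness(current) > fitness(best):
                    best = current
        t *= cooling  # slow cooling
    return best
```

For a real-valued parameter, `neighbor` could be something like `lambda x, rng: x + rng.uniform(-1, 1)`; for a combinatorial problem it would swap or rearrange elements instead.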

Analytic Optimizers

Analysis (as in “real analysis” or “complex analysis”) is an extension of classical college calculus. Analytic optimizers involve the well-developed machinery of analysis, specifically differential calculus and the study of analytic functions, in the solution of practical problems.


In some instances, analytic methods can yield a direct (noniterative) solution to an optimization problem.

This happens to be the case for multiple regression, where solutions can be obtained with a few matrix calculations. In multiple regression, the goal is to find a set of regression weights that minimize the sum of the squared prediction errors. In other cases, iterative techniques must be used.


The connection weights in a neural network, for example, cannot be directly determined. They must be estimated using an iterative procedure, such as back-propagation.
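For the multiple regression case mentioned above, the direct solution can be written out explicitly. The notation here is an assumption introduced for illustration: X is the data matrix with one row per observation, y is the vector of target values, and w is the vector of regression weights. Minimizing the sum of squared prediction errors leads to the normal equations, which can be solved with a few matrix operations whenever X^T X is invertible.

```latex
\min_{\mathbf{w}} \; \lVert \mathbf{y} - X\mathbf{w} \rVert^{2}
\quad\Longrightarrow\quad
X^{\top} X \, \hat{\mathbf{w}} = X^{\top} \mathbf{y}
\quad\Longrightarrow\quad
\hat{\mathbf{w}} = \left( X^{\top} X \right)^{-1} X^{\top} \mathbf{y}
```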


Many iterative techniques used to solve multivariate optimization problems (those involving several variables or parameters) employ some variation on the theme of steepest ascent. In its most basic form, optimization by steepest ascent works as follows: A point in the domain of the fitness function (that is, a set of parameter values) is chosen by some means. The gradient vector at that point is evaluated by computing the derivatives of the fitness function with respect to each of the variables or parameters; this defines the direction in n-dimensional parameter space for which a fixed amount of movement will produce the greatest increase in fitness.
A small step is taken up the hill in fitness space, along the direction of the gradient.
The gradient is then recomputed at this new point, and another, perhaps smaller, step is taken. The process is repeated until convergence occurs.
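The procedure just described can be sketched in Python. This is an illustration under assumptions: the gradient is estimated numerically by forward differences rather than computed analytically, the step is halved whenever a move fails to improve fitness, and the function name and default tolerances are made up for the example.

```python
def steepest_ascent(fitness, start, step=0.1, eps=1e-6, tol=1e-8,
                    max_iter=10_000):
    """Climb a smooth fitness function using a numerically estimated gradient."""
    point = list(start)
    for _ in range(max_iter):
        base = fitness(point)
        grad = []
        for i in range(len(point)):
            shifted = point.copy()
            shifted[i] += eps
            grad.append((fitness(shifted) - base) / eps)  # forward difference
        candidate = [x + step * g for x, g in zip(point, grad)]
        if fitness(candidate) > base:
            if fitness(candidate) - base < tol:
                return candidate  # improvement is negligible: converged
            point = candidate
        else:
            step /= 2             # overshot the hill: take a smaller step
            if step < 1e-12:
                break
    return point
```

Like any hill-climbing method, this finds the nearest local peak from the starting point, which may or may not be the global optimum when the fitness surface has several hills.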

Monday, 11 August 2008

Linear Programming




The techniques of linear programming are designed for optimization problems involving linear cost or fitness functions, and linear constraints on the parameters or input variables.
Linear programming is typically used to solve resource allocation problems.
In the world of trading, one use of linear programming might be to allocate capital among a set of investments to maximize net profit.
If risk-adjusted profit is to be optimized, linear programming methods cannot be used: risk-adjusted profit is not a linear function of the amount of capital allocated to each of the investments; in such instances, other techniques (e.g., genetic algorithms) must be employed.

HOW TO FAIL WITH OPTIMIZATION

Most traders do not seek failure, at least not consciously.
However, knowledge of the way failure is achieved can be of great benefit when seeking to avoid it.


Failure with an optimizer is easy to accomplish by following a few key rules. First, be sure to use a small data sample when running simulations: The smaller the sample, the greater the likelihood it will poorly represent the data on which the trading model will actually be traded. Next, make sure the trading system has a large number of parameters and rules to optimize: For a given data sample, the greater the number of variables that must be estimated, the easier it will be to obtain spurious results.
It would also be beneficial to employ only a single sample on which to run tests; annoying out-of-sample data sets have no place in the rose-colored world of the ardent loser.

Finally, do avoid the headache of inferential statistics. Follow these rules and failure is guaranteed.
What shape will failure take? Most likely, system performance will look great in tests, but terrible in real-time trading. Neural network developers call this phenomenon “poor generalization”; traders are acquainted with it through the experience of margin calls and a serious loss of trading capital.



Large Parameter Sets

An excessive number of free parameters or rules will impact an optimization effort in a manner similar to an insufficient number of data points.

As the number of elements undergoing optimization rises, a model’s ability to capitalize on idiosyncrasies in the development sample increases along with the proportion of the model’s fitness that can be attributed to mathematical artifact. The result of optimizing a large number of variables, whether rules, parameters, or both, will be a model that performs well on the development data, but poorly on out-of-sample test data and in actual trading.

It is not the absolute number of free parameters that should be of concern, but the number of parameters relative to the number of data points.

The shrinkage formula discussed in the context of small samples is also heuristically relevant here: It illustrates how the relationship between the number of data points and the number of parameters affects the outcome.


HOW TO SUCCEED WITH OPTIMIZATION

Four steps can be taken to avoid failure and increase the odds of achieving successful optimization.

As a first step, optimize on the largest possible representative sample and make sure many simulated trades are available for analysis.
The second step is to keep the number of free parameters or rules small, especially in relation to sample size.


A third step involves running tests on out-of-sample data, that is, data not used or even seen during the optimization process.
As a fourth and final step, it may be worthwhile to statistically assess the results.
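The third and fourth steps can be illustrated with two small Python helpers. Both are sketches under assumptions: the 70/30 chronological split is an arbitrary choice, and the one-sample t statistic on per-trade profits is just one possible way to assess results statistically, not necessarily the method the book goes on to use.

```python
import math


def split_samples(data, in_sample_fraction=0.7):
    """Split a chronological series into in-sample and out-of-sample parts."""
    cut = int(len(data) * in_sample_fraction)
    return data[:cut], data[cut:]


def t_statistic(trade_profits):
    """One-sample t statistic for the hypothesis that mean trade profit is zero."""
    n = len(trade_profits)
    mean = sum(trade_profits) / n
    var = sum((p - mean) ** 2 for p in trade_profits) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

The optimizer would see only the in-sample part; the out-of-sample part is held back and used once, to check whether the optimized system still performs. A large t statistic computed from many trades suggests the average profit is unlikely to be a chance result, although it assumes the trades are roughly independent.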