Wednesday 13 August 2008

DATA QUALITY

Data quality varies from excellent to awful. Since bad data can wreak havoc with all forms of analysis, lead to misleading results, and waste precious time, the best data that can be found should be used when running tests and trading simulations.
Some forecasting models, including those based on neural networks, can be exceedingly sensitive to a few errant data points; for such models, clean, error-free data is essential. Time spent finding good data, and then giving it a final scrubbing, is time well spent.
Data errors take many forms, some more innocuous than others. In real-time trading, for example, ticks occasionally arrive with wildly deviant, if not obviously impossible, prices.

The S&P 500 may appear to be trading at 952.00 one moment and at 250.50 the next! Is this the ultimate market crash?
No. A few seconds later, another tick will come along, indicating the S&P 500 is again trading at 952.00 or thereabouts.

What happened? A bad tick, a “noise spike,” occurred in the data.
This kind of data error, if not detected and eliminated, can skew the results produced by almost any mechanical trading model.
