Foreign Exchange Rate Forecasting With Artificial Neural Networks
Source: Journal of Technical Analysis, by CMT Association
This research evaluates the utility of Technical Analysis (TA) incorporated with Machine Learning (ML). In application, Artificial Neural Networks (ANNs) equipped with technical indicators as input variables are utilized to forecast the weekly logarithmic returns of the EUR/USD over a short-term forecast horizon of one period.
    LEARNING OBJECTIVES
  • TA and its sub-section of technical indicators are gathered to assemble an extensive candidate dataset featuring standard and advanced technical indicators measuring all ‘technical’ aspects of the EUR/USD time series.
  • The empirical results demonstrate that ANNs equipped with technical indicators might be questionable FX forecasting models.
  • Robust IVS has shown that MAs are descriptive variables of FX data, and modellers should perhaps limit candidate datasets to MAs, reducing the often time-consuming stage of dataset assembly.
  • The outperformance of the MACD, a simple linear model, versus a variety of computationally intelligent nonlinear models, reinforces the view that simple models are often superior to complex ones.

Abstract

This research evaluates the utility of Technical Analysis (TA) incorporated with Machine Learning (ML). In application, Artificial Neural Networks (ANNs) equipped with technical indicators as input variables are utilized to forecast the weekly logarithmic returns of the EUR/USD over a short-term forecast horizon of one period. In doing so, TA and its sub-section of technical indicators are gathered to assemble an extensive candidate dataset featuring standard and advanced technical indicators measuring all ‘technical’ aspects of the EUR/USD time series. The candidate dataset is then subjected to a two-stage Input Variable Selection (IVS) procedure, producing an informative subset of technical indicators to serve as input variables to the ANNs. A variety of ANNs is then trained and tested on the EUR/USD time series data with their performance evaluated over a 25-year sample period (1994 to 2019), reserving the last 5 years for out-of-sample testing. Two benchmark models are applied: the Moving Average Convergence Divergence (MACD) model and a Support Vector Regression (SVR) model fitted with relevant currency index data. The empirical results demonstrate the MACD is the superior forecasting model across most out-of-sample forecast evaluation metrics.

Introduction

Background & Motivation

Technical Analysis (TA) has evolved significantly since its inception. Manual calculation and plotting of charts and indicators were replaced by computerized analysis during the Information Age of the mid-20th century. That era allowed for the rapid growth of TA, offering revolutionary technical indicators such as the RSI, MACD, and Bollinger Bands, all applicable at the click of a mouse. During that era, a new class of models were developed, known as Artificial Neural Networks (ANNs). Like the above-mentioned technical indicators, ANNs were also regarded as revolutionary, offering non-linear capabilities, universal approximation, and powerful computational properties. Initial ANNs such as the Multi-Layer Perceptron (MLP) demonstrated potential but were hindered by the limited computing power of the 1980-90s, causing researchers to lose interest. However, with the exponential increase in computing power from the 1990s onwards, ANNs once again caught the attention of researchers. It was not long before practitioners acknowledged the potential for merging TA with ML methods.

This merger of TA and ML is advocated by David Aronson, CMT: "The TA practitioner who understands data-mining methods and the important issues in indicator design will be well-positioned to play this crucial role in the human-machine partnership of twenty-first century technical analysis…Given that human intelligence is essentially unchanging, but computer intelligence is increasing at an exponential rate, no other approach to TA makes sense". Today, ML methods are fast becoming mainstream, with Chui et al. and the Boston Consulting Group estimating that by the year 2025 the field of wealth management will be dominated by the interaction of Artificial Intelligence (AI) and big data analytics. Therefore, the motivation underpinning this research is to merge and evaluate TA with ML methods to continue the evolution of TA and apply it to modern-day financial market forecasting.

Outline & Contribution

This research contributes to the discipline of TA and the body of knowledge by expanding on prior research in two areas: 1) focusing on the data collection stage by assembling a comprehensive candidate dataset of technical indicators, and 2) utilizing a broader set of forecast evaluation metrics. Point 1 is central to this research as it concerns TA. A literature review of prior TA-ANN based research reveals a notable deficiency in the data collection stage: a failure to assemble a comprehensive candidate dataset of technical indicators. This deficiency is compounded by candidate datasets not featuring technical indicators measuring all ‘technical’ aspects of a time series such as trend, momentum, support & resistance, cycles, and volatility. It is argued that neglecting to assemble a comprehensive candidate dataset that measures all technical aspects of a time series is not only unrepresentative of TA and its subsection of technical indicators, but also deprives an ANN of potentially informative input data, likely hindering its performance. In solution, this research follows the data preparation scheme devised by Yu et al. with emphasis on the data collection stage.

Regarding point 2, prior research tends to employ limited sets of forecast evaluation metrics that, although valid, do not yield informative insight into model performance. In solution, this research utilizes a comprehensive set of forecast evaluation metrics. In total, a set of ten metrics is used to evaluate forecast performance, namely: Mean Error (ME), Mean Square Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Percent Error (MPE), Mean Absolute Percent Error (MAPE), Theils-U1, Theils-U2, Pesaran-Timmermann (PT), and Diebold-Mariano (DM). It is intended that the comprehensive candidate dataset and broader set of forecast evaluation metrics contribute in a meaningful way to the discipline of TA and the existing body of knowledge.

Artificial Neural Networks (ANNs)

ANNs were introduced by McCulloch & Pitts and are a class of nonlinear, Computationally Intelligent (CI) models inspired by the structure and functioning of biological neurons. Today, ANNs are a common technology applied across numerous domains, including financial market forecasting, and have been described as the invention that will have the greatest impact on our lives. An ANN relates a set of input variables {Xi}, i = 1…k, to a set of one or more output variables {Yj}, j = 1…k. The network architecture determines the relationship between X and Y, and it is the architecture that differentiates ANNs from other data analysis models. A standard ANN has a minimum of three layers (see Figure 1):

Figure 1: ANN Process (One Hidden Layer)

The first layer is termed the input layer, in which the number of nodes corresponds to the number of input variables, X. The third layer is called the output layer, in which the number of nodes corresponds to the number of output variables. The second layer, known as the hidden layer, is the crucial layer that separates the input layer from the output layer, in which input variables are squashed or transformed by an activation function – usually a logistic or log sigmoid transformation. The number of nodes in the hidden layer also defines the complexity the ANN is capable of fitting. While the addition of this hidden layer may seem complicated, it represents an efficient and effective way to model a nonlinear statistical process. ANNs consist of a variety of architectures, each with unique processes and functions designed for specific tasks such as forecasting and pattern recognition. The advantages of ANNs over traditional forecasting models are found in their nonlinear and adaptive nature, coupled with a variety of architectures and the ability to accept any explanatory variables as inputs. Furthermore, ANNs are universal function approximators, capable of capturing any type of complex relationship. Since the number of possible nonlinear relationships in Foreign Exchange (FX) data is large, ANNs are well-suited to approximate them. For a description of all featured ANNs, see Appendix A.
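
To make the three-layer structure concrete, the following is a minimal, illustrative sketch of one forward pass through a network with two inputs, one sigmoid hidden layer, and a linear output. The weights here are random and untrained, and this is not the NeuroSolutions implementation used in the study:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forecast(x, W1, b1, W2, b2):
    """One forward pass: input layer -> sigmoid hidden layer -> linear output."""
    h = sigmoid(W1 @ x + b1)   # hidden layer squashes the inputs
    return W2 @ h + b2         # output layer combines hidden activations

rng = np.random.default_rng(0)
k, n_hidden = 2, 5                                # two inputs, five hidden nodes
W1, b1 = rng.normal(size=(n_hidden, k)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(1, n_hidden)), np.zeros(1)
x = np.array([0.004, -0.002])                     # e.g., two indicator readings
print(mlp_forecast(x, W1, b1, W2, b2))            # one-step-ahead return forecast
```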

Historical Data

This research adheres to Dow Theory and its principle of ‘closing prices only’ to represent the EUR/USD time series, as it is widely accepted that the closing price represents investor sentiment of fair value for the period. The historical data was sourced from www.investing.com, which sources its data directly from the New York Stock Exchange (NYSE). The sample period consists of 25 years of historical data (1994-2019) at weekly frequency. Weekly data, as opposed to the frequently researched daily data, was chosen to steer the research away from day trading and promote an end-of-week forecast around which trading can be geared.

Descriptive Statistics – Weekly Closing Price Data

The following table presents descriptive statistics regarding the EUR/USD weekly closing prices over the sample period:

Table 1: EUR/USD Closing Price Descriptive Statistics

From Table 1, the Mann-Kendall trend test confirms the presence of a monotonic trend in the EUR/USD weekly time series data. The Jarque-Bera test for normality finds the time series to be not normally distributed, and the augmented Dickey-Fuller test for stationarity categorizes the time series as non-stationary. All test results are typical of exchange rate data and financial time series data in general. Additionally, the test of fractal dimension generated the largest value of 2.4, indicating the time series requires a minimum of 2 explanatory variables for modelling purposes.
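
As an illustration, the normality and stationarity tests reported above can be reproduced with standard statistical libraries. A minimal sketch with hypothetical price data (the Mann-Kendall test would additionally require a dedicated package such as pymannkendall; the paper itself used dedicated statistical software):

```python
import numpy as np
from scipy.stats import jarque_bera
from statsmodels.tsa.stattools import adfuller

# Hypothetical stand-in for the EUR/USD weekly closing-price series.
prices = 1.20 + np.cumsum(np.random.default_rng(1).normal(0, 0.01, 1300))

jb_stat, jb_p = jarque_bera(prices)
print(f"Jarque-Bera p-value: {jb_p:.4f}")   # small p -> reject normality

adf_stat, adf_p, *_ = adfuller(prices)
print(f"ADF p-value: {adf_p:.4f}")          # large p -> non-stationary
```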

Data Transform

As required in financial modelling, the time series must be stationary. To achieve stationarity, the log-returns data transform is applied to the EUR/USD weekly closing price data. The data transform is performed in MS Excel using the formula:
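
The formula itself did not survive the page layout. The standard log-returns transform, consistent with the surrounding text, is $r_t = \ln(P_t / P_{t-1}) = \ln(P_t) - \ln(P_{t-1})$, implemented in Excel as, e.g., =LN(B3/B2). The same transform in Python, as a minimal sketch with hypothetical prices:

```python
import numpy as np
import pandas as pd

close = pd.Series([1.1015, 1.1098, 1.1042, 1.1121])    # hypothetical weekly closes
log_returns = np.log(close / close.shift(1)).dropna()  # r_t = ln(P_t / P_{t-1})
print(log_returns)
```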

Descriptive Statistics – Weekly Log-Returns Data

The following table presents descriptive statistics regarding the EUR/USD log-returns time series.

Table 2: EUR/USD Log Returns Descriptive Statistics

From Table 2, the Mann-Kendall trend test confirms the presence of a monotonic trend. This result is surprising, as one of the properties of the log-returns data transform is the removal of the trend from the time series. The Jarque-Bera test for normality found the time series to be not normally distributed, and the augmented Dickey-Fuller test for stationarity categorizes the time series as stationary. The test results are typical of log-returns financial time series data. The test of fractal dimension generated the largest value of 2.4, equaling the value of the EUR/USD in closing-price format. Therefore, the EUR/USD log-returns time series also requires a minimum of two explanatory variables for modelling purposes.

Candidate Dataset

Only technical indicators utilizing closing price data in their calculation are included in the candidate dataset. A total of 2,360 technical indicators measure all technical aspects of the EUR/USD time series: trend, trend strength, momentum, support & resistance, cycles, and volatility. Advanced variations of specific technical indicators are also featured. For example, technical indicators such as the MACD and RSI are fitted with filters allowing them to dynamically adapt to the changing rate of the time series. Zero-lag variations are included to minimize the inherent problem of lag, as well as causal and non-causal technical indicators.

Input Variable Selection (IVS)

The candidate dataset is initially screened for constant, quasi-constant, and duplicate variables, of which there were none. A two-stage IVS procedure is then applied in-sample. Stage one IVS applies Pearson’s correlation coefficient, retaining only the variables with an absolute correlation of ≥ 0.70, reducing the candidate dataset by 2,063 variables, from 2,360 to 297 highly correlated variables. Stage two IVS applies Stepwise Multivariate Linear Regression performed in accordance with the Handbook of Univariate and Multivariate Data Analysis with IBM SPSS. Following implementation, the remaining 297 variables were reduced to an informative subset of 2 variables. The two selected technical indicators are variations of the same indicator, the ‘MESA Smooth’ devised by John F. Ehlers in Cybernetic Analysis for Stocks and Futures, in which Ehlers applies Maximum Entropy Spectral Analysis (MESA) to financial market data. Ehlers describes the ‘MESA Smooth’ as an "adaptive 2 Pole Super Smoother Filter that is tuned to a fraction of the MESA-measured Dominant Cycle." The MESA Smooth models, hereafter abbreviated as S(0.1) and S(0.3), are both free from multicollinearity with condition index values of 1.33 and 3.5 respectively, and both with variance inflation factor values of 3.59. Both technical indicators make a statistically significant unique contribution at the 99% level. Of the two technical indicators, S(0.1) contributes most to the model with a standardized beta coefficient of 0.821 versus 0.203 of S(0.3).
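
The paper performs stage two in SPSS; as an illustration only, the two-stage procedure can be sketched in Python with a correlation screen followed by a simplified forward-entry stepwise regression. The entry threshold, variable names, and data below are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# X: DataFrame of candidate technical indicators; y: weekly log-returns.
rng = np.random.default_rng(2)
y = pd.Series(rng.normal(0, 0.01, 500), name="log_return")
X = pd.DataFrame(rng.normal(size=(500, 50)),
                 columns=[f"ind_{i}" for i in range(50)])
X["ind_0"] = 0.8 * y + rng.normal(0, 0.005, 500)   # one informative indicator

# Stage 1: retain indicators with |Pearson r| >= 0.70 against the target.
corr = X.corrwith(y).abs()
stage1 = X[corr[corr >= 0.70].index]

# Stage 2: forward stepwise regression (enter if p < 0.01), a simplified
# stand-in for the SPSS stepwise procedure used in the paper.
selected = []
remaining = list(stage1.columns)
while remaining:
    pvals = {}
    for cand in remaining:
        model = sm.OLS(y, sm.add_constant(stage1[selected + [cand]])).fit()
        pvals[cand] = model.pvalues[cand]
    best = min(pvals, key=pvals.get)
    if pvals[best] >= 0.01:
        break
    selected.append(best)
    remaining.remove(best)
print("Selected input variables:", selected)
```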

Artificial Neural Networks

The featured ANNs consist of both static and temporal networks. The original ANNs, such as the MLP, are static, whereas the later-developed temporal ANNs feature a memory component and are designed explicitly for time-series forecasting. NeuroSolutions Pro is utilized to build, train, and test the following ANNs:

  • Multilayer Perceptron (MLP)
  • Generalized Feed-Forward (GFF)
  • Radial Basis Function (RBF)
  • Co-Active Neural Fuzzy Inference System (CANFIS)

A total of 32 variations of each ANN are produced (1-4 hidden layers × 8 activation functions). The only exception is the RBF network, which by default can also feature zero hidden layers, equating to 40 models. Additionally, an ANN forecast combination model is included to evaluate whether ANNs benefit from the properties of a non-biased combination forecast.

Number of Hidden Layers

The hidden layer is the feature that separates ANNs from other models. It is widely accepted, and supported by the universal approximation theorem, that two hidden layers can approximate any complex nonlinear function and provide powerful computational properties. This evaluation will implement 1-4 hidden layers, satisfying the two-hidden-layer recommendation and extending beyond it to take advantage of deep learning (≥2 hidden layers).

Activation Functions

All ANN architectures are reproduced featuring the following activation functions: Tanh Axon, Sigmoid Axon, Linear Tanh Axon, Linear Sigmoid Axon, SoftMax Axon, Bias Axon, Linear Axon, and Axon. For a description of each activation function, see Appendix B.

Learning Algorithm

All models feature the Levenberg-Marquardt (LM) learning algorithm. The LM algorithm is a higher-order adaptive algorithm known for minimizing MSE, training significantly faster than the commonly employed momentum learning algorithm, and typically arriving at a solution with a significantly lower error.

Processing Elements

To establish the optimum number of processing elements (PEs) in each hidden layer, the ‘vary a parameter’ function of NeuroSolutions Pro is employed, which allows automated incremental testing to identify the optimal number of PEs, defined in terms of lowest Mean Absolute Error (MAE). The optimum number of PEs to be determined is confined to a generous range of 1 to 10 with testing increments of 1. Therefore, a total of 10 PEs are tested for each hidden layer. During this phase, the network tests all PEs over three cycles, thus a total of 30 simulations (10 PEs x 3 cycles) are performed on each hidden layer. This equates to 30 simulations for ANNs featuring one hidden layer, and 60, 90, and 120 simulations for ANNs featuring 2, 3, and 4 hidden layers respectively.
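
NeuroSolutions' 'vary a parameter' function is proprietary, but the same incremental search can be approximated with scikit-learn. A minimal sketch for one hidden layer, assuming hypothetical training and validation arrays:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical stand-ins for the in-sample training / cross-validation splits.
rng = np.random.default_rng(3)
X_train, y_train = rng.normal(size=(800, 2)), rng.normal(0, 0.01, 800)
X_val, y_val = rng.normal(size=(100, 2)), rng.normal(0, 0.01, 100)

best_pe, best_mae = None, np.inf
for pe in range(1, 11):                      # test 1..10 PEs per hidden layer
    maes = []
    for cycle in range(3):                   # three cycles per setting
        net = MLPRegressor(hidden_layer_sizes=(pe,), max_iter=1000,
                           random_state=cycle).fit(X_train, y_train)
        maes.append(mean_absolute_error(y_val, net.predict(X_val)))
    if np.mean(maes) < best_mae:
        best_pe, best_mae = pe, np.mean(maes)
print(f"Best PE count: {best_pe} (MAE {best_mae:.5f})")
```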

Iterations

The number of iterations is set to 1000 and programmed to terminate after 100 iterations without improvement in MSE, thus implementing the early stopping procedure to guard against over-fitting.

Weights

Each model features batch weighting which updates the weights after the presentation of the entire training set.

Data Division

This evaluation features a 25-year sample period (1994-2019), divided into a training, cross-validation, and testing ratio of 70:10:20, summarized in Table 3:

Table 3: Data Division of Sample Period
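
As a quick check on the ratio, the split sizes can be computed directly (the exact observation count is assumed here; roughly 52 weeks × 25 years):

```python
n = 1300                       # approximate number of weekly observations, 1994-2019
n_train = int(n * 0.70)        # training set            -> 910
n_val   = int(n * 0.10)        # cross-validation set    -> 130
n_test  = n - n_train - n_val  # out-of-sample test set  -> 260 (~ last 5 years)
print(n_train, n_val, n_test)
```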

Best Performing ANNs

The best performing ANN from each class in terms of RMSE is selected to represent that class of ANN in the forecast evaluation results and compete with the benchmark models. The best performing ANNs from each class are:

Table 4: Best Performing ANNs

The featured CANFIS model consists of the Gaussian membership function and the TSK fuzzy model.

Benchmark Models

Two benchmark models are presented. The first is a MACD model, with the following rationale: if ANNs equipped with informative technical indicators cannot outperform a commonly used technical indicator, there is no advantage to TA-ANN based forecasting. The second benchmark is an SVR model fitted with Euro Index and USD Index data. Like ANNs, an SVR model is computationally intelligent but, by comparison, has a simple architecture. The rationale is to circumvent the time-consuming and complex requirements of dataset assembly and IVS, evaluating whether a simple CI model equipped with relevant currency indexes as input variables can outperform the conventionally constructed ANNs. See Appendix D for details.

Empirical Results

Forecast Evaluation Metrics

A set of ten forecast evaluation metrics is employed to evaluate the forecast performance of the ANNs and the benchmark models. Regarding the interpretation of ME, MSE, RMSE, MAE, MPE, and MAPE: these metrics are biased and unbounded, with zero being the optimal value (the ME and MPE can also take negative values), so values toward zero indicate superior forecast accuracy. The Theils-U1 and Theils-U2 metrics are both unbiased metrics. For Theils-U1, values are bounded between 0 and 1, with values closer to zero indicating superior forecasting accuracy. For Theils-U2, values of <1 indicate superior forecasting ability over a Naïve forecast, values of 1 equal a Naïve forecast, and values >1 indicate the model has underperformed a Naïve forecast. For the PT metric, larger values indicate superior performance in forecasting directional change, i.e. a positive or negative return for the period. For the full description, formula, and interpretation of the above statistics, see Appendix C.
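
For illustration, the error-based metrics above can be computed directly from the actual and forecast return series. A minimal sketch using the standard textbook definitions (assumed to match the formulas in Appendix C), with toy data:

```python
import numpy as np

def forecast_metrics(y, f):
    """Standard point-forecast error metrics; definitions assumed per Appendix C."""
    e = y - f
    mse = (e ** 2).mean()
    rmse = np.sqrt(mse)
    return {
        "ME":   e.mean(),
        "MSE":  mse,
        "RMSE": rmse,
        "MAE":  np.abs(e).mean(),
        "MPE":  (e / y).mean() * 100,
        "MAPE": np.abs(e / y).mean() * 100,
        # Theils-U1: bounded in [0, 1], lower is better.
        "Theils-U1": rmse / (np.sqrt((y ** 2).mean()) + np.sqrt((f ** 2).mean())),
    }

y = np.array([0.004, -0.002, 0.001, -0.003])   # actual log-returns (toy data)
f = np.array([0.003, -0.001, 0.000, -0.004])   # model forecasts (toy data)
print(forecast_metrics(y, f))
```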

In-Sample Forecast Evaluation

Table 5 presents the in-sample forecast evaluation results of the ANNs and benchmark models:

Table 5: In-Sample Forecast Evaluation Results

From Table 5, the in-sample results demonstrate that the SVR benchmark model is the superior forecasting model across most performance metrics. The SVR model obtained 6 of the 9 forecast evaluation metrics: MSE, RMSE, MAE, MPE, Theils-U1, and Theils-U2. The MACD benchmark model ranked a close second on the same metrics and secured the ME and PT metrics. The only metric the benchmark models did not secure is the MAPE, acquired by the MLP model. Amongst the ANNs, the MLP model is superior to other ANNs in terms of forecast error, securing the ME, MSE, RMSE, MAE, and MAPE metrics. The ANN(FC) outperformed all other ANNs regarding the Theils-U1, Theils-U2, and PT metrics. All models except the RBF network outperformed a Naïve forecast. To conclude the in-sample forecast results, the benchmark models are the superior forecasting models, with the SVR model dominating the in-sample performance metrics.

Out-of-Sample Forecast Evaluation

Specific to out-of-sample forecast evaluation is the Diebold-Mariano (DM) metric, which is applied to pairs of forecasts (ANN vs benchmark). A negative DM value indicates that the ANN model is more accurate than the benchmark model; the more negative the DM value, the more accurate the ANN forecast. Conversely, positive DM values indicate the benchmark model is more accurate. Table 6 presents the out-of-sample forecast evaluation results of the various ANNs and benchmark models.

Table 6: Out-of-Sample Forecast Evaluation Results

From Table 6, out-of-sample performance is largely inconsistent with in-sample performance. The SVR benchmark model did not obtain any metrics, and largely relinquished its in-sample performance to the MACD benchmark model. The MACD model dominates the out-of-sample results, securing the MSE, RMSE, MAE, Theils-U1, and PT metrics. Amongst the ANNs, results are mixed. The GFF model obtains the MSE and RMSE metrics, emerging as the best model in terms of minimized forecast error. The RBF model obtains the MAE, Theils-U1, and PT metrics, demonstrating that it is the most accurate of the ANNs and superior at forecasting directional change. Of the 5 ANNs, only the MLP, GFF, and ANN(FC) networks outperformed a Naïve forecast. Regarding the DM statistic specific to out-of-sample forecasts, when paired with the SVR benchmark, all ANN models generated negative values, indicating they are superior forecasting models. This is perhaps not surprising given the SVR model's inadequate out-of-sample performance. However, when paired with the MACD benchmark model, no ANNs generated negative values, indicating the MACD is a superior forecasting model. To conclude the out-of-sample results, the MACD benchmark model dominates the performance metrics, obtaining 5 of the 10 forecast evaluation metrics, including the key metrics of RMSE, Theils-U1, and PT.

Point Forecast

Table 7 displays the point forecasts, point forecast errors, and rank. The actual value one week in advance is -0.3400.

Table 7: Point Forecasts Results

From Table 7, the results show that the RBF model generated the most accurate point forecast of -0.1073 compared with the actual value of -0.3400. The ANN(FC) ranked a reasonable third, supporting the view that an aggregate model can improve forecast performance. However, despite the SVR and MACD benchmark models dominating the in-sample and out-of-sample results, both ranked poorly on the point forecast, at 4/7 and 6/7 respectively, and both failed to generate the correct directional sign.

Discussion

This research evaluated the use of ANNs applied to forecast the weekly logarithmic returns of the EUR/USD exchange rate over a 1-period forecast horizon. TA was incorporated into the research by compiling a comprehensive candidate dataset of 2,360 technical indicators measuring all technical aspects of the EUR/USD time series. The candidate dataset was then data-mined to extract an informative subset of technical indicators to serve as input variables to the ANNs. The two-stage IVS procedure selected two variations of the same technical indicator, the ‘MESA Smooth’, essentially an MA with superior smoothing ability. This finding suggests that candidate datasets of various MAs may be more suitable to FX time-series forecasting than candidate datasets compiled of the full spectrum of technical indicators.

This suggestion is further supported by the fact that, of the 15 candidate sets generated in the final stage of IVS from the 297 technical indicators, all 15 consisted largely of MAs. Integrated with informative technical indicators as input variables, a variety of ANNs was then trained and tested on the EUR/USD time series over a 25-year sample period (1994-2019), reserving the last five years for out-of-sample testing. The ANNs' forecasting performance is mixed, both in-sample and out-of-sample. The ANN(FC) model emerged as a notable and consistent model, securing the in-sample MSE, RMSE, and MAE metrics and the out-of-sample MPE and MAPE metrics. The ANN(FC) model also ranked a reasonable 3/7 on the point forecast and generated the correct directional sign. The ANN(FC) performance reinforces the view that a combination forecast does offer increased accuracy and stability. However, the empirical results found that the MACD and SVR benchmark models dominated both the in-sample and out-of-sample periods, clearly demonstrating their superior forecasting performance over the ANNs and ANN(FC) model. This is especially true regarding the key performance metrics of RMSE, Theils-U1, Theils-U2, and DM. The SVR model performed exceptionally well during the in-sample period, but the MACD was the superior model in the crucial out-of-sample testing period, outperforming all ANNs by obtaining the MSE, RMSE, MAE, Theils-U1, PT, and DM metrics. The superior performance of the benchmark models questions the utility of ANNs equipped with technical indicators applied to FX forecasting.

However, despite its superior performance across the out-of-sample period, the MACD ranked poorly on the all-important point forecast, placing 6/7 and failing to generate the correct directional sign.

Conclusion

In conclusion, the empirical results have demonstrated that ANNs equipped with technical indicators might be questionable FX forecasting models. If implemented, it is advised to form an ANN(FC) model to increase forecast accuracy and stability. Robust IVS has shown that MAs are descriptive variables of FX data, and modellers should perhaps limit candidate datasets to MAs, reducing the often time-consuming stage of dataset assembly. Lastly, the outperformance of the MACD, a simple linear model, versus a variety of computationally intelligent nonlinear models reinforces the view that simple models are often superior to complex ones.

Appendix A

Description of Artificial Neural Networks

Multilayer Perceptron (MLP)

The feed-forward ANN coupled with the log sigmoid activation function is also known as the multilayer perceptron, or MLP, network. Figure 2 depicts a feed-forward ANN with one hidden layer featuring the Levenberg-Marquardt (LM) learning algorithm and Tanh Axon activation function, generated by the ANN software NeuroSolutions Pro:

Figure 2: Feed-Forward ANN (One Hidden Layer)

Thus far only ANN architectures with one hidden layer have been presented. It is, of course, possible to feature more than one hidden layer, as found in multilayer feedforward networks. A problem of increased complexity can be approached by increasing the number of hidden layers to two or more. Feed-forward networks with multiple hidden layers increase model complexity, and do so at the cost of additional parameters to estimate, which in turn reduces degrees of freedom if the sample size is limited and increases training time. The additional parameters also increase the likelihood that the parameter estimates converge to a local rather than a global optimum.

Generalized Feedforward (GFF)

A generalized feed-forward (GFF) network is a variation of the feed-forward network in that it features jump connections. These jump connections allow for the inputs x to have direct linear links to the output y in addition to the output via the normal route of the hidden layer. Figure 3 depicts a GFF network featuring one hidden layer:

Figure 3: Generalized Feed-Forward ANN (One Hidden Layer)

An advantage of the GFF network is that it embeds the pure linear model in addition to the feed-forward ANN, allowing for the possibility that a nonlinear function may also have a linear component. If the underlying relationship between the inputs and output is a pure linear relationship, then only the direct jump connectors will be significant within the architecture.

However, if the underlying relationship between the inputs and output is a complex nonlinear relationship, then the jump connectors serve an insignificant role. The GFF network also allows for the relationship between input variables and the output to be decomposed into linear and nonlinear components. In practice, GFF networks often solve problems more efficiently than other ANNs.

Radial Basis Function

The motivation behind the introduction of the RBF was to improve the accuracy of MLPs while decreasing training time and network complexity. An RBF network utilizes a radial basis or Gaussian density function as the activation function. Furthermore, the structure of the RBF differs from that of the MLP: as in other networks, the input neurons can be a linear combination of regressors, but there is only one input signal and one set of coefficients for the input variables. The signal from the input layer is the same over all neurons, which are Gaussian transformations, around k different means, of the input signals. Therefore, the input signals have different centers for the radial bases or normal distributions. The differing Gaussian transformations are combined in a linear function for forecasting the output.

Figure 4: Radial Basis Function (RBF) ANN

In application, RBF networks can improve accuracy and reduce training time due to their simpler, more straightforward training phase. RBF networks are efficiently designed, have good generalization ability, and show tolerance to input noise. A disadvantage of the RBF network is the difficulty of finding the optimal parameter values for its RBF functions and their optimal number.

Co-Active Neuro-Fuzzy Inference System (CANFIS)

The Co-Active Neuro-Fuzzy Inference System (CANFIS) integrates fuzzy inputs with an ANN to solve poorly defined problems. Fuzzy inference systems are valuable as they combine the explanatory nature of rules (membership functions) with the power of ANNs. The CANFIS's powerful capability stems from pattern-dependent weights between the consequent layer and the fuzzy association layer. A disadvantage of a CANFIS is that it should not be used to predict values outside the extremes contained in the learning dataset. Another disadvantage is that sufficient dataset volume is required to build the model; as such, it is not capable of direct prediction for settings that lack archived observations. The CANFIS model is also more computationally intensive than most other models.

Figure 5: Co-Active Neural Fuzzy Inference System (CANFIS) model

ANN Forecast Combination, ANN(FC), Simple Average Method

The simple average forecast combination is a basic but versatile forecasting technique that can be applied to most forecasting models to enhance forecasting robustness and accuracy, as well as serving as an unbiased benchmark. For example, given a set of five ANN forecasts, the combination forecast at time t is calculated as:
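
The formula did not survive the layout; for five forecasts the simple average combination is $\hat{f}^{FC}_t = \frac{1}{5}\sum_{i=1}^{5} \hat{f}_{i,t}$. A minimal sketch with hypothetical forecasts:

```python
import numpy as np

# Rows: the five ANN forecasts; columns: forecast periods (toy values).
forecasts = np.array([[0.003, -0.001],
                      [0.004, -0.002],
                      [0.002,  0.000],
                      [0.005, -0.003],
                      [0.001, -0.001]])
print(forecasts.mean(axis=0))   # simple-average combination per period
```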

Appendix B

Activation Functions Description

Axon

The Axon performs an identity map between its input and output activity. The Axon is the first member of the Axon family, and all subsequent axon members subclass its functionality. Furthermore, each subclass is depicted with a graph of its activation function superimposed on the Axon icon.

Bias Axon

The Bias Axon provides a bias term. Most nonlinear axons are subclasses of this component to inherit this bias characteristic. The Weights access point of the Bias Axon allows access to the Bias vector.

Linear Axon

The Linear Axon implements a linear axon with slope and offset control. It is, therefore, more powerful than the Bias Axon because it performs an affine transform. The bias is inherited from the Bias Axon, and the slope is controlled by an additional parameter β, which is not adaptive.

Linear Sigmoid Axon

The Linear Sigmoid Axon substitutes the intermediate portion of the sigmoid by a line of slope β, making it a piecewise linear approximation of the sigmoid. This component is more computationally efficient than the Sigmoid Axon (it is much easier to compute the map).

Linear Tanh Axon

The Linear Tanh Axon substitutes the intermediate portion of the Tanh by a line of slope β, making it a piecewise linear approximation of the Tanh. This component is more computationally efficient than the Tanh Axon (it is much easier to compute the map). The Weights access point of the Linear Tanh Axon provides access to the Bias vector (ωi).

Sigmoid Axon

The Sigmoid Axon applies a scaled and biased sigmoid function to each neuron in the layer. The scaling factor and bias are inherited from the Linear Axon. The range of values for each neuron in the layer is between 0 and 1. Such nonlinear elements provide a network with the ability to make soft decisions. The Weights access point of the Sigmoid Axon allows access to the Bias vector (ωi).

SoftMax Axon

The SoftMax Axon is a component used to interpret the output of the neural net as a probability. For a set of numbers to constitute a probability density function, their sum must equal one. Often the output of an ANN produces a similarity measure. To convert this similarity measure to a probability, the SoftMax Axon is used at the output of the network. The Weights access point of the SoftMax Axon provides access to the Bias vector (ωi).

Tanh Axon

The Tanh Axon applies a bias and Tanh function to each neuron in the layer. This will squash the range of each neuron in the layer to between -1 and 1. Such nonlinear elements provide a network with the ability to make soft decisions. The Weights access point of the Tanh Axon allows access to the Bias vector (ωi).

Appendix C

Forecast Evaluation Metrics

Forecast Accuracy

For the following statistics, denoting the series of interest as yt and a forecast of it as ft, the resulting forecast error is given as et = (yt – ft), for t = 1...T. Per this notation, the set of forecast evaluation statistics are presented below:

Theils-U1

The Theils-U1 statistic, U meaning unbiased, is bounded between 0 and 1, with values closer to 0 indicating greater forecasting accuracy. The Theils-U1 measure was applied by comparing the ANN forecasts to those of a Random Walk. Denoting the series of interest as yt and a forecast as ft, the resulting forecast error is given as et = (yt – ft), for t = 1...T. Using this notation, Theils-U1 is presented as:
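
The formula itself did not survive the layout; the standard Theils-U1 definition, consistent with the notation above, is:

$$U_1 = \frac{\sqrt{\frac{1}{T}\sum_{t=1}^{T} (y_t - f_t)^2}}{\sqrt{\frac{1}{T}\sum_{t=1}^{T} y_t^2} + \sqrt{\frac{1}{T}\sum_{t=1}^{T} f_t^2}}$$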

Theils-U2

A value less than 1 for Theils-U2 confirms the superiority of the competing forecast, while a value greater than 1 shows a higher accuracy for the benchmark forecast. Denoting the series of interest as yt and the forecast as ft, the resulting forecast error is given as et = (yt – ft), for t = 1...T. Using this notation, Theils-U2 is presented as:
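
The formula did not survive the layout; the standard Theils-U2 definition, which compares the forecast against a Naïve (no-change) forecast, is:

$$U_2 = \sqrt{\frac{\sum_{t=1}^{T-1}\left(\frac{f_{t+1} - y_{t+1}}{y_t}\right)^2}{\sum_{t=1}^{T-1}\left(\frac{y_{t+1} - y_t}{y_t}\right)^2}}$$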

Pesaran & Timmermann (1992)

The Pesaran & Timmermann (1992) test, abbreviated as PT, is a non-parametric test of the ability of a forecast to predict the direction of change in a time series. Denoting the time series as yt and its forecast as xt, the PT test is defined as:
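
The statistic did not survive the layout; the standard form of the PT (1992) statistic is:

$$PT = \frac{\hat{P} - \hat{P}_*}{\sqrt{\hat{V}(\hat{P}) - \hat{V}(\hat{P}_*)}} \sim N(0, 1)$$

where $\hat{P}$ is the observed proportion of correctly predicted signs and $\hat{P}_*$ is its expected value under independence of yt and xt.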

Diebold-Mariano
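
The body of this section did not survive the layout. The standard Diebold-Mariano test, consistent with its application in the out-of-sample evaluation above, compares two forecasts through their loss differential $d_t = g(e_{1t}) - g(e_{2t})$, with statistic:

$$DM = \frac{\bar{d}}{\sqrt{\hat{V}(\bar{d})}} \sim N(0, 1)$$

so that negative values favour the first (ANN) forecast. A minimal sketch under squared-error loss at a one-step horizon (this sketch assumes no long-run variance correction):

```python
import numpy as np
from scipy.stats import norm

def diebold_mariano(y, f_a, f_b):
    d = (y - f_a) ** 2 - (y - f_b) ** 2      # loss differential per period
    dm = d.mean() / np.sqrt(d.var(ddof=1) / len(d))
    return dm, 2 * (1 - norm.cdf(abs(dm)))   # statistic, two-sided p-value

y   = np.array([0.004, -0.002, 0.001, -0.003, 0.002])
f_a = np.array([0.003, -0.001, 0.000, -0.004, 0.001])   # e.g., an ANN (toy data)
f_b = np.array([0.005,  0.001, -0.001, -0.001, 0.004])  # e.g., a benchmark (toy)
print(diebold_mariano(y, f_a, f_b))          # negative DM favours the ANN
```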

Appendix D

Benchmark Models

Moving Average Convergence Divergence (MACD)

The performance of the ANN models is benchmarked against a Moving Average Convergence Divergence (MACD) model. The MACD is a practical benchmark because if the ANNs equipped with informative technical indicators cannot outperform a common standalone technical indicator, then ANN modelling is costly and ineffective. The MACD is a well-established technical indicator invented in the late 1970s by Gerald Appel and showcased in his book, Technical Analysis: Power Tools for Active Investors. Technically, the MACD is an oscillator that measures trend and momentum and is comparable to the dual MA crossover strategy. Although observable as the interplay of two Exponential Moving Averages (EMAs), the MACD is calculated using three EMAs:

MACD Line = 12 period EMA – 26 period EMA
Signal Line = 9 period EMA of MACD Line

The MACD line is a 12 period EMA minus a 26 period EMA. The signal line is a 9 period EMA of the MACD line and is plotted alongside the MACD line. Appel advises the use of the 12, 26, 9 period parameters and the use of EMAs, as they are more responsive than simple moving averages (SMAs). It is the interplay between the MACD line and the signal line, in the form of crossovers and divergences, that generates the trading signals. A buy signal is generated when the MACD line rises above the signal line. Conversely, a sell signal is generated when the MACD line falls below the signal line.
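
As an illustration of the calculation described above, a minimal pandas sketch (the closing prices are hypothetical; `span` reproduces the conventional EMA smoothing):

```python
import pandas as pd

def macd(close: pd.Series, fast=12, slow=26, signal=9):
    """MACD per Appel's 12/26/9 parameters, using exponential moving averages."""
    macd_line = (close.ewm(span=fast, adjust=False).mean()
                 - close.ewm(span=slow, adjust=False).mean())
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line

close = pd.Series([1.10, 1.11, 1.12, 1.11, 1.13, 1.12, 1.14] * 10)
macd_line, signal_line = macd(close)
# Crossovers of the two lines generate the trading signals described above.
print(macd_line.tail(3), signal_line.tail(3), sep="\n")
```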

Support Vector Regression (SVR)

Support Vector Regression (SVR) is a robust technique for constructing data-driven, non-linear regression models. It is commonly used in financial market and macroeconomic applications and holds advantages such as global solutions, insusceptibility to local minima, and an equilibrium between model accuracy and model complexity. An SVR is specified as:
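
The specification did not survive the layout; the standard SVR regression function, consistent with the parameter description below, is:

$$f(x) = \omega^{\top} \varphi(x) + b$$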

Where ω and b are the regression parameter vectors of the function and φ(x) is the non-linear function that maps the input data vector x into a feature space where the training data exhibit linearity. Figure 6 is an example of the SVR generated in NeuroSolutions Pro; notice that, in comparison to the ANNs, it has a simpler structure.

Figure 6: Support Vector Regression (SVR) Model
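
As an illustration of this benchmark's construction, a minimal scikit-learn sketch (the index inputs, kernel, and hyperparameters here are hypothetical; the original used NeuroSolutions Pro):

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical weekly Euro Index / USD Index inputs and EUR/USD log-returns.
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 2))          # columns: euro_index, usd_index
y = 0.3 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 0.01, 500)

svr = SVR(kernel="rbf", C=1.0, epsilon=0.001).fit(X[:400], y[:400])
print(svr.predict(X[400:405]))         # one-step-ahead return forecasts
```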



