[Type here]

[Type here]

Bui Ngoc Duc

Abstract:

Keywords

Data mining, Game theory, policy making process, reinforcement learning

Algorithmic Trading: A Game-Theoretic and Simulation Approach to a Reinforcement Learning Bot


Chapter 1: Introduction

1.1. Problem statement

Trading stocks on the stock market is one of the major investment activities. Over time, investors have developed a number of stock analysis methods intended to help them predict the direction of stock price movements. Modelling and predicting future equity prices from current financial information and news is of enormous use to investors, who want to know whether a stock will rise or fall over a certain period of time. To predict how a company they want to invest in will perform in the future, investors developed analysis methods based on current and past financial data and other information about the company. Financial balance sheets and the various ratios that describe the health of a company are the basis of the fundamental analysis that investors undertake to analyze and predict the company's future stock price. Predicting the direction of the stock price is particularly important for value investing.

Experienced analysts can apply mathematical models that have been validated on past data in order to evaluate a company's intrinsic value. However, markets do not remain stable, and indicators that have strong predictive value over one period may cease to generate excess returns as soon as market conditions change. New investment strategies and new technologies have been introduced, making some of the old models obsolete, and as financial literacy has grown, there are more market players than ever. Two measures have been proposed to counter this evolving market behavior. First, some trading systems are based on genetic algorithms that transform the indicators used as attributes over time [6] [28]. Second, and more commonly, the data set is fit to nonlinear models using machine learning algorithms such as Artificial Neural Networks [10].


The introduction of algorithms into trading has changed the stock market. Algorithms make it easy to react quickly to events on the market, and machine learning algorithms have made it much easier for analysts to build models that predict stock prices from past data. As evidence, according to a Eurekahedge report, AI funds have outperformed their peers while providing downside protection.


The table above compares AI funds with the average hedge fund and with systematic CTA/managed futures strategies, which can be considered a rough approximation of the average quant fund. Source: Eurekahedge.

Motivated by the success of AI funds, in this thesis we describe a method for creating an artificial agent that trades on the stock market using stock prices and several machine learning algorithms.

1.2. Objective of research

The monetary reward of buying and selling stocks at profitable positions is a key driver of this research. Our main hypothesis is that, by training machine learning algorithms on past data, it is possible to predict stock price movements from market patterns, and then to use these predictions to build a profitable trading agent. We use the agent's Profit and Loss (PnL) over the test period to assess its profitability. We conduct simulations to examine whether the agent is profitable on different data sets (seen and unseen) and then calculate the agent's average PnL.
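The PnL criterion can be made concrete with a small sketch. It assumes a hypothetical long-only agent that trades a single share, encodes actions as 1 (buy), -1 (sell) and 0 (hold), and marks any open position to the last price; `simulate_pnl` and `average_pnl` are illustrative helpers, not the agent's actual evaluation code.

```python
from statistics import mean


def simulate_pnl(prices, actions):
    """PnL of a hypothetical one-share, long-only agent.

    prices  -- price at each time step
    actions -- action at each time step: 1 = buy, -1 = sell, 0 = hold
    """
    if len(prices) != len(actions):
        raise ValueError("prices and actions must have the same length")
    cash = 0.0
    position = 0  # shares held: 0 or 1 in this sketch
    for price, action in zip(prices, actions):
        if action == 1 and position == 0:     # open a long position
            cash -= price
            position = 1
        elif action == -1 and position == 1:  # close the long position
            cash += price
            position = 0
        # any other combination is treated as "hold"
    # mark an open position to the last observed price
    return cash + position * prices[-1]


def average_pnl(runs):
    """Average PnL over simulation runs, each a (prices, actions) pair."""
    return mean(simulate_pnl(p, a) for p, a in runs)


seen = ([10.0, 11.0, 12.0, 11.5], [1, 0, -1, 0])    # buy at 10, sell at 12
unseen = ([20.0, 19.0, 18.0, 21.0], [1, 0, 0, 0])   # buy at 20, marked at 21
print(simulate_pnl(*seen), simulate_pnl(*unseen), average_pnl([seen, unseen]))
# prints: 2.0 1.0 1.5
```

Averaging over runs on seen and unseen data, as described above, simply repeats this calculation per data set.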

1.3. Scope of the research

This thesis provides only an elementary introduction to algorithmic trading, using game theory as the framework for the market environment. The game environment is kept simple by assuming that the other players' responses to our agent's strategies are reflected in the stock price movement. Moreover, the algorithms used to create and train the agent rely on the machine learning libraries Scikit-learn and Keras. The algorithms and functions used from these libraries are explained in the Appendix of this thesis.


1.4. Overview

The thesis is organized in the following manner:

• Chapter 1 states the motivation for writing this thesis, as well as the objectives and scope of the research.

• Chapter 2 provides background on the Efficient Market Hypothesis (EMH) and the evidence contradicting it, as well as related work on this topic.

• Chapter 3 establishes the game-theoretic framework used to describe the market, together with the simulation approach and the algorithms.

• Chapter 4 describes the methods of data collection and data processing, the implementation, and simulations on different variables of the model.

• Chapter 5, the last chapter, discusses the final results of our agent, explains the limitations of our research and outlines future improvements.


Chapter 2: Literature review

This chapter begins with background on efficient markets and then briefly reviews previous empirical studies that use machine learning algorithms to construct trading strategies.

2.1. Efficient Markets

One of the strongest arguments against the existence of profitable trading strategies is founded on the Efficient Market Hypothesis (EMH). Since the EMH implies that our search for consistently profitable trading strategies is futile, we first give an overview of the EMH and then present empirical results that contradict it.

The EMH states that the current market price reflects all available information [13]. Its proponents argue that, since stocks always trade at their fair value on stock exchanges, it is impossible to outperform the overall market through expert stock selection or market timing: any new information is quickly integrated into the market price.

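Fama formalized the concept of efficient markets in 1970 by expressing the non-predictability of market prices:

$$E(\tilde{p}_{j,t+1} \mid \Phi_t) = \left[1 + E(\tilde{r}_{j,t+1} \mid \Phi_t)\right] p_{j,t}$$

where:

$p_{j,t}$ is the price of security $j$ at time $t$;

$r_{j,t+1} = (p_{j,t+1} - p_{j,t}) / p_{j,t}$ is the one-period percentage return;

$\Phi_t$ is the information reflected in prices at time $t$; and

the tildes denote random variables.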


Based on this expectation expression, Fama argues that there is no possibility of earning excess market returns through market timing based solely on the information available at time $t$ ($\Phi_t$), which rules out trading strategies based on technical indicators.
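The step from the expectation model to "no excess returns" can be made explicit. The derivation below follows the usual "fair game" reading of Fama's model; the excess return $z_{j,t+1}$, the amounts $\alpha_j(\Phi_t)$ that a trading system invests in each of $n$ securities, and the total excess value $V_{t+1}$ are notation introduced here for illustration.

$$z_{j,t+1} = r_{j,t+1} - E(\tilde{r}_{j,t+1} \mid \Phi_t) \quad\Longrightarrow\quad E(\tilde{z}_{j,t+1} \mid \Phi_t) = E(\tilde{r}_{j,t+1} \mid \Phi_t) - E(\tilde{r}_{j,t+1} \mid \Phi_t) = 0$$

$$V_{t+1} = \sum_{j=1}^{n} \alpha_j(\Phi_t)\, z_{j,t+1} \quad\Longrightarrow\quad E(\tilde{V}_{t+1} \mid \Phi_t) = \sum_{j=1}^{n} \alpha_j(\Phi_t)\, E(\tilde{z}_{j,t+1} \mid \Phi_t) = 0$$

Because $\alpha_j(\Phi_t)$ depends only on information already contained in $\Phi_t$, it can be taken outside the conditional expectation, so any trading system built solely on $\Phi_t$ has zero expected excess value.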

Despite the theoretical soundness of the EMH, research over the last 30 years has shown that several of its assumptions may be unrealistic. A fundamental assumption is that investors behave rationally, or that the deviations of the many irrational investors cancel out. However, research has shown that investors are neither strictly rational [41] nor devoid of biases [20]. For instance, people with a conservatism bias tend to underweight new information. Moreover, experiments have shown that these biases tend to be systematic and that the deviations do not cancel each other out [21], which leads to over- and under-reaction to news events.

Since the 1990s, the literature has seen a growing decline of the EMH and the emergence of behavioral finance. Behavioral finance views the market as an aggregate of human actions filled with imperfect and inefficient decisions. Under this theory, financial markets are a reflection of human desires, goals, motivations, errors and overconfidence [40]. An alternative to the EMH that has gained traction is the Adaptive Market Hypothesis, which posits that profit opportunities from inefficiencies exist in financial markets but are eroded as knowledge of the inefficiency spreads throughout the public and the public capitalizes on the opportunities. Under this view of financial markets, many researchers have built evolutionary and/or nonlinear models and demonstrated that excess returns can be attained on out-of-sample data.


2.2. Previous Research

Because of their ability to model nonlinear relationships without pre-specification during the modeling process, neural networks (NNs) have become a popular method in financial time-series forecasting. NNs also offer great flexibility in model architecture, in terms of the number of hidden nodes and layers. For example, Pekkaya and Hamzacebi compare a linear regression with an NN model for forecasting macroeconomic variables and show that the NN gives much better results [35].

Many studies have used NNs and shown promising results in the financial markets. Grudnitski and Osburn implemented NNs to forecast S&P500 and gold futures price directions and found that they were able to correctly predict the direction of monthly price changes 75% and 61% of the time, respectively [15]. Another study showed that an NN-based model leads to higher arbitrage profits than cost-of-carry models. Phua, Ming and Lin implement an NN using Singapore's stock market index and report a forecasting accuracy of 81% [36]. Similarly, NN models applied to weekly forecasting of Germany's FAZ index find favorable predictive results compared with conventional statistical approaches [14].

More recently, NNs have been augmented or adapted to improve performance in financial time-series forecasting. Shaoo et al. show that cascaded functional link artificial neural networks (CFLANN) perform best in FX markets [39]. Egrioglu et al. introduce a new method based on feed-forward artificial neural networks to analyze multivariate high-order fuzzy time-series forecasting models [12]. Liao and Wang used a stochastic time effective neural network model to obtain predictive results on global stock indices. Bildirici and Ersin combined NNs with ARCH/GARCH and other volatility-based models to produce a model that outperformed ANNs or GARCH-based models alone. Moreover, Yudong and Lenan applied bacterial chemotaxis optimization (BCO) and a back-propagation NN to the S&P500 index and conclude that their hybrid model (IBCO-BP) offers lower computational complexity, better prediction accuracy and shorter training time.

Another popular machine learning classification technique that does not require any domain knowledge or parameter setting is the decision tree. It also often offers a more interpretable model than an NN, as the nodes of the tree can be easily understood. The simplest type of decision tree model is the classification and regression tree (CART). Sorensen et al. show that CART decision trees perform better than single-factor models based on the same variables in picking stock portfolios [42]. Wang and Chan use a two-layer bias decision tree to predict the daily stock prices of Microsoft, Intel and IBM, finding excess returns compared with a buy-and-hold method [43]. Another study found that a boosted alternating decision tree with expert weighting generated abnormal returns for the S&P500 index during the test period [11]. To improve accuracy, some studies used the random forest algorithm for classification, which will be discussed further in Chapter 4. For example, Booth et al. show that a recency-weighted ensemble of random forests produced superior results, in terms of both profitability and prediction accuracy, compared with other ensemble techniques when analyzed on a large sample of stocks from the DAX [7]. Similarly, a gradient boosted random forest model applied to Singapore's stock market was able to generate excess returns compared with a buy-and-hold strategy [37]. Some recent research combines decision tree analysis with evolutionary algorithms to allow the model to adapt to changing market conditions. Hsu et al. present constraint-based evolutionary classification trees (CECT) and show strong predictability of a company's financial performance [16].
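To illustrate the interpretability argument, the sketch below fits a depth-2 CART tree (scikit-learn's `DecisionTreeClassifier`, which implements an optimized version of CART) on synthetic data and prints its rules. The features `momentum` and `pe`, the labelling rule and the noise level are all hypothetical; the point is only that each learned split can be read directly.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 500
# Hypothetical features: momentum (recent return) and price-to-earnings ratio
momentum = rng.normal(0.0, 0.02, n)
pe = rng.uniform(5.0, 40.0, n)
X = np.column_stack([momentum, pe])

# Hypothetical label: "price goes up" (1) when momentum is positive and the
# stock is not expensive; 10% of labels are flipped to simulate noise.
y = ((momentum > 0) & (pe < 25)).astype(int)
flip = rng.random(n) < 0.10
y[flip] = 1 - y[flip]

split = 400  # first 400 rows for training, last 100 for testing
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X[:split], y[:split])
accuracy = tree.score(X[split:], y[split:])

# Each line of the printed rules is a readable threshold test on one feature
rules = export_text(tree, feature_names=["momentum", "pe"])
print(rules)
print(f"test accuracy: {accuracy:.2f}")
```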


Support Vector Machines (SVM) are also often used to predict market behavior. Huang et al. compare SVM with other classification methods (random walk, linear discriminant analysis, quadratic discriminant analysis and Elman backpropagation neural networks) and find that SVM performs best in forecasting weekly movements of the Nikkei 225 index [17]. Similarly, Kim compares SVM with NN and case-based reasoning (CBR) and finds that SVM outperforms both in forecasting the daily direction of change in the Korea composite stock price index (KOSPI) [23]. Likewise, Yang et al. use a margin-varying Support Vector Regression model and show empirical results with good predictive value for the Hang Seng Index [46]. Nair et al. propose a genetic-algorithm-optimized decision tree-support vector machine hybrid, validate its performance on the BSE-Sensex and find that its predictive accuracy is better than that of both an NN and a naive Bayes based model [31].

While some studies have compared various machine learning algorithms against each other, the results have been inconsistent. Patel et al. compare four prediction models (NN, SVM, random forest and naive Bayes) and find that, over a ten-year period of various indices, the random forest model performed best. However, Ou and Wang examine the performance of ten machine learning classification techniques on the Hang Seng Index and find that SVM outperformed the other models [33]. Kara et al. compared the performance of NN and SVM on the daily Istanbul Stock Exchange National 100 Index and found that the average prediction accuracy of the NN model (75.74%) was significantly better than that of the SVM model (71.52%) [22].
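Because such comparisons depend heavily on the data and the evaluation protocol, it can help to see what a like-for-like comparison looks like. The sketch below, assuming scikit-learn is available, trains the four model families compared by Patel et al. (NN, SVM, random forest, naive Bayes) on the same chronological split of synthetic data. The data, the hyperparameters and the helper `compare_classifiers` are hypothetical, and the resulting ranking says nothing about real markets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, split):
    """Fit each model on rows [:split] and return its accuracy on rows [split:]."""
    models = {
        # NN and SVM are sensitive to feature scale, so they get a scaler
        "NN": make_pipeline(StandardScaler(),
                            MLPClassifier(hidden_layer_sizes=(16,),
                                          max_iter=2000, random_state=0)),
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "naive Bayes": GaussianNB(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X[:split], y[:split])
        scores[name] = model.score(X[split:], y[split:])
    return scores


rng = np.random.default_rng(1)
X = rng.normal(size=(600, 3))  # hypothetical indicator values
# Hypothetical "up/down" label with a nonlinear term and noise
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 - 0.5 + rng.normal(scale=0.5, size=600) > 0).astype(int)

scores = compare_classifiers(X, y, split=450)
for name, acc in scores.items():
    print(f"{name:>13}: {acc:.3f}")
```

Changing the data-generating rule or the split is usually enough to reorder the models, which mirrors the inconsistency reported in the literature.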

The machine learning research reviewed above focuses on predictive modeling. However, creating an agent that learns and improves its policy in a dynamic environment during training requires a different branch of machine learning, reinforcement learning, in which the agent searches for the optimal policy that maximizes its reward. Yet this is a rather isolated way to think about the trading environment: evidence suggests that other agents exist in the market alongside our agent. Game theory, the mathematics of conflict between participants, is therefore the missing piece needed to complete the model of the market. Eric Engle et al [note] provided the theoretical ideas for combining game theory and machine learning in an agent-based approach to stocks, but did not report implementation results.

Chapter 3: Theoretical reviews

In the first part of this chapter, we lay out the foundations of game theory. We first formalize the basic definitions needed to speak correctly about games and game-plays, and then present the standard representations of games. This background is essential for finding rational responses and for general reasoning about games. The mathematical formalization of game theory in this chapter is inspired by [16]. In the later part of the chapter, we describe how game theory can be applied to create a decision-making agent in the stock market environment, along with the difficulties of the traditional game-theoretic approach and the resulting need for simulation and algorithms.

Game theory framework

Game theory is a branch of applied mathematics that studies strategic decision making. It uses mathematical models to formulate interactions between intelligent, rational decision-makers. These interactions are called games.


Game

Games are played within a game environment (also called a world) and are composed of a system of rules, which defines the players and the actions and postulates the dynamics of the game. A game is called a puzzle if no more than one agent is involved; otherwise it is a conflict [18]. (Footnote: The difference between games and game environments is sometimes omitted. It is nevertheless useful to distinguish them, especially in the context of general game playing. This issue is explained further in Chapter 4.)

Definition 2.1. Player

A player (or an agent) is an entity able to act. His activities alter the world in which he exists.

The concept of a game consists of active and passive elements. Passive elements represent information, i.e. which actions are feasible for a particular agent in a given state, or how the game will evolve under certain conditions and actions taken. The active elements of the game are the players. Without the players, the game remains static; only their actions can change it.

Definition 2.2. Action

An action (or a move) is a change in the game caused by a player in a particular situation.

A valid game environment enables all agents to act and to be immediately aware of their actions. Their activity can change the current situation as a consequence of their decision making. The different situations that can occur before the game terminates are called states of the game.


Every game begins in a root state and then progresses according to the game dynamics as the participating agents make their decisions. All rational players select their actions to achieve their goals. Utility theory was established to capture the effects of their behavior and to evaluate the situations in which the agents find themselves. Utility is a value that measures the usefulness of the current state of the game for each player.

Definition 2.3. Utility

Let $S$ be a set with a weak-order preference relation $\le$. A utility (or outcome) is a cardinal element $e \in S$, representing the motivation of the players. A function $u$ is said to be a utility function iff $\forall x, y \in S: u(x) \le u(y) \Leftrightarrow x \le y$.
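Definition 2.3 can be checked mechanically on a finite set of outcomes. In the sketch below, the outcomes and their ranking are hypothetical, and the preference relation is passed as a function `leq(x, y)` meaning $x \le y$, i.e. "y is at least as good as x".

```python
from itertools import product


def is_utility_function(u, outcomes, leq):
    """True iff u(x) <= u(y) holds exactly when leq(x, y) holds, for all x, y."""
    return all((u(x) <= u(y)) == leq(x, y)
               for x, y in product(outcomes, repeat=2))


# Hypothetical outcomes of a game and their preference order
rank = {"lose": 0, "draw": 1, "win": 2}


def leq(x, y):
    """Preference relation x <= y: outcome y is at least as good as x."""
    return rank[x] <= rank[y]


print(is_utility_function(lambda o: rank[o], rank, leq))           # True
print(is_utility_function(lambda o: 10 * rank[o] + 5, rank, leq))  # True
print(is_utility_function(lambda o: 0, rank, leq))                 # False
```

The second check shows that the condition constrains only the order of the utility values: any strictly increasing transformation of a utility function also satisfies it. The constant function fails because it ranks "win" no higher than "lose".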

Altogether, a mathematical game is a structure that fully defines the whole game and its development.

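Definition 2.4. Game

A game is a tuple $G = (P, A, u)$, where:

$P = \{p_1, \dots, p_n\}$ is a set of players;

$A = \{A_1, \dots, A_n\}$ is a set of sets of available actions, $A_i$ being the actions of player $p_i$; and

$u$ is a utility function $u: A_1 \times \dots \times A_n \to \mathbb{R}^n$, assigning a utility to each player for every combination of actions.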

This general definition of a game assumes that all players act simultaneously in just one round, after which the game ends. In general, however, termination in finite time is guaranteed only for so-called finite games, which at some point terminate and assign the utilities. All finite games have starting and terminal states. In these games the number of players is finite, as is the number of permitted actions for each player. An agent can face only finitely many situations in a finite game, and the game-play cannot go on indefinitely [19].


Agents’ strategies

When there is more than a single agent in the environment, the whole game changes according to the activity of all players. In this setting the outcome depends not only on the actions of one particular agent, but on the behavior of all of them. Strategies can be seen as contingency plans, or policies, for playing the game. In every situation, an agent's reaction is defined by his strategy.

Following a deterministic plan, called a pure strategy, is rational enough in puzzles, where there is only one agent to set the course of the world. In contrast, in environments with other players it may be preferable to randomize over the set of pure strategies according to a selected probability distribution. Rather than as a strategy, such randomization can sometimes be seen as an agent's belief that he can profit from playing a given action. This kind of strategy is called mixed. When mixed strategies are played, every agent can only guess what will happen, and the outcome is less predictable than with pure strategies.
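Why randomizing helps can be seen by computing expected utilities in matching pennies played simultaneously. The sketch assumes payoffs of +1 and -1 and that player 1 wins when the coins match; `expected_utility` is an illustrative helper that weights each payoff by the probability of the corresponding action pair.

```python
# Matching pennies payoffs for player 1 (player 2 receives the negative).
# Rows: player 1 plays Heads / Tails; columns: player 2 plays Heads / Tails.
PAYOFF_P1 = [[1, -1],
             [-1, 1]]


def expected_utility(payoff, p, q):
    """Expected payoff of the row player when the row player mixes with
    probability vector p and the column player mixes with q."""
    return sum(p[i] * q[j] * payoff[i][j]
               for i in range(len(p)) for j in range(len(q)))


print(expected_utility(PAYOFF_P1, [1, 0], [0.5, 0.5]))    # pure Heads vs 50/50: 0.0
print(expected_utility(PAYOFF_P1, [0, 1], [0.5, 0.5]))    # pure Tails vs 50/50: 0.0
print(expected_utility(PAYOFF_P1, [1, 0], [0.75, 0.25]))  # pure Heads vs 75/25: 0.5
```

Against an opponent who mixes 50/50, every strategy of player 1 earns the same expected payoff, so player 1 cannot exploit her; against a predictable opponent who plays Heads 75% of the time, the pure strategy Heads earns 0.5.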

Optimal strategy

Game theory was originally established to answer a simple question: what is an optimal reaction? How should an agent act to be most likely to win the game? The answer is that a fundamental advantage for a player is information about the strategies of his opponents. In other words, once an agent is able to guess the next action of every other agent, he can deliberately follow a strategy that maximizes his terminal utility. In conclusion, the set of all optimal strategies of a rational, well-informed agent $p_i$ (that is, the strategies with the highest expected utility) is entirely determined by the strategies $s_{-i}$ of the others:

$$BR_i(s_{-i}) = \arg\max_{s_i \in \Sigma_i} u_i(s_i, s_{-i}),$$

where $\Sigma_i$ is the set of strategies of $p_i$ and $u_i$ is the (expected) utility of $p_i$.

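Definition Best response

A strategy $s_i \in \Sigma_i$ of agent $p_i$ in a game $G$ is a best response to the strategies $s_{-i}$ of the other players iff $u_i(s_i, s_{-i}) \ge u_i(s_i', s_{-i})$ for every $s_i' \in \Sigma_i$.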

Unfortunately, in most cases information about the opponents' strategies is out of reach, or obtaining it is computationally infeasible. Another possibility is to estimate the opponents' strategies, e.g. from their previous actions, and then adjust one's own strategy accordingly.

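Definition 2.5. Nash equilibrium (NE)

Given a game $G$ and a strategy profile $s = (s_1, \dots, s_n)$, the players $P$ are in Nash equilibrium iff every strategy is a best response to the strategies of the others: $\forall p_i \in P,\ \forall s_i' \in \Sigma_i: u_i(s_i, s_{-i}) \ge u_i(s_i', s_{-i})$.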

If the state of the world allows no one to benefit from changing his strategy, the situation remains stable. It has been proved that every game with finitely many players and a finite set of pure strategies has at least one Nash equilibrium profile, although it might consist of mixed strategies [22].

Game representations

There are various representations of games. The simplest one was presented at the beginning of this part. Although the general definition is sufficient for the mathematical apparatus, for concrete game examples it is more convenient to establish standard forms and structures for working with the game data. Different representations extend the general definition, allowing various games to express their specific aspects in a more suitable form, and algorithms for finding Nash equilibria can be adapted to a particular representation to reduce computational complexity. Existing representations take into account stochasticity, the number of players and decision points, the possibility of cooperation and other important characteristics of the game.

Normal form

Normal (or strategic) form is a basic type of game representation, which corresponds directly to the tuple $G = (P, A, u)$ of Definition 2.4. Each player moves once and the actions are chosen simultaneously. This makes the model simpler than other forms and easier to solve for Nash equilibria, but it cannot express the order of moves over time.

The most famous example of a normal-form game is the Prisoner's Dilemma, which is described as follows:

Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary

confinement with no means of communicating with the other. The prosecutors lack sufficient

evidence to convict the pair on the principal charge. They hope to get both sentenced to a year in

prison on a lesser charge. Simultaneously, the prosecutors offer each prisoner a bargain. Each

prisoner is given the opportunity either to: betray the other by testifying that the other committed

the crime, or to cooperate with the other by remaining silent. The offer is:

• If X and Y each betray the other, each of them serves 5 years in prison.

• If X betrays Y but Y remains silent, X is set free and Y serves 20 years in prison (and vice versa).

• If X and Y both remain silent, both of them serve only 1 year in prison (on the lesser charge).


An example of Prisoner’s Dilemma game

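From this example we observe that mutual betrayal (both confess) is the Nash equilibrium of this game, because, given the other's choice, neither player has an incentive to change his option: if Y betrays, X serves 5 years by betraying instead of 20 by staying silent, and if Y stays silent, X goes free by betraying instead of serving 1 year.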
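The equilibrium argument can be checked by brute force over all four action profiles. The sketch encodes the offer as utilities equal to minus the years served (an encoding chosen here so that larger is better); `is_best_response` and `pure_nash_equilibria` are illustrative helpers that apply the best-response and Nash equilibrium definitions to pure strategies only.

```python
from itertools import product

ACTIONS = ("silent", "betray")

# (action of X, action of Y) -> (utility of X, utility of Y) = -(years in prison)
PD = {
    ("silent", "silent"): (-1, -1),
    ("silent", "betray"): (-20, 0),
    ("betray", "silent"): (0, -20),
    ("betray", "betray"): (-5, -5),
}


def is_best_response(game, profile, player):
    """True if `player` cannot gain by unilaterally switching to another action."""
    current = game[profile][player]
    for alternative in ACTIONS:
        deviation = list(profile)
        deviation[player] = alternative
        if game[tuple(deviation)][player] > current:
            return False
    return True


def pure_nash_equilibria(game):
    """All pure profiles in which every player plays a best response."""
    return [profile for profile in product(ACTIONS, repeat=2)
            if all(is_best_response(game, profile, i) for i in range(2))]


print(pure_nash_equilibria(PD))  # [('betray', 'betray')]
```

Mutual silence is not an equilibrium even though it gives both players a better outcome: each of them gains by switching to betrayal.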

Extensive form

Extensive form models a multi-agent sequential decision making. Convenient representation of

an extensive-form game is a game tree. Such structure allows to express even complicated

branching of the game, restricting actions in diﬀerent game states to the feasible ones only.

Definition 2.6. Game tree

Every game tree is a tuple $T = (S, Z, A, e, f, r)$, where:

S is a set of game states;

Z is a subset of S of terminal states;

A is a set of game actions;

e is an expander function, e: s ∈ S → {a ∈ A | a is executable in s};

f is a successor function, f: (s ∈ S × a ∈ e(s)) → t ∈ S; and

r ∈ S is a root state.

Using the notion of a game tree, it is now possible to define an extensive-form game. This representation consists of a game tree with a set of players, who are assigned to the states of the tree, and a utility function, which determines the utility in every terminal state, i.e. in every leaf of the game tree.

Definition 2.7. Extensive-form games

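A game in extensive form is a tuple $G = (P, T, b, u)$, where:

$P$ is a set of players;

$T$ is a game tree;

$b$ is a player-belonging function $b: S \setminus Z \to P$, which assigns to every non-terminal state the player who acts in it; and

$u$ is a utility function $u: Z \to \mathbb{R}^{|P|}$.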

Extensive form was originally designed for sequential games, where players take their actions

one by one. Game trees in these games provide a suitable way to visualize the game-play. This

representation is also more complex than normal form.

In the example of matching pennies in extensive form, where the second player wins when the coins do not match, the second player can always make her choice dependent on the first player's choice: if the first player selects Head, she selects Tail, and if the first player selects Tail, she selects Head. Paired with either of the first player's two pure strategies, this strategy yields a Nash equilibrium in pure strategies.

An example of extensive-form game – Matching pennies.
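The sequential version of matching pennies can be written directly in terms of Definitions 2.6 and 2.7 and solved by backward induction, a standard technique for finite games of perfect information. In the sketch, states are tuples of the moves made so far, payoffs are ±1 and the second player is assumed to win when the coins differ; all function names are illustrative.

```python
ACTIONS = ("H", "T")


def expander(state):
    """e(s): actions executable in state s (none in terminal states)."""
    return ACTIONS if len(state) < 2 else ()


def successor(state, action):
    """f(s, a): the state reached by playing action a in state s."""
    return state + (action,)


def player(state):
    """b(s): index of the player to move (0 = first, 1 = second)."""
    return len(state)


def utility(state):
    """u(z): payoffs in a terminal state; the first player wins on a match."""
    return (1, -1) if state[0] == state[1] else (-1, 1)


def backward_induction(state=()):
    """Return (payoff vector, best continuation of moves) from `state`."""
    if not expander(state):
        return utility(state), []
    mover = player(state)
    best = None
    for action in expander(state):
        payoff, path = backward_induction(successor(state, action))
        # the player to move keeps the action that maximizes her own payoff
        if best is None or payoff[mover] > best[0][mover]:
            best = (payoff, [action] + path)
    return best


print(backward_induction())  # ((-1, 1), ['H', 'T'])
```

The result confirms that the second player always wins: whichever coin the first player shows, her best continuation is the opposite coin. The first player's choice of H is arbitrary, since both of his options lose.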


Stochastic games (Markov Games)

Arguably, most, if not all, real-world systems are influenced by events of a probabilistic nature. Shapley (1953) was the first to define a game model that incorporates probabilistic choices.

Definition 2.8. Stochastic games

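Following Shapley, a stochastic game is a tuple $G = (N, S, A, T, R, \gamma)$, where:

$N$ is the number of players;

$S$ is the set of states of the game;

$A_i$ is the set of actions available to player $i$, and $A = A_1 \times \dots \times A_N$ is the set of joint actions;

$T: S \times A \times S \to [0, 1]$ is the transition function: $T(s, a, s')$ is the probability of reaching the next state $s'$ when, in state $s$, player $i$ chooses action $a_i$ and the other players simultaneously choose their actions, forming the joint action $a = (a_1, \dots, a_N)$;

$R = (R_1, \dots, R_N)$, where $R_i: S \times A \to \mathbb{R}$ is the reward of player $i$ for the chosen joint action; and

$\gamma \in [0, 1)$ is the discount factor.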

Shapley games are played by a finite number of players on a finite state space. In each state, each player chooses one of finitely many actions, and the resulting profile of actions determines a reward for each player and a probability distribution over successor states.
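The components of Definition 2.8 map directly onto a small simulation. The sketch below is a hypothetical two-player, zero-sum game with two states; the state names, transition probabilities and rewards are invented for illustration. Both policies are stationary (they depend only on the current state), and the payoff is the discounted sum of rewards over a finite number of stages, which approximates the infinite stream.

```python
import random


def transition(state, a1, a2):
    """T(s, a, .): distribution over next states for the joint action a = (a1, a2)."""
    if a1 == a2:  # both players on the same side: the market tends to turn volatile
        return {"calm": 0.3, "volatile": 0.7}
    return {"calm": 0.8, "volatile": 0.2}


def reward(state, a1, a2):
    """R(s, a): rewards (player 1, player 2); zero-sum, stakes double when volatile."""
    stake = 1.0 if state == "calm" else 2.0
    r1 = stake if a1 != a2 else -stake
    return r1, -r1


def play(policy1, policy2, start="calm", stages=50, gamma=0.9, seed=0):
    """Simulate `stages` stages with stationary policies (state -> action)
    and return each player's discounted sum of rewards."""
    rng = random.Random(seed)
    state, payoff = start, [0.0, 0.0]
    for t in range(stages):
        a1, a2 = policy1(state), policy2(state)
        r1, r2 = reward(state, a1, a2)
        payoff[0] += gamma ** t * r1
        payoff[1] += gamma ** t * r2
        dist = transition(state, a1, a2)
        # sample the next state according to T
        state = rng.choices(list(dist), weights=list(dist.values()))[0]
    return payoff


# Player 1 always buys; player 2 buys in calm markets and sells in volatile ones.
print(play(lambda s: "buy", lambda s: "buy" if s == "calm" else "sell"))
```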

In principle, a stochastic game proceeds ad infinitum. The payoff that each player receives is given by a function of the infinite stream of rewards for this player. Shapley considered games where the payoff is the discounted sum of rewards; other popular payoff functions, discussed by Filar & Vrieze (1997), are the limit average of the rewards and the total sum of the rewards.
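For concreteness, the three payoff functions mentioned above can be written as follows for player $i$, where $r_i^{(t)}$ (notation introduced here) denotes the reward player $i$ receives at stage $t$ and $\gamma$ is the discount factor of Definition 2.8. The use of $\liminf$ for the limit average is one common convention, assumed here.

$$\text{discounted: } \sum_{t=0}^{\infty} \gamma^{t} r_i^{(t)}, \qquad \text{limit average: } \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} r_i^{(t)}, \qquad \text{total: } \sum_{t=0}^{\infty} r_i^{(t)}.$$

With $0 \le \gamma < 1$ and bounded rewards, the discounted sum always converges; the limit average uses $\liminf$ because the plain limit need not exist, and the total sum is only well defined when the series converges.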

A pure strategy in a stochastic game assigns an action to each possible sequence of states visited so far, whereas a randomized strategy assigns a probability distribution over actions to each such sequence. Since there are infinitely many such sequences, every player has infinitely many pure strategies at his command, so Nash's equilibrium theorem is not applicable. Nevertheless, in the case of discounted payoffs, there always exists a Nash equilibrium in randomized strategies. There is even a Nash equilibrium in which the strategies depend only on the current state and not on the full history of visited states; such strategies are called stationary. For other payoff functions, such as the limit average, the existence of Nash equilibria in general-sum games is not guaranteed.

How, then, can the stochastic game be applied in our research to create an agent able to make decisions without human supervision? In principle, the stock market is a stochastic game between our agent and other self-interested agents, who may cooperate or compete with each other in order to maximize their rewards. In practice, however, it is impossible to know all the information about the other agents' decisions and states. Therefore, in the context of this thesis, we describe the stock market game as a two-player stochastic game in which the reactions of all other agents to our agent's actions are reflected in the market movement (nature). At first sight, it might seem easy to apply the stochastic game directly to the stock market: our agent chooses an action based on the current state, estimates the available next states and rewards, and then chooses the best response in the current state. However, because of the complex nature of the market, it is impossible to enumerate in advance all states and available next states together with the rewards of taking each action. Fortunately, another research field holds the key to this problem: simulation and the computer science approach in the form of machine learning.

Simulation and algorithms

In the following parts, we introduce some key concepts of simulation and machine learning to show how they can address the problems of the traditional stochastic game approach.

Simulation


Simulation methods imitate the operation of real-world systems. They first require a model to be developed that represents the characteristics, behaviors and functions of the selected system or process. The model represents the system itself, whereas the simulation represents the operation of the system over time.

These methods are widely used in economics, biology, engineering and almost all other sciences. Simulation is usually done on computers, by changing variables and predicting the resulting behavior of the system. Good examples of the usefulness of computer simulation include automobile traffic, grocery store checkout lines, inventory management, stock price prediction and the environmental consequences of policies.
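As a concrete illustration of simulating a system over time, the sketch below simulates stock price paths. It assumes geometric Brownian motion, a standard textbook price model swapped in here for illustration (this thesis does not prescribe it), with 252 trading days per year; `simulate_gbm` and its parameters are hypothetical names.

```python
import numpy as np


def simulate_gbm(s0, mu, sigma, days, n_paths, seed=0):
    """Simulate price paths under geometric Brownian motion (an assumed model).

    s0    -- starting price
    mu    -- annual drift
    sigma -- annual volatility
    Returns an array of shape (n_paths, days + 1) whose first column is s0.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / 252  # one trading day, assuming 252 trading days per year
    shocks = rng.standard_normal((n_paths, days))
    # exact discretization of GBM in log space
    log_returns = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * shocks
    log_paths = np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(log_returns, axis=1)], axis=1)
    return s0 * np.exp(log_paths)


paths = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, days=252, n_paths=10_000)
print(paths.shape)           # (10000, 253)
print(paths[:, -1].mean())   # sample mean of the simulated one-year prices
```

Under this model the expected price after one year is $100 e^{0.05}$, so the sample mean of the final column should land near 105; this kind of check is part of validating a simulation against its assumptions.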

Key issues in simulation include the acquisition of valid source information, the relevant selection of key characteristics and behaviors, the use of simplifying approximations and assumptions within the simulation, and the fidelity and validity of the simulation outcomes. Procedures and protocols for model verification and validation remain an ongoing field of academic study, research and development in simulation technology and practice, particularly in computer simulation.


The simulation procedure.

Algorithms

Machine learning

Machine learning is a field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) from data, without being explicitly programmed [Samuel, Arthur (1959). "Some Studies in Machine Learning Using the Game of Checkers". IBM Journal of Research and Development.]

Analysts like to describe the models they build in terms of the problems they solve. A model takes in observations and produces predictions. Many such models have been built, for example the famous Black-Scholes model for pricing options. These models are derived from mathematical formulas.
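To make "derived from mathematical formulas" concrete, the sketch below evaluates the textbook Black-Scholes price of a European call option on a non-dividend-paying stock. The formula is derived analytically rather than learned from data; the function names and example inputs are illustrative.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black_scholes_call(s, k, t, r, sigma):
    """Black-Scholes price of a European call.

    s: spot price, k: strike, t: years to expiry,
    r: continuously compounded risk-free rate, sigma: annual volatility.
    """
    d1 = (log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)


print(round(black_scholes_call(100, 100, 1.0, 0.05, 0.2), 4))  # 10.4506
```

Every input here is supplied by the analyst and the mapping from inputs to price is fixed by the formula, which is the contrast with the data-driven approach described next.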


However, to deal with the problem of building an agent that can learn and adapt to the

environment, we need simulation approach under the form of machine learning. With machine

learning, we do not use direct observations like modeling, we try to use data. The machine

learning process is to take historical data, run it through a machine learning algorithm to generate

the model. The model is not built by human but the machine it self. Then when we need to use

the model, we just provide some input and the out put come out automatically.
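As a minimal illustration of "the model is built by the machine": the sketch below fits a one-feature linear model y ≈ a + b·x to historical pairs by ordinary least squares and then queries the learned model. Least squares is chosen here only because it is the simplest learning algorithm; it is not necessarily the one used in this thesis, and the names `fit_linear` and `predict` are hypothetical.

```python
from statistics import fmean


def fit_linear(xs, ys):
    """Learn (intercept, slope) from historical data by ordinary least squares."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Use the learned model: provide an input, get an output back."""
    intercept, slope = model
    return intercept + slope * x


# Training: historical observations go in, a model comes out.
history_x = [1.0, 2.0, 3.0, 4.0]
history_y = [3.0, 5.0, 7.0, 9.0]  # generated by y = 1 + 2x
model = fit_linear(history_x, history_y)
print(model)                # (1.0, 2.0)
print(predict(model, 5.0))  # 11.0
```

No formula relating x to y is written by hand: the coefficients are produced by the algorithm from the data, which is the distinction the paragraph draws.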

Application to stock data

Applying machine learning to stock data is fairly straightforward, as the following figure illustrates for historical stock data. The historical data contains the values of the features for a particular stock over a time horizon, and we represent these features by stacking them one behind the other. We then use machine learning algorithms to train our agent on those features and the historical price.

[Figure: historical data for one stock over a time horizon, with the features (x), such as P/E, Bollinger band and moving average, stacked against the historical price (y).]

An example of a machine learning algorithm applied to stock data.
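The feature-stacking step in the figure can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions: a 3-day window, Bollinger bands at ±2 population standard deviations around the moving average, and the next day's price as the target y. The window length, band width, choice of target and the function name `build_dataset` are all assumptions, and P/E is left out because it requires fundamental data that a price series does not contain.

```python
from statistics import fmean, pstdev


def build_dataset(prices, window=3, k=2.0):
    """Stack per-day features (x) and align them with the next-day price (y).

    Each feature row is [moving_average, upper_band, lower_band]; the bands
    are moving_average +/- k population standard deviations of the window.
    """
    rows, targets = [], []
    for t in range(window - 1, len(prices) - 1):
        recent = prices[t - window + 1 : t + 1]  # the last `window` prices up to day t
        ma = fmean(recent)
        sd = pstdev(recent)
        rows.append([ma, ma + k * sd, ma - k * sd])
        targets.append(prices[t + 1])  # price one step ahead
    return rows, targets


X, y = build_dataset([10.0, 11.0, 12.0, 13.0, 12.0])
for features, target in zip(X, y):
    print(features, "->", target)
```

The rows of `X` and the list `y` are the (x, y) pairs that a supervised learning algorithm would then be trained on.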

Reinforcement learning

A plain machine learning model is good at producing predictions by recognizing market patterns in the input data. However, to create an agent that can determine the best response to a specific pattern, we use another branch of machine learning research: reinforcement learning (RL).

As mentioned above, the trading agent can conveniently be modeled in the framework of reinforcement learning. This framework adjusts the parameters of an agent to maximize the expected payoff, or reward, generated by its actions. The agent therefore learns a policy that tells it which actions to perform to achieve its best performance. This optimal policy is exactly what we hope to find when building an automated trading strategy.

To solve the stochastic game faced by our agent, we use Markov decision processes (MDPs), the most common model for implementing reinforcement learning. An MDP can be considered a narrowed-down stochastic game with a single decision maker. The MDP model of the environment consists, among other things, of a discrete set of states S and a discrete set of actions A. In this project, we consider only the action set of our own agent, because we assume that the actions of other agents are reflected in the price movement of the stock. Depending on the position of the learner (long or short), at each time step t it is allowed to choose an action $a_t$ from different subsets of the action space A, which consists of three possible actions:

$$A = \{\text{None}, \text{Long}, \text{Short}\}$$

Where:

None indicates that the agent should not have any order in the market.

Long and Short mean that the agent should execute a market order to buy or sell 100 shares, respectively (the size of an order is always one hundred shares).
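A minimal Python sketch of this action space follows. The text states that the permitted subset depends on the learner's position but does not list the subsets, so the mapping in `allowed_actions` (a flat agent may take any action; an agent that is already long may not go Long again, and vice versa) is a hypothetical rule for illustration only, as are the names `Action`, `Position` and `ORDER_SIZE`.

```python
from enum import Enum


class Action(Enum):
    NONE = "None"    # hold no order in the market
    LONG = "Long"    # market order to buy ORDER_SIZE shares
    SHORT = "Short"  # market order to sell ORDER_SIZE shares


class Position(Enum):
    FLAT = 0
    LONG = 1
    SHORT = -1


ORDER_SIZE = 100  # every order is for exactly one hundred shares


def allowed_actions(position):
    """Hypothetical subset of A available in a given position."""
    if position is Position.LONG:
        return {Action.NONE, Action.SHORT}
    if position is Position.SHORT:
        return {Action.NONE, Action.LONG}
    return set(Action)  # flat: the full action space A


def signed_quantity(action):
    """Shares bought (+) or sold (-) when the action is executed."""
    return {Action.NONE: 0, Action.LONG: ORDER_SIZE, Action.SHORT: -ORDER_SIZE}[action]


print(sorted(a.value for a in allowed_actions(Position.LONG)))  # ['None', 'Short']
```

Keeping the order size as a single constant mirrors the fixed one-hundred-share rule, so only the choice among None, Long and Short is left to the learner.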

So, at each discrete time step t, the agent senses the current state $s_t$ and chooses an action $a_t$. The environment responds by giving the agent a reward $r_t = r(s_t, a_t)$ and by producing the succeeding state $s_{t+1} = \delta(s_t, a_t)$. The functions r and δ depend only on the current state and action (the process is memoryless); they are part of the environment and are not necessarily known to the agent.
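The interaction just described can be sketched as a loop. The two-state environment ("up" and "down" trends), its reward table and the fixed policy `follow_trend` are invented for illustration. The loop calls `reward` and `delta` directly only to keep the example short; in the real problem the agent would merely observe their outputs.

```python
def delta(state, action):
    """Toy transition function delta(s, a). In this toy the trend simply
    alternates, so the action is ignored (an assumption of the example)."""
    return "down" if state == "up" else "up"


def reward(state, action):
    """Toy reward function r(s, a): +1 for trading with the trend,
    -1 for trading against it and 0 for staying out of the market."""
    if action == "None":
        return 0.0
    with_trend = (state == "up") == (action == "Long")
    return 1.0 if with_trend else -1.0


def run_episode(policy, start_state, n_steps):
    """Agent-environment loop: sense s_t, choose a_t = policy(s_t),
    then receive r_t = r(s_t, a_t) and s_{t+1} = delta(s_t, a_t)."""
    state, total, trajectory = start_state, 0.0, []
    for t in range(n_steps):
        action = policy(state)
        r_t = reward(state, action)
        trajectory.append((t, state, action, r_t))
        total += r_t
        state = delta(state, action)
    return total, trajectory


def follow_trend(state):
    """A fixed example policy: go long in an up-trend, short in a down-trend."""
    return "Long" if state == "up" else "Short"


total, trajectory = run_episode(follow_trend, "up", 5)
print(total)  # 5.0
```

Because both toy functions look only at the current state and action, the loop is memoryless in exactly the sense described above.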

The task of the agent is to learn a policy $\pi: S \to A$ that maps each state to an action, selecting its next action $a_t$ based solely on the current observed state $s_t$; that is, $\pi(s_t) = a_t$. The optimal policy, or control


1. Chapter 1: Introduction

1.1. Problem statement

Trading stocks on the stock market is one of the major investment activities. In the past, investors developed a number of stock analysis methods to help them predict the direction of stock price movements. Modelling and predicting future equity prices, based on current financial information and news, is of enormous use to investors, who want to know whether a stock will rise or fall over a certain period of time. To predict how a company they want to invest in will perform in the future, investors developed a number of analysis methods based on current and past financial data and other information about the company. Financial balance sheets and the various ratios that describe the health of a company are the basis of the fundamental analysis that investors undertake to analyze and predict the company's future stock price. Predicting the direction of the stock price is particularly important for value investing.

Experienced analysts can apply mathematical models that have been validated on past data to evaluate a company's intrinsic value. However, markets do not remain stable, and indicators that have strong predictive value over one period may cease to generate excess returns as soon as market conditions change. New investment strategies and new technologies have made some of the old models obsolete, and as financial literacy has grown, there are more market players than ever. Two measures have been proposed to counter this evolving market behavior. First, some trading systems are based on genetic algorithms that transform the indicators used as attributes over time [6] [28]. Second, and more commonly, the data set is fit to nonlinear models using machine learning algorithms such as Artificial Neural Networks [10].


The introduction of algorithms into trading has changed the stock market. Algorithms made it easy to react quickly to events on the stock market. Machine learning algorithms also made it much easier for analysts to build models for predicting stock prices, because new models can be developed directly from past data. As evidence, according to Eurekahedge's report, AI funds have outperformed their peers while providing downside protection.


The table above compares AI funds with the average hedge fund and with systematic CTA/managed futures strategies, which can be considered a rough approximation of the average quant fund. Source: Eurekahedge.

Motivated by the successful performance of AI funds, this thesis introduces a method for creating an artificial agent that trades on the stock market using stock prices and several machine learning algorithms.

1.2. Objective of research

The monetary incentive of buying and selling stocks at profitable positions is a key driver of this research. Our main hypothesis is that, by training machine learning algorithms on past data, it is possible to predict the movement of the stock price from market patterns, and then to apply algorithms that turn these predictions into a profitable trading agent. We use the agent's Profit and Loss (PnL) over the test to assess its profitability. We conduct simulations to examine whether the agent is profitable on different data sets (seen and unseen) and then calculate the agent's average PnL.
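A minimal sketch of the PnL measure follows. It assumes that each round trip opens and closes one fixed 100-share position and ignores transaction costs and slippage. The `(side, entry, exit)` trade representation and the function names are hypothetical and are not taken from the thesis code.

```python
from statistics import fmean

ORDER_SIZE = 100  # shares per order, as fixed in the agent's action space


def trade_pnl(side, entry_price, exit_price, shares=ORDER_SIZE):
    """Profit and loss of one round trip; side is +1 for long, -1 for short."""
    return side * (exit_price - entry_price) * shares


def simulation_pnl(trades):
    """Total PnL of one simulation run, given (side, entry, exit) tuples."""
    return sum(trade_pnl(side, entry, exit_) for side, entry, exit_ in trades)


def average_pnl(runs):
    """Average PnL over several simulation runs (e.g. on seen and unseen data)."""
    return fmean(simulation_pnl(trades) for trades in runs)


run_a = [(+1, 10.0, 10.5), (-1, 10.5, 10.25)]  # long gains 50, short gains 25
run_b = [(+1, 20.0, 19.0)]                     # long loses 100
print(simulation_pnl(run_a), simulation_pnl(run_b), average_pnl([run_a, run_b]))
# 75.0 -100.0 -12.5
```

The sign convention makes a short position profit when the price falls, so long and short trades can be summed into a single PnL figure per run.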

1.3. Scope of the research

This thesis provides only an elementary introduction to algorithmic trading, with game theory as the framework for the market environment. The game environment is kept simple by assuming that the responses of other market participants to our agent's strategies are reflected in the stock price movement. Moreover, the algorithms used to create and train the agent come from the machine learning libraries “Scikit-learn” and “Keras”. Nevertheless, the algorithms and functions used are explained in the Appendix of this thesis.


1.4. Overview

The thesis is organized as follows:

• Chapter 1 states the motivation for writing this thesis and the objectives and scope of the research.

• Chapter 2 provides background on the Efficient Market Hypothesis (EMH) and the evidence that contradicts it, as well as work relevant to this topic.

• Chapter 3 establishes the game-theoretic framework used to describe the market, together with the simulation approach and the algorithms.

• Chapter 4 describes the methods of data collection and data processing, as well as the implementation and the simulations on different variables of the model.

• Chapter 5 is the last chapter; it discusses the final results of our agent, explains the limitations of our research and outlines future improvements.


Chapter 2: Literature review

This chapter begins with background on efficient markets and then gives a brief review of previous empirical studies that use machine learning algorithms to construct trading strategies.

2.1. Efficient Markets

One of the strongest arguments against the existence of profitable trading strategies is founded on the Efficient Market Hypothesis (EMH). Since the EMH implies that our search for consistently profitable trading strategies is futile, we first give an overview of the EMH and then present the empirical results that contradict this theory.

The EMH states that the current market price reflects the assimilation of all available information [13]. That is, its proponents argue that since stocks always trade at their fair value on stock exchanges, it is impossible to outperform the overall market through expert stock selection or market timing: any new information is quickly integrated into the market price. Fama formalized the concept of efficient markets in 1970 by expressing the non-predictability of market prices:

$$E(\tilde{p}_{j,t+1} \mid \Phi_t) = \left[1 + E(\tilde{r}_{j,t+1} \mid \Phi_t)\right] p_{j,t}$$

Where:

$p_{j,t}$ is the price of security j at time t;

$r_{j,t+1} = (p_{j,t+1} - p_{j,t}) / p_{j,t}$ is the one-period percentage return; and

$\Phi_t$ is the information reflected in the price at time t (tildes denote random variables).


Based on this expectation expression, Fama argues that it is impossible to earn excess market returns through market timing based solely on the information in $\Phi_t$, which rules out trading strategies based on technical indicators.
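A common, stronger special case of this condition is the random-walk model, in which successive returns are independent. Under that assumption (used here only as an illustrative stand-in for an efficient market, not as a claim about real data), yesterday's return carries no information about today's, so the lag-1 autocorrelation of returns is close to zero and a timing rule based on past returns has nothing to exploit. The sketch below estimates that autocorrelation on simulated data and contrasts it with a predictable AR(1) series; the function name, seed and parameters are hypothetical.

```python
import random
from statistics import fmean


def lag1_autocorrelation(xs):
    """Sample correlation between each observation and the next one."""
    m = fmean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


rng = random.Random(0)

# Independent returns: by construction, past returns carry no information
# about future ones, so there is nothing for a market-timing rule to exploit.
iid_returns = [rng.gauss(0.0005, 0.01) for _ in range(20_000)]

# For contrast, an AR(1) series in which each return partly repeats the last.
ar_returns = [0.0]
for _ in range(20_000):
    ar_returns.append(0.5 * ar_returns[-1] + rng.gauss(0.0, 0.01))

print(round(lag1_autocorrelation(iid_returns), 3))  # close to 0
print(round(lag1_autocorrelation(ar_returns), 3))   # close to 0.5
```

Empirical studies that contradict the EMH, discussed next, can be read as finding structure in real returns that this independent-returns baseline lacks.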

On the other hand, despite the theoretically sound nature of the EMH, research over the last 30 years has shown that several of its assumptions may be unrealistic. First, a fundamental assumption is that investors behave rationally, or that the deviations of the many irrational investors cancel out. However, research has shown that investors are not strictly rational [41] and are not devoid of biases [20]. For example, people with a conservatism bias tend to underweight new information. Moreover, experiments have shown that these biases tend to be systematic and that the deviations do not cancel each other out [21]. This leads to over- and under-reaction to news events.

Since the 1990s, the literature has seen a growing decline of the EMH and the emergence of behavioral finance. Behavioral finance views the market as an aggregate of human actions filled with imperfect and inefficient decisions. Under this theory, financial markets are a reflection of human desires, goals, motivations, errors and overconfidence [40]. An alternative to the EMH that has gained traction is the Adaptive Market Hypothesis, which posits that profit opportunities from inefficiencies exist in financial markets but are eroded away as knowledge of the inefficiency spreads throughout the public and the public capitalizes on the opportunities. Under this view of financial markets, many researchers have built evolutionary and/or nonlinear models and demonstrated that excess returns can be attained on out-of-sample data.


2.2. Previous Research

Because they can model nonlinear relationships without pre-specifying them during the modeling process, neural networks (NNs) have become a popular method in financial time-series forecasting. NNs also offer great flexibility in model architecture, in terms of the number of hidden nodes and layers. For instance, Pekkaya and Hamzacebi compare the results of a linear regression and a NN model for forecasting macroeconomic variables and show that the NN gives much better results [35].

Many studies have used NNs and shown promising results in the financial markets. Grudnitski and Osburn implemented NNs to forecast S&P500 and gold futures price directions and found that they correctly predicted the direction of monthly price changes 75% and 61% of the time, respectively [15]. Another study showed that a NN-based model leads to higher arbitrage profits than cost-of-carry models. Phua, Ming and Lin implemented a NN on Singapore's stock market index and report a forecasting accuracy of 81% [36]. Similarly, NN models applied to weekly forecasting of Germany's FAZ index yield favorable predictive results compared with conventional statistical approaches [14].

More recently, NNs have been augmented or adapted to improve performance on financial time-series forecasting. Shaoo et al. show that cascaded functional link artificial neural networks (CFLANN) perform best in FX markets [39]. Egrioglu et al. introduce a new method based on feed-forward artificial neural networks to analyze multivariate high-order fuzzy time series forecasting models [12]. Liao and Wang used a stochastic time effective neural network model to show predictive results on global stock indices. Bildirici and Ersin combined NNs with ARCH/GARCH and other volatility-based models to produce a model that outperformed ANNs or GARCH-based models alone. Moreover, Yudong and Lenan combined bacterial chemotaxis optimization (BCO) with a back-propagation NN on the S&P500 index and conclude that their hybrid model (IBCO-BP) offers lower computational complexity, better prediction accuracy and shorter training time.

Another popular machine learning classification technique that does not require any domain knowledge or parameter setting is the decision tree. It also often offers a more visually interpretable model than a NN, as the nodes in the tree can be easily understood. The simplest type of decision tree model is the classification and regression tree (CART). Sorensen et al. show that CART decision trees perform better than single-factor models based on the same variables in picking stock portfolios [42]. Wang and Chan use a two-layer bias decision tree to predict the daily stock prices of Microsoft, Intel and IBM, finding excess returns compared with a buy-and-hold strategy [43]. Another study found that a boosted alternating decision tree with expert weighting generated abnormal returns for the S&P500 index during the test period [11]. To improve accuracy, some studies used the random forest algorithm for classification, which is discussed further in Chapter 4. For example, Booth et al. show that a recency-weighted ensemble of random forests, analyzed on a large sample of stocks from the DAX, produced superior results in terms of both profitability and prediction accuracy compared with other ensemble techniques [7]. Similarly, a gradient boosted random forest model applied to Singapore's stock market was able to generate excess returns compared with a buy-and-hold strategy [37]. Some recent research combines decision tree analysis with evolutionary algorithms to allow the model to adapt to changing market conditions. Hsu et al. present constraint-based evolutionary classification trees (CECT) and show strong predictability of a company's financial performance [16].


Support Vector Machines (SVMs) are also often used to predict market behavior. Huang et al. compare SVM with other classification methods (random walk, linear discriminant analysis, quadratic discriminant analysis and Elman backpropagation neural networks) and find that SVM performs best in forecasting weekly movements of the Nikkei 225 index [17]. Similarly, Kim compares SVM with NN and case-based reasoning (CBR) and finds that SVM outperforms both in forecasting the daily direction of change in the Korea composite stock price index (KOSPI) [23]. Likewise, Yang et al. use a margin-varying Support Vector Regression model and show empirical results with good predictive value for the Hang Seng Index [46]. Nair et al. propose a hybrid system that combines a decision tree and a support vector machine, optimized by a genetic algorithm, validate its performance on the BSE-Sensex and find that its predictive accuracy is better than that of both a NN and a Naive Bayes based model [31].

While some studies have compared various machine learning algorithms against each other, the results have been inconsistent. Patel et al. compare four prediction models (NN, SVM, random forest and naive Bayes) and find that over a ten-year period on various indices, the random forest model performed best. However, Ou and Wang examine the performance of ten machine learning classification techniques on the Hang Seng Index and find that SVM outperformed the other models [33]. Kara et al. compared the performance of NN and SVM on the daily Istanbul Stock Exchange National 100 Index and found that the average performance of the NN model (75.74%) was significantly better than that of the SVM model (71.52%) [22].

Machine learning research has focused on predictive modeling. However, creating an agent that can learn and improve its policy during training in a dynamic environment requires another approach to machine learning, reinforcement learning, in which the agent is built to find the optimal policy and maximize its reward. Yet this is a rather isolated way to think about the trading environment: what if there are other agents in the world? In fact, evidence suggests that other agents do exist in the world alongside our agent. Thus game theory, the mathematics of conflict between participants, is the missing piece needed to complete the model of the market. Eric Engle et al [note] provided the theoretical ideas for combining game theory and machine learning in an agent-based approach to stocks, but did not report implementation results.

Chapter 3: Theoretical review

In the first part of this chapter, we lay out the foundations of game theory. We begin by formalizing the basic definitions needed to speak correctly about games and game-plays, and then present the standard representations of games. This background in game theory is essential for finding rational responses and for reasoning about games in general. The mathematical formalization of game theory in this chapter is inspired by [16]. In the later part of the chapter, we discuss how game theory is applied to create a decision-making agent in the stock market environment, along with the difficulties of the traditional game-theoretic approach and the need for a simulation approach and algorithms.

Game theory framework

Game theory is a branch of applied mathematics that studies strategic decision making. It uses mathematical models to formulate interactions between intelligent, rational decision-makers. These interactions are called games.


Game

Games are played within a game environment (also called a world) and are composed of a system of rules, which defines the players and the actions and postulates the dynamics of the game. A game is called a puzzle if no more than one agent is involved; otherwise it is a conflict [18].

Footnote: The difference between games and game environments is sometimes omitted, although it is useful to distinguish them, especially in the context of general game playing. This problem is explained further in Chapter 4.

Definition 2.1. Player

A player (or an agent) is an entity able to act. His activities alter the world in which he exists.

The concept of a game consists of active and passive elements. Passive elements represent information, i.e. which actions are feasible for a particular agent in a given state, or how the game will evolve under certain conditions and actions taken. The active elements of the game are the players. Without the players, the game remains static; only their actions can change it.

Definition 2.2. Action

An action (or a move) is a change in the game caused by a player in a particular situation.

A valid game environment enables all agents to act and to be immediately aware of their actions. Their activity can change the current situation as a consequence of their decision making. The different situations that can occur before the game terminates are called states of the game.

A game is played within a game environment.


Every game begins in a root state and then progresses according to the game dynamics as the participating agents make their decisions. All rational players select their actions to achieve their goals. Utility theory was established to capture the effects of their behavior and to evaluate the situations in which the agents find themselves. Utility is a value that measures the usefulness of the current state of the game for each player.

Definition 2.3. Utility

Let S be a set with a weak-ordering preference relation ≤. A utility (or outcome) is a cardinal element e ∈ S, representing the motivation of the players. A function u is said to be a utility function iff ∀x, y ∈ S: u(x) ≤ u(y) ⇔ x ≤ y.

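Altogether, a mathematical game is a structure that conclusively defines the whole game and its development.

Definition 2.4. Game

A game is a tuple $G = (P, A, u)$, where:

$P = \{p_1, \dots, p_n\}$ is a set of players;

$A = \{A_1, \dots, A_n\}$ is a set of sets of available actions, $A_i$ being the actions available to player $p_i$; and

u is a utility function $u: A_1 \times \dots \times A_n \to \mathbb{R}^n$, assigning a utility to each player for every combination of actions.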

This general definition of a game expects all players to act simultaneously in just one round, after which the game ends. More generally, however, a game is guaranteed to end in finite time only in so-called finite games, which terminate at some point and assign the utilities. All finite games have starting and terminal states. In these games the number of players is finite, as is the number of permitted actions for each player. An agent can face only finitely many situations in a finite game, and the game-play cannot go on indefinitely [19].


Agents’ strategies

When there is more than one agent in the environment, the whole game changes in accordance with the activity of all players. In this setting, the outcome depends not only on the actions of one particular agent but on the behavior of all of them. Strategies can be seen as contingency plans, or policies, for playing the game: in every situation, an agent's reaction is defined by his strategy.

A strategy that deterministically prescribes a single action in each situation is called pure. This approach is certainly rational enough in puzzles, where only one agent sets the course of the world. In contrast, in environments with a greater number of players it is often preferable to randomize over the set of pure strategies, following a selected probability distribution. Rather than as a strategy, such randomized decisions can sometimes be seen as an agent's belief that he can profit from playing a given action. This kind of strategy is called mixed. Playing a mixed strategy ensures that the other agents can only guess what will happen; compared with pure strategies, the outcome is now less predictable.

Optimal strategy

The whole of game theory was originally established to answer a simple question: what is an optimal reaction? How should an agent act to be most likely to win the game? The answer is that a fundamental advantage for a player is information about the strategies of his opponents. In other words, once an agent is able to guess the next action of every other agent, he can deliberately follow a strategy that maximizes his terminal utility. In conclusion, the set of all optimal strategies of a rational, well-informed agent $p_i$ (that is, the strategies that all attain the same, highest expected utility) is entirely determined by the strategies of the others.

Definition: Best response

Agent $p_i$'s strategy $s_i$ in a game G is a best response to the strategies $s_{-i}$ of the other players iff

$$u_i(s_i, s_{-i}) \ge u_i(s_i', s_{-i}) \quad \text{for every strategy } s_i' \text{ of } p_i,$$

where $u_i$ is the utility of player $p_i$.

Unfortunately, in most cases information about the opponents' strategies is out of reach, or obtaining it is infeasible in terms of computational complexity. Another possibility is to estimate the opponents' strategies, e.g. from the previous actions of the other players, and then adjust one's own strategy accordingly.
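The estimation idea above can be sketched for a two-player normal-form game: estimate the opponent's mixed strategy from the empirical frequencies of his past actions (the idea behind fictitious play), then choose the action with the highest expected utility against that estimate. The matching-pennies payoffs (here the row player wins on a match), the action history and the function names are illustrative assumptions.

```python
from collections import Counter

# Row player's utility u_1(own action, opponent action) in matching pennies.
U1 = {
    ("H", "H"): 1, ("H", "T"): -1,
    ("T", "H"): -1, ("T", "T"): 1,
}
ACTIONS = ("H", "T")


def estimate_mixed_strategy(history):
    """Estimate the opponent's mixed strategy from his (non-empty) action history."""
    counts = Counter(history)
    n = len(history)
    return {a: counts[a] / n for a in ACTIONS}


def best_response(utility, opponent_strategy):
    """Pure action maximizing expected utility against a mixed strategy."""
    def expected(own):
        return sum(p * utility[(own, other)] for other, p in opponent_strategy.items())
    return max(ACTIONS, key=expected)


observed = ["H", "H", "T", "H"]  # opponent played Head 3 times out of 4
sigma_2 = estimate_mixed_strategy(observed)
print(sigma_2)                     # {'H': 0.75, 'T': 0.25}
print(best_response(U1, sigma_2))  # H
```

Against the estimate, Head earns an expected 0.5 and Tail an expected -0.5, so Head is the best response; as the observed history changes, the estimate and the response are adjusted.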

Definition 2.5. Nash equilibrium (NE)

Given a game $G = (P, A, u)$ and strategies $s = (s_1, \dots, s_n)$, the players P are in Nash equilibrium iff every strategy is a best response to the others:

$$\forall p_i \in P,\ \forall s_i':\quad u_i(s_i, s_{-i}) \ge u_i(s_i', s_{-i})$$

If the state of the world allows no one to benefit from unilaterally changing his strategy, the situation remains stable. It has been proved that every game with finitely many players and finite sets of pure strategies has at least one Nash equilibrium profile, although it might consist of mixed strategies [22].

Game representations

There are various representations of games. The simplest one was presented at the beginning of this part. Although the general definition is sufficient for the mathematical apparatus, for concrete games it is more convenient to establish standard forms and structures for working with the game data. Different representations extend the general definition, allowing various games to express their specific aspects in a more suitable form. Algorithms for finding Nash equilibria can be adapted to a particular representation to reduce computational complexity. Several representations of games exist, taking into account stochasticity, the number of players and decision points, the possibility of cooperation and other important characteristics of the game.

Normal form

Normal (or strategic) form is the basic type of game representation. Each player moves once and actions are chosen simultaneously. This makes the model simpler than other forms and easier to solve for Nash equilibria, but it cannot express the order or timing of moves.

The most famous example of a normal-form game is the Prisoner's Dilemma, which is described as follows:

Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of communicating with the other. The prosecutors lack sufficient evidence to convict the pair on the principal charge, so they hope to get both sentenced to a year in prison on a lesser charge. Simultaneously, the prosecutors offer each prisoner a bargain: each prisoner is given the opportunity either to betray the other by testifying that the other committed the crime, or to cooperate with the other by remaining silent. The offer is:

• If X and Y each betray the other, each of them serves 5 years in prison.

• If X betrays Y but Y remains silent, X is set free and Y serves 20 years in prison (and vice versa).

• If X and Y both remain silent, each of them serves only 1 year in prison (on the lesser charge).

An example of the Prisoner's Dilemma game

From this example we observe that mutual betrayal (both confess) is the Nash equilibrium of this game, because neither player has an incentive to change his choice unilaterally.
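This equilibrium claim can be checked mechanically by enumerating all pure-strategy profiles and keeping those from which neither player gains by deviating alone. The utilities below are the negated prison terms from the offer above; the function name `pure_nash_equilibria` is hypothetical.

```python
from itertools import product

ACTIONS = ("silent", "betray")

# (utility for X, utility for Y) = negated years in prison, so higher is better.
UTILITY = {
    ("silent", "silent"): (-1, -1),
    ("silent", "betray"): (-20, 0),
    ("betray", "silent"): (0, -20),
    ("betray", "betray"): (-5, -5),
}


def pure_nash_equilibria(utility, actions):
    """Return every profile (a_x, a_y) from which no player gains by deviating alone."""
    equilibria = []
    for a_x, a_y in product(actions, repeat=2):
        u_x, u_y = utility[(a_x, a_y)]
        x_can_improve = any(utility[(d, a_y)][0] > u_x for d in actions)
        y_can_improve = any(utility[(a_x, d)][1] > u_y for d in actions)
        if not (x_can_improve or y_can_improve):
            equilibria.append((a_x, a_y))
    return equilibria


print(pure_nash_equilibria(UTILITY, ACTIONS))  # [('betray', 'betray')]
```

Mutual silence is not an equilibrium because either prisoner could walk free by betraying; the same enumeration applied to simultaneous matching pennies returns no pure equilibrium at all.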

Extensive form

Extensive form models multi-agent sequential decision making. A convenient representation of an extensive-form game is a game tree. Such a structure can express even complicated branching of the game, restricting the actions in different game states to the feasible ones only.

Definition 2.6. Game tree

Every game tree is a tuple $T = (S, Z, A, e, f, r)$, where:

S is a set of game states;

Z is a subset of S of terminal states;

A is a set of game actions;

e is an expander function, e: s ∈ S → {a ∈ A | a is executable in s};

f is a successor function, f: (s ∈ S × a ∈ e(s)) → t ∈ S; and

r ∈ S is a root state.

Using the notion of a game tree, it is now possible to define an extensive-form game. This representation consists of a game tree with a set of players, who are assigned to the states of the tree, and a utility function, which determines the utility in every terminal state, i.e. in every leaf of the game tree.

Definition 2.7. Extensive-form games

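A game in extensive form is a tuple $G = (P, T, b, u)$, where:

P is a set of players;

T is a game tree;

b is a player belonging function $b: S \setminus Z \to P$, which assigns to each non-terminal state the player who acts in it; and

u is a utility function $u: Z \to \mathbb{R}^{|P|}$, which determines the utility of every player in each terminal state.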

Extensive form was originally designed for sequential games, in which players take their actions one by one. In these games, game trees provide a suitable way to visualize the game-play. This representation is also more complex than the normal form.

In the extensive-form version of matching pennies, the second player can always make her choice depend on the first player's choice: if the first player selects Head, she selects Tail, and if the first player selects Tail, she selects Head. Paired with either of the first player's two pure strategies, this strategy of the second player gives a Nash equilibrium in pure strategies.

An example of extensive-form game – Matching pennies.
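The sequential reasoning above can be sketched with backward induction over a game tree written in the terms of Definitions 2.6 and 2.7: an expander e, a successor f, a belonging function b and utilities on terminal states. Following the text, the second player wins when the coins differ. The string encoding of states, the function names and the tie-breaking rule (the first listed action is kept on ties) are assumptions of this sketch.

```python
ROOT = ""          # the root state r: no coin shown yet
ACTIONS = ("H", "T")


def expander(state):
    """e(s): actions executable in s; terminal states (two moves made) have none."""
    return ACTIONS if len(state) < 2 else ()


def successor(state, action):
    """f(s, a): the state reached by appending the action to the history."""
    return state + action


def belongs_to(state):
    """b(s): player 0 moves at the root, player 1 moves after seeing that move."""
    return len(state)


def utility(state):
    """u(z): the second player wins (+1) when the two coins differ."""
    return (-1, 1) if state[0] != state[1] else (1, -1)


def backward_induction(state, plan):
    """Return the utility vector reached from `state` under rational play,
    recording each mover's chosen action in `plan` (ties keep the first action)."""
    actions = expander(state)
    if not actions:
        return utility(state)
    player = belongs_to(state)
    best_action, best_value = None, None
    for a in actions:
        value = backward_induction(successor(state, a), plan)
        if best_value is None or value[player] > best_value[player]:
            best_action, best_value = a, value
    plan[state] = best_action
    return best_value


plan = {}
value = backward_induction(ROOT, plan)
print(value)  # (-1, 1): the second player always wins
print(plan)   # {'H': 'T', 'T': 'H', '': 'H'}
```

The first player is indifferent between Head and Tail because both lead to a loss, which is why either of his pure strategies, paired with the second player's mismatching strategy, forms an equilibrium.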


Stochastic games (Markov Games)

Arguably, most (if not all) real-world systems are influenced by events of a probabilistic nature. Shapley (1953) was the first to define a game model that incorporates probabilistic choices.

Definition 2.8. Stochastic games

According to Shapley, a stochastic game is a tuple $(S, A, T, R, \gamma, N)$, where:

S is the set of states of the game;

$A_i$ is the set of actions available to player i, and $A = A_1 \times \dots \times A_N$ is the set of joint actions of all players;

T is the transition function $T: S \times A \times S \to [0, 1]$: if, in state s, player i chooses action $a_i$ and the other players simultaneously choose their actions, forming the joint action $a \in A$, then $T(s, a, s')$ is the probability of reaching the next state s';

R is the reward function, where $R_i(s, a)$ is the reward to player i for taking the chosen actions in state s;

γ ∈ [0, 1) is the discount factor; and

N is the number of players.

Shapley games are played by a finite number of players on a finite state space. In each state, each player chooses one of finitely many actions, and the resulting profile of actions determines a reward for each player and a probability distribution over successor states.

In principle, a stochastic game proceeds ad infinitum. The payoff that each player receives is given by a function of the infinite stream of rewards for that player. Shapley considered games in which the payoffs are discounted sums of rewards; other popular payoff functions, discussed by Filar & Vrieze (1997), are the limit average of the rewards and the total sum of the rewards.
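The payoff criteria just listed can be compared on a finite prefix of a reward stream. An infinite stream cannot be summed on a computer, so this sketch only approximates Shapley's discounted payoff $\sum_{t=0}^{\infty} \gamma^t r_t$, the running average (whose limit is the limit-average payoff) and the total sum. The reward values, γ = 0.9 and the function names are illustrative assumptions.

```python
from itertools import cycle, islice


def discounted_sum(rewards, gamma):
    """Shapley's criterion, truncated to the given prefix: sum of gamma**t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def average_reward(rewards):
    """Finite-horizon average; its limit as the horizon grows is the limit average."""
    rewards = list(rewards)
    return sum(rewards) / len(rewards)


def total_reward(rewards):
    """Total sum of rewards (meaningful when the stream is finite or summable)."""
    return sum(rewards)


# A periodic reward stream 1, 0, 1, 0, ... truncated to 1000 steps.
stream = list(islice(cycle([1.0, 0.0]), 1000))
print(round(discounted_sum(stream, 0.9), 4))  # about 1 / (1 - 0.81) = 5.2632
print(average_reward(stream))                 # 0.5
print(total_reward(stream))                   # 500.0
```

The total sum keeps growing with the horizon, which is why discounting or averaging is needed to obtain a finite payoff in a game that never ends.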

A pure strategy in a stochastic game assigns an action to each possible sequence of states visited so far, whereas a randomized strategy assigns a probability distribution over actions to each such sequence. Hence, every player has infinitely many pure strategies at his command, and Nash's equilibrium theorem, which assumes finite sets of pure strategies, is not applicable. Nevertheless, in the case of discounted payoffs, there always exists a Nash equilibrium in randomized strategies. There is even a Nash equilibrium in which the strategies depend only on the current state and not on the full history of visited states; we call such strategies stationary. For general-sum games with other payoff functions, such as the limit average, this existence result does not apply, and a Nash equilibrium is not guaranteed in general.

The question, then, is how the stochastic game can be applied in our research to create an agent that makes decisions without human supervision. In principle, the stock market is a stochastic game between our agent and other self-interested agents, who may cooperate or compete with one another to maximize their rewards. In practice, however, it is impossible to know all the information about the other agents' decisions and states. Therefore, in the context of this thesis, we describe the stock market as a two-player stochastic game in which the reactions of all other agents to our agent's actions are reflected through market movement (nature). At first sight, it might seem easy to apply the stochastic game directly to the stock market: our agent chooses an action based on the current state, estimates the possible next states and rewards, and then chooses the best response in the current state. However, because of the complex nature of the market, it is impossible to determine in advance all states, the possible next states, and the rewards from taking each action. Fortunately, another line of research holds the key to this problem: simulation, together with the computer-science approach of machine learning.
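The one-step reasoning described above can be sketched as follows, under the strong assumption that nature's move distribution and the rewards are known. The payoff table (a 100-share position gaining or losing 100 when the price moves by one unit) and the probabilities in the usage are hypothetical; the point is that `best_response` needs `nature_probs`, which is exactly the information that is unavailable in a real market.

```python
from typing import Dict

# Hypothetical one-step payoff table: agent action -> nature's move -> reward.
PAYOFF: Dict[str, Dict[str, float]] = {
    "None": {"up": 0.0, "down": 0.0},
    "Long": {"up": 100.0, "down": -100.0},
    "Short": {"up": -100.0, "down": 100.0},
}


def expected_reward(action: str, nature_probs: Dict[str, float],
                    payoff: Dict[str, Dict[str, float]]) -> float:
    """Expected reward of an action against nature's move distribution."""
    return sum(p * payoff[action][move] for move, p in nature_probs.items())


def best_response(nature_probs: Dict[str, float],
                  payoff: Dict[str, Dict[str, float]]) -> str:
    """Action with the highest expected reward (ties go to the first listed)."""
    return max(payoff, key=lambda a: expected_reward(a, nature_probs, payoff))
```

With `{"up": 0.6, "down": 0.4}` the best response is `"Long"` (expected reward 20); with `{"up": 0.3, "down": 0.7}` it is `"Short"`.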


In the following parts, we introduce some key concepts of simulation and machine learning to show how they can address the limitations of the traditional stochastic game approach.

Simulation


Simulation methods imitate the operation of real-world systems. A simulation first requires developing a model that represents the key characteristics, behaviors and functions of the selected system or process. The model represents the system itself, whereas the simulation represents the operation of the system over time.
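As a minimal illustration of the model/simulation distinction, the sketch below takes geometric Brownian motion (GBM) as an assumed model of a stock price, with drift `mu` and volatility `sigma`, and simulates it by stepping that model forward in time. GBM is chosen only for illustration; it is not the model used in this thesis.

```python
import math
import random
from typing import List, Optional


def simulate_gbm(s0: float, mu: float, sigma: float, dt: float,
                 n_steps: int, seed: Optional[int] = None) -> List[float]:
    """Simulate one price path of geometric Brownian motion.

    The model is the GBM equation with parameters (mu, sigma);
    the simulation is the loop that runs the model over time.
    """
    rng = random.Random(seed)
    prices = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # standard normal shock for this step
        # Exact log-normal update:
        # S_{t+dt} = S_t * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z)
        growth = math.exp((mu - 0.5 * sigma ** 2) * dt
                          + sigma * math.sqrt(dt) * z)
        prices.append(prices[-1] * growth)
    return prices
```

For example, `simulate_gbm(100.0, 0.05, 0.2, 1 / 252, 252, seed=42)` produces one year of daily prices; changing `mu` or `sigma` and re-running is the "changing variables and predicting behavior" described next.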

These methods are widely used in economics, biology, engineering and almost all other sciences. Simulation is usually carried out on computers by changing variables and predicting the resulting behavior of the system. Examples of useful computer simulations include automobile traffic, grocery store checkout lines, inventory management, stock price prediction, and the environmental consequences of policies.

Key issues in simulation include acquiring valid source information about the system, selecting its relevant key characteristics and behaviors, using simplifying approximations and assumptions within the simulation, and ensuring the fidelity and validity of the simulation outcomes. Procedures and protocols for model verification and validation remain an ongoing field of academic study, refinement, research and development in simulation technology and practice, particularly in computer simulation.


The simulation procedure.

Algorithms

Machine learning

Machine learning is a field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) from data, without being explicitly programmed [Samuel, Arthur (1959). "Some Studies in Machine Learning Using the Game of Checkers". IBM Journal of Research and Development.]

Analysts like to describe the models they build in terms of the problems they solve. A model is a process that takes in observations and produces predictions. Many such models have been built, for example the famous Black-Scholes model for pricing options. These models are derived from explicit mathematical formulas.
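To make the contrast concrete, the closed-form Black-Scholes price of a European call and put on a non-dividend-paying stock is sketched below. Every input (spot `s`, strike `k`, risk-free rate `r`, volatility `sigma`, maturity `t` in years) enters through an explicit formula rather than being learned from data. The function names are illustrative, and the standard normal CDF is built from `math.erf`.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1_d2(s: float, k: float, r: float, sigma: float, t: float):
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    return d1, d1 - sigma * math.sqrt(t)


def black_scholes_call(s: float, k: float, r: float, sigma: float, t: float) -> float:
    """European call price: S N(d1) - K e^{-rT} N(d2)."""
    d1, d2 = _d1_d2(s, k, r, sigma, t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def black_scholes_put(s: float, k: float, r: float, sigma: float, t: float) -> float:
    """European put price: K e^{-rT} N(-d2) - S N(-d1)."""
    d1, d2 = _d1_d2(s, k, r, sigma, t)
    return k * math.exp(-r * t) * norm_cdf(-d2) - s * norm_cdf(-d1)
```

For `s = k = 100`, `r = 0.05`, `sigma = 0.2` and `t = 1`, the call is worth about 10.45, and the call and put prices satisfy put-call parity, C - P = S - K e^{-rT}.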


However, to build an agent that can learn and adapt to its environment, we need a simulation approach in the form of machine learning. With machine learning, we do not derive the model by hand from direct observations, as in traditional modeling; instead, we learn it from data. The machine learning process takes historical data and runs it through a machine learning algorithm to generate the model. The model is built not by a human but by the machine itself. When we need to use the model, we simply provide some input, and the output is produced automatically.
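A minimal sketch of this train-then-predict process, using ordinary least-squares linear regression as an assumed stand-in for "a machine learning algorithm": `fit_linear` generates the model (an intercept and a slope) from historical input/output pairs, and `predict` maps a new input to an output. The function names and the data in the usage line are made up for illustration.

```python
from typing import Sequence, Tuple

Model = Tuple[float, float]  # (intercept, slope)


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """'Training': estimate intercept and slope by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("inputs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model: Model, x: float) -> float:
    """'Use': feed a new input through the learned model."""
    intercept, slope = model
    return intercept + slope * x
```

For example, `fit_linear([1, 2, 3, 4], [3, 5, 7, 9])` learns intercept 1 and slope 2, so `predict` returns 11 for input 5; no formula for the relationship was written by hand.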

Application to stock data

Applying machine learning to stock data is quite straightforward; the following figure describes how it works with historical stock data. The historical data represent the values of the features of a particular stock over a time horizon, and we represent these features by stacking them one behind the other. We then use machine learning algorithms to train our agent on these features and the historical price.


[Figure: historical data of a stock over a time horizon; features (x): P/E, Bollinger band, moving average; target (y): historical price]

An example of a machine learning algorithm applied to stock data.
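The sketch below computes two of the features named in the figure, a simple moving average and Bollinger bands (the rolling mean plus or minus `k` population standard deviations; a 20-day window and `k = 2` are common defaults, assumed here), and stacks them into rows paired with a target price. P/E is omitted because it requires earnings data, and using the next day's price as the target `y` is an assumption for illustration.

```python
import statistics
from typing import List, Optional, Tuple

Band = Tuple[float, float, float]  # (lower, middle, upper)


def moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until a full window is available."""
    return [None if i + 1 < window else sum(prices[i + 1 - window:i + 1]) / window
            for i in range(len(prices))]


def bollinger_bands(prices: List[float], window: int = 20,
                    k: float = 2.0) -> List[Optional[Band]]:
    """Rolling mean +/- k population standard deviations."""
    bands: List[Optional[Band]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            bands.append(None)
            continue
        w = prices[i + 1 - window:i + 1]
        mid = sum(w) / window
        sd = statistics.pstdev(w)
        bands.append((mid - k * sd, mid, mid + k * sd))
    return bands


def stack_features(prices: List[float], window: int):
    """Rows (x_t, y_t): x_t = (MA, lower band, upper band), y_t = next price."""
    ma = moving_average(prices, window)
    bb = bollinger_bands(prices, window)
    rows = []
    for t in range(len(prices) - 1):
        if ma[t] is None:
            continue  # not enough history yet
        lower, _, upper = bb[t]
        rows.append(((ma[t], lower, upper), prices[t + 1]))
    return rows
```

The resulting rows are the stacked (x, y) pairs that a learning algorithm, such as the least-squares fit sketched earlier in this document or a more powerful model, would be trained on.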

Reinforcement learning

A simple machine learning model is good at producing predictions by recognizing patterns in the input market data; however, to create an agent that can determine the best response to a specific pattern, we use another branch of machine learning: Reinforcement Learning (RL).

As mentioned above, the trading agent can conveniently be modeled in the framework of reinforcement learning. This framework adjusts the parameters of an agent to maximize the expected payoff, or reward, generated by its actions. The agent therefore learns a policy that tells it which actions to perform to achieve its best performance. This optimal policy is exactly what we hope to find when building an automated trading strategy.
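As a minimal sketch of how an agent can learn such a policy from rewards alone, the block below uses tabular Q-learning, a standard RL algorithm that the text does not name and that is swapped in here purely for illustration. The environment `toy_market_step` is entirely hypothetical: the state is the direction of the last price move, the next move repeats it with probability 0.9, and the reward is the position (+1 long, -1 short, 0 none) times the next move. The step size `alpha`, discount `gamma` and exploration rate `epsilon` are assumed values.

```python
import random
from typing import Dict, Tuple

ACTIONS = ("None", "Long", "Short")
STATES = ("up", "down")
POSITION = {"None": 0.0, "Long": 1.0, "Short": -1.0}
QTable = Dict[Tuple[str, str], float]


def toy_market_step(state: str, action: str, rng: random.Random,
                    persistence: float = 0.9) -> Tuple[float, str]:
    """Hypothetical market: the last move repeats with prob. `persistence`."""
    other = "down" if state == "up" else "up"
    next_state = state if rng.random() < persistence else other
    move = 1.0 if next_state == "up" else -1.0
    return POSITION[action] * move, next_state


def greedy(q: QTable, state: str) -> str:
    """Action with the highest Q-value in `state` (ties: first listed)."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_learning(n_steps: int = 20000, alpha: float = 0.1, gamma: float = 0.5,
               epsilon: float = 0.1, seed: int = 0) -> QTable:
    rng = random.Random(seed)
    q: QTable = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = "up"
    for _ in range(n_steps):
        # Epsilon-greedy exploration.
        action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, state)
        reward, next_state = toy_market_step(state, action, rng)
        # Q-learning update toward r + gamma * max_a' Q(s', a').
        target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q: QTable) -> Dict[str, str]:
    """The learned policy: greedy action in every state."""
    return {s: greedy(q, s) for s in STATES}
```

Because the next state here does not depend on the action, the learned policy simply follows the trend (Long after an up move, Short after a down move); a real market environment is far noisier and would need a much richer state.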

To solve the stochastic game faced by our agent, we use Markov decision processes (MDPs), the most common model when implementing reinforcement learning. An MDP can be viewed as a special case of a stochastic game with a single decision maker. The MDP model of the environment consists, among other things, of a discrete set of states $S$ and a discrete set of actions $A$. In this project, we only specify the action set of our agent, because we assume that the actions of other agents are reflected in the price movement of the stock. Depending on the position of the learner (long or short), at each time step $t$ it is allowed to choose an action $a_t$ from a different subset of the action space $A$, which consists of three possible actions:

$$A = \{\text{None}, \text{Long}, \text{Short}\}$$

Where:

None indicates that the agent should not hold any order in the market.

Long and Short mean that the agent should execute a market order to buy or sell, respectively, 100 shares (the order size is always fixed at one hundred shares).

So, at each discrete time step $t$, the agent senses the current state $s_t$ and chooses to take an action $a_t$. The environment responds by providing the agent a reward $r_t = r(s_t, a_t)$ and by producing the succeeding state $s_{t+1} = \delta(s_t, a_t)$. The functions $r$ and $\delta$ depend only on the current state and action (the process is memoryless); they are part of the environment and are not necessarily known to the agent.
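A minimal sketch of $r$ and $\delta$ for a replayed price series, assuming the state is encoded as `(t, shares_held)` and that the prices are a fixed historical record, which makes both functions deterministic. The 100-share order size comes from the text; the rule in `allowed_actions` (no adding to an existing long or short position) is a hypothetical example of the position-dependent action subsets mentioned above, not the thesis's specification.

```python
from typing import Dict, List, Set, Tuple

ORDER_SIZE = 100  # every order is for 100 shares
SHARE_CHANGE: Dict[str, int] = {"None": 0, "Long": ORDER_SIZE, "Short": -ORDER_SIZE}
State = Tuple[int, int]  # assumed encoding: (time index t, shares held)


def allowed_actions(shares: int) -> Set[str]:
    """Hypothetical subset rule: a long may only hold or close, likewise a short."""
    if shares > 0:
        return {"None", "Short"}
    if shares < 0:
        return {"None", "Long"}
    return {"None", "Long", "Short"}


def delta(state: State, action: str) -> State:
    """Transition function: execute the order, then advance one time step."""
    t, shares = state
    if action not in allowed_actions(shares):
        raise ValueError(f"action {action!r} not allowed with {shares} shares")
    return t + 1, shares + SHARE_CHANGE[action]


def reward(state: State, action: str, prices: List[float]) -> float:
    """Reward: change in value of the post-order position from t to t + 1."""
    t, _ = state
    _, shares_after = delta(state, action)
    return shares_after * (prices[t + 1] - prices[t])
```

With prices `[10.0, 11.0, 10.5]`, going Long from a flat state earns 100, and then holding (None) loses 50 as the price falls by 0.5.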

The task of the agent is to learn a policy $\pi: S \to A$ that maps each state to an action, selecting its next action $a_t$ based solely on the current observed state $s_t$, that is, $\pi(s_t) = a_t$. The optimal policy, or control
