Game Theory Part 1
Game theory is a branch of mathematics that deals with decision-making in situations where two or more players have competing interests. It is often used in economics and biology, but it can also be applied to poker. To explain the concept we will start off gently, which means that this article won't include a lot of poker. We will start applying the concept to the game in Part 2, once we have the basics covered. The best-known example of game theory is the prisoner's dilemma. Many of you will already know this example, but for those of you who don't, I'll explain it once more.
Somewhere a felony is committed and the police arrest two men. The police are sure that these two men committed the crime, but they don't have the evidence to prove it. Then one of the officers has a great idea. The police move the prisoners to separate rooms and give both of them the same proposition: each can snitch on his partner and spend less time in jail, or he can remain silent. If neither prisoner says anything, the police have no evidence and both will go to jail for 1 year for gun possession. If one of them snitches and the other stays silent, the snitch goes free and his partner goes to jail for 10 years. If both prisoners snitch, both go to jail for 8 years (not 10, because both of them helped with the investigation).
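We can present this information in a table as follows (years in jail, written as P1, P2):

                   P2 silent     P2 snitches
    P1 silent        1, 1          10, 0
    P1 snitches      0, 10         8, 8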
This table is called a payoff matrix, and in it we see the possible outcomes for the two players in the game. The first number is always the outcome for prisoner 1 (P1), and the number after the comma is the outcome for prisoner 2 (P2). If both are silent, both go to jail for 1 year. If one snitches and the other is silent, the silent one goes to jail for 10 years and the snitch goes free. If both of them snitch, both go to jail for 8 years. The outcome favoured by the police is that both of them snitch on each other, as the streets would then be safe from both of these criminals. And the police are lucky, because the way they have set up this situation, two prisoners who each look after their own interest will both snitch. Why does this happen?
Well, let's imagine you are P1. You are sitting in your cell thinking about what to do, and you aren't sure what P2 is going to do, as you have no way of communicating with him. So you think through everything that could happen. Let's say P2 snitches. In that case it is better for you to snitch as well, as you would spend 8 years in jail instead of 10. Let's say P2 doesn't snitch. In this case it is also better for you to snitch, as you would go free instead of spending 1 year in jail. So it doesn't matter what P2 does: you will always be better off if you snitch. For P2 the situation is exactly the same, and he will also always be better off snitching. You might think: "Maybe I should just say nothing and hope that P2 does the same, in which case we would both be out after 1 year." But if P2 then decides to snitch, you would be in jail for 10 years! Do you want to take that risk? Usually you wouldn't, so at the end of this "game" both prisoners end up going to jail for 8 years.
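The reasoning above can be checked mechanically. The following is only a small sketch: the names YEARS, ACTIONS and best_reply_for_p1 are made up for this illustration, and the numbers are the jail terms from the story.

```python
# Years in jail as (P1, P2) for each pair of choices; lower is better.
# The values come from the story; the names are hypothetical.
YEARS = {
    ("silent", "silent"): (1, 1),
    ("silent", "snitch"): (10, 0),
    ("snitch", "silent"): (0, 10),
    ("snitch", "snitch"): (8, 8),
}
ACTIONS = ("silent", "snitch")


def best_reply_for_p1(p2_action):
    """Return the P1 action that minimises P1's jail time against a fixed P2 action."""
    return min(ACTIONS, key=lambda p1_action: YEARS[(p1_action, p2_action)][0])


for p2_action in ACTIONS:
    print(f"If P2 plays {p2_action}, P1 does best with {best_reply_for_p1(p2_action)}")
```

Whatever P2 does, the best reply that comes back is to snitch, which is exactly why snitching is called a dominant strategy here.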
This situation is called a Nash equilibrium, named after the great mathematician John Forbes Nash. You can find out more about him in the film A Beautiful Mind. In a Nash equilibrium, no player can profit by changing only his or her own strategy. This is exactly what the two prisoners are facing here. Imagine we are in the Nash equilibrium: both prisoners snitch and go to jail for 8 years. P1 can change his strategy and stay silent, but then he has to go to jail for 10 years. He doesn't gain anything. The same is true for P2. Neither player can profit here from changing his strategy.
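To make the definition concrete, here is a rough sketch that tests every cell of the prisoner's dilemma for the "nobody gains by switching alone" property. The helper is_nash and the other names are assumptions for this example, not standard terminology.

```python
from itertools import product

# Years in jail as (P1, P2); lower is better. Values taken from the story.
YEARS = {
    ("silent", "silent"): (1, 1),
    ("silent", "snitch"): (10, 0),
    ("snitch", "silent"): (0, 10),
    ("snitch", "snitch"): (8, 8),
}
ACTIONS = ("silent", "snitch")


def is_nash(p1_action, p2_action):
    """True if neither prisoner can shorten his own sentence by deviating alone."""
    p1_years, p2_years = YEARS[(p1_action, p2_action)]
    # P1 tries every alternative while P2's choice stays fixed, and vice versa.
    p1_stays = all(YEARS[(alt, p2_action)][0] >= p1_years for alt in ACTIONS)
    p2_stays = all(YEARS[(p1_action, alt)][1] >= p2_years for alt in ACTIONS)
    return p1_stays and p2_stays


equilibria = [cell for cell in product(ACTIONS, repeat=2) if is_nash(*cell)]
print(equilibria)
```

The only cell that survives the check is the one where both prisoners snitch, even though both would prefer the cell where both stay silent.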
There are countless other examples of this. Take a war between two countries. A country can choose to go to war or not to go to war. The other country has the same options. If both decide not to go to war, they each keep their own land. If country A goes to war but country B doesn't, then A gains extra land and B suffers losses. If country B goes to war and A doesn't, then B gains land and A loses. If both go to war, both suffer some losses. Again we can present this situation in a payoff matrix. If a country keeps its own land, it gets a score of 0. Extra land is a score of 10 and losses are a score of -10. Some loss gets a score of -5. Writing country A's score first and country B's second:

                 B peace       B war
    A peace       0, 0        -10, 10
    A war        10, -10      -5, -5
Again you can see that going to war is best for each country, no matter what the other country's strategy is going to be. Imagine you are country A. If country B attacks, you can decide to do nothing (-10) or counterattack (-5). Fighting back is definitely the best option here, as it reduces your losses. If country B doesn't attack, then country A can choose peace (0) or war (10). Again, the best option for country A is to go to war. Country B is in exactly the same situation. This is why the countries will always end up going to war, and war for both is the Nash equilibrium of this game.
NATO is obviously trying to keep the world a peaceful place. As a solution to the problem in the example above, NATO could say to both countries: "If you attack the other country we will bomb you back to the Stone Age". If NATO decides to bomb a country, this country will suffer heavy losses. This would alter the payoff matrix in the following way:

                 B peace       B war
    A peace       0, 0        -10, -10
    A war       -10, -10      -15, -15
Now the situation changes for both countries. Let's again imagine we are country A. If country B attacks, we can choose peace (-10) or war (-15). Now peace is the better choice. If country B chooses peace, we can choose peace (0) or war (-10). Again peace is the better option. In this new example the Nash equilibrium is peace for both countries. The involvement of NATO has successfully altered the payoff matrix for both countries, resulting in peace.
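The effect of the NATO threat can be sketched in a few lines of code. This is only an illustration: the names are invented, country A's scores are the ones quoted in the text, and country B's scores are assumed to mirror A's, since the text says B is in exactly the same situation.

```python
ACTIONS = ("peace", "war")

# Country A's score for (A's action, B's action), as quoted in the text.
A_SCORE_BEFORE = {("peace", "peace"): 0, ("peace", "war"): -10,
                  ("war", "peace"): 10, ("war", "war"): -5}
A_SCORE_AFTER = {("peace", "peace"): 0, ("peace", "war"): -10,
                 ("war", "peace"): -10, ("war", "war"): -15}


def pure_equilibria(a_score):
    """List the cells where neither country gains by switching alone.

    Assumption: the game is symmetric, so B's score in cell (a, b)
    equals A's score in the mirrored cell (b, a).
    """
    found = []
    for a in ACTIONS:
        for b in ACTIONS:
            a_stays = all(a_score[(alt, b)] <= a_score[(a, b)] for alt in ACTIONS)
            b_stays = all(a_score[(alt, a)] <= a_score[(b, a)] for alt in ACTIONS)
            if a_stays and b_stays:
                found.append((a, b))
    return found


print("without NATO:", pure_equilibria(A_SCORE_BEFORE))
print("with NATO:   ", pure_equilibria(A_SCORE_AFTER))
```

Nothing about the countries' reasoning has changed between the two calls. Only the numbers in the matrix differ, and that alone moves the equilibrium from war to peace.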
Let's now look at an example that has a little bit more to do with poker. The game is called Odds and Evens and is played by two opponents who have to make a decision at the same time. Each player gets a coin and decides whether or not to keep it in his hand. At a certain moment both players are asked to open their hands. If the total number of coins is 0 or 2, player A wins; if it is 1, player B wins. The winner gets a score of +1 and the loser gets a score of -1. In a payoff matrix, with player A's score first and player B's second, this looks as follows:

                   B: 0 coins    B: 1 coin
    A: 0 coins      +1, -1        -1, +1
    A: 1 coin       -1, +1        +1, -1
This game is called a constant-sum game, because the two scores in every cell of the table always add up to the same constant, in this case 0. In this example it is obvious that player A needs to try to always do the same thing as player B, as then either 0 or 2 coins will appear and A wins. Player B has to try to always do exactly the opposite of what player A does, as then exactly 1 coin will appear and player B wins.
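As a quick sanity check, the rules of Odds and Evens can be written down directly and the constant-sum property confirmed for every cell. The function name scores is just a label chosen for this sketch.

```python
def scores(a_coins, b_coins):
    """Return (score_A, score_B) given how many coins (0 or 1) each player shows."""
    total = a_coins + b_coins
    # A wins on an even total (0 or 2), B wins on an odd total (1).
    return (1, -1) if total in (0, 2) else (-1, 1)


for a_coins in (0, 1):
    for b_coins in (0, 1):
        score_a, score_b = scores(a_coins, b_coins)
        print(f"A shows {a_coins}, B shows {b_coins}: "
              f"A {score_a:+d}, B {score_b:+d}, sum {score_a + score_b}")
```

Every line ends with a sum of 0: whatever one player wins, the other loses.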
The two players can try and look for patterns in their opponent's game and react accordingly. In this case, the player who is best at discovering these patterns will win the game. But there is also another option. Let's say you're player B and you think of yourself as a worse player than player A. What can you do about this?
Let's say you decide to turn over 0 coins a fraction X of the time and 1 coin the remaining 1-X of the time, where X lies between 0 and 1. Player A is a better player: he can read our patterns and will choose a certain option 100% of the time. Let's say we decide to turn over 0 coins 75% of the time and 1 coin 25% of the time, so X = 0.75 and 1-X = 0.25. Player A won't take long to figure out this pattern and will decide to turn over 0 coins 100% of the time in order to maximise his EV. Why? Well, here is the EV formula for player A:
EV(A) = (1)(X)(Y) + (1)(1-X)(1-Y) + (-1)(X)(1-Y) + (-1)(1-X)(Y)
Here Y is the fraction of the time that A turns over 0 coins and X is the fraction of the time that we (player B) turn over 0 coins. The formula might look complicated, but it really isn't. Player A always gets a score of +1 if he does the same thing player B does, and gets a score of -1 if he does the opposite. Because A is better than we are, he knows that X = 0.75 and 1-X = 0.25 and can include this in his EV formula:
EV(A) = (1)(0.75)(Y) + (1)(0.25)(1-Y) + (-1)(0.75)(1-Y) + (-1)(0.25)(Y)
EV(A) = 0.75Y + 0.25 – 0.25Y – 0.75 + 0.75Y – 0.25Y
EV(A) = Y – 0.5
Now all that player A has to do is decide on a value for Y. Since Y has to be between 0 and 1 (as it represents a fraction of the time), it becomes obvious that EV(A) is greatest when Y = 1. His EV is then equal to 1 – 0.5 = 0.5. And this is actually correct. If A decides to play Y = 1 and turn over 0 coins 100% of the time, he will win 75% of the time and get a score of +1, and lose 25% of the time and get a score of -1. Together this gives us: (0.75)(1) + (0.25)(-1) = 0.5.
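If you would rather let a computer do the arithmetic, the calculation above can be sketched as follows. The function name ev_a is an assumption for this example; x and y play the roles of X and Y from the text.

```python
def ev_a(x, y):
    """Expected score for player A.

    x: fraction of the time player B shows 0 coins
    y: fraction of the time player A shows 0 coins
    A scores +1 when both do the same and -1 when they differ.
    """
    same = x * y + (1 - x) * (1 - y)       # both show 0, or both show 1
    different = x * (1 - y) + (1 - x) * y  # exactly one coin appears
    return same - different


# With X = 0.75 the four-term formula should collapse to Y - 0.5.
for y in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"Y = {y:.2f}: EV(A) = {ev_a(0.75, y):+.2f}, Y - 0.5 = {y - 0.5:+.2f}")
```

The two columns agree for every Y, and the largest value appears at Y = 1, matching the result derived by hand.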
Now let's imagine that we (player B) do exactly the opposite, so X = 0.25 and 1-X = 0.75. If we now look at player A's EV formula, we will get the following:
EV(A) = (1)(0.25)(Y) + (1)(0.75)(1-Y) + (-1)(0.25)(1-Y) + (-1)(0.75)(Y)
EV(A) = 0.25Y + 0.75 – 0.75Y – 0.25 + 0.25Y – 0.75Y
EV(A) = -Y + 0.5
This is player A's EV formula after he discovered our betting pattern, and again Y has to be between 0 and 1. We can see that his EV is greatest when Y = 0, as his EV would then equal 0.5.
So what player A does is, first of all, find out our strategy (because he is better than we are) and calculate our X to then put this value into his EV formula. He then chooses a value for Y that optimises his EV and this will always be 0 or 1, depending on player B's strategy. Player A will therefore always choose to do a certain action 100% of the time, because he knows our strategy as a result of his skill-advantage.
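Player A's procedure can be sketched in code. Because his EV is a straight line in Y, it is enough to compare the two endpoints Y = 0 and Y = 1. The names ev_a and best_reply are made up for this illustration.

```python
def ev_a(x, y):
    """Expected score for A when B shows 0 coins a fraction x of the time and A a fraction y."""
    return x * y + (1 - x) * (1 - y) - x * (1 - y) - (1 - x) * y


def best_reply(x):
    """Return (y, EV) for A's best answer to B's x.

    EV(A) is linear in y, so one of the endpoints y = 0 or y = 1 is always optimal.
    On a tie (x = 0.5) the first endpoint is returned.
    """
    return max(((y, ev_a(x, y)) for y in (0.0, 1.0)), key=lambda pair: pair[1])


for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    y, ev = best_reply(x)
    print(f"X = {x:.2f}: A plays Y = {y:.0f} and expects {ev:+.2f}")
```

For X = 0.75 and X = 0.25 this reproduces the two cases worked out above. Note what happens at X = 0.5: both endpoints give A the same EV, so his read on us is worth nothing there.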
Because of this skill-advantage, player A will always choose the optimal strategy against us. He has become our Nemesis. The Nemesis always knows our strategy and always chooses the best counter strategy to maximise his EV, while we as player B cry ourselves to sleep at night because we just can't seem to win. But the next day we wake up with fresh hope. We go looking for a strategy that is best for us, knowing that player A will always react with a max-EV strategy.
If we (player B) decide to turn over 0 coins more than 50% of the time our EV is:
EV(B) = (-1)(X) + (1)(1-X)
EV(B) = 1 – 2X
This is because we expect player A to turn over 0 coins 100% of the time: he is better than we are and knows our pattern. A fraction X of the time we turn over 0 coins, player A also turns over 0 coins and gets a score of +1, while we get a score of -1. The other 1-X of the time we turn over a coin, A doesn't (because this is his optimal strategy), and we get a score of +1.
If we then decide to turn over 1 coin more than 50% of the time, our EV is:
EV(B) = (1)(X) + (-1)(1-X)
EV(B) = 2X – 1
This is because we expect player A to always turn over 1 coin here: he is better than we are and knows our pattern. A fraction X of the time we turn over 0 coins, player A turns over 1 coin (because this is his optimal strategy), and we win and get a score of +1. The other 1-X of the time we do turn over a coin, A does the same and wins, and we get a score of -1.
Our EV, therefore, depends on X. Remember that X is the fraction of the time that we turn over 0 coins. We now have two different EV formulas for two different strategies. Strategy 1 means that we turn over 0 coins more than 50% of the time, and our EV formula is then EV = 1 – 2X. Strategy 2 means that we turn over 1 coin more than 50% of the time, and our EV formula is then EV = 2X – 1.
Putting these two formulas into a graph, we get:
From this graph we can see that the optimal strategy for us (player B) is to turn over 1 coin 50% of the time and 0 coins 50% of the time. Although this conclusion might seem obvious to many of you, calculating such outcomes can often be complicated. Once you have to deal with more complex situations, it will be very helpful to know the actual process of how to reach these conclusions. Strategy 1 applies from X = 0.5 to X = 1, and Strategy 2 applies from X = 0 to X = 0.5. Both strategies reach their optimum at X = 0.5, where EV = 0. Seeing as our EV is negative for any other value of X, we can counter player A by making each play 50% of the time. Player A is then no longer in a position to exploit his skill-advantage. He can do what he wants, but he will never be able to reach an EV higher than 0, and we have successfully neutralised his skill-advantage.
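The same conclusion can be reached by brute force. The sketch below scans values of X and, for each one, takes our EV against an opponent who always picks the best counter. The function name ev_b_vs_nemesis and the grid of 101 points are choices made for this illustration only.

```python
def ev_b_vs_nemesis(x):
    """Our EV (player B) when A always answers with his max-EV strategy.

    For x above 0.5 A shows 0 coins and we get 1 - 2x (Strategy 1);
    for x below 0.5 A shows 1 coin and we get 2x - 1 (Strategy 2).
    In both cases that is the smaller of the two expressions.
    """
    return min(1 - 2 * x, 2 * x - 1)


# Try X = 0.00, 0.01, ..., 1.00 and keep the value that treats us best.
grid = [i / 100 for i in range(101)]
best_x = max(grid, key=ev_b_vs_nemesis)

print(f"best X = {best_x}, EV(B) = {ev_b_vs_nemesis(best_x)}")
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"X = {x:.2f}: EV(B) = {ev_b_vs_nemesis(x):+.2f}")
```

The scan settles on X = 0.5 with an EV of 0, and every other X on the grid gives a negative EV. Choosing the strategy whose worst case is best is often called a maximin approach, and it is the same idea we will lean on when we move to poker.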
This example shows how game theory can neutralise your opponent's skill-advantage. This method can also be used to help with your poker game, but that's something we'll talk about in Part 2.
I hope you found this article interesting. As always, questions, comments, and criticism are more than welcome in the forum.