## Familiarize yourself with the Gymnasium environment

The aim of this notebook is to help you get started with the [Gymnasium](https://gymnasium.farama.org) environment. Gymnasium (formerly Gym) is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball.

We are now going to:

- see how to install the Gymnasium toolkit
- learn how to use it
- have fun

#### Installation

The simplest way to install Gymnasium is to use *pip*. The command below installs the core package; note that some environments are only installed when optional extras are enabled (for example, *pip install "gymnasium[all]"* to get all the available environments):

In [1]:
pip install gymnasium

Note: you may need to restart the kernel to use updated packages.


You may also have to install the *pygame* library.

In [2]:
pip install pygame

Note: you may need to restart the kernel to use updated packages.


We are now all set to start learning how to interact with the gymnasium toolkit!

In [1]:
import gymnasium as gym
import pygame

#### Interacting with the environment

The Gymnasium library is a collection of test problems, often called **environments**, that you can use to develop and test your reinforcement learning algorithms. These environments have a shared interface, allowing you to write general algorithms. Let's have a look at the CartPole environment and play a bunch of games.

In [3]:
env = gym.make("CartPole-v1", render_mode="human")
observation, info = env.reset(seed=42)
for _ in range(50):
    action = env.action_space.sample()  # this is where you would insert your policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()
env.close()
pygame.display.quit()
pygame.quit()

If everything went fine, you should have seen a window where "someone" played the CartPole game. Let me summarize what the above lines did. First we load the Gymnasium library (but you should have already guessed it, no?). Next we create our first **environment**, which here happens to be the CartPole game. We then initialize the environment with the *env.reset()* call; because we passed *render_mode="human"*, the current game state is displayed automatically at each step, without any explicit *env.render()* call. Next we just perform a bunch of random actions using successive *env.step* calls (more details later), resetting the environment whenever an episode ends. Finally, we do not forget to close our environment.

Of course, many other environments are available. You can get a peek either by browsing the Gymnasium website or by listing the entries of the *gym.envs.registry* dictionary.

We are now going to take a closer look at our environment. We will briefly detail:
- *action_space* which defines the action set, i.e., $\mathcal{A}$ of the lecture notes;
- *observation_space* which defines the state/observation space, i.e., $\mathcal{X}$ of the lecture notes;
- *reset* which is a method that (re)initializes the environment and outputs an initial state together with an information dictionary.

In [2]:
env1 = gym.make("CartPole-v1", render_mode="human")
init1 = env1.reset()  # reset returns a tuple (observation, info)
env2 = gym.make("FrozenLake-v1", render_mode="human")
init2 = env2.reset()

print(env1.action_space)  # -> Discrete(2): only 2 possible actions, '0' and '1'
print(env2.action_space)  # -> Discrete(4): 4 possible actions, '0', ..., '3'

print(env1.observation_space)  # -> Box(..., (4,), float32): a state is a vector of size 4
print(env2.observation_space)  # -> Discrete(16): 16 possible states

print(init1)  # -> (observation, info); the observation is indeed a vector of size 4
print(init2)  # -> (observation, info); the observation is indeed an integer

Discrete(2)
Discrete(4)
Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)
Discrete(16)
(array([ 0.04129543, -0.01724758, -0.01258496,  0.00614324], dtype=float32), {})
(0, {'prob': 1})


We are now going to see the main method, which is *step*. It takes the action label (an integer) as its argument, applies this action to the current state of the environment and outputs five values:
- a *state*, i.e., an element of the *observation_space*;
- a *reward* which is a real number;
- a *boolean* (terminated) indicating whether the episode reached a terminal state, e.g., the game was won or lost;
- a *boolean* (truncated) indicating whether the episode was cut short before reaching a terminal state, e.g., the maximal number of moves was reached;
- a *dictionary* that gives useful information (for debugging purposes only).

Let's try to perform an action to our *CartPole* problem!

In [None]:
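state, info = env1.reset()  # reset returns (observation, info)
env1.render()  # optional here: with render_mode="human" the window is updated automatically
print(env1.step(1))  # first step: prints the full 5-tuple
new_state, reward, terminated, truncated, info = env1.step(1)  # second step, unpacked
print(new_state)  # -> the new state
print(reward)  # -> we have just received a reward of 1
print(terminated)  # -> did the episode reach a terminal state (pole fell or cart left the track)?
print(truncated)  # -> was the episode cut short, i.e., maximum number of steps reached?
print(info)  # -> nothing to report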

(array([ 0.00705511,  0.15714493,  0.00417313, -0.31796804], dtype=float32), 1.0, False, False, {})
[ 0.01019801  0.3522072  -0.00218624 -0.60933197]
1.0
False
False
{}


Which type of action did we just perform with the step(1) call?

In [22]:
## Give your answer here.

#### Having fun

Let's focus on the [FrozenLake environment](https://gymnasium.farama.org/environments/toy_text/frozen_lake). Please carefully read its description. Try to play a game using random moves.

In [1]:
# %load solutions/FrozenLake.py