# Assignment 3: Policy evaluation

- Due: 15 Oct 2021 by 23:59
- Points: 20
- Submitting: a file upload
- File types: pdf

### Instructions

All the problems in this assignment should be solved and handed in **individually**. You should be prepared to answer questions about your solutions yourself. The full set of solutions should be submitted as a single PDF document in Canvas. Feel free to use any software of your choosing (or pen and paper) for preparing illustrations and drawings.

### Problem 1

Consider the SCM below on the variables

- , where (integer valued)
- , where
- , where (integer valued)

What is the **value**, in terms of , of the policy for distributing financial , defined below?

### Problem 2

Consider the causal graph representing a Markov decision process (MDP) below.

Now, assume that you could access samples from the distribution defined by the policy with

Consider evaluating a new policy with action probabilities under *the same transition and reward probabilities* as above (i.e., the same conditional distributions for states and rewards).

Recall that is defined as the expected sum of rewards under , that is .
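As a rough illustration of this definition (the value of a policy as the expected sum of rewards when actions are drawn from that policy), the sketch below estimates the value by Monte Carlo simulation. Everything in it is an assumption made for illustration: the two-state MDP, the `TRANSITION` table, the `reward` function, the horizon, and the function names are hypothetical and are not the MDP from this problem.

```python
import random

# Hypothetical two-state, two-action MDP; none of these numbers come from the assignment.
# TRANSITION[s][a] is the probability of moving to state 1 from state s under action a.
TRANSITION = {0: {0: 0.2, 1: 0.7}, 1: {0: 0.5, 1: 0.9}}
HORIZON = 3


def reward(state, action):
    """Illustrative reward: 1 for being in state 1, minus a small cost for action 1."""
    return float(state) - 0.1 * action


def rollout(policy, rng):
    """Simulate one trajectory from state 0 and return its sum of rewards.

    policy[s] is the probability of choosing action 1 in state s.
    """
    state, total = 0, 0.0
    for _ in range(HORIZON):
        action = 1 if rng.random() < policy[state] else 0
        total += reward(state, action)
        state = 1 if rng.random() < TRANSITION[state][action] else 0
    return total


def monte_carlo_value(policy, n_samples=20000, seed=0):
    """Average the return over many simulated trajectories."""
    rng = random.Random(seed)
    return sum(rollout(policy, rng) for _ in range(n_samples)) / n_samples
```

For example, `monte_carlo_value({0: 1.0, 1: 1.0})` estimates the value of the toy policy that always picks action 1. This kind of direct averaging requires samples generated by the policy being evaluated, which is exactly what is unavailable in the off-policy setting of this problem.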

A) Identify (derive) a statistical estimand of the value that uses importance weighting (or inverse-propensity weighting), expressed as an expectation over the distribution

B) Propose a finite-sample estimator of your estimand which makes use of samples from .

### Problem 3

In the sessions on off-policy evaluation, we argued that one difficulty with off-policy evaluation of sequential decision-making policies is finding enough samples in the data that follow the proposed policy. We expand on this argument in the technical report *Evaluating Reinforcement Learning Algorithms in Observational Health Settings*. Read chapters 1–5 (at least) of this report and briefly summarize the main findings of chapter 5 (~1/2 page).
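The sample-scarcity argument can be illustrated with a small simulation. This is a minimal sketch under strong assumptions that are not taken from the report: the logged data come from a behavior policy that picks uniformly among `n_actions` actions, and the proposed policy is deterministic (here, hypothetically, "always action 0"). The function name and all parameters are made up for illustration.

```python
import random


def matching_fraction(horizon, n_actions=2, n_trajectories=10000, seed=0):
    """Fraction of logged trajectories whose actions all agree with the target policy.

    Assumed setup: the behavior policy is uniform over actions and the
    target policy always chooses action 0.
    """
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_trajectories):
        # A trajectory matches only if every one of its actions equals action 0.
        if all(rng.randrange(n_actions) == 0 for _ in range(horizon)):
            matches += 1
    return matches / n_trajectories
```

Under these assumptions the probability that a single trajectory matches is `(1 / n_actions) ** horizon`, so comparing `matching_fraction(1)`, `matching_fraction(5)`, and `matching_fraction(10)` shows the share of usable trajectories shrinking geometrically as the horizon grows.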