DAT465
Assignment 3: Policy evaluation
lp1 HT21

  • Due 15 Oct 2021 by 23:59
  • Points 20
  • Submitting a file upload
  • File types pdf

Instructions

All the problems in this assignment should be solved and handed in individually. You should be prepared to answer questions about your solutions yourself. The full set of solutions should be submitted as a single PDF document in Canvas. Feel free to use any software of your choosing (or pen and paper) for preparing illustrations and drawings.

 

Problem 1

Consider the structural causal model (SCM) below, defined on the variables Age, Employed, Salary, Support, and Income:

  • $Age = U_A$, where $U_A \sim \mathrm{Uniform}(\{18, \ldots, 66\})$ (integer valued)
  • $Employed = U_E$, where $U_E \sim \mathrm{Bernoulli}(0.8)$
  • $Salary = Employed \cdot [(Age - 18) \cdot 1000 + 15000 + U_S]$, where $U_S \sim \mathrm{Uniform}(\{-5000, \ldots, 10000\})$ (integer valued)
  • $Support = 0$
  • $Income = Salary + Support$

What is the value, in terms of $Income$, of the policy $\pi$ for distributing financial $Support$, defined below?

$$\pi = \begin{cases} Support = 5000, & \text{if } Age < 25 \\ Support = 10000, & \text{if } 25 \leq Age < 35 \text{ and unemployed} \\ Support = 2000, & \text{if } 25 \leq Age < 35 \text{ and employed} \\ Support = 0, & \text{if } Age \geq 35 \end{cases}$$
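
One way to sanity-check an analytical answer is to simulate the SCM and average $Income$ under the intervention that sets $Support$ according to $\pi$. The following is a minimal Monte Carlo sketch, not a prescribed solution method. The function names (`sample_income`, `pi`, `estimate_value`), the sample size, and the seed are illustrative assumptions, and the sketch assumes that the noise variables are mutually independent.

```python
import random


def sample_income(rng, policy=None):
    """Draw one unit from the SCM and return its Income.

    If `policy` is None, Support keeps its SCM value of 0; otherwise
    Support is set by policy(age, employed).
    """
    age = rng.randint(18, 66)                   # U_A: uniform integers 18..66 (inclusive)
    employed = 1 if rng.random() < 0.8 else 0   # U_E ~ Bernoulli(0.8)
    u_s = rng.randint(-5000, 10000)             # U_S: uniform integers -5000..10000
    salary = employed * ((age - 18) * 1000 + 15000 + u_s)
    support = 0 if policy is None else policy(age, employed)
    return salary + support


def pi(age, employed):
    """Support assigned by the policy pi from the problem statement."""
    if age < 25:
        return 5000
    if age < 35:
        return 2000 if employed else 10000
    return 0


def estimate_value(policy, n=200_000, seed=0):
    """Monte Carlo estimate of expected Income under `policy` (assumed n and seed)."""
    rng = random.Random(seed)
    return sum(sample_income(rng, policy) for _ in range(n)) / n


if __name__ == "__main__":
    print("baseline (Support = 0):", estimate_value(None))
    print("under pi:              ", estimate_value(pi))
```

Because the policy only changes $Support$ and consumes no random draws, running both estimates with the same seed compares the two settings on identical simulated individuals.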

 

Problem 2

Consider the causal graph representing a Markov decision process (MDP) below.

[Figure: Graph.png, the causal graph of the MDP]

Now, assume that you could access samples from the distribution $p_\mu$ defined by the policy $\mu$ with

$$p_\mu(S_1, A_1, R_1, S_2, A_2, R_2) = p(S_1)\, p_\mu(A_1 \mid S_1)\, p(R_1 \mid S_1, A_1)\, p(S_2 \mid S_1, A_1, R_1)\, p_\mu(A_2 \mid S_2)\, p(R_2 \mid S_2, A_2)$$

Consider evaluating a new policy $\pi$ with action probabilities $p_\pi(A_t \mid S_t)$ under the same transition and reward probabilities as above (i.e., same conditional distributions for states and rewards).

Recall that $V(\pi)$ is defined as the expected sum of rewards under $p_\pi(S_1, A_1, R_1, S_2, A_2, R_2)$, that is, $\mathbb{E}_\pi[R_1 + R_2]$.

 

A) Identify (derive) a statistical estimand of the value $V(\pi)$ that uses importance weighting (inverse-propensity weighting), expressed as an expectation over the distribution $p_\mu$.
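
As a hedged illustration of the kind of derivation part A asks for, the change of measure can be sketched as follows. The sketch assumes discrete states, actions, and rewards (so that expectations are sums) and overlap, meaning $p_\mu(a \mid s) > 0$ whenever $p_\pi(a \mid s) > 0$. The shorthand $\tau$ for a full trajectory is introduced here for illustration only.

```latex
\begin{aligned}
V(\pi) &= \mathbb{E}_{\pi}[R_1 + R_2]
        = \sum_{\tau} p_{\pi}(\tau)\,(r_1 + r_2),
        \qquad \tau = (s_1, a_1, r_1, s_2, a_2, r_2) \\
       &= \sum_{\tau} p_{\mu}(\tau)\,\frac{p_{\pi}(\tau)}{p_{\mu}(\tau)}\,(r_1 + r_2) \\
       &= \mathbb{E}_{\mu}\!\left[
          \frac{p_{\pi}(A_1 \mid S_1)\,p_{\pi}(A_2 \mid S_2)}
               {p_{\mu}(A_1 \mid S_1)\,p_{\mu}(A_2 \mid S_2)}\,(R_1 + R_2)
          \right]
\end{aligned}
```

The last step uses the fact that $p_\pi$ and $p_\mu$ share $p(S_1)$ and the same reward and transition distributions, so those factors cancel in the ratio and only the action probabilities remain.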

B) Propose a finite-sample estimator of your estimand that uses samples from $p_\mu$.
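
For orientation, one common form of such an estimator averages the observed return $R_1 + R_2$ over trajectories sampled under $\mu$, with each trajectory weighted by the ratio of action probabilities under $\pi$ and $\mu$. The sketch below is one possible implementation under stated assumptions, not the expected answer: the function name, the tuple layout of a trajectory, and the `(a, s) -> probability` signature of `p_pi` and `p_mu` are all hypothetical, and the behaviour probabilities $p_\mu(A_t \mid S_t)$ are assumed known and nonzero for every observed action.

```python
def importance_weighted_value(trajectories, p_pi, p_mu):
    """Importance-weighted average of r1 + r2 over trajectories sampled under mu.

    trajectories: iterable of (s1, a1, r1, s2, a2, r2) tuples (assumed layout)
    p_pi, p_mu:   callables (a, s) -> probability of action a in state s
    """
    total, n = 0.0, 0
    for s1, a1, r1, s2, a2, r2 in trajectories:
        # Only the action probabilities differ between p_pi and p_mu,
        # so state and reward terms cancel in the density ratio.
        w = (p_pi(a1, s1) * p_pi(a2, s2)) / (p_mu(a1, s1) * p_mu(a2, s2))
        total += w * (r1 + r2)
        n += 1
    if n == 0:
        raise ValueError("need at least one trajectory")
    return total / n
```

Passing the same function for `p_pi` and `p_mu` makes every weight equal to 1, so the estimate reduces to the plain sample mean of the returns, which is a quick consistency check.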

 

Problem 3

In the sessions on off-policy evaluation, we argued that one difficulty in off-policy evaluation of sequential decision-making policies is finding enough samples in the data that follow the proposed policy. We expand on this argument in the technical report "Evaluating Reinforcement Learning Algorithms in Observational Health Settings". Read chapters 1–5 (at least) of this report and briefly summarize the main findings of chapter 5 (about half a page).

 

 
