Course syllabus
News and updates
Contact details
 Examiner: Prof. Richard Torkar, torkarr@chalmers.se
 Teaching assistant:
Slack channel for discussions
Join here (send me an email if the link has expired).
Student reps
A first midterm course evaluation will take place Sept. 25, 10.10–10.30 (immediately after a morning lecture), and a final meeting on Nov. 27, 13.00–13.45 (over Zoom).
Course purpose
This course teaches scientific approaches, i.e., research methods, and the statistics needed to analyze the data we collect. The analysis can then serve as the basis for decision support in initiatives to improve performance in software development organizations. The course also prepares students for the master's thesis project.
Schedule
The exam will likely be on Oct. 23 (remember that you must register for an exam some weeks in advance!). The course runs at 50% speed, so we expect you to spend 20 h/week on it. You will need those hours, and the first two weeks have a high workload.
All lectures and labs will be on campus.
I strongly recommend you try Exercise 1 as soon as possible, since it covers installing all the stuff you will need in this course!
| Week | Chapter(s) | Notes | E | LN | Videos | Papers¹ | Presentations |
|------|------------|-------|---|----|--------|---------|---------------|
| 1 | 1–3 | A high-level introduction to the concepts we use in the course. | E1 | LN1 | V1, V2 | The ABC of SE research | How to build a case (Aug. 30 08:15–12:00) |
| 2 | 4–6 | Math notation of model specifications. | E2 | LN2 | V3, V4, V5, V6 | | Research ethics (Sept. 4 08:15–10:00) |
| 3 | 7 | We ground our assumptions on information theory and the concept of maximum entropy. | E3 | | V7 | | Validity threats (Sept. 11 08:15–10:00) |
| 4 | 8–9 | Interactions (Ch. 8) we won't emphasize in the course. However, understanding MCMC is important! | E4 | LN4 | V8 | Guidelines for conducting and reporting case study research | Case study research (Sept. 18 08:15–10:00) |
| 5 | 10–11 | GLMs are what we eat for breakfast :) We must understand the maxent principle! Binomial, Poisson, and Multinomial models. | E5 | LN5 | V9, V10 | A crash course in good and bad controls | Evidence-based software engineering and systematic reviews (Sept. 25 08:15–10:00) |
| 6 | 12 | Overdispersed and zero-inflated outcomes, and ordered categorical outcomes and predictors (e.g., Likert scale values). | E6 | LN6 | V11 | Survey research in software engineering | Survey research (Oct. 2 08:15–10:00) |
| 7 | 13 | Multilevel models! | E7 | LN7 | V12, V13 | | Guest lecture: Action Research (Oct. 9 08:15–10:00); dat246L7.pdf (Oct. 11 10:15–12:00); Lab (Oct. 11 13:15–15:00) |
| 8 | 14 | Modeling covariance. Continuous varying intercepts, e.g., Gaussian processes! Chapters 15–16 we will not cover in the course. | E8 | LN8 | V14, V15 | Applying Bayesian analysis guidelines to empirical software engineering data | Guest lecture: Design Science Research (Oct. 16 08:15–10:00) |

¹ These papers are compulsory reading.
Course literature
In this course, we will use one book: R. McElreath. Statistical Rethinking: A Bayesian Course with Examples in R and STAN. 2nd edition. ISBN: 9780367139919. We will study Chapters 1–7 and 9–14. Chapter 8 we will not emphasize much, while Chapters 15–16 we will not cover in this course.
The book uses R and Stan through the rethinking package to specify and sample statistical models. The first exercise (E1) provides instructions on how to set things up in Windows, OS X, and Linux.
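To give a flavor of what this looks like, here is a minimal sketch of the kind of model you will specify mathematically and then implement (it assumes the rethinking package from E1 is installed; the data are simulated for illustration only):

```r
library(rethinking)

# Simulated outcome data (hypothetical; in the exercises you use real data)
set.seed(1)
d <- data.frame(y = rnorm(50, mean = 5, sd = 2))

# The code mirrors the math notation of the model specification:
#   y_i   ~ Normal(mu, sigma)
#   mu    ~ Normal(0, 10)
#   sigma ~ Exponential(1)
m <- quap(
  alist(
    y ~ dnorm(mu, sigma),
    mu ~ dnorm(0, 10),
    sigma ~ dexp(1)
  ),
  data = d
)

precis(m)  # summarize the posterior (means, sd, and compatibility intervals)
```

`quap` fits the model by quadratic approximation; later in the course the same `alist` specification is handed to `ulam`, which samples with Stan instead.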
Additionally, we will use a number of papers, which you will find in the File area. We will clearly indicate if a paper is not compulsory reading; if we say nothing, assume it is compulsory.
Course design
Each week we will cover 1–3 chapters of the book.
Before we meet, we expect you to have read the chapters for that week and, in addition, to have gone through the videos connected to those chapters. The lectures will focus on three things:
- Covering the most important things from each chapter.
- Going through practical hands-on examples.
- Presenting research methods commonly used in empirical software engineering (i.e., things you will use for your master's thesis).
I cannot stress enough how important it is that you (a) read the chapter(s) and (b) go through the videos connected to each chapter, before each lecture.
I also expect you to visit Canvas (i.e., this course home page) every day and check whether anything has been added or changed. Any changes to the lectures (e.g., if one is canceled for some reason) will be announced above under News and updates.
Learning objectives and syllabus
- Knowledge and understanding:
  - Describe, understand, and apply empiricism in software engineering.
  - Describe, understand, and partly apply the principles of case study research/experiments/surveys.
  - Describe and understand the underlying principles of meta-analytical studies.
  - Explain the importance of research ethics.
  - Recognize and define codes of ethics when conducting research in software engineering.
  - State and explain the importance of threats to validity and how to control said threats.
  - Describe and explain the concepts of probability space (incl. conditional probability), random variable, expected value, and random processes, and know a number of concrete examples of the concepts.
  - Describe Markov chain Monte Carlo methods such as Metropolis.
  - Describe and explain Hamiltonian Monte Carlo.
  - Explain and describe multicollinearity, post-treatment bias, collider bias, and confounding.
  - Describe and explain ways to avoid overfitting.
- Skills and abilities:
  - Assess the suitability of, and apply, methods of analysis on data.
  - Analyze descriptive statistics and decide on appropriate analysis methods.
  - Use and interpret codes of ethics for software engineering research.
  - Design statistical models mathematically and implement said models in a programming language.
  - Make use of random processes, i.e., Bernoulli, Binomial, Gaussian, and Poisson distributions, with overdispersed outcomes.
  - Make use of ordered categorical outcomes (ordered-logit) and predictors.
  - Assess the suitability, from an ontological (natural process) and epistemological (maxent) perspective, of various statistical distributions.
  - Make use of and assess directed acyclic graphs to argue causality.
- Judgment and approach:
  - State and discuss the tools used for data analysis and, in particular, judge their output.
  - Judge the appropriateness of particular empirical methods and their applicability to attack various and disparate software engineering problems.
  - Question and assess common ethical issues in software engineering research.
  - Assess diagnostics from Hamiltonian Monte Carlo and quadratic approximation using information-theoretical concepts, i.e., information entropy, WAIC, and PSIS-LOO.
  - Judge posterior probability distributions for out-of-sample predictions and conduct posterior predictive checks.
Study plan (Chalmers)
Study plan (GU)
Examination form
If you copy any text at all, you must reference it appropriately; if in doubt, ask. We do not look positively upon plagiarism, and each year students are suspended because of it.
This course is examined in two components: a written exam at the end of the course, and an individual assignment during the course. Passing the written exam gives 5 credits; passing the individual assignment gives 2.5 credits.
Students are given the grades fail, 3, 4, or 5 in the course. To pass the course, you need to pass both the assignment and the exam, but the final course grade is set solely by your grade on the written exam.
For the written exam, all learning outcomes listed above can be tested. The individual assignment mainly focuses on Bayesian data analysis (i.e., skills and abilities, and judgment and approach).
Additionally, we recommend that students go through the exercises found in the file area. Completing these exercises gives no extra credits or bonus points toward the written exam or assignment, but you will be in a much better place once you have done them.
Written exam deadlines (note these are preliminary!):
- Oct. 23, PM
- Jan. 5, AM
- Aug. 27, PM
Assignment deadlines: