Course syllabus

Contact details

News and updates

Slack channel for discussions
Join here: https://join.slack.com/t/empiricalsoft-rp23132/shared_invite/zt-1hey7nzmb-fZN5br9Py6RydVu25zLgPA

Student reps

Course purpose

This course aims to teach scientific approaches, i.e., research methods, and the statistics needed to analyze the data we collect. The analysis can then become the basis for decision support in initiatives to improve performance in software development organizations. The course prepares students for the master's thesis project.

Schedule

The exam will likely be in the week starting Oct. 24 (you must always register for an exam some weeks in advance!). The course runs at 50% speed, so we expect you to spend 20h/week on it. You will definitely need those hours, and the first two weeks will have a high workload.

All lectures and labs will be on campus.

Course overview
Week Chapter(s) Notes E LN Videos Papers [1] Presentations
1 1-3 A high-level introduction to the concepts we use in the course. E1 LN1 V1, V2 The ABC of SE research

How to build a case (Sept. 1 08:15-10:00)
Lab (Sept. 1 10.15-12.00)
dat246-L1.pdf (Sept. 2 10:15-12:00)

2 4-6 Math notation of model specifications. E2 LN2 V3, V4, V5, V6

Research ethics (Sept. 5 08:15-10:00)
dat246-L2.pdf (Sept. 7 08:15-10:00)
Lab (Sept. 7 10.15-12.00)

3 7 We ground our assumptions on information theory and the concept of maximum entropy. E3 LN3.1, LN3 V7, V8

Validity threats (Sept. 12 10:15-10:00)
dat246-L3.pdf (Sept. 14 10:15-12:00)
Lab (Sept. 14 13.15-15.00)

4 8-9 Interactions (we won't emphasize this in the course). However, understanding MCMC is important! E4 LN4 V9, V10
Guidelines for conducting and reporting case study research

Case study research (Sept. 19 08:15-10:00)
dat246-L4.pdf (Sept. 21 10:15-12:00)
Lab (Sept. 21 13.15-15.00)

5 10-11 GLMs are what we eat for breakfast :) We must understand the maxent principle! Binomial, Poisson, and Multinomial models. E5 LN5 V11, V12 A crash course in good and bad controls

Evidence-based software engineering and systematic reviews (Sept. 26 10:15-12:00)
dat246-L5.pdf (Sept. 30 08:15-10:00)
Lab (Sept. 30 10.15-12.00)

6 12 Over-dispersed and zero-inflated outcomes, and ordered categorical outcomes and predictors (e.g. Likert scale values) E6 LN6 V13, V14 Survey research in software engineering

Survey research (Oct. 3 10:15-12:00)
dat246-L6.pdf (Oct. 5 13:15-15:00)
Lab (Oct. 5 15.15-17.00)

7 13 Multilevel models :) E7 LN7 V15, V16

Guest lecture - Design Science (Oct. 10 10:15-12:00)
dat246-L7.pdf (Oct. 12 10:15-12:00)
Lab (Oct. 12 13.15-15.00)

8 14 Modeling covariance. Continuous varying intercepts, e.g., Gaussian Processes! We will not cover Chapters 15–16 in the course. E8 LN8 V17, V18

Applying Bayesian analysis guidelines to empirical software engineering data
with replication package

Guest lecture - Action Research (Oct. 17 10:15-12:00)
dat246-L8.pdf (Oct. 19 10:15-12:00)
Lab (Oct. 19 13.15-15.00)

  1. These papers are compulsory reading.

Course literature

In this course, we will use one book: R. McElreath. Statistical Rethinking: A Bayesian Course with Examples in R and STAN. 2nd edition. ISBN: 9780367139919.

The book uses R and Stan through the rethinking package to specify and sample statistical models. The first exercise (E1) provides instructions on how to set things up in Windows, OS X, and Linux.

Additionally, we will use a number of papers, which you can find in the File area. We will clearly indicate if a paper is not compulsory reading (if we don't say anything, assume that it is compulsory).

Course design

Each week we will cover 1-3 chapters in the book. Since the book covers some introductory concepts first, we will cover a lot of chapters early on.

Before we meet, we expect you to have read the chapter(s) for that week and gone through the videos connected to each chapter. The lectures will focus on three things:

  • Covering the most important things from each chapter.
  • Going through practical hands-on examples.
  • Presenting research methods commonly used in empirical software engineering (i.e., things you will use for your master thesis).

I cannot stress enough how important it is that you a) read the chapter(s), and b) go through the videos connected to each chapter, before each lecture.

I also expect you to go to Canvas (i.e., this course home page) every day and check whether anything has been added or changed. Any changes to the lectures (e.g., a cancellation) will be announced above under News and updates.

Learning objectives and syllabus

  • Knowledge and understanding:
    • Describe, understand, and apply empiricism in software engineering.
    • Describe, understand, and partly apply the principles of case study research/experiments/surveys.
    • Describe and understand the underlying principles of meta-analytical studies.
    • Explain the importance of research ethics. 
    • Recognize and define codes of ethics when conducting research in software engineering.  
    • State and explain the importance of threats to validity and how to control said threats.
    • Describe and explain the concepts of probability space (incl. conditional probability), random variable, expected value, and random processes, and know a number of concrete examples of the concepts.
    • Describe Markov chain Monte Carlo methods such as Metropolis.
    • Describe and explain Hamiltonian Monte Carlo. 
    • Explain and describe multicollinearity, post-treatment bias, collider bias, and confounding.
    • Describe and explain ways to avoid overfitting.
  • Skills and abilities:
    • Assess the suitability of and apply methods of analysis on data
    • Analyze descriptive statistics and decide on appropriate analysis methods.
    • Use and interpret code of ethics for software engineering research.
    • Design statistical models mathematically and implement said models in a programming language.
    • Make use of random processes, i.e., Bernoulli, Binomial, Gaussian, and Poisson distributions, with over-dispersed outcomes. 
    • Make use of ordered categorical outcomes (ordered-logit) and predictors.
    • Assess the suitability of various statistical distributions from an ontological (natural process) and epistemological (maxent) perspective.
    • Make use of and assess directed acyclic graphs to argue causality.
  • Judgment and approach:
    • State and discuss the tools used for data analysis and, in particular, judge their output.
    • Judge the appropriateness of particular empirical methods and their applicability to attack various and disparate software engineering problems.
    • Question and assess common ethical issues in software engineering research. 
    • Assess diagnostics from Hamiltonian Monte Carlo and quadratic approximation using information theoretical concepts, i.e., information entropy, WAIC, and PSIS-LOO.
    • Judge posterior probability distributions for out-of-sample predictions and conduct posterior predictive checks.

Study plan (Chalmers)
Study plan (GU)

Examination form

If you copy any text at all, you must reference it appropriately; if in doubt, ask. We take plagiarism seriously, and each year students are suspended because of it.

This course is examined in two components: a written exam at the end of the course and an individual assignment during the course. If you pass the written exam, you are given 5 credits. If you pass the individual assignment, you are given 2.5 credits.

Students are given the grades fail, 3, 4, or 5 in the course. In order to pass the course, you will need to pass both the assignment and the exam, but we set the final course grade only according to the grade you got on the written exam.

The written exam can test all learning outcomes listed above. The individual assignment focuses mainly on Bayesian data analysis (i.e., skills and abilities, and judgment and approach).

Additionally, we recommend that students go through the exercises in the file area. Although these exercises give no extra credits or bonus points toward the written exam or assignment, you will be much better prepared once you have done them.

The dates and times for the examinations are:

Written exam deadlines:

  • Oct. 22–29
  • First week of January
  • Late August, 2023

Assignment deadlines:

  • Oct. 7 @ 16.00
  • Nov. 4 @ 16.00
  • Dec. 9 @ 16.00
