Course syllabus

News and updates

Contact details

Examiner: prof. Richard Torkar, torkarr@chalmers.se
Teaching assistants (2024):
- Theocharis Tavantzis, gustavth@student.gu.se
- Xiaoran Zhang, xiaoran@student.chalmers.se

Slack channel for discussions
Join here (send me an email if the link has expired).

Student reps

The mid-term course evaluation meeting will take place on Sept. 25, 10:10-10:30 (immediately after the morning lecture), and the final meeting on Nov. 27, 13:00-13:45 (over Zoom).

Course purpose

This course aims to teach scientific approaches, i.e., research methods, and the statistics needed to analyze the data we collect. The analysis can then form the basis for decision support in initiatives to improve performance in software development organizations. The course also prepares students for the master's thesis project.

Schedule

The exam will likely be on Oct. 23 (remember that you must always register for an exam a few weeks in advance!). The course runs at 50% pace, so we expect you to spend about 20 hours per week on it. You will need those hours, and the first two weeks in particular will have a high workload.

All lectures and labs will be on campus.

I strongly recommend that you try Exercise 1 as soon as possible, since it covers installing all the software you will need in this course!

Course overview (E = exercises, LN = lecture notes, V = videos)
Week	Chapter(s)	Notes	E	LN	Videos	Papers¹	Presentations
1	1–3	A high-level introduction to the concepts we use in the course.	E1	LN1	V1, V2	The ABC of SE research	How to build a case (Aug. 30, 08:15-12:00); dat246-L1.pdf (Aug. 30, 10:15-12:00); Lab (Aug. 30, 13:00-15:00)
2	4–6	Mathematical notation for model specifications.	E2	LN2	V3, V4, V5, V6		Research ethics (Sept. 4, 08:15-10:00); dat246-L2.pdf (Sept. 6, 10:15-12:00); Lab (Sept. 6, 13:15-15:00)
3	7	We ground our assumptions in information theory and the concept of maximum entropy.	E3	LN3.1, LN3	V7		Validity threats (Sept. 11, 08:15-10:00); dat246-L3.pdf (Sept. 13, 10:15-12:00); Lab (Sept. 13, 13:15-15:00)
4	8–9	We won't emphasize interactions (Ch. 8) in this course. However, understanding MCMC is important!	E4	LN4	V8	Guidelines for conducting and reporting case study research	Case study research (Sept. 18, 08:15-10:00); dat246-L4.pdf (Sept. 20, 10:15-12:00); Lab (Sept. 20, 13:15-15:00)
5	10–11	GLMs are what we eat for breakfast :) We must understand the maxent principle! Binomial, Poisson, and Multinomial models.	E5	LN5	V9, V10	A crash course in good and bad controls	Evidence-based software engineering and systematic reviews (Sept. 25, 08:15-10:00); dat246-L5.pdf (Sept. 27, 10:15-12:00); Lab (Sept. 27, 13:15-15:00)
6	12	Over-dispersed and zero-inflated outcomes, and ordered categorical outcomes and predictors (e.g., Likert scale values).	E6	LN6	V11	Survey research in software engineering	Survey research (Oct. 2, 08:15-10:00); dat246-L6.pdf (Oct. 4, 10:15-12:00); Lab (Oct. 4, 13:15-15:00)
7	13	Multilevel models!	E7	LN7	V12, V13		Guest lecture: Action Research (Oct. 9, 08:15-10:00); dat246-L7.pdf (Oct. 11, 10:15-12:00); Lab (Oct. 11, 13:15-15:00)
8	14	Modeling covariance: continuous varying intercepts, e.g., Gaussian processes! We will not cover Chapters 15–16 in this course.	E8	LN8	V14, V15	Applying Bayesian analysis guidelines to empirical software engineering data with replication package	Guest lecture: Design Science Research (Oct. 16, 08:15-10:00); dat246-L8.pdf (Oct. 18, 10:15-12:00); Lab (Oct. 18, 13:15-15:00)

¹ These papers are compulsory reading.

Course literature

In this course, we will use one book: R. McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and STAN, 2nd edition, ISBN 9780367139919. We will study Chapters 1–7 and 9–14. We will not emphasize Chapter 8 much, and we will not cover Chapters 15–16 in this course.

The book uses R and Stan, through the rethinking package, to specify and sample statistical models. The first exercise (E1) provides instructions on how to set everything up on Windows, OS X, and Linux.

Additionally, we will use a number of papers, which you can find in the file area. We will clearly indicate if a paper is not compulsory reading (if we don't say anything, assume that it is compulsory).

Course design

Each week we will cover 1-3 chapters in the book.

Before each lecture, we expect you to have read that week's chapters and gone through the videos connected to them. The lectures will focus on three things:

Covering the most important things from each chapter.
Going through practical hands-on examples.
Presenting research methods commonly used in empirical software engineering (i.e., things you will use for your master thesis).

I cannot stress enough how important it is that you (a) read the chapter(s) and (b) go through the videos connected to each chapter before each lecture.

I also expect you to check Canvas (i.e., this course home page) every day for new or changed content. Any changes to the lectures (e.g., if one is canceled for some reason) will be announced above under News and updates.

Learning objectives and syllabus

Knowledge and understanding:

Describe, understand, and apply empiricism in software engineering.
Describe, understand, and partly apply the principles of case study research, experiments, and surveys.
Describe and understand the underlying principles of meta-analytical studies.
Explain the importance of research ethics.
Recognize and define codes of ethics when conducting research in software engineering.
State and explain the importance of threats to validity and how to control them.
Describe and explain the concepts of probability space (incl. conditional probability), random variable, expected value, and random processes, and know a number of concrete examples of these concepts.
Describe Markov chain Monte Carlo methods such as the Metropolis algorithm.
Describe and explain Hamiltonian Monte Carlo.
Explain and describe multicollinearity, post-treatment bias, collider bias, and confounding.
Describe and explain ways to avoid overfitting.

Skills and abilities:

Assess the suitability of methods for analyzing data, and apply them.
Analyze descriptive statistics and decide on appropriate analysis methods.
Use and interpret codes of ethics for software engineering research.
Design statistical models mathematically and implement them in a programming language.
Make use of random processes, i.e., Bernoulli, Binomial, Gaussian, and Poisson distributions, including over-dispersed outcomes.
Make use of ordered categorical outcomes (ordered logit) and predictors.
Assess the suitability of various statistical distributions from an ontological (natural process) and epistemological (maxent) perspective.
Use and assess directed acyclic graphs to argue about causality.

Judgment and approach:

State and discuss the tools used for data analysis and, in particular, judge their output.
Judge the appropriateness of particular empirical methods and their applicability to attack various and disparate software engineering problems.
Question and assess common ethical issues in software engineering research.
Assess diagnostics from Hamiltonian Monte Carlo and quadratic approximation using information-theoretic concepts, i.e., information entropy, WAIC, and PSIS-LOO.
Judge posterior probability distributions for out-of-sample predictions and conduct posterior predictive checks.

Study plan (Chalmers)
Study plan (GU)

Examination form

If you copy any text at all, you must reference it appropriately; if in doubt, ask. We do not look kindly on plagiarism, and each year students are suspended because of it.

The course is examined through two components: a written exam at the end of the course and an individual assignment during the course. Passing the written exam gives 5 credits, and passing the individual assignment gives 2.5 credits (7.5 credits in total).

The course is graded Fail, 3, 4, or 5. To pass the course, you need to pass both the assignment and the exam, but the final course grade is determined solely by your grade on the written exam.

The written exam may test any of the learning outcomes listed above. The individual assignment focuses mainly on Bayesian data analysis (i.e., the skills and abilities, and judgment and approach outcomes).

Additionally, we recommend that you work through the exercises in the file area. Although they do not give any extra credits or bonus points toward the written exam or the assignment, you will be much better prepared once you have done them.

Written exam dates (note that these are preliminary!):

Oct. 23 PM
Jan. 5 AM
Aug. 27 PM

Assignment deadlines: