Course syllabus

News and updates

2021-11-29: Please reach out on Slack if you need input for studying. The TAs (Wenli and Priya) can help you with input, and I'll try to check Slack regularly also :)
2021-11-29:
Course eval meeting @ 10 AM here.
2021-10-26:
The key for the written exam we had yesterday has been posted in the files area under Previous exams.
2021-10-19: 
On November 10 at 13.30 you are welcome to Richard’s office (4th floor in the Jupiter building at Campus Lindholmen) to complain about the grading of the written exam. Before you come you must send Richard an email clearly pointing out where you think the error is, what you wrote, and why you believe the grading was not correct. If I don’t receive such an email before 13.30 on November 10, then I will not meet with you.

2021-09-23: Instructions for what to study for the exam.
2021-08-26: A Slack channel has been created for the course. This was very popular last year so please join! https://join.slack.com/t/slack-qqw2657/shared_invite/zt-uturqjhi-yryXmP_KI7_kscfkeo2jfQ

Contact details

Course purpose

Software development organizations need to constantly improve to become faster, better, and more efficient. This course teaches scientific approaches, i.e., research methods, and the statistics needed to analyze the data we collect. The analysis can then serve as decision support in initiatives to improve performance in software development organizations. The course prepares students for the master thesis project.

Schedule

Study period 1 starts Aug. 30 this year. The last lecture will take place on Oct. 20 and the exam will be in the week starting Oct. 25. This means that we have eight (8!) weeks, which will be packed. The course runs at 50% speed, so we expect you to spend 20 h/week on it.

All regular lectures will be on Zoom. Richard will have an open-door session every week, both on campus (Wednesday afternoons) and on Zoom (Mondays 08.30-10.00). Here's the Zoom link for all online activities:

https://chalmers.zoom.us/j/3956079957?pwd=Vyt1ZjhzMy9XMDlLbEoyU04wUmtNQT09

Use this password if needed: BDA

A Google Doc is available where you can write questions beforehand, anonymously. Take the opportunity to come and ask questions about the course, get help, listen in, just say hi, ask questions about Sweden and studying here, ask questions about your thesis, or ask about virtually anything!

Course overview
Each week below lists the book chapter(s), notes, exercises, lecture notes, videos, papers (note 1), and presentations (note 2).

Week 1 (Chapters 1–3)
  • Notes: A high-level introduction to the concepts we use in the course.
  • Exercises: E1
  • Lecture notes: LN1
  • Videos: V1, V2
  • Papers: The ABC of SE research
  • Presentations: How to build a case (Sept. 1 10:15-12:00); dat246-L1.pdf (Sept. 3 10:15-12:00)

Week 2 (Chapters 4–6)
  • Notes: Math notation of model specifications.
  • Exercises: E2
  • Lecture notes: LN2
  • Videos: V3, V4, V5, V6
  • Papers: Ethical issues in empirical software engineering
  • Presentations: Research ethics (Sept. 6 10:15-12:00); dat246-L2.pdf (Sept. 8 10:15-12:00)

Week 3 (Chapter 7)
  • Notes: We ground our assumptions on information theory and the concept of maximum entropy.
  • Exercises: E3
  • Lecture notes: LN3.1, LN3
  • Videos: V7, V8
  • Papers: none listed
  • Presentations: Validity threats (Sept. 13 10:15-12:00); dat246-L3.pdf (Sept. 15 10:15-12:00)

Week 4 (Chapters 8–9)
  • Notes: Interactions (we won't emphasize this in the course). However, understanding MCMC is important!
  • Exercises: E4
  • Lecture notes: LN4
  • Videos: V9, V10
  • Papers: Guidelines for conducting and reporting case study research
  • Presentations: Case study research (Sept. 20 10:15-12:00); dat246-L4.pdf (Sept. 22 10:15-12:00)

Week 5 (Chapters 10–11)
  • Notes: GLMs is what we eat for breakfast :) We must understand the maxent principle! Binomial, Poisson, and Multinomial models.
  • Exercises: E5
  • Lecture notes: LN5
  • Videos: V11, V12
  • Papers: The common patterns of nature; A crash course in good and bad controls
  • Presentations: Evidence-based software engineering and systematic reviews (Sept. 27 10:15-12:00); dat246-L5.pdf (Sept. 29 10:15-12:00)

Week 6 (Chapter 12)
  • Notes: Over-dispersed and zero-inflated outcomes, and ordered categorical outcomes and predictors (e.g. Likert scale values).
  • Exercises: E6
  • Lecture notes: LN6
  • Videos: V13, V14
  • Papers: Survey research in software engineering
  • Presentations: Survey research (Oct. 4 10:15-12:00); dat246-L6.pdf (Oct. 6 10:15-12:00)

Week 7 (Chapter 13)
  • Notes: Multilevel models :)
  • Exercises: E7
  • Lecture notes: LN7
  • Videos: V15, V16
  • Papers: none listed
  • Presentations: Guest lecture - Action Research (Oct. 11 10:15-12:00); dat246-L7.pdf (Oct. 13 10:15-12:00)

Week 8 (Chapter 14)
  • Notes: Modeling covariance. Continuous varying intercepts, e.g., Gaussian Processes! Chapters 15–16 we will not cover in the course.
  • Exercises: E8
  • Lecture notes: LN8
  • Videos: V17, V18
  • Papers: Applying Bayesian analysis guidelines to empirical software engineering data (with replication package)
  • Presentations: Guest lecture - Design Science (Oct. 18 10:15-12:00); dat246-L8.pdf (Oct. 20 10:15-12:00)

  1. These papers are compulsory reading.
  2. The complete presentations from McElreath's lectures can be downloaded here. In my slides, i.e., dat246-L*.pdf, I've taken the most important parts of his lectures and added some things.

Course literature

In this course, we will use one book: R. McElreath. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 2nd edition. ISBN: 9780367139919.

Before you ask: no, the 1st edition is not OK to use, since a lot of material has been added to the 2nd edition, and we will cover that material in this course. Be careful to order the correct edition!

The book uses R and Stan through the rethinking package to specify and sample statistical models. The first exercise (E1) provides instructions on how to set things up in Windows, OS X, and Linux.
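
As a rough illustration of what "specify and sample" a statistical model means, here is a minimal sketch in plain Python. It is not the course's tooling (the course uses R, Stan, and the rethinking package), and the data (6 successes in 9 trials), the flat prior, and the function names are assumptions made only for this example. The sketch approximates the posterior of a binomial proportion on a grid and then draws samples from it.

```python
import random


def grid_posterior(successes, trials, grid_size=1000):
    """Grid-approximate the posterior of a binomial proportion p under a flat prior."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalized posterior = likelihood * prior. The flat prior and the
    # binomial coefficient are constants in p, so they cancel when normalizing.
    unnormalized = [p ** successes * (1 - p) ** (trials - successes) for p in grid]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]


def sample_posterior(grid, weights, n_samples=10_000, seed=1):
    """Draw grid values with probability proportional to their posterior weight."""
    rng = random.Random(seed)
    return rng.choices(grid, weights=weights, k=n_samples)


# Hypothetical data for illustration only: 6 successes in 9 trials.
grid, post = grid_posterior(successes=6, trials=9)
samples = sample_posterior(grid, post)
posterior_mean = sum(samples) / len(samples)
print(round(posterior_mean, 2))
```

With a flat prior the exact posterior is Beta(7, 4), whose mean is 7/11, so the sampled mean should land close to 0.64. Grid approximation only scales to a handful of parameters, which is one reason the course moves on to MCMC.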

Additionally, we will use a number of papers, which you can find in the File area. We will clearly indicate if a paper is not compulsory reading (if we don't say anything, assume that it is compulsory reading!).

Course design

Each week we will cover one or more chapters in the book. Since the book covers some introductory concepts first, we will cover a lot of chapters early on.

Before we meet, we expect you to have read the chapter(s) and gone through the videos connected to each chapter. The lectures will focus on three things:

  • Covering the most important things from each chapter.
  • Going through practical hands-on examples.
  • Presenting research methods commonly used in empirical software engineering (i.e., things you will use for your master thesis).

I cannot stress enough how important it is that you a) read the chapter(s), and b) go through the videos connected to each chapter, before each lecture.

I also expect you to go to Canvas (i.e., this course home page) every day and check whether anything has been added or changed. If there are any changes to the lectures (e.g., if one is canceled for some reason), this will be announced on Canvas!

Changes made since the last occasion

For 2021 we have moved elements from DAT321 Software Quality to this course (i.e., mainly Bayesian data analysis) and removed frequentist statistics completely. The course was also moved to Study Period 1.

Given that the Bayesian analysis elements were given an average course evaluation of 5 (from 1-5, where 5 is the highest) we believe that this course will be well received by the students :)

Learning objectives and syllabus

  • Knowledge and understanding:
    • Describe, understand, and apply empiricism in software engineering.
    • Describe, understand, and partly apply the principles of case study research/experiments/surveys.
    • Describe and understand the underlying principles of meta-analytical studies.
    • Explain the importance of research ethics. 
    • Recognize and define codes of ethics when conducting research in software engineering.  
    • State and explain the importance of threats to validity and how to control said threats.
    • Describe and explain the concepts of probability space (incl. conditional probability), random variable, expected value, and random processes, and know a number of concrete examples of the concepts.
    • Describe Markov chain Monte Carlo methods such as Metropolis.
    • Describe and explain Hamiltonian Monte Carlo. 
    • Explain and describe multicollinearity, post-treatment bias, collider bias, and confounding.
    • Describe and explain ways to avoid overfitting.
  • Skills and abilities:
    • Assess the suitability of and apply methods of analysis on data.
    • Analyze descriptive statistics and decide on appropriate analysis methods.
    • Use and interpret code of ethics for software engineering research.
    • Design statistical models mathematically and implement said models in a programming language.
    • Make use of random processes, i.e., Bernoulli, Binomial, Gaussian, and Poisson distributions, with over-dispersed outcomes. 
    • Make use of ordered categorical outcomes (ordered-logit) and predictors.
    • Assess the suitability of various statistical distributions from an ontological (natural process) and epistemological (maxent) perspective.
    • Make use of and assess directed acyclic graphs to argue causality.
  • Judgment and approach:
    • State and discuss the tools used for data analysis and, in particular, judge their output.
    • Judge the appropriateness of particular empirical methods and their applicability to attack various and disparate software engineering problems.
    • Question and assess common ethical issues in software engineering research. 
    • Assess diagnostics from Hamiltonian Monte Carlo and quadratic approximation using information theoretical concepts, i.e., information entropy, WAIC, and PSIS-LOO.
    • Judge posterior probability distributions for out-of-sample predictions and conduct posterior predictive checks.

Study plan (Chalmers)
Study plan (GU)

Examination form

Deadlines can be moved depending on the COVID situation.

If you copy any text at all, you must reference it appropriately; if in doubt, ask. We take plagiarism seriously, and each year students are suspended because of it!

This course is examined in two components. First, a written exam at the end of the course. Second, an individual assignment during the course. If you pass the written exam you are given 5 credits. If you pass the individual assignment you are given 2.5 credits.

Chalmers and GU students are given the grades fail, 3, 4, or 5. In order to pass the course, you will need to pass both the assignment and the exam, but we set the final course grade only according to the grade you got on the written exam.

For the written exam all learning outcomes, as listed above, can be tested. For the individual assignment, there is mainly a focus on Bayesian data analysis (i.e., skills and abilities, and judgment and approach).

Additionally, we recommend that students go through the exercises found in the file area. Even though doing these exercises will not give any extra credits or bonus points that can be added to the written exam or assignment, you will be in a much better place once you've done them.

The VCs of our two universities have decided that the written exam must be held on campus. This means that each student will need to be physically present on campus for the written exam.

The time and date for the examinations are:

Written exam deadlines:

  • Oct. 25–31
  • Jan. 3–5
  • Aug. 16–28, 2022

Assignment deadlines:

  • Oct. 15 @ 16.00
  • Nov. 12 @ 16.00
  • Dec. 17 @ 16.00
