Course syllabus

Course-PM

DAT400 / DIT431 High-performance parallel programming lp1 HT24 (7.5 hp)

The course is offered by the Department of Computer Science and Engineering.

Course meetings

All lectures, problem sessions, workshops and labs will be held on campus. Check TimeEdit for the rooms.

Contact details

  • Miquel Pericàs <miquelp@chalmers.se> (Examiner + Lecturer)
  • Hari Abram <hariv@chalmers.se> (Teaching Assistant)
  • Jing Chen <chjing@chalmers.se> (Teaching Assistant)
  • Kyriakos Gavras <gavras@chalmers.se> (TA/Amanuens)

Course purpose

This course looks at parallel programming models, efficient programming methodologies and performance tools with the objective of developing highly efficient parallel programs.

Student representatives contact details

Workshop links

Instructions for connecting remotely to lab machines: https://chalmers.topdesk.net/tas/public/ssp/content/detail/knowledgeitem?unid=304967f9ad004d3293b986a976e39833

Schedule

Check TimeEdit for rooms. The schedule is shown below.

Note: Under construction / Subject to change. The first lecture will be on Tuesday, Sept 3rd, at 13:15h in room EC.

 

| #  | Time                            | Session            | Topic                                                         | Book Correspondence | Responsible    |
|----|---------------------------------|--------------------|---------------------------------------------------------------|---------------------|----------------|
| 1  | Sep 3rd, 13:15h-16h             | Lecture #1         | Introduction + Basic concepts of Parallelism                  | 1.1, 1.2, 1.3       | Miquel         |
| 2  |                                 | Lecture #2 (online)| Intro to Parallel Computer Architecture                       | 2.x                 |                |
| 3  | Sep 5th, 13:15h-16h             | Lecture #3         | Parallel Programming Models (part 1)                          | 3.1, 3.2, 3.3       | Miquel         |
| 4  | Sep 6th & Sep 10th, 8h-11:45h   | Lab #1             | Intro to tools and environment                                |                     | Kyriakos, Hari |
| 5  | Sep 10th, 13:15h-16h            | Lecture #4         | Parallel Programming Models (part 2)                          | 3.4, 3.5, 3.6       | Miquel         |
| 6  | Sep 12th, 13:15h-16h            | Lectures #5 and #6 | Loops and scheduling + Performance Analysis and Roofline model| 4.2, 4.6            | Miquel         |
| 7  | Sep 13th & Sep 17th, 8h-11:45h  | Lab #2             | Program parallelization                                       |                     | Kyriakos, Hari |
| 8  | Sep 17th, 13:15h-15h            | Exercise Session   | Parallel Programming Models                                   |                     | Hari           |
| 9  | Sep 19th, 13:15h-15h            | Exercise Session   | Performance Analysis and Roofline Model                       |                     | Hari           |
| 10 | Sep 20th & Sep 24th, 8h-11:45h  | Lab #3             | Performance Analysis / Roofline Model CPUs                    |                     | Kyriakos, Hari |
| 11 | Sep 24th, 13:15h-16h            | Workshop #1        | Message Passing Interface (session 1)                         |                     | Miquel         |
| 12 | Sep 26th, 13:15h-16h            | Workshop #1        | Message Passing Interface (session 2)                         |                     | Miquel         |
| 13 | Sep 27th & Oct 1st, 8h-11:45h   | Lab #4             | Message Passing Interface                                     |                     | Kyriakos, Hari |
| 14 | Oct 1st, 13:15h-16h             | Workshop #2        | OpenMP (session 1)                                            |                     | Hari           |
| 15 | Oct 3rd, 13:15h-16h             | Workshop #2        | OpenMP (session 2)                                            |                     | Hari           |
| 16 | Oct 4th & Oct 8th, 8h-11:45h    | Lab #5             | OpenMP                                                        |                     | Kyriakos, Hari |
| 17 | Oct 8th, 13:15h-16h             | Workshop #3        | CUDA (session 1)                                              |                     | Miquel         |
| 18 | Oct 10th, 13:15h-16h            | Workshop #3        | CUDA (session 2)                                              |                     | Miquel         |
| 19 | Oct 15th & Oct 18th, 8h-11:45h  | Lab #6             | CUDA                                                          |                     | Kyriakos, Hari |
| 20 | Oct 15th, 13:15h-14h            | Workshop #3        | CUDA (session 3)                                              |                     | Miquel         |
| 21 | Oct 15th, 14:15h-15h            | Wrap-up            | MPI + OpenMP + CUDA                                           |                     | Miquel         |
| 22 | Oct 17th, 13:15h-15h            |                    |                                                               |                     |                |
| 23 | Oct 22nd & Oct 25th, 8h-11:45h  | Lab extra sessions | Extra sessions to support labs                                |                     | Kyriakos, Hari |
| 24 | Oct 22nd, 13:15h-15h            | Exam preparation 1 |                                                               |                     | Miquel         |
| 25 | Oct 24th, 13:15h-15h            | Exam preparation 2 |                                                               |                     | Miquel         |
| 26 | Oct 31st                        | Written Exam       |                                                               |                     | Miquel         |

 

Course literature

The theory part (part #1) of the course loosely follows the book "Parallel Programming for Multicore and Cluster Systems" by Thomas Rauber and Gudula Rünger (3rd edition, 2023). The book can be accessed through the Chalmers Library: link to the coursebook (accessible via Chalmers library).

The practical part (part #2), which covers various programming models and libraries, is based on several online resources that will be published at a later point.

Course design

The course consists of a set of lectures and laboratory sessions. The lectures start with an overview of parallel computer architectures and parallel programming models and paradigms. An important part of the discussion is mechanisms for synchronization and data exchange. Next, code transformations and performance analysis of parallel programs are covered. The course proceeds with a discussion of tools and techniques for developing parallel programs in shared address spaces. This section is based on two workshops that cover the OpenMP programming model. Next, the course discusses the development of parallel programs for distributed address spaces. Here, two workshops cover the Message Passing Interface (MPI). Finally, we discuss how to program GPU accelerators. This part consists of two workshops that describe the CUDA (Compute Unified Device Architecture) programming environment.

The lectures are complemented with a set of laboratory sessions in which participants explore the topics introduced in the lectures. During the lab sessions, participants parallelize sample programs over a variety of parallel architectures and use performance analysis tools to detect and remove bottlenecks in the parallel implementations of these programs. The lab sessions are done in teams of two. At the end of each session a joint report has to be submitted. There is no strict deadline for the report, but we strongly recommend submitting it before the beginning of the next lab session.

Throughout the course, several assignments are proposed that provide bonus points. These assignments consist of reading papers and submitting solutions to proposed exercises. They are not mandatory, but they provide bonus points that are added to the score of the written exam, provided the exam has reached a minimum score (this will be discussed in the first lecture).

Changes made since the last occasion

No major changes are planned for this edition of the course.

One guest lecture focusing on HPC with Python will likely be dropped due to unavailability of the lecturer. 

Learning objectives and syllabus

Learning objectives:

Knowledge and Understanding

  • List the different types of parallel computer architectures, programming models and paradigms, as well as different schemes for synchronization and communication.
  • List the typical steps to parallelize a sequential algorithm
  • List different methodologies for the performance analysis of parallel program systems

Competence and skills

  • Apply performance analysis methodologies to determine the bottlenecks in the execution of a parallel program
  • Predict the upper limit to the performance of a parallel program

Judgment and approach

  • Given a particular software, specify what performance bottlenecks are limiting the efficiency of parallel code and select appropriate strategies to overcome these bottlenecks
  • Design resource-aware parallelization strategies based on a specific algorithm's structure and computing system organization
  • Argue which performance analysis methods are important given a specific context

Link to the syllabus on Studieportalen.

Study plan

Assessment

The exam (4.5c)

The final exam is in written form and accounts for 4.5 credits. Bonus points can increase the score of the final exam, but they are only added if your exam score reaches 20 pts (out of 60). The pass threshold for the final exam is 24/60 pts.

The labs (3.0c)

Successful completion of the labs accounts for 3.0 credits.

The final course grade is the same as the exam grade. To be given a pass grade on the whole course, both components need to be passed.
