Course syllabus

DAT470 / DIT065 Computational techniques for large-scale data (lp4 VT23, 7.5 hp)  

The course is offered by the Department of Computer Science and Engineering

Contact details

Course purpose

The advent of big data has led to the development of new programming paradigms, in particular for parallel systems that make it possible to compute with big data on redundant clusters of commodity computers. This course provides an introduction to different programming paradigms, e.g. MapReduce and its extensions, which facilitate computations with terabytes of data. It also demonstrates that, for specific tasks, algorithms and data structures can provide highly efficient alternatives to parallelization.

Learning outcomes

After completion of the course the student should be able to:

Learning objectives

  • discuss important technological aspects when designing and implementing analysis solutions for large-scale data,
  • explain differences between parallel programming models,
  • describe data structures and algorithms for big data and discuss their utility

Skills and abilities

  • implement applications for transforming and analyzing large-scale data with different parallel software frameworks,
  • use algorithms and data structures for computations with large-scale data

Judgement ability and approach

  • suggest appropriate computational infrastructures and methodological approaches for analysis tasks and discuss their advantages and drawbacks,
  • discuss advantages and drawbacks of different strategies of parallelization,
  • decide between algorithmic and parallelization-based approaches for accelerating computational workloads

Course content

The aim of this course is to deepen the students’ knowledge and skills and to familiarize them with the technical and technological side of data science, including both software and hardware environments. The course will introduce aspects of designing and implementing large-scale data science solutions.

In particular, the course will include:

  • an overview of computer architectures, algorithmic approaches, and high-performance computing infrastructures with a focus on limitations for processing large-scale data,
  • an introduction to relevant frameworks for cluster computing with large-scale data,
  • implementation of data analysis tools on a cluster using Python and appropriate software frameworks,
  • data structures and algorithms, such as index structures, which can greatly accelerate computations with large-scale data

Schedule

TimeEdit

Course literature

This course will use original articles, white papers, book chapters, software manuals and tutorials. They will be posted along with the lectures throughout the period.

Course design

  • Credits: The total number of higher education credits (HEC) for the course is 7.5. The course has two sub-courses:
    • Written examination (Skriftlig tentamen), 3.0 higher education credits. Grading scale: Pass with distinction (5), Pass with credit (4), Pass (3), Fail
    • Written assignments (Skriftliga inlämningsuppgifter), 4.5 higher education credits. Grading scale: Pass (G) and Fail (U)
    • The course grade is determined by the written exam, assuming the student received a pass in the assignments.
    • There will be non-obligatory individual assignments which grant bonus points for the written exam. These bonus points are valid for the whole academic year.
  • Assessment: The course is examined by an individual written exam carried out in an examination hall, as well as mandatory written assignments submitted as written reports, some of which are carried out individually and others in groups of normally 2–3 students. The written assignment part of the course is considered passed if the group has gathered at least 70% of the maximum points awarded for the assignments.

    If a student who has failed the same examined component twice wishes to change examiner before the next examination, a written application shall be sent to the department responsible for the course and shall be granted unless there are special reasons to the contrary (Chapter 6, Section 22 of the Higher Education Ordinance).

    In cases where a course has been discontinued or has undergone major changes, the student shall normally be guaranteed at least three examination occasions (including the ordinary examination) during a period of at least one year from the last time the course was given.
  • Course Evaluation: The course is evaluated through meetings between teachers and student representatives, held both during and after the course. In addition, an anonymous questionnaire is used to collect written feedback. The outcome of the evaluations serves to improve the course by indicating which parts could be added, improved, changed or removed.

The course in Studieportalen: https://www.student.chalmers.se/sp/course?course_id=33089

The syllabus at GU: https://kursplaner.gu.se/pdf/kurs/en/DIT065
