Course syllabus
Course-PM
DAT565 / DIT407 Introduction to data science and AI lp4 VT25 (7.5 hp)
The course is offered by the Department of Computer Science and Engineering
Contact details
Examiner: Associate Professor (Docent) Moa Johansson <moa.johansson@chalmers.se>
Discussions about Assignments: all TAs and Oana Geman <geman@chalmers.se>
Lecturers:
- Associate Professor (Docent) Moa Johansson <moa.johansson@chalmers.se> (Weeks 1, 2)
- Technical lecturer Dr. Oana Geman <geman@chalmers.se> (Weeks 3, 4)
- Postdoc Dr. Mohammad Kakooei <kakooei@chalmers.se> (Week 5)
- Research Assistant Dr. Stefano Sarao Mannelli <s.saraomannelli@chalmers.se> (Weeks 6 - 8)
Other course staff (TA)
Postdoc Dr. Philip John Harrison <johnphi@chalmers.se>
Doctoral students:
- Filip Kronström <filipkro@chalmers.se>
- Pablo Martinez Crespo <pabloma@chalmers.se>
- Matis Rost <mathisr@chalmers.se>
- Hang Zou <hangzo@chalmers.se>
- Firooz Shahriari Mehr <firooz@chalmers.se> (Weeks 7, 8)
Student TAs:
• John Klint <gusjohn25@student.gu.se>
• Georg Kyhn <kyhngeorg@gmail.com>
• Madhumitha Venkatesan <madven@chalmers.se>
Course representatives:
• Thomas Mathiassen <thomath@student.chalmers.se>
• Ella Arvidsson <ella.arvidsson@outlook.com>
• Ella Backlund <ella@vindiseglen.com>
• Filip Hansson <hansson.filip02@gmail.com>
• Joel Wolf-Watz <joelwolfwatz017@gmail.com>
Intended learning outcomes
- describe fundamental types of problems and main approaches in data science and AI
- give examples of data science and AI applications from different contexts
- give examples of how stochastic models and machine learning (ML) are applied in data science and AI
- explain basic concepts in classical AI, and the relationship between logical and data-driven, ML-based approaches within AI.
- briefly explain the historical development of AI, what is possible today and discuss possible future development.
- use appropriate programming libraries and techniques to implement basic transformations, visualizations and analyses of example data
- identify appropriate types of analysis problems for some concrete data science applications
- implement some types of stochastic models and apply them in data science and AI applications
- implement and/or use AI-tools for search, planning and problem solving
- apply simple machine learning methods implemented in a standard library
- justify which type of statistical method is applicable for the most common types of experiments in data science applications
- discuss advantages and drawbacks of different types of approaches and models within data science and AI.
- reflect on inherent limitations of data science methods and how the misuse of statistical techniques can lead to dubious conclusions
- critically analyze and discuss data science and AI applications with respect to ethics, privacy and societal impact
- show a reflective attitude in all learning
Course content
During the course, a wide selection of methods for Data Science and AI will be introduced. The course is divided into three parts:
Introduction to data science
- Implementation of data science solutions, using Python, basic data analysis and visualization.
- Introduction of the data science process, and appropriate methodology.
- Examples of core data science methods with case studies such as in clustering, classification and regression.
- Data science put in context regarding ethics, regulations and limitations.
Statistical methods for data science and AI
- Introduction of some common stochastic models with examples of applications in data science and AI (for instance, naive Bayes classifiers, topic models for text and Hidden Markov Models for sequence data).
Artificial Intelligence
- Introduction to classical AI and machine learning, including the relationship to related areas such as algorithms and optimization, and AI philosophy.
- Examples of methods and applications of AI, in classical AI (search and constraint satisfaction), and ML-based (search engines, naive Bayes and neural networks).
- Discussion of ethics and societal impact of AI.
Schedule
Lectures are to be considered mandatory, although no attendance will be taken. Lab sessions are not strictly mandatory: they consist of independent work on the assignments, with course staff available for help. Lab sessions are not a replacement for lectures: you cannot expect the TAs to repeat or summarize lecture content. The TAs are there specifically to help with the assignments, not to provide the background knowledge assumed from lectures and readings.
The course is organized as follows:
- Lectures: two per week at varying locations and times (check TimeEdit!). Slides are available after the lecture.
- Self-directed labs: two or three times a week at varying locations and times (check TimeEdit!).
TimeEdit
Lecture schedule
Week 1: Course Arrangements & Introduction and Python for data scientists
- Course Arrangements & Introduction (Reading: Skiena Chapter 1)
- Python for data scientists
Week 2: Data munging, scores & ranks and Visualization
- Data munging, scores & ranks (Reading: Skiena Chapter 3.1–3.3, Chapter 4.2–4.4)
- Visualization (Reading: Skiena Chapter 6)
Week 3: Statistical analysis and Mathematical models
- Statistical analysis (Reading: Skiena Chapters 2.1–2.2, Chapter 5, Chapter 11.1)
- Mathematical models (Reading: Skiena Chapter 7)
Week 4: Linear and logistic regression
- Linear regression (Reading: Skiena Chapter 9)
- Logistic regression (Reading: Skiena Chapter 9)
Week 5: Distance and network methods
- Distance (Reading: Skiena Chapter 11)
- Network methods
Week 6: Machine learning and Neural networks
- Machine learning (Reading: Skiena Chapter 10)
- Neural networks (Reading: Skiena Chapter 10)
Week 7: Orientation in modern AI. Classical AI and history of AI
- Orientation in modern AI
- Classical AI and history of AI
Week 8: Ethics of data science & AI. Big data
- Ethics of data science & AI
- Big data (Reading: Skiena Chapter 12)
Computer Lab Schedule
The lab schedule corresponds to the assignment for that week's lectures. Note that due to holidays, one week's lab session is sometimes held the following week. The lab sessions are not obligatory; you can go there to ask the teaching assistant present questions about the programming assignments.
Discussions and questions: https://chalmers.instructure.com/courses/34129/discussion_topics
Assignment schedule
Assignment / Quiz | Topic | Deadline | Resubmission deadline |
---|---|---|---|
1 | Weather | 10/04 23:59 | 22/04 23:59 |
2 | Apartments | 14/04 23:59 | 28/04 23:59 |
3 | Diabetes | 28/04 23:59 | 09/05 23:59 |
Q1 | Quiz 1 | 28/04 23:59 | 30/05 23:59, 18/08 23:59 |
4 | Seeds | 13/05 23:59 | 27/05 23:59 |
5 | Fashion MNIST | 19/05 23:59 | 03/06 23:59 |
6 | RAG with Ollama | 26/05 23:59 | 10/06 23:59 |
Q2 | Quiz 2 | 26/05 23:59 | 05/06 23:59, 21/08 23:59 |
Course literature
Steven S. Skiena: "The Data Science Design Manual". 2017. Springer. Available through Chalmers network and library at https://link.springer.com/book/10.1007/978-3-319-55444-0
Course design
The course has eight mandatory assignments: six assignments and two quizzes. Quiz 1 covers material from weeks 1-4 and Quiz 2 covers material from weeks 5-8. There are always two weeks to work on each assignment, and up to two assignments run in parallel (e.g., the first assignment covers weeks 1 and 2, with its deadline at the end of week 2, the second assignment covers weeks 2 and 3, and so on).
Assignment deadlines are hard and no extensions are given. You will receive feedback for the assignments one week after the deadline, based on the version that was submitted by the deadline.
Most assignments are submitted as Jupyter notebooks on CodeGrade. Some parts of each assignment are graded automatically, which means you get immediate feedback for those parts. Assignments may also be multiple-choice quizzes on Canvas.
Assignments are done in pairs and you need to select your group within CodeGrade.
In order to pass the course, you must pass all assignments. An assignment is considered a pass if you have obtained at least 70% of the maximum score.
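As a rough illustration of the pass rule above, here is a minimal Python sketch. The function names and the example scores are hypothetical; only the 70% threshold and the requirement to pass every assignment come from this syllabus. The actual grading is done in CodeGrade and Canvas, not with this code.

```python
# Hypothetical helper names and example scores; only the 70% threshold
# and the "pass every assignment" rule are taken from the syllabus.
PASS_PERCENT = 70


def assignment_passed(score: int, max_score: int) -> bool:
    """An assignment is a pass with at least 70% of the maximum score."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return score * 100 >= PASS_PERCENT * max_score


def course_passed(results: dict[str, tuple[int, int]]) -> bool:
    """The course is a pass only if every assignment is a pass."""
    return all(assignment_passed(s, m) for s, m in results.values())


# Made-up scores as (points obtained, maximum points).
example = {"Assignment 1": (7, 10), "Quiz 1": (13, 20)}
print(course_passed(example))
```

In this made-up example the first item reaches exactly 70% and passes, while the second reaches 65%, so the course as a whole would not yet be passed.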
Attendance at lectures is to be considered mandatory. Lab sessions consist of independent work on the assignments, and there will be course staff available for help.
There is reading attached to each lecture, which you are expected to complete before the lecture. The lectures are structured under the assumption that you have read the text in the course textbook, and the content of the textbook will not be repeated. Instead, the lectures complement the textbook and offer further examples, proofs, details etc.
There is a Discussions forum on Canvas that can be used to contact the lecturer and course staff outside lecture and lab hours, to ask for help or give suggestions.
Changes made since the last occasion
CodeGrade has been introduced and is being piloted for a large-scale course. The system automatically grades technical aspects of the reports that combine text, images, and code, in the form of a Jupyter Notebook.
Examination form
The course is graded PASS/FAIL (G/U).
There is no separate exam. Instead, you pass the course once you have passed all eight mandatory assignments. Assignments are done in groups of two students.
Quizzes are done individually (see https://chalmers.instructure.com/courses/34129/discussion_topics/202931).
Policy on generative AI
The use of text-generating tools for generating assignment reports is forbidden. You are supposed to write your reports yourself.
Furthermore, you are discouraged from using tools such as ChatGPT for "searching for content". Generative AI is very prone to hallucinating and producing convincing, yet completely wrong answers. Do not rely on such tools. You cannot know whether the answer is correct or not, unless you already know the answer. We will discuss this problem towards the end of the course.
Here's what ChatGPT has to say about this:
Prompt: Explain why students should not use ChatGPT to complete their mandatory programming exercises in an introductory data science class.
Students should not use ChatGPT to complete their mandatory programming exercises in an introductory data science class for several reasons:
1. Hinders Learning and Skill Development
- Hands-On Practice: Programming and data science require hands-on practice to build problem-solving skills, understand concepts, and gain confidence. Relying on ChatGPT deprives students of the opportunity to engage actively with the material.
- Conceptual Gaps: Without doing the exercises themselves, students miss the chance to deeply understand the foundational concepts and techniques essential for future success.
2. Academic Integrity
- Ethical Considerations: Submitting work generated by ChatGPT as one’s own violates principles of academic honesty. It’s akin to plagiarism, which undermines the educational process and devalues the credentials earned.
- Consequences of Misconduct: Most institutions have strict policies regarding academic dishonesty, and violations can lead to penalties such as failing grades or disciplinary actions.
3. Preparation for Real-World Challenges
- Building Problem-Solving Skills: The purpose of exercises is to prepare students for real-world data science tasks, which require logical reasoning and the ability to debug code. These skills can only be developed through practice.
- Adapting to Complex Scenarios: Real-world problems often require iterative thinking and creativity, which cannot be outsourced. Students who rely on AI tools will struggle when faced with such challenges.
4. Missed Learning Objectives
- Core Competencies: Introductory courses are designed to teach students how to think algorithmically, write clean code, and analyze data critically. These goals cannot be achieved if students bypass the process using AI.
- Foundational Knowledge: Skills like debugging, error handling, and understanding algorithms are foundational for advanced topics. Skipping these basics creates knowledge gaps that are hard to address later.
5. Dependency on AI Tools
- Over-Reliance: Relying on ChatGPT discourages independent thinking and problem-solving, which are critical for success in any technical field.
- Limitations of AI: ChatGPT can sometimes provide incorrect or incomplete solutions, and students who depend on it may fail to recognize errors or learn how to fix them.
6. Learning How to Learn
- Iterative Learning Process: Struggling with coding challenges helps students learn how to research, ask the right questions, and find solutions—a critical skill in any career.
- Building Confidence: Completing exercises independently builds confidence and demonstrates that students can overcome obstacles on their own.
While ChatGPT can be a helpful tool for clarifying concepts, providing guidance, or debugging code, it should not replace the learning process. Instead, students should use it as a supplementary resource to enhance their understanding rather than as a shortcut to completing assignments.
Plagiarism policy
You are not allowed to copy pieces of code from students of other groups. You may discuss the problems, but you may not share code.
You may not publish your solutions. Do not put your code into a public GitHub repository, for example.
If you use materials you find on the Internet (e.g., Wikipedia, Stack Overflow, Reddit discussions), you must attribute the source. Matching code snippets without proper attribution mean you are presenting others’ work as your own, which is considered plagiarism.
Cases where plagiarism is spotted will be referred to the disciplinary committee of the university and may lead to suspension.
Tools
Python libraries
The following Python libraries are going to be used on this course:
If you want to run these tools on your own computer, it is strongly recommended that you set up an environment with Anaconda. Anaconda provides support for multiple environments and package management, and runs on the most popular operating systems (Microsoft Windows, Apple macOS, GNU/Linux). An Anaconda installation also makes it easy to use Jupyter Notebooks.
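Once an environment is set up, a quick way to check whether it contains the packages you need is to ask Python whether each one can be imported. The sketch below is only an illustration: the helper name `missing_packages` is hypothetical, and the names in `wanted` are standard-library placeholders that you would replace with the import names of the course libraries.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names in `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Placeholder names (standard library); replace them with the import
# names of the libraries used in the course.
wanted = ["json", "csv"]
missing = missing_packages(wanted)
print("Missing packages:", missing)
```

An empty list means every listed module can be found in the active environment. The check only handles top-level module names, not dotted submodule paths.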
JupyterHub
We have a shared JupyterHub that allows you to work on Jupyter notebooks on the department servers. This is especially convenient for assignments 6 and 7, which require the installation of some packages, as the packages are already available on the server!
You can find out more about the cluster and how to apply for access here: https://git.chalmers.se/karppa/minerva
Once you've been given access, you can log in with your CID here: http://minerva.cse.chalmers.se/jupyter
Note: The server is only accessible from within the Chalmers network. Accessing it from outside Chalmers requires you to use the VPN.