Data-intensive Systems and Algorithms | University of Stavanger

Facts

Course code

DAT535

Version

Credits (ECTS)

Semester tuition start

Autumn

Number of semesters

Exam semester

Autumn

Language of instruction

English

Offered by

Faculty of Science and Technology, Department of Electrical Engineering and Computer Science

Content

The emergence of Big Data and Data-intensive Systems as specialized fields in computing has motivated the development of new techniques and technologies for extracting knowledge from large datasets. Popular interest in data-intensive systems has grown since Hadoop was conceived in 2005. Over time, this has resulted in a collection of technologies, methodologies, and practices that cover the complete data lifecycle.

This course is a first step towards a variety of roles related to data-intensive systems. The core topics and tasks in these roles that we will address are: roles in a data team; low-level algorithm design and implementation (direct implementation of MapReduce jobs); high-level algorithm design and implementation (using a data processing framework, e.g. Spark SQL or MLlib); dataflow design (data pipelines); algorithm optimisation; advocating the application of technology in both technical and non-technical settings; and providing introductory training to coworkers.
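To illustrate the low-level style mentioned above, here is a minimal sketch of the map, shuffle, and reduce phases of a word count in plain Python. It is not taken from the course material. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example, and everything runs in a single process, whereas a framework such as Hadoop distributes the phases across machines and performs the grouping step itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all values by key (done by the framework in a real cluster)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical two-line input, for illustration only.
    print(word_count(["big data", "Big systems"]))
```

In the high-level style, the same aggregation would typically be expressed as a single group-by query in a framework such as Spark SQL rather than as hand-written phases.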

Learning outcome

Knowledge

Characterize the Hadoop architecture, including the job tracker, task tracker, scheduling issues, communications, and resource management
Characterize the Spark/Databricks architecture, including the context, cluster manager, worker nodes, and executors
Describe elements of the Hadoop/Spark ecosystem and identify their applicability
Describe and compare RDBMS, NoSQL databases, data warehouses, unstructured big data, and keyed files, and show how to apply them to typical data processing problems

Skills

Assume various roles in a data team
Use and reconfigure a data processing setup (based on Hadoop/Spark/Databricks, OpenStack, or another cloud setup)
Analyze real-life problems and propose suitable solutions
Construct and optimize algorithms and dataflows based on relevant tools for typical problems

General qualifications

Evaluate, communicate, and defend a data-intensive solution with respect to relevant criteria

Required prerequisite knowledge

Python programming

Recommended prerequisites

DAT220 Database Systems, DAT320 Operating Systems and Systems Programming, DAT515 Cloud Computing Technologies

Bash programming

Administration of Cloud and container-based environments

Databases, SQL

Exam

Form of assessment	Weight	Duration	Marks	Aid
Project	1/1		Letter grades

The project is completed in groups. A student who fails the project must retake it the next time the course is given.

Coursework requirements

Oral presentation, Mandatory Assignments

Three assignments

Students start with three mandatory assignments that involve programming and system administration. The assignments are to be completed individually. All mandatory assignments must be passed by their deadlines for the student to be allowed to start the project. Passed assignments give access to the project only in the current semester.

Mandatory lab assignments must be completed at the times and in the groups that are assigned and published. Absence due to illness or other reasons must be reported to the laboratory personnel as soon as possible. Students cannot expect arrangements for completing lab assignments at other times unless this has been agreed with the laboratory personnel in advance.

All group members must participate in the project presentation.

Course teacher(s)

Course coordinator:

Tomasz Wiktorski

Laboratory Engineer:

Jayachander Surbiryala

Head of Department:

Tom Ryen

Method of work

The work consists of 6 hours per week of lectures, scheduled laboratory work, and supervised group work in the second half of the semester. Students are expected to spend an additional 6-8 hours a week on self-study, group discussions, and development work (open laboratory).

Overlapping courses

Course	Reduction (credits)
Data-intensive Systems (DAT500_1)	5

Open for

Data Science - Master of Science Degree Programme

Computer Science - Master of Science Degree Programme

Computer Science - Master of Science Degree Programme, Part-Time

Exchange programme at Faculty of Science and Technology

Course assessment

There must be an early dialogue between the course supervisor, the student union representative, and the students. Its purpose is to gather student feedback for changes and adjustments to the course in the current semester. In addition, a digital subject evaluation must be carried out at least every three years. Its purpose is to gather the students' experiences with the course.

Literature

The syllabus can be found in Leganto
