Saarland University Saarland University | Department of Computer Science

Big Data Analytics Group

Database Systems WS 08/09

This core lecture covers fundamental data managing techniques which are used to build (not only) database systems. These techniques are also commonly used to build any other kind of information systems including search engines, file systems, data warehouses, publish/subscribe systems, streaming systems, map services (google maps et.al.), etc. Therefore a better name for this lecture might be "Foundations of Data Management".

Topics will include:

  • storage media (tape, disk, flash, main memory)
  • storage management (DB-file systems, raw devices, write-strategies, differential files, buffer management)
  • data layout (horizontal and vertical partitioning, columns, hybrid mappings, compression, free memory management, defragmentation)
  • indexing (one- and multidimensional, tree-structured, hash-, partition-based, bulk-loading and external sorting, differential indexing, LSM and stepped merge trees, read- and write-optimized indexing, data warehouse indexing, text indexing, main-memory indexes, sparse and dense, direct and indirect, clustered and unclustered, main memory versus disk and/or flash-based)
  • operator models (push- and pull-model, block-based)
  • operator implementations (join algorithms for relational, spatial, and multidimensional data, grouping and early aggregation, filtering)
  • query processing (scanning, plan computation, SIMD)
  • query optimization (query rewrite, cost models, cost-based optimization, join order, plan enumeration)
  • data recovery (single versus multiple instance, logging, ARIES)
  • parallelization of data and queries (horizontal and vertical partitioning, replication, distributed query processing, 2PC, map-reduce, PNUTS, pig and pig latin)
  • read-optimized system concepts (search engines, data warehouses, OLAP, ad-hoc analytics)
  • write-optimized system concepts (OLTP, HStore, streaming data, moving objects)
  • management of geographical data (GIS, google maps, computer games and simulations)
  • management of high-dimensional data (OLAP, multi-dimensional indexing trade-offs)

Administrative issues:

  • Students are expected to have successfully passed the Information System introductory lecture held every summer semester covering introductory relational database management concepts.
  • time and place: Tuesdays and Thursdays from 12:15 to 14:00 in E1 3, HS II
  • Exams: there will be three exams: one mid-term exam, one end term exam, and a repetition exam. You need to pass two exams. In case you do not pass the mid-term or end-term exam, you will have to participate in a repetition exam. Your grade will be computed based on your two best individual exam scores which determine 70% of your overall grade for this course.
  • Exercises: you need to participate in the exercise groups regularly in order to pass the lecture. The material and exercises covered in the exercise sessions will be indispensable to successfully pass the exams. In the exercise sessions you will be grouped into small teams with other students and solve small data managing task: project description. You may use C, C++, or Java for coding. Your code should be able to execute on a linux server. Your individual contribution to these teams will be evaluated twice during the semester and will be graded. You need to pass both of these evaluations to pass the lecture. The grade of these evaluations will influence your overall grade by 30%.

Lecture Notes (some material accessible from Uni and MPI networks only)

  • Week 1: 2 pages 6 pages
  • Week 2: 2 pages 6 pages
  • Week 3: 2 pages 6 pages (updated: bug on slide 74)
  • Week 3b: 1/1 page (note: added on Nov 10)
  • Week 4: 1/1 page (note: updated and extended on Nov 10, formerly named "Week 4b")
  • Week 5a: 1/1 page (updated and extended on Nov 20, updated on Nov 25: bug on slides 64 and 65)
  • Week 6b: 1/1 page (Thanks to Stephen Blott for providing his VLDB 2008 slides on the VA-file)
  • Week 7: 1/1 page (update on 04/12 8pm, bug on slide 96; updated on 09/12, bug on slide 38)
  • Week 9: 1/1 page
  • Week 10: 1/1 page (updated on 06/01: some extensions, bug fixes on slides 42 and 44; updated on 11/01: slide 44)
  • Week 10b: 1/1 page (minor update on 29/01/09)
  • Week 11: 1/1 page
  • Week 12a: 1/1 page
  • Week 12b: 1/1 page
  • Week 13a: 1/1 page
  • Week 13b: 1/1 page (05/02/09, more information on Information Systems Lab in SoSe 09 available)
  • Week 14a: 1/1 page
  • Week 14b: 1/1 page
  • Week 15a: 1/1 page
  • Feb 12, 12.15pm: final exam

Exercises (material accessible from Uni and MPI networks only)

  1. Assignments

  2. Sample Solutions
    • sample solutions for the assignments in pdf will be provided one week before each exam. why "sample"?: for many assignments there may not only be a single right solution but multiple ones.

  3. Project
    • (16/03/09) Rules for handing-in your project: send us (Jens Dittrich and the tutor who is running the final oral project presentation session) a bundle (=zip-file) with the following information until March 31: (1) entire source-code (Including source provided on the contest web-site or our Java translation), (2) if you are using C or C++: build-scripts either make or ant (3) source code and compiled versions of any (non-standard) libraries you are using, (4) INSTALL.txt explaining how to install run your code (4) two-page README.pdf explaining what you did, which threads you start and how you partition the workload to different indexes.
    • Note that you need to pass the unittest and the speedtest. You should use at least the version as of today, which is for Java this version and for C/C++ this version. For the oral presentation you will soon be contacted by your tutor or Jens Dittrich to make an appointment. Please make sure that all of your team members can attend the presentation. People not attending will not receive a grade for the project.
    • (10/03/09) Update on Java translation of speed test due to update on original C-version available.
    • (04/03/09) The deadline for handing in your project is postponed to March 31.
    • (24/02/09) Small bug fix on Java translation of speed test (=benchmark) available.
    • (24/02/09) A test server to measure the performance of your code has been made available
    • (18/02/09) Java translation of speed test (=benchmark) available.
    • (05/02/09) The benchmark was made available. We are working on a Java translation.
    • (03/02/09) The second milestone will be merged with the final hand-in of the project. In other words: you do not have to (and should not) hand-in anything today. You should hand-in your project until March 15 latest (postponed, see above!). We will also do a second oral examiniation session for each team. Final hand-in and second oral examination session will determine the majority of your grade for the project (recall: project counts 30% of the overall grade)
    • (29/01/09) Improved documentation for Java interfaces and test case available
    • (28/01/09) Java Interfacs and Test case available. Note: we used a 1:1 translation from C as we want to be able to easily mirror changes that may be done to the original interfaces and tests (We know that it could be done much nicer in Java). If you are planning to submit your project in Java, please use these interfaces. Make sure that your test executes for your index implementation. For the final hand-in (March 15) make sure that you make sensible use of multi-threading. Locking all indexes to execute only one transaction in the system at a time is not an option.
    • (27/01/09) Udated Java source code from Swift team available.
    • (26/01/09) Official information on availability of the workload: "hopefully within the [current] week"
    • (20/01/09) Java source code from Swift team available.
    • (16/01/09) Specification of programming task was updated.
    • (16/12/08) Detailed specification on project available.
    • (02/12/08) There are two days for the source code evaluation: Wednesday Dec 10, 12:00 14:00 and Friday Dec 12, 12:00 16:00. Please choose one of the offered days and inform your tutor during the tutorial this week. If you are not attending a tutorial this week, please send an email to Yassen Assenov (yassen@mpi...). The evaluation is in the form of a discussion and takes 20 minutes on average. You will receive the exact time slot and location (seminar room number) by email. For the source code evaluation, all team members must be present. Teams are strongly encouraged to bring a laptop with their project set up. In case you cannot bring a laptop, please contact Yassen Assenov (yassen@mpi...) at least four days before the evaluation appointment.
    • (01/12/08) Some comments on the first milestone: Thanks to everybody for submitting your code (more or less) on time. In case you did not finish everything, you could not implement all operations, your code is not particularly fast, or you had trouble organizing your team: don't panic! This milestone will count at most 7-8% of the overall grade. Recall, that the entire project counts 30% of your overall grade. The major part of this 30% will be determined by the quality and performance of the project version you hand-in at the end of the semester. If the final hand-in is really great, it may even compensate for missed points from previous milestones...
    • Initial Project Description

  4. Other Material and Information
    • (23/04/09) Database Systems Course: if you passed the lecture, you may fetch your certificate ("Schein") from Angelika Scholl-Danopoulos during office hours.
    • (22/04/09) Final grades available.
    • (17/04/09) results of repetition exam available
    • (03/03/09) You may inspect your exam on March 12, from 12.30 to 13.30 at E1 1, room 221.
    • (24/02/09) Repetition Exam on Thursday, April 16, 12:15pm, HS 2. If you need/want to participate in the repetition exam, please register for the exam by sending an email to my assistant angelika.scholl-danopoulos "at" cs.
    • (24/02/09) results of final exam available
    • (27/01/09) Final Exam covers material from second half of the lecture, i.e., everything starting from and including Week 9.
    • (20/01/09) Final Exam on Thursday, Feb 12, 12.15-14:00
    • (05/01/09) You may inspect the corrected version of your exam on Wed, Jan 14, starting at 2:00 pm. If you need to inspect your exam, please enroll during classes (Do not send email).
    • (17/12/08) results of mid term exam
    • link to graphviz visualization software
    • Distribution to exercise groups.
    • Rooms: group Wed 12-14: room 22 at MPII, groups Thu 10-12 and Fri 12-14: room 015 at UdS