This course describes techniques for processing very large data sets, which are typically stored across multiple machines in a cluster. It is primarily a programming course, although some topics in cluster administration and configuration are also discussed. Technologies covered include Hadoop (MapReduce), Apache Spark, Apache Kafka, and, as time allows, other specialized technologies (e.g., Pig). Fluency with Java is required; experience with Scala is helpful but not essential.
-
School
School of Engineering and Computing
-
Number
5250
-
Subject
Computer (CIS)
-
Semester
As Required
-
Lecture/Lab/Seminar Hours
3 hours of lecture per week
-
Prerequisites
-
Credits
3