This course describes techniques for processing very large data sets that are typically stored across multiple machines in a cluster. It’s primarily a programming course, although some topics in cluster administration and configuration are also discussed. Technologies covered include Hadoop (MapReduce), Apache Spark, and Apache Kafka, with other specialized tools (e.g. Pig) as time allows. Fluency with Java is required; experience with Scala is helpful but not essential.