Shuffling

groupByKey()
Shuffles every key-value pair across the network so that all values for each key are brought together
reduceByKey(func: (V, V) => V): RDD[(K, V)]
Conceptually, reduceByKey can be thought of as a combination of first doing groupByKey and then reducing on all the values grouped per key.
Reduces on the mapper side first
Reduces again after shuffling
Less data needs to be sent over the network
Non-trivial gains in performance
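
The difference in shuffled data can be sketched without a cluster. The example below is a minimal simulation, not Spark code: plain Scala collections stand in for partitions, and the names ShuffleSketch, partitions, reducePairs, groupThenReduce and reduceBothSides, as well as the sample data, are all hypothetical. It counts how many records would cross the network under each strategy.

```scala
// Plain Scala collections stand in for Spark partitions; no Spark dependency.
object ShuffleSketch {
  type Pair = (String, Int)

  // Hypothetical input: two "mapper" partitions of (word, 1) pairs.
  val partitions: Seq[Seq[Pair]] = Seq(
    Seq("a" -> 1, "b" -> 1, "a" -> 1, "a" -> 1),
    Seq("b" -> 1, "a" -> 1, "b" -> 1)
  )

  // Merge records that share a key using func; this is the "reduce" step.
  def reducePairs(records: Seq[Pair], func: (Int, Int) => Int): Map[String, Int] =
    records.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(func) }

  // groupByKey style: every record is shuffled, then values are reduced per key.
  def groupThenReduce(func: (Int, Int) => Int): (Map[String, Int], Int) = {
    val shuffled = partitions.flatten
    (reducePairs(shuffled, func), shuffled.size)
  }

  // reduceByKey style: reduce inside each partition first, shuffle only the
  // partial results, then reduce again.
  def reduceBothSides(func: (Int, Int) => Int): (Map[String, Int], Int) = {
    val shuffled = partitions.flatMap(p => reducePairs(p, func).toSeq)
    (reducePairs(shuffled, func), shuffled.size)
  }

  def main(args: Array[String]): Unit = {
    val (grouped, groupedShuffled) = groupThenReduce(_ + _)
    val (reduced, reducedShuffled) = reduceBothSides(_ + _)
    println(s"groupByKey style:  $grouped, records shuffled: $groupedShuffled")
    println(s"reduceByKey style: $reduced, records shuffled: $reducedShuffled")
  }
}
```

Both strategies arrive at the same totals (a = 4, b = 3), but the first moves all 7 records while the second moves only 4, one per key per partition. In a real job the saving depends on how often keys repeat within a partition.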
