{"id":344,"date":"2020-12-01T12:07:32","date_gmt":"2020-12-01T04:07:32","guid":{"rendered":"https:\/\/blog.frost-s.tk\/?p=344"},"modified":"2021-01-11T00:07:14","modified_gmt":"2021-01-10T16:07:14","slug":"spark%e7%ae%97%e5%ad%90","status":"publish","type":"post","link":"https:\/\/blog.frost-s.com\/index.php\/2020\/12\/01\/spark%e7%ae%97%e5%ad%90\/","title":{"rendered":"Spark Operators"},"content":{"rendered":"\n<p style=\"font-weight:bold\" class=\"has-custom-weight\"><strong>Classification of Spark Operators<\/strong><\/p>\n\n\n\n<p>Broadly speaking, Spark operators fall into two categories:<\/p>\n\n\n\n<p>1) Transformation operators: these do not trigger job submission; they carry out the intermediate processing steps of a job.<\/p>\n\n\n\n<p>Transformations are lazily evaluated: a transformation that derives one RDD from another is not executed immediately, and the computation is actually triggered only when an Action is invoked.<\/p>\n\n\n\n<p>2) Action operators: these cause SparkContext to submit a job.<\/p>\n\n\n\n<p>An Action triggers Spark to submit a job and outputs data from the Spark system, for example by returning results to the driver or writing them to external storage.<\/p>\n\n\n\n<p>At a finer granularity, Spark operators can be divided into three categories:<\/p>\n\n\n\n<p>1) Transformation operators on Value-type data: they do not trigger job submission, and the data items they process are single values.<\/p>\n\n\n\n<p>2) Transformation operators on Key-Value data: they do not trigger job submission, and the data items they process are key-value pairs.<\/p>\n\n\n\n<p>3) Action operators: these cause SparkContext to submit a job.<\/p>\n\n\n\n<p>1) Transformation operators on Value-type data<\/p>\n\n\n\n<p>I. One-to-one mapping between input and output partitions<\/p>\n\n\n\n<p>1. map<\/p>\n\n\n\n<p>2. flatMap<\/p>\n\n\n\n<p>3. mapPartitions<\/p>\n\n\n\n<p>4. glom<\/p>\n\n\n\n<p>II. Many-to-one mapping between input and output partitions<\/p>\n\n\n\n<p>5. union<\/p>\n\n\n\n<p>6. cartesian<\/p>\n\n\n\n<p>III. Many-to-many mapping between input and output partitions<\/p>\n\n\n\n<p>7. groupBy<\/p>\n\n\n\n<p>IV. Output partitions are subsets of input partitions<\/p>\n\n\n\n<p>8. filter<\/p>\n\n\n\n<p>9. distinct<\/p>\n\n\n\n<p>10. subtract<\/p>\n\n\n\n<p>11. sample<\/p>\n\n\n\n<p>12. takeSample<\/p>\n\n\n\n<p>V. Caching<\/p>\n\n\n\n<p>13. cache<\/p>\n\n\n\n<p>14. persist<\/p>\n\n\n\n<p>2) Transformation operators on Key-Value data<\/p>\n\n\n\n<p>I. One-to-one mapping between input and output partitions<\/p>\n\n\n\n<p>15. mapValues<\/p>\n\n\n\n<p>II. Aggregation over a single RDD or two RDDs<\/p>\n\n\n\n<p>Single-RDD aggregation<\/p>\n\n\n\n<p>16. combineByKey<\/p>\n\n\n\n<p>17. reduceByKey<\/p>\n\n\n\n<p>18. partitionBy<\/p>\n\n\n\n<p>Two-RDD aggregation<\/p>\n\n\n\n<p>19. cogroup<\/p>\n\n\n\n<p>III. Joins<\/p>\n\n\n\n<p>20. join<\/p>\n\n\n\n<p>21. leftOuterJoin and rightOuterJoin<\/p>\n\n\n\n<p>3) Action operators<\/p>\n\n\n\n<p>I. No output<\/p>\n\n\n\n<p>22. foreach<\/p>\n\n\n\n<p>II. HDFS<\/p>\n\n\n\n<p>23. saveAsTextFile<\/p>\n\n\n\n<p>24. saveAsObjectFile<\/p>\n\n\n\n<p>III. Scala collections and data types<\/p>\n\n\n\n<p>25. collect<\/p>\n\n\n\n<p>26. collectAsMap<\/p>\n\n\n\n<p>27. reduceByKeyLocally<\/p>\n\n\n\n<p>28. lookup<\/p>\n\n\n\n<p>29. count<\/p>\n\n\n\n<p>30. top<\/p>\n\n\n\n<p>31. reduce<\/p>\n\n\n\n<p>32. fold<\/p>\n\n\n\n<p>33. aggregate<\/p>\n\n\n\n<p><strong>1. 
Transformations \u7b97\u5b50<\/strong><\/p>\n\n\n\n<p><strong>\uff081\uff09&nbsp;map<\/strong><\/p>\n\n\n\n<p>\u5c06\u539f\u6765 RDD \u7684\u6bcf\u4e2a\u6570\u636e\u9879\u901a\u8fc7&nbsp;map \u4e2d\u7684\u7528\u6237\u81ea\u5b9a\u4e49\u51fd\u6570 f&nbsp;\u6620\u5c04\u8f6c\u53d8\u4e3a\u4e00\u4e2a\u65b0\u7684\u5143\u7d20\u3002\u6e90\u7801\u4e2d map \u7b97\u5b50\u76f8\u5f53\u4e8e\u521d\u59cb\u5316\u4e00\u4e2a RDD\uff0c \u65b0 RDD \u53eb\u505a MappedRDD(this, sc.clean(f))\u3002<\/p>\n\n\n\n<p>&nbsp; &nbsp; &nbsp;\u56fe 1\u4e2d\u6bcf\u4e2a\u65b9\u6846\u8868\u793a\u4e00\u4e2a RDD \u5206\u533a\uff0c\u5de6\u4fa7\u7684\u5206\u533a\u7ecf\u8fc7\u7528\u6237\u81ea\u5b9a\u4e49\u51fd\u6570 f:T-&gt;U&nbsp;\u6620\u5c04\u4e3a\u53f3\u4fa7\u7684\u65b0 RDD \u5206\u533a\u3002\u4f46\u662f\uff0c\u5b9e\u9645\u53ea\u6709\u7b49\u5230 Action\u7b97\u5b50\u89e6\u53d1\u540e\uff0c\u8fd9\u4e2a f \u51fd\u6570\u624d\u4f1a\u548c\u5176\u4ed6\u51fd\u6570\u5728\u4e00\u4e2astage \u4e2d\u5bf9\u6570\u636e\u8fdb\u884c\u8fd0\u7b97\u3002\u5728\u56fe 1 \u4e2d\u7684\u7b2c\u4e00\u4e2a\u5206\u533a\uff0c\u6570\u636e\u8bb0\u5f55 V1 \u8f93\u5165 f\uff0c\u901a\u8fc7 f \u8f6c\u6362\u8f93\u51fa\u4e3a\u8f6c\u6362\u540e\u7684\u5206\u533a\u4e2d\u7684\u6570\u636e\u8bb0\u5f55 V\u20191\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-354x226.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"354\" height=\"226\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-354x226.png\" 
src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-346\"\/><\/div><\/figure>\n\n\n\n<p>\u56fe1 &nbsp; &nbsp;map \u7b97\u5b50\u5bf9 RDD \u8f6c\u6362<\/p>\n\n\n\n<p><strong>&nbsp; &nbsp; \uff082\uff09&nbsp;flatMap<\/strong><\/p>\n\n\n\n<p>&nbsp; &nbsp; &nbsp;\u5c06\u539f\u6765 RDD \u4e2d\u7684\u6bcf\u4e2a\u5143\u7d20\u901a\u8fc7\u51fd\u6570 f \u8f6c\u6362\u4e3a\u65b0\u7684\u5143\u7d20\uff0c\u5e76\u5c06\u751f\u6210\u7684 RDD \u7684\u6bcf\u4e2a\u96c6\u5408\u4e2d\u7684\u5143\u7d20\u5408\u5e76\u4e3a\u4e00\u4e2a\u96c6\u5408\uff0c\u5185\u90e8\u521b\u5efa FlatMappedRDD(this\uff0csc.clean(f))\u3002<\/p>\n\n\n\n<p>\u56fe 2 \u8868 \u793a RDD \u7684 \u4e00 \u4e2a \u5206 \u533a \uff0c\u8fdb \u884c flatMap\u51fd \u6570 \u64cd \u4f5c\uff0c flatMap \u4e2d \u4f20 \u5165 \u7684 \u51fd \u6570 \u4e3a f:T-&gt;U\uff0c&nbsp;T\u548c U \u53ef\u4ee5\u662f\u4efb\u610f\u7684\u6570\u636e\u7c7b\u578b\u3002\u5c06\u5206\u533a\u4e2d\u7684\u6570\u636e\u901a\u8fc7\u7528\u6237\u81ea\u5b9a\u4e49\u51fd\u6570 f \u8f6c\u6362\u4e3a\u65b0\u7684\u6570\u636e\u3002\u5916\u90e8\u5927\u65b9\u6846\u53ef\u4ee5\u8ba4\u4e3a\u662f\u4e00\u4e2a RDD \u5206\u533a\uff0c\u5c0f\u65b9\u6846\u4ee3\u8868\u4e00\u4e2a\u96c6\u5408\u3002 V1\u3001 V2\u3001 V3 \u5728\u4e00\u4e2a\u96c6\u5408\u4f5c\u4e3a RDD \u7684\u4e00\u4e2a\u6570\u636e\u9879\uff0c\u53ef\u80fd\u5b58\u50a8\u4e3a\u6570\u7ec4\u6216\u5176\u4ed6\u5bb9\u5668\uff0c\u8f6c\u6362\u4e3aV\u20191\u3001 V\u20192\u3001 V\u20193 \u540e\uff0c\u5c06\u539f\u6765\u7684\u6570\u7ec4\u6216\u5bb9\u5668\u7ed3\u5408\u62c6\u6563\uff0c\u62c6\u6563\u7684\u6570\u636e\u5f62\u6210\u4e3a RDD \u4e2d\u7684\u6570\u636e\u9879\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' 
href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-1-368x271.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"368\" height=\"271\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-1-368x271.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-345\"\/><\/div><\/figure>\n\n\n\n<p>\u56fe2 &nbsp; &nbsp;\u3000flapMap \u7b97\u5b50\u5bf9 RDD \u8f6c\u6362<\/p>\n\n\n\n<p><strong>&nbsp; &nbsp;&nbsp;\uff083\uff09&nbsp;mapPartitions<\/strong><\/p>\n\n\n\n<p>&nbsp; &nbsp; &nbsp; mapPartitions \u51fd \u6570 \u83b7 \u53d6 \u5230 \u6bcf \u4e2a \u5206 \u533a \u7684 \u8fed \u4ee3\u5668\uff0c\u5728 \u51fd \u6570 \u4e2d \u901a \u8fc7 \u8fd9 \u4e2a \u5206 \u533a \u6574 \u4f53 \u7684 \u8fed \u4ee3 \u5668 \u5bf9\u6574 \u4e2a \u5206 \u533a \u7684 \u5143 \u7d20 \u8fdb \u884c \u64cd \u4f5c\u3002 \u5185 \u90e8 \u5b9e \u73b0 \u662f \u751f \u6210<\/p>\n\n\n\n<p>MapPartitionsRDD\u3002\u56fe 3 \u4e2d\u7684\u65b9\u6846\u4ee3\u8868\u4e00\u4e2a RDD \u5206\u533a\u3002\u56fe 3 \u4e2d\uff0c\u7528\u6237\u901a\u8fc7\u51fd\u6570 f (iter)=&gt;iter.f ilter(_&gt;=3) \u5bf9\u5206\u533a\u4e2d\u6240\u6709\u6570\u636e\u8fdb\u884c\u8fc7\u6ee4\uff0c\u5927\u4e8e\u548c\u7b49\u4e8e 3 \u7684\u6570\u636e\u4fdd\u7559\u3002\u4e00\u4e2a\u65b9\u5757\u4ee3\u8868\u4e00\u4e2a RDD \u5206\u533a\uff0c\u542b\u6709 1\u3001 2\u3001 3 \u7684\u5206\u533a\u8fc7\u6ee4\u53ea\u5269\u4e0b\u5143\u7d20 3\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' 
href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-2-355x246.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"355\" height=\"246\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-2-355x246.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-347\"\/><\/div><\/figure>\n\n\n\n<p>\u56fe3 &nbsp;mapPartitions \u7b97\u5b50\u5bf9 RDD \u8f6c\u6362<\/p>\n\n\n\n<p><strong>\uff084\uff09glom<\/strong><\/p>\n\n\n\n<p>glom\u51fd\u6570\u5c06\u6bcf\u4e2a\u5206\u533a\u5f62\u6210\u4e00\u4e2a\u6570\u7ec4\uff0c\u5185\u90e8\u5b9e\u73b0\u662f\u8fd4\u56de\u7684GlommedRDD\u3002 \u56fe4\u4e2d\u7684\u6bcf\u4e2a\u65b9\u6846\u4ee3\u8868\u4e00\u4e2aRDD\u5206\u533a\u3002\u56fe4\u4e2d\u7684\u65b9\u6846\u4ee3\u8868\u4e00\u4e2a\u5206\u533a\u3002 \u8be5\u56fe\u8868\u793a\u542b\u6709V1\u3001 V2\u3001 V3\u7684\u5206\u533a\u901a\u8fc7\u51fd\u6570glom\u5f62\u6210\u4e00\u6570\u7ec4Array[\uff08V1\uff09\uff0c\uff08V2\uff09\uff0c\uff08V3\uff09]\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-10-545x280.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"545\" height=\"280\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-10-545x280.png\" 
src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-355\"\/><\/div><\/figure>\n\n\n\n<p>\u56fe 4 &nbsp;&nbsp;glom\u7b97\u5b50\u5bf9RDD\u8f6c\u6362<\/p>\n\n\n\n<p><strong>&nbsp; &nbsp; &nbsp;\uff085\uff09&nbsp;union<\/strong><\/p>\n\n\n\n<p>&nbsp; &nbsp; &nbsp; \u4f7f\u7528 union \u51fd\u6570\u65f6\u9700\u8981\u4fdd\u8bc1\u4e24\u4e2a RDD \u5143\u7d20\u7684\u6570\u636e\u7c7b\u578b\u76f8\u540c\uff0c\u8fd4\u56de\u7684 RDD \u6570\u636e\u7c7b\u578b\u548c\u88ab\u5408\u5e76\u7684 RDD \u5143\u7d20\u6570\u636e\u7c7b\u578b\u76f8\u540c\uff0c\u5e76\u4e0d\u8fdb\u884c\u53bb\u91cd\u64cd\u4f5c\uff0c\u4fdd\u5b58\u6240\u6709\u5143\u7d20\u3002\u5982\u679c\u60f3\u53bb\u91cd<\/p>\n\n\n\n<p>\u53ef\u4ee5\u4f7f\u7528 distinct()\u3002\u540c\u65f6 Spark \u8fd8\u63d0\u4f9b\u66f4\u4e3a\u7b80\u6d01\u7684\u4f7f\u7528 union \u7684 API\uff0c\u901a\u8fc7 ++ \u7b26\u53f7\u76f8\u5f53\u4e8e union \u51fd\u6570\u64cd\u4f5c\u3002<\/p>\n\n\n\n<p>&nbsp; &nbsp; &nbsp;\u56fe 5 \u4e2d\u5de6\u4fa7\u5927\u65b9\u6846\u4ee3\u8868\u4e24\u4e2a RDD\uff0c\u5927\u65b9\u6846\u5185\u7684\u5c0f\u65b9\u6846\u4ee3\u8868 RDD \u7684\u5206\u533a\u3002\u53f3\u4fa7\u5927\u65b9\u6846\u4ee3\u8868\u5408\u5e76\u540e\u7684 RDD\uff0c\u5927\u65b9\u6846\u5185\u7684\u5c0f\u65b9\u6846\u4ee3\u8868\u5206\u533a\u3002<\/p>\n\n\n\n<p>\u542b\u6709V1\u3001V2\u3001U1\u3001U2\u3001U3\u3001U4\u7684RDD\u548c\u542b\u6709V1\u3001V8\u3001U5\u3001U6\u3001U7\u3001U8\u7684RDD\u5408\u5e76\u6240\u6709\u5143\u7d20\u5f62\u6210\u4e00\u4e2aRDD\u3002V1\u3001V1\u3001V2\u3001V8\u5f62\u6210\u4e00\u4e2a\u5206\u533a\uff0cU1\u3001U2\u3001U3\u3001U4\u3001U5\u3001U6\u3001U7\u3001U8\u5f62\u6210\u4e00\u4e2a\u5206\u533a\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' 
href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-9-302x453.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"302\" height=\"453\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-9-302x453.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-354\"\/><\/div><\/figure>\n\n\n\n<p>\u56fe 5 &nbsp;union \u7b97\u5b50\u5bf9 RDD \u8f6c\u6362<\/p>\n\n\n\n<p><strong>\uff086\uff09&nbsp;cartesian<\/strong><\/p>\n\n\n\n<p>&nbsp; &nbsp; &nbsp; &nbsp;\u5bf9 \u4e24 \u4e2a RDD \u5185 \u7684 \u6240 \u6709 \u5143 \u7d20&nbsp;\u8fdb \u884c \u7b1b \u5361 \u5c14 \u79ef \u64cd \u4f5c\u3002 \u64cd \u4f5c \u540e\uff0c \u5185 \u90e8 \u5b9e \u73b0 \u8fd4 \u56deCartesianRDD\u3002\u56fe6\u4e2d\u5de6\u4fa7\u5927\u65b9\u6846\u4ee3\u8868\u4e24\u4e2a RDD\uff0c\u5927\u65b9\u6846\u5185\u7684\u5c0f\u65b9\u6846\u4ee3\u8868 RDD \u7684\u5206\u533a\u3002\u53f3\u4fa7\u5927\u65b9\u6846\u4ee3\u8868\u5408\u5e76\u540e\u7684 RDD\uff0c\u5927\u65b9\u6846\u5185\u7684\u5c0f\u65b9\u6846\u4ee3\u8868\u5206\u533a\u3002\u56fe6\u4e2d\u7684\u5927\u65b9\u6846\u4ee3\u8868RDD\uff0c\u5927\u65b9\u6846\u4e2d\u7684\u5c0f\u65b9\u6846\u4ee3\u8868RDD\u5206\u533a\u3002<\/p>\n\n\n\n<p>&nbsp; &nbsp; &nbsp; \u4f8b \u5982\uff1a V1 \u548c \u53e6 \u4e00 \u4e2a RDD \u4e2d \u7684 W1\u3001 W2\u3001 Q5 \u8fdb \u884c \u7b1b \u5361 \u5c14 \u79ef \u8fd0 \u7b97 \u5f62 \u6210 (V1,W1)\u3001(V1,W2)\u3001 (V1,Q5)\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' 
href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-16-385x479.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"385\" height=\"479\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-16-385x479.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-365\"\/><\/div><\/figure>\n\n\n\n<p>&nbsp; &nbsp; &nbsp; &nbsp;\u56fe 6 &nbsp;cartesian \u7b97\u5b50\u5bf9 RDD \u8f6c\u6362<\/p>\n\n\n\n<p><strong>\uff087\uff09&nbsp;groupBy<\/strong><\/p>\n\n\n\n<p>groupBy \uff1a\u5c06\u5143\u7d20\u901a\u8fc7\u51fd\u6570\u751f\u6210\u76f8\u5e94\u7684 Key\uff0c\u6570\u636e\u5c31\u8f6c\u5316\u4e3a Key-Value \u683c\u5f0f\uff0c\u4e4b\u540e\u5c06 Key \u76f8\u540c\u7684\u5143\u7d20\u5206\u4e3a\u4e00\u7ec4\u3002<\/p>\n\n\n\n<p>\u51fd\u6570\u5b9e\u73b0\u5982\u4e0b\uff1a<\/p>\n\n\n\n<p>1\uff09\u5c06\u7528\u6237\u51fd\u6570\u9884\u5904\u7406\uff1a<\/p>\n\n\n\n<p>val cleanF = sc.clean(f)<\/p>\n\n\n\n<p>2\uff09\u5bf9\u6570\u636e map \u8fdb\u884c\u51fd\u6570\u64cd\u4f5c\uff0c\u6700\u540e\u518d\u8fdb\u884c groupByKey \u5206\u7ec4\u64cd\u4f5c\u3002<\/p>\n\n\n\n<p>&nbsp; &nbsp; &nbsp;this.map(t =&gt; (cleanF(t), t)).groupByKey(p)<\/p>\n\n\n\n<p>\u5176\u4e2d\uff0c p \u786e\u5b9a\u4e86\u5206\u533a\u4e2a\u6570\u548c\u5206\u533a\u51fd\u6570\uff0c\u4e5f\u5c31\u51b3\u5b9a\u4e86\u5e76\u884c\u5316\u7684\u7a0b\u5ea6\u3002<\/p>\n\n\n\n<p>\u56fe7 \u4e2d\u65b9\u6846\u4ee3\u8868\u4e00\u4e2a RDD \u5206\u533a\uff0c\u76f8\u540ckey \u7684\u5143\u7d20\u5408\u5e76\u5230\u4e00\u4e2a\u7ec4\u3002\u4f8b\u5982 V1 \u548c V2 \u5408\u5e76\u4e3a V\uff0c Value \u4e3a 
V1,V2\u3002\u5f62\u6210 V,Seq(V1,V2)\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-11-410x316.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"410\" height=\"316\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-11-410x316.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-356\"\/><\/div><\/figure>\n\n\n\n<p>\u56fe 7&nbsp;groupBy \u7b97\u5b50\u5bf9 RDD \u8f6c\u6362<\/p>\n\n\n\n<p><strong>\uff088\uff09&nbsp;filter<\/strong><\/p>\n\n\n\n<p>&nbsp; &nbsp; filter \u51fd\u6570\u529f\u80fd\u662f\u5bf9\u5143\u7d20\u8fdb\u884c\u8fc7\u6ee4\uff0c\u5bf9\u6bcf\u4e2a \u5143 \u7d20 \u5e94 \u7528 f \u51fd \u6570\uff0c \u8fd4 \u56de \u503c \u4e3a true \u7684 \u5143 \u7d20 \u5728RDD \u4e2d\u4fdd\u7559\uff0c\u8fd4\u56de\u503c\u4e3a false \u7684\u5143\u7d20\u5c06\u88ab\u8fc7\u6ee4\u6389\u3002 \u5185 \u90e8 \u5b9e \u73b0 \u76f8 \u5f53 \u4e8e \u751f \u6210 FilteredRDD(this\uff0csc.clean(f))\u3002<\/p>\n\n\n\n<p>&nbsp; &nbsp; \u4e0b\u9762\u4ee3\u7801\u4e3a\u51fd\u6570\u7684\u672c\u8d28\u5b9e\u73b0\uff1a<\/p>\n\n\n\n<p>&nbsp; &nbsp;&nbsp;deffilter(f:T=&gt;Boolean):RDD[T]=newFilteredRDD(this,sc.clean(f))<\/p>\n\n\n\n<p>\u56fe 8 \u4e2d\u6bcf\u4e2a\u65b9\u6846\u4ee3\u8868\u4e00\u4e2a RDD \u5206\u533a\uff0c T \u53ef\u4ee5\u662f\u4efb\u610f\u7684\u7c7b\u578b\u3002\u901a\u8fc7\u7528\u6237\u81ea\u5b9a\u4e49\u7684\u8fc7\u6ee4\u51fd\u6570 
f\uff0c\u5bf9\u6bcf\u4e2a\u6570\u636e\u9879\u64cd\u4f5c\uff0c\u5c06\u6ee1\u8db3\u6761\u4ef6\u3001\u8fd4\u56de\u7ed3\u679c\u4e3a true \u7684\u6570\u636e\u9879\u4fdd\u7559\u3002\u4f8b\u5982\uff0c\u8fc7\u6ee4\u6389 V2 \u548c V3 \u4fdd\u7559\u4e86 V1\uff0c\u4e3a\u533a\u5206\u547d\u540d\u4e3a V\u20191\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-5-334x261.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"334\" height=\"261\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-5-334x261.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-350\"\/><\/div><\/figure>\n\n\n\n<p>\u56fe 8 &nbsp;filter \u7b97\u5b50\u5bf9 RDD \u8f6c\u6362<\/p>\n\n\n\n<p><strong>\uff089\uff09distinct<\/strong><\/p>\n\n\n\n<p>distinct\u5c06RDD\u4e2d\u7684\u5143\u7d20\u8fdb\u884c\u53bb\u91cd\u64cd\u4f5c\u3002\u56fe9\u4e2d\u7684\u6bcf\u4e2a\u65b9\u6846\u4ee3\u8868\u4e00\u4e2aRDD\u5206\u533a\uff0c\u901a\u8fc7distinct\u51fd\u6570\uff0c\u5c06\u6570\u636e\u53bb\u91cd\u3002 \u4f8b\u5982\uff0c\u91cd\u590d\u6570\u636eV1\u3001 V1\u53bb\u91cd\u540e\u53ea\u4fdd\u7559\u4e00\u4efdV1\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-15-451x269.png'><img class=\"lazyload lazyload-style-2\" 
src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"451\" height=\"269\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-15-451x269.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-361\"\/><\/div><\/figure>\n\n\n\n<p>\u56fe9 &nbsp;distinct\u7b97\u5b50\u5bf9RDD\u8f6c\u6362<\/p>\n\n\n\n<p><strong>\uff0810\uff09subtract<\/strong><\/p>\n\n\n\n<p>subtract\u76f8\u5f53\u4e8e\u8fdb\u884c\u96c6\u5408\u7684\u5dee\u64cd\u4f5c\uff0cRDD 1\u53bb\u9664RDD 1\u548cRDD 2\u4ea4\u96c6\u4e2d\u7684\u6240\u6709\u5143\u7d20\u3002\u56fe10\u4e2d\u5de6\u4fa7\u7684\u5927\u65b9\u6846\u4ee3\u8868\u4e24\u4e2aRDD\uff0c\u5927\u65b9\u6846\u5185\u7684\u5c0f\u65b9\u6846\u4ee3\u8868RDD\u7684\u5206\u533a\u3002 \u53f3\u4fa7\u5927\u65b9\u6846<\/p>\n\n\n\n<p>\u4ee3\u8868\u5408\u5e76\u540e\u7684RDD\uff0c\u5927\u65b9\u6846\u5185\u7684\u5c0f\u65b9\u6846\u4ee3\u8868\u5206\u533a\u3002 V1\u5728\u4e24\u4e2aRDD\u4e2d\u5747\u6709\uff0c\u6839\u636e\u5dee\u96c6\u8fd0\u7b97\u89c4\u5219\uff0c\u65b0RDD\u4e0d\u4fdd\u7559\uff0cV2\u5728\u7b2c\u4e00\u4e2aRDD\u6709\uff0c\u7b2c\u4e8c\u4e2aRDD\u6ca1\u6709\uff0c\u5219\u5728\u65b0RDD\u5143\u7d20\u4e2d\u5305\u542bV2\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-21-616x494.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" 
decoding=\"async\" width=\"616\" height=\"494\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-21-616x494.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-372\"\/><\/div><\/figure>\n\n\n\n<p>\u56fe10 &nbsp; subtract\u7b97\u5b50\u5bf9RDD\u8f6c\u6362<\/p>\n\n\n\n<p><strong>\uff0811\uff09&nbsp;sample<\/strong><\/p>\n\n\n\n<p>&nbsp; &nbsp; &nbsp; &nbsp;sample \u5c06 RDD \u8fd9\u4e2a\u96c6\u5408\u5185\u7684\u5143\u7d20\u8fdb\u884c\u91c7\u6837\uff0c\u83b7\u53d6\u6240\u6709\u5143\u7d20\u7684\u5b50\u96c6\u3002\u7528\u6237\u53ef\u4ee5\u8bbe\u5b9a\u662f\u5426\u6709\u653e\u56de\u7684\u62bd\u6837\u3001\u767e\u5206\u6bd4\u3001\u968f\u673a\u79cd\u5b50\uff0c\u8fdb\u800c\u51b3\u5b9a\u91c7\u6837\u65b9\u5f0f\u3002\u5185\u90e8\u5b9e\u73b0\u662f\u751f\u6210 SampledRDD(withReplacement\uff0c fraction\uff0c seed)\u3002<\/p>\n\n\n\n<p>\u51fd\u6570\u53c2\u6570\u8bbe\u7f6e\uff1a<\/p>\n\n\n\n<p>\u2030 \u3000\u3000withReplacement=true\uff0c\u8868\u793a\u6709\u653e\u56de\u7684\u62bd\u6837\u3002<\/p>\n\n\n\n<p>\u2030 \u3000\u3000withReplacement=false\uff0c\u8868\u793a\u65e0\u653e\u56de\u7684\u62bd\u6837\u3002<\/p>\n\n\n\n<p>\u56fe 11\u4e2d \u7684 \u6bcf \u4e2a \u65b9 \u6846 \u662f \u4e00 \u4e2a RDD \u5206 \u533a\u3002 \u901a \u8fc7 sample \u51fd \u6570\uff0c \u91c7 \u6837 50% \u7684 \u6570 \u636e\u3002V1\u3001 V2\u3001 U1\u3001 U2\u3001U3\u3001U4 \u91c7\u6837\u51fa\u6570\u636e V1 \u548c U1\u3001 U2 \u5f62\u6210\u65b0\u7684 RDD\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-3-351x264.png'><img class=\"lazyload lazyload-style-2\" 
src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"351\" height=\"264\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-3-351x264.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-351\"\/><\/div><\/figure>\n\n\n\n<p>\u56fe11 &nbsp;sample \u7b97\u5b50\u5bf9 RDD \u8f6c\u6362<\/p>\n\n\n\n<p><strong>\uff0812\uff09takeSample<\/strong><\/p>\n\n\n\n<p>takeSample\uff08\uff09\u51fd\u6570\u548c\u4e0a\u9762\u7684sample\u51fd\u6570\u662f\u4e00\u4e2a\u539f\u7406\uff0c\u4f46\u662f\u4e0d\u4f7f\u7528\u76f8\u5bf9\u6bd4\u4f8b\u91c7\u6837\uff0c\u800c\u662f\u6309\u8bbe\u5b9a\u7684\u91c7\u6837\u4e2a\u6570\u8fdb\u884c\u91c7\u6837\uff0c\u540c\u65f6\u8fd4\u56de\u7ed3\u679c\u4e0d\u518d\u662fRDD\uff0c\u800c\u662f\u76f8\u5f53\u4e8e\u5bf9\u91c7\u6837\u540e\u7684\u6570\u636e\u8fdb\u884c<\/p>\n\n\n\n<p>Collect\uff08\uff09\uff0c\u8fd4\u56de\u7ed3\u679c\u7684\u96c6\u5408\u4e3a\u5355\u673a\u7684\u6570\u7ec4\u3002<\/p>\n\n\n\n<p>\u56fe12\u4e2d\u5de6\u4fa7\u7684\u65b9\u6846\u4ee3\u8868\u5206\u5e03\u5f0f\u7684\u5404\u4e2a\u8282\u70b9\u4e0a\u7684\u5206\u533a\uff0c\u53f3\u4fa7\u65b9\u6846\u4ee3\u8868\u5355\u673a\u4e0a\u8fd4\u56de\u7684\u7ed3\u679c\u6570\u7ec4\u3002 \u901a\u8fc7takeSample\u5bf9\u6570\u636e\u91c7\u6837\uff0c\u8bbe\u7f6e\u4e3a\u91c7\u6837\u4e00\u4efd\u6570\u636e\uff0c\u8fd4\u56de\u7ed3\u679c\u4e3aV1\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-15-451x269.png'><img class=\"lazyload lazyload-style-2\" 
src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"451\" height=\"269\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-15-451x269.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-363\"\/><\/div><\/figure>\n\n\n\n<p>\u56fe12 &nbsp;\u3000\u3000takeSample\u7b97\u5b50\u5bf9RDD\u8f6c\u6362<\/p>\n\n\n\n<p><strong>\uff0813\uff09&nbsp;cache<\/strong><\/p>\n\n\n\n<p>&nbsp; &nbsp; &nbsp;cache&nbsp;\u5c06 RDD \u5143\u7d20\u4ece\u78c1\u76d8\u7f13\u5b58\u5230\u5185\u5b58\u3002 \u76f8\u5f53\u4e8e persist(MEMORY_ONLY) \u51fd\u6570\u7684\u529f\u80fd\u3002<\/p>\n\n\n\n<p>&nbsp; &nbsp; &nbsp;\u56fe13 \u4e2d\u6bcf\u4e2a\u65b9\u6846\u4ee3\u8868\u4e00\u4e2a RDD \u5206\u533a\uff0c\u5de6\u4fa7\u76f8\u5f53\u4e8e\u6570\u636e\u5206\u533a\u90fd\u5b58\u50a8\u5728\u78c1\u76d8\uff0c\u901a\u8fc7 cache \u7b97\u5b50\u5c06\u6570\u636e\u7f13\u5b58\u5728\u5185\u5b58\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-4-339x246.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"339\" height=\"246\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-4-339x246.png\" 
src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-348\"\/><\/div><\/figure>\n\n\n\n<p>\u56fe 13 Cache \u7b97\u5b50\u5bf9 RDD \u8f6c\u6362<\/p>\n\n\n\n<p><strong>\uff0814\uff09&nbsp;persist<\/strong><\/p>\n\n\n\n<p>&nbsp; &nbsp; &nbsp; persist \u51fd\u6570\u5bf9&nbsp;RDD \u8fdb\u884c\u7f13\u5b58\u64cd\u4f5c\u3002\u6570\u636e\u7f13\u5b58\u5728\u54ea\u91cc\u4f9d\u636e StorageLevel \u8fd9\u4e2a\u679a\u4e3e\u7c7b\u578b\u8fdb\u884c\u786e\u5b9a\u3002 \u6709\u4ee5\u4e0b\u51e0\u79cd\u7c7b\u578b\u7684\u7ec4\u5408\uff08\u89c110\uff09\uff0c DISK \u4ee3\u8868\u78c1\u76d8\uff0cMEMORY \u4ee3\u8868\u5185\u5b58\uff0c SER \u4ee3\u8868\u6570\u636e\u662f\u5426\u8fdb\u884c\u5e8f\u5217\u5316\u5b58\u50a8\u3002<\/p>\n\n\n\n<p>\u4e0b\u9762\u4e3a\u51fd\u6570\u5b9a\u4e49\uff0c StorageLevel \u662f\u679a\u4e3e\u7c7b\u578b\uff0c\u4ee3\u8868\u5b58\u50a8\u6a21\u5f0f\uff0c\u7528\u6237\u53ef\u4ee5\u901a\u8fc7\u56fe 14-1 \u6309\u9700\u8fdb\u884c\u9009\u62e9\u3002<\/p>\n\n\n\n<p>persist(newLevel:StorageLevel)<\/p>\n\n\n\n<p>\u56fe 14-1 \u4e2d\u5217\u51fapersist \u51fd\u6570\u53ef\u4ee5\u8fdb\u884c\u7f13\u5b58\u7684\u6a21\u5f0f\u3002\u4f8b\u5982\uff0cMEMORY_AND_DISK_SER \u4ee3\u8868\u6570\u636e\u53ef\u4ee5\u5b58\u50a8\u5728\u5185\u5b58\u548c\u78c1\u76d8\uff0c\u5e76\u4e14\u4ee5\u5e8f\u5217\u5316\u7684\u65b9\u5f0f\u5b58\u50a8\uff0c\u5176\u4ed6\u540c\u7406\u3002<\/p>\n\n\n\n<figure class=\"wp-block-image size-medium\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-14-300x300.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" 
decoding=\"async\" width=\"300\" height=\"300\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-14-300x300.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-359\"  sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/div><\/figure>\n\n\n\n<p>Figure 14-1: the persist operator applied to an RDD<\/p>\n\n\n\n<p>In Figure 14-2, each box represents an RDD partition. disk means stored on disk, and mem means stored in memory. Initially all of the data is on disk. persist(MEMORY_AND_DISK) caches the data in memory, but some partitions do not fit. The partitions containing V1, V2, and V3 are therefore stored on disk, while the partitions containing U1 and U2 remain in memory.<\/p>\n\n\n\n<figure class=\"wp-block-image size-medium\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-14-300x300.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"300\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-14-300x300.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-360\"  sizes=\"(max-width: 300px) 100vw, 
300px\" \/><\/div><\/figure>\n\n\n\n<p>Figure 14-2: the persist operator applied to an RDD<\/p>\n\n\n\n<p><strong>(15) mapValues<\/strong><\/p>\n\n\n\n<p>mapValues applies a map operation to the Value of each (Key, Value) pair and leaves the Key untouched.<\/p>\n\n\n\n<p>In Figure 15, each box represents an RDD partition. a =&gt; a + 2 means that for a key-value pair such as (V1, 1), only the value 1 is incremented by 2, giving 3.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-8-360x255.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"360\" height=\"255\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-8-360x255.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-353\"\/><\/div><\/figure>\n\n\n\n<p>Figure 15: the mapValues operator applied to an RDD<\/p>\n\n\n\n<p><strong>(16) combineByKey<\/strong><\/p>\n\n\n\n<p>combineByKey is defined as follows:<\/p>\n\n\n\n<p>combineByKey[C](createCombiner: V =&gt; C,<\/p>\n\n\n\n<p>mergeValue: (C, V) =&gt; C,<\/p>\n\n\n\n<p>mergeCombiners: (C, C) =&gt; C,<\/p>\n\n\n\n<p>partitioner: Partitioner,<\/p>\n\n\n\n<p>mapSideCombine: Boolean = true,<\/p>\n\n\n\n<p>serializer: Serializer = null): RDD[(K, C)]<\/p>\n\n\n\n<p>Parameters:<\/p>\n\n\n\n<p>\u00b7 createCombiner: V =&gt; C. Used when no combiner C exists yet for a key, for example to create a Seq C from V.<\/p>\n\n\n\n<p>\u00b7 mergeValue: (C, V) =&gt; C. Used when C already exists and V must be merged into it, for example by appending item V to Seq C or by summing.<\/p>\n\n\n\n<p>\u00b7 mergeCombiners: (C, C) =&gt; C. Merges two combiners C.<\/p>\n\n\n\n<p>\u00b7 partitioner: Partitioner. The Partitioner used during the shuffle.<\/p>\n\n\n\n<p>\u00b7 mapSideCombine: Boolean = true. To reduce the amount of data transferred, much of the combining can be done on the map side first. For a sum, for example, the values of identical keys within one partition can be added together before the shuffle.<\/p>\n\n\n\n<p>\u00b7 serializer: Serializer = null. Data must be serialized for transfer, and users can supply a custom serializer.<\/p>\n\n\n\n<p>For example, combineByKey can turn an RDD whose elements are (Int, Int) into an RDD whose elements are (Int, Seq[Int]). In Figure 16, each box represents an RDD partition. As shown, combineByKey merges (V1, 2) and (V1, 1) into (V1, Seq(2, 1)).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' 
href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-15-451x269.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"451\" height=\"269\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-15-451x269.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-364\"\/><\/div><\/figure>\n\n\n\n<p>Figure 16: the combineByKey operator applied to an RDD<\/p>\n\n\n\n<p><strong>(17) reduceByKey<\/strong><\/p>\n\n\n\n<p>reduceByKey is a simpler special case of combineByKey. It merges two values into one value of the same type, (Int, Int V) to (Int, Int C), for example by summing. For reduceByKey, createCombiner is therefore trivial: it returns v directly. mergeValue and mergeCombiners share exactly the same logic.<\/p>\n\n\n\n<p>Implementation:<\/p>\n\n\n\n<p>def reduceByKey(partitioner: Partitioner, func: (V, V) =&gt; V): RDD[(K, V)] = {<\/p>\n\n\n\n<p>&nbsp;&nbsp;combineByKey[V]((v: V) =&gt; v, func, func, partitioner)<\/p>\n\n\n\n<p>}<\/p>\n\n\n\n<p>In Figure 17, each box represents an RDD partition. The user-defined function (A, B) =&gt; (A + B) adds the values of (V1, 2) and (V1, 1), which share the key V1, and produces (V1, 3).<\/p>\n\n\n\n<figure 
class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-12-412x282.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"412\" height=\"282\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-12-412x282.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-357\"\/><\/div><\/figure>\n\n\n\n<p>Figure 17: the reduceByKey operator applied to an RDD<\/p>\n\n\n\n<p><strong>(18) partitionBy<\/strong><\/p>\n\n\n\n<p>partitionBy repartitions an RDD.<\/p>\n\n\n\n<p>It is defined as follows.<\/p>\n\n\n\n<p>partitionBy(partitioner: Partitioner)<\/p>\n\n\n\n<p>If the RDD's existing partitioner is the same as the given partitioner, no repartitioning takes place. Otherwise a new ShuffledRDD is generated according to the given partitioner.<\/p>\n\n\n\n<p>In Figure 18, each box represents an RDD partition. Under the new partitioning strategy, the V1 and V2 data that were previously in different partitions end up in the same partition.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' 
href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-18-722x535.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"722\" height=\"535\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-18-722x535.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-369\"\/><\/div><\/figure>\n\n\n\n<p>Figure 18: the partitionBy operator applied to an RDD<\/p>\n\n\n\n<p><strong>(19) cogroup<\/strong><\/p>\n\n\n\n<p>cogroup co-partitions two RDDs, grouping them together by key. Its definition is as follows.<\/p>\n\n\n\n<p>cogroup[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (Iterable[V], Iterable[W]))]<\/p>\n\n\n\n<p>Within each of the two RDDs, key-value elements that share a key are aggregated into one collection. For each key, the result holds the corresponding collections from both RDDs as iterables:<\/p>\n\n\n\n<p>(K, (Iterable[V], Iterable[W]))<\/p>\n\n\n\n<p>Here the value paired with each key is a tuple of two iterables: one over that key's values in the first RDD and one over that key's values in the second RDD.<\/p>\n\n\n\n<p>In Figure 19, the large boxes represent RDDs and the small boxes inside them represent the RDDs' partitions. 
The data (U1, 1) and (U1, 2) in RDD1 and the data (U1, 2) in RDD2 are merged into (U1, ((1, 2), (2))).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-23-565x421.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"565\" height=\"421\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-23-565x421.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-374\"\/><\/div><\/figure>\n\n\n\n<p>Figure 19: the cogroup operator applied to RDDs<\/p>\n\n\n\n<p><strong>(20) join<\/strong><\/p>\n\n\n\n<p>join first applies cogroup to the two RDDs being joined, so that data with the same key is placed in the same partition. The new RDD produced by cogroup then takes the Cartesian product of the elements under each key, and the result is flattened so that all tuples for a key form one collection. The final return type is RDD[(K, (V, W))].<\/p>\n\n\n\n<p>The code below is the implementation of join. In essence, it first co-groups the data with the cogroup operator and then uses flatMapValues to break the merged data apart again.<\/p>\n\n\n\n<p>this.cogroup(other, partitioner).flatMapValues { case (vs, ws) =&gt; for (v &lt;- vs; w &lt;- ws) yield (v, w) }<\/p>\n\n\n\n<p>Figure 20 illustrates a join of two RDDs. The large boxes represent RDDs and the small boxes represent their partitions. For elements with the same key, for example the key V1, the join produces (V1, (1, 1)) and (V1, (1, 2)).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-18-722x535.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"722\" height=\"535\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-18-722x535.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-366\"\/><\/div><\/figure>\n\n\n\n<p>Figure 20: the join operator applied to RDDs<\/p>\n\n\n\n<p><strong>(21) leftOuterJoin and rightOuterJoin<\/strong><\/p>\n\n\n\n<p>leftOuterJoin (left outer join) and rightOuterJoin (right outer join) build on join. They first check whether the elements on one side are empty. If they are, that side is filled with None. If not, the data is joined, the matched values are wrapped in Some, and the result is returned.<\/p>\n\n\n\n<p>The code below is the implementation of leftOuterJoin.<\/p>\n\n\n\n<p>this.cogroup(other, partitioner).flatMapValues { case (vs, ws) =&gt;<\/p>\n\n\n\n<p>&nbsp;&nbsp;if (ws.isEmpty) {<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;vs.map(v =&gt; (v, None))<\/p>\n\n\n\n<p>&nbsp;&nbsp;} else {<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;for (v &lt;- vs; w &lt;- ws) yield (v, Some(w))<\/p>\n\n\n\n<p>&nbsp;&nbsp;}<\/p>\n\n\n\n<p>}<\/p>\n\n\n\n<p><strong>2. Action operators<\/strong><\/p>\n\n\n\n<p>In essence, an action operator calls runJob on the SparkContext to submit a job, which triggers execution of the RDD DAG.<\/p>\n\n\n\n<p>For example, the code of the collect action is shown below. Interested readers can use it as an entry point for reading the source code:<\/p>\n\n\n\n<p>\/**<\/p>\n\n\n\n<p>* Return an array that contains all of the elements in this RDD.<\/p>\n\n\n\n<p>*\/<\/p>\n\n\n\n<p>def collect(): Array[T] = {<\/p>\n\n\n\n<p>\/* Submit the job *\/<\/p>\n\n\n\n<p>val results = sc.runJob(this, (iter: Iterator[T]) =&gt; iter.toArray)<\/p>\n\n\n\n<p>Array.concat(results: _*)<\/p>\n\n\n\n<p>}<\/p>\n\n\n\n<p><strong>(22) foreach<\/strong><\/p>\n\n\n\n<p>foreach applies the function f to every element of the RDD. It returns neither an RDD nor an Array, but Unit. Figure 22 shows foreach applying a user-defined function to each data item. In this example the function is println(), which prints every data item to the console. When running on a cluster, this output appears in the executors' stdout rather than on the driver.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-7-312x237.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"312\" height=\"237\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-7-312x237.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-349\"\/><\/div><\/figure>\n\n\n\n<p>Figure 22: the foreach operator applied to an RDD<\/p>\n\n\n\n<p><strong>(23) saveAsTextFile<\/strong><\/p>\n\n\n\n<p>saveAsTextFile writes the data out to the specified directory on HDFS (or another Hadoop-supported file system).<\/p>\n\n\n\n<p>Below is the internal implementation of saveAsTextFile, which works by calling saveAsHadoopFile:<\/p>\n\n\n\n<p>this.map(x =&gt; (NullWritable.get(), new Text(x.toString))).saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path)<\/p>\n\n\n\n<p>Each element of the RDD is mapped to (null, x.toString), that is, a (NullWritable, Text) pair, and then written to HDFS.<\/p>\n\n\n\n<p>In Figure 23, the boxes on the left represent RDD partitions and the boxes on the right represent HDFS blocks. The function writes each partition of the RDD to HDFS as its own output file, drawn as one block in the figure.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-13-438x213.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"438\" height=\"213\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-13-438x213.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-358\"\/><\/div><\/figure>\n\n\n\n<p>Figure 23: the saveAsHadoopFile operator applied to an RDD<\/p>\n\n\n\n<p><strong>(24) saveAsObjectFile<\/strong><\/p>\n\n\n\n<p>saveAsObjectFile groups every 10 elements of a partition into an Array and serializes that Array. It maps the result to a (NullWritable, BytesWritable(Y)) element, where Y is the serialized Array, and writes it to HDFS in SequenceFile format.<\/p>\n\n\n\n<p>The internal implementation is shown below.<\/p>\n\n\n\n<p>map(x =&gt; (NullWritable.get(), new BytesWritable(Utils.serialize(x))))<\/p>\n\n\n\n<p>In Figure 24, the boxes on the left represent RDD partitions and the boxes on the right represent HDFS blocks. 
The function writes each partition of the RDD to HDFS as its own output file, drawn as one block in the figure.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-19-630x740.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"630\" height=\"740\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-19-630x740.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-370\"\/><\/div><\/figure>\n\n\n\n<p>Figure 24: the saveAsObjectFile operator applied to an RDD<\/p>\n\n\n\n<p><strong>(25) collect<\/strong><\/p>\n\n\n\n<p>collect is equivalent to toArray, which is deprecated and no longer recommended. collect returns the distributed RDD as a single Scala Array on one machine, and Scala's functional operations can then be applied to that array. Because the whole result is loaded into the driver's memory, collect is only suitable when the result is small.<\/p>\n\n\n\n<p>In Figure 25, the boxes on the left represent RDD partitions and the box on the right represents an array in the memory of a single machine. The results are returned to the node where the driver program runs and are stored as an array.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' 
href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-15-451x269.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"451\" height=\"269\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-15-451x269.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-362\"\/><\/div><\/figure>\n\n\n\n<p>Figure 25: the collect operator applied to an RDD<\/p>\n\n\n\n<p><strong>(26) collectAsMap<\/strong><\/p>\n\n\n\n<p>collectAsMap returns a single-machine HashMap for an RDD of (K, V) pairs. When a key appears more than once, later elements overwrite earlier ones.<\/p>\n\n\n\n<p>In Figure 26, the boxes on the left represent RDD partitions and the box on the right represents the resulting single-machine map. collectAsMap returns the data to the driver program, where the result is stored as a HashMap.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-17-441x276.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"441\" height=\"276\" 
data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-17-441x276.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-368\"\/><\/div><\/figure>\n\n\n\n<p>Figure 26: the collectAsMap operator applied to an RDD<\/p>\n\n\n\n<p><strong>(27) reduceByKeyLocally<\/strong><\/p>\n\n\n\n<p>reduceByKeyLocally combines reduce with collectAsMap. It reduces the values for each key across the whole RDD and then returns all the results to the driver as a HashMap rather than as a new RDD.<\/p>\n\n\n\n<p><strong>(28) lookup<\/strong><\/p>\n\n\n\n<p>lookup is declared as follows.<\/p>\n\n\n\n<p>lookup(key: K): Seq[V]<\/p>\n\n\n\n<p>lookup operates on an RDD of (Key, Value) pairs and returns a Seq of the values associated with the given key. Its optimization works as follows. If the RDD has a partitioner, only the partition that the key maps to is processed. If the RDD has no partitioner, every element of the RDD must be scanned (a brute-force search) to find the values for the key.<\/p>\n\n\n\n<p>In Figure 28, the boxes on the left represent RDD partitions and the box on the right represents the Seq. The final result is returned to the application on the node where the driver runs.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper 
lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-18-722x535.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"722\" height=\"535\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-18-722x535.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-367\"\/><\/div><\/figure>\n\n\n\n<p>Figure 28: the lookup operator applied to an RDD<\/p>\n\n\n\n<p><strong>(29) count<\/strong><\/p>\n\n\n\n<p>count returns the number of elements in the whole RDD.<\/p>\n\n\n\n<p>Its internal implementation is:<\/p>\n\n\n\n<p>def count(): Long = sc.runJob(this, Utils.getIteratorSize _).sum<\/p>\n\n\n\n<p>In Figure 29, the number of data items returned is 5. Each block represents an RDD partition.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-6-381x233.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"381\" height=\"233\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-6-381x233.png\" 
src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-352\"\/><\/div><\/figure>\n\n\n\n<p>Figure 29: the count operator applied to an RDD<\/p>\n\n\n\n<p><strong>(30) top<\/strong><\/p>\n\n\n\n<p>top returns the k largest elements. It is defined as follows.<\/p>\n\n\n\n<p>top(num: Int)(implicit ord: Ordering[T]): Array[T]<\/p>\n\n\n\n<p>Related functions are described below.<\/p>\n\n\n\n<p>\u00b7 top returns the k largest elements, according to the ordering defined by Ordering[T].<\/p>\n\n\n\n<p>\u00b7 take returns the first k elements of the RDD.<\/p>\n\n\n\n<p>\u00b7 takeOrdered returns the k smallest elements, sorted in ascending order in the returned array. A custom Ordering[T] can also be supplied.<\/p>\n\n\n\n<p>\u00b7 first is equivalent to take(1) and returns the first element of the RDD.<\/p>\n\n\n\n<p>top, take, and takeOrdered each return an array containing at most k elements.<\/p>\n\n\n\n<p><strong>(31) reduce<\/strong><\/p>\n\n\n\n<p>reduce is equivalent to applying reduceLeft to the elements of the RDD. 
Its implementation is as follows.<\/p>\n\n\n\n<p>Some(iter.reduceLeft(cleanF))<\/p>\n\n\n\n<p>reduceLeft first applies the reduce function to two elements. It then applies the function to that result and the next element taken from the iterator, and repeats until the iterator is exhausted, producing the final result. In an RDD, reduceLeft first runs separately on the elements of each partition. Each partition's result is then treated as a single element, and reduceLeft is applied once more to this collection of partial results.<\/p>\n\n\n\n<p>For example, take the following user-defined function:<\/p>\n\n\n\n<p>f: (A, B) =&gt; (A._1 + \"@\" + B._1, A._2 + B._2)<\/p>\n\n\n\n<p>In Figure 31, each box represents an RDD partition, and the user-defined function f performs the reduce over the data. In this example, the final result is (V1@V2@U1@U2@U3@U4, 12).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-22-617x290.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"617\" height=\"290\" 
data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-22-617x290.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-373\"\/><\/div><\/figure>\n\n\n\n<p>Figure 31 The reduce operator applied to an RDD<\/p>\n\n\n\n<p><strong>(32) fold<\/strong><\/p>\n\n\n\n<p>fold works on the same principle as reduce. The difference is that every reduction starts from zeroValue: zeroValue is the first element taken from the iterator when each partition is folded, and it is used once more when the partition results are merged.<\/p>\n\n\n\n<p>In Figure 32, fold is performed with the user-defined function below; each box represents an RDD partition. It can be understood by analogy with reduce.<\/p>\n\n\n\n<p>fold((\"V0@\", 2))((A, B) =&gt; (A._1 + \"@\" + B._1, A._2 + B._2))<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-20-658x296.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"658\" height=\"296\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-20-658x296.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" 
class=\"wp-image-371\"\/><\/div><\/figure>\n\n\n\n<p>Figure 32 The fold operator applied to an RDD<\/p>\n\n\n\n<p><strong>(33) aggregate<\/strong><\/p>\n\n\n\n<p>aggregate first combines all the elements within each partition using seqop, then folds the partition results together using combop. Both stages start from the zero value z, as in fold.<\/p>\n\n\n\n<p>What distinguishes aggregate from fold and reduce is that it separates these two stages: seqop folds the elements of type A in a partition into an accumulator of type B, and combop merges the per-partition accumulators, so the result type B may differ from the element type A. As with fold and reduce, the partitions are processed in parallel as separate tasks, the elements within each partition are combined sequentially, and the partition results are then merged into the final result.<\/p>\n\n\n\n<p>Its signature is as follows.<\/p>\n\n\n\n<p>aggregate[B](z: B)(seqop: (B, A) =&gt; B, combop: (B, B) =&gt; B): B<\/p>\n\n\n\n<p>Figure 33 shows an aggregate operation on an RDD with user-defined functions; each box represents an RDD partition.<\/p>\n\n\n\n<p>rdd.aggregate((\"V0@\", 2))((A, B) =&gt; (A._1 + \"@\" + B._1, A._2 + B._2), (A, B) =&gt; (A._1 + \"@\" + B._1, A._2 + B._2))<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><div class='fancybox-wrapper lazyload-container-unload' data-fancybox='post-images' href='https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-24-621x287.png'><img class=\"lazyload lazyload-style-2\" src=\"data:image\/svg+xml;base64,PCEtLUFyZ29uTG9hZGluZy0tPgo8c3ZnIHdpZHRoPSIxIiBoZWlnaHQ9IjEiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIgc3Ryb2tlPSIjZmZmZmZmMDAiPjxnPjwvZz4KPC9zdmc+\"  loading=\"lazy\" decoding=\"async\" width=\"621\" height=\"287\" data-original=\"https:\/\/blog.frost-s.com\/wp-content\/uploads\/2020\/12\/image-24-621x287.png\" src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB\/AAffA0nNPuCLAAAAAElFTkSuQmCC\" alt=\"\" class=\"wp-image-375\"\/><\/div><\/figure>\n\n\n\n<p>Figure 33 The aggregate operator applied to an RDD<\/p>\n\n\n\n<p>\u2461 represents V.<\/p>\n\n\n\n<p>\u2462 represents U.<\/p>\n\n\n\n<p>Finally, we introduce two special kinds of variables in Spark's computation model: broadcast variables and accumulators.<\/p>\n\n\n\n<p>Broadcast variables: widely used to broadcast the small table in a map-side join, or to broadcast large variables. Such data sets fit in the memory of a single node, so they do not need to be scattered across nodes the way an RDD is.<\/p>\n\n\n\n<p>At runtime, Spark sends the data of a broadcast variable to each node and keeps it there so that later computations can reuse it. Unlike Hadoop's distributed cache, broadcast content can be shared across jobs within the same application. Broadcast is implemented on top of a BitTorrent (BT) style mechanism.<\/p>\n\n\n\n<p>Accumulator variables: support global accumulation (add) operations; for example, accumulators are widely used to record an application's current runtime metrics. Tasks can only add to an accumulator; only the driver program can read its value.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Classification of Spark operators. Broadly speaking, Spark operators can be divided into the following two categories: [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":457,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,19],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.frost-s.com\/index.php\/wp-json\/wp\/v2\/posts\/344"}],"collection":[{"href":"https:\/\/blog.frost-s.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.frost-s.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.frost-s.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.frost-s.com\/index.php\/wp-json\/wp\/v2\/comments?post=344"}],"version-history":[{"count":3,"href":"https:\/\/blog.frost-s.com\/index.php\/wp-json\/wp\/v2\/posts\/344\/revisions"}],"predecessor-version":[{"id":458,"href":"https:\/\/blog.frost-s.com\/index.php\/wp-json\/wp\/v2\/posts\/344\/revisions\/458"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.frost-s.com\/index.php\/wp-json\/wp\/v2\/media\/457"}],"wp:attachment":[{"href":"https:\/\/blog.frost-s.com\/index.php\/wp-json\/wp\/v2\/media?parent=344"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.frost-s.com\/index.php\/wp-json\/wp\/v2\/categories?post=344"},{"taxonomy":"post_tag","embeddable":true,"href":"https
:\/\/blog.frost-s.com\/index.php\/wp-json\/wp\/v2\/tags?post=344"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}