1ã Spark VSHadoopæåªäºå¼åç¹ï¼
Hadoop:åå¸å¼æ¹å¤ç计ç®ï¼å¼ºè°æ¹å¤çï¼å¸¸ç¨äºæ°æ®ææãåæ
Spark:æ¯ä¸ä¸ªåºäºå
å计ç®çå¼æºçé群计ç®ç³»ç»ï¼ç®çæ¯è®©æ°æ®åææ´å å¿«é, Spark æ¯ä¸ç§ä¸ Hadoop ç¸ä¼¼çå¼æºé群计ç®ç¯å¢ï¼ä½æ¯ä¸¤è
ä¹é´è¿åå¨ä¸äºä¸åä¹å¤ï¼è¿äºæç¨çä¸åä¹å¤ä½¿ Spark å¨æäºå·¥ä½è´è½½æ¹é¢è¡¨ç°å¾æ´å ä¼è¶ï¼æ¢å¥è¯è¯´ï¼Spark å¯ç¨äºå
ååå¸æ°æ®éï¼é¤äºè½å¤æä¾äº¤äºå¼æ¥è¯¢å¤ï¼å®è¿å¯ä»¥ä¼åè¿ä»£å·¥ä½è´è½½ã
Spark æ¯å¨ Scala è¯è¨ä¸å®ç°çï¼å®å° Scala ç¨ä½å
¶åºç¨ç¨åºæ¡æ¶ãä¸ Hadoop ä¸åï¼Spark å Scala è½å¤ç´§å¯éæï¼å
¶ä¸ç Scala å¯ä»¥åæä½æ¬å°éå对象ä¸æ ·è½»æ¾å°æä½åå¸å¼æ°æ®éã
尽管å建 Spark æ¯ä¸ºäºæ¯æåå¸å¼æ°æ®éä¸çè¿ä»£ä½ä¸ï¼ä½æ¯å®é
ä¸å®æ¯å¯¹ Hadoop çè¡¥å
ï¼å¯ä»¥å¨ Hadoop æ件系ç»ä¸å¹¶è¡è¿è¡ãéè¿å为Mesosç第ä¸æ¹é群æ¡æ¶å¯ä»¥æ¯ææ¤è¡ä¸ºãSpark ç±å å·å¤§å¦ä¼¯å
å©åæ ¡ AMP å®éªå®¤ (Algorithms,Machines,and People Lab) å¼åï¼å¯ç¨æ¥æ建大åçãä½å»¶è¿çæ°æ®åæåºç¨ç¨åºã
è½ç¶ Spark ä¸ Hadoop æç¸ä¼¼ä¹å¤ï¼ä½å®æä¾äºå
·ææç¨å·®å¼çä¸ä¸ªæ°çé群计ç®æ¡æ¶ãé¦å
ï¼Spark æ¯ä¸ºé群计ç®ä¸çç¹å®ç±»åçå·¥ä½è´è½½è设计ï¼å³é£äºå¨å¹¶è¡æä½ä¹é´éç¨å·¥ä½æ°æ®éï¼æ¯å¦æºå¨å¦ä¹ ç®æ³ï¼çå·¥ä½è´è½½ã为äºä¼åè¿äºç±»åçå·¥ä½è´è½½ï¼Spark å¼è¿äºå
åé群计ç®çæ¦å¿µï¼å¯å¨å
åé群计ç®ä¸å°æ°æ®éç¼åå¨å
åä¸ï¼ä»¥ç¼©ç访é®å»¶è¿.
å¨å¤§æ°æ®å¤çæ¹é¢ç¸ä¿¡å¤§å®¶å¯¹hadoopå·²ç»è³çè½è¯¦ï¼åºäºGoogleMap/Reduceæ¥å®ç°çHadoop为å¼åè
æä¾äºmapãreduceåè¯ï¼ä½¿å¹¶è¡æ¹å¤çç¨åºåå¾é常å°ç®ååä¼ç¾ãSparkæä¾çæ°æ®éæä½ç±»åæå¾å¤ç§ï¼ä¸åHadoopåªæä¾äºMapåReduce两ç§æä½ãæ¯å¦map,filter, flatMap,sample, groupByKey, reduceByKey, union,join, cogroup,mapValues, sort,partionByçå¤ç§æä½ç±»åï¼ä»ä»¬æè¿äºæä½ç§°ä¸ºTransformationsãåæ¶è¿æä¾Count,collect, reduce, lookup, saveçå¤ç§actionsãè¿äºå¤ç§å¤æ ·çæ°æ®éæä½ç±»åï¼ç»ä¸å±åºç¨è
æä¾äºæ¹ä¾¿ãå个å¤çèç¹ä¹é´çé信模åä¸ååHadoopé£æ ·å°±æ¯å¯ä¸çData Shuffleä¸ç§æ¨¡å¼ãç¨æ·å¯ä»¥å½åï¼ç©åï¼æ§å¶ä¸é´ç»æçååºçãå¯ä»¥è¯´ç¼ç¨æ¨¡åæ¯Hadoopæ´çµæ´».
2ãSparkå¨å®¹éæ§æ¹é¢æ¯å¦æ¯å
¶ä»å·¥å
·æ´æä¼è¶æ§ï¼
ä»Sparkç论æãResilient Distributed Datasets: AFault-TolerantAbstraction for In-Memory Cluster Computingãä¸æ²¡çåºå®¹éæ§åçæå¤å¥½ãåæ¯æå°äºåå¸å¼æ°æ®é计ç®ï¼åcheckpointç两ç§æ¹å¼ï¼ä¸ä¸ªæ¯checkpoint dataï¼ä¸ä¸ªæ¯loggingthe updatesãè²ä¼¼Sparkéç¨äºåè
ãä½æ¯æä¸åæ¥åæå°ï¼è½ç¶åè
çä¼¼èçåå¨ç©ºé´ãä½æ¯ç±äºæ°æ®å¤ç模åæ¯ç±»ä¼¼DAGçæä½è¿ç¨ï¼ç±äºå¾ä¸çæ个èç¹åºéï¼ç±äºlineage chainsçä¾èµå¤ææ§ï¼å¯è½ä¼å¼èµ·å
¨é¨è®¡ç®èç¹çéæ°è®¡ç®ï¼è¿æ ·ææ¬ä¹ä¸ä½ãä»ä»¬åæ¥è¯´ï¼æ¯åæ°æ®ï¼è¿æ¯åæ´æ°æ¥å¿ï¼åcheckpointè¿æ¯ç±ç¨æ·è¯´äºç®å§ãç¸å½äºä»ä¹é½æ²¡è¯´ï¼åæè¿ä¸ªç®ç踢ç»äºç¨æ·ãæ以æçå°±æ¯ç±ç¨æ·æ ¹æ®ä¸å¡ç±»åï¼è¡¡éæ¯åå¨æ°æ®IOåç£ç空é´ç代价åéæ°è®¡ç®ç代价ï¼éæ©ä»£ä»·è¾å°çä¸ç§çç¥ãå代ç»ä¸é´ç»æè¿è¡æä¹
åæ建ç«æ£æ¥ç¹ï¼Sparkä¼è®°ä½äº§çæäºæ°æ®éçæä½åºåãå æ¤ï¼å½ä¸ä¸ªèç¹åºç°æ
éæ¶ï¼Sparkä¼æ ¹æ®åå¨ä¿¡æ¯éæ°æé æ°æ®éãä»ä»¬è®¤ä¸ºè¿æ ·ä¹ä¸éï¼å 为å
¶ä»èç¹å°ä¼å¸®å©é建ã
3ãSpark对äºæ°æ®å¤çè½ååæçæåªäºç¹è²ï¼
Sparkæä¾äºé«çæ§è½å大æ°æ®å¤çè½åï¼ä½¿å¾ç¨æ·å¯ä»¥å¿«éå¾å°åé¦ä½éªæ´å¥½ãå¦ä¸ç±»åºç¨æ¯åæ°æ®ææï¼å 为Sparkå
åå©ç¨å
åè¿è¡ç¼åï¼å©ç¨DAGæ¶é¤ä¸å¿
è¦çæ¥éª¤ï¼æ以æ¯è¾åéåè¿ä»£å¼çè¿ç®ãèæç¸å½ä¸é¨åæºå¨å¦ä¹ ç®æ³æ¯éè¿å¤æ¬¡è¿ä»£æ¶æçç®æ³ï¼æ以éåç¨Sparkæ¥å®ç°ãæ们æä¸äºå¸¸ç¨çç®æ³å¹¶è¡åç¨Sparkå®ç°ï¼å¯ä»¥ä»Rè¯è¨ä¸æ¹ä¾¿å°è°ç¨ï¼éä½äºç¨æ·è¿è¡æ°æ®ææçå¦ä¹ ææ¬ã
Sparké
æä¸ä¸ªæµæ°æ®å¤ç模åï¼ä¸Twitterç Stormæ¡æ¶ç¸æ¯ï¼Sparkéç¨äºä¸ç§æ趣èä¸ç¬ç¹çåæ³ãStormåºæ¬ä¸æ¯åæ¯æ¾å
¥ç¬ç«äºå¡ç管éï¼å¨å
¶ä¸äºå¡ä¼å¾å°åå¸å¼çå¤çãç¸åï¼Sparkéç¨ä¸ä¸ªæ¨¡åæ¶éäºå¡ï¼ç¶åå¨çæ¶é´å
ï¼æ们å设æ¯5ç§ï¼ä»¥æ¹å¤ççæ¹å¼å¤çäºä»¶ãææ¶éçæ°æ®æ为ä»ä»¬èªå·±çRDDï¼ç¶å使ç¨Sparkåºç¨ç¨åºä¸å¸¸ç¨çä¸ç»è¿è¡å¤çãä½è
声称è¿ç§æ¨¡å¼æ¯å¨ç¼æ
¢èç¹åæ
éæ
åµä¸ä¼æ´å 稳å¥ï¼èä¸5ç§çæ¶é´é´éé常对äºå¤§å¤æ°åºç¨å·²ç»è¶³å¤å¿«äºãè¿ç§æ¹æ³ä¹å¾å¥½å°ç»ä¸äºæµå¼å¤çä¸éæµå¼å¤çé¨åã
æ»ç»
è¿å 天å¨çHadoopæå¨æåãhbaseæå¨æåãhiveæå¨æåã大è§æ¨¡åå¸å¼åå¨ç³»ç»ãzoopkeeperã大æ°æ®äºèç½å¤§è§æ¨¡æ°æ®ææä¸åå¸å¼å¤çç书åæ¶è¡¥å
ï¼è½éä¸å¿æ¥å¥½å¥½çå®æ´ççå®ä¸æ¬ä¹¦ï¼æ¯ç¸å½ä¸éçã
温馨提示:答案为网友推荐,仅供参考