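HEBEI UNIVERSITY

Security classification:
Classification number (Classified Index):
U.D.C:
School code (CODE): 10075
Student number (NO): 20121027

Master's Thesis (A Dissertation for the Degree of Master)

Parallel Ordinal Decision Tree and Decision Forest Based on MapReduce
(MapReduce框架下并行有序決策樹及有序決策森林)

Candidate: Wang Shanshan (王姍姍)
Supervisor: Prof. Wang Xizhao (王熙照)
Advisor in Enterprise: SE. Qin Hongwei (秦宏偉), Senior Engineer
Academic Degree Applied for: Master of Engineering
Specialty: Computer Technology
University: Hebei University (河北大學)
Date of Accomplishment: May 2015

Abstract (translated from the Chinese 摘要)

Traditional ordinal decision trees handle monotonic classification problems effectively. However, learning monotonic decision trees from large data sets with these algorithms is very difficult. To address the problem of generating ordinal decision trees from large data sets, this thesis proposes a parallel processing method under the MapReduce framework. As in traditional ordinal decision tree induction algorithms, rank mutual information is used as the heuristic for selecting the expanded attributes. Unlike existing ordinal decision tree induction algorithms in how mutual information is computed, this thesis applies an attribute-parallelization strategy to compute the rank mutual information. Experimental results on artificially generated large data sets show that the proposed algorithm is feasible, and confirm its effectiveness in three respects: speed-up, scale-up, and size-up. An ordinal random forest algorithm is developed based on VC-DRSA theory and, combined with the MapReduce computing framework, parallelized on the Hadoop platform, which improves the running efficiency of the algorithm; the experimental results also confirm the feasibility and effectiveness of this algorithm.

Keywords: ordinal classification; ordinal decision tree; rank mutual information; ordinal decision forest; MapReduce

Abstract

Traditional ordinal decision trees (ODTs) can effectively deal with monotonic classification problems.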
However, it is very difficult for existing ordinal decision tree algorithms to learn ODTs from large data sets. To address the problem of generating an ODT from large data sets, this paper presents a parallel processing mechanism in the MapReduce framework. As in traditional ordinal decision tree induction algorithms, rank mutual information (RMI) is still used to select the extended attributes. Unlike existing ordinal decision tree induction algorithms in how RMI is calculated, this paper applies an attribute-parallelization strategy to calculate the RMI. Experiments on large, artificially generated ordered data sets confirm that the proposed algorithm is feasible. The experimental results show that the algorithm is effective and efficient in three respects: speed-up, scale-up, and size-up. Based on the variable consistency dominance-based rough set approach (VC-DRSA), an ordinal random forest algorithm is proposed in this paper. Combined with the MapReduce computing framework, the proposed ordinal decision forest algorithm is parallelized on the Hadoop platform, which improves the efficien