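HEBEI UNIVERSITY

Security Level:
Classified Index:
U.D.C:
CODE: 10075
NO: 20121027

A Dissertation for the Degree of Master

Parallel Ordinal Decision Tree and Decision Forest Based on MapReduce
(MapReduce框架下并行有序決策樹及有序決策森林)

Candidate: Wang Shanshan (王姍姍)
Supervisor: Prof. Wang Xizhao (王熙照)
Advisor in Enterprise: Qin Hongwei (秦宏偉), Senior Engineer
Academic Degree Applied for: Master of Engineering
Specialty: Computer Technology
University: Hebei University
Date of Accomplishment: May, 2015

Abstract (translated from the Chinese 摘要)

Traditional ordinal decision trees can effectively handle monotonic classification problems. However, learning monotonic decision trees from large data sets with these algorithms is very difficult. To solve the problem of generating ordinal decision trees from large data sets, this paper proposes a parallel processing method under the MapReduce framework. As in traditional ordinal decision tree induction algorithms, we use rank mutual information as the heuristic for selecting the extended attributes. Unlike existing ordinal decision tree induction algorithms, this paper computes the rank mutual information with an attribute parallelization strategy. Experimental results on artificially generated large data sets show that the proposed algorithm is feasible, and its effectiveness is confirmed from three aspects: speed-up, scale-up, and size-up.

An ordinal random forest algorithm is built on VC-DRSA theory and, combined with the MapReduce computing framework, parallelized on the Hadoop platform, which improves the running efficiency of the algorithm. The experimental results also confirm the feasibility and effectiveness of this algorithm.

Keywords: ordinal classification; ordinal decision tree; rank mutual information; ordinal decision forest; MapReduce

Abstract

Traditional ordinal decision trees (ODTs) can effectively deal with monotonic classification problems.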
However, it is very difficult for the existing ordinal decision tree algorithms to learn ODTs from large data sets. To deal with the problem of generating an ODT from large data sets, this paper presents a parallel processing mechanism in the framework of MapReduce. As in traditional ordinal decision tree induction algorithms, rank mutual information (RMI) is still used to select the extended attributes. Unlike the calculation of RMI in existing ordinal decision tree induction algorithms, this paper applies a strategy of attribute parallelization to calculate the RMI. Experiments on large, artificially generated ordered data sets confirm that the proposed algorithm is feasible. The experimental results also show that the algorithm is effective and efficient in three respects: speed-up, scale-up, and size-up.

Based on the variable consistency dominance-based rough set approach (VC-DRSA), an ordinal random forest algorithm is proposed in this paper. Combined with the computing framework of MapReduce, the proposed ordinal decision forest algorithm is parallelized on the Hadoop platform, which improves the efficien