Data Mining, Machine Learning, and Weka
Book: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (original English edition), by Ian H. Witten and Eibe Frank, New Zealand

Weka
  An open-source collection of machine learning algorithms for data mining, implemented in Java and developed at the University of Waikato in New Zealand.
  http://www.cs.waikato.ac.nz/ml/weka/
  http://www.mkp.com/datamining/

Concepts: KDD, ML, OLAP, and DM
  KDD (Knowledge Discovery in Databases) is a multi-step process of discovering knowledge.
  ML (Machine Learning) = KD that is not limited to data held in a database.
  Process: mining -> data patterns -> representation -> validation -> prediction
  OLAP (Online Analytical Processing) is the online analysis of a database.
  Data mining (DM) is only one important component of KDD/ML.
  DM is used to generate hypotheses, whereas OLAP is used to verify them.

Concepts: DM and DB
  Data preparation accounts for 70% of the work in the data mining process.
  "Database" + "data mining" = a database that can talk

Concept: Data Mining
  Definition: data mining is the process of extracting latent, valuable knowledge (models or rules) from large amounts of data.
  Key characteristics of data mining:
    Large amount of data
    Discovering previously unknown, hidden information
    Extracting valuable information
    Making important business decisions
    using the information

Some key points of DM/ML
  The data is stored electronically and the search is automated by computer.
  It is about solving problems by analyzing data already present in databases.
  It is defined as the process of discovering patterns in data.
  This book is about techniques for finding and describing structural patterns
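  in data.
  Representations of structural patterns: tables, trees, rules

Concept: Machine Learning
  To learn:
    to get knowledge of by study, experience, or being taught;
    to become aware by information or from observation;
    to commit to memory;
    to be informed of, ascertain;
    to receive instruction.
  Shortcomings when it comes to talking about computers:
    It is virtually impossible to test whether learning has been achieved or not.
    The book's operational definition, that things learn when they change their behavior in a way that makes them perform better in the future, ties learning to performance rather than knowledge.

Simple example: the weather problem
  Weather data: weather.nominal.arff
  Run Weka, load the data, and select the Id3 algorithm.
  Prediction (decision tree):
    outlook = rainy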
    |  windy = TRUE: no
    |  windy = FALSE: yes
  Test method: 10-fold cross-validation
  Test results: confusion matrix (p. 138) and accuracy
     a  b   <-- classified as
     8  1 |  a = yes
     1  4 |  b = no
    Correctly Classified Instances      12    85.7143 %
    Incorrectly Classified Instances     2    14.2857 %
  Other algorithms: Neural Network

Steps of the data mining process: see "Review: the steps of DM"

Input: Concepts, Instances, Attributes
  Concept
    Four basic types of learning: classification, association, clustering, numeric prediction
    Regardless of the type, the thing to be learned is called the concept.
    The output of learning is called the concept description.
  Instance: a data sample (record)
  Attribute: a data field
    Nominal: outlook: sunny => no
    Ordinal: distances cannot be measured, e.g. hot > mild > cool
    Interval: distances can be measured
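
The difference between nominal and ordinal attributes can be sketched in a few lines of Python. This is an illustration only: the value lists are assumed from the weather example (the value "overcast" does not appear in these notes), and the helper name `is_hotter` is hypothetical. Nominal values support only equality tests; ordinal values also support ordering, but the gap between two values has no meaning.

```python
# Nominal: values are just labels, so only equality is meaningful.
# (Assumed value set for the outlook attribute.)
OUTLOOK_VALUES = {"sunny", "overcast", "rainy"}

# Ordinal: values have a rank (hot > mild > cool) but no measurable distance.
# The integers below encode order only; their differences carry no meaning.
TEMPERATURE_RANK = {"cool": 0, "mild": 1, "hot": 2}


def is_hotter(a: str, b: str) -> bool:
    """Compare two ordinal temperature labels by rank (hypothetical helper)."""
    return TEMPERATURE_RANK[a] > TEMPERATURE_RANK[b]


# Nominal comparison: equality is the only sensible test.
same_outlook = "sunny" == "rainy"

# Ordinal comparison: ordering is allowed.
hot_beats_mild = is_hotter("hot", "mild")

print(same_outlook, hot_beats_mild)
```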
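
Returning to the weather example: ID3 chooses the attribute to split on by information gain. The sketch below is a minimal illustration, not Weka's implementation. It assumes the 14 instances listed in the code match the standard weather.nominal.arff file shipped with Weka (the rows are not given in these notes), and the function names are illustrative. Under that assumption, outlook has the highest gain at the root, and within the outlook = rainy subset the best split is windy, which agrees with the tree fragment shown earlier.

```python
from collections import Counter
from math import log2

ATTRS = ["outlook", "temperature", "humidity", "windy"]

# Assumed contents of weather.nominal.arff; the last column is the class (play).
DATA = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(rows):
    """Entropy (in bits) of the class column of the given rows."""
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())


def info_gain(rows, col):
    """Entropy reduction obtained by splitting rows on column index col."""
    total = len(rows)
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r for r in rows if r[col] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder


# Attribute chosen at the root of the tree.
gains = {a: info_gain(DATA, i) for i, a in enumerate(ATTRS)}
root = max(gains, key=gains.get)

# Attribute chosen inside the outlook = rainy branch.
rainy = [r for r in DATA if r[0] == "rainy"]
rainy_gains = {a: info_gain(rainy, i) for i, a in enumerate(ATTRS) if a != "outlook"}
rainy_split = max(rainy_gains, key=rainy_gains.get)

print(root, rainy_split)
```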
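
The accuracy figures reported for the weather example (12 of 14 correct, 85.7143 %) follow directly from the confusion matrix. As a small worked check, assuming rows are the actual classes and columns the predicted classes (a = yes, b = no), the diagonal holds the correctly classified instances:

```python
# Confusion matrix from the weather example.
# Rows: actual class (a = yes, b = no); columns: predicted class.
confusion = [
    [8, 1],  # actual yes: 8 predicted yes, 1 predicted no
    [1, 4],  # actual no:  1 predicted yes, 4 predicted no
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))
incorrect = total - correct

accuracy = 100.0 * correct / total
error_rate = 100.0 * incorrect / total

print(f"Correctly Classified Instances    {correct:3d}  {accuracy:.4f} %")
print(f"Incorrectly Classified Instances  {incorrect:3d}  {error_rate:.4f} %")
```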