資源描述:
《一種有效的挖掘數(shù)據(jù)流近似頻繁項算法》由會員上傳分享,免費在線閱讀,更多相關(guān)內(nèi)容在行業(yè)資料-天天文庫。
1、ISSN1000-9825,CODENRUXUEWE-mail:jos@iscas.ac.cnJournalofSoftware,Vol.18,No.4,April2007,pp.884?892http://www.jos.org.cnDOI:10.1360/jos180884Tel/Fax:+86-10-62562563?2007byJournalofSoftware.Allrightsreserved.?一種有效的挖掘數(shù)據(jù)流近似頻繁項算法1+1,211,2王偉平,李建中,張冬冬,郭龍江1(哈爾濱工業(yè)大學(xué)計算機(jī)科學(xué)與技術(shù)
2、學(xué)院,黑龍江哈爾濱150001)2(黑龍江大學(xué)計算機(jī)科學(xué)與技術(shù)學(xué)院,黑龍江哈爾濱150080)AnEfficientAlgorithmforMiningApproximateFrequentItemoverDataStreams1+1,211,2WANGWei-Ping,LIJian-Zhong,ZHANGDong-Dong,GUOLong-Jiang1(SchoolofComputerScienceandTechnology,HarbinInstituteofTechnology,Harbin150001,China)2
3、(SchoolofComputerScienceandTechnology,HeilongjiangUniversity,Harbin150080,China)+Correspondingauthor:Phn:+86-451-6415872,E-mail:wpwang@hit.edu.cn,http://www.hit.edu.cnWangWP,LiJZ,ZhangDD,GuoLJ.Anefficientalgorithmforminingapproximatefrequentitemoverdatastreams.Jou
4、rnalofSoftware,2007,18(4):884?892.http://www.jos.org.cn/1000-9825/18/884.htmAbstract:Afrequentitemofadatastreamisadatapointwhoseoccurrencefrequencyisaboveagiventhreshold.Findingfrequentitemofdatastreamhaswideapplicationsinvariousfields,suchasnetworktrafficmonitor,
5、datastreamOLAPanddatastreammining,etc.Indatastreammodel,thealgorithmcanonlyscanthedatainonepassandtheavailablememoryspaceisverylimitedrelativetothevolumeofadatastream,thereforeitisusuallyunabletofindalltheaccuratefrequentitemsofadatastream.Thispaperproposesanovela
6、lgorithmtofindε-approximate?1frequentitemsofadatastream,itsspacecomplexityisO(ε)andtheprocessingtimeforeachitemisO(1)inaverage.Moreover,thefrequencyerrorboundoftheresultsreturnedbytheproposedalgorithmisε(1?s+ε)N.Amongalltheexistedapproaches,thismethodisthebest.Key
7、words:datastream;datamining;frequentitem;ε-approximate摘要:數(shù)據(jù)流頻繁項是指在數(shù)據(jù)流中出現(xiàn)頻率超出指定閾值的數(shù)據(jù)項.查找數(shù)據(jù)流頻繁項在網(wǎng)絡(luò)故障監(jiān)測、流數(shù)據(jù)分析以及流數(shù)據(jù)挖掘等多個領(lǐng)域有著廣泛的應(yīng)用.在數(shù)據(jù)流模型下,算法只能一遍掃描數(shù)據(jù),并且可用的存儲空間遠(yuǎn)遠(yuǎn)小于數(shù)據(jù)流的規(guī)模,因此,挖掘出所有準(zhǔn)確的數(shù)據(jù)流頻繁項通常是不可能的.提出一種新的挖掘數(shù)據(jù)流近似頻?1繁項的算法.該算法的空間復(fù)雜性為O(ε),每個數(shù)據(jù)項的平均處理時間為O(1),輸出結(jié)果的頻率誤差界限為ε(1?s+ε
8、)N,在目前已有的同類算法中均為最優(yōu).關(guān)鍵詞:數(shù)據(jù)流;數(shù)據(jù)挖掘;頻繁項;ε-近似中圖法分類號:TP311文獻(xiàn)標(biāo)識碼:A?SupportedbytheKeyProgramoftheNationalNaturalScienceFoundationofChinaunderGrantNo.60533110(國家