Topic Clustering Based on Micro-blog Text: Research and Implementation
Hebei University of Science and Technology, Master's Thesis

Abstract

The development of Web 2.0 technology has ushered in the era of big data. The rapid growth of social networks such as micro-blogs enriches this data, but it also poses many challenges for data mining and knowledge discovery. Compared with traditional text data, micro-blog data differs in that it spans personal interests, entertainment, business marketing, public publicity, and so on. Micro-blog data also has its own characteristics: its content is fragmented and its volume is massive. How to analyze it and mine the information hidden in it is an important research task.

Topic clustering is a basic task in micro-blog research. By automatically clustering large amounts of relevant data into several groups, it yields results that offer hints for analyzing and mining the data. Traditional approaches usually return many results containing irrelevant or duplicated information, so they cannot handle these problems effectively. Topic clustering, by contrast, groups relevant information automatically. Furthermore, with keyword extraction, the results can be presented visually in an intuitive way.

This thesis studies micro-blog data using several intelligent algorithms. The main work is as follows. First, we present approaches for obtaining structured micro-blog data and for pre-processing the data before clustering. Second, based on an analysis of the micro-blog data, we study how to select useful features for further processing. Third, we design an effective clustering algorithm and, based on an analysis of the micro-blog data, examine which approach performs better. Fourth, we extract keywords from the clustering result set; these keywords can be used to visualize the topic clusters. Fifth, we visualize the results, which are clear and intuitive and therefore help in understanding and recognizing the information hidden behind the mass of data.

The experimental results and analysis show the feasibility of the proposed approach. Some remaining problems and directions for further work are presented at the end.

Key Words: Topic clustering; Micro-blog; Feature vector; Visualization; Information gain
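The key words name information gain, but the abstract does not say how the thesis computes it. As a rough illustration only, the sketch below uses the standard text-classification definition, IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], to rank candidate feature terms. The toy posts, the topic labels, and the function names are all hypothetical and are not taken from the thesis; it is also an assumption that labelled posts are available at the feature-selection stage.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """H(C) minus the expected entropy after splitting on presence of `term`."""
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    total = len(labels)
    conditional = (
        (len(with_term) / total) * entropy(with_term)
        + (len(without_term) / total) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Hypothetical toy corpus: each post is a set of tokens with an assumed topic label.
docs = [
    {"match", "goal"},
    {"goal", "team"},
    {"stock", "market"},
    {"market", "price"},
]
labels = ["sports", "sports", "finance", "finance"]

# Rank every term in the vocabulary by information gain, highest first.
vocabulary = sorted(set().union(*docs))
ranked = sorted(
    vocabulary,
    key=lambda term: information_gain(docs, labels, term),
    reverse=True,
)
```

In this toy corpus, "goal" and "market" each separate the two topics perfectly and so rank first. Terms that appear in only one post score lower and would be dropped earlier when building a feature vector.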