J. of Zhengzhou Univ. (Nat. Sci. Ed.), Vol. 38, No. 4, Dec. 2006

Cluster-PC Based Parallel Distributed Data Clustering and Its Applications

XIA Sheng-ping, LÜ Xiao-jun, LIU Jian-jun, YUAN Zhen-tao, YU Wen-xian
(ATR State Key Laboratory, National University of Defense Technology, Changsha 410073, China)

Abstract: Clustering high-dimensional data requires high-performance computers to obtain results in a reasonable amount of time, particularly for extremely large-scale databases. Thus, the recursive SOM (RSOM) tree method is proposed. The RSOM tree is a hierarchy of clusters and sub-clusters which incorporates the cluster representation into the index structure. It provides a practical solution for indexing clustered data sets, and it supports effective and efficient retrieval of nearest neighbors without a linear search of a large high-dimensional database. Meanwhile, an incremental RSOM tree-based clustering algorithm is proposed. Because the RSOM tree is inherently parallel, it can be implemented on scalable parallel computers. Thus a cluster-system-based distributed parallel algorithm for the incremental RSOM tree is proposed. The performance of the method has been tested with high-dimensional feature sets extracted from a large image database.

Key words: parallel distributed clustering; recursive SOM tree; cluster-computer; incremental clustering

CLC number: TP311    Article ID: 1671-6841(2006)04-0033-08

0 Introduction

Data clustering is an important and basic technology in domains such as data mining, image processing and pattern recognition, and has been widely researched for a long time. Various clustering algorithms [1][2-3] have been proposed, and their methods can be grouped into partition-based, hierarchy-based, grid-based [4][5] and subspace-based. All these algorithms need the whole data set to be processed serially at one time. However, in the daily work of clustering applications, the acquired data change frequently and their number may exceed tens of millions; both the number of data items and the number of patterns in them increase dynamically. There are two solutions for the updated data. The first way is to rerun the algorithm. The temporal and spatial complexity of data clustering with high dimensionality is very high. If all data ar