Latent Fault Detection in Large Scale Services

Moshe Gabel, Assaf Schuster
Department of Computer Science, Technion – Israel Institute of Technology, Haifa, Israel
{mgabel, assaf}@cs.technion.ac.il

Ran Gilad-Bachrach, Nikolaj Bjørner
Microsoft Research, Microsoft, Redmond, WA, USA
{rang, nbjorner}@microsoft.com

Abstract—Unexpected machine failures, with their resulting service outages and data loss, pose challenges to datacenter management. Existing failure detection techniques rely on domain knowledge, precious (often unavailable) training data, textual console logs, or intrusive service modifications. We hypothesize that many machine failures are not a result of abrupt changes but rather a result of a long period of degraded performance. This is confirmed in our experiments, in which over 20% of machine failures were preceded by such latent faults. We propose a proactive approach for failure prevention. We present a novel framework for statistical latent fault detection using only ordinary machine counters collected as standard practice. We demonstrate three detection methods within this framework. Derived tests are domain-independent and unsupervised, require neither background information nor tuning, and scale to very large services. We prove strong guarantees on the false positive rates of our tests.

Index Terms—fault detection; web services; statistical analysis; distributed computing; statistical learni

is crossed, an action is triggered. These actions range from notifying the system operator to automatic recovery attempts.

Rule-based failure detection suffers from several key problems. Thresholds must be made low enough that faults will not go unnoticed. At the same time they should be set high enough to avoid spurious detections. However, since the workload changes over time, no fixed threshold is adequate. Moreover, different services, or even different versions of the same service, may have different operating points. Therefore, maintaining the rules requires constant, manual adjustments, often done only after a "postmortem" examination.

Others have noticed the shortcomings of these rule-based approaches. [8], [9] proposed training a detector on historic annotated data. However, such approaches fall short due to the difficulty in obtaining this data, as well as the sensitivity