資源描述:
《Fast_Set_Intersection_in_Memory》由會(huì)員上傳分享,免費(fèi)在線閱讀,更多相關(guān)內(nèi)容在學(xué)術(shù)論文-天天文庫(kù)。
1、FastSetIntersectioninMemoryBolinDingArndChristianKonig¨UniversityofIllinoisatUrbana-ChampaignMicrosoftResearch201N.GoodwinAvenueOneMicrosoftWayUrbana,IL61801,USARedmond,WA98052,USAbding3@uiuc.educhrisko@microsoft.comABSTRACTrithmsbecomepossible,whichoftenoutperforminvertedindexes;e.g.,usinghash-ba
2、seddictionaries,intersectingtwosetsL1,L2SetintersectionisafundamentaloperationininformationretrievalrequiresexpectedtimeO(min(jL1j;jL2j)),whichisafactorofanddatabasesystems.Thispaperintroduceslinearspacedatastruc-(log(1+max(jL1j=jL2j;jL1j=jL2j)))betterthanthebestpos-turestorepresentsetssuchthatth
3、eirintersectioncanbecomputedsibleworst-caseperformanceofcomparison-basedalgorithms[6].inaworst-caseef?cientway.Ingeneral,givenk(preprocessed)Inthiswork,weproposenewsetintersectionalgorithmsaimedatsets,withtotallynelements,wewillshowhowtocomputetheirpfastperformance.Theseoutperformthecompetingtechn
4、iquesforintersectioninexpectedtimeO(n=w+kr),whereristhein-mostinputsandarealsorobustinthat–forinputswheretheyaretersectionsizeandwisthenumberofbitsinamachine-word.Innotoptimal–theyareclosetothebest-performingalgorithm.Theaddition,weintroduceaverysimpleversionofthisalgorithmthattradeoffforthisgaini
5、saslightincreaseinthesizeofthedatastruc-hasweakerasymptoticguaranteesbutperformsevenbetterinprac-tures,whencomparedtoaninvertedindex;however,inuser-facingtice;bothalgorithmsoutperformthestateofthearttechniquesforscenarioswherelatencyiscrucial,thistradeoffisoftenacceptable.bothsyntheticandrealdatas
6、etsandworkloads.1.1Contributions1.INTRODUCTIONOurapproachleveragestwokeyobservations:(a)IfwisthesizeFastprocessingofsetintersectionsisakeyoperationinmany(inbits)ofamachine-word,wecanencodeasetfromauniversequeryprocessingtasksinthecontextofdatabasesandinformationofwelementsinasinglemachineword,allo
7、wingforveryfastin-retrieval.Forexample,inthecontextofdatabases,setintersectionstersections.(b)Forthedatadistributionsseeninmanyreal-lifeex-areusedinthecontextofvariousformsofdatamining,textanalyt-amples(inparticu