The GHTorrent dataset and tool suite (paper)
2013. Citations: 372
Cloud Computing and Resource Management; Distributed and Parallel Computing Systems; Scientific Computing and Data Management
Abstract
During the last few years, GitHub has emerged as a popular project hosting, mirroring and collaboration platform. GitHub provides an extensive REST API, which enables researchers to retrieve high-quality, interconnected data. The GHTorrent project has been collecting data for all public projects available on GitHub for more than a year. In this paper, we present the dataset details and construction process and outline the challenges and research opportunities emerging from it.