How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories

Michael Meli (North Carolina State University), Matthew R. McNiece (Cisco Systems and North Carolina State University), Bradley Reaves (North Carolina State University)

GitHub and similar platforms have made public collaborative development of software commonplace. However, a problem arises when this public code must manage authentication secrets, such as API keys or cryptographic secrets. These secrets must be kept private for security, yet common development practices like adding these secrets to code make accidental leakage frequent. In this paper, we present the first large-scale and longitudinal analysis of secret leakage on GitHub. We examine billions of files collected using two complementary approaches: a nearly six-month scan of real-time public GitHub commits and a public snapshot covering 13% of open-source repositories. We focus on private key files and 11 high-impact platforms with distinctive API key formats. This focus allows us to develop conservative detection techniques that we manually and automatically evaluate to ensure accurate results. We find that not only is secret leakage pervasive — affecting over 100,000 repositories— but that thousands of new, unique secrets are leaked every day. We also use our data to explore possible root causes of leakage and to evaluate potential mitigation strategies. This work shows that secret leakage on public repository platforms is rampant and far from a solved problem, placing developers and services at persistent risk of compromise and abuse.

Paper

Video

View More Papers

Measurement and Analysis of Hajime, a Peer-to-peer IoT Botnet

Stephen Herwig (University of Maryland), Katura Harvey (University of Maryland, Max Planck Institute for Software Systems (MPI-SWS)), George Hughey (University of Maryland), Richard Roberts (University of Maryland, Max Planck Institute for Software Systems (MPI-SWS)), Dave Levin (University of Maryland)

One Engine To Serve 'em All: Inferring Taint Rules...

Zheng Leong Chua (National University of Singapore), Yanhao Wang (TCA/SKLCS, Institute of Software, Chinese Academy of Sciences), Teodora Baluta (National University of Singapore), Prateek Saxena (National University of Singapore), Zhenkai Liang (National University of Singapore), Purui Su (TCA/SKLCS, Institute of Software, Chinese Academy of Sciences)

Stealthy Adversarial Perturbations Against Real-Time Video Classification Systems

Shasha Li (University of California Riverside), Ajaya Neupane (University of California Riverside), Sujoy Paul (University of California Riverside), Chengyu Song (University of California Riverside), Srikanth V. Krishnamurthy (University of California Riverside), Amit K. Roy Chowdhury (University of California Riverside), Ananthram Swami (United States Army Research Laboratory)

Nearby Threats: Reversing, Analyzing, and Attacking Google’s ‘Nearby Connections’...

Daniele Antonioli (Singapore University of Technology and Design (SUTD)), Nils Ole Tippenhauer (CISPA), Kasper Rasmussen (University of Oxford)