Vulncode-DB

(last updated: 2022-01-03)

The vulnerable code database (Vulncode-DB) is a database of vulnerabilities and, where available, their corresponding source code. It extends the NVD / CVE data sets with user-supplied information such as patch links, vulnerable code offsets and descriptions. In particular, the database aims to make real-world examples of vulnerable code universally accessible and useful.

The underlying code is open-sourced at github.com/google/vulncode-db.

Please note: Vulncode-DB is not an officially supported Google product. This is an experimental alpha version, mostly for demonstration purposes. The application might be unreliable, contains many bugs and is not feature complete. Please set your expectations accordingly.

You can stay updated at twitter.com/vulncodedb.


Frequently Asked Questions (FAQ)

Basics


Q: What is the goal of this project?

To make real-world examples of vulnerable code universally accessible and useful. Particular subgoals include:

  1. Educate on what vulnerabilities look like in code and how to spot them. Provide a central place to showcase vulnerabilities in detail. Ever wanted to improve your code auditing abilities? This could become a place where you can start to do so.

  2. Create a real-world data set on vulnerable (open-source) code for tooling and research purposes. Currently, there seems to be no high-quality and real-world data available for this purpose. This data might be useful for research areas like static source code analysis.

Q: How is this useful for me?

Ideally, this project will provide you with all the code passages and short descriptions needed to understand the details of a vulnerability.

Q: What kind of entries are available?

Vulnerabilities:

  1. with fully annotated code. Example: OpenSSL Heartbleed bug or Python 2.5 buffer overflow bug

  2. without annotation but with reference to the first patch addressing the vulnerability. Examples: Linux entries and SNMP NAT module bug.

  3. raw vulnerability data without a known patch (this includes proprietary software). Example: VideoLAN VLC media player bug.
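
The three kinds of entries above can be pictured as a small data model. The following is only a minimal sketch for illustration: the names `Entry`, `EntryKind` and `classify`, as well as the fields, are hypothetical and are not taken from the Vulncode-DB code base, and the CVE identifiers and URL are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class EntryKind(Enum):
    """Hypothetical labels for the three kinds of entries listed above."""
    ANNOTATED = "fully annotated code"
    PATCH_ONLY = "patch reference without annotation"
    RAW = "raw vulnerability data without a known patch"


@dataclass
class Entry:
    """Illustrative entry; the field names are assumptions, not the real schema."""
    cve_id: str
    patch_url: Optional[str] = None
    annotations: List[str] = field(default_factory=list)


def classify(entry: Entry) -> EntryKind:
    """Map an entry to one of the three kinds based on what data it carries."""
    if entry.annotations:
        return EntryKind.ANNOTATED
    if entry.patch_url:
        return EntryKind.PATCH_ONLY
    return EntryKind.RAW


# Placeholder entries, one per kind.
annotated = Entry("CVE-0000-0001", "https://example.com/patch", ["unchecked length"])
patch_only = Entry("CVE-0000-0002", "https://example.com/patch")
raw = Entry("CVE-0000-0003")
```

In this sketch an entry with at least one annotation counts as annotated, which assumes that annotations are only ever attached to code that is already available.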

Q: Where is the data coming from?

The database makes use of the following data sources:

  • NVD / CVE data sets - Basic vulnerability data
  • github.com - Commit metadata and repository contents
  • Custom *.git repositories - The project supports non-GitHub repositories, too
  • User-supplied information - Patch links, vulnerable code offsets, descriptions and more

To provide useful context, the application scans for patch references and makes the relevant repository contents directly available.
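
To give a rough idea of what scanning for patch references could look like, here is a minimal sketch that picks GitHub commit links out of a list of reference URLs, such as those attached to an NVD entry. It is not the project's actual implementation: the function names, the `PatchReference` type and the regular expression are assumptions made for illustration, and the URLs in the example are placeholders.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed URL shape: https://github.com/<owner>/<repo>/commit/<hash>
COMMIT_URL = re.compile(
    r"^https?://github\.com/(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)"
    r"/commit/(?P<sha>[0-9a-f]{7,40})\b",
    re.IGNORECASE,
)


class PatchReference(NamedTuple):
    """Hypothetical container for a commit that looks like a patch."""
    owner: str
    repo: str
    commit: str


def find_patch_reference(url: str) -> Optional[PatchReference]:
    """Return the commit a URL points to, or None if it is not a commit link."""
    match = COMMIT_URL.match(url)
    if match is None:
        return None
    return PatchReference(match["owner"], match["repo"], match["sha"].lower())


def scan_references(urls: Iterable[str]) -> List[PatchReference]:
    """Keep only those reference URLs that look like GitHub commit links."""
    found = (find_patch_reference(url) for url in urls)
    return [ref for ref in found if ref is not None]


# Placeholder reference URLs; only the first one is a commit link.
references = [
    "https://github.com/example-owner/example-repo/commit/0123456789abcdef0123456789abcdef01234567",
    "https://example.com/advisory/123",
    "https://github.com/example-owner/example-repo/issues/5",
]
patches = scan_references(references)
```

A pattern like this only covers GitHub-style commit links. Custom *.git repositories would need their own matching rules.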

Please note: Contributing annotations is currently disabled. Please see a rough demo for the intended interface here.

Q: How can I report an issue or feature request?

Please create a bug report or feature request at github.com/google/vulncode-db/issues. Alternatively, as this is an open-source project, you’re more than welcome to create a pull request.

Development


Q: What technology stack is the project using?

  • Languages: Python and JavaScript
  • Frameworks: Flask (with Jinja2 for templating)
  • Code editor: Microsoft’s Monaco code editor
  • Many other libraries to process repositories and parse patches (see repository contents)

Q: How can I learn more about the code or set up this project myself?

Please take a look at github.com/google/vulncode-db/blob/master/README.md.

Q: Will the data also be available over an API?

Yes, we plan to provide the project’s data over APIs. Additionally, we want to provide regular database dumps.

Q: How can I contribute?

Any feedback is super welcome. Please feel free to spread the word, contact us or contribute pull requests to github.com/google/vulncode-db.

Misc


Q: What are the next milestones?

First, we would like to gauge the feedback on and interest in this project. That said, some of the major next milestones are:

  • Automate the integration of new vulnerabilities.
  • Invite a first set of contributors to create annotated entries.
  • Open this up to everyone (similar to the Wikipedia setup) and deploy content moderation.
  • Allow creating vulnerable code snippets and embedding them into your own websites, for example in your own write-ups.

Q: Who are the main developers / whom can I contact for more questions?