- cvefixes extractor code into dict
- govulndb extractor
- nvddb extractor
- cvefixes + govulndb combine script
- nvd schema
- osv schema
- nvd fetcher using json script
- govulns to json
- cvefixes + govulns to json
- get cve info using all 3 sources
- validate osv script
- dump and checkin
- get filechanges using cvefixes in script
- repo url list from cves
- github extractor
- per repo, gather repo metadata -> check for bulk APIs
- save repo metadata in repos section of cveinfo and overwrite if the option is set
- get associated commits one by one
- calculate commits metadata and add in cveinfo
- create file changes and method changes in filechanges json
- Views of the dataset to evaluate
  - Each view should contain cveID, cwe, fixes, tokens
  - Create multiple useful views based on different samples, but the views should stay consistent
  - Ideally each view is created once and stored in a json file so that it can be read and processed deterministically each time
  - The dataset needs to be balanced
  - Ideally, token metrics for the dataset should be available so the cost of a dataset run can be estimated in a dry run
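
A rough sketch of how the view requirements above might fit together, not a settled design. It assumes each record is a dict with the keys cveID, cwe, fixes and tokens, that tokens is an integer token count, and that "balanced" means the same number of entries per CWE. The function names, the per_cwe and seed parameters, and the usd_per_1k_tokens price are all hypothetical.

```python
import json
import random
from collections import defaultdict


def build_view(records, per_cwe, seed=0):
    """Return a CWE-balanced sample that is identical for a given seed."""
    by_cwe = defaultdict(list)
    for rec in records:
        by_cwe[rec["cwe"]].append(rec)

    rng = random.Random(seed)  # private, seeded RNG so reruns give the same view
    view = []
    for cwe in sorted(by_cwe):  # sorted keys keep the sampling order stable
        group = sorted(by_cwe[cwe], key=lambda r: r["cveID"])
        if len(group) < per_cwe:
            continue  # assumption: skip CWEs too small to balance
        view.extend(rng.sample(group, per_cwe))
    return sorted(view, key=lambda r: r["cveID"])


def token_metrics(view, usd_per_1k_tokens):
    """Summarise token counts so a run can be costed without executing it."""
    total = sum(rec["tokens"] for rec in view)
    return {
        "entries": len(view),
        "total_tokens": total,
        "max_tokens": max((rec["tokens"] for rec in view), default=0),
        # hypothetical flat price per 1000 tokens
        "estimated_cost_usd": round(total / 1000 * usd_per_1k_tokens, 4),
    }


def save_view(view, metrics, path):
    """Write the view and its metrics to one json file with stable key order."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"metrics": metrics, "entries": view}, fh, indent=2, sort_keys=True)
```

Storing the sampled entries in the file, rather than only the seed, means later runs read the same view even if the underlying dataset or the sampling code changes.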