GitHub action to automatically close listings that are inactive + removed bloat in listing titles #4815
+111
−0
Changes Made
1. New GitHub action to automatically close non-Simplify listings that are inactive
Wrote validate_listings.py, a script that goes through every listing in listings.json and, for each one not sourced from Simplify, checks whether it is still active. It does this through headless browsing with Selenium: it checks whether the page 404s or contains any phrases (matched via regex) that may indicate the listing is closed ("job not found", "no longer accepting applications", etc.). If a listing is deemed inactive, it is marked as inactive. The script runs daily on all the listings as a GitHub action via validate_listings.yml, and can also be triggered manually.
2. Dash Delimiting for Title Bloat
To clean up the bloat in internship listing titles, the script also removes the extra parts of job titles by splitting them on the dash.
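A minimal sketch of what dash delimiting could look like, not the actual code from validate_listings.py. It assumes that the part before the first dash is the one to keep and that only dashes surrounded by spaces count as separators, so hyphenated words stay intact. The names `strip_title_bloat` and `_DASH_SEPARATOR` are hypothetical.

```python
import re

# Assumption: a separator is a hyphen, en dash, or em dash with whitespace on
# both sides. A bare hyphen inside a word ("Full-Stack") is left alone.
_DASH_SEPARATOR = re.compile(r"\s+[-\u2013\u2014]\s+")


def strip_title_bloat(title: str) -> str:
    """Return the text before the first spaced dash, trimmed of whitespace."""
    return _DASH_SEPARATOR.split(title, maxsplit=1)[0].strip()


if __name__ == "__main__":
    for raw in (
        "Software Engineer Intern - Summer 2025 - Remote",
        "Full-Stack Intern",
    ):
        print(repr(raw), "->", repr(strip_title_bloat(raw)))
```

Requiring whitespace around the dash is a deliberate choice in this sketch: splitting on every hyphen would truncate titles such as "Full-Stack Intern".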
Testing
To test the script, I created a copy of the listings in test_listings.json and ran the script on it several times. I observed that the script successfully removed the listings that were no longer active (checked by clicking on the links afterwards). Additionally, I temporarily had the listings found to be inactive written to a closed_listings.json file, which I inspected to further confirm the results.
For the dash delimiting, I had the script print out the job titles, confirming that it behaved as expected.
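For reviewers who want to reproduce this kind of dry run, the check and the split into kept and closed listings might be sketched as below. This is an illustration under assumptions, not the code in this PR: the listing fields `source`, `active`, and `url` are assumed names, the regex contains only the two phrases quoted in the description, and `looks_inactive` and `split_listings` are hypothetical helpers. Page fetching is passed in as a function so the logic can be exercised without a browser; `None` stands in for a page that 404s.

```python
import re
from typing import Callable, Optional

# Only the two phrases quoted in the PR description; the real list is longer.
INACTIVE_PATTERN = re.compile(
    r"job not found|no longer accepting applications",
    re.IGNORECASE,
)


def looks_inactive(page_text: Optional[str]) -> bool:
    """Treat a missing page (None, e.g. a 404) or a matching phrase as inactive."""
    if page_text is None:
        return True
    return INACTIVE_PATTERN.search(page_text) is not None


def split_listings(
    listings: list[dict],
    fetch_text: Callable[[str], Optional[str]],
) -> tuple[list[dict], list[dict]]:
    """Return (kept, closed); closed entries are copies with active=False."""
    kept, closed = [], []
    for listing in listings:
        # Assumed field names: skip Simplify-sourced and already-inactive entries.
        if listing.get("source") == "Simplify" or not listing.get("active", True):
            kept.append(listing)
        elif looks_inactive(fetch_text(listing["url"])):
            closed.append({**listing, "active": False})
        else:
            kept.append(listing)
    return kept, closed
```

In a real run, `fetch_text` could wrap a headless Selenium driver (load the URL, return the page source), and the `closed` list could be dumped to a file such as closed_listings.json for manual inspection.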