COSMOS v3.1.0 Release Notes
Overview
COSMOS v3.1.0 introduces a major new machine learning classification pipeline and significant improvements to system reliability and user experience. The centerpiece of this release is the ML Classification Queue system, which enables automated document classification through a separate inference API. The pipeline currently populates the Time Domain and Multi-Messenger Astronomy (TDAMM) portal by automatically tagging astrophysics content for specialized discovery. In the future, we plan to use it for division and document type tagging, as well as other metadata automation tasks.
This release also includes comprehensive testing infrastructure improvements with new frontend and backend test suites, ensuring code quality and reliability. The user interface has been enhanced with multiple usability fixes for common workflows, and several critical bugs have been resolved to improve system stability. Administrative capabilities have been expanded with better logging, form validation, and API enhancements.
Major Features
ML Classification Pipeline
- New Classification System: Implemented a robust job processing mechanism to batch URLs for the inference API
- Smart Batching: Added intelligent text length management with configurable maximums
- Comprehensive Job Tracking: New models to track individual jobs sent to the API:
  - ModelVersion: Tracks multiple versions of classification models
  - InferenceJob: Manages jobs for collections of URLs
  - ExternalJob: Represents batched jobs sent to the inference API
- Status Management: Complete workflow with status tracking (queued, pending, completed, failed, cancelled)
- Classification Threshold Processing: Implemented class-based thresholding for classification results
- TDAMM Tag Updates: Removed redundant tags and added missing ones for more accurate classification
- Celery Integration: Scheduled processing during off-hours on weekdays and continuously on weekends
- Admin: New admin panels for viewing Model Versions, Inference Queue, and External Jobs
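
As a rough illustration of the Smart Batching and Classification Threshold Processing items above, the sketch below groups documents into batches under a total text-length cap and then keeps only the tags whose score clears a per-class threshold. Every name here (`batch_by_text_length`, `apply_class_thresholds`, `max_chars`, the tag names, and the threshold values) is hypothetical and not taken from the COSMOS codebase; the truncation policy for oversized documents is also an assumption, and the actual implementation may differ.

```python
from typing import Dict, List, Tuple

Doc = Tuple[str, str]  # (url, extracted text)


def batch_by_text_length(docs: List[Doc], max_chars: int) -> List[List[Doc]]:
    """Group documents so each batch's combined text length stays within max_chars."""
    batches: List[List[Doc]] = []
    current: List[Doc] = []
    current_len = 0
    for url, text in docs:
        # Assumed policy: truncate a document that alone exceeds the cap.
        text = text[:max_chars]
        # Start a new batch when adding this document would exceed the cap.
        if current and current_len + len(text) > max_chars:
            batches.append(current)
            current, current_len = [], 0
        current.append((url, text))
        current_len += len(text)
    if current:
        batches.append(current)
    return batches


def apply_class_thresholds(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    default: float = 0.5,
) -> List[str]:
    """Keep tags whose score meets that tag's own threshold (or the default)."""
    return sorted(
        tag for tag, score in scores.items() if score >= thresholds.get(tag, default)
    )


# Hypothetical documents and scores, for illustration only.
docs = [
    ("https://example.com/a", "x" * 60),
    ("https://example.com/b", "y" * 50),
    ("https://example.com/c", "z" * 30),
]
batches = batch_by_text_length(docs, max_chars=100)
kept_tags = apply_class_thresholds(
    {"Supernovae": 0.82, "Gamma rays": 0.40, "Neutrinos": 0.65},
    {"Supernovae": 0.7, "Gamma rays": 0.5},
)
```

With these inputs the first document fills a batch on its own and the other two share a second batch, while the "Gamma rays" score falls below its class threshold and is dropped. Per-class thresholds let a model that is less reliable on some tags be held to a stricter cutoff for those tags only.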
API Enhancements
- Feedback Form Dropdown: New API endpoint and dropdown options for the feedback form
- TDAMM Tag Serialization: Modified the serialization method in the CuratedURLAPISerializer to better support the frontend
- CORS Support: Added HTTPS link for SDE-LRM to CORS allowed origins
Testing Improvements
Frontend Testing Infrastructure
- Selenium WebDriver: Comprehensive frontend testing setup with Chrome
- Authentication Testing: Implemented test suite for authentication flows
- UI Component Tests: Added tests for collection display, data tables, and search functionality
- Form Validation: Created tests for pattern application forms with validation checks
Backend Testing Coverage
- Job Generation Pipeline: Enhanced coverage for config and job creation pipeline
- XML Processing: Comprehensive tests for XML processing
- Critical Functionality Planning: Identified critical areas of the codebase for future testing
- Coverage Reporting: Integrated coverage.py for automated coverage reports on PRs
Infrastructure Updates
System Automation
- Scraper and Indexer Management: Updated nomenclature and parameterized the convert_template_to_job method
- Job Generation: Streamlined job creation during the curation workflow
- XML Processing: Enhanced XML processing to facilitate configuration generation
Administrative Improvements
- Slack Notifications: Enhanced import notifications with detailed status updates
- Feedback System: Updated Slack notification structure to include the selected dropdown option text
- Testing Strategy: Introduced comprehensive testing strategy documentation
- Changelog: Introduced CHANGELOG.md to provide consumable descriptions of PRs
Bug Fixes
Data Processing
- Zero-Value Document Type: Fixed approximately 2,000 documents in nasa_science that had a document type value of 0
- URL Import Logging: Enhanced logging to show expected, succeeded, and failed URL imports
- Document Type Creator Form: Set the multi-URL pattern as the default option for pattern creation forms
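
To make the URL Import Logging item above concrete, here is a minimal sketch of how expected, succeeded, and failed counts could be derived and logged. The function names, the logger name, and the collection name are hypothetical; the source does not describe how COSMOS computes or formats these values.

```python
import logging
from typing import Dict, Iterable

logger = logging.getLogger("url_import")  # hypothetical logger name


def summarize_import(expected_urls: Iterable[str], imported_urls: Iterable[str]) -> Dict[str, int]:
    """Count expected, succeeded, and failed URL imports."""
    expected = set(expected_urls)
    # Only URLs that were expected count as successes.
    succeeded = expected & set(imported_urls)
    failed = expected - succeeded
    return {
        "expected": len(expected),
        "succeeded": len(succeeded),
        "failed": len(failed),
    }


def format_summary(collection: str, summary: Dict[str, int]) -> str:
    """Build a one-line message suitable for a log entry."""
    return (
        f"Import for {collection}: expected={summary['expected']} "
        f"succeeded={summary['succeeded']} failed={summary['failed']}"
    )


# Hypothetical data, for illustration only.
summary = summarize_import(
    ["https://example.com/1", "https://example.com/2", "https://example.com/3"],
    ["https://example.com/1", "https://example.com/3"],
)
message = format_summary("example_collection", summary)
logger.info(message)
```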
User Interface
- Quote Escaping: Fixed issue with quotes not being properly escaped in titles
- Scroll Position: Preserved scroll position when selecting document types on individual URLs
- Pattern Filtering: Fixed an issue where selecting the multi-URL pattern option in Title Patterns had no effect
- Column Sorting: Fixed the Collections table so the Curated URLs count column no longer sorts by the Delta URLs count
- Document Type Filter: Fixed filtering functionality in the Delta URLs page
- Button Layout: Improved spacing by arranging 'Show 100', 'CSV', and 'Customize Columns' buttons in one line
- Form Validation: Added appropriate error messages for empty document type selections
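
The Quote Escaping item above concerns titles that contain quote characters. The release notes do not say how the fix was implemented, so the following is only a generic sketch of the technique using Python's standard `html.escape`; the helper name `escape_title` and the sample title are hypothetical.

```python
import html


def escape_title(title: str) -> str:
    """Escape &, <, >, and both quote characters so the title is safe in an HTML attribute."""
    return html.escape(title, quote=True)


# Hypothetical title, for illustration only.
title = 'The "Crab" Nebula & <M1>'
escaped = escape_title(title)

# Without escaping, the inner double quotes would end the value attribute early.
input_tag = f'<input type="text" value="{escaped}">'
```

Escaping at render time keeps the stored title unchanged, so `html.unescape(escaped)` recovers the original string.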
Security
- HTML Content Validation: Added validation to protect against HTML injection in the feedback form
- Secure Resource Loading: Ensured all external resources load securely by switching to HTTPS and adding SRI checks
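
For readers unfamiliar with Subresource Integrity (SRI), mentioned in the Secure Resource Loading item above, the sketch below shows how an `integrity` value is generally computed: a base64-encoded SHA-384 digest of the resource, prefixed with the algorithm name. The helper names, the example URL, and the sample script content are hypothetical, and this is not the procedure used in the referenced PR.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI integrity string such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(url: str, content: bytes) -> str:
    """Build a script tag that loads over HTTPS and carries an integrity check."""
    if not url.startswith("https://"):
        raise ValueError("external resources should be loaded over HTTPS")
    return (
        f'<script src="{url}" integrity="{sri_hash(content)}" '
        f'crossorigin="anonymous"></script>'
    )


# Hypothetical resource, for illustration only.
content = b"console.log('hello');"
tag = script_tag("https://cdn.example.com/lib.js", content)
```

The browser recomputes the digest of the downloaded file and refuses to execute it if the value does not match, which protects against a tampered third-party resource.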
What's Changed
- Specify pattern match type in the pattern forms by @dhanur-sharma in #1172
- Add Curated URLs column to homepage by @dhanur-sharma in #1170
- Added affected curated urls count on url pattern pages by @dhanur-sharma in #1169
- Uniform Handling of Errors throughout COSMOS by @saifrk in #1136
- Update exclude checkmark action to change behavior based on inclusion status by @Kirandawadi in #1167
- Enforce Code Quality at PR Time by @saifrk in #1201
- Automatic Running of Tests On Pull Request by @saifrk in #1123
- 1177 notifications update slack notification pipeline by @bishwaspraveen in #1200
- API Tests for Token Verification, Request Accuracy, Response Parsing, and Error Handling by @saifrk in #1089
- Create CHANGELOG.md by @CarsonDavis in #1221
- Tests for critical functionalities by @saifrk in #1220
- HTML validator has been set at serializer level by @bishwaspraveen in #1218
- Implement unit test for forms on the frontend by @Kirandawadi in #1226
- Finalize the infrastructure for frontend testing by @Kirandawadi in #1222
- 960 notifications add a dropdown with options on the feedback form by @bishwaspraveen in #1210
- serialized and changed API structure to fit LRM requirements by @bishwaspraveen in #1215
- 3227 bugfix title patterns selecting multi url pattern does nothing by @bishwaspraveen in #1230
- Added Changelog for issue #1192 and #1195 by @Kirandawadi in #1237
- Update run_full_test_suite.yml by @CarsonDavis in #1233
- Added changelog for Issue_1001 by @saifrk in #1234
- Frontend test "test_create_title_pattern" failing due to insufficient wait time by @Kirandawadi in #1236
- remove unused getParameterByName in delta_url_list.js by @CarsonDavis in #1239
- Merge Dev Into Staging by @CarsonDavis in #1213
- Tests for Config & Job Creation + XML Processing by @saifrk in #1225
- Updated template and job creation for scrapers and indexers by @dhanur-sharma in #1072
- changes js code to preserve y scroll position while saving by @bishwaspraveen in #1228
- Slack Notification when importing urls by @saifrk in #1229
- Fix the issues of doctypes having 0 as a doctype by @bishwaspraveen in #1031
- 1101 bug fix quotes not escaped in titles by @bishwaspraveen in #1244
- changed the default to multi url pattern by @bishwaspraveen in #1216
- Minor Enhancement: Document Type Pattern Form – Require Document Type or Show Appropriate Error by @saifrk in #1247
- Added https URL to allow CORS by @dhanur-sharma in #1250
- Alignment of ‘Show 100’, ‘CSV’, ‘CUSTOMIZE COLUMNS’ by @saifrk in #1242
- Implement HTTPS and add SRI to external resources to fix CodeQL alert by @saifrk in #1245
- Integrate classification queue by @CarsonDavis in #1248
- 1182 ml classification queue by @CarsonDavis in #1219
- merge bugfixes and ml integration to staging by @CarsonDavis in #1254
- Staging by @CarsonDavis in #1255
- add sde-lrm tdamm to allowed cors by @CarsonDavis in #1256
- Updated generate_inference_job to create ModelVersion if needed by @dhanur-sharma in #1260
- Updated settings by @dhanur-sharma in #1266
- 1251 column sorting issue curated urls count sorts by delta urls count by @Kirandawadi in #1265
- 1252 document type filter not working in delta urls page by @Kirandawadi in #1263
- Updated process_inference_queue to run on the schedule by @dhanur-sharma in #1270
- add admin for inference models by @CarsonDavis in #1273
- Added objects for weekend schedule by @dhanur-sharma in #1275
- 1277 class based thresholding by @dhanur-sharma in #1278
- Process signals reliably by @dhanur-sharma in #1281
- specify exact collections on which to run TDAMM classifications by @CarsonDavis in #1286
- Added changelog for the inference pipeline by @dhanur-sharma in #1288
- Updated home page load time by @dhanur-sharma in #1290
Full Changelog: v3.0.0...v3.1.0