v3.1.0

@CarsonDavis CarsonDavis released this 28 Mar 01:09
· 17 commits to dev since this release
fb09bbc

COSMOS v3.1.0 Release Notes

Overview

COSMOS v3.1.0 introduces a new machine learning classification pipeline along with significant improvements to system reliability and user experience. The centerpiece of this release is the ML Classification Queue system, which enables automated document classification through a separate inference API. The pipeline currently populates the Time Domain and Multi-Messenger Astronomy (TDAMM) portal by automatically tagging astrophysics content for specialized discovery. In the future, we will also use it for division and document type tagging, as well as other metadata automation tasks.

This release also includes comprehensive testing infrastructure improvements with new frontend and backend test suites, ensuring code quality and reliability. The user interface has been enhanced with multiple usability fixes for common workflows, and several critical bugs have been resolved to improve system stability. Administrative capabilities have been expanded with better logging, form validation, and API enhancements.

Major Features

ML Classification Pipeline

  • New Classification System: Implemented a robust job processing mechanism to batch URLs for the inference API
  • Smart Batching: Added intelligent text length management with configurable maximums
  • Comprehensive Job Tracking: New models to track individual jobs sent to the API:
    • ModelVersion: Tracks multiple versions of classification models
    • InferenceJob: Manages jobs for collections of URLs
    • ExternalJob: Represents batched jobs sent to the inference API
  • Status Management: Complete workflow with status tracking (queued, pending, completed, failed, cancelled)
  • Classification Threshold Processing: Implemented class-based thresholding for classification results
  • TDAMM Tag Updates: Removed redundant tags and added missing ones for more accurate classification
  • Celery Integration: Scheduled processing during off-hours on weekdays and continuously on weekends
  • Admin: New admin panels for viewing Model Versions, Inference Queue, and External Jobs
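The batching and thresholding steps above can be illustrated with a minimal sketch. This is not the COSMOS implementation: `UrlText`, `batch_by_text_length`, `apply_class_thresholds`, the 10,000-character default, and the 0.5 fallback threshold are all hypothetical names and values chosen for illustration. It assumes the inference API returns one score per class and that each class can carry its own cutoff.

```python
from dataclasses import dataclass


@dataclass
class UrlText:
    """Hypothetical stand-in for a URL and the text to be classified."""
    url: str
    text: str


def batch_by_text_length(items, max_chars=10_000):
    """Group items so each batch's total text length stays within max_chars.

    An item that is longer than max_chars on its own ends up in a batch
    by itself rather than being dropped.
    """
    batches, current, current_len = [], [], 0
    for item in items:
        size = len(item.text)
        # Close the current batch if adding this item would exceed the cap.
        if current and current_len + size > max_chars:
            batches.append(current)
            current, current_len = [], 0
        current.append(item)
        current_len += size
    if current:
        batches.append(current)
    return batches


def apply_class_thresholds(scores, thresholds, default=0.5):
    """Return the labels whose score meets that label's own threshold.

    Labels without an explicit threshold fall back to `default`.
    """
    return sorted(
        label
        for label, score in scores.items()
        if score >= thresholds.get(label, default)
    )
```

With a per-class threshold, a class that the model tends to over-predict can be given a stricter cutoff without affecting the others, which is the main reason to prefer it over a single global threshold.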

API Enhancements

  • Feedback Form Dropdown: New API endpoint and dropdown options for the feedback form
  • TDAMM Tag Serialization: Modified serialization method in the CuratedURLAPISerializer to better support frontend
  • CORS Support: Added HTTPS link for SDE-LRM to CORS allowed origins

Testing Improvements

Frontend Testing Infrastructure

  • Selenium WebDriver: Comprehensive frontend testing setup with Chrome
  • Authentication Testing: Implemented test suite for authentication flows
  • UI Component Tests: Added tests for collection display, data tables, and search functionality
  • Form Validation: Created tests for pattern application forms with validation checks

Backend Testing Coverage

  • Job Generation Pipeline: Enhanced coverage for config and job creation pipeline
  • XML Processing: Comprehensive tests for XML processing
  • Critical Functionality Planning: Identified critical areas of the codebase for future testing
  • Coverage Reporting: Integrated coverage.py for automated coverage reports on PRs

Infrastructure Updates

System Automation

  • Scraper and Indexer Management: Updated nomenclature and parameterized the convert_template_to_job method
  • Job Generation: Streamlined job creation during the curation workflow
  • XML Processing: Enhanced XML processing to facilitate configuration generation

Administrative Improvements

  • Slack Notifications: Enhanced import notifications with detailed status updates
  • Feedback System: Updated Slack notification structure with dropdown option text
  • Testing Strategy: Introduced comprehensive testing strategy documentation
  • Changelog: Introduced CHANGELOG.md to provide consumable descriptions of PRs

Bug Fixes

Data Processing

  • Zero-Value Document Type: Fixed approximately 2,000 documents with a document type value of 0 in nasa_science
  • URL Import Logging: Enhanced logging to show expected, succeeded, and failed URL imports
  • Document Type Creator Form: Set multi-select as the default option for pattern creation forms
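As a rough illustration of the expected/succeeded/failed reporting described for URL imports, the sketch below derives the three counts from two URL lists and logs them. The function name `summarize_import`, the logger name, and the message format are assumptions made for this example and do not reflect the actual COSMOS logging code.

```python
import logging

logger = logging.getLogger("url_import")  # hypothetical logger name


def summarize_import(expected_urls, succeeded_urls):
    """Log and return counts of expected, succeeded, and failed URL imports.

    A URL counts as failed if it was expected but is not in succeeded_urls.
    """
    expected = set(expected_urls)
    succeeded = expected & set(succeeded_urls)
    failed = sorted(expected - succeeded)

    logger.info(
        "URL import: expected=%d succeeded=%d failed=%d",
        len(expected), len(succeeded), len(failed),
    )
    for url in failed:
        logger.warning("URL import failed: %s", url)

    return {
        "expected": len(expected),
        "succeeded": len(succeeded),
        "failed": failed,
    }
```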

User Interface

  • Quote Escaping: Fixed issue with quotes not being properly escaped in titles
  • Scroll Position: Preserved scroll position when selecting document types on individual URLs
  • Pattern Filtering: Fixed filtering issues in the Title Patterns multi-URL pattern selection
  • Column Sorting: Corrected sorting behavior in Collections table for URL count columns
  • Document Type Filter: Fixed filtering functionality in the Delta URLs page
  • Button Layout: Improved spacing by arranging 'Show 100', 'CSV', and 'Customize Columns' buttons in one line
  • Form Validation: Added appropriate error messages for empty document type selections
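For context on the quote escaping fix, here is a minimal sketch of escaping a title before placing it in HTML. It assumes the titles are rendered into HTML text or attribute values; the release notes do not say which context was affected, and `escape_title` is a hypothetical helper built on Python's standard `html.escape`.

```python
import html


def escape_title(title: str) -> str:
    """Escape &, <, >, and both quote characters for safe use in HTML."""
    # quote=True (the default) also escapes " and ' so the result is
    # safe inside a quoted attribute value.
    return html.escape(title, quote=True)
```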

Security

  • HTML Content Validation: Added validation to protect against HTML injection in the feedback form
  • Secure Resource Loading: Ensured all external resources load securely by switching to HTTPS and adding SRI checks
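The two security items above can be sketched as follows. Both functions are illustrative assumptions rather than the code shipped in this release: `contains_html` uses a simple tag-shaped regular expression, which may flag harmless text such as "a < b > c", and `sri_hash` computes the value that goes in an `integrity` attribute for Subresource Integrity, using SHA-384 as the assumed algorithm.

```python
import base64
import hashlib
import re

# Assumed pattern: anything shaped like an opening or closing HTML tag.
_TAG_RE = re.compile(r"<\s*/?\s*[a-zA-Z][^>]*>")


def contains_html(value: str) -> bool:
    """Return True if the text contains something that looks like an HTML tag."""
    return bool(_TAG_RE.search(value))


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Build an SRI integrity value: '<algorithm>-<base64 of the digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"
```

A feedback form validator could reject any submission for which `contains_html` returns True, and the string returned by `sri_hash` for a file's bytes is what a `<script>` or `<link>` tag would carry in its `integrity` attribute.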

What's Changed

Full Changelog: v3.0.0...v3.1.0