
Contribution to fix a small bug and to reformat #27


Open

linhkid wants to merge 31 commits into main

Conversation


@linhkid linhkid commented Apr 7, 2024

  • Fix the JSON error that occurs when an incomplete prompt leads to failed JSON generation (a parsing sketch follows the prompt excerpt below)
  • Add more fields to the summarization, such as methodology, discussion, and data used
  • Add the generation date to the digest.html file

1. {"Relevancy score": "an integer score out of 10", "Reasons for match": "1-2 sentence short reasonings", "Goal": "What kind of pain points the paper is trying to solve?", "Data": "Summary of the data source used in the paper", "Methodology": "Summary of methodologies used in the paper", "Git": "Link to the code repo (if available)", "Experiments & Results": "Summary of any experiments & its results", "Discussion & Next steps": "Further discussion and next steps of the research"}

My research interests are: NLP, RAGs, LLM, Optimization in Machine learning, Data science, Generative AI, Optimization in LLM, Finance modelling ...
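A minimal sketch of the kind of defensive parsing the first bullet refers to, assuming the model is asked to return one flat JSON object per paper in the format above; the function name and the recovery strategy (skip malformed objects) are illustrative, not the exact fix in this PR:

```python
import json
import re


def parse_relevancy_response(raw_text: str) -> list[dict]:
    """Extract whatever well-formed JSON objects survive a truncated response."""
    results = []
    # The expected objects are flat, so a non-greedy {...} match is enough here;
    # an object cut off by an incomplete generation fails json.loads and is skipped.
    for block in re.findall(r"\{.*?\}", raw_text, flags=re.DOTALL):
        try:
            item = json.loads(block)
        except json.JSONDecodeError:
            continue
        if "Relevancy score" in item:  # keep only entries carrying the required score field
            results.append(item)
    return results
```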
@rmfan rmfan (Collaborator) commented Apr 21, 2024

The interests get appended here:

prompt += query['interest']

No need to add them manually to the relevancy prompt
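For context, a rough sketch of where that happens; only the `prompt += query['interest']` line is quoted from the repo, while the surrounding function, parameter names, and paper fields are assumptions:

```python
def build_relevancy_prompt(query: dict, papers: list[dict], template: str) -> str:
    """Assemble the relevancy prompt; the user's interests are appended
    automatically, so they do not need to be duplicated in the template."""
    prompt = template                # scoring instructions + JSON format
    prompt += query['interest']      # interests from the config appended here
    for idx, paper in enumerate(papers, start=1):
        prompt += f"\n{idx}. Title: {paper['title']}\n   Abstract: {paper['abstract']}\n"
    return prompt
```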

@linhkid linhkid (Author)

OK thanks

README.md Outdated
**ArXiv Digest and Personalized Recommendations using Large Language Models.**
**ArXiv Digest (extra version) and Personalized Recommendations using Large Language Models.**

*(Note: This is an adjusted repo to match my needs. For original repo please refer to **AutoLLM** that I forked from)*
@rmfan rmfan (Collaborator)

This is a pull request to the original repo 😄

@linhkid linhkid (Author)

Sorry Richard, pls ignore haha.

linhkid added 18 commits April 26, 2024 23:35
Revamp the workflow and UI, and use multiple models for this repo

1. Content Extraction After Filtering:
   • Added a new step in the process between Stage 1 and Stage 2
   • After papers pass the relevancy filter, the system now extracts HTML content for them
   • Only papers that make it through the threshold get their content fetched
   • Uses the existing crawl_html_version function from download_new_papers.py
2. Process Flow:
   • Stage 1: quick filtering based on the title and abstract only
   • Content extraction: fetch the HTML content of papers that passed the filter
   • Stage 2: detailed analysis that includes the full content
3. Updated Filter Prompt:
   • Made it clear in the Stage 1 prompt that this is only a preliminary screening
   • Specified that papers scoring 7+ will be analyzed in depth with their full content
   • Added clearer instructions for the relevancy scoring
4. Fixed Processing Flow (sketched in code below):
   • Always processes all available papers
   • Uses fixed batches of 8 papers for Stage 1 (title and abstract only)
   • Guarantees that at least 10 papers will be analyzed in depth
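A condensed sketch of that two-stage flow. The helpers `score_titles_and_abstracts` and `analyze_in_depth`, the `"Relevancy score"` and `"main_page"` keys, and the exact call to the repo's `crawl_html_version` are assumptions for illustration; the batch size of 8, the 7+ threshold, and the minimum of 10 deep-dive papers come from the description above:

```python
from download_new_papers import crawl_html_version  # existing helper in the repo

STAGE1_BATCH_SIZE = 8      # Stage 1 scores papers in fixed batches of 8
SCORE_THRESHOLD = 7        # papers scoring 7+ go on to the in-depth analysis
MIN_DEEP_DIVE_PAPERS = 10  # always analyze at least this many papers in depth


def run_two_stage_digest(papers, score_titles_and_abstracts, analyze_in_depth):
    """Stage 1 filter -> content extraction -> Stage 2 analysis (illustrative)."""
    # Stage 1: quick relevancy scores from title + abstract only, in fixed batches.
    scored = []
    for start in range(0, len(papers), STAGE1_BATCH_SIZE):
        scored.extend(score_titles_and_abstracts(papers[start:start + STAGE1_BATCH_SIZE]))

    # Keep threshold-passing papers, topping up to the guaranteed minimum.
    scored.sort(key=lambda p: p["Relevancy score"], reverse=True)
    shortlisted = [p for p in scored if p["Relevancy score"] >= SCORE_THRESHOLD]
    if len(shortlisted) < MIN_DEEP_DIVE_PAPERS:
        shortlisted = scored[:MIN_DEEP_DIVE_PAPERS]

    # Content extraction: fetch HTML only for papers that passed the filter.
    for paper in shortlisted:
        paper["content"] = crawl_html_version(paper["main_page"])  # key name assumed

    # Stage 2: detailed analysis that includes the full content.
    return analyze_in_depth(shortlisted)
```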
Revamp UI using Gradio and removed sending email

Workflow Improvements:
   • Added clear comments explaining the fixed parameters
   • Removed the batch update function, which is no longer needed
   • Simplified the code overall
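For the Gradio rewrite, a minimal sketch of the kind of interface that replaces the e-mail step; the `generate_digest` callback, its inputs, and the layout are assumptions, not the actual UI code in this branch:

```python
import gradio as gr


def generate_digest(interests: str, threshold: float) -> str:
    """Hypothetical callback: run the two-stage pipeline and return digest HTML."""
    # ... call the filtering / analysis pipeline here ...
    return "<h1>ArXiv Digest</h1>"


with gr.Blocks(title="ArXiv Digest") as demo:
    interests = gr.Textbox(label="Research interests", lines=3)
    threshold = gr.Slider(0, 10, value=7, step=1, label="Relevancy threshold")
    output = gr.HTML()
    gr.Button("Generate digest").click(generate_digest, [interests, threshold], output)

if __name__ == "__main__":
    demo.launch()  # the digest is shown in the browser instead of being e-mailed
```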