
Contribution to fix a small bug and to reformat #27


Open

linhkid wants to merge 31 commits into main

Conversation


@linhkid linhkid commented Apr 7, 2024

  • Fix the JSON error that occurs when an incomplete prompt leads to failed JSON generation (a parsing sketch follows the prompt excerpt below)
  • Add more fields to the summarization, such as methodology, discussion, and data used
  • Add the generation date to the digest.html file

1. {"Relevancy score": "an integer score out of 10", "Reasons for match": "1-2 sentence short reasonings", "Goal": "What kind of pain points the paper is trying to solve?", "Data": "Summary of the data source used in the paper", "Methodology": "Summary of methodologies used in the paper", "Git": "Link to the code repo (if available)", "Experiments & Results": "Summary of any experiments & its results", "Discussion & Next steps": "Further discussion and next steps of the research"}

My research interests are: NLP, RAGs, LLM, Optimization in Machine learning, Data science, Generative AI, Optimization in LLM, Finance modelling ...
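A minimal sketch of the kind of defensive parsing the first bullet refers to, assuming the model is asked to return one flat JSON object per paper in the format above; the function name and the recovery strategy (skip malformed objects) are illustrative, not the exact fix in this PR:

```python
import json
import re


def parse_relevancy_response(raw_text: str) -> list[dict]:
    """Extract whatever well-formed JSON objects survive a truncated response."""
    results = []
    # The expected objects are flat, so a non-greedy {...} match is enough here;
    # an object cut off by an incomplete generation fails json.loads and is skipped.
    for block in re.findall(r"\{.*?\}", raw_text, flags=re.DOTALL):
        try:
            item = json.loads(block)
        except json.JSONDecodeError:
            continue
        if "Relevancy score" in item:  # keep only entries carrying the required score field
            results.append(item)
    return results
```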
@rmfan rmfan (Collaborator) commented Apr 21, 2024

The interests get appended here:

prompt += query['interest']

No need to add them manually to the relevancy prompt
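For context, a rough sketch of where that happens; only the `prompt += query['interest']` line is quoted from the repo, while the surrounding function, parameter names, and paper fields are assumptions:

```python
def build_relevancy_prompt(query: dict, papers: list[dict], template: str) -> str:
    """Assemble the relevancy prompt; the user's interests are appended
    automatically, so they do not need to be duplicated in the template."""
    prompt = template                # scoring instructions + JSON format
    prompt += query['interest']      # interests from the config appended here
    for idx, paper in enumerate(papers, start=1):
        prompt += f"\n{idx}. Title: {paper['title']}\n   Abstract: {paper['abstract']}\n"
    return prompt
```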

@linhkid linhkid (Author)

OK thanks

README.md Outdated
**ArXiv Digest and Personalized Recommendations using Large Language Models.**
**ArXiv Digest (extra version) and Personalized Recommendations using Large Language Models.**

*(Note: This is an adjusted repo to match my needs. For original repo please refer to **AutoLLM** that I forked from)*
@rmfan rmfan (Collaborator)

This is a pull request to the original repo 😄

@linhkid linhkid (Author)

Sorry Richard, pls ignore haha.

linhkid added 18 commits April 26, 2024 23:35
Revamp the workflow and UI, and use multiple models for this repo

1. Content Extraction After Filtering:
   • Added a new step in the process between Stage 1 and Stage 2
   • After papers pass the relevancy filter, the system now extracts HTML content for them
   • Only papers that make it through the threshold get their content fetched
   • Uses the existing crawl_html_version function from download_new_papers.py
2. Process Flow:
   • Stage 1: quick filtering based on the title and abstract only
   • Content extraction: fetch the HTML content of papers that passed the filter
   • Stage 2: detailed analysis that includes the full content
3. Updated Filter Prompt:
   • Made it clear in the Stage 1 prompt that this is only a preliminary screening
   • Specified that papers scoring 7+ will be analyzed in depth with their full content
   • Added clearer instructions for the relevancy scoring
4. Fixed Processing Flow (sketched in code below):
   • Always processes all available papers
   • Uses fixed batches of 8 papers for Stage 1 (title and abstract only)
   • Guarantees that at least 10 papers will be analyzed in depth
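A condensed sketch of that two-stage flow. The helpers `score_titles_and_abstracts` and `analyze_in_depth`, the `"Relevancy score"` and `"main_page"` keys, and the exact call to the repo's `crawl_html_version` are assumptions for illustration; the batch size of 8, the 7+ threshold, and the minimum of 10 deep-dive papers come from the description above:

```python
from download_new_papers import crawl_html_version  # existing helper in the repo

STAGE1_BATCH_SIZE = 8      # Stage 1 scores papers in fixed batches of 8
SCORE_THRESHOLD = 7        # papers scoring 7+ go on to the in-depth analysis
MIN_DEEP_DIVE_PAPERS = 10  # always analyze at least this many papers in depth


def run_two_stage_digest(papers, score_titles_and_abstracts, analyze_in_depth):
    """Stage 1 filter -> content extraction -> Stage 2 analysis (illustrative)."""
    # Stage 1: quick relevancy scores from title + abstract only, in fixed batches.
    scored = []
    for start in range(0, len(papers), STAGE1_BATCH_SIZE):
        scored.extend(score_titles_and_abstracts(papers[start:start + STAGE1_BATCH_SIZE]))

    # Keep threshold-passing papers, topping up to the guaranteed minimum.
    scored.sort(key=lambda p: p["Relevancy score"], reverse=True)
    shortlisted = [p for p in scored if p["Relevancy score"] >= SCORE_THRESHOLD]
    if len(shortlisted) < MIN_DEEP_DIVE_PAPERS:
        shortlisted = scored[:MIN_DEEP_DIVE_PAPERS]

    # Content extraction: fetch HTML only for papers that passed the filter.
    for paper in shortlisted:
        paper["content"] = crawl_html_version(paper["main_page"])  # key name assumed

    # Stage 2: detailed analysis that includes the full content.
    return analyze_in_depth(shortlisted)
```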
Revamp UI using Gradio and removed sending email

Workflow Improvements:
   • Added clear comments explaining the fixed parameters
   • Removed the batch update function, which is no longer needed
   • Simplified the code overall
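For the Gradio rewrite, a minimal sketch of the kind of interface that replaces the e-mail step; the `generate_digest` callback, its inputs, and the layout are assumptions, not the actual UI code in this branch:

```python
import gradio as gr


def generate_digest(interests: str, threshold: float) -> str:
    """Hypothetical callback: run the two-stage pipeline and return digest HTML."""
    # ... call the filtering / analysis pipeline here ...
    return "<h1>ArXiv Digest</h1>"


with gr.Blocks(title="ArXiv Digest") as demo:
    interests = gr.Textbox(label="Research interests", lines=3)
    threshold = gr.Slider(0, 10, value=7, step=1, label="Relevancy threshold")
    output = gr.HTML()
    gr.Button("Generate digest").click(generate_digest, [interests, threshold], output)

if __name__ == "__main__":
    demo.launch()  # the digest is shown in the browser instead of being e-mailed
```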