Import Kaerlein bibliography into JabRef #12700

Open
9 tasks
koppor opened this issue Mar 12, 2025 · 5 comments
@koppor
Member

koppor commented Mar 12, 2025

The site http://kristallin.de/Kaerlein/Kaerlein.html links a PDF with many pages of references. These need to be imported into JabRef.

We need a fallback mechanism that routes each PDF page to our AI provider.


Tasks

  • Add documentation on "New library based on references in PDF file..."
  • Try import with Kaerlein
  • Extract one PDF page from Kaerlein and import it using a JUnit test case
  • Refine code to match expectations
  • Extract another PDF page. Repeat.
  • Write a JUnit test case importing complete Kaerlein

Thoughts

Prompt (for a single entry)

Convert following to a good BibLaTeX entry. Keep all text. Do not abbreviate any text.
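
As a rough illustration of how this prompt could be assembled for one entry, here is a minimal Java sketch. The instruction text is the one quoted above; the class name `SingleEntryPrompt`, the `build` helper, and the blank-line separator are assumptions for illustration only and are not part of JabRef.

```java
public class SingleEntryPrompt {
    // Instruction text taken verbatim from the prompt above.
    public static final String INSTRUCTION =
            "Convert following to a good BibLaTeX entry. Keep all text. Do not abbreviate any text.";

    // Hypothetical helper: puts the instruction first, then a blank line,
    // then the raw reference text with surrounding whitespace removed.
    public static String build(String rawReference) {
        return INSTRUCTION + "\n\n" + rawReference.strip();
    }

    public static void main(String[] args) {
        // Placeholder input; a real call would pass one reference copied from the PDF.
        System.out.println(build("  <raw reference text from one PDF entry>  "));
    }
}
```

Keeping the instruction in one constant makes it easy to adjust the wording while testing against individual Kaerlein pages.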

"New Library based on references in PDF file" probably doesn't work (because it's tied to IEEE).

@Kaan0029
Contributor

Hi! I'd like to give this issue a try, if that's okay.

Broadly, I’m planning to extend BibliographyFromPdfImporter with an AI fallback mechanism that extracts BibLaTeX references from PDFs if the standard method returns no entries. I think the fallback will:

  • (Lazily) initialize a ChatLanguageModel (from langchain4j) using JvmOpenAiChatLanguageModel
  • Build prompts per page and send them via the selected provider from AiPreferences (could also do multiple pages – need to test how this affects error likelihood)
  • Parse the AI response to generate valid BibEntry objects

I’ll also create a new component (something like ReferenceExtractionAiHandler) to handle AI prompt construction and response parsing. I think those should be all of the primary changes (some other parts need adaptation too, of course).
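
The fallback described above could look roughly like the following sketch. It is deliberately free of JabRef and langchain4j types: the standard parser and the chat model are passed in as plain `Function` stand-ins, entries are plain strings instead of `BibEntry` objects, and the class name `AiFallbackSketch` and the per-page prompt wording are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class AiFallbackSketch {
    /**
     * Returns the standard parser's entries if there are any. Otherwise sends
     * one prompt per page to the model and collects the non-blank responses.
     */
    public static List<String> extract(List<String> pages,
                                       Function<List<String>, List<String>> standardParser,
                                       Function<String, String> chatModel) {
        List<String> entries = standardParser.apply(pages);
        if (!entries.isEmpty()) {
            return entries; // standard method worked, so no AI call is made
        }
        List<String> fromAi = new ArrayList<>();
        for (String page : pages) {
            // Assumed prompt wording; one request per page.
            String response = chatModel.apply(
                    "Extract all references on this page as BibLaTeX entries:\n\n" + page);
            if (response != null && !response.isBlank()) {
                fromAi.add(response.strip());
            }
        }
        return fromAi;
    }

    public static void main(String[] args) {
        List<String> pages = List.of("page one text", "page two text");
        // Stub parser that finds nothing and a stub model that returns a dummy entry.
        List<String> result = extract(pages, p -> List.of(), prompt -> "@misc{stub}");
        System.out.println(result.size() + " responses collected");
    }
}
```

In a real implementation the returned strings would still have to be parsed into entries, and sending several pages per request would only change the loop body.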

2 questions I had:

  • While the instructions focus on the Kaerlein bibliography, the fallback mechanism seems generic. Would you prefer a generalized implementation that could support any PDF? Since we're using an AI provider, that would be no real extra work, and the implementation shouldn't differ (though the number of API calls might).
  • I noticed that NewLibraryFromPdfActionOffline uses BibliographyFromPdfImporter, whereas NewLibraryFromPdfActionOnline uses Grobid. Should the AI fallback be integrated into both workflows, or is the intention to enhance just the offline route? It could be implemented for both (e.g., by wrapping the Grobid call in a fallback chain); I just wanted to clarify.

@koppor
Member Author

koppor commented Apr 16, 2025

  1. Please do a general implementation
  2. "Offline" should never use online services.

First step: Only in ...Online.

Then: Think about how it can be included in ...Offline: A user can configure localhost as AI provider. Then, there is no Internet use. Maybe the simplest thing is to add a preference "Treat as offline provider". If checked, then ...Offline uses AI.
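
One possible reading of this rule, as a small Java sketch: the online workflow may use the AI fallback whenever AI is enabled, and the offline workflow only if the proposed preference is checked. The names `OfflineAiGate`, `AiSettings`, `Workflow`, and `mayUseAiFallback` are hypothetical, and the `aiEnabled` flag is an added assumption; none of them are existing JabRef API.

```java
public class OfflineAiGate {
    // Hypothetical stand-in for the relevant AI preferences.
    public record AiSettings(boolean aiEnabled, boolean treatAsOfflineProvider) {}

    // Stand-in for the two "New library from PDF" workflows.
    public enum Workflow { ONLINE, OFFLINE }

    // Online may always use the AI fallback when AI is enabled;
    // offline may use it only if the provider is marked as offline.
    public static boolean mayUseAiFallback(Workflow workflow, AiSettings settings) {
        if (!settings.aiEnabled()) {
            return false;
        }
        return workflow == Workflow.ONLINE || settings.treatAsOfflineProvider();
    }

    public static void main(String[] args) {
        AiSettings localhostProvider = new AiSettings(true, true);
        System.out.println(mayUseAiFallback(Workflow.OFFLINE, localhostProvider));
    }
}
```

With this shape, the "Offline never uses online services" guarantee rests entirely on the user setting the preference truthfully.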

@Kaan0029
Contributor

/assign-me

github-actions bot added the 📍 Assigned label (assigned by assign-issue-action or manually) on Apr 23, 2025
Contributor

👋 Hey @Kaan0029, thank you for your interest in this issue! 🎉

We're excited to have you on board. Start by exploring our Contributing guidelines, and don't forget to check out our workspace setup guidelines to get started smoothly.

In case you encounter failing tests during development, please check our developer FAQs!

Having any questions or issues? Feel free to ask here on GitHub. Need help setting up your local workspace? Join the conversation on JabRef's Gitter chat. And don't hesitate to open a (draft) pull request early on to show the direction it is heading towards. This way, you will receive valuable feedback.

Happy coding! 🚀

Contributor

github-actions bot commented May 5, 2025

⏰ Assignment Reminder

Hi @Kaan0029, this is a friendly reminder about your assignment to this issue.

Warning

This issue will be automatically unassigned in 11 days if there's no activity.

How to keep your assignment


If you are working on it, you can prevent automatic unassignment by:

  • Submitting a draft PR with your progress
  • Asking for the 📌 Pinned label if you need more time

We appreciate your contribution and are here to help if needed!
