Fix pdfminer-six dependencies. #417


Open: pprados wants to merge 4 commits into main from pprados/fix-dependencies

Conversation

pprados
Contributor

@pprados pprados commented Apr 3, 2025

Pdfminer has a bug that prevents some PDF files from being read. I've fixed it. A new version is available and should be used in the various projects that depend on this component.

See pdfminer/pdfminer.six#1081

This PR requires version >=20250327.

To pass the CI/CD tests, I also set the dependencies for Python 3.9.

With a minimum version of 3.10, it will be possible to download the latest versions of dependencies.
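
For projects that depend on this component, a quick way to confirm that the fixed release is installed might look like the sketch below. It assumes the distribution is published on PyPI as pdfminer.six and that its versions are date-style numbers such as 20250327; the helper names (parse_release, meets_minimum, installed_pdfminer_version) are hypothetical and not part of any library.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Minimum version named in this PR.
MINIMUM = "20250327"


def parse_release(text: str) -> tuple[int, ...]:
    """Turn a version string such as '20250327' or 'v20250327' into a tuple of ints.

    Parsing stops at the first non-numeric component, so suffixes are ignored.
    """
    parts = []
    for piece in text.strip().lstrip("vV").split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MINIMUM) -> bool:
    """Return True when the installed version is at least the minimum."""
    return parse_release(installed) >= parse_release(minimum)


def installed_pdfminer_version() -> str | None:
    """Look up the installed version (assumed distribution name: pdfminer.six)."""
    try:
        return version("pdfminer.six")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pdfminer_version()
    if found is None:
        print("pdfminer.six is not installed")
    elif meets_minimum(found):
        print(f"pdfminer.six {found} satisfies >={MINIMUM}")
    else:
        print(f"pdfminer.six {found} is older than {MINIMUM}")
```

The comparison is deliberately simple; a project that already depends on the packaging library could use its version parsing instead.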

@pprados pprados force-pushed the pprados/fix-dependencies branch from 5781af5 to 6ce55cc Compare April 3, 2025 11:50
@pprados pprados force-pushed the pprados/fix-dependencies branch 6 times, most recently from a375202 to 2bcc43b Compare April 3, 2025 13:03
@pprados pprados force-pushed the pprados/fix-dependencies branch from 2bcc43b to 3697b6f Compare April 3, 2025 13:10
@pprados pprados marked this pull request as ready for review April 3, 2025 13:20
@pprados pprados force-pushed the pprados/fix-dependencies branch from fdaa931 to d1b9931 Compare April 8, 2025 11:51
@pprados pprados force-pushed the pprados/fix-dependencies branch from d1b9931 to 7142a20 Compare April 9, 2025 11:28
@pprados
Contributor Author

pprados commented Apr 11, 2025

@badGarnet

The "Configure AWS credentials" step failed => Input required and not supplied: aws-region

The make pip-compile invocation also fails because the dependencies are not correct for Python 3.9. Raising the minimum version to 3.10 would solve the problem, since many dependencies now require 3.10.

The pip-compile approach doesn't handle this properly. An approach using Poetry, for example, would be preferable.

@pprados
Contributor Author

pprados commented Apr 15, 2025

@badGarnet
Can you fix the CI/CD?

@badGarnet
Collaborator

badGarnet commented Apr 15, 2025

@badGarnet Can you fix the CI/CD?

I merged a change that should have fixed the CI.

badGarnet added a commit that referenced this pull request Apr 15, 2025
resolves ci issue observed in #417
@pprados pprados marked this pull request as draft April 16, 2025 07:27
@pprados pprados mentioned this pull request Apr 16, 2025
@pprados pprados marked this pull request as ready for review April 16, 2025 08:41
@pprados
Contributor Author

pprados commented Apr 16, 2025

@badGarnet
Another bug in CI/CD: #420

@badGarnet
Collaborator

@badGarnet Another bug in CI/CD: #420

@pprados this seems to be a result of the updated dependencies. I will take a closer look at the PR and see what has changed. For reference, this was recently run successfully.

@pprados
Contributor Author

pprados commented Apr 17, 2025

Cool. Thanks.

@badGarnet
Collaborator

OK, so a few thoughts here:

  • We avoid pinning with == in .in files as much as possible so that the library stays compatible with as many other libraries as possible. In fact, we relaxed pdfminer to be unconstrained at other users' request.
  • If a project needs pdfminer.six==20250327 and numpy>=2.0, it can simply add those constraints to its own requirements, and pip, uv, or any other dependency management tool can compile them into a workable list of dependencies. For example, write a requirements.in for the project containing
unstructured-inference
pdfminer.six==20250327
numpy>=2.0

then compile it with pip-compile requirements.in to get a requirements.txt that has the right version of pdfminer as well as the rest of the dependencies.

  • Managing dependencies based on Python version is not something pip-tools supports. Other popular tools like uv also don't support such usage, so from a maintenance point of view it is risky to maintain multiple sets of requirements per Python version, which could also confuse developers.

Based on these points, I would suggest trying option 2 (the second point above) to see if it fits your needs; if it does, we can close this PR.
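
A minimal sketch of the option 2 workflow, for illustration only. It assumes pip-tools is installed so that the pip-compile command is on PATH, and it writes the pin using the PyPI distribution name pdfminer.six (an assumption about how the pin should be spelled). The function names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path

# Project-level constraints from the comment above; the pdfminer pin is
# written with the assumed PyPI distribution name pdfminer.six.
REQUIREMENTS = [
    "unstructured-inference",
    "pdfminer.six==20250327",
    "numpy>=2.0",
]


def write_requirements_in(directory: Path, lines: list[str] = REQUIREMENTS) -> Path:
    """Write requirements.in into the given directory and return its path."""
    path = directory / "requirements.in"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def compile_command(requirements_in: Path) -> list[str]:
    """Build the pip-compile command that produces requirements.txt next to the input."""
    output = requirements_in.with_suffix(".txt")
    return ["pip-compile", str(requirements_in), "--output-file", str(output)]


def compile_requirements(requirements_in: Path) -> bool:
    """Run pip-compile if it is available; return False when it is not installed."""
    if shutil.which("pip-compile") is None:
        return False
    subprocess.run(compile_command(requirements_in), check=True)
    return True


if __name__ == "__main__":
    # Demonstrate in a throwaway directory; only the command is printed here.
    with tempfile.TemporaryDirectory() as tmp:
        source = write_requirements_in(Path(tmp))
        print(" ".join(compile_command(source)))
```

Calling compile_requirements on the written file would resolve the full dependency set, including the pinned pdfminer release, into requirements.txt.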


2 participants