Could not convert stream / pdf to markdown #1134

Open
TorgeStahl opened this issue Mar 16, 2025 · 1 comment

@TorgeStahl

Hey there,

I wanted to generate Markdown from a really long PDF document (roughly 100 pages). Simply printing the file works, but as soon as it is converted to Markdown, it fails with the error below. Is there a known limit on the length of a document?

Traceback (most recent call last):
  File "/Users/user/Desktop/Repositories/markitdown/script/markdown.py", line 73, in <module>
    main()
    ~~~~^^
  File "/Users/user/Desktop/Repositories/markitdown/script/markdown.py", line 34, in main
    text = process_file(file_path)
  File "/Users/user/Desktop/Repositories/markitdown/script/markdown.py", line 19, in process_file
    result = md.convert(file_path)
  File "/Users/user/Desktop/Repositories/markitdown/packages/markitdown/src/markitdown/_markitdown.py", line 259, in convert
    return self.convert_local(source, stream_info=stream_info, **kwargs)
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Desktop/Repositories/markitdown/packages/markitdown/src/markitdown/_markitdown.py", line 310, in convert_local
    return self._convert(file_stream=fh, stream_info_guesses=guesses, **kwargs)
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Desktop/Repositories/markitdown/packages/markitdown/src/markitdown/_markitdown.py", line 541, in _convert
    raise UnsupportedFormatException(
        f"Could not convert stream to Markdown. No converter attempted a conversion, suggesting that the filetype is simply not supported."
    )
markitdown._exceptions.UnsupportedFormatException: Could not convert stream to Markdown. No converter attempted a conversion, suggesting that the filetype is simply not supported

@afourney
Member

Thanks for the report. Let's get to the bottom of this.

What version of the library are you using? Did you install it with [all] or at least [pdf]?
Does this happen with all PDFs (e.g., smaller ones), or just this one?
Are you using the Python library or the command line?

On my plate is adding a debug option and more Python logging, to better support debugging these kinds of scenarios.
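
For anyone hitting the same error, here is a minimal sketch of how the first question (installed version and extras) could be answered using only the Python standard library. It is not part of markitdown: the helper names `installed_version`, `module_available`, and `environment_report` are made up for illustration, and it assumes that the [pdf] extra pulls in the `pdfminer.six` distribution, importable as `pdfminer`. Adjust the names if your install differs.

```python
import importlib.metadata
import importlib.util


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(distribution)
    except importlib.metadata.PackageNotFoundError:
        return None


def module_available(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def environment_report():
    """Collect the details asked for above (hypothetical helper)."""
    # Assumption: the [pdf] extra installs pdfminer.six, imported as "pdfminer".
    return {
        "markitdown": installed_version("markitdown"),
        "pdfminer.six": installed_version("pdfminer.six"),
        "pdfminer importable": module_available("pdfminer"),
    }


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```

If the PDF dependency shows up as `None` or not importable, a missing extra may be the cause rather than the length of the document, but that is only a guess until the questions above are answered.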
