markitdown optional dependency installation #1152

Open
Robert-Jia00129 opened this issue Mar 25, 2025 · 3 comments

Comments

@Robert-Jia00129

Initial Error:

Traceback (most recent call last):
  File "/Users/jiazhenghao/CodingProjects/research/SocSim/pdf2sim.py", line 6, in <module>
    result = md.convert(paper_path)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llm-sim/lib/python3.11/site-packages/markitdown/_markitdown.py", line 258, in convert
    return self.convert_local(source, stream_info=stream_info, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llm-sim/lib/python3.11/site-packages/markitdown/_markitdown.py", line 312, in convert_local
    return self._convert(file_stream=fh, stream_info_guesses=guesses, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llm-sim/lib/python3.11/site-packages/markitdown/_markitdown.py", line 540, in _convert
    raise FileConversionException(attempts=failed_attempts)
markitdown._exceptions.FileConversionException: File conversion failed after 1 attempts:
 - PdfConverter threw MissingDependencyException with message: PdfConverter recognized the input as a potential .pdf file, but the dependencies needed to read .pdf files have not been installed. To resolve this error, include the optional dependency [pdf] or [all] when installing MarkItDown. For example:

* pip install markitdown[pdf]
* pip install markitdown[all]
* pip install markitdown[pdf, ...]
* etc.

Tried
pip install markitdown[all]

Produced Error:

zsh: no matches found: markitdown[all]

Fix:
pip install 'markitdown[all]'
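
For anyone wondering why the quotes matter: zsh reads the unquoted `[all]` as a glob character class, looks for a matching filename, finds none, and aborts with "no matches found" before pip ever runs. A small Python illustration of that pattern reading follows. It is only an approximation, since `fnmatch` is not zsh's globbing engine, but it treats the brackets the same way.

```python
import shlex
from fnmatch import fnmatchcase

requirement = "markitdown[all]"

# Read as a glob, "[all]" is a character class: "markitdown" followed by
# exactly one character out of {a, l}.
print(fnmatchcase("markitdowna", requirement))  # matches
print(fnmatchcase("markitdownl", requirement))  # matches

# The literal text "markitdown[all]" does not match its own pattern,
# so the shell has nothing to expand it to.
print(fnmatchcase(requirement, requirement))

# shlex.quote produces a form that a POSIX-style shell passes through untouched.
print(shlex.quote(requirement))
```

Quoting turns the brackets back into ordinary characters, so pip receives the requirement string exactly as written.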

@Vloon

Vloon commented Mar 25, 2025

I have the same issue (but for DocxConverter instead of PdfConverter).

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.12.9/x64/bin/markitdown", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/markitdown/__main__.py", line 197, in main
    result = markitdown.convert(
             ^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/markitdown/_markitdown.py", line 260, in convert
    return self.convert_local(source, stream_info=stream_info, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/markitdown/_markitdown.py", line 314, in convert_local
    return self._convert(file_stream=fh, stream_info_guesses=guesses, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/markitdown/_markitdown.py", line 600, in _convert
    raise FileConversionException(attempts=failed_attempts)
markitdown._exceptions.FileConversionException: File conversion failed after 1 attempts:
 - DocxConverter threw MissingDependencyException with message: DocxConverter recognized the input as a potential .docx file, but the dependencies needed to read .docx files have not been installed. To resolve this error, include the optional dependency [docx] or [all] when installing MarkItDown. For example:

* pip install markitdown[docx]
* pip install markitdown[all]
* pip install markitdown[docx, ...]
* etc.

I use an Azure Linux pipeline to run these two steps:

- bash: |
    echo Installing MarkItDown...
    pip install 'markitdown[all]'
  displayName: 'Install MarkItDown'

- bash: |
    echo Using MarkItDown to process markdown files...
    markitdown ${{ parameters.pathToFile }} > "$(Build.ArtifactStagingDirectory)/${{ parameters.outputFile }}.md"
  displayName: 'Run MarkItDown'

So the fix shown above doesn't seem to work. Does anyone have any thoughts on why?

@afourney
Member

hmmm, how do you folks typically install packages with optional dependencies?

Indeed, with zsh or fish, quoting is necessary and should be sufficient: pip install 'markitdown[all]'

You could also write a requirements.txt with:

markitdown[all]

And then do pip install -r requirements.txt
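
Another way to sidestep shell quoting altogether, if the install is driven from a script, is to call pip from Python with an argument list so that no shell ever parses the brackets. A minimal sketch; the helper names `pip_install_command` and `install` are made up for illustration and are not part of MarkItDown or pip.

```python
import subprocess
import sys


def pip_install_command(requirement: str) -> list[str]:
    """Build the argv for installing `requirement` with the current interpreter's pip."""
    return [sys.executable, "-m", "pip", "install", requirement]


def install(requirement: str) -> None:
    # A list argument without shell=True is handed to the process as-is,
    # so "[all]" is never treated as a glob pattern.
    subprocess.run(pip_install_command(requirement), check=True)


if __name__ == "__main__":
    # Only prints the command; call install("markitdown[all]") to actually run it.
    print(pip_install_command("markitdown[all]"))
```

Using `sys.executable -m pip` also ensures the package lands in the same environment as the Python that is running the script.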

@Vloon

Vloon commented Mar 31, 2025

Thanks for the reply. Apparently the issue was in the type of quotes (which it quite often is in Azure Pipelines). pip install "markitdown[all]" worked instead of pip install 'markitdown[all]'
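
If an install step reports success but the converter still raises MissingDependencyException, one way to narrow it down is to check whether the packages behind the extra are present in the interpreter that runs markitdown. A rough sketch using only the standard library follows. The helper names are made up for illustration, the marker parsing is deliberately naive (it ignores environment markers such as `python_version`), and it assumes it is run with the same Python that runs markitdown.

```python
import re
from importlib import metadata


def requirements_for_extra(requires, extra):
    """Return package names whose marker mentions the given extra."""
    quote = "[\"']"  # metadata may use either quote style around the extra name
    pattern = re.compile(r"extra\s*==\s*" + quote + re.escape(extra) + quote)
    names = []
    for line in requires or []:
        spec, _, marker = line.partition(";")
        if pattern.search(marker):
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", spec.strip())
            if match:
                names.append(match.group(0))
    return names


def missing_for_extra(dist, extra):
    """List packages required by dist[extra] that are not installed."""
    try:
        requires = metadata.requires(dist)
    except metadata.PackageNotFoundError:
        return [dist]  # the base package itself is missing
    missing = []
    for name in requirements_for_extra(requires, extra):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list suggests everything behind [all] is installed.
    print(missing_for_extra("markitdown", "all"))
```

A non-empty result suggests the extras were not installed into that environment, whatever the install step printed.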
