Commit 424e450

Removed .pdf from document title in pdf_converter (#315)
* removed .pdf from document title in pdf_converter
* bump version
1 parent 7919dad commit 424e450

2 files changed: +2 -2 lines changed

cdqa/utils/converters.py

Lines changed: 1 addition & 1 deletion
@@ -168,7 +168,7 @@ def pdf_converter(directory_path, min_length=200, include_line_breaks=False):
     df = pd.DataFrame(columns=["title", "paragraphs"])
     for i, pdf in enumerate(list_pdf):
         try:
-            df.loc[i] = [pdf, None]
+            df.loc[i] = [pdf.replace(".pdf",''), None]
             raw = parser.from_file(os.path.join(directory_path, pdf))
             s = raw["content"]
             paragraphs = re.split("\n(?=\u2028|[A-Z-0-9])", s)
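Note on the change above: `str.replace(".pdf", '')` removes every occurrence of the substring `.pdf`, not only a trailing extension, and it is case-sensitive. The sketch below compares that behaviour with one possible alternative based on `os.path.splitext`. The helper names `title_with_replace` and `title_with_splitext` are hypothetical and are not part of cdqa; this illustrates the difference and is not a proposed patch.

```python
import os


def title_with_replace(filename):
    # Behaviour introduced by this commit: drop every ".pdf" substring.
    return filename.replace(".pdf", "")


def title_with_splitext(filename):
    # Hypothetical alternative: drop only a trailing ".pdf" extension,
    # ignoring case, and leave any other file name untouched.
    stem, ext = os.path.splitext(filename)
    return stem if ext.lower() == ".pdf" else filename


if __name__ == "__main__":
    for name in ["report.pdf", "notes.pdf.backup.pdf", "SCAN.PDF"]:
        print(name, "->", title_with_replace(name), "|", title_with_splitext(name))
```

For ordinary names such as `report.pdf` both helpers give the same title. They differ only when `.pdf` appears elsewhere in the name or the extension is upper-case.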

setup.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ def read(file):
 
 setup(
     name="cdqa",
-    version="1.3.7",
+    version="1.3.8",
     author="Félix MIKAELIAN, André FARIAS, Matyas AMROUCHE, Olivier SANS, Théo NAZON",
     description="An End-To-End Closed Domain Question Answering System",
     long_description=read("README.md"),
