Commit f66e5e0

Added a word-by-word text differ for PDFs, based on dwdiff
1 parent 9dcce1c commit f66e5e0

1 file changed: +40 -0 lines changed

pdf-word-diff.sh

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
#!/bin/bash
# Simple textual differ for two PDFs; writes an HTML report to stdout.
# NB: text that cannot be extracted (e.g. text in images, garbled PDFs) will not show up.
# Usage: pdf-word-diff.sh OLD.pdf NEW.pdf [EXTRA_DWDIFF_OPTIONS] [ENCODING]
[[ -z $2 ]] && exit 111   # both PDF arguments are required; check before using them
pdf1=$1
pdf1n=$(basename -- "$1")
pdf2=$2
pdf2n=$(basename -- "$2")
params=$3                 # extra dwdiff options; left unquoted below so it word-splits
encoding=${4:-UTF-8}      # passed to pdftotext -enc; the HTML header below declares UTF-8

cat <<HTMLHEAD
<html>
<head>
<meta charset="UTF-8"/>
<title>Changes between $pdf1n and $pdf2n</title>
<style>
.added { color: green; background: #e3f3c5; }
.removed { color: red; background: #fedfdf; text-decoration: line-through; }
</style>
</head>
<body>
<h1>Textual changes between:</h1>
<ul>
<li> <span class="removed">$pdf1n</span></li>
<li> <span class="added">$pdf2n</span></li>
</ul>
<pre>
HTMLHEAD

dwdiff -i -A best -P $params \
--start-delete='<span class="removed">' --stop-delete='</span>' \
--start-insert='<span class="added">' --stop-insert='</span>' \
<(pdftotext -enc "$encoding" -layout "$pdf1" -) \
<(pdftotext -enc "$encoding" -layout "$pdf2" -)

cat <<-HTMLTAIL
</pre></body></html>
HTMLTAIL
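
The script writes the extracted text into the `<pre>` element unescaped, so a `<`, `>` or `&` in either PDF would be read as markup by the browser. The following is a minimal sketch of one way to handle that, not part of the commit: let dwdiff emit placeholder markers, escape the whole output, then turn the markers into the `<span>` tags. The helper names `html_escape` and `markers_to_html` and the choice of control characters as markers are assumptions made for illustration; the approach also assumes those control characters never occur in the extracted text.

```shell
#!/bin/sh
# Sketch: HTML-escape dwdiff output while keeping the change markers intact.
# Assumption: the control characters \001..\004 never occur in the extracted text.
DEL_START=$(printf '\001'); DEL_STOP=$(printf '\002')
INS_START=$(printf '\003'); INS_STOP=$(printf '\004')

# Escape the characters that are special inside HTML text.
# '&' is handled first so the entities produced afterwards are not re-escaped.
html_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Turn the placeholder markers into the <span> tags used by the report.
markers_to_html() {
  sed -e "s|${DEL_START}|<span class=\"removed\">|g" \
      -e "s|${DEL_STOP}|</span>|g" \
      -e "s|${INS_START}|<span class=\"added\">|g" \
      -e "s|${INS_STOP}|</span>|g"
}

# Hypothetical wiring into the script (not run here):
#   dwdiff -i -A best -P $params \
#     --start-delete="$DEL_START" --stop-delete="$DEL_STOP" \
#     --start-insert="$INS_START" --stop-insert="$INS_STOP" \
#     <(pdftotext ...) <(pdftotext ...) | html_escape | markers_to_html
```

Escaping after the diff, rather than escaping the pdftotext output before it, avoids a pitfall with `-P`: dwdiff would otherwise treat the punctuation inside an entity such as `&lt;` as separate words and could place a marker in the middle of it.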

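The script depends on two external tools, `dwdiff` and `pdftotext`. If either is missing, the report would still contain the HTML header and footer around an empty `<pre>`. A small preflight check could make that failure explicit. This is only a sketch: `require_cmds` is a hypothetical helper name and the exit status 127 is an assumption, neither is part of the commit.

```shell
#!/bin/sh
# Sketch: fail early when a required external tool is not installed.
# require_cmds is a hypothetical helper name, not part of the commit.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    # command -v succeeds only if the shell can resolve the name.
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible use near the top of the script, before any HTML is written.
# Exit status 127 is an assumption, chosen because shells use it for
# "command not found":
#   require_cmds dwdiff pdftotext || exit 127
```

Running the check before the first `cat` keeps a failed run from producing a partial HTML file that looks like a report with no differences.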