Commit 9cab50f

Allow link density value to be modified (#874)
* Allow link density value to be modified
* Add linkDensityModifier documentation
1 parent 5abeedd, commit 9cab50f

2 files changed (+4, -2 lines)

README.md (+1)
@@ -37,6 +37,7 @@ The `options` object accepts a number of properties, all optional:
 * `disableJSONLD` (boolean, default `false`): when extracting page metadata, Readability gives precedence to Schema.org fields specified in the JSON-LD format. Set this option to `true` to skip JSON-LD parsing.
 * `serializer` (function, default `el => el.innerHTML`) controls how the `content` property returned by the `parse()` method is produced from the root DOM element. It may be useful to specify the `serializer` as the identity function (`el => el`) to obtain a DOM element instead of a string for `content` if you plan to process it further.
 * `allowedVideoRegex` (RegExp, default `undefined` ): a regular expression that matches video URLs that should be allowed to be included in the article content. If `undefined`, the [default regex](https://github.com/mozilla/readability/blob/8e8ec27cd2013940bc6f3cc609de10e35a1d9d86/Readability.js#L133) is applied.
+* `linkDensityModifier` (number, default `0`): a number that is added to the base link density threshold during the shadiness checks. This can be used to penalize nodes with a high link density or vice versa.

 ### `parse()`

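The README wording ("penalize nodes with a high link density or vice versa") leaves the sign convention implicit. Because the modifier is added to each threshold, a negative value lowers both thresholds, so more linky nodes fail the checks, while a positive value raises them, so more links are tolerated. The sketch below only performs that arithmetic, assuming the two base thresholds shown in the Readability.js diff (`0.2` for low-weight nodes, `0.5` for high-weight nodes); `effectiveLinkDensityThresholds` is a hypothetical helper, not part of Readability's API.

```javascript
// Hypothetical helper (not part of Readability's API): applies
// linkDensityModifier to the two base thresholds changed in this commit.
function effectiveLinkDensityThresholds(linkDensityModifier = 0) {
  return {
    // Non-list nodes with weight < 25 are flagged above this link density.
    lowWeight: 0.2 + linkDensityModifier,
    // Nodes with weight >= 25 are flagged above this link density.
    highWeight: 0.5 + linkDensityModifier,
  };
}

// Negative values tighten both checks; positive values loosen them.
for (const modifier of [-0.1, 0, 0.1]) {
  const { lowWeight, highWeight } = effectiveLinkDensityThresholds(modifier);
  // toFixed(2) hides floating-point noise such as 0.30000000000000004.
  console.log(
    `modifier ${modifier}: lowWeight ${lowWeight.toFixed(2)}, highWeight ${highWeight.toFixed(2)}`
  );
}
```

In application code the value is supplied through the options object, for example `new Readability(document, { linkDensityModifier: -0.1 })`.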
Readability.js (+3, -2)
@@ -54,6 +54,7 @@ function Readability(doc, options) {
 };
 this._disableJSONLD = !!options.disableJSONLD;
 this._allowedVideoRegex = options.allowedVideoRegex || this.REGEXPS.videos;
+this._linkDensityModifier = options.linkDensityModifier || 0;

 // Start with all flags set
 this._flags = this.FLAG_STRIP_UNLIKELYS |
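The constructor defaults the option with `||`. The following minimal sketch, assuming plain JavaScript semantics, shows what that expression accepts. A missing option (`undefined`) and `NaN` both fall back to `0`, and negative numbers pass through unchanged. A string such as `"0.1"` also passes through, after which `0.2 + "0.1"` concatenates to `"0.20.1"` and `linkDensity > "0.20.1"` is always false. `strictModifier` is a hypothetical alternative shown for comparison only; it is not Readability code.

```javascript
// Mirrors the committed fallback: options.linkDensityModifier || 0
function committedModifier(options) {
  return options.linkDensityModifier || 0;
}

// Hypothetical stricter variant (not Readability code): only finite numbers
// are accepted; strings, NaN, Infinity and undefined fall back to 0.
function strictModifier(options) {
  const value = options.linkDensityModifier;
  return Number.isFinite(value) ? value : 0;
}

const samples = [undefined, NaN, -0.1, "0.1"];
for (const value of samples) {
  const options = { linkDensityModifier: value };
  // With the string "0.1", 0.2 + committedModifier(...) concatenates to "0.20.1".
  console.log(
    String(value),
    "committed threshold:", 0.2 + committedModifier(options),
    "strict threshold:", 0.2 + strictModifier(options)
  );
}
```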
@@ -2185,10 +2186,10 @@ Readability.prototype = {
 if (!isList && !isFigureChild && headingDensity < 0.9 && contentLength < 25 && (img === 0 || img > 2) && linkDensity > 0) {
 errs.push(`Suspiciously short. (headingDensity=${headingDensity}, img=${img}, linkDensity=${linkDensity})`);
 }
-if (!isList && weight < 25 && linkDensity > 0.2) {
+if (!isList && weight < 25 && linkDensity > (0.2 + this._linkDensityModifier)) {
 errs.push(`Low weight and a little linky. (linkDensity=${linkDensity})`);
 }
-if (weight >= 25 && linkDensity > 0.5) {
+if (weight >= 25 && linkDensity > (0.5 + this._linkDensityModifier)) {
 errs.push(`High weight and mostly links. (linkDensity=${linkDensity})`);
 }
 if ((embedCount === 1 && contentLength < 75) || embedCount > 1) {
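The second hunk leaves both base thresholds in place and shifts them by `this._linkDensityModifier`; the `Suspiciously short` check above them still uses a fixed `linkDensity > 0`. A standalone sketch of only the two modified branches, assuming `isList`, `weight` and `linkDensity` are computed by the caller as in the surrounding method (`linkDensityErrors` is a hypothetical name, and the method's other checks are omitted):

```javascript
// Hypothetical standalone version of the two checks changed in this commit.
// isList, weight and linkDensity are assumed to be computed by the caller.
function linkDensityErrors({ isList, weight, linkDensity }, linkDensityModifier = 0) {
  const errs = [];
  // Low-weight, non-list nodes: base threshold 0.2, shifted by the modifier.
  if (!isList && weight < 25 && linkDensity > (0.2 + linkDensityModifier)) {
    errs.push(`Low weight and a little linky. (linkDensity=${linkDensity})`);
  }
  // High-weight nodes (lists included): base threshold 0.5, shifted by the modifier.
  if (weight >= 25 && linkDensity > (0.5 + linkDensityModifier)) {
    errs.push(`High weight and mostly links. (linkDensity=${linkDensity})`);
  }
  return errs;
}

const node = { isList: false, weight: 10, linkDensity: 0.25 };
console.log(linkDensityErrors(node, 0));   // flagged, since 0.25 > 0.2
console.log(linkDensityErrors(node, 0.1)); // not flagged, since 0.25 <= 0.3
```

Since an empty `errs` array means the node passed these two checks, raising the modifier can only shrink the set of nodes flagged here, and lowering it can only grow it.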
