Commit 2a49406

Add Prettier to the repo, and format all the files! (#428)
This adds Prettier to the repo and sets up the pre-commit hook to auto-format as well as lint. It also updates the ignore files to exclude crawls, test-crawls, scratch, and dist as needed.
1 parent af1e086 commit 2a49406

70 files changed, +3172 -2006 lines changed

.eslintrc.cjs

+5 -5
@@ -5,18 +5,18 @@ module.exports = {
     node: true,
     jest: true,
   },
-  extends: ["eslint:recommended", "plugin:@typescript-eslint/recommended"],
+  extends: [
+    "eslint:recommended",
+    "plugin:@typescript-eslint/recommended",
+    "prettier",
+  ],
   parser: "@typescript-eslint/parser",
   plugins: ["@typescript-eslint"],
   parserOptions: {
     ecmaVersion: 12,
     sourceType: "module",
   },
   rules: {
-    indent: ["error", 2],
-    "linebreak-style": ["error", "unix"],
-    quotes: ["error", "double"],
-    semi: ["error", "always"],
     "no-constant-condition": ["error", { checkLoops: false }],
     "no-use-before-define": [
       "error",
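
A note on the hunk above: the `"prettier"` entry in `extends` resolves to eslint-config-prettier, which switches off ESLint rules that overlap with Prettier's formatting, and it is conventionally listed last so it can override the configs before it. Rules set explicitly under `rules` would still take precedence over anything in `extends`, which is presumably why `indent`, `linebreak-style`, `quotes`, and `semi` are removed here. The sketch below is illustrative only: `eslintConfig` reproduces just the fields visible in the hunk, and the two helper functions are hypothetical names, not part of the repository.

```javascript
// Illustrative sketch, not the repository's full config: only fields
// visible in the hunk above are reproduced here.
const eslintConfig = {
  extends: [
    "eslint:recommended",
    "plugin:@typescript-eslint/recommended",
    // Resolves to eslint-config-prettier, which turns off formatting rules.
    "prettier",
  ],
  rules: {
    "no-constant-condition": ["error", { checkLoops: false }],
  },
};

// The four formatting rules this commit deletes from `rules`.
const removedFormattingRules = ["indent", "linebreak-style", "quotes", "semi"];

// Hypothetical helper: lists any of those rules still set explicitly,
// since explicit `rules` entries would override the "prettier" config.
function leftoverFormattingRules(config) {
  return removedFormattingRules.filter((name) => name in config.rules);
}

// Hypothetical helper: checks that "prettier" is the final `extends` entry.
function prettierIsLast(config) {
  return config.extends[config.extends.length - 1] === "prettier";
}

console.log(leftoverFormattingRules(eslintConfig)); // prints []
console.log(prettierIsLast(eslintConfig)); // prints true
```

A check like this is only a reading aid for the diff; formatting itself is handled by the `yarn format` and `yarn lint:fix` scripts referenced in the workflow and hook changes in this commit.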

.github/workflows/ci.yaml

+23 -30
@@ -6,48 +6,41 @@ on:
66

77
jobs:
88
lint:
9-
109
runs-on: ubuntu-latest
1110

1211
strategy:
1312
matrix:
1413
node-version: [18.x]
1514

1615
steps:
17-
- uses: actions/checkout@v3
18-
- name: Use Node.js ${{ matrix.node-version }}
19-
uses: actions/setup-node@v3
20-
with:
21-
node-version: ${{ matrix.node-version }}
22-
- name: install requirements
23-
run: yarn install
24-
- name: run linter
25-
run: yarn lint
26-
27-
build:
16+
- uses: actions/checkout@v3
17+
- name: Use Node.js ${{ matrix.node-version }}
18+
uses: actions/setup-node@v3
19+
with:
20+
node-version: ${{ matrix.node-version }}
21+
- name: install requirements
22+
run: yarn install
23+
- name: run linter
24+
run: yarn lint && yarn format
2825

26+
build:
2927
runs-on: ubuntu-latest
3028

3129
strategy:
3230
matrix:
3331
node-version: [18.x]
3432

3533
steps:
36-
- uses: actions/checkout@v3
37-
- name: Use Node.js ${{ matrix.node-version }}
38-
uses: actions/setup-node@v3
39-
with:
40-
node-version: ${{ matrix.node-version }}
41-
- name: install requirements
42-
run: yarn install
43-
- name: build js
44-
run: yarn run tsc
45-
- name: build docker
46-
run: docker-compose build
47-
- name: run jest
48-
run: sudo yarn test
49-
50-
51-
52-
53-
34+
- uses: actions/checkout@v3
35+
- name: Use Node.js ${{ matrix.node-version }}
36+
uses: actions/setup-node@v3
37+
with:
38+
node-version: ${{ matrix.node-version }}
39+
- name: install requirements
40+
run: yarn install
41+
- name: build js
42+
run: yarn run tsc
43+
- name: build docker
44+
run: docker-compose build
45+
- name: run jest
46+
run: sudo yarn test

.github/workflows/release.yaml

+7 -15
@@ -8,44 +8,36 @@ jobs:
     name: Build x86 and ARM Images and push to Dockerhub
     runs-on: ubuntu-22.04
     steps:
-      -
-        name: Check out the repo
+      - name: Check out the repo
         uses: actions/checkout@v4
 
-      -
-        name: Docker image metadata
+      - name: Docker image metadata
         id: meta
         uses: docker/metadata-action@v5
         with:
           images: webrecorder/browsertrix-crawler
           tags: |
             type=semver,pattern={{version}}
 
-      -
-        name: Set up QEMU
+      - name: Set up QEMU
         uses: docker/setup-qemu-action@v3
         with:
           platforms: arm64
 
-      -
-        name: Set up Docker Buildx
+      - name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v1
-      -
-        name: Login to DockerHub
+      - name: Login to DockerHub
         uses: docker/login-action@v3
         with:
           username: ${{ secrets.DOCKER_USERNAME }}
           password: ${{ secrets.DOCKER_PASSWORD }}
-      -
-        name: Build and push
+      - name: Build and push
         id: docker_build
         uses: docker/build-push-action@v3
         with:
           context: .
           push: true
           tags: ${{ steps.meta.outputs.tags }}
           platforms: "linux/amd64,linux/arm64"
-      -
-        name: Image digest
+      - name: Image digest
         run: echo ${{ steps.docker_build.outputs.digest }}
-

.gitignore

+1
@@ -6,3 +6,4 @@ node_modules/
 crawls/
 test-crawls/
 .DS_Store
+dist

.husky/pre-commit

+1 -1
@@ -1,4 +1,4 @@
 #!/usr/bin/env sh
 . "$(dirname -- "$0")/_/husky.sh"
 
-yarn lint
+yarn lint:fix

.pre-commit-config.yaml

+7 -7
@@ -1,8 +1,8 @@
11
repos:
2-
- repo: local
3-
hooks:
4-
- id: husky-run-pre-commit
5-
name: husky
6-
language: system
7-
entry: .husky/pre-commit
8-
pass_filenames: false
2+
- repo: local
3+
hooks:
4+
- id: husky-run-pre-commit
5+
name: husky
6+
language: system
7+
entry: .husky/pre-commit
8+
pass_filenames: false

.prettierignore

+1
@@ -0,0 +1 @@
+dist

CHANGES.md

+13 -5
@@ -1,11 +1,13 @@
 ## CHANGES
 
 v0.8.1
+
 - Logging and Behavior Tweaks by @ikreymer in https://github.com/webrecorder/browsertrix-crawler/pull/229
 - Fix typos by @stavares843 in https://github.com/webrecorder/browsertrix-crawler/pull/232
 - Add crawl log to WACZ by @ikreymer in https://github.com/webrecorder/browsertrix-crawler/pull/231
 
 v0.8.0
+
 - Switch to Chrome/Chromium 109
 - Convert to ESM module
 - Add ad blocking via request interception (#173)
@@ -25,11 +27,13 @@ v0.8.0
 - update behaviors to 0.4.1, rename 'Behavior line' -> 'Behavior log' by @ikreymer in https://github.com/webrecorder/browsertrix-crawler/pull/223
 
 v0.7.1
+
 - Fix for warcio.js by @ikreymer in #178
 - Guard against pre-existing user/group by @edsu in #176
 - Fix incorrect combineWARCs property in README.md by @Georift in #180
 
 v0.7.0
+
 - Update to Chrome/Chromium 101 - (0.7.0 Beta 0) by @ikreymer in #144
 - Add --netIdleWait, bump dependencies (0.7.0-beta.2) by @ikreymer in #145
 - Update README.md by @atomotic in #147
@@ -41,7 +45,6 @@ v0.7.0
 - Interrupt Handling Fixes by @ikreymer in #167
 - Run in Docker as User by @edsu in #171
 
-
 v0.6.0
 
 - Add a --waitOnDone option, which has browsertrix crawler wait when finished (for use with Browsertrix Cloud)
@@ -56,8 +59,8 @@ v0.6.0
 - Fixes to interrupting a single instance in a shared state crawl
 - force all cookies, including session cookies, to fixed duration in days, configurable via --cookieDays
 
-
 v0.5.0
+
 - Scope: support for `scopeType: domain` to include all subdomains and ignoring 'www.' if specified in the seed.
 - Profiles: support loading remote profile from URL as well as local file
 - Non-HTML Pages: Load non-200 responses in browser, even if non-html, fix waiting issues with non-HTML pages (eg. PDFs)
@@ -75,8 +78,8 @@ v0.5.0
 - Signing: Support for optional signing of WACZ
 - Dependencies: update to latest pywb, wacz and browsertrix-behaviors packages
 
-
 v0.4.4
+
 - Page Block Rules Fix: 'request already handled' errors by avoiding adding duplicate handlers to same page.
 - Page Block Rules Fix: await all continue/abort() calls and catch errors.
 - Page Block Rules: Don't apply to top-level page, print warning and recommend scope rules instead.
@@ -86,18 +89,21 @@ v0.4.4
 - README: Update old type -> scopeType, list new scope types.
 
 v0.4.3
+
 - BlockRules Fixes: When considering the 'inFrameUrl' for a navigation request for an iframe, use URL of parent frame.
 - BlockRules Fixes: Always allow pywb proxy scripts.
 - Logging: Improved debug logging for block rules (log blocked requests and conditional iframe requests) when 'debug' set in 'logging'
 
 v0.4.2
+
 - Compose/docs: Build latest image by default, update README to refer to latest image
 - Fix typo in `crawler.capturePrefix` that resulted in `directFetchCapture()` always failing
 - Tests: Update all tests to use `test-crawls` directory
 - extractLinks() just extracts links from default selectors, allows custom driver to filter results
 - loadPage() accepts a list of selector options with selector, extract, and isAttribute settings for further customization of link extraction
 
 v0.4.1
+
 - BlockRules Optimizations: don't intercept requests if no blockRules
 - Profile Creation: Support extending existing profile by passing a --profile param to load on startup
 - Profile Creation: Set default window size to 1600x900, add --windowSize param for setting custom size
@@ -107,6 +113,7 @@ v0.4.1
 - CI: Build a multi-platform (amd64 and arm64) image on each release
 
 v0.4.0
+
 - YAML based config, specifyable via --config property or via stdin (with '--config stdin')
 - Support for different scope types ('page', 'prefix', 'host', 'any', 'none') + crawl depth at crawl level
 - Per-Seed scoping, including different scope types, or depth and include/exclude rules configurable per seed in 'seeds' list via YAML config
@@ -120,16 +127,17 @@ v0.4.0
 - Update to latest pywb (2.5.0b4), browsertrix-behaviors (0.2.3), py-wacz (0.3.1)
 
 v0.3.2
-- Added a `--urlFile` option: Allows users to specify a .txt file list of exact URLs to crawl (one URL per line).
 
+- Added a `--urlFile` option: Allows users to specify a .txt file list of exact URLs to crawl (one URL per line).
 
 v0.3.1
+
 - Improved shutdown wait: Instead of waiting for 5 secs, wait until all pending requests are written to WARCs
 - Bug fix: Use async APIs for combine WARC to avoid spurious issues with multiple crawls
 - Behaviors Update to Behaviors to 0.2.1, with support for facebook pages
 
-
 v0.3.0
+
 - WARC Combining: `--combineWARC` and `--rolloverSize` flags for generating combined WARC at end of crawl, each WARC upto specified rolloverSize
 - Profiles: Support for creating reusable browser profiles, stored as tarballs, and running crawl with a login profile (see README for more info)
 - Behaviors: Switch to Browsertrix Behaviors v0.1.1 for in-page behaviors
