Commit ab9a148

fix(firestore-bigquery-export): use latest change tracker (#2429)
* fix(firestore-bigquery-export): improve DELETE operation handling and documentation
* chore(firestore-bigquery-export): bump extension version
1 parent d4fc959 commit ab9a148

5 files changed: +55 -7 lines changed

firestore-bigquery-export/CHANGELOG.md

Lines changed: 7 additions & 1 deletion
@@ -1,3 +1,9 @@
+## Version 0.2.5
+
+fix: keep partition value on delete using old data
+
+docs: improve "Remove stale data" query in guide
+
 ## Version 0.2.4

 feat: Add bigquery dataset locations and remove duplicates
@@ -10,7 +16,7 @@ fix: pass full document resource name to bigquery

 fix: remove default value on DATABASE_REGION

-## Versions 0.2.1
+## Version 0.2.1

 fix: correct database region params and make mutable

firestore-bigquery-export/extension.yaml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 # limitations under the License.

 name: firestore-bigquery-export
-version: 0.2.4
+version: 0.2.5
 specVersion: v1beta

 displayName: Stream Firestore to BigQuery

firestore-bigquery-export/functions/package-lock.json

Lines changed: 4 additions & 4 deletions
Some generated files are not rendered by default.

firestore-bigquery-export/functions/package.json

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
   "author": "Jan Wyszynski <[email protected]>",
   "license": "Apache-2.0",
   "dependencies": {
-    "@firebaseextensions/firestore-bigquery-change-tracker": "^1.1.41",
+    "@firebaseextensions/firestore-bigquery-change-tracker": "^1.1.42",
     "@google-cloud/bigquery": "^7.6.0",
     "@types/chai": "^4.1.6",
     "@types/express-serve-static-core": "4.17.30",

firestore-bigquery-export/guides/EXAMPLE_QUERIES.md

Lines changed: 42 additions & 0 deletions
@@ -115,6 +115,10 @@ If you want to clean up data from your `changelog` table, use the following
 `DELETE` query to delete all rows that fall within a certain time period,
 e.g. greater than 1 month old.

+#### Option 1: Remove stale changelog records but keep latest change per document (default)
+
+If you want to remove entries that are over one month old while keeping the latest change for each document, even when that change is itself older than one month, use the following query:
+
 ```sql
 /* The query below deletes any rows below that are over one month old. */
 DELETE FROM `[PROJECT ID].[DATASET ID].[CHANGELOG TABLE ID]`
@@ -132,3 +136,41 @@ WHERE (document_name, timestamp) IN
 AND DATETIME(t.timestamp) < DATE_ADD(CURRENT_DATETIME(), INTERVAL -1 MONTH)
 )
 ```
+
+⚠️ Note: This query keeps the most recent record for every document, even when that record (e.g., a DELETE) is more than one month old. If you also need to remove those records, use Option 2 or Option 3.
+
+#### Option 2: Remove all changelog records older than one month, including latest DELETE operations
+
+If you want to remove all entries that are over one month old, regardless of whether they are the latest change for a document (e.g., including DELETE operations), use the following query:
+
+```sql
+/* Deletes all changelog records older than one month, including latest DELETEs */
+DELETE FROM `[PROJECT ID].[DATASET ID].[CHANGELOG TABLE ID]`
+WHERE DATETIME(timestamp) < DATE_ADD(CURRENT_DATETIME(), INTERVAL -1 MONTH)
+```
+
+#### Option 3: Remove stale changelog records older than one month, plus latest DELETE operations
+
+This option removes records older than one month that are not the latest change for their document, and it also removes DELETE operations older than one month even if they are the latest change for a document.
+
+Use this if you want to aggressively clean up deleted documents from your changelog, even if that means latest views will no longer reflect that those documents were deleted.
+
+```sql
+/* Deletes any changelog records over one month old that are not the latest
+   entry for a document, plus DELETEs that are the latest entry for a document */
+DELETE FROM `[PROJECT ID].[DATASET ID].[CHANGELOG TABLE ID]`
+WHERE (document_name, timestamp) IN (
+  WITH latest AS (
+    SELECT MAX(timestamp) AS timestamp, document_name
+    FROM `[PROJECT ID].[DATASET ID].[CHANGELOG TABLE ID]`
+    GROUP BY document_name
+  )
+  SELECT (t.document_name, t.timestamp)
+  FROM `[PROJECT ID].[DATASET ID].[CHANGELOG TABLE ID]` AS t
+  JOIN latest ON t.document_name = latest.document_name
+  WHERE (t.timestamp != latest.timestamp OR t.operation = 'DELETE')
+    AND DATETIME(t.timestamp) < DATE_ADD(CURRENT_DATETIME(), INTERVAL -1 MONTH)
+)
+```
+
+⚠️ Note: This will remove DELETE records that are older than one month even if they are the most recent change. As a result, your \_latest view will no longer show that those documents were deleted; they may appear as if they never existed. Use this option only if that behavior is acceptable for your use case.
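
The difference between the three options can be checked on a few rows before running any `DELETE`. The following is a minimal Python sketch, not part of this commit, that models which rows each option would select. `Change`, `rows_to_delete`, `CUTOFF`, and the sample rows are hypothetical names and data introduced for illustration. The Option 1 rule (delete old rows that are not the latest for their document) is an assumption taken from that option's heading, since the diff elides the middle of its query, and the cutoff is passed in directly rather than computed the way `DATE_ADD` would.

```python
from datetime import datetime
from typing import NamedTuple


class Change(NamedTuple):
    """One changelog row, reduced to the columns the queries use."""
    document_name: str
    timestamp: datetime
    operation: str


def rows_to_delete(rows, cutoff, option):
    """Return the rows that the given option (1, 2 or 3) would delete."""
    # Equivalent of the `latest` CTE: newest timestamp per document.
    latest = {}
    for row in rows:
        seen = latest.get(row.document_name)
        if seen is None or row.timestamp > seen:
            latest[row.document_name] = row.timestamp

    doomed = []
    for row in rows:
        if row.timestamp >= cutoff:
            continue  # Rows newer than the cutoff are never deleted.
        is_latest = row.timestamp == latest[row.document_name]
        if option == 1:
            delete = not is_latest  # Assumed rule: keep the latest row.
        elif option == 2:
            delete = True  # Everything older than the cutoff.
        elif option == 3:
            delete = (not is_latest) or row.operation == "DELETE"
        else:
            raise ValueError(f"unknown option: {option}")
        if delete:
            doomed.append(row)
    return doomed


# Hypothetical sample data; the cutoff stands in for "one month ago".
CUTOFF = datetime(2025, 3, 1)
ROWS = [
    Change("users/a", datetime(2025, 1, 1), "CREATE"),
    Change("users/a", datetime(2025, 3, 10), "UPDATE"),
    Change("users/b", datetime(2025, 1, 5), "CREATE"),
    Change("users/b", datetime(2025, 1, 20), "DELETE"),
    Change("users/c", datetime(2025, 2, 1), "UPDATE"),
]

for option in (1, 2, 3):
    selected = rows_to_delete(ROWS, CUTOFF, option)
    print(option, [(r.document_name, r.operation) for r in selected])
```

With these sample rows, Option 1 selects 2 rows, Option 2 selects 4, and Option 3 selects 3. Options 2 and 3 both remove the `users/b` DELETE, which leaves `users/b` with no rows at all, while only Option 2 removes the old `users/c` UPDATE.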
