
Commit 94fcf11

Merge pull request #2266 from firebase/next
Release: firestore-bigquery-export 0.1.57
2 parents a64de2a + 78de81c


58 files changed, +9482 −12450 lines

firestore-bigquery-export/CHANGELOG.md (+12)
@@ -1,3 +1,15 @@
+## Version 0.1.57
+
+feat - add basic materialized views support, incremental and non-incremental.
+
+fix - do not add/update clustering if an invalid clustering field is present.
+
+docs - improve cross-project IAM documentation.
+
+fix - emit correct events to the extension, backward compatible.
+
+docs - add documentation on workarounds to mitigate data loss during extension updates.
+
 ## Version 0.1.56

 feat - improve sync strategy by immediately writing to BQ, and using cloud tasks only as a last resort

firestore-bigquery-export/POSTINSTALL.md (+49 −12)
@@ -4,30 +4,30 @@ You can test out this extension right away!

 1. Go to your [Cloud Firestore dashboard](https://console.firebase.google.com/project/${param:BIGQUERY_PROJECT_ID}/firestore/data) in the Firebase console.

-1. If it doesn't already exist, create the collection you specified during installation: `${param:COLLECTION_PATH}`
+2. If it doesn't already exist, create the collection you specified during installation: `${param:COLLECTION_PATH}`

-1. Create a document in the collection called `bigquery-mirror-test` that contains any fields with any values that you'd like.
+3. Create a document in the collection called `bigquery-mirror-test` that contains any fields with any values that you'd like.

-1. Go to the [BigQuery web UI](https://console.cloud.google.com/bigquery?project=${param:BIGQUERY_PROJECT_ID}&p=${param:BIGQUERY_PROJECT_ID}&d=${param:DATASET_ID}) in the Google Cloud Platform console.
+4. Go to the [BigQuery web UI](https://console.cloud.google.com/bigquery?project=${param:BIGQUERY_PROJECT_ID}&p=${param:BIGQUERY_PROJECT_ID}&d=${param:DATASET_ID}) in the Google Cloud Platform console.

-1. Query your **raw changelog table**, which should contain a single log of creating the `bigquery-mirror-test` document.
+5. Query your **raw changelog table**, which should contain a single log of creating the `bigquery-mirror-test` document.

 ```
 SELECT *
 FROM `${param:BIGQUERY_PROJECT_ID}.${param:DATASET_ID}.${param:TABLE_ID}_raw_changelog`
 ```

-1. Query your **latest view**, which should return the latest change event for the only document present -- `bigquery-mirror-test`.
+6. Query your **latest view**, which should return the latest change event for the only document present -- `bigquery-mirror-test`.

 ```
 SELECT *
 FROM `${param:BIGQUERY_PROJECT_ID}.${param:DATASET_ID}.${param:TABLE_ID}_raw_latest`
 ```

-1. Delete the `bigquery-mirror-test` document from [Cloud Firestore](https://console.firebase.google.com/project/${param:BIGQUERY_PROJECT_ID}/firestore/data).
+7. Delete the `bigquery-mirror-test` document from [Cloud Firestore](https://console.firebase.google.com/project/${param:BIGQUERY_PROJECT_ID}/firestore/data).
 The `bigquery-mirror-test` document will disappear from the **latest view** and a `DELETE` event will be added to the **raw changelog table**.

-1. You can check the changelogs of a single document with this query:
+8. You can check the changelogs of a single document with this query:

 ```
 SELECT *
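
The diff view cuts the last query short. As an editor's aside, the lookup that query performs can be modeled outside BigQuery; this is a hypothetical Python analogue only (the authoritative version is the SQL above, and the row fields are assumed to mirror the raw changelog columns):

```python
def document_changelog(rows, document_id):
    """Return all change events for one document, newest first.

    Each row is a dict mirroring the raw changelog columns
    (document_id, timestamp, operation, data, ...).
    """
    matching = [r for r in rows if r["document_id"] == document_id]
    return sorted(matching, key=lambda r: r["timestamp"], reverse=True)
```

For the walkthrough above, `document_changelog(rows, "bigquery-mirror-test")` would surface the `DELETE` event first, followed by the original create.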
@@ -54,13 +54,50 @@ Enabling wildcard references will provide an additional STRING based column. The

 `Clustering` will not need to create or modify a table when adding clustering options, this will be updated automatically.

-### Configuring Cross-Platform BigQuery Setup
+#### Cross-project Streaming

-When defining a specific BigQuery project ID, a manual step to set up permissions is required:
+By default, the extension exports data to BigQuery in the same project as your Firebase project. However, you can configure it to export to a BigQuery instance in a different Google Cloud project. To do this:

-1. Navigate to https://console.cloud.google.com/iam-admin/iam?project=${param:BIGQUERY_PROJECT_ID}
-2. Add the **BigQuery Data Editor** role to the following service account:
-   `ext-${param:EXT_INSTANCE_ID}@${param:PROJECT_ID}.iam.gserviceaccount.com`.
+1. During installation, set the `BIGQUERY_PROJECT_ID` parameter to your target BigQuery project ID.
+
+2. Identify the service account on the source project associated with the extension. By default, it is constructed as `ext-<extension-instance-id>@project-id.iam.gserviceaccount.com`. However, if the extension instance ID is too long, it may be truncated and have 4 random characters appended to abide by service-account length limits.
+
+3. To find the exact service account, navigate to IAM & Admin -> IAM in the Google Cloud Platform Console. Look for the service account whose "Name" is "Firebase Extensions <your extension instance ID> service account". The value in the "Principal" column is the service account that needs permissions granted in the target project.
+
+4. Grant the extension's service account the necessary BigQuery permissions on the target project. You can use our provided scripts:
+
+**For Linux/Mac (Bash):**
+```bash
+curl -O https://raw.githubusercontent.com/firebase/extensions/master/firestore-bigquery-export/scripts/grant-crossproject-access.sh
+chmod +x grant-crossproject-access.sh
+./grant-crossproject-access.sh -f SOURCE_FIREBASE_PROJECT -b TARGET_BIGQUERY_PROJECT [-i EXTENSION_INSTANCE_ID] [-s SERVICE_ACCOUNT]
+```
+
+**For Windows (PowerShell):**
+```powershell
+Invoke-WebRequest -Uri "https://raw.githubusercontent.com/firebase/extensions/master/firestore-bigquery-export/scripts/grant-crossproject-access.ps1" -OutFile "grant-crossproject-access.ps1"
+.\grant-crossproject-access.ps1 -FirebaseProject SOURCE_FIREBASE_PROJECT -BigQueryProject TARGET_BIGQUERY_PROJECT [-ExtensionInstanceId EXTENSION_INSTANCE_ID] [-ServiceAccount SERVICE_ACCOUNT]
+```
+
+**Parameters:**
+For the Bash script:
+- `-f`: Your Firebase (source) project ID
+- `-b`: Your target BigQuery project ID
+- `-i`: (Optional) Extension instance ID if different from the default "firestore-bigquery-export"
+- `-s`: (Optional) Service account email. If not provided, it will be constructed using the extension instance ID
+
+For the PowerShell script:
+- `-FirebaseProject`: Your Firebase (source) project ID
+- `-BigQueryProject`: Your target BigQuery project ID
+- `-ExtensionInstanceId`: (Optional) Extension instance ID if different from the default "firestore-bigquery-export"
+- `-ServiceAccount`: (Optional) Service account email. If not provided, it will be constructed using the extension instance ID
+
+**Prerequisites:**
+- You must have the [gcloud CLI](https://cloud.google.com/sdk/docs/install) installed and configured
+- You must have permission to grant IAM roles on the target BigQuery project
+- The extension must be installed before running the script
+
+**Note:** If extension installation initially fails to create a dataset on the target project due to missing permissions, don't worry. The extension will automatically retry once you've granted the necessary permissions using these scripts.
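
Step 2's naming rule can be made concrete. The sketch below is illustrative only: the 30-character service-account ID limit is standard GCP, but the exact truncation point is an assumption, and the real extension appends 4 *random* characters (a hash is used here so the example is deterministic):

```python
import hashlib

def default_service_account(instance_id: str, project_id: str) -> str:
    """Construct the extension's default service-account email.

    The account ID is "ext-<instance-id>". If that exceeds the
    30-character service-account ID limit, it is truncated and a
    4-character suffix is appended (deterministic here; the real
    extension uses 4 random characters).
    """
    account_id = f"ext-{instance_id}"
    if len(account_id) > 30:
        suffix = hashlib.sha1(account_id.encode()).hexdigest()[:4]
        account_id = account_id[:26] + suffix
    return f"{account_id}@{project_id}.iam.gserviceaccount.com"
```

For the default instance ID, `default_service_account("firestore-bigquery-export", "my-project")` yields `ext-firestore-bigquery-export@my-project.iam.gserviceaccount.com`; in all cases, confirm the actual principal in the IAM console as described in step 3.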

### _(Optional)_ Import existing documents

firestore-bigquery-export/PREINSTALL.md (+133)
@@ -69,6 +69,81 @@ Prior to sending the document change to BigQuery, you have an opportunity to tra

 The response should be identical in structure.

+#### Materialized Views
+
+This extension supports both regular views and materialized views in BigQuery. While regular views compute their results each time they're queried, materialized views store their query results, providing faster access at the cost of additional storage.
+
+There are two types of materialized views available:
+
+1. **Non-incremental Materialized Views**: These views support more complex queries, including filtering on aggregated fields, but require complete recomputation during refresh.
+
+2. **Incremental Materialized Views**: These views update more efficiently by processing only new or changed records, but come with query restrictions. Most notably, they don't allow filtering or partitioning on aggregated fields in their defining SQL, among other limitations.
+
+**Important Considerations:**
+- Neither type of materialized view in this extension currently supports partitioning or clustering
+- Both types allow you to configure refresh intervals and maximum staleness settings during extension installation or configuration
+- Once created, a materialized view's SQL definition cannot be modified. If you reconfigure the extension to change either the view type (incremental vs non-incremental) or the SQL query, the extension will drop the existing materialized view and recreate it
+- Carefully consider your use case before choosing materialized views:
+  - They incur additional storage costs as they cache query results
+  - Non-incremental views may have higher processing costs during refresh
+  - Incremental views have more query restrictions but are more efficient to update
+
+Example of a non-incremental materialized view SQL definition generated by the extension:
+```sql
+CREATE MATERIALIZED VIEW `my_project.my_dataset.my_table_raw_changelog`
+OPTIONS (
+  allow_non_incremental_definition = true,
+  enable_refresh = true,
+  refresh_interval_minutes = 60,
+  max_staleness = INTERVAL "4:0:0" HOUR TO SECOND
+)
+AS (
+  WITH latests AS (
+    SELECT
+      document_name,
+      MAX_BY(document_id, timestamp) AS document_id,
+      MAX(timestamp) AS timestamp,
+      MAX_BY(event_id, timestamp) AS event_id,
+      MAX_BY(operation, timestamp) AS operation,
+      MAX_BY(data, timestamp) AS data,
+      MAX_BY(old_data, timestamp) AS old_data,
+      MAX_BY(extra_field, timestamp) AS extra_field
+    FROM `my_project.my_dataset.my_table_raw_changelog`
+    GROUP BY document_name
+  )
+  SELECT *
+  FROM latests
+  WHERE operation != "DELETE"
+)
+```
+
+Example of an incremental materialized view SQL definition generated by the extension:
+```sql
+CREATE MATERIALIZED VIEW `my_project.my_dataset.my_table_raw_changelog`
+OPTIONS (
+  enable_refresh = true,
+  refresh_interval_minutes = 60,
+  max_staleness = INTERVAL "4:0:0" HOUR TO SECOND
+)
+AS (
+  SELECT
+    document_name,
+    MAX_BY(document_id, timestamp) AS document_id,
+    MAX(timestamp) AS timestamp,
+    MAX_BY(event_id, timestamp) AS event_id,
+    MAX_BY(operation, timestamp) AS operation,
+    MAX_BY(data, timestamp) AS data,
+    MAX_BY(old_data, timestamp) AS old_data,
+    MAX_BY(extra_field, timestamp) AS extra_field
+  FROM `my_project.my_dataset.my_table_raw_changelog`
+  GROUP BY document_name
+)
+```
+
+Please review [BigQuery's documentation on materialized views](https://cloud.google.com/bigquery/docs/materialized-views-intro) to fully understand the implications for your use case.
+
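
As a cross-check on the non-incremental view's semantics, here is a small Python model (an editor's illustration, not part of the extension) of what the `MAX_BY ... GROUP BY document_name` query plus the `DELETE` filter computes. It keeps the whole latest row per document, which matches the per-column `MAX_BY` calls above as long as timestamps are unique per document:

```python
def latest_view(changelog_rows):
    """Model the non-incremental view: per document_name, keep the row
    with the greatest timestamp (MAX_BY semantics), then drop documents
    whose latest operation is DELETE."""
    latest = {}
    for row in changelog_rows:
        name = row["document_name"]
        if name not in latest or row["timestamp"] > latest[name]["timestamp"]:
            latest[name] = row
    return [r for r in latest.values() if r["operation"] != "DELETE"]
```

The incremental variant is the same minus the final `DELETE` filter, which is exactly the restriction the docs mention: incremental views cannot filter on aggregated fields.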
 #### Using Customer Managed Encryption Keys

By default, BigQuery encrypts your content stored at rest. BigQuery handles and manages this default encryption for you without any additional actions on your part.
@@ -100,6 +175,64 @@ If you follow these steps, your changelog table should be created using your cus

 After your data is in BigQuery, you can run the [schema-views script](https://github.com/firebase/extensions/blob/master/firestore-bigquery-export/guides/GENERATE_SCHEMA_VIEWS.md) (provided by this extension) to create views that make it easier to query relevant data. You only need to provide a JSON schema file that describes your data structure, and the schema-views script will create the views.

+#### Cross-project Streaming
+
+By default, the extension exports data to BigQuery in the same project as your Firebase project. However, you can configure it to export to a BigQuery instance in a different Google Cloud project. To do this:
+
+1. During installation, set the `BIGQUERY_PROJECT_ID` parameter to your target BigQuery project ID.
+
+2. After installation, you'll need to grant the extension's service account the necessary BigQuery permissions on the target project. You can use our provided scripts:
+
+**For Linux/Mac (Bash):**
+```bash
+curl -O https://raw.githubusercontent.com/firebase/extensions/master/firestore-bigquery-export/scripts/grant-crossproject-access.sh
+chmod +x grant-crossproject-access.sh
+./grant-crossproject-access.sh -f SOURCE_FIREBASE_PROJECT -b TARGET_BIGQUERY_PROJECT [-i EXTENSION_INSTANCE_ID]
+```
+
+**For Windows (PowerShell):**
+```powershell
+Invoke-WebRequest -Uri "https://raw.githubusercontent.com/firebase/extensions/master/firestore-bigquery-export/scripts/grant-crossproject-access.ps1" -OutFile "grant-crossproject-access.ps1"
+.\grant-crossproject-access.ps1 -FirebaseProject SOURCE_FIREBASE_PROJECT -BigQueryProject TARGET_BIGQUERY_PROJECT [-ExtensionInstanceId EXTENSION_INSTANCE_ID]
+```
+
+**Parameters:**
+For the Bash script:
+- `-f`: Your Firebase (source) project ID
+- `-b`: Your target BigQuery project ID
+- `-i`: (Optional) Extension instance ID if different from the default "firestore-bigquery-export"
+
+For the PowerShell script:
+- `-FirebaseProject`: Your Firebase (source) project ID
+- `-BigQueryProject`: Your target BigQuery project ID
+- `-ExtensionInstanceId`: (Optional) Extension instance ID if different from the default "firestore-bigquery-export"
+
+**Prerequisites:**
+- You must have the [gcloud CLI](https://cloud.google.com/sdk/docs/install) installed and configured
+- You must have permission to grant IAM roles on the target BigQuery project
+- The extension must be installed before running the script
+
+**Note:** If extension installation initially fails to create a dataset on the target project due to missing permissions, don't worry. The extension will automatically retry once you've granted the necessary permissions using these scripts.
+
+#### Mitigating Data Loss During Extension Updates
+
+When updating or reconfiguring this extension, there may be a brief period where data streaming from Firestore to BigQuery is interrupted. While this limitation exists within the Extensions platform, we provide two strategies to mitigate potential data loss.
+
+##### Strategy 1: Post-Update Import
+After reconfiguring the extension, run the import script on your collection to ensure all data is captured. Refer to the "Import Existing Documents" section above for detailed steps.
+
+##### Strategy 2: Parallel Instance Method
+1. Install a second instance of the extension that streams to a new BigQuery table
+2. Reconfigure the original extension
+3. Wait until the original extension is properly configured and streaming events again
+4. Uninstall the second instance
+5. Run a BigQuery merge job to combine the data from both tables
+
+##### Considerations
+- Strategy 1 is simpler but may result in duplicate records that need to be deduplicated
+- Strategy 2 requires more setup but provides better data continuity
+- Choose the strategy that best aligns with your data consistency requirements and operational constraints
+
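
Both strategies end with a deduplication step. The sketch below models it in Python under one stated assumption: that the raw changelog's `event_id` column uniquely identifies each change event, so it can serve as the merge key (in practice the merge would be a BigQuery job over the two tables, not client-side code):

```python
def merge_changelogs(primary_rows, secondary_rows):
    """Combine rows from the original table and the parallel instance's
    table, dropping duplicates by event_id, and return them in
    timestamp order. Rows from the primary table win on conflict."""
    seen = set()
    merged = []
    for row in primary_rows + secondary_rows:
        if row["event_id"] not in seen:
            seen.add(row["event_id"])
            merged.append(row)
    return sorted(merged, key=lambda r: r["timestamp"])
```

The same key works for Strategy 1: rows re-imported after reconfiguration that duplicate already-streamed events collapse onto the existing `event_id`.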
 #### Billing
 To install an extension, your project must be on the [Blaze (pay as you go) plan](https://firebase.google.com/pricing)