Add Bigeye checks for mozilla_org_derived datasets #6473

Open: wants to merge 7 commits into base: main
2 changes: 2 additions & 0 deletions bigquery_etl/cli/monitoring.py
@@ -368,6 +368,8 @@ def _update_bigconfig(
for collection in bigconfig.tag_deployments:
    for deployment in collection.deployments:
        for metric in deployment.metrics:
            print("im here")
            print(deployment)
            if metric.metric_type is None:
                err_message = f"""There appears to be an issue parsing \
                a metric type definition for `{project}.{dataset}.{table}` \
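The hunk above sits inside a loop that visits every metric of every deployment and builds an error message when a metric type failed to parse. A minimal sketch of that traversal follows; the `Metric`, `Deployment`, and `Collection` dataclasses and the `find_unparsed_metrics` helper are hypothetical stand-ins, not the Bigeye SDK classes used by `_update_bigconfig`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# Hypothetical stand-ins for the parsed Bigconfig objects.
@dataclass
class Metric:
    metric_type: Optional[str] = None


@dataclass
class Deployment:
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Collection:
    deployments: List[Deployment] = field(default_factory=list)


def find_unparsed_metrics(
    tag_deployments: List[Collection],
) -> List[Tuple[int, int, int]]:
    """Return (collection, deployment, metric) indexes of metrics with no type."""
    missing = []
    for c_idx, collection in enumerate(tag_deployments):
        for d_idx, deployment in enumerate(collection.deployments):
            for m_idx, metric in enumerate(deployment.metrics):
                if metric.metric_type is None:
                    missing.append((c_idx, d_idx, m_idx))
    return missing
```

Returning the indexes instead of printing inside the loop keeps the check quiet when every metric parses.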
@@ -0,0 +1,65 @@
type: BIGCONFIG_FILE

row_creation_times:
  column_selectors:
    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.blogs_goals_v2.date

saved_metric_definitions:
  metrics:
    - saved_metric_id: COUNT_DUPLICATES
      metric_type:
        type: PREDEFINED
        predefined_metric: COUNT_DUPLICATES
      metric_name: Duplicates (#)
      group_by:
        - date
      threshold:
        type: CONSTANT
        upper_bound: 0.0
        lower_bound: 0.0
      lookback:
        lookback_window:
          interval_type: DAYS
          interval_value: -1
        lookback_type: METRIC_TIME
        bucket_size: DAY
      rct_overrides:
        - date
    - saved_metric_id: VISIT_IDENTIFIER_REGEX_CHECK
      metric_type:
        type: TEMPLATE
        template_id: 947
        aggregation_type: COUNT
        template_name: visit_identifier_regex_check
      metric_name: COUNT of visit_identifier_regex_check
      group_by:
        - date
      threshold:
        type: CONSTANT
        upper_bound: 0.0
        lower_bound: 0.0
      parameters:
        - key: column_name
          string_value: visit_identifier
      lookback:
        lookback_window:
          interval_type: DAYS
          interval_value: -1
        lookback_type: METRIC_TIME
        bucket_size: DAY
      rct_overrides:
        - date

tag_deployments:
  - collection:
      name: blogs_goals_v2
      description: SDK Generated
    deployments:
      - column_selectors:
          - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.blogs_goals_v2.visit_identifier
        metrics:
          - saved_metric_id: COUNT_DUPLICATES
      - column_selectors:
          - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.blogs_goals_v2
        metrics:
          - saved_metric_id: VISIT_IDENTIFIER_REGEX_CHECK
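Each entry under `tag_deployments` refers to a metric by `saved_metric_id`, which should match an entry under `saved_metric_definitions`. The sketch below checks that every reference resolves. It works on a plain dict shaped like the file above, and `unresolved_saved_metric_ids` is a hypothetical helper, not part of bigquery-etl or the Bigeye SDK.

```python
from typing import Any, Dict, Set


def unresolved_saved_metric_ids(config: Dict[str, Any]) -> Set[str]:
    """Return saved_metric_id values that are deployed but never defined."""
    definitions = config.get("saved_metric_definitions", {}).get("metrics", [])
    defined = {metric["saved_metric_id"] for metric in definitions}

    referenced: Set[str] = set()
    for tag in config.get("tag_deployments", []):
        for deployment in tag.get("deployments", []):
            for metric in deployment.get("metrics", []):
                # Inline metrics carry a metric_type and no saved_metric_id,
                # so only entries that name a saved metric are collected.
                if "saved_metric_id" in metric:
                    referenced.add(metric["saved_metric_id"])
    return referenced - defined


# Trimmed-down example: one defined metric, two references.
example = {
    "saved_metric_definitions": {
        "metrics": [{"saved_metric_id": "COUNT_DUPLICATES"}]
    },
    "tag_deployments": [
        {
            "deployments": [
                {"metrics": [{"saved_metric_id": "COUNT_DUPLICATES"}]},
                {"metrics": [{"saved_metric_id": "VISIT_IDENTIFIER_REGEX_CHECK"}]},
            ]
        }
    ],
}
missing_ids = unresolved_saved_metric_ids(example)
```

In this trimmed example the second reference has no definition, so it is the only id reported.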
@@ -20,4 +20,6 @@ bigquery:
  clustering:
    fields:
    - visit_identifier
monitoring:
  enabled: true
references: {}
@@ -0,0 +1,41 @@
type: BIGCONFIG_FILE
saved_metric_definitions:
  metrics:
    - saved_metric_id: count_number_rows
      metric_type:
        type: PREDEFINED
        predefined_metric: COUNT_ROWS
      metric_name: downloads_with_attribution_v2_row_count
      group_by:
        - download_date
      threshold:
        type: CONSTANT
        lower_bound: 50000.0
      rct_overrides:
        - download_date
tag_deployments:
  - collection:
      name: Google Analytics
      description: All GA related data
    deployments:
      - column_selectors:
          - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.downloads_with_attribution_v2.*
        metrics:
          - saved_metric_id: count_number_rows
      - column_selectors:
          - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.downloads_with_attribution_v2.*
        metrics:
          - metric_type:
              type: PREDEFINED
              predefined_metric: FRESHNESS
            metric_name: FRESHNESS [warn]
            metric_schedule:
              named_schedule:
                name: Default Schedule - 13:00 UTC
          - metric_type:
              type: PREDEFINED
              predefined_metric: VOLUME
            metric_name: VOLUME [fail]
            metric_schedule:
              named_schedule:
                name: Default Schedule - 13:00 UTC
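The `count_number_rows` metric above sets only `lower_bound: 50000.0`, so the row count has a floor but no ceiling. A sketch of how a `CONSTANT` threshold with optional bounds could be evaluated is shown below. Treating the bounds as inclusive is an assumption, and `within_constant_threshold` is a hypothetical helper rather than Bigeye's implementation.

```python
from typing import Optional


def within_constant_threshold(
    value: float,
    lower_bound: Optional[float] = None,
    upper_bound: Optional[float] = None,
) -> bool:
    """Return True when value respects whichever bounds are set.

    Assumption: bounds are inclusive and a missing bound means "unbounded".
    """
    if lower_bound is not None and value < lower_bound:
        return False
    if upper_bound is not None and value > upper_bound:
        return False
    return True


# A daily row count below the 50000.0 floor would be flagged.
healthy = within_constant_threshold(62_000, lower_bound=50000.0)
too_few = within_constant_threshold(12_000, lower_bound=50000.0)
```

With both bounds set to `0.0`, only a metric value of exactly zero passes under this reading.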
@@ -60,3 +60,6 @@ bigquery:
    expiration_days: null
  clustering: null
references: {}
monitoring:
  enabled: true

@@ -0,0 +1,61 @@
type: BIGCONFIG_FILE

row_creation_times:
  column_selectors:
    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_clients_v2.first_seen_date

saved_metric_definitions:
  metrics:
    - saved_metric_id: COUNT_DUPLICATES
      metric_type:
        type: PREDEFINED
        predefined_metric: COUNT_DUPLICATES
      metric_name: Duplicates (#)
      threshold:
        type: AUTO
        sensitivity: MEDIUM
        upper_bound_only: false
        lower_bound_only: false
      rct_overrides:
        - bigeye-no-rct
    - saved_metric_id: COUNT_ROWS
      metric_type:
        type: PREDEFINED
        predefined_metric: COUNT_ROWS
      metric_name: Row count (#)
      conditions:
        - "first_seen_date >= '2024-01-01'\n and first_reported.country IN ('United States',\
          \ 'Canada')"
      group_by:
        - first_seen_date
        - first_reported.country
      threshold:
        type: AUTO
        sensitivity: MEDIUM
        upper_bound_only: false
        lower_bound_only: false
      lookback:
        lookback_window:
          interval_type: DAYS
          interval_value: -1
        lookback_type: METRIC_TIME
        bucket_size: DAY
      rct_overrides:
        - first_seen_date
      metric_schedule:
        named_schedule:
          name: Default Schedule - 13:00 UTC

tag_deployments:
  - collection:
      name: Google Analytics
      description: All checks related to GA tables
    deployments:
      - column_selectors:
          - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_clients_v2.ga_client_id
        metrics:
          - saved_metric_id: COUNT_DUPLICATES
      - column_selectors:
          - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_clients_v2.*
        metrics:
          - saved_metric_id: COUNT_ROWS
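The `column_selectors` names in these files repeat `moz-fx-data-shared-prod` and end in a table name, a column name, or `*`. The sketch below splits such a name into parts, assuming the segments are source, project, dataset, table, and an optional column. That reading of the segments is an assumption, and `Selector` and `parse_selector` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Selector(NamedTuple):
    source: str
    project: str
    dataset: str
    table: str
    column: Optional[str]  # None when the selector names only the table


def parse_selector(name: str) -> Selector:
    """Split a dotted selector into its parts (nested fields are not handled)."""
    parts = name.split(".")
    if len(parts) == 4:
        return Selector(*parts, None)
    if len(parts) == 5:
        return Selector(*parts)
    raise ValueError(f"unexpected selector: {name!r}")


wildcard = parse_selector(
    "moz-fx-data-shared-prod.moz-fx-data-shared-prod."
    "mozilla_org_derived.ga_clients_v2.*"
)
```

A selector ending in `.*` keeps `*` as its column, which makes table-wide deployments easy to tell apart from single-column ones.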
@@ -16,4 +16,6 @@ scheduling:
bigquery:
  clustering:
    fields: ["first_seen_date"]
monitoring:
  enabled: true
references: {}
@@ -0,0 +1,60 @@
type: BIGCONFIG_FILE

row_creation_times:
  column_selectors:
    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_sessions_v2.session_date

saved_metric_definitions:
  metrics:
    - saved_metric_id: PERCENT_NULL
      metric_type:
        type: PREDEFINED
        predefined_metric: PERCENT_NULL
      metric_name: Null (%)
      threshold:
        type: CONSTANT
        upper_bound: 0.0
        lower_bound: 0.0
      lookback:
        lookback_window:
          interval_type: DAYS
          interval_value: -1
        lookback_type: METRIC_TIME
        bucket_size: DAY
      rct_overrides:
        - session_date
      metric_schedule:
        named_schedule:
          name: Default Schedule - 13:00 UTC
    - saved_metric_id: COUNT_ROWS
      metric_type:
        type: PREDEFINED
        predefined_metric: COUNT_ROWS
      metric_name: Row count (#)
      group_by:
        - ga_session_id
        - ga_client_id
      threshold:
        type: CONSTANT
        upper_bound: 1.0
        lower_bound: 0.0
      rct_overrides:
        - bigeye-no-rct
      metric_schedule:
        named_schedule:
          name: Default Schedule - 13:00 UTC
tag_deployments:
  - collection:
      name: Google Analytics
      description: All checks related to GA tables
    deployments:
      - column_selectors:
          - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_sessions_v2.session_date
          - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_sessions_v2.ga_session_id
          - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_sessions_v2.ga_client_id
        metrics:
          - saved_metric_id: PERCENT_NULL
      - column_selectors:
          - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.ga_sessions_v2.*
        metrics:
          - saved_metric_id: COUNT_ROWS
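The `COUNT_ROWS` metric above groups by `ga_session_id` and `ga_client_id` and caps the count at `1.0`, which reads as a uniqueness check on that pair. A sketch of the same idea over in-memory rows follows; the sample rows and the `groups_over_limit` helper are made up for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Sequence, Tuple


def groups_over_limit(
    rows: Iterable[Dict[str, Any]],
    keys: Sequence[str],
    upper_bound: int = 1,
) -> Dict[Tuple[Any, ...], int]:
    """Count rows per key combination and keep groups above upper_bound."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return {group: n for group, n in counts.items() if n > upper_bound}


# Illustrative rows: the ("s1", "c1") pair appears twice.
sample_rows = [
    {"ga_session_id": "s1", "ga_client_id": "c1"},
    {"ga_session_id": "s1", "ga_client_id": "c1"},
    {"ga_session_id": "s2", "ga_client_id": "c1"},
]
violations = groups_over_limit(sample_rows, ["ga_session_id", "ga_client_id"])
```

Any group returned here corresponds to a grouped row count that would exceed the `upper_bound` of `1.0`.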
@@ -31,3 +31,6 @@ workgroup_access:
  - workgroup:mozilla-confidential
  - workgroup:google-managed/external-ads-datafusion
  - workgroup:google-managed/external-ads-dataproc
monitoring:
  enabled: true

@@ -0,0 +1,45 @@
type: BIGCONFIG_FILE

row_creation_times:
  column_selectors:
    - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.www_site_hits_v2.date

saved_metric_definitions:
  metrics:
    - saved_metric_id: VISIT_IDENTIFIER_REGEX_CHECK
      metric_type:
        type: TEMPLATE
        template_id: 947
        aggregation_type: COUNT
        template_name: visit_identifier_regex_check
      metric_name: mozilla_org_derived.www_site_hits_v2 - COUNT of visit_identifier_regex_check
      group_by:
        - date
      threshold:
        type: CONSTANT
        upper_bound: 0.0
        lower_bound: 0.0
      parameters:
        - key: column_name
          string_value: visit_identifier
      lookback:
        lookback_window:
          interval_type: DAYS
          interval_value: -1
        lookback_type: METRIC_TIME
        bucket_size: DAY
      rct_overrides:
        - date
      metric_schedule:
        named_schedule:
          name: Default Schedule - 13:00 UTC

tag_deployments:
  - collection:
      name: Google Analytics
      description: All checks related to GA tables
    deployments:
      - column_selectors:
          - name: moz-fx-data-shared-prod.moz-fx-data-shared-prod.mozilla_org_derived.www_site_hits_v2
        metrics:
          - saved_metric_id: VISIT_IDENTIFIER_REGEX_CHECK
@@ -22,4 +22,6 @@ bigquery:
    - country
    - language
    - event_name
monitoring:
  enabled: true
references: {}