Skip to content

Use fixed column order in baseline_clients_last_seen #7119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to the effort of fixing column orders in this query, there are a couple other cases it'd be good to fix up:

  • The init query has submission_date after the *_bits columns. I'd recommend we have submission_date be the first column in the init query like it is in the main query (IMO partition columns should always be the first column in the schema).
  • The _previous CTE has the *_bits columns in a different order than the init query and the _current CTE. Luckily because the IF(_current.client_id IS NOT NULL, _current, _previous).* REPLACE (...) expression is replacing those *_bits columns with explicit references to the associated columns from _current and _previous the end result is correct. However, if those *_bits columns weren't being replaced like that this would silently be doing the wrong thing (similar to unioning two queries with different column orders), so IMO it would be good to make the *_bits column order in _previous match the other two cases.

Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,38 @@ WITH _current AS (
CAST(browser_engagement_uri_count > 0 AS INT64) &
CAST(browser_engagement_active_ticks > 0 AS INT64) AS days_desktop_active_bits,
{% endif %}
* EXCEPT(submission_date)
client_id,
sample_id,
first_run_date,
durations,
days_seen_session_start_bits,
days_seen_session_end_bits,
normalized_channel,
normalized_os,
normalized_os_version,
android_sdk_version,
locale,
city,
country,
isp,
app_build,
app_channel,
app_display_version,
architecture,
device_manufacturer,
device_model,
telemetry_sdk_build,
first_seen_date,
is_new_profile,
Comment on lines +60 to +69
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good if this query's column order matched the schema.yaml column order (also in the _previous CTE):

Suggested change
isp,
app_build,
app_channel,
app_display_version,
architecture,
device_manufacturer,
device_model,
telemetry_sdk_build,
first_seen_date,
is_new_profile,
app_build,
app_channel,
app_display_version,
architecture,
device_manufacturer,
device_model,
telemetry_sdk_build,
first_seen_date,
is_new_profile,
isp,

Or alternatively, the schema.yaml could be updated to match this column order.

distribution_id,
geo_subdivision,
profile_group_id,
install_source,
windows_build_number,
browser_engagement_uri_count,
browser_engagement_active_ticks,
legacy_telemetry_client_id,
is_default_browser,
FROM
`{{ daily_table }}`
WHERE
Expand All @@ -54,21 +85,46 @@ WITH _current AS (
--
_previous AS (
SELECT
-- All columns except submission_date ( * (EXCEPT(submission_date)) )
-- listed out to ensure order is identical to output of _current
days_seen_bits,
days_active_bits,
{% if app_name == "firefox_desktop" %}
days_desktop_active_bits,
{% endif %}
days_created_profile_bits,
* EXCEPT (
submission_date,
days_seen_bits,
days_active_bits,
{% if app_name == "firefox_desktop" %}
days_desktop_active_bits,
{% endif %}
days_created_profile_bits
),
client_id,
sample_id,
first_run_date,
durations,
days_seen_session_start_bits,
days_seen_session_end_bits,
normalized_channel,
normalized_os,
normalized_os_version,
android_sdk_version,
locale,
city,
country,
isp,
app_build,
app_channel,
app_display_version,
architecture,
device_manufacturer,
device_model,
telemetry_sdk_build,
first_seen_date,
is_new_profile,
distribution_id,
geo_subdivision,
profile_group_id,
install_source,
windows_build_number,
browser_engagement_uri_count,
browser_engagement_active_ticks,
legacy_telemetry_client_id,
is_default_browser,
FROM
`{{ last_seen_table }}`
WHERE
Expand Down