Skip to content

Cube preAggregation fail to incrementally capture the up-to-date time series data #9293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
kevinleeTCA opened this issue Mar 3, 2025 · 2 comments
Assignees
Labels
question The issue is a question. Please use Stack Overflow for questions.

Comments

@kevinleeTCA
Copy link

Problem

We have an incremental preAggregation rollup, updated on daily basis, and partition by month

  preAggregations: {
    rollup: {
      measures: [CUBE.incomeAmount, CUBE.count, CUBE.managementFeeIncomeAmount, CUBE.minDate],
      dimensions: [CUBE.transactionTaxCategoryId, CUBE.generalLedgerManagement, CUBE.accountOwner, CUBE.accountBookType],
      timeDimension: CUBE.createdAt,
      granularity: `day`,
      indexes: {
        idx: {
          columns: [ CUBE.accountOwner, CUBE.generalLedgerManagement],
        },
      },
      refresh_key: {
        every: `30 6 * * *`,
        timezone: `Australia/Sydney`,
        incremental: true,
        update_window: `120 days`,
      },
      partition_granularity: `month`,
      build_range_start: {
        sql: `SELECT '2020-08-11'::timestamp AT TIME ZONE 'utc'`,
      },
      build_range_end: {
        sql: `SELECT NOW()`,
      },
    },
  },

We also have the 5 timezones that we would like to support for our customer' query:

  scheduledRefreshTimeZones: [
    'Australia/Adelaide',
    'Australia/Brisbane',
    'Australia/Darwin',
    'Australia/Perth',
    'Australia/Sydney'
  ],

our configuration:

Cube server type: self-host via docker
Cube version: 1.1.18
Cubestore version: 1.1.18

We found a missing record in the pre-aggregation rollups, that record created at 2025-03-02 10pm (Australia/Sydney). Daily rollup runs at 6.30am (Australia/Sydney).

Details

  1. looking at all our rollups for March, we have 5 of them corresponding to the 5 timezones, but hard to figure it out which one represents which timezone, any clue ?
    Image
  2. Among these 5, only 1 of them , i.e., Australia/Perth partition has the missing management fee transaction data ($35). but it also means it has 1 missing transaction ($18) that happens at 11pm, 28th, Feb (Australia/Perth)

Image

  1. All other 4 partitions have no such transaction, but does have the $18 that happens 2025-02-28T15:01:29.343Z (UTC), as it is legitimately 1st of March in their timezone.
  2. This is aligned with the query we emitted via API.

Image

  1. Looking at the actual time of the $35, it is 2025-03-02T11:07:17.596Z (UTC) which is about 10pm at 02/03 in Australia/Sydney timezone, so it should be captured in other time partitions, as we did the daily aggregation 03/03 at 6.30am Australia/Sydney timezone, you can see the build_range_end are all about 7am 03/03, so it should cover the missing transactions.

Related Cube.js schema

...
  measures: {
    incomeAmount: {
      sql: `amount`,
      type: `sum`,
    },
    feeIncomeAmount: {
      type: `sum`,
      title: `Income excluding any Rent`,
      sql: `amount`,
      filters: [
        {
          sql: `${CUBE.transactionTaxCategoryId} != 'RENT'`,
        },
      ],
    },
    count: {
      type: `count`,
    },
    minDate: {
      sql: `created_at::TIMESTAMP`,
      type: `min`,
    },
    managementFeeIncomeAmount: {
      type: `sum`,
      title: `Total Management Fee Income`,
      sql: `amount`,
      filters: [
        {
          sql: `${CUBE.transactionTaxCategoryId} = 'MANAGEMENT_FEES'`,
        },
      ],
      description: "Management fee income also known as commission fee or commission rate income, " +
          "it is a service fee that landlord/owners/investor charged on top of the rent collected from tenants"
    },
  },
  dimensions: {
    eventId: {
      sql: `event_id`,
      type: `string`,
      primaryKey: true,
    },
    accountId: {
      sql: `account_id`,
      type: `string`,
      primaryKey: true,
      shown: true,
    },
    accountBookType: {
      sql: `book`,
      type: `string`,
    },
    createdAt: {
      sql: `created_at::TIMESTAMP`,
      type: `time`,
    },
    accountOwner: {
      sql: `account_owner_reference`,
      type: `string`,
    },
    generalLedgerManagement: {
      sql: `relating_to_management`,
      type: `string`,
    },
    transactionTaxCategoryId: {
      sql: `tax_category`,
      type: `string`,
    },
  },

Related Cube.js query

query glIncomeByTaxCategoryReport($organisationId: String!, $teamIds: [String!], $groupIds: [String!], $propertyTypes: [String!], $dateRange: [String!], $orderBy: RootOrderByInput!, $limit: Int, $timezone: String) {
  cube(
    timezone: $timezone
    where: {pM_Teams: {organisationId: {equals: $organisationId}, teamId: {in: $teamIds}, propertyType: {in: $propertyTypes}}, generalLedger_Income: {createdAt: {inDateRange: $dateRange}, accountBookType: {equals: "CASH"}}, groupsByMa: {groupIds: {contains: $groupIds}}}
    orderBy: $orderBy
    limit: $limit
  ) {
    generalLedger_Income {
      incomeAmount
      managementFeeIncomeAmount
			generalLedgerManagement
      count
      __typename
    }
    pM_FeeTaxCategories {
      feeTaxCategoryName
      __typename
    }
		pM_Teams{
			teamName
		}
    __typename
  }
}

params

{
	"organisationId": "4df60ddb-d2ba-4b32-96b0-616c58ecafb8",
	"dateRange": [
		"2025-03-01",
		"2025-03-28"
	],
	"orderBy": {
		"generalLedger_Income": {
			"createdAt": "desc"
		}
	},
	"limit": 3000,
	"timezone": "Australia/Perth"
}
@kevinleeTCA kevinleeTCA added the question The issue is a question. Please use Stack Overflow for questions. label Mar 3, 2025
@kevinleeTCA
Copy link
Author

I saw a related PR here: #9284 submitted by @KSDaemon

@KSDaemon
Copy link
Member

KSDaemon commented Mar 7, 2025

Hi @kevinleeTCA!
Yesterday, v1.2.18 was released with some fixes related to pre-aggregations refresh flows. Please try it in your env and examine - does it fix anything? Try removing timezone-specific queries in the build range start/end. Also, pay attention to updated partitions - unexpected partitions should not be updated too...

@KSDaemon KSDaemon self-assigned this Mar 7, 2025
@igorlukanin igorlukanin self-assigned this Mar 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question The issue is a question. Please use Stack Overflow for questions.
Projects
None yet
Development

No branches or pull requests

3 participants