feat(gateway): Add GraphQL endpoint #3555

lvauvillier · 2021-10-18T20:36:36Z

This PR implements the GraphQL spec described in issue #3433.

I think I covered all the main functionality of the REST API.

Description

A new endpoint, /cubejs-api/graphql, is added to the gateway.
All security middlewares (checkAuth, queryRewrite, etc.) still apply.

In dev mode, the GraphiQL UI is available.

How it works

The GraphQL schema is generated automatically using the compilerApi.
GraphQL AST info is used to generate a Cube query at the root cube entity level.
Results are remapped to the shape of the GraphQL query.
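A minimal sketch of the remapping step, assuming result rows are keyed by fully qualified member names such as `Stories.count` (the REST API's row format) and assuming the GraphQL field for a cube is its name with a lowercased first letter. `remapRow` and `CubeSelection` are hypothetical names, not the PR's code, and granularity-suffixed keys such as `Stories.time.month` are ignored here.

```typescript
// Hypothetical description of the fields a GraphQL query selected on one cube.
interface CubeSelection {
  cubeName: string; // e.g. "Stories"
  fields: string[]; // e.g. ["count", "category"]
}

type Row = Record<string, unknown>;

// Assumed naming convention: cube "Stories" is exposed as the field "stories".
function toFieldName(cubeName: string): string {
  return cubeName.charAt(0).toLowerCase() + cubeName.slice(1);
}

// Rebuild one flat result row into the nested shape requested by the query.
function remapRow(row: Row, selections: CubeSelection[]): Record<string, Record<string, unknown>> {
  const out: Record<string, Record<string, unknown>> = {};
  for (const selection of selections) {
    const node: Record<string, unknown> = {};
    for (const field of selection.fields) {
      node[field] = row[`${selection.cubeName}.${field}`];
    }
    out[toFieldName(selection.cubeName)] = node;
  }
  return out;
}
```

For example, the row `{ "Stories.count": "10", "Stories.category": "news" }` would become `{ stories: { count: "10", category: "news" } }`.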

Warning

The current implementation raises an error on a continueWait response. Long polling is not implemented (and isn't possible with this API design).

Example

Original query:

{
  "measures": ["Stories.count"],
  "dimensions": ["Stories.category"],
  "filters": [{
    "member": "Stories.isDraft",
    "operator": "equals",
    "values": ["No"]
  }],
  "timeDimensions": [{
    "dimension": "Stories.time",
    "dateRange": ["2015-01-01", "2015-12-31"],
    "granularity": "month"
  }],
  "limit": 100,
  "offset": 50,
  "order": {
    "Stories.time": "asc",
    "Stories.count": "desc"
  },
  "timezone": "America/Los_Angeles"
}

is equivalent to this GraphQL query:

{
  load(limit: 100, offset: 50, timezone: "America/Los_Angeles") {
    stories(granularity: {time: month}, orderBy: { time: asc, count: desc }, where: {isDraft: {equals: "No"}, time: {inDateRange: ["2015-01-01", "2015-12-31"]}}) {
      count
      category
      time
    }
  }
}
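As a rough illustration of how the cube-level arguments above line up with the REST query, here is a sketch that turns `orderBy` and `granularity` into `order` and `timeDimensions` entries by prefixing each member with its cube name. It assumes argument values arrive as plain objects; `cubeArgsToQuery` is a hypothetical helper, and date ranges coming from `where` are left to the filter translation.

```typescript
type Direction = "asc" | "desc";

interface CubeArgs {
  orderBy?: Record<string, Direction>; // e.g. { time: "asc", count: "desc" }
  granularity?: Record<string, string>; // e.g. { time: "month" }
}

interface QueryFragment {
  order: Record<string, Direction>;
  timeDimensions: { dimension: string; granularity: string }[];
}

// Prefix each member with its cube name, as the REST query format expects.
function cubeArgsToQuery(cubeName: string, args: CubeArgs): QueryFragment {
  const orderBy = args.orderBy || {};
  const granularity = args.granularity || {};
  const order: Record<string, Direction> = {};
  for (const member of Object.keys(orderBy)) {
    order[`${cubeName}.${member}`] = orderBy[member];
  }
  const timeDimensions = Object.keys(granularity).map(member => ({
    dimension: `${cubeName}.${member}`,
    granularity: granularity[member],
  }));
  return { order, timeDimensions };
}
```

With `cubeArgsToQuery("Stories", { orderBy: { time: "asc", count: "desc" }, granularity: { time: "month" } })`, the order keeps the argument order (`Stories.time` before `Stories.count`), which matters because Cube applies ordering in that sequence.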

The @skip and @include directives are supported:

query GetStories($byCategory: Boolean = false) {
  stories(granularity: { time: month }) {
      count
      category @include(if: $byCategory)
      time
  }
}
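The inclusion rule for these two directives comes from the GraphQL spec: a field is dropped when `@skip(if: true)` is present or when `@include(if: false)` is present. Since the Cube query is built from the AST, that rule presumably has to be applied while collecting members. A small sketch, assuming directive arguments have already been resolved against the query variables (`Directive`, `shouldInclude`, and `selectedMembers` are hypothetical):

```typescript
interface Directive {
  name: string; // "skip", "include", or any other directive
  args: { if?: boolean }; // already resolved from variables such as $byCategory
}

// True when the field must become part of the generated Cube query.
function shouldInclude(directives: Directive[]): boolean {
  for (const d of directives) {
    if (d.name === "skip" && d.args.if === true) return false;
    if (d.name === "include" && d.args.if === false) return false;
  }
  return true;
}

// Keep only the fields that survive @skip / @include evaluation.
function selectedMembers(fields: { name: string; directives: Directive[] }[]): string[] {
  return fields.filter(f => shouldInclude(f.directives)).map(f => f.name);
}
```

With `$byCategory` left at its default `false`, the query above would request only `count` and `time`.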

codecov · 2021-10-18T21:29:24Z

Codecov Report

Merging #3555 (3d49ea9) into master (de7471d) will decrease coverage by 1.37%.
The diff coverage is 5.72%.

@@            Coverage Diff             @@
##           master    #3555      +/-   ##
==========================================
- Coverage   59.67%   58.30%   -1.38%     
==========================================
  Files         125      126       +1     
  Lines       10014    10283     +269     
  Branches     2266     2340      +74     
==========================================
+ Hits         5976     5995      +19     
- Misses       3756     4002     +246     
- Partials      282      286       +4

Impacted Files	Coverage Δ
...ackages/cubejs-server-core/src/core/CompilerApi.js	`71.62% <0.00%> (-4.10%)`	⬇️
packages/cubejs-api-gateway/src/graphql.ts	`4.83% <4.83%> (ø)`
packages/cubejs-api-gateway/src/gateway.ts	`67.37% <30.00%> (-0.68%)`	⬇️
packages/cubejs-api-gateway/src/query.js	`72.80% <0.00%> (-1.03%)`	⬇️
packages/cubejs-server-core/src/core/server.ts	`62.98% <0.00%> (ø)`
packages/cubejs-server-core/src/core/DevServer.ts	`8.46% <0.00%> (ø)`
...kages/cubejs-api-gateway/src/SubscriptionServer.ts	`6.77% <0.00%> (ø)`
...y-orchestrator/src/orchestrator/PreAggregations.ts	`83.88% <0.00%> (ø)`


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b23dda9...3d49ea9.

paveltiunov · 2021-10-19T21:20:35Z

@lvauvillier Hey Luc! Thanks for contributing this one! It's a huge accomplishment! I'm still reviewing it. So far so good. It might be beneficial to add a couple of E2E tests with GraphQL requests to the cubejs-testing package. What do you think?


ovr · 2021-10-20T13:25:58Z

packages/cubejs-api-gateway/src/graphql.ts

+
+  const json = await response.json();
+
+  if (json.error === 'Continue wait') {


Cube.js has a queue system that supports removing unused (orphaned) queries. Continue wait (long polling) is used to detect that the client still needs, and is still waiting for, a response from the server. I don't think it's a blocker for this feature, but it is a limitation.

cc @paveltiunov

@lvauvillier There's no easy solution to the long polling problem in GraphQL, so for now I'd suggest just returning the error, or any other type of response that signals the query needs to be fetched again. Workarounds would be using pre-aggregations, increasing Continue wait timeouts, and implementing retry logic client-side. Retry logic on the backend isn't necessary and can be dropped, as it's basically handled by continueWaitTimeout (https://cube.dev/docs/config#queue-options). Long polling exists in Cube not because Cube itself can't wait that long, but because reverse proxies usually drop the connection much earlier.
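A minimal sketch of the client-side retry workaround suggested above. It assumes the server surfaces the condition as a GraphQL error whose message contains `Continue wait` (an assumption about how the error reaches the client), and it takes the request as a function so any HTTP client can be plugged in. `withContinueWaitRetry`, `maxAttempts`, and `delayMs` are hypothetical names.

```typescript
interface GraphQLResponse<T> {
  data?: T;
  errors?: { message: string }[];
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

// Re-issue the request while the server keeps answering "Continue wait".
async function withContinueWaitRetry<T>(
  request: () => Promise<GraphQLResponse<T>>,
  maxAttempts = 10,
  delayMs = 1000,
): Promise<GraphQLResponse<T>> {
  for (let attempt = 1; ; attempt++) {
    const response = await request();
    const stillWaiting = (response.errors || []).some(
      e => e.message.indexOf("Continue wait") !== -1,
    );
    if (!stillWaiting) return response;
    if (attempt >= maxAttempts) {
      throw new Error(`Query still waiting after ${maxAttempts} attempts`);
    }
    await sleep(delayMs); // give the queue time to finish before polling again
  }
}
```

Each retry re-sends the same query, which also tells the Cube queue that the client is still interested, so the query is not treated as orphaned.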

@paveltiunov I now use apiGateway.load() to retrieve results. A Continue wait response will raise an error.

ovr · 2021-10-20T13:28:19Z

packages/cubejs-api-gateway/src/graphql.ts

+  const headers = { ...originalReq.headers } as Record<string, string>;
+  ['host', 'connexion', 'content-length'].forEach(key => delete headers[key]);
+
+  const response = await fetch(url, {


IMHO, it's better to call the load method of ApiGateway directly instead of creating a proxy.

We use a similar strategy for WS & SQL servers:

https://github.com/cube-js/cube.js/blob/master/packages/cubejs-api-gateway/src/sql-server.ts#L51

https://github.com/cube-js/cube.js/blob/master/packages/cubejs-api-gateway/src/SubscriptionServer.ts#L88

@ovr, I agree. This was my first option, but I reverted to a fetch call because I couldn't figure out how to deal with the continue wait issue.
I can see a hardcoded continueWait: true in the adapterApi.executeQuery() options. I didn't spend much time understanding how the load API works and what triggers the continue wait response.

@ovr, I finally used the load() API and raise an error if I get a continue wait response.
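A sketch of that approach: wrap a callback-style `load` in a Promise and turn a `{ error: 'Continue wait' }` payload (the shape checked in the reviewed diff) into a rejection. The `LoadFn` signature below is a simplified stand-in assumed for illustration, not ApiGateway's actual `load()` signature, and `loadAsPromise` is hypothetical.

```typescript
// Simplified, hypothetical shape of a callback-based load method.
type LoadFn = (args: {
  query: Record<string, unknown>;
  res: (message: Record<string, unknown>) => void;
}) => void | Promise<void>;

// Resolve with the load result, or reject when the gateway says "Continue wait".
function loadAsPromise(load: LoadFn, query: Record<string, unknown>): Promise<Record<string, unknown>> {
  return new Promise((resolve, reject) => {
    const res = (message: Record<string, unknown>) => {
      if (message["error"] === "Continue wait") {
        reject(new Error("Continue wait: query is still being processed, retry later"));
      } else {
        resolve(message);
      }
    };
    // Errors thrown (or rejected) by load itself also reject the promise.
    Promise.resolve(load({ query, res })).catch(reject);
  });
}
```

A GraphQL resolver can then simply `await loadAsPromise(...)`; the rejection becomes a regular GraphQL error in the response.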

paveltiunov · 2021-10-21T01:33:27Z

@lvauvillier On a side note, what would a query with members from multiple cubes look like?

lvauvillier · 2021-10-21T09:29:49Z

@paveltiunov Querying multiple cubes is no problem:

{
  cube1 {
    measures {
      count
    }
  }
  cube2 {
    measures {
      count
    }
  }
}

You will get this result:

{
  "cube1": [{ "measures": { "count": 10 }}],
  "cube2": [{ "measures": { "count": 20 }}]
}

The query will be resolved using two distinct API calls.
We could add a DataLoader (https://github.com/graphql/dataloader) to batch all cube resolvers into a single "multi" query.
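To illustrate the batching idea, here is a tiny hand-rolled batcher in the spirit of DataLoader (not the `dataloader` package itself): every `load()` call made before the next microtask is collected and passed once to a batch function, which could then issue a single multi-query request. `MiniBatcher` and the batch function in the usage note are hypothetical.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (error: unknown) => void;
}

class MiniBatcher<K, V> {
  private queue: Pending<K, V>[] = [];

  constructor(private readonly batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Schedule a single flush when the first key of a new batch arrives.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private flush(): void {
    const batch = this.queue;
    this.queue = [];
    // The batch function must return one value per key, in the same order.
    this.batchFn(batch.map(item => item.key)).then(
      values => batch.forEach((item, i) => item.resolve(values[i])),
      error => batch.forEach(item => item.reject(error)),
    );
  }
}
```

The `cube1` and `cube2` resolvers would each call `batcher.load(cubeQuery)`, and the batch function would send both queries in one request.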

paveltiunov · 2021-10-22T06:32:10Z

@lvauvillier I mean what if we need to do a joined cubes query?

lvauvillier · 2021-10-22T08:50:28Z

@paveltiunov Good question. This isn't possible with this design.
We need to remove the "cube" node level and bring back member prefixes to handle joins. Let me think about it.

lvauvillier · 2021-10-22T17:57:00Z

@paveltiunov

This design could work (and is closer to the REST API):

{
  load(filters: ..., limit: ...) {
    measures {
      cube1 {
        count
      }
      cube2 {
        count
      }
    }
    dimensions {
      cube1 {
        country
      }
      cube2 {
        publishedAt(granularity: day) # timeDimension
      }
    }
  }
}

But the result shape is complex...

A simpler design could also work, since we can infer the type (dimension or measure) of each cube field. (Time dimensions are already auto-detected when a granularity argument is set on a dimension.)

{
  load(filters: ..., limit: ...) {
    cube1 {
      count # measure
      country # dimension
    }
    cube2 {
      count # measure
      publishedAt(granularity: day) # timeDimension
    }
  }
}
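The type inference mentioned above can be sketched as a lookup against cube metadata: a field is a measure or a dimension according to the metadata, and a dimension queried with a `granularity` argument becomes a time dimension. The metadata shape (`CubeMeta`) and `classifyFields` are assumptions for illustration.

```typescript
interface CubeMeta {
  name: string; // e.g. "Cube2"
  measures: string[]; // e.g. ["count"]
  dimensions: string[]; // e.g. ["country", "publishedAt"]
}

interface SelectedField {
  name: string;
  granularity?: string; // set when queried as publishedAt(granularity: day)
}

interface Classified {
  measures: string[];
  dimensions: string[];
  timeDimensions: { dimension: string; granularity: string }[];
}

function classifyFields(cube: CubeMeta, fields: SelectedField[]): Classified {
  const result: Classified = { measures: [], dimensions: [], timeDimensions: [] };
  for (const field of fields) {
    const member = `${cube.name}.${field.name}`;
    if (cube.measures.indexOf(field.name) !== -1) {
      result.measures.push(member);
    } else if (cube.dimensions.indexOf(field.name) !== -1) {
      // A granularity argument turns a plain dimension into a time dimension.
      if (field.granularity) {
        result.timeDimensions.push({ dimension: member, granularity: field.granularity });
      } else {
        result.dimensions.push(member);
      }
    } else {
      throw new Error(`Unknown member ${member}`);
    }
  }
  return result;
}
```

In a generated schema, unknown fields would already be rejected by GraphQL validation; the final `throw` only guards against metadata and schema drifting apart.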

Other point:

In this new design, filters are global, so we need to put the cube name back into the filter's member field.

Keeping the GraphQL naming convention (camelCase), filters would look like this:

{
  load(filters: [{ member: "cube2.publishedAt", operator: inDateRange, values: ["2021-01-01", "2021-12-31"]}]) {
    cube1 {
      count # measure
      country # dimension
    }
    cube2 {
      count # measure
      publishedAt(granularity: day) # timeDimension
    }
  }
}

I don't like this syntax, and the dot prevents the use of a generated GraphQL enum.

Maybe we could use one of these instead:

cube2PublishedAt: looks better, but is not collision-safe
cube2_publishedAt: I'm not sure whether underscores are allowed in cube names; if they are, this isn't collision-safe either...
a new cube field: { cube: "cube2", member: "publishedAt", operator: ... }, but this also prevents the use of an enum, since cube and member become dependent values.
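To make the collision concern concrete, here is a sketch of the camelCase option: concatenating the cube name and the member name loses the boundary between them, so two different members can map to the same enum value. `camelJoin`, `findCollisions`, and the cube/member names in the usage note are made up for illustration.

```typescript
// Candidate encoding: cube "cube2" + member "publishedAt" -> "cube2PublishedAt".
function camelJoin(cube: string, member: string): string {
  return cube + member.charAt(0).toUpperCase() + member.slice(1);
}

// Returns groups of "cube.member" names that would share one enum value.
function findCollisions(members: [string, string][]): string[][] {
  const byKey = new Map<string, string[]>();
  for (const [cube, member] of members) {
    const key = camelJoin(cube, member);
    const group = byKey.get(key) || [];
    group.push(`${cube}.${member}`);
    byKey.set(key, group);
  }
  return Array.from(byKey.values()).filter(group => group.length > 1);
}
```

For instance, `user.orderCount` and `userOrder.count` both encode to `userOrderCount`, so an enum built this way could not tell them apart.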

What do you think?

paveltiunov · 2021-10-28T03:47:52Z

@lvauvillier Yeah. Cool! This nested cube approach seems to solve the issue. Regarding the . usage: could we consider a typed filter object very similar to https://hasura.io/docs/latest/graphql/core/databases/postgres/queries/query-filters.html#the-where-argument ?

lvauvillier · 2021-10-28T10:46:26Z

@paveltiunov I also like the idea of having a filtering argument at both the query and the member field levels. For nested filters, we need to use the simpler design to get rid of the unnecessary measures / dimensions level.

This also means that:

We have to generate a "where" input type for each cube.
We can restrict the allowed operators based on the member type.

I'll give it a try this weekend.

lvauvillier · 2021-11-02T09:15:14Z

@paveltiunov I just pushed an update and it works nicely :)

The new filters work great with autocompletion, and we can now join cubes.

Filters are autogenerated based on the member type, with some changes from the REST API (heavily inspired by the Hasura and Prisma filtering APIs):

it accepts number values for number types (instead of string values everywhere)
notSet is removed in favor of { set: false }
equals and notEquals accept a single value, but we now have in and notIn for multiple values
AND and OR fields for boolean operators
filters can be set at the load level or the <cubeName> level. Using the load level lets users apply boolean operators across multiple cubes.

Example:

{
  load(offset: 20, limit: 5, where: {users: {firstName: {notEquals: "Golden"}}}) {
    orders(granularity: {createdAt: day}, where: {OR: [{createdAt: {afterDate: "2021-01-01"}}, {status: {equals: "shipped"}}]}, orderBy: {createdAt: asc}) {
      count
      status
      createdAt
    }
    users {
      firstName
    }
  }
}

will generate:

{
  "measures": [
    "Orders.count"
  ],
  "dimensions": [
    "Orders.status",
    "Users.firstName"
  ],
  "timeDimensions": [
    {
      "dimension": "Orders.createdAt",
      "granularity": "day"
    }
  ],
  "order": {
    "Orders.createdAt": "asc"
  },
  "limit": 5,
  "offset": 20,
  "filters": [
    {
      "member": "Users.firstName",
      "operator": "notEquals",
      "values": [
        "Golden"
      ]
    },
    {
      "or": [
        {
          "member": "Orders.createdAt",
          "operator": "afterDate",
          "values": [
            "2021-01-01"
          ]
        },
        {
          "member": "Orders.status",
          "operator": "equals",
          "values": [
            "shipped"
          ]
        }
      ]
    }
  ]
}
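A sketch of the `where` to `filters` translation shown in this example, under these assumptions: cube fields map to cube names by capitalizing the first letter (the real implementation presumably looks names up in the cube metadata), `in`/`notIn` map to the REST `equals`/`notEquals` operators with several values, `set: false` maps to `notSet`, and numeric values are stringified for the REST API. `rootWhereToFilters`, `cubeWhereToFilters`, `memberFilters`, and the helper types are hypothetical.

```typescript
type Scalar = string | number;
type MemberFilter = { [operator: string]: Scalar | Scalar[] | boolean };
type Where = { [key: string]: unknown };

interface RestFilter {
  member?: string;
  operator?: string;
  values?: string[];
  or?: RestFilter[];
  and?: RestFilter[];
}

const capitalize = (s: string): string => s.charAt(0).toUpperCase() + s.slice(1);

// Filters produced by one where object are implicitly ANDed.
const group = (filters: RestFilter[]): RestFilter =>
  filters.length === 1 ? filters[0] : { and: filters };

// One member's operators, e.g. { equals: "shipped" }, { in: [1, 2] } or { set: false }.
function memberFilters(member: string, filter: MemberFilter): RestFilter[] {
  return Object.keys(filter).map((op): RestFilter => {
    const value = filter[op];
    if (op === "set") {
      return { member, operator: value ? "set" : "notSet" };
    }
    const list: Array<Scalar | boolean> = Array.isArray(value) ? value : [value];
    const operator = op === "in" ? "equals" : op === "notIn" ? "notEquals" : op;
    return { member, operator, values: list.map(v => String(v)) };
  });
}

// A <CubeName>WhereInput: member filters plus optional AND / OR lists.
function cubeWhereToFilters(cube: string, where: Where): RestFilter[] {
  const filters: RestFilter[] = [];
  for (const key of Object.keys(where)) {
    if (key === "AND" || key === "OR") {
      const groups = (where[key] as Where[]).map(w => group(cubeWhereToFilters(cube, w)));
      filters.push(key === "AND" ? { and: groups } : { or: groups });
    } else {
      filters.push(...memberFilters(`${cube}.${key}`, where[key] as MemberFilter));
    }
  }
  return filters;
}

// The RootWhereInput passed to load(): keys are cube fields (e.g. "users") or AND / OR.
function rootWhereToFilters(where: Where): RestFilter[] {
  const filters: RestFilter[] = [];
  for (const key of Object.keys(where)) {
    if (key === "AND" || key === "OR") {
      const groups = (where[key] as Where[]).map(w => group(rootWhereToFilters(w)));
      filters.push(key === "AND" ? { and: groups } : { or: groups });
    } else {
      filters.push(...cubeWhereToFilters(capitalize(key), where[key] as Where));
    }
  }
  return filters;
}
```

Concatenating `rootWhereToFilters` for the `load` argument with `cubeWhereToFilters("Orders", ...)` for the `orders` argument reproduces the `filters` array above.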

New GraphQL spec:

input DateTimeFilter {
  afterDate: String
  beforeDate: String
  equals: String
  in: [String]
  inDateRange: [String]
  notEquals: String
  notIn: [String]
  notInDateRange: [String]
  set: Boolean
}

input FloatFilter {
  contains: Float
  equals: Float
  in: [Float]
  notContains: Float
  notEquals: Float
  notIn: [Float]
  set: Boolean
}

input StringFilter {
  contains: String
  equals: String
  in: [String]
  notContains: String
  notEquals: String
  notIn: [String]
  set: Boolean
}

enum Granularity {
  day
  hour
  minute
  month
  second
  week
  year
}

enum OrderBy {
  asc
  desc
}

type Query {
  load(limit: Int, offset: Int, renewQuery: Boolean, timezone: String, where: RootWhereInput): [Result!]!
}

type Result {
  <cubeName>(granularity: <CubeName>GranularityInput, orderBy: <CubeName>OrderByInput, where: <CubeName>WhereInput): <CubeName>Members!
  ...
}

type <CubeName>Members {
  <member>: String | Float | DateTime
  ...
}

input RootWhereInput {
  AND: [RootWhereInput!]
  OR: [RootWhereInput!]
  <cubeName>: <CubeName>WhereInput
  ...
}

input <CubeName>WhereInput {
  AND: [<CubeName>WhereInput!]
  OR: [<CubeName>WhereInput!]
  <member>: FloatFilter | DateTimeFilter | StringFilter
  ...
}

input <CubeName>GranularityInput {
 <timeMember>: Granularity
}

input <CubeName>OrderByInput {
 <member>: OrderBy
 ...
}

paveltiunov · 2021-11-03T04:46:55Z

@lvauvillier Looks really good and it seems to play nicely overall! What do you think about putting granularity inside the time dimension field like this?

{
  load(offset: 20, limit: 5, where: {users: {firstName: {notEquals: "Golden"}}}) {
    orders(where: {OR: [{createdAt: {afterDate: "2021-01-01"}}, {status: {equals: "shipped"}}]}, orderBy: {createdAt: asc}) {
      count
      status
      createdAt { day }
    }
    users {
      firstName
    }
  }
}

The reason is that, generally speaking, you can query multiple granularities of the same time dimension. It's rare, but still a valid use case.

Could you please also rebase so we can run the tests and be ready to merge?

lvauvillier · 2021-11-03T09:05:17Z

@paveltiunov If we add granularity subfields to time dimensions, we need to keep the ability to use the time dimension without a granularity.

We can add a new raw field to achieve this (I'm not sure about the name).

With granularity

{
  load(offset: 20, limit: 5, where: {users: {firstName: {notEquals: "Golden"}}}) {
    orders(where: {OR: [{createdAt: {afterDate: "2021-01-01"}}, {status: {equals: "shipped"}}]}, orderBy: {createdAt: asc}) {
      count
      status
      createdAt {
        day
      }
    }
    users {
      firstName
    }
  }
}

Without granularity

{
  load(offset: 20, limit: 5, where: {users: {firstName: {notEquals: "Golden"}}}) {
    orders(where: {OR: [{createdAt: {afterDate: "2021-01-01"}}, {status: {equals: "shipped"}}]}, orderBy: {createdAt: asc}) {
      count
      status
      createdAt {
        raw
      }
    }
    users {
      firstName
    }
  }
}

I also thought about using a union type, but unions can't contain scalars: we can't use a simple DateTime | Granularity.
Union types also require fragment notation in the query:

createdAt {
  ...on Granularity {
    day
  }
}

More info on this issue: graphql/graphql-spec#215

paveltiunov · 2021-11-04T01:51:04Z

@lvauvillier Yeah. This is a really great point! raw might work. Another way is to have an explicit dimensions resolver like this, to avoid introducing reserved names:

{
  load(offset: 20, limit: 5, where: {users: {firstName: {notEquals: "Golden"}}}) {
    orders(where: {OR: [{createdAt: {afterDate: "2021-01-01"}}, {status: {equals: "shipped"}}]}, orderBy: {createdAt: asc}) {
      count
      status
      dimensions {
        createdAt
      }
    }
    users {
      firstName
    }
  }
}

paveltiunov · 2021-11-04T04:02:05Z

@lvauvillier I guess we can go with raw for now and see how it progresses over time. Could you please add it? Thanks!

lvauvillier · 2021-11-04T22:33:46Z

@paveltiunov I just pushed a new version. I finally chose the name value instead of raw.
It now works with multiple granularities at the same time.

I also added the missing quarter granularity and added a non-null constraint to all members.
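A sketch of how a selection such as `createdAt { day month value }` could be expanded into query members, assuming each granularity subfield becomes its own `timeDimensions` entry and that `value` is requested as a plain dimension (that mapping of `value` is an assumption). Result keys follow the REST convention `Orders.createdAt.day` for granularity-specific values. `expandTimeSelection` and `TimeSelection` are hypothetical.

```typescript
const GRANULARITIES = ["second", "minute", "hour", "day", "week", "month", "quarter", "year"];

interface TimeSelection {
  dimensions: string[];
  timeDimensions: { dimension: string; granularity: string }[];
  resultKeys: Record<string, string>; // GraphQL subfield -> key in the result row
}

function expandTimeSelection(member: string, subfields: string[]): TimeSelection {
  const out: TimeSelection = { dimensions: [], timeDimensions: [], resultKeys: {} };
  for (const sub of subfields) {
    if (sub === "value") {
      // Raw value without granularity (assumed to be queried as a plain dimension).
      out.dimensions.push(member);
      out.resultKeys[sub] = member;
    } else if (GRANULARITIES.indexOf(sub) !== -1) {
      // One timeDimensions entry per requested granularity.
      out.timeDimensions.push({ dimension: member, granularity: sub });
      out.resultKeys[sub] = `${member}.${sub}`;
    } else {
      throw new Error(`Unknown subfield ${sub} on ${member}`);
    }
  }
  return out;
}
```

Keeping a subfield-to-key map makes the later remapping step a plain lookup per subfield.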

New GraphQL spec:

input DateTimeFilter {
  afterDate: String
  beforeDate: String
  equals: String
  in: [String]
  inDateRange: [String]
  notEquals: String
  notIn: [String]
  notInDateRange: [String]
  set: Boolean
}

input FloatFilter {
  contains: Float
  equals: Float
  in: [Float]
  notContains: Float
  notEquals: Float
  notIn: [Float]
  set: Boolean
}

input StringFilter {
  contains: String
  equals: String
  in: [String]
  notContains: String
  notEquals: String
  notIn: [String]
  set: Boolean
}

type TimeDimension {
  day: DateTime!
  hour: DateTime!
  minute: DateTime!
  month: DateTime!
  quarter: DateTime!
  second: DateTime!
  value: DateTime!
  week: DateTime!
  year: DateTime!
}

enum OrderBy {
  asc
  desc
}

type Query {
  load(limit: Int, offset: Int, renewQuery: Boolean, timezone: String, where: RootWhereInput): [Result!]!
}

type Result {
  <cubeName>(orderBy: <CubeName>OrderByInput, where: <CubeName>WhereInput): <CubeName>Members!
  ...
}

type <CubeName>Members {
  <member>: String! | Float!
  <timeMember>: TimeDimension!
  ...
}

input RootWhereInput {
  AND: [RootWhereInput!]
  OR: [RootWhereInput!]
  <cubeName>: <CubeName>WhereInput
  ...
}

input <CubeName>WhereInput {
  AND: [<CubeName>WhereInput!]
  OR: [<CubeName>WhereInput!]
  <member>: FloatFilter | DateTimeFilter | StringFilter
  ...
}

input <CubeName>OrderByInput {
 <member>: OrderBy
 ...
}
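For illustration, a sketch of how the per-cube parts of this spec could be emitted as SDL strings from cube metadata. The metadata shape and `cubeSdl` are hypothetical; a real implementation would more likely build `GraphQLObjectType` and `GraphQLInputObjectType` instances with graphql-js than concatenate strings.

```typescript
interface MemberMeta {
  name: string; // e.g. "count"
  type: "number" | "string" | "time"; // simplified member types (assumed)
}

interface CubeSchemaMeta {
  name: string; // e.g. "Orders"
  members: MemberMeta[];
}

// Output and filter types per member type, following the spec above.
const fieldType = { number: "Float!", string: "String!", time: "TimeDimension!" };
const filterType = { number: "FloatFilter", string: "StringFilter", time: "DateTimeFilter" };

function cubeSdl(cube: CubeSchemaMeta): string {
  const members = cube.members.map(m => `  ${m.name}: ${fieldType[m.type]}`).join("\n");
  const where = cube.members.map(m => `  ${m.name}: ${filterType[m.type]}`).join("\n");
  const orderBy = cube.members.map(m => `  ${m.name}: OrderBy`).join("\n");
  return [
    `type ${cube.name}Members {\n${members}\n}`,
    `input ${cube.name}WhereInput {\n  AND: [${cube.name}WhereInput!]\n  OR: [${cube.name}WhereInput!]\n${where}\n}`,
    `input ${cube.name}OrderByInput {\n${orderBy}\n}`,
  ].join("\n\n");
}
```

Generating these types from metadata is what gives GraphiQL the per-cube autocompletion for members, filters, and ordering.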

paveltiunov · 2021-11-10T00:13:35Z

@lvauvillier We're almost there! One small thing I noticed: you introduced a cache for the schema. If we cache it, we should respect multi-tenancy. To avoid holding up the merge of this big PR, we can either remove caching (I realize it won't be suitable for production workloads yet) or store the schema as part of CompilerApi. You can introduce a setter for that. Should be an easy fix.

lvauvillier · 2021-11-10T10:51:56Z

@paveltiunov Oh, you are right, schemas can be dynamic if a repositoryFactory is defined.
The CompilerApi class now has a getter/setter to cache the GraphQL schema.
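A minimal sketch of the tenant-safe caching described here: the schema is stored on each CompilerApi-like instance through a getter/setter, so tenants with different data schemas never share a cached schema. `CompilerApiLike`, `getGraphqlSchema`, `setGraphqlSchema`, and `getSchema` are hypothetical stand-ins, not the actual CompilerApi members, and the sketch assumes one instance per tenant/app.

```typescript
// Hypothetical stand-in for a per-tenant CompilerApi instance.
class CompilerApiLike {
  private graphqlSchema?: string;

  constructor(public readonly cubeNames: string[]) {}

  getGraphqlSchema(): string | undefined {
    return this.graphqlSchema;
  }

  setGraphqlSchema(schema: string): void {
    this.graphqlSchema = schema;
  }
}

// Return the cached schema for this instance, building it on first use only.
function getSchema(api: CompilerApiLike, build: (api: CompilerApiLike) => string): string {
  let schema = api.getGraphqlSchema();
  if (schema === undefined) {
    schema = build(api);
    api.setGraphqlSchema(schema);
  }
  return schema;
}
```

Compared with a module-level cache, the cached schema lives and dies with the instance, so a tenant whose data schema is recompiled (and gets a new instance) also gets a freshly built GraphQL schema.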

It also raises the point of extending the GraphiQL UI to allow defining security contexts, as in the Playground. This can be part of a new PR.

paveltiunov · 2021-11-16T22:34:44Z

@lvauvillier Hey Luc! Looks great! Thanks again for such a big effort!

lvauvillier requested a review from a team as a code owner October 18, 2021 20:36

github-actions bot added the pr:community Contribution from Cube.js community members. label Oct 18, 2021

lvauvillier mentioned this pull request Oct 18, 2021

GraphQL query Support #3433

Closed

rpaik requested review from paveltiunov and ryanpei October 18, 2021 21:20

ovr reviewed Oct 20, 2021


packages/cubejs-api-gateway/src/graphql.ts (outdated)

ovr reviewed Oct 20, 2021


lvauvillier changed the title from "feat(gateway): Add GraphQL proxy" to "feat(gateway): Add GraphQL endpoint" Oct 21, 2021

lvauvillier requested a review from ovr November 2, 2021 09:20

lvauvillier added 5 commits November 3, 2021 10:57

feat(gateway): Add GraphQL proxy

f939ca3

Add missing hour granularity

9ef4bf8

Add graphql as regular dependency

a9eda52

Use apiGateway.load() instead of fetch() to get results

a6597cf

New api design and filter argument

ed9d06e

lvauvillier force-pushed the graphql-api branch from d4a2c2f to ed9d06e Compare November 3, 2021 10:03

lvauvillier added 3 commits November 4, 2021 23:16

Move granularity from args to fields

bea2a67

Lint

76730c3

Non null members

bf09bb9

Use compilerApi instance to cache graphql schema

3d49ea9

paveltiunov merged commit ed85192 into cube-js:master Nov 16, 2021

lvauvillier deleted the graphql-api branch December 7, 2021 09:51

peterklingelhofer mentioned this pull request Jul 14, 2023

feat: GraphQL GROUP BY Resolver #6886

Closed
