Skip to content

feat(gateway): Add GraphQL endpoint #3555

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Nov 16, 2021

Conversation

lvauvillier
Copy link
Contributor

@lvauvillier lvauvillier commented Oct 18, 2021

This PR is an implementation of the GraphQL spec described in issue #3433

I think i covered all the main functionalities of the REST API.

Description

A new endpoint /cubejs-api/graphql is added in the gateway.
All security middlewares (checkAuth, queryRewrite, etc.) are still used.

In dev mode, graphiql UI is available.

How it works

  • GraphQL schema is automatically generated using the compilerApi.
  • GraphQL AST info are used to generate cubequery at root cube entity level
  • Results are remapped to the graphql query shape

Warning

Current implementation will raise an error for a continueWait response. Long polling is not implemented (and not possible with this API design).

Example

original query:

{
  "measures": ["Stories.count"],
  "dimensions": ["Stories.category"],
  "filters": [{
    "member": "Stories.isDraft",
    "operator": "equals",
    "values": ["No"]
  }],
  "timeDimensions": [{
    "dimension": "Stories.time",
    "dateRange": ["2015-01-01", "2015-12-31"],
    "granularity": "month"
  }],
  "limit": 100,
  "offset": 50,
  "order": {
    "Stories.time": "asc",
    "Stories.count": "desc"
  },
  "timezone": "America/Los_Angeles"
}

is equivalent to graphql query:

{
  load(limit: 100, offset: 50, timezone: "America/Los_Angeles") {
    stories(granularity: {time: month}, orderBy: { time: asc, count: desc }, where: {isDraft: {equals: "NO"}, time: {inDateRange: ["2015-01-01", "2015-12-31"]}}) {
      count
      category
      time
    }
  }
}

directives @skip and @include are supported:

query GetStories($byCategory: Boolean = false) {
  stories(granularity: { time: month }) {
      count
      category @include(if: $byCategory)
      time
  }
} 

@lvauvillier lvauvillier requested a review from a team as a code owner October 18, 2021 20:36
@github-actions github-actions bot added the pr:community Contribution from Cube.js community members. label Oct 18, 2021
@rpaik rpaik requested review from paveltiunov and ryanpei October 18, 2021 21:20
@codecov
Copy link

codecov bot commented Oct 18, 2021

Codecov Report

Merging #3555 (3d49ea9) into master (de7471d) will decrease coverage by 1.37%.
The diff coverage is 5.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3555      +/-   ##
==========================================
- Coverage   59.67%   58.30%   -1.38%     
==========================================
  Files         125      126       +1     
  Lines       10014    10283     +269     
  Branches     2266     2340      +74     
==========================================
+ Hits         5976     5995      +19     
- Misses       3756     4002     +246     
- Partials      282      286       +4     
Impacted Files Coverage Δ
...ackages/cubejs-server-core/src/core/CompilerApi.js 71.62% <0.00%> (-4.10%) ⬇️
packages/cubejs-api-gateway/src/graphql.ts 4.83% <4.83%> (ø)
packages/cubejs-api-gateway/src/gateway.ts 67.37% <30.00%> (-0.68%) ⬇️
packages/cubejs-api-gateway/src/query.js 72.80% <0.00%> (-1.03%) ⬇️
packages/cubejs-server-core/src/core/server.ts 62.98% <0.00%> (ø)
packages/cubejs-server-core/src/core/DevServer.ts 8.46% <0.00%> (ø)
...kages/cubejs-api-gateway/src/SubscriptionServer.ts 6.77% <0.00%> (ø)
...y-orchestrator/src/orchestrator/PreAggregations.ts 83.88% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b23dda9...3d49ea9. Read the comment docs.

@paveltiunov
Copy link
Member

@lvauvillier Hey Luc! Thanks for contributing this one! It's a huge accomplishment! I'm still reviewing it. So far so good. It might be beneficial to add a couple of E2E tests with graphql requests to the cubejs-testing package. What do you think?


const json = await response.json();

if (json.error === 'Continue wait') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cube.js has a queue system that supports removing unused queries (orphaned). Continue wait (long polling) is used to detect that client still requires and waiting for a response from the server. I don't think that it's a blocker for this feature, but anyway it's a limitation.

cC @paveltiunov

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lvauvillier There's no easy solution to the long polling problem in graphql so for now I'd suggest just returning the error or any other type of response that means it needs to be fetched again. Workarounds to this problem would be using pre-aggregations, increasing Continue wait timeouts, and implementing retry logic client-side. Retry logic on the backend isn't necessary and it could be dropped as it basically handled by continueWaitTimeout (https://cube.dev/docs/config#queue-options). The reason long polling exists in Cube isn't that Cube itself can't wait for so long but that reverse proxies will usually drop connection much earlier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paveltiunov I now use apiGateway.load() to retrieve results. A Continue wait response will raise an error.

const headers = { ...originalReq.headers } as Record<string, string>;
['host', 'connexion', 'content-length'].forEach(key => delete headers[key]);

const response = await fetch(url, {
Copy link
Member

@ovr ovr Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, I think it's better to use a direct call for the load method from ApiGateway instead of creating a proxy.

We use a similar strategy for WS & SQL servers:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ovr, i agree, this was my first option, but i revert to a fetch call because i don't find how to deal with the continue wait issue.
I can see a hardcoded continueWait: true in the adapterApi.executeQuery() options. I didn't spend a lot of time to understand how the load api works and what triggers the continue wait response.

Copy link
Contributor Author

@lvauvillier lvauvillier Oct 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ovr, i finally used theload() api and raise an error if i get a continue wait response.

@paveltiunov
Copy link
Member

@lvauvillier On a side note how would query with members from multiple cubes look like?

@lvauvillier
Copy link
Contributor Author

lvauvillier commented Oct 21, 2021

@paveltiunov There is no problem to query multiple cubes:

{
  cube1 {
    measures {
      count
    }
  }
  cube2 {
    measures {
      count
    }
  }
}

you will get a result:

{
  "cube1": [{ "measures": { "count": 10 }}],
  "cube2":  [{ "measures": { "count": 20 }}],
}

The query will be resolved using 2 distinct api calls.
We can add a dataloader (https://github.com/graphql/dataloader) to batch all cube resolvers using a single "multi" query.

@lvauvillier lvauvillier changed the title feat(gateway): Add GraphQL proxy feat(gateway): Add GraphQL endpoint Oct 21, 2021
@paveltiunov
Copy link
Member

@lvauvillier I mean what if we need to do a joined cubes query?

@lvauvillier
Copy link
Contributor Author

@paveltiunov good question. This is not possible with this design.
We need to remove the "cube" node level and put back prefixes to handle joins. Let me think about it.

@lvauvillier
Copy link
Contributor Author

lvauvillier commented Oct 22, 2021

@paveltiunov

This design can work (and is closer to the rest api):

{
  load(filters: ..., limit: ...) {
    measures {
      cube1 {
        count
      }
      cube2 {
        count
      }
    }
    dimensions {
      cube1 {
        country
      }
      cube2 {
        publishedAt(granularity: day) <-- timeDimension
      }
    }
}

But the result shape is complex...

A simpler design can also work as we can guess the type (dimension or measures) of each cube fields. (timeDimensions are already auto detected if we set a granularity arg to a dimension).

{
  load(filters: ..., limit: ...) {
    cube1 {
      count <-- measure
      country <-- dimension
    }
    cube2 {
      count <-- measure
      publishedAt(granularity: day) <-- timeDimension
    }
  }
}

Other point:

In this new design, filters are now global, so we need to put back the cube information in the filter member field.

Keeping graphqQL naming convention (camelCase) filters will look like this:

{
  load(filters: [{ member: "cube2.publishedAt", operator: inDateRange, values: ["2021-01-01", "2021-12-31"]}]) {
    cube1 {
      count <-- measure
      country <-- dimension
    }
    cube2 {
      count <-- measure
      publishedAt(granularity: day) <-- timeDimension
    }
  }
}

I dont like this syntax and the "dot" prevents the usage of a generated graphql enum.

Maybe we can use here:

  • cube2PublishedAt: looks better, but not collision safe
  • cube2_publishedAt: i'm not sure if underscore are allowed in cube names, if yes it is not collision safe too...
  • a new cube field : { cube: "cube2", member: "publishedAt", operator: ... }, but this also prevent the usage of an enum as cube and member becomes dependent values.

What do you think?

@paveltiunov
Copy link
Member

@lvauvillier Yeah. Cool! This nested cube approach seems to be solving an issue. Regarding . usage: do you think we might consider typed filter object very similar to https://hasura.io/docs/latest/graphql/core/databases/postgres/queries/query-filters.html#the-where-argument ?

@lvauvillier
Copy link
Contributor Author

lvauvillier commented Oct 28, 2021

@paveltiunov I also like the idea to have a filtering argument at both query and member fields level. For nested filters we need to use the simpler design to get rid of the unnecessary members / dimensions level.

This also means that:

  • We have to generate a "where" Input type for each cube.
  • We can limit allowed operators using the member type

I'll give a try this weekend.

@lvauvillier
Copy link
Contributor Author

lvauvillier commented Nov 2, 2021

@paveltiunov I just push an update and it works nicely :)

The new filters works great with autocompletion and we can now join cubes.

Filters are autogenerated based on the member type with some changes from the REST API (heavily inspired from hasura and prisma filtering api):

  • it accepts number values for number types (instead of string values everywhere)
  • notSet is removed in favor of { set: false }
  • equals and notEquals accept a single value, but we have now inand notIn for multiple values
  • AND and OR fields for boolean operator
  • filters can be set at load level or <cubeName> level. Using load level allow user to use boolean operator accros multiple cubes.

Example:

{
  load(offset: 20, limit: 5, where: {users: {firstName: {notEquals: "Golden"}}}) {
    orders(granularity: {createdAt: day}, where: {OR: [{createdAt: {afterDate: "2021-01-01"}}, {status: {equals: "shipped"}}]}, orderBy: {createdAt: asc}) {
      count
      status
      createdAt
    }
    users {
      firstName
    }
  }
}

will generate:

{
  "measures": [
    "Orders.count"
  ],
  "dimensions": [
    "Orders.status",
    "Users.firstName"
  ],
  "timeDimensions": [
    {
      "dimension": "Orders.createdAt",
      "granularity": "day"
    }
  ],
  "order": {
    "Orders.createdAt": "asc"
  },
  "limit": 5,
  "offset": 20,
  "filters": [
    {
      "member": "Users.firstName",
      "operator": "notEquals",
      "values": [
        "Golden"
      ]
    },
    {
      "or": [
        {
          "member": "Orders.createdAt",
          "operator": "afterDate",
          "values": [
            "2021-01-01"
          ]
        },
        {
          "member": "Orders.status",
          "operator": "equals",
          "values": [
            "shipped"
          ]
        }
      ]
    }
  ]
}

New graphQL spec:

input DateTimeFilter {
  afterDate: String
  beforeDate: String
  equals: String
  in: [String]
  inDateRange: [String]
  notEquals: String
  notIn: [String]
  notInDateRange: [String]
  set: Boolean
}

input FloatFilter {
  contains: Float
  equals: Float
  in: [Float]
  notContains: Float
  notEquals: Float
  notIn: [Float]
  set: Boolean
}

input StringFilter {
  contains: String
  equals: String
  in: [String]
  notContains: String
  notEquals: String
  notIn: [String]
  set: Boolean
}

enum Granularity {
  day
  hour
  minute
  month
  second
  week
  year
}

enum OrderBy {
  asc
  desc
}

type Query {
  load(limit: Int, offset: Int, renewQuery: Boolean, timezone: String, where: RootWhereInput): [Result!]!
}

type Result {
  <cubeName>(granularity: <CubeName>GranularityInput, orderBy: <CubeName>OrderByInput, where: <CubeName>WhereInput): <CubeName>Members!
  ...
}

type <CubeName>Members {
  <member>: String | Float | DateTime
  ...
}

input RootWhereInput {
  AND: [RootWhereInput!]
  OR: [RootWhereInput!]
  <cubeName>: <CubeName>WhereInput
  ...
}

input <CubeName>WhereInput {
  AND: [<CubeName>WhereInput!]
  OR: [<CubeName>WhereInput!]
  <member>: FloatFilter | DateTimeFilter | StringFilter
  ...
}

input <CubeName>GranularityInput {
 <timeMember>: Granularity
}

input <CubeName>OrderByInput {
 <member>: OrderBy
 ...
}

@lvauvillier lvauvillier requested a review from ovr November 2, 2021 09:20
@paveltiunov
Copy link
Member

paveltiunov commented Nov 3, 2021

@lvauvillier Looks really good and it seems to play nicely overall! What do you think about putting granularity inside the time dimension field like this?

{
  load(offset: 20, limit: 5, where: {users: {firstName: {notEquals: "Golden"}}}) {
    orders(where: {OR: [{createdAt: {afterDate: "2021-01-01"}}, {status: {equals: "shipped"}}]}, orderBy: {createdAt: asc}) {
      count
      status
      createdAt { day }
    }
    users {
      firstName
    }
  }
}

The reason for it is generally speaking you can query multiple granularities of the same time dimension. It's rare but still valid use case.

Could you please also rebase so we can run the test and be ready to merge?

@lvauvillier
Copy link
Contributor Author

lvauvillier commented Nov 3, 2021

@paveltiunov If we add granularity subfields to time dimensions, we need to keep the ability use the time dimension without granularity.

We can add a new raw field to achieve this (i'm not sure of the name)

With granularity

{
  load(offset: 20, limit: 5, where: {users: {firstName: {notEquals: "Golden"}}}) {
    orders(where: {OR: [{createdAt: {afterDate: "2021-01-01"}}, {status: {equals: "shipped"}}]}, orderBy: {createdAt: asc}) {
      count
      status
      createdAt {
        day
      }
    }
    users {
      firstName
    }
  }
}

Without granularity

{
  load(offset: 20, limit: 5, where: {users: {firstName: {notEquals: "Golden"}}}) {
    orders(where: {OR: [{createdAt: {afterDate: "2021-01-01"}}, {status: {equals: "shipped"}}]}, orderBy: {createdAt: asc}) {
      count
      status
      createdAt {
        raw
      }
    }
    users {
      firstName
    }
  }
}

I also think about using a union type, but scalars are not allowed: we can't use a simple DateTime | Granularity.
Union types also require to use fragment notation in the query:

createdAt {
  ...on Granularity {
    day
  }
}

More info on this issue: graphql/graphql-spec#215

@paveltiunov
Copy link
Member

@lvauvillier Yeah. This is a really great point! raw might work. Other way is to have explicit dimensions resolver like this to avoid reserved names introduction:

{
  load(offset: 20, limit: 5, where: {users: {firstName: {notEquals: "Golden"}}}) {
    orders(where: {OR: [{createdAt: {afterDate: "2021-01-01"}}, {status: {equals: "shipped"}}]}, orderBy: {createdAt: asc}) {
      count
      status
      dimensions {
        createdAt
      }
    }
    users {
      firstName
    }
  }
}

@paveltiunov
Copy link
Member

@lvauvillier I guess we can go with raw for now and see how it progresses over time. Could you please add it? Thanks!

@lvauvillier
Copy link
Contributor Author

lvauvillier commented Nov 4, 2021

@paveltiunov I just pushed a new version. I finally chose the name value instead of raw.
It now works with multiple granularities at the same time.

I also added the missing quarter granularity and add a nonNull constraint to all members.

New graphQL spec:

input DateTimeFilter {
  afterDate: String
  beforeDate: String
  equals: String
  in: [String]
  inDateRange: [String]
  notEquals: String
  notIn: [String]
  notInDateRange: [String]
  set: Boolean
}

input FloatFilter {
  contains: Float
  equals: Float
  in: [Float]
  notContains: Float
  notEquals: Float
  notIn: [Float]
  set: Boolean
}

input StringFilter {
  contains: String
  equals: String
  in: [String]
  notContains: String
  notEquals: String
  notIn: [String]
  set: Boolean
}

type TimeDimension {
  day: DateTime!
  hour: DateTime!
  minute: DateTime!
  month: DateTime!
  quarter: DateTime!
  second: DateTime!
  value: DateTime!
  week: DateTime!
  year: DateTime!
}

enum OrderBy {
  asc
  desc
}

type Query {
  load(limit: Int, offset: Int, renewQuery: Boolean, timezone: String, where: RootWhereInput): [Result!]!
}

type Result {
  <cubeName>(orderBy: <CubeName>OrderByInput, where: <CubeName>WhereInput): <CubeName>Members!
  ...
}

type <CubeName>Members {
  <member>: String! | Float!
  <timeMember>: TimeDimension!
  ...
}

input RootWhereInput {
  AND: [RootWhereInput!]
  OR: [RootWhereInput!]
  <cubeName>: <CubeName>WhereInput
  ...
}

input <CubeName>WhereInput {
  AND: [<CubeName>WhereInput!]
  OR: [<CubeName>WhereInput!]
  <member>: FloatFilter | DateTimeFilter | StringFilter
  ...
}

input <CubeName>OrderByInput {
 <member>: OrderBy
 ...
}

@paveltiunov
Copy link
Member

@lvauvillier We're almost there! The small thing I noted is you introduced cache for the schema. If we cache it we should respect multi-tenancy. To not withhold merging of this big PR we can consider either removing caching (I realize it won't be suitable for production workload yet) or storing it as a part of CompilerApi. You can introduce a setter for that. Should be an easy fix.

@lvauvillier
Copy link
Contributor Author

lvauvillier commented Nov 10, 2021

@paveltiunov Oh, you are right, schemas can de dynamic if a repositoryFactory is defined.
CompilerApi class has now a getter/setter to cache the graphql schema.

It also raises the point of extending graphiQL UI to add the ability to define security contexts like in the playground. This can be part of a new PR.

@paveltiunov
Copy link
Member

@lvauvillier Hey Luc! Looks great! Thanks again for such a big effort!

@paveltiunov paveltiunov merged commit ed85192 into cube-js:master Nov 16, 2021
@lvauvillier lvauvillier deleted the graphql-api branch December 7, 2021 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr:community Contribution from Cube.js community members.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants