GraphQL query Support #3433

Closed
lvauvillier opened this issue Sep 15, 2021 · 4 comments
@lvauvillier
Contributor

lvauvillier commented Sep 15, 2021

Context

Is your feature request related to a problem? Please describe.
A previous issue on this topic (#1765) was closed because of problems with the query format design and ResultSet.

As we already run a working GraphQL API on top of the Cube.js REST API in production, I wanted to share the API design we chose and how we solved those problems.

I hope this will kick off official support for a GraphQL API.

Describe the solution you'd like
There are two possible solutions:

  • A GraphQL API on top of the existing REST API (the GraphQL API acts as a "proxy")
  • A standalone GraphQL API (a new graphQLApiGateway module in the cubejs server core?)

We choose the first solution here to benefit from the existing optimizations, security, query rewriting, etc.

API design

Context

GraphQL APIs return data in the same shape as it was requested. This constraint doesn't allow a perfect mapping to the existing REST API and its response.

⚠️ The ResultSet object will not be usable for data blending and has to be rewritten for GraphQL responses (or a separate GraphQLResultSet object has to be provided). This task is not hard: if we flatten the GraphQL object, most of the current ResultSet class can be reused.
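
The flattening step could look roughly like the sketch below. It is only an illustration, not actual ResultSet code: the helper name flattenGraphQLRows is hypothetical, and it assumes that the cube name (here Events) is known from the root field and that flat keys follow the REST API's "Cube.member" convention. Time dimensions with a granularity are not handled.

```typescript
// One row as returned by the proposed GraphQL API for a single cube.
type GraphQLRow = {
  measures?: Record<string, unknown>;
  dimensions?: Record<string, unknown>;
};

// A flat row keyed by "Cube.member".
type FlatRow = Record<string, unknown>;

// Hypothetical helper: flattens GraphQL rows so that code written
// against flat rows (such as most of ResultSet) could be reused.
function flattenGraphQLRows(cubeName: string, rows: GraphQLRow[]): FlatRow[] {
  return rows.map((row) => {
    const flat: FlatRow = {};
    for (const group of [row.measures, row.dimensions]) {
      if (!group) continue; // the group was not requested
      for (const member of Object.keys(group)) {
        flat[`${cubeName}.${member}`] = group[member];
      }
    }
    return flat;
  });
}

const flatRows = flattenGraphQLRows("Events", [
  { measures: { count: 145, pageViewsCount: 35 }, dimensions: { country: "US" } },
  { measures: { count: 23, pageViewsCount: 9 }, dimensions: { country: "FR" } },
]);
console.log(flatRows);
```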

Example

{
    events(
        filters: [
            {
                member: "publishedAt"
                operator: inDateRange
                values: ["2021-08-01", "2021-08-30"]
            }
        ]
    ) {
        measures {
            count
            pageViewsCount
        }
        dimensions {
            country
        }
    }
}

Response

{
    "events": [
        {
            "measures": {
                "count": 145,
                "pageViewsCount": 35
            },
            "dimensions": {
                "country": "US"
            }
        },
        {
            "measures": {
                "count": 23,
                "pageViewsCount": 9
            },
            "dimensions": {
                "country": "FR"
            }
        },
        {
            "measures": {
                "count": 45,
                "pageViewsCount": 12
            },
            "dimensions": {
                "country": "DE"
            }
        }
    ]
}

Spec

enum CubeFilterOperator {
  afterDate
  beforeDate
  contains
  equals
  gt
  gte
  inDateRange
  lt
  lte
  notContains
  notEquals
  notInDateRange
  notSet
  set
}

enum CubeOrder {
  asc
  desc
}

enum CubeGranularity {
  second
  minute
  hour
  day
  week
  month
  year
}

input CubeFilterInput {
  member: String!
  operator: CubeFilterOperator!
  values: [String]
  or: [CubeFilterInput!]
  and: [CubeFilterInput!]
}

type Root {
    <cubeName>(filters: [CubeFilterInput!], timezone: String, limit: Int, offset: Int, renewQuery: Boolean): [<CubeName>!]!
    ...
}

type <CubeName> {
    dimensions: <CubeName>Dimension!
    measures: <CubeName>Measure!
}

type <CubeName>Dimension {
    <dimensionName>(order: CubeOrder): String | SafeInt
    <timeDimensionName>(order: CubeOrder, granularity: CubeGranularity): DateTime
    ...
}

API Generation

Framework

We need to use a code-first GraphQL framework to automatically generate entities from the cube schema definition.

Examples of API generation from a data model (using the Nexus framework):
https://github.com/prisma/nexus-prisma
https://github.com/graphql-nexus/nexus-plugin-prisma
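
To make the mapping from a cube definition to the spec above more concrete, here is a rough sketch. Note that it builds SDL text by hand instead of using a code-first framework such as Nexus, purely to show which types would be generated. The CubeMeta shape is hypothetical (it is not the real Cube.js metadata format), the mapping of number dimensions to SafeInt is an assumption, and since the spec does not define the fields of <CubeName>Measure, typing measures as Float is a guess.

```typescript
// Hypothetical, simplified cube description; not the real Cube.js metadata format.
type DimensionMeta = { name: string; type: "string" | "number" | "time" };
type CubeMeta = { name: string; measures: string[]; dimensions: DimensionMeta[] };

function lowerFirst(s: string): string {
  return s.charAt(0).toLowerCase() + s.slice(1);
}

// One field of <CubeName>Dimension, following the spec above.
function dimensionField(d: DimensionMeta): string {
  if (d.type === "time") {
    return `  ${d.name}(order: CubeOrder, granularity: CubeGranularity): DateTime`;
  }
  // Assumption: number dimensions map to SafeInt, everything else to String.
  const scalar = d.type === "number" ? "SafeInt" : "String";
  return `  ${d.name}(order: CubeOrder): ${scalar}`;
}

// The three types generated for one cube.
function cubeTypes(cube: CubeMeta): string {
  const n = cube.name;
  return [
    `type ${n} {`,
    `  dimensions: ${n}Dimension!`,
    `  measures: ${n}Measure!`,
    `}`,
    ``,
    `type ${n}Dimension {`,
    ...cube.dimensions.map(dimensionField),
    `}`,
    ``,
    `type ${n}Measure {`,
    // Assumption: the spec does not define measure fields; Float is a guess.
    ...cube.measures.map((m) => `  ${m}: Float`),
    `}`,
  ].join("\n");
}

// The Root type with one field per cube.
function rootType(cubes: CubeMeta[]): string {
  const args =
    "filters: [CubeFilterInput!], timezone: String, limit: Int, offset: Int, renewQuery: Boolean";
  const fields = cubes.map((c) => `  ${lowerFirst(c.name)}(${args}): [${c.name}!]!`);
  return ["type Root {", ...fields, "}"].join("\n");
}

const eventsCube: CubeMeta = {
  name: "Events",
  measures: ["count", "pageViewsCount"],
  dimensions: [
    { name: "country", type: "string" },
    { name: "publishedAt", type: "time" },
  ],
};

const sdl = [rootType([eventsCube]), cubeTypes(eventsCube)].join("\n\n");
console.log(sdl);
```

With a code-first framework the same loop over cubes, measures and dimensions would declare the types programmatically instead of concatenating strings.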

Implementation

Resolving steps

1 - For each <CubeName>, a query has to be generated. If more than one cube is requested in the same GraphQL query, we can use a DataLoader to blend the generated queries at the end and make a single REST API call.
2 - At the <CubeName> level, we only have the arguments (filters, limit, offset, etc.). We need to walk through the child nodes and collect all measures, dimensions and granularities. To do this, we can use the info argument to get the GraphQL query AST. We need to take care of GraphQL directives (@skip, @include) to build a valid query.
3 - Send the query to the existing cubejs REST API, handle the "Continue wait" response, handle errors and reshape the result to match the GraphQL tree.
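
Step 2 could be sketched as follows. This is a simplified illustration, not the implementation: the types below only mirror the parts of the graphql-js AST that the walk needs, so the snippet runs without the graphql package. In a real resolver the field node and variables would come from info.fieldNodes[0] and info.variableValues. Fragments are not handled, the function name buildCubeQuery is hypothetical, and the cube name is assumed to be known from the root field.

```typescript
// Minimal structural types mirroring the parts of the graphql-js AST used here.
type NameNode = { value: string };
type ValueNode =
  | { kind: "BooleanValue"; value: boolean }
  | { kind: "EnumValue"; value: string }
  | { kind: "Variable"; name: NameNode };
type ArgumentNode = { name: NameNode; value: ValueNode };
type DirectiveNode = { name: NameNode; arguments?: ArgumentNode[] };
type FieldNode = {
  kind: "Field";
  name: NameNode;
  arguments?: ArgumentNode[];
  directives?: DirectiveNode[];
  selectionSet?: { selections: FieldNode[] };
};
type Variables = { [name: string]: unknown };

type CubeQuery = {
  measures: string[];
  dimensions: string[];
  timeDimensions: { dimension: string; granularity: string }[];
};

// Resolve a literal or a variable reference to its value.
function valueOf(node: ValueNode, variables: Variables): unknown {
  return node.kind === "Variable" ? variables[node.name.value] : node.value;
}

// Apply @skip(if: ...) and @include(if: ...) semantics to one field.
function isIncluded(field: FieldNode, variables: Variables): boolean {
  for (const d of field.directives || []) {
    const ifArg = (d.arguments || []).filter((a) => a.name.value === "if")[0];
    if (!ifArg) continue;
    const cond = valueOf(ifArg.value, variables) === true;
    if (d.name.value === "skip" && cond) return false;
    if (d.name.value === "include" && !cond) return false;
  }
  return true;
}

function children(field: FieldNode, variables: Variables): FieldNode[] {
  const selections = field.selectionSet ? field.selectionSet.selections : [];
  return selections.filter((f) => isIncluded(f, variables));
}

// Walk <cube> -> measures/dimensions -> members and collect the query members.
function buildCubeQuery(cubeName: string, cubeField: FieldNode, variables: Variables): CubeQuery {
  const query: CubeQuery = { measures: [], dimensions: [], timeDimensions: [] };
  for (const group of children(cubeField, variables)) {
    for (const member of children(group, variables)) {
      const path = `${cubeName}.${member.name.value}`;
      if (group.name.value === "measures") {
        query.measures.push(path);
      } else if (group.name.value === "dimensions") {
        const g = (member.arguments || []).filter((a) => a.name.value === "granularity")[0];
        if (g) {
          const granularity = String(valueOf(g.value, variables));
          query.timeDimensions.push({ dimension: path, granularity });
        } else {
          query.dimensions.push(path);
        }
      }
    }
  }
  return query;
}

// Hand-built AST for:
// events { measures { count pageViewsCount @include(if: $withViews) }
//          dimensions { country publishedAt(granularity: day) } }
const n = (value: string): NameNode => ({ value });
const eventsField: FieldNode = {
  kind: "Field",
  name: n("events"),
  selectionSet: { selections: [
    { kind: "Field", name: n("measures"), selectionSet: { selections: [
      { kind: "Field", name: n("count") },
      { kind: "Field", name: n("pageViewsCount"), directives: [
        { name: n("include"), arguments: [{ name: n("if"), value: { kind: "Variable", name: n("withViews") } }] },
      ] },
    ] } },
    { kind: "Field", name: n("dimensions"), selectionSet: { selections: [
      { kind: "Field", name: n("country") },
      { kind: "Field", name: n("publishedAt"), arguments: [
        { name: n("granularity"), value: { kind: "EnumValue", value: "day" } },
      ] },
    ] } },
  ] },
};

const cubeQuery = buildCubeQuery("Events", eventsField, { withViews: false });
console.log(JSON.stringify(cubeQuery, null, 2));
```

The root-level arguments (filters, limit, offset, timezone) would then be merged into the same query object before it is sent to the REST API.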

Custom resolvers

With the current API, if we want to query by a dimension (e.g. by product) and this dimension is stored as an id (productId), we only get the productId in the result.

What if we want to display a nice graph with the product name? We can add the name to the cube, but what if we need more data? This can lead to issues, in particular wasted data if we use pre-aggregations.

Here we can use GraphQL's capabilities to create a new Product entity and use custom resolvers. The entity name and resolvers can be added to the cube schema definition. Only the productId will be used to resolve the entity.

Example:

{
    purchases {
        measures {
            count
        }
        dimensions {
            product {
                id
                name
            }
        }
    }
}
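
A resolver for this could work roughly as in the sketch below. It is an assumption about how such a custom resolver might be wired, not part of the proposal: the names resolveProducts and ProductLoader are hypothetical, and the loader stands in for whatever service or database holds the products. Only the productId coming from the cube is used, and ids are deduplicated so each product is fetched once.

```typescript
type Product = { id: string; name: string };

// Row as it comes back from the cube: only the productId is available.
type PurchaseRow = { measures: { count: number }; dimensions: { productId: string } };

// Row as exposed through GraphQL: the id is replaced by the resolved entity.
type ResolvedRow = { measures: { count: number }; dimensions: { product: Product | null } };

// Hypothetical batch lookup, e.g. backed by a product service or database.
type ProductLoader = (ids: string[]) => Promise<Product[]>;

async function resolveProducts(
  rows: PurchaseRow[],
  loadProducts: ProductLoader
): Promise<ResolvedRow[]> {
  // Collect distinct ids so that each product is requested only once.
  const seen: { [id: string]: true } = {};
  const ids: string[] = [];
  for (const row of rows) {
    const id = row.dimensions.productId;
    if (!seen[id]) {
      seen[id] = true;
      ids.push(id);
    }
  }

  // One batched call for all rows.
  const products = await loadProducts(ids);
  const byId: { [id: string]: Product } = {};
  for (const p of products) {
    byId[p.id] = p;
  }

  // Unknown ids resolve to null instead of failing the whole query.
  return rows.map((row) => ({
    measures: row.measures,
    dimensions: { product: byId[row.dimensions.productId] || null },
  }));
}
```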

@rpaik added the enhancement label (New feature proposal) Sep 15, 2021
@paveltiunov
Member

@lvauvillier Hey Luc! Thanks for posting this one! I'm curious if you have any ideas on how to handle long polling over some long periods of time? I see you propose to handle it server-side. What if the load balancer has a timeout for HTTP requests?

Mentioning @MattGson @tomsej here as participants of #1765.

@paveltiunov self-assigned this Sep 20, 2021
@lvauvillier
Contributor Author

lvauvillier commented Sep 20, 2021

@paveltiunov this is a good question. For our current usage, we assume that queries take a reasonable time (our pre-aggregations cover 100% of possible dashboard queries).

We just handle the "continue wait" response using promises and delays.

This is the getCubeResults function we use in our resolvers:

// createToken, encode and the CubeQuery type are not shown in this comment.
export async function getCubeResults(
  query: CubeQuery,
  delay = 500
): Promise<any> {
  const token = createToken();
  const headers = { Authorization: `Bearer ${token}` };

  // The query is passed as a JSON string in the URL query string.
  const url = `${process.env.CUBEJS_API}/cubejs-api/v1/load?${encode({
    query: JSON.stringify(query)
  })}`;

  const response = await fetch(url, { method: "GET", headers });
  if (!response.ok) {
    throw new Error(`Error querying cubejs api: status ${response.status}`);
  }

  const json = await response.json();

  // The query is still running: wait, then retry with a 20% longer delay.
  if (json.error === "Continue wait") {
    await new Promise((resolve) => setTimeout(resolve, delay));
    return getCubeResults(query, delay * 1.2);
  }

  return json;
}

I don't think GraphQL is designed to handle long-running queries.
The best practice for long-running queries in GraphQL might be:
1 - register a task using a mutation and get a taskId
2 - poll task status (a subscription can be used)
3 - get results when status is "completed"
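
A minimal in-memory sketch of these three steps is shown below. It is only meant to illustrate the flow: TaskStore and waitForTask are hypothetical names, and a real setup would expose register through a mutation and get through a query or subscription, with the tasks kept in shared storage rather than in process memory. Since each status request returns immediately, no single HTTP request has to stay open for the whole duration of the query.

```typescript
type TaskStatus = "pending" | "completed" | "failed";
type Task<T> = { id: string; status: TaskStatus; result?: T; error?: string };

class TaskStore<T> {
  private tasks: { [id: string]: Task<T> } = {};
  private nextId = 1;

  // Step 1: would back a mutation. Starts the work and returns a taskId right away.
  register(run: () => Promise<T>): string {
    const id = String(this.nextId++);
    const task: Task<T> = { id, status: "pending" };
    this.tasks[id] = task;
    run().then(
      (result) => {
        task.status = "completed";
        task.result = result;
      },
      (err) => {
        task.status = "failed";
        task.error = String(err);
      }
    );
    return id;
  }

  // Steps 2 and 3: would back a query (or feed a subscription).
  get(id: string): Task<T> | undefined {
    return this.tasks[id];
  }
}

// Client-side polling: each status check is a short request.
async function waitForTask<T>(store: TaskStore<T>, id: string, delayMs = 50): Promise<T> {
  for (;;) {
    const task = store.get(id);
    if (!task) throw new Error(`Unknown task ${id}`);
    if (task.status === "completed") return task.result as T;
    if (task.status === "failed") throw new Error(task.error);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```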

@lvauvillier
Contributor Author

A first implementation is available (see #3555). All feedback is welcome.

@lvauvillier
Contributor Author

A new spec is available in PR #3555.
I'm closing this issue.

paveltiunov pushed a commit that referenced this issue Nov 16, 2021
* feat(gateway): Add GraphQL proxy

* Add missing hour granularity

* Add graphql as regular dependency

* Use apiGateway.load() instead of fetch() to get results

* New api design and filter argument

* Move granularity from args to fields

* Lint

* Non null members

* Use compilerApi instance to cache graphql schema

Fixes #3433