PR1: Added pagination in the house table layer for get all tables/databases #320

Merged
merged 4 commits into from
May 15, 2025

Conversation

abhisheknath2011
Member

@abhisheknath2011 abhisheknath2011 commented May 10, 2025

Summary


This PR adds pagination in the house table layer for get all tables/databases. The OpenHouse data access layer (house tables) uses a Spring Data JPA repository, i.e. CrudRepository, which does not support pagination or sorting. PagingAndSortingRepository does, and prior to Spring Data 3.0 it extends CrudRepository. We are currently on Spring Boot 2.7.8, so by changing all our repositories to extend PagingAndSortingRepository we keep all existing behaviour and gain paging and sorting on top of the existing JPA repositories. JpaRepository in turn extends PagingAndSortingRepository, but it comes with more advanced features that we don't need. Extending PagingAndSortingRepository is therefore the best option for achieving pagination with sorting.

Backward compatibility:
We want to keep the existing APIs working while pagination is ramped. Pagination changes the API signature, so new APIs and methods are added alongside the existing ones to ensure that the existing APIs keep working as expected.

Pagination:
A new API endpoint /v1/hts/tables/query is added to support pagination for user tables. The API accepts search parameters and pagination parameters. The pagination parameters are page, size and sortedBy, where sortedBy names a field such as tableId or databaseId. Sorting is important for pagination to work effectively: without a stable order, the rows returned for a given page can differ between requests, which leads to invalid results per page. All pagination parameters have defaults if not provided. Page numbering starts at 0 and the default page size is 50. The default sort field is databaseId. Sorting is always ascending, and sort order is not an API parameter. Sort order can be added in future if needed.
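The defaults and the role of sorting described above can be sketched in plain Java. This is a minimal illustration, not code from this PR: `PageParams` and `SortedPager` are hypothetical names, the list of strings stands in for the values of the sort field, and the rejection of negative pages or sizes below 1 is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical holder for the pagination parameters; not a class from this PR.
final class PageParams {
  static final int DEFAULT_PAGE = 0;                    // pages are zero-based
  static final int DEFAULT_SIZE = 50;                   // default page size
  static final String DEFAULT_SORTED_BY = "databaseId"; // default sort field

  final int page;
  final int size;
  final String sortedBy;

  private PageParams(int page, int size, String sortedBy) {
    this.page = page;
    this.size = size;
    this.sortedBy = sortedBy;
  }

  // A null argument models a query parameter the caller left out.
  static PageParams resolve(Integer page, Integer size, String sortedBy) {
    int p = page == null ? DEFAULT_PAGE : page;
    int s = size == null ? DEFAULT_SIZE : size;
    // Assumption of this sketch: reject values that cannot address a page.
    if (p < 0 || s < 1) {
      throw new IllegalArgumentException("page must be >= 0 and size >= 1");
    }
    return new PageParams(p, s, sortedBy == null ? DEFAULT_SORTED_BY : sortedBy);
  }
}

final class SortedPager {
  // Sorts a copy in ascending order first, so that a given page number always
  // maps to the same rows no matter what order the store returned them in.
  // The strings stand in for the values of the sort field.
  static List<String> pageOf(List<String> ids, PageParams params) {
    List<String> sorted = new ArrayList<>(ids);
    sorted.sort(Comparator.<String>naturalOrder());
    long from = (long) params.page * params.size;
    if (from >= sorted.size()) {
      return new ArrayList<>(); // page past the end: empty content
    }
    int to = (int) Math.min(from + params.size, (long) sorted.size());
    return new ArrayList<>(sorted.subList((int) from, to));
  }

  public static void main(String[] args) {
    List<String> ids = Arrays.asList("d3", "d1", "d2");
    System.out.println(pageOf(ids, PageParams.resolve(0, 2, null)));
    System.out.println(pageOf(ids, PageParams.resolve(1, 2, null)));
    System.out.println(pageOf(ids, PageParams.resolve(null, null, null)));
  }
}
```

Because the slice is taken only after sorting, two requests for the same page see the same rows even if the underlying store hands them back in a different order.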

Pagination changes in this PR cover only user tables. Pagination for the jobs and feature toggle APIs is not part of this PR and can be added in future if needed.

New HTS API
A new API called getPaginatedUserTables is added in the HTS layer with endpoint /v1/hts/tables/query. This API accepts search parameters and pagination parameters as mentioned above. It returns a list of tables/databases depending on the search parameters, and its behaviour is the same as the existing getUserTables API. The response object remains the same as for the existing API, i.e. GetAllEntityResponseBody. A new field called pageResults is added to carry the pagination results, which are represented by Page (org.springframework.data.domain.Page). The pageResults field contains content (the search results), pageable (pagination-related metadata), totalPages, totalElements and some additional fields such as last, which indicates whether the page is the last page.
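How the metadata fields in pageResults relate to each other can be sketched as plain arithmetic. The class below is hypothetical (`PageInfo` is not an OpenHouse or Spring type), and it assumes the fields follow the usual Spring Data `Page` semantics, which the sample responses in this PR are consistent with.

```java
// Hypothetical value class mirroring a few pageResults fields; not OpenHouse code.
final class PageInfo {
  final int number;            // zero-based page index
  final int size;              // requested page size
  final long totalElements;    // rows matching the search, across all pages
  final int numberOfElements;  // rows actually present on this page

  PageInfo(int number, int size, long totalElements, int numberOfElements) {
    this.number = number;
    this.size = size;
    this.totalElements = totalElements;
    this.numberOfElements = numberOfElements;
  }

  // Index of the first row of this page in the sorted result set.
  long offset() {
    return (long) number * size;
  }

  // Ceiling division: 5 rows at 2 per page need 3 pages.
  int totalPages() {
    return size == 0 ? 1 : (int) ((totalElements + size - 1) / size);
  }

  boolean first() {
    return number == 0;
  }

  // True when no following page exists; also true for a page past the end.
  boolean last() {
    return number + 1 >= totalPages();
  }

  boolean empty() {
    return numberOfElements == 0;
  }

  public static void main(String[] args) {
    PageInfo second = new PageInfo(1, 2, 5, 2);
    System.out.println("totalPages=" + second.totalPages()
        + " offset=" + second.offset() + " last=" + second.last());
  }
}
```

A caller can therefore rely on `last` alone to decide whether to request another page, without comparing `number` against `totalPages` itself.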

Next PR:
The next PR would include pagination changes in the tables layer and internal catalog layer.

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

Tested using local docker for HTS layer changes for pagination.

Pagination support for list databases:
The existing list databases call returns 3 databases.

Existing list databases API call:

% curl -XGET -v 'http://localhost:8001/hts/tables/query' -H 'Content-Type: application/json' | jq

{
  "results": [
    {
      "tableId": null,
      "databaseId": "d1",
      "tableVersion": null,
      "metadataLocation": null,
      "storageType": null,
      "creationTime": null
    },
    {
      "tableId": null,
      "databaseId": "d2",
      "tableVersion": null,
      "metadataLocation": null,
      "storageType": null,
      "creationTime": null
    },
    {
      "tableId": null,
      "databaseId": "d3",
      "tableVersion": null,
      "metadataLocation": null,
      "storageType": null,
      "creationTime": null
    }
  ],
  "pageResults": null
}

List databases API using pagination
Query for 1st page:

% curl -XGET -v 'http://localhost:8001/v1/hts/tables/query?page=0&size=2' -H 'Content-Type: application/json' | jq

{
  "results": null,
  "pageResults": {
    "content": [
      {
        "tableId": null,
        "databaseId": "d1",
        "tableVersion": null,
        "metadataLocation": null,
        "storageType": null,
        "creationTime": null
      },
      {
        "tableId": null,
        "databaseId": "d2",
        "tableVersion": null,
        "metadataLocation": null,
        "storageType": null,
        "creationTime": null
      }
    ],
    "pageable": {
      "sort": {
        "empty": false,
        "sorted": true,
        "unsorted": false
      },
      "offset": 0,
      "pageNumber": 0,
      "pageSize": 2,
      "paged": true,
      "unpaged": false
    },
    "last": false,
    "totalPages": 2,
    "totalElements": 3,
    "size": 2,
    "number": 0,
    "sort": {
      "empty": false,
      "sorted": true,
      "unsorted": false
    },
    "first": true,
    "numberOfElements": 2,
    "empty": false
  }
}

Query for 2nd page:

% curl -XGET -v 'http://localhost:8001/v1/hts/tables/query?page=1&size=2' -H 'Content-Type: application/json' | jq

{
  "results": null,
  "pageResults": {
    "content": [
      {
        "tableId": null,
        "databaseId": "d3",
        "tableVersion": null,
        "metadataLocation": null,
        "storageType": null,
        "creationTime": null
      }
    ],
    "pageable": {
      "sort": {
        "empty": false,
        "sorted": true,
        "unsorted": false
      },
      "offset": 2,
      "pageNumber": 1,
      "pageSize": 2,
      "paged": true,
      "unpaged": false
    },
    "last": true,
    "totalPages": 2,
    "totalElements": 3,
    "size": 2,
    "number": 1,
    "sort": {
      "empty": false,
      "sorted": true,
      "unsorted": false
    },
    "first": false,
    "numberOfElements": 1,
    "empty": false
  }
}

Existing list tables API call:

% curl -XGET -v 'http://localhost:8001/hts/tables/query?databaseId=d1' -H 'Content-Type: application/json' | jq

{
  "results": [
    {
      "tableId": "t1",
      "databaseId": "d1",
      "tableVersion": "/data/openhouse/d1/t1-6da57fce-6eba-4471-bc52-4310b46137e8/00000-d6152239-5f89-4f0e-8dd8-f0d27bda9bf7.metadata.json",
      "metadataLocation": "/data/openhouse/d1/t1-6da57fce-6eba-4471-bc52-4310b46137e8/00000-d6152239-5f89-4f0e-8dd8-f0d27bda9bf7.metadata.json",
      "storageType": "hdfs",
      "creationTime": 1747076463360
    },
    {
      "tableId": "t2",
      "databaseId": "d1",
      "tableVersion": "/data/openhouse/d1/t2-6ed48b5e-98c1-46e1-8b73-abf608d9d85a/00000-907448a8-c81a-46e0-993e-161e17970ed3.metadata.json",
      "metadataLocation": "/data/openhouse/d1/t2-6ed48b5e-98c1-46e1-8b73-abf608d9d85a/00000-907448a8-c81a-46e0-993e-161e17970ed3.metadata.json",
      "storageType": "hdfs",
      "creationTime": 1747076490423
    },
    {
      "tableId": "t3",
      "databaseId": "d1",
      "tableVersion": "/data/openhouse/d1/t3-f9a507c6-f3c2-45e2-8cca-f82df7cb8ebb/00000-f9a4a0d5-a5c4-4652-8042-4e234223dbdd.metadata.json",
      "metadataLocation": "/data/openhouse/d1/t3-f9a507c6-f3c2-45e2-8cca-f82df7cb8ebb/00000-f9a4a0d5-a5c4-4652-8042-4e234223dbdd.metadata.json",
      "storageType": "hdfs",
      "creationTime": 1747076511381
    },
    {
      "tableId": "t4",
      "databaseId": "d1",
      "tableVersion": "/data/openhouse/d1/t4-7b32ea67-6bf9-41c9-ba97-92883779a1f2/00000-9aac909b-81ae-4384-a030-63de8d6d2436.metadata.json",
      "metadataLocation": "/data/openhouse/d1/t4-7b32ea67-6bf9-41c9-ba97-92883779a1f2/00000-9aac909b-81ae-4384-a030-63de8d6d2436.metadata.json",
      "storageType": "hdfs",
      "creationTime": 1747076529280
    },
    {
      "tableId": "t5",
      "databaseId": "d1",
      "tableVersion": "/data/openhouse/d1/t5-799ff9e9-518d-4385-befc-36b8c056a688/00000-3f73f6aa-bb87-44fc-94d6-a28ba47b41e0.metadata.json",
      "metadataLocation": "/data/openhouse/d1/t5-799ff9e9-518d-4385-befc-36b8c056a688/00000-3f73f6aa-bb87-44fc-94d6-a28ba47b41e0.metadata.json",
      "storageType": "hdfs",
      "creationTime": 1747076546654
    }
  ],
  "pageResults": null
}

List tables API using pagination
Querying the new API without pagination params uses the default values:

% curl -XGET -v 'http://localhost:8001/v1/hts/tables/query?databaseId=d1' -H 'Content-Type: application/json' | jq

{
  "results": null,
  "pageResults": {
    "content": [
      {
        "tableId": "t1",
        "databaseId": "d1",
        "tableVersion": "/data/openhouse/d1/t1-6da57fce-6eba-4471-bc52-4310b46137e8/00000-d6152239-5f89-4f0e-8dd8-f0d27bda9bf7.metadata.json",
        "metadataLocation": "/data/openhouse/d1/t1-6da57fce-6eba-4471-bc52-4310b46137e8/00000-d6152239-5f89-4f0e-8dd8-f0d27bda9bf7.metadata.json",
        "storageType": "hdfs",
        "creationTime": 1747076463360
      },
      {
        "tableId": "t2",
        "databaseId": "d1",
        "tableVersion": "/data/openhouse/d1/t2-6ed48b5e-98c1-46e1-8b73-abf608d9d85a/00000-907448a8-c81a-46e0-993e-161e17970ed3.metadata.json",
        "metadataLocation": "/data/openhouse/d1/t2-6ed48b5e-98c1-46e1-8b73-abf608d9d85a/00000-907448a8-c81a-46e0-993e-161e17970ed3.metadata.json",
        "storageType": "hdfs",
        "creationTime": 1747076490423
      },
      {
        "tableId": "t3",
        "databaseId": "d1",
        "tableVersion": "/data/openhouse/d1/t3-f9a507c6-f3c2-45e2-8cca-f82df7cb8ebb/00000-f9a4a0d5-a5c4-4652-8042-4e234223dbdd.metadata.json",
        "metadataLocation": "/data/openhouse/d1/t3-f9a507c6-f3c2-45e2-8cca-f82df7cb8ebb/00000-f9a4a0d5-a5c4-4652-8042-4e234223dbdd.metadata.json",
        "storageType": "hdfs",
        "creationTime": 1747076511381
      },
      {
        "tableId": "t4",
        "databaseId": "d1",
        "tableVersion": "/data/openhouse/d1/t4-7b32ea67-6bf9-41c9-ba97-92883779a1f2/00000-9aac909b-81ae-4384-a030-63de8d6d2436.metadata.json",
        "metadataLocation": "/data/openhouse/d1/t4-7b32ea67-6bf9-41c9-ba97-92883779a1f2/00000-9aac909b-81ae-4384-a030-63de8d6d2436.metadata.json",
        "storageType": "hdfs",
        "creationTime": 1747076529280
      },
      {
        "tableId": "t5",
        "databaseId": "d1",
        "tableVersion": "/data/openhouse/d1/t5-799ff9e9-518d-4385-befc-36b8c056a688/00000-3f73f6aa-bb87-44fc-94d6-a28ba47b41e0.metadata.json",
        "metadataLocation": "/data/openhouse/d1/t5-799ff9e9-518d-4385-befc-36b8c056a688/00000-3f73f6aa-bb87-44fc-94d6-a28ba47b41e0.metadata.json",
        "storageType": "hdfs",
        "creationTime": 1747076546654
      }
    ],
    "pageable": {
      "sort": {
        "empty": false,
        "sorted": true,
        "unsorted": false
      },
      "offset": 0,
      "pageNumber": 0,
      "pageSize": 50,
      "paged": true,
      "unpaged": false
    },
    "last": true,
    "totalPages": 1,
    "totalElements": 5,
    "size": 50,
    "number": 0,
    "sort": {
      "empty": false,
      "sorted": true,
      "unsorted": false
    },
    "first": true,
    "numberOfElements": 5,
    "empty": false
  }
}

Query for 1st page:

% curl -XGET -v 'http://localhost:8001/v1/hts/tables/query?databaseId=d1&page=0&size=2' -H 'Content-Type: application/json' | jq

{
  "results": null,
  "pageResults": {
    "content": [
      {
        "tableId": "t1",
        "databaseId": "d1",
        "tableVersion": "/data/openhouse/d1/t1-6da57fce-6eba-4471-bc52-4310b46137e8/00000-d6152239-5f89-4f0e-8dd8-f0d27bda9bf7.metadata.json",
        "metadataLocation": "/data/openhouse/d1/t1-6da57fce-6eba-4471-bc52-4310b46137e8/00000-d6152239-5f89-4f0e-8dd8-f0d27bda9bf7.metadata.json",
        "storageType": "hdfs",
        "creationTime": 1747076463360
      },
      {
        "tableId": "t2",
        "databaseId": "d1",
        "tableVersion": "/data/openhouse/d1/t2-6ed48b5e-98c1-46e1-8b73-abf608d9d85a/00000-907448a8-c81a-46e0-993e-161e17970ed3.metadata.json",
        "metadataLocation": "/data/openhouse/d1/t2-6ed48b5e-98c1-46e1-8b73-abf608d9d85a/00000-907448a8-c81a-46e0-993e-161e17970ed3.metadata.json",
        "storageType": "hdfs",
        "creationTime": 1747076490423
      }
    ],
    "pageable": {
      "sort": {
        "empty": false,
        "sorted": true,
        "unsorted": false
      },
      "offset": 0,
      "pageNumber": 0,
      "pageSize": 2,
      "paged": true,
      "unpaged": false
    },
    "last": false,
    "totalPages": 3,
    "totalElements": 5,
    "size": 2,
    "number": 0,
    "sort": {
      "empty": false,
      "sorted": true,
      "unsorted": false
    },
    "first": true,
    "numberOfElements": 2,
    "empty": false
  }
}

Query for the 2nd page:

% curl -XGET -v 'http://localhost:8001/v1/hts/tables/query?databaseId=d1&page=1&size=2' -H 'Content-Type: application/json' | jq

{
  "results": null,
  "pageResults": {
    "content": [
      {
        "tableId": "t3",
        "databaseId": "d1",
        "tableVersion": "/data/openhouse/d1/t3-f9a507c6-f3c2-45e2-8cca-f82df7cb8ebb/00000-f9a4a0d5-a5c4-4652-8042-4e234223dbdd.metadata.json",
        "metadataLocation": "/data/openhouse/d1/t3-f9a507c6-f3c2-45e2-8cca-f82df7cb8ebb/00000-f9a4a0d5-a5c4-4652-8042-4e234223dbdd.metadata.json",
        "storageType": "hdfs",
        "creationTime": 1747076511381
      },
      {
        "tableId": "t4",
        "databaseId": "d1",
        "tableVersion": "/data/openhouse/d1/t4-7b32ea67-6bf9-41c9-ba97-92883779a1f2/00000-9aac909b-81ae-4384-a030-63de8d6d2436.metadata.json",
        "metadataLocation": "/data/openhouse/d1/t4-7b32ea67-6bf9-41c9-ba97-92883779a1f2/00000-9aac909b-81ae-4384-a030-63de8d6d2436.metadata.json",
        "storageType": "hdfs",
        "creationTime": 1747076529280
      }
    ],
    "pageable": {
      "sort": {
        "empty": false,
        "sorted": true,
        "unsorted": false
      },
      "offset": 2,
      "pageNumber": 1,
      "pageSize": 2,
      "paged": true,
      "unpaged": false
    },
    "last": false,
    "totalPages": 3,
    "totalElements": 5,
    "size": 2,
    "number": 1,
    "sort": {
      "empty": false,
      "sorted": true,
      "unsorted": false
    },
    "first": false,
    "numberOfElements": 2,
    "empty": false
  }
}

Query for the 3rd page:

% curl -XGET -v 'http://localhost:8001/v1/hts/tables/query?databaseId=d1&page=2&size=2' -H 'Content-Type: application/json' | jq

{
  "results": null,
  "pageResults": {
    "content": [
      {
        "tableId": "t5",
        "databaseId": "d1",
        "tableVersion": "/data/openhouse/d1/t5-799ff9e9-518d-4385-befc-36b8c056a688/00000-3f73f6aa-bb87-44fc-94d6-a28ba47b41e0.metadata.json",
        "metadataLocation": "/data/openhouse/d1/t5-799ff9e9-518d-4385-befc-36b8c056a688/00000-3f73f6aa-bb87-44fc-94d6-a28ba47b41e0.metadata.json",
        "storageType": "hdfs",
        "creationTime": 1747076546654
      }
    ],
    "pageable": {
      "sort": {
        "empty": false,
        "sorted": true,
        "unsorted": false
      },
      "offset": 4,
      "pageNumber": 2,
      "pageSize": 2,
      "paged": true,
      "unpaged": false
    },
    "last": true,
    "totalPages": 3,
    "totalElements": 5,
    "size": 2,
    "number": 2,
    "sort": {
      "empty": false,
      "sorted": true,
      "unsorted": false
    },
    "first": false,
    "numberOfElements": 1,
    "empty": false
  }
}

Query for the 4th page (the page content is empty as expected and the last page flag is true):

% curl -XGET -v 'http://localhost:8001/v1/hts/tables/query?databaseId=d1&page=3&size=2' -H 'Content-Type: application/json' | jq

{
  "results": null,
  "pageResults": {
    "content": [],
    "pageable": {
      "sort": {
        "empty": false,
        "sorted": true,
        "unsorted": false
      },
      "offset": 6,
      "pageNumber": 3,
      "pageSize": 2,
      "paged": true,
      "unpaged": false
    },
    "last": true,
    "totalPages": 3,
    "totalElements": 5,
    "size": 2,
    "number": 3,
    "sort": {
      "empty": false,
      "sorted": true,
      "unsorted": false
    },
    "first": false,
    "numberOfElements": 0,
    "empty": true
  }
}
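The page-by-page queries above suggest how a caller might consume the new endpoint: keep requesting the next page until `last` is true. The sketch below is only an illustration under that assumption. `PageSlice`, `PageWalker` and the in-memory page source are hypothetical; a real client would issue the HTTP request and parse `pageResults` instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical, trimmed-down view of one pageResults payload.
final class PageSlice {
  final List<String> content;
  final boolean last;

  PageSlice(List<String> content, boolean last) {
    this.content = content;
    this.last = last;
  }
}

final class PageWalker {
  // Requests page 0, 1, 2, ... and stops once a page reports last == true.
  static List<String> fetchAll(IntFunction<PageSlice> fetchPage) {
    List<String> all = new ArrayList<>();
    int page = 0;
    while (true) {
      PageSlice slice = fetchPage.apply(page);
      all.addAll(slice.content);
      if (slice.last) {
        return all;
      }
      page++;
    }
  }

  // In-memory stand-in for the HTTP call, used only to exercise the loop.
  // It expects ids that are already sorted in ascending order.
  static IntFunction<PageSlice> inMemory(List<String> sortedIds, int size) {
    if (size < 1) {
      throw new IllegalArgumentException("size must be >= 1");
    }
    return page -> {
      int from = Math.min(page * size, sortedIds.size());
      int to = Math.min(from + size, sortedIds.size());
      boolean last = to >= sortedIds.size();
      return new PageSlice(new ArrayList<>(sortedIds.subList(from, to)), last);
    };
  }

  public static void main(String[] args) {
    List<String> tables = Arrays.asList("t1", "t2", "t3", "t4", "t5");
    System.out.println(fetchAll(inMemory(tables, 2)));
  }
}
```

With five tables and a page size of 2, the loop stops after the third request, so the empty 4th page shown above is never fetched.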

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

@abhisheknath2011 abhisheknath2011 changed the title Added pagination in the house table layer for get all tables/databases PR1: Added pagination in the house table layer for get all tables/databases May 12, 2025
@abhisheknath2011 abhisheknath2011 marked this pull request as ready for review May 12, 2025 22:34
Collaborator

@jiang95-dev jiang95-dev left a comment

Thanks for the quick and clean PR on this! Just some minor comments.

Collaborator

@jiang95-dev jiang95-dev left a comment

LGTM!

@abhisheknath2011 abhisheknath2011 merged commit d81c346 into linkedin:main May 15, 2025
1 check passed
2 participants