Skip to content

Cognition integration provider #302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 71 commits into
base: dev
Choose a base branch
from

Conversation

andhreljaKern
Copy link
Contributor

@andhreljaKern andhreljaKern commented May 15, 2025

This is the main PR.

Related PRs

New repository

Important

Retrieve:

Use dev-setup@cognition-integration-provider to run cognition-integration-provider (bash start -a -b cognition-integration-provider)

Tests

Tests were not developed for this container due to long running extraction and transformation tasks

Affected areas

  • dev-setup, deployment-cognition, deployment-managed-cognition
    • added a new internal container cognition-integration-provider
  • refinery-submodule-model
    • added 2 new cognition schema tables (integrations + access) and individual integrations tables (new integration schema)
  • cognition-task-master
    • added a new "INTEGRATION" task that runs "execute-integration" (delta loads)
  • admin-dashboard
    • added the ability to "assign" integrations to organizations
  • cognition-ui
    • added a new "Integrations" page next to ETL page
  • refinery-ui
    • added info tooltips that label refinery projects as integration created
  • refinery-gateway
    • alembic migrations
  • cognition-gateway
    • routes to cognition-integration-provider (CRUD, Sync, Execute, Check for Updates)
    • added support for creating Cognition projects from Integrations
  • cognition-integration-provider
    • new container with integration extraction/transformation logic

Performance

MP - multiprocessing (# workers)
SP - singleprocessing
image

@andhreljaKern
Copy link
Contributor Author

andhreljaKern commented Jun 24, 2025

I have two minio refinery projects though only one exists on the db. Not sure when this exactly happen but my best guess is on the integration deletion we aren't deleting the minio project. Maybe calling the endpoint for deletion to ensure we dont miss anything in the future (when "also delete refinery" is selected"

  • resolved

Do you mean calling s3.delete_object? I feel like this should be a part of the refinery-gateway delete-many-records endpoint, would like to hear your thoughts

I see what you mean, I added an internal endpoint to refinery-gateway. Currently, CognitionIntegration model cascades when a Project is deleted. Should we use SET_NULL instead? Due to CASCADE, if I call the refinery-gateway project-delete endpoint, it will automatically delete the integration

SET_NULL - YES

  • resolved

@andhreljaKern
Copy link
Contributor Author

andhreljaKern commented Jun 24, 2025

How can i update only the name / description?

[x] resolved

Update is blocked on cognition-ui because we don't allow same integration names. This isn't the case with update, i.e. the "Update" button shouldn't be blocked if the integration name remains the same.

  • resolved

andhreljaKern and others added 15 commits June 25, 2025 09:21
perf: rename internal delete endpoint
* Oidc identifier migrated to the users table

* Search for user only if there is not oidc identifier

* migrate

* model

* alembic merge

* model

* projects with access management

* add access management attribute

* deactivate mock up

* deactivate access management

* add groups/users to records

* fix smaller issues

* embeddings

* model

* update payloads

* alembic fix

* endpoint for sync internal

* model

* update logic

* error handling

* chore: update submodules

* chore: update submodules

* chore: update submodules

---------

Co-authored-by: LennartSchmidtKern <[email protected]>
Co-authored-by: andhreljaKern <[email protected]>
@JWittmeyer
Copy link
Member

JWittmeyer commented Jun 26, 2025

Ran the "test me" folder of the test sharepoint. it has ~ 140 documents. The project & records were created however the running_id seems to be created faulty:
image

  • resolved

@JWittmeyer
Copy link
Member

I am able to manually add users to the project record permissions even with auto sync (groups=> permission view => edit access => select user & records)

  • resolved

@JWittmeyer
Copy link
Member

Sync failed after changes were detected:
image

the database table for sharepoint also only has 1 record, not sure if before or after the sync was triggered
Initial sync worked without issues (besides the running id topic from a different note)

  • resolved

print(traceback.format_exc(), flush=True)
return ["tokenization failed"]
# TODO check if this is still needed for access management updates
if not only_access_management_update:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still needed?

@@ -331,6 +336,100 @@ def delete_records(
__delete_records(project_id, record_ids)


def sync_access_groups_and_users_sharepoint(project_id: str, integration_id: str, permissions_users: Dict[str, Any], record_ids: Optional[List[str]]) -> None:
try:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long try: body



def add_access_groups_or_users(project_id: str, record_ids: List[str], group_ids: Optional[List[str]] = None, user_ids: Optional[List[str]] = None) -> None:
try:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

@@ -263,3 +263,15 @@ def check_user_exists(email: str) -> bool:
if i["traits"]["email"].lower() == email.lower():
return True
return False


def get_user_from_search(email: str) -> bool:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the existing function get_userid_from_mail be used to avoid code duplication?

prev_filter_attributes = embedding_item.filter_attributes or []
new_filter_attributes = list(set(prev_filter_attributes + filter_attributes))
embedding_item.filter_attributes = new_filter_attributes
general.commit()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit after the for loop? the if statement below also issues another commit

return project.get_all_with_access_management(organization_id)


def activate_access_management(project_id):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typing

str(embedding_item.id),
record_ids=changed_records_ids if partial_update else None,
)
return errors
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typing


except Exception as e:
print(traceback.format_exc(), flush=True)
return [str(e)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typing

@@ -162,4 +162,20 @@ def __migrate_kratos_users():
if user_database.sso_provider != sso_provider:
user_database.sso_provider = sso_provider

if user_database.oidc_identifier is None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is the oidc_identifier used?

@@ -70,6 +70,38 @@ def get_all_projects(request: Request) -> Dict:
return pack_json_result(projects)


# TO DO, some admin check should be added here
@router.get("/all-projects-with-access-management")
def get_all_projects_with_tokens(request: Request) -> Dict:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be mistaken, we don't usually add return typing in routes right?

@JWittmeyer
Copy link
Member

JWittmeyer commented Jun 27, 2025

I seem to be hanign an issue with the test me folder that an excel file is considered faulty:

image

  • resolved

With the error it didn't run through. maybe a try catch it missing somewhere to finalize the process if something doesn't work as expected

@JWittmeyer
Copy link
Member

JWittmeyer commented Jun 27, 2025

The documents libary sync button should have "Uses the certificate, client & tenant id to collect all available document libraries"
Also the button icon should be IconRefresh

image

  • resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants