StatusPage: Who should have access? #2304

Closed
MattIPv4 opened this issue Apr 22, 2020 · 5 comments

@MattIPv4
Member

MattIPv4 commented Apr 22, 2020

Main issue: #2265

Creating a dedicated issue to house discussion around who should have access to the status page either through the control panel provided by StatusPage or through a custom solution using their API.

Ref: #2265 (comment), #2265 (comment), #2299 (comment), #2299 (comment) & #2299 (comment)

My baseline opinion is that as many people as possible should be able to create an incident, with as little red tape as possible in their way. That way, if something does go wrong, we can quickly & easily create an incident to communicate what's happening to the wider community.

I think there are a couple of overarching options here:

1. Use the StatusPage control panel

This limits how many folks we can give access to, as our StatusPage plan gives us 25 email addresses we can add (two are already in use, for myself & ops@jsf). Do note, though, that these emails can belong to individuals or, if needed, be used for shared logins.

In addition, anyone who has access does have access to everything on the control panel (changing design, changing components, managing incidents, seeing who is subscribed, etc.).

However, this has the massive advantage of being able to use the StatusPage UI directly rather than having to emulate it for folks to use elsewhere.

Within this, I see two immediate solutions, both with an org team behind them:

a. Have an org team for membership, with an administrator who manually adds/removes people on the StatusPage account. This is definitely the simplest solution of them all.

b. Have an org team for membership, and use a custom script that talks to the StatusPage API to automate the addition & removal of folks from the StatusPage account.
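
To make option (b) a bit more concrete: the core of such a script is just reconciling two membership lists. Below is a rough sketch of that step only, assuming the team members and the current StatusPage users have already been fetched as lists of email addresses. The fetching itself is left out, and `plan_sync`, `SyncPlan` and the `protected` parameter are hypothetical names of mine, not part of any StatusPage or GitHub client. The 25-seat limit is the plan limit mentioned above.

```python
from dataclasses import dataclass
from typing import List

SEAT_LIMIT = 25  # seats on the StatusPage plan, per the description above


@dataclass(frozen=True)
class SyncPlan:
    to_add: List[str]
    to_remove: List[str]


def plan_sync(team_emails, statuspage_emails, protected=()):
    """Work out who to add to / remove from the StatusPage account.

    `protected` holds addresses that must never be removed (for example
    a shared ops login). All comparisons are case-insensitive.
    """
    team = {e.lower() for e in team_emails}
    current = {e.lower() for e in statuspage_emails}
    keep = {e.lower() for e in protected}

    to_add = sorted(team - current)
    to_remove = sorted(current - team - keep)

    # Refuse to produce a plan that would not fit on the account.
    final_count = len(current) - len(to_remove) + len(to_add)
    if final_count > SEAT_LIMIT:
        raise ValueError(
            f"sync would need {final_count} seats, plan allows {SEAT_LIMIT}"
        )
    return SyncPlan(to_add=to_add, to_remove=to_remove)
```

Keeping the diff separate from the API calls would mean the plan can be reviewed (or dry-run in CI) before anything is changed on the account.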

2. Emulate StatusPage control in a repository

A few different variants of this have been suggested, each with different access implications.

Any implementation here would need to have a way to emulate all of the following parts of incident management on StatusPage effectively:

  • Incident title (creating, changing)
  • Incident status (setting, updating [investigating, identified, monitoring, resolved])
  • Incident severity (updating)
  • Update message (during creation & each subsequent update)
  • Components affected (for each update message)
    • Severity of each affected component (operational, degraded performance, partial outage, major outage, under maintenance)
  • Notifications to send (subscribers [global & component-specific], tweet)
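
As a rough illustration of what that list means for any custom implementation, here is a sketch of the data it would need to carry, written as Python dataclasses. The status values are the ones listed above; all class and field names are my own and are not StatusPage's API field names. The allowed incident severity values aren't listed above, so that field is left as free text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class IncidentStatus(Enum):
    INVESTIGATING = "investigating"
    IDENTIFIED = "identified"
    MONITORING = "monitoring"
    RESOLVED = "resolved"


class ComponentStatus(Enum):
    OPERATIONAL = "operational"
    DEGRADED_PERFORMANCE = "degraded performance"
    PARTIAL_OUTAGE = "partial outage"
    MAJOR_OUTAGE = "major outage"
    UNDER_MAINTENANCE = "under maintenance"


@dataclass
class IncidentUpdate:
    status: IncidentStatus
    message: str
    # Component name -> severity of that component for this update.
    components: Dict[str, ComponentStatus] = field(default_factory=dict)
    notify_subscribers: bool = True
    tweet: bool = False


@dataclass
class Incident:
    title: str
    severity: str  # allowed values are not listed above, so free text here
    updates: List[IncidentUpdate] = field(default_factory=list)

    @property
    def status(self):
        """Current status is whatever the most recent update set."""
        return self.updates[-1].status if self.updates else None
```

Whatever drives the status page (issue, label, file in a repo) would have to be able to express every field here, which is a useful check when comparing the variants below.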

With this strategy, there is also the big question of what happens if something goes wrong with the custom implementation. Who still has access to the control panel to fix a broken incident? What's the timeframe on this, as we'd essentially be miscommunicating an incident to the wider community until it was fixed?

a. Issues in the main node.js repository to power

Michael suggested:

Ideally I think what would work best is if it was based on an issue in the node.js repo (for greatest visibility) and then being approved by two approvals from Node.js collaborators. We trust our collaborators to push code so it quite likely makes sense to trust them with decide what is a reportable incident and when it is resolved. Even better if after the approvals simply adding a tag (which any collaborator can do) would result in the incident being pushed to the status page. If this could not be automated, it could be done manually to start.

Sam suggested:

Ideally, a GH team would be allowed to use as an auth source, or even better, an issue in a specific repo would be enough to drive status page changes (with repo access controlled by a team).

To implement something like this, we'd likely want to use something like YAML frontmatter in issue comments to control what everything is set to on the incident. Labels could be used to control overall incident severity & status, though this would be harder for component severity.
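
For a feel of what the frontmatter idea could look like, here is a small sketch that splits an issue comment into its settings and the update message. It only understands flat `key: value` lines between `---` delimiters and is deliberately not a full YAML parser (a real implementation would likely use a proper YAML library). The function name and the keys in the usage example (`status`, `components`) are made up for illustration.

```python
def split_frontmatter(comment_body):
    """Split an issue comment into (settings, message).

    Expects the comment to start with a block delimited by `---` lines.
    Only flat `key: value` lines are understood.
    """
    lines = comment_body.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, comment_body.strip()

    settings = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the update message.
            message = "\n".join(lines[index + 1:]).strip()
            return settings, message
        if ":" in line:
            key, _, value = line.partition(":")
            settings[key.strip()] = value.strip()

    # No closing delimiter: treat the whole comment as a plain message.
    return {}, comment_body.strip()
```

A comment might then look like this:

```python
comment = """---
status: identified
components: website
---
We have found the cause and are working on a fix."""

settings, message = split_frontmatter(comment)
```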

Using labels to act as approval before it gets posted on the status page provides the only easy security to limit who can post to the page, requiring someone with maintain access to that repo to add the label to the issue.

Using a major repository, such as node.js, would give basically every collaborator there permission to post to the status page. Whilst incredibly useful, that might not be desired, as it isn't a dedicated form of access for the status page.

b. Dedicated repository for status page administration

The alternative to using issues in an existing repository would be to create a new repository that is used just for controlling the status page.

This would allow for a dedicated team to be created (with lots of members) that has access to the repo to be able to post incidents to the status page.

Using a dedicated repository also gives more options with how exactly the repository will integrate with the StatusPage API.

Maybe instead of an issue, each incident becomes a folder in the repo, with each update being a file in that repo. Then, PRs and approvals can be used to ensure incidents & updates are approved before being merged and posted.
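
A quick sketch of how the folder-per-incident idea could be read back by whatever script posts to StatusPage. It assumes, purely for illustration, that each update is a Markdown file and that file names sort chronologically (e.g. `001-investigating.md`, `002-resolved.md`). Neither the naming scheme nor `load_incident` is an agreed convention.

```python
from pathlib import Path


def load_incident(incident_dir):
    """Return the updates of one incident folder, oldest first.

    Each entry is a (file name, file contents) pair. Assumes update
    files are `.md` files whose names sort chronologically.
    """
    folder = Path(incident_dir)
    updates = sorted(folder.glob("*.md"), key=lambda p: p.name)
    return [(p.name, p.read_text(encoding="utf-8")) for p in updates]
```

The last entry in the returned list would be the update to post once its PR has been approved and merged.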

I welcome all thoughts and feedback on how access to StatusPage should be configured and how we should post incidents & updates to the page.

@AshCripps
Member

Can we use both? So, for example, any collaborator can mark an incident with a label (or some other implementation), and then members of build are given access to the control panel to be able to manage incidents etc.

@MattIPv4
Member Author

I think that makes sense, so the issue essentially becomes a request for someone with access to make the incident on StatusPage? So there'd be no need for API integration, as a human with access would actually create the incident?

@sam-github
Contributor

^--- yes.

Which requires a group of humans to volunteer to monitor for such issues and be responsible for updating the status page.

And given the issue tracker notifications are a fire hose and it's impractical (and cruel) to expect anybody to monitor them for issues with a specific label, perhaps a Status Page WG should be set up.
The status stuff here could be moved over into it, and the people in the Status Page WG can subscribe to notifications on its issue tracker, which should be less fire hosy. <-- Just a suggestion as to how it might work.

@MylesBorins You should weigh in on how you see this working from a process point of view.

@MattIPv4
Member Author

As an outsider with no idea what the process for all this looks like, this certainly sounds like a great idea. As long as we can find 20-25 humans who are willing to monitor for such issues and provide reliable 24/7 coverage, I think having a dedicated repo where anyone can make an incident request makes sense. I also like that it gives a dedicated home for all the other status page resources currently in the PR here.

@github-actions

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
