Commit 05e03ca

Use port 3000 instead of 3001 for Grafana
Signed-off-by: Stefan Büringer [email protected]
1 parent: 7a71f4b

5 files changed (+8 -8)

Tiltfile (+1 -1)

````diff
@@ -447,7 +447,7 @@ def deploy_observability():
 
     if "grafana" in settings.get("deploy_observability", []):
         k8s_yaml(read_file("./.tiltbuild/yaml/grafana.observability.yaml"), allow_duplicates = True)
-        k8s_resource(workload = "grafana", port_forwards = "3001:3000", extra_pod_selectors = [{"app": "grafana"}], labels = ["observability"], objects = ["grafana:serviceaccount"])
+        k8s_resource(workload = "grafana", port_forwards = "3000:3000", extra_pod_selectors = [{"app": "grafana"}], labels = ["observability"], objects = ["grafana:serviceaccount"])
 
     if "prometheus" in settings.get("deploy_observability", []):
         k8s_yaml(read_file("./.tiltbuild/yaml/prometheus.observability.yaml"), allow_duplicates = True)
````
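
With this change Grafana's host port matches its in-cluster port. A minimal verification sketch, not part of this commit: assuming the Tilt settings enable the `grafana` entry that the Tiltfile reads via `settings.get("deploy_observability", [])`, the dashboard should now answer on port 3000 locally.

```bash
# Hedged sketch: with "grafana" listed under deploy_observability in the Tilt settings,
# start Tilt and check that the port-forward now lands on 3000 instead of 3001.
tilt up

# Once Tilt reports the grafana resource as ready, this should return an HTTP response:
curl -I http://localhost:3000
```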

docs/book/Makefile (+1 -1)

````diff
@@ -50,7 +50,7 @@ BOOK_DEPS := runtimesdk-yaml
 
 .PHONY: serve
 serve: $(MDBOOK) $(TABULATE) $(EMBED) $(RELEASELINK) runtimesdk-yaml
-	$(MDBOOK) serve
+	$(MDBOOK) serve -p 3001
 
 .PHONY: build
 build: $(MDBOOK) $(TABULATE) $(EMBED) $(RELEASELINK) runtimesdk-yaml
````
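
Moving the book off its default port avoids clashing with the Grafana port-forward above. A short usage sketch (the resulting URL follows from mdbook's `-p`/`--port` flag rather than being stated in this diff):

```bash
# Serve the book locally; with -p 3001 it no longer competes with Grafana on port 3000.
cd docs/book
make serve
# Then browse to http://localhost:3001
```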

docs/book/src/developer/core/logging.md (+1 -1)

````diff
@@ -170,7 +170,7 @@ extra_args:
 ```
 The above options can be combined with other settings from our [Tilt](tilt.md) setup. Once Tilt is up and running with these settings users will be able to browse logs using the Grafana Explore UI.
 
-This will normally be available on `localhost:3001`. To explore logs from Loki, open the Explore interface for the DataSource 'Loki'. [This link](http://localhost:3001/explore?datasource%22:%22Loki%22) should work as a shortcut with the default Tilt settings.
+This will normally be available on `localhost:3000`. To explore logs from Loki, open the Explore interface for the DataSource 'Loki'. [This link](http://localhost:3000/explore?datasource%22:%22Loki%22) should work as a shortcut with the default Tilt settings.
 
 ### Example queries
 
````
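
Besides the Explore UI, the same Loki data source can be queried from a terminal. A hedged sketch using `logcli`; the Loki address is an assumption about the local port-forward, and the label selector mirrors the example in testing.md below.

```bash
# Query the Tilt-deployed Loki directly instead of going through Grafana Explore.
# LOKI_ADDR (Loki's default port 3100) is an assumption about the local setup.
export LOKI_ADDR=http://localhost:3100
logcli query --since=1h '{app="capi-controller-manager"}'
```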

docs/book/src/developer/core/testing.md (+1 -1)

````diff
@@ -321,7 +321,7 @@ analyzing them via Grafana.
    * GCS path: `gs://kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api/6189/pull-cluster-api-e2e-main/1496954690603061248`
    * Local folder: `./_artifacts`
 4. Now the logs are available:
-   * via [Grafana](http://localhost:3001/explore)
+   * via [Grafana](http://localhost:3000/explore)
    * via [Loki logcli](https://grafana.com/docs/loki/latest/getting-started/logcli/)
      ```bash
      logcli query '{app="capi-controller-manager"}' --timezone=UTC --from="2022-02-22T10:00:00Z"
````

docs/book/src/developer/core/tuning.md (+4 -4)

````diff
@@ -56,18 +56,18 @@ Once you know the scenario you are looking at and what you are tuning for, you c
 
 Among the many possible strategies, one usually very effective is to look at the KPIs you are aiming for, and then, if the current system performance is not good enough, start looking at other metrics trying to identify the biggest factor that is impacting the results. Usually by removing a single, performance bottleneck the behaviour of the system changes in a significant way; after that you can decide if the performance is now good enough or you need another round of tuning.
 
-Let's try to make this more clear by using an example, *machine provisioning time is degrading when running CAPI at scale* (machine provisioning time can be seen in the [Cluster API Performance dashboard](http://localhost:3001/d/b2660352-4f3c-4024-837c-393d901e6981/cluster-api-performance?orgId=1)).
+Let's try to make this more clear by using an example, *machine provisioning time is degrading when running CAPI at scale* (machine provisioning time can be seen in the [Cluster API Performance dashboard](http://localhost:3000/d/b2660352-4f3c-4024-837c-393d901e6981/cluster-api-performance?orgId=1)).
 
 When running at scale, one of the first things to take care of is the client-go rate limiting, which is a mechanism built inside client-go that prevents a Kubernetes client from being accidentally too aggressive to the API server.
 However this mechanism can also limit the performance of a controller when it actually requires to make many calls to the API server.
 
-So one of the first data point to look at is the rate limiting metrics; given that upstream CR doesn't have metric for that we can only look for logs containing "client-side throttling" via [Loki](http://localhost:3001/explore) (Note: this link should be open while tilt is running).
+So one of the first data point to look at is the rate limiting metrics; given that upstream CR doesn't have metric for that we can only look for logs containing "client-side throttling" via [Loki](http://localhost:3000/explore) (Note: this link should be open while tilt is running).
 
 If rate limiting is not your issue, then you can look at the controller's work queue. In an healthy system reconcile events are continuously queued, processed and removed from the queue. If the system is slowing down at scale, it could be that some controllers are struggling to keep up with the events being added in the queue, thus leading to slowness in reconciling the desired state.
 
-So then the next step after looking at rate limiting metrics, is to look at the "work queue depth" panel in the [Controller-Runtime dashboard](http://localhost:3001/d/abe29aa7-e44a-4eef-9474-970f95f08ee6/controller-runtime?orgId=1).
+So then the next step after looking at rate limiting metrics, is to look at the "work queue depth" panel in the [Controller-Runtime dashboard](http://localhost:3000/d/abe29aa7-e44a-4eef-9474-970f95f08ee6/controller-runtime?orgId=1).
 
-Assuming that one controller is struggling with its own work queue, the next step is to look at why this is happening. It might be that the average duration of each reconcile is high for some reason. This can be checked in the "Reconcile Duration by Controller" panel in the [Controller-Runtime dashboard](http://localhost:3001/d/abe29aa7-e44a-4eef-9474-970f95f08ee6/controller-runtime?orgId=1).
+Assuming that one controller is struggling with its own work queue, the next step is to look at why this is happening. It might be that the average duration of each reconcile is high for some reason. This can be checked in the "Reconcile Duration by Controller" panel in the [Controller-Runtime dashboard](http://localhost:3000/d/abe29aa7-e44a-4eef-9474-970f95f08ee6/controller-runtime?orgId=1).
 
 If this is the case, then it is time to start looking at traces, looking for the longer spans in average (or total). Unfortunately traces are not yet implemented in Cluster API, so alternative approaches must be used, like looking at condition transitions or at logs to figure out what the slowest operations are.
 
````
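
The "client-side throttling" log search described above can also be scripted. A minimal sketch, assuming the same Loki/logcli setup as in the earlier examples; the address and label selector are assumptions, not part of this commit.

```bash
# Look for client-go rate limiting messages in the CAPI controller logs stored in Loki.
# LOKI_ADDR and the app label are assumptions about the local Tilt setup.
export LOKI_ADDR=http://localhost:3100
logcli query --since=1h '{app="capi-controller-manager"} |= "client-side throttling"'
```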
