Commit cca894c

billburton67, garethoracle, and arjunupadhyay90 authored
Pull Request to Merge WMS ID 11635 updates. (oracle-livelabs#577)
* Initial Setup of ahf-24-insights Initial Setup of ahf-24-insights
* Update intro.md
* Update intro.md
* Moved initial OCW24 folder work to main ahf folder
* Update for intro ..
* Update intro.md Gareths test
* remove GC test
* Initial upload of lab2
* Lab 2 Initial Completion
* Initial checkin for Lab5 Still some work and cleanup to do. Also set anaonymous hostnames in other files.
* Added comman AHF commands per the AHF Labs slides
* Added Lab 8 to generate an Insights report
* Lab 9 - First Upload Overview , System Toplogy Sections
* Initial Lab 6 with Auto only.
* More diagcollect
* Fix images for lab6
* Screenshots and New Insights data Screenshots and New Insights data
* Insights Tasks Write Up Insights Tasks Write Up New Insights reports
* More work on Lab 6-8 and cleanup Did not complete 6 to 8 as hasd env issues but need to get merged, and will complete then
* Additional Insights Comments and Screenshots Additional Insights Comments and Screenshots

---------

Co-authored-by: garethoracle <[email protected]>
Co-authored-by: Gareth Chapman <[email protected]>
Co-authored-by: arjunupadhyay90 <[email protected]>
1 parent fdca1a1 commit cca894c

File tree

100 files changed

+816
-106
lines changed

+70-13
Original file line numberDiff line numberDiff line change
@@ -1,33 +1,90 @@
11
# AHF Common Commands
22

33
## Introduction
4-
TODO
54

6-
Estimated Lab Time: 20 Minutes
5+
Welcome to the "Try out some commonly used AHF commands" lab.
6+
7+
In this lab you will be guided through various common AHF tasks.
8+
9+
Estimated Lab Time: 5 Minutes
710

811
### Prerequisites
9-
- TODO 1
10-
- TODO 2
12+
- You are connected to one of the DB System Nodes as described in Lab 1: Connect to your DB System
13+
- You have performed the tasks to generate some incidents as described in Lab 5: Generate Database and Clusterware Incidents for AHF to Detect and take Action on
14+
15+
16+
## Task 1: Common post installation configuration tasks
17+
18+
1. Configure notification of compliance results and critical event notification:
19+
20+
After configuring an email address for notifications you will receive Orachk/Exachk reports to that address and notification of any
21+
automatic diagnostic collections that are completed.
22+
```
23+
<copy>
24+
ahfctl set [email protected]
25+
</copy>
26+
```
27+
>Note: You may also need to set up an SMTP server.
28+
29+
2. Configure MOS (My Oracle Support) upload:
1130
31+
Once you have configured an upload name for MOS upload you can use that to upload a manual diagnostic collection to a specific SR
32+
```
33+
<copy>
34+
ahfctl setupload -name mos_config -type https -url https://transport.oracle.com/upload/issue -proxy www-proxy.acme.com:80 -user [email protected] -password
35+
</copy>
36+
```
37+
> You can now add **-upload mos_config -sr <mysrnumber>** to a `tfactl diagcollect` command to upload the collection directly upon completion.
38+
1239
13-
### About AHF Command Line Interfaces
40+
3. Configure Auto Upgrade:
41+
```
42+
<copy>
43+
ahfctl setupgrade -swstage /mysharedlocation/ahf_upgrade -autoupgrade on -frequency 30 -upgradetime 00:15
44+
</copy>
45+
```
1446
15-
## Task 1: TODO
16-
1. TODO
17-
2. TODO
18-
## Task 2: TODO
47+
4. Configure storage cells for diagnostic collections and compliance checks
48+
```
49+
<copy>
50+
tfactl cell configure
51+
</copy>
52+
```
1953
20-
1. TODO
54+
## Task 2: Check resource limits
2155
56+
1. Check AHF resource limits
2257
23-
2. TODO
58+
On Linux systems AHF can restrict the CPU and memory usage of the TFAMain process and its children using `cgroups`.
59+
You can check the limits using:-
60+
```
61+
<copy>
62+
ahfctl getresourcelimit
63+
</copy>
64+
```
2465
66+
## Task 3: Proactively run health checks
2567
68+
1. Change critical checks to run at 8am every Monday and Thursday:
69+
As previously noted AHF sets up some default compliance run schedules.
70+
You can change these with `ahfctl compliance`
71+
```
72+
<copy>
73+
ahfctl compliance -id exachk.autostart_client_exatier1 -set "AUTORUN_SCHEDULE=* 8 * * 1,4"
74+
</copy>
75+
```
2676
77+
2. Run compliance checks on-demand for only the Database Administrator (DBA) Checks.
78+
Compliance can also be run on demand with `ahfctl compliance`
79+
```
80+
<copy>
81+
ahfctl compliance -profile dba
82+
</copy>
83+
```
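To double-check a schedule string like the one in step 1 before applying it, you can split it into its fields. The sketch below is only an illustration: it assumes the value uses the standard five-field cron order (minute, hour, day of month, month, day of week) with 0 meaning Sunday, and the function name `describe_schedule` is made up for this example. It handles comma-separated day lists only, not ranges.

```python
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def describe_schedule(setting):
    """Split 'AUTORUN_SCHEDULE=<fields>' into named fields.

    Assumes five cron-style fields: minute hour day-of-month month day-of-week.
    """
    name, _, value = setting.partition("=")
    fields = value.split()
    if name != "AUTORUN_SCHEDULE" or len(fields) != 5:
        raise ValueError(f"unexpected schedule setting: {setting!r}")
    minute, hour, day_of_month, month, day_of_week = fields
    if day_of_week == "*":
        days = "every day"
    else:
        # Map each numeric day to its name; 7 is treated as Sunday as well.
        days = [DAY_NAMES[int(day) % 7] for day in day_of_week.split(",")]
    return {
        "minute": minute,
        "hour": hour,
        "day_of_month": day_of_month,
        "month": month,
        "days": days,
    }


print(describe_schedule("AUTORUN_SCHEDULE=* 8 * * 1,4"))
```

Under those assumptions the day-of-week field `1,4` maps to Monday and Thursday and the hour field is `8`, which matches the intent of step 1.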
2784
2885
You may now *proceed to the next lab*.
2986
3087
## Acknowledgements
31-
* **Authors** - Troy Anthony, Bill Burton
32-
* **Contributors** -
88+
* **Authors** - Bill Burton
89+
* **Contributors** - Troy Anthony, Gareth Chapman
3390
* **Last Updated By/Date** - Bill Burton, July 2024
+216-11
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,225 @@
1-
# AHF Diagcollect Commands
1+
2+
# AHF Incident Diagnostic Collections
23

34
## Introduction
4-
TODO
5+
Welcome to the "AHF Incident Diagnostic Collections" lab. In this lab you will learn about AHF diagnostic collections and then be guided through
6+
viewing and generating AHF Diagnostic Collections. First we will check that AHF knows about the Incidents you generated in the previous labs
7+
and then learn how to check diagnostic collections for those Incidents.
58

6-
Estimated Lab Time: 20 Minutes
9+
Estimated Lab Time: 10 Minutes
710

811
### Prerequisites
9-
- TODO 1
10-
- TODO 2
12+
- You are connected to one of the DB System Nodes as described in Lab 1: Connect to your DB System
13+
- You have performed the tasks to generate some incidents as described in Lab 5: Generate Database and Clusterware Incidents for AHF to detect and take action on.
1114

15+
### Objectives
1216

13-
### About AHF Command Line Interfaces
17+
In this lab, you will:
18+
* Understand the different types of AHF Diagnostic collections.
19+
* Determine if AHF has detected Incidents and has already taken Diagnostic Collections Automatically
20+
* Take manual collections for any type of Incident
21+
* Review Collection contents
22+
* Confirm the health of AHF on the System
23+
24+
### About AHF Diagnostic Collection Options
25+
AHF has 4 basic types of Incident Diagnostic Collections:-
26+
* Automatic, based on a limited set of detected Incidents.
27+
    * Internal Errors, Node and Instance Evictions, hangs
28+
* Manual, based on a specific incident type Support Request Driven Collection (SRDC)
29+
* Manual, based on looking for issues in a time range (Problem Chooser)
30+
* Manual, based on a time range and component (CRS, RDBMS,...)
31+
> Note: Bypassing problem chooser and using long collection times for multiple components can lead to very large collections.
32+
33+
34+
## Task 1: Learn about Automatic Diagnostic Collections
35+
![Automatic Diagnostic Collections](./images/auto_collections.png =40%x*)
36+
37+
AHF monitors various system logs to determine if critical errors are being generated.
38+
If one of the monitored errors is seen then it will prepare to start an automatic diagnostic collection.
39+
AHF determines what needs to be collected for the specific Incident and gathers that data from all nodes if required.
40+
All collections are copied back to the initiating node ready for analysis or upload to Oracle Support.
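The detection step can be pictured as matching log lines against a list of error patterns. The sketch below is a simplified illustration and not how AHF is implemented. The two patterns are the event names that appear in the sample `tfactl print collections` output later in this lab; the function name `find_events` and the sample log lines are made up for the example.

```python
import re

# Event patterns taken from the "Events" names in this lab's sample output.
# The real set of errors AHF monitors is larger.
EVENT_PATTERNS = [r".*ORA-0403(0|1).*", r".*ORA-00600.*"]


def find_events(log_lines, patterns=EVENT_PATTERNS):
    """Return (line_number, pattern, line) for each line matching a pattern."""
    compiled = [re.compile(pattern) for pattern in patterns]
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for pattern in compiled:
            if pattern.match(line):
                hits.append((number, pattern.pattern, line.rstrip()))
                break  # report each line once, against the first pattern it matches
    return hits


# Hypothetical alert log lines, for illustration only.
sample_log = [
    "Thread 1 advanced to log sequence 42",
    "ORA-00600: internal error code, arguments: [kgb_test]",
    "ORA-04031: unable to allocate 4096 bytes of shared memory",
]

for number, pattern, line in find_events(sample_log):
    print(f"line {number} matched {pattern}: {line}")
```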
41+
42+
## Task 2: Review Automatic Diagnostic Collection for Lab 5 Incidents
43+
44+
1. Use the `tfactl get` command to check auto collection was enabled (ON).
45+
```
46+
<copy>
47+
tfactl get autodiagcollect
48+
</copy>
49+
```
50+
Command Output:
51+
<pre>
52+
.-------------------------------------------------.
53+
| lldbcs61 |
54+
+-----------------------------------------+-------+
55+
| Configuration Parameter | Value |
56+
+-----------------------------------------+-------+
57+
| Auto Diagcollection ( autodiagcollect ) | ON |
58+
'-----------------------------------------+-------'
59+
</pre>
60+
2. Use the `tfactl print collections` command to confirm that AHF completed an auto collection for the 2 Incidents you generated in Lab 5.
61+
```
62+
<copy>
63+
tfactl print collections -json -pretty -status completed
64+
</copy>
65+
```
66+
Command Output:
67+
<pre>
68+
[
69+
{
70+
"CollectionId": "20240715172659lldbcs61",
71+
"InitiatedNode": "lldbcs61",
72+
"CollectionType": "Auto Collection",
73+
"RequestUser": "oracle",
74+
"NodeList": "[lldbcs61, lldbcs62]",
75+
"StartTime": "2024-07-15T16:57:14.000+0000",
76+
"EndTime": "2024-07-15T17:32:41.000+0000",
77+
"ComponentList": "[rdbms, cvu, os, compliance, tns, chmos, asm, asmproxy, asmio, cha, afd]",
78+
"UploadStatus": "FAILED",
79+
"CollectionStatus": "COMPLETED",
80+
"Events": [
81+
{
82+
"Name": ".*ORA-0403(0|1).*",
83+
"Time": "2024-07-15T17:27:14.000+0000",
84+
"SourceFile": "/u01/app/oracle/diag/rdbms/raccvxfe_d3w_lhr/racCVXFE1/trace/alert_racCVXFE1.log"
85+
},
86+
{
87+
"Name": ".*ORA-00600.*",
88+
"Time": "2024-07-15T17:26:53.000+0000",
89+
"SourceFile": "/u01/app/oracle/diag/rdbms/raccvxfe_d3w_lhr/racCVXFE1/trace/alert_racCVXFE1.log"
90+
}
91+
],
92+
"NodeCollection": [
93+
{
94+
"Host": "lldbcs61",
95+
"Tag": "/u01/app/oracle.ahf/data/repository/auto_srdcCompositeMon_Jul_15_17_27_14_UTC_2024_node_lldbcs61/",
96+
"ZipFileName": "/u01/app/oracle.ahf/data/repository/auto_srdcCompositeMon_Jul_15_17_27_14_UTC_2024_node_lldbcs61/lldbcs61.tfa_srdc_autosrdc_Mon_Jul_15_17_32_46_UTC_2024.zip",
97+
"ZipFileSize": "20320",
98+
"CollectionTime": "410",
99+
"CheckSum": "bbfc92de15cf04c19875cf4bb1eda025c9749cdb1118d05c8f30c330b87e2189",
100+
"checksum_algo": "sha256",
101+
"UploadStatus": "FAILED"
102+
},
103+
{
104+
"Host": "lldbcs62",
105+
"Tag": "/u01/app/oracle.ahf/data/repository/auto_srdcCompositeMon_Jul_15_17_27_14_UTC_2024_node_lldbcs61/",
106+
"ZipFileName": "/u01/app/oracle.ahf/data/repository/auto_srdcCompositeMon_Jul_15_17_27_14_UTC_2024_node_lldbcs61/lldbcs62.tfa_srdc_autosrdc_Mon_Jul_15_17_32_46_UTC_2024.zip",
107+
"ZipFileSize": "21200",
108+
"CollectionTime": "420",
109+
"CheckSum": "bbfc92de15cf04c19875cf4bb1eda025bbfc92de15cf04c219875cf4bb1eda02",
110+
"checksum_algo": "sha256",
111+
"UploadStatus": "FAILED"
112+
}
113+
114+
]
115+
}
116+
]
117+
</pre>
118+
You can see from the above that this Collection is an *Auto Collection* generated for the *oracle user* as the errors were found in the alert log for a database
119+
owned by the *oracle user*.
120+
The collection was due to 2 Events in the alert log for the database instance **racCVXFE1**: one ORA-00600 and one ORA-04031.
121+
Within the collection itself you can see the exact events.
122+
This collection was a clusterwide collection as we have files from both nodes that are copied back to a common directory on the initiating node.
123+
>Note: Please ignore the "UploadStatus": "FAILED" as this is only valid when the collection is to be uploaded after completion.
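If you have many collections, reading the JSON by eye gets tedious. The sketch below shows one way to summarise it with Python. It assumes you have saved the output of `tfactl print collections -json` as text and that it contains only the JSON array; the function name `summarize_collections` is hypothetical, and `SAMPLE` is a trimmed copy of the output above.

```python
import json
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def summarize_collections(json_text):
    """Return one summary dict per collection in the JSON array."""
    summaries = []
    for collection in json.loads(json_text):
        start = datetime.strptime(collection["StartTime"], TIME_FORMAT)
        end = datetime.strptime(collection["EndTime"], TIME_FORMAT)
        summaries.append({
            "id": collection["CollectionId"],
            "type": collection["CollectionType"],
            "status": collection["CollectionStatus"],
            "events": [event["Name"] for event in collection.get("Events", [])],
            "hosts": [node["Host"] for node in collection.get("NodeCollection", [])],
            # Time between StartTime and EndTime, in minutes.
            "window_minutes": (end - start).total_seconds() / 60,
        })
    return summaries


# Trimmed copy of the sample output shown above.
SAMPLE = """
[
  {
    "CollectionId": "20240715172659lldbcs61",
    "CollectionType": "Auto Collection",
    "StartTime": "2024-07-15T16:57:14.000+0000",
    "EndTime": "2024-07-15T17:32:41.000+0000",
    "CollectionStatus": "COMPLETED",
    "Events": [
      {"Name": ".*ORA-0403(0|1).*"},
      {"Name": ".*ORA-00600.*"}
    ],
    "NodeCollection": [
      {"Host": "lldbcs61"},
      {"Host": "lldbcs62"}
    ]
  }
]
"""

for summary in summarize_collections(SAMPLE):
    print(summary)
```

The summary keeps the collection id, the events that triggered it and the hosts that contributed files, which are the fields discussed in this step.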
124+
125+
3. Review the Contents of the Automatic Diagnostic Collection.
126+
127+
All of the collection files and logs are copied back to the initiating node, into a collection directory under the AHF repository directory.
128+
We can now go to that directory and see what files were collected.
129+
Use the "Tag" in your print collections to determine the correct location.
130+
131+
>"Tag": "/u01/app/oracle.ahf/data/repository/auto_srdcCompositeMon_Jul_15_17_27_14_UTC_2024_node_lldbcs61/"
132+
133+
<pre>
134+
cd /u01/app/oracle.ahf/data/repository/auto_srdcCompositeMon_Jul_15_17_27_14_UTC_2024_node_lldbcs61
135+
</pre>
136+
137+
Now use the `ls` command to see the files.
138+
```
139+
<copy>
140+
ls -al
141+
</copy>
142+
```
143+
Command Output:
144+
<pre>
145+
drwx------ 2 oracle oinstall 4096 Jul 15 17:33 .
146+
drwxr-xr-t 4 root root 4096 Jul 15 17:39 ..
147+
-rw-r--r-- 1 oracle oinstall 3568 Jul 15 17:39 diagcollect_20240715172659_lldbcs61.log
148+
-rw-r--r-- 1 oracle oinstall 2214 Jul 15 17:33 diagcollect_20240715172659_lldbcs62.log
149+
-rw-r--r-- 1 oracle oinstall 1928 Jul 15 17:39 diagcollect_console_20240715172659_lldbcs61.log
150+
-rw-r--r-- 1 oracle oinstall 0 Jul 15 17:33 insightcollect_20240715172659_lldbcs62.log
151+
-rw-r--r-- 1 oracle oinstall 20355710 Jul 15 17:39 lldbcs61.tfa_srdc_autosrdc_Mon_Jul_15_17_32_46_UTC_2024.zip
152+
-rw-r--r-- 1 oracle oinstall 10157 Jul 15 17:32 lldbcs61.tfa_srdc_autosrdc_Mon_Jul_15_17_32_46_UTC_2024.zip.json
153+
-rw-r--r-- 1 oracle oinstall 2167 Jul 15 17:39 lldbcs61.tfa_srdc_autosrdc_Mon_Jul_15_17_32_46_UTC_2024.zip.txt
154+
-rw-r--r-- 1 oracle oinstall 7618260 Jul 15 17:33 lldbcs62.tfa_srdc_autosrdc_Mon_Jul_15_17_32_46_UTC_2024.zip
155+
-rw-r--r-- 1 oracle oinstall 2240 Jul 15 17:33 lldbcs62.tfa_srdc_autosrdc_Mon_Jul_15_17_32_46_UTC_2024.zip.txt
156+
</pre>
157+
The diagcollect log is the top level log for the collection from each node.
158+
The diagcollect_console log is the reduced log that is equivalent to what you would see on the console had this been a manual collection.
159+
There is a zip collection from each node, and files that describe the collection in **txt** and **json** format.
160+
> Note: The **txt** and **json** files are also in the collection zip files that you supply to Oracle Support and help in Support Request automation.
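The `print collections` output lists a `CheckSum` and a `checksum_algo` of `sha256` for each node's zip file. If you copy a zip elsewhere before uploading it, you may want to confirm it arrived intact. The sketch below assumes the `CheckSum` value is the SHA-256 hex digest of the zip file itself, which this lab does not confirm; the function names are made up for the example.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

You would call `matches_checksum("<path to zip>", "<CheckSum value>")` with the zip path and the `CheckSum` reported for that node.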
161+
162+
4. Check the contents of the Automatic Diagnostic Collection.
163+
164+
You can quickly review all the files collected/generated in the node using the `unzip -l` command
165+
<pre>
166+
unzip -l lldbcs61.tfa_srdc_autosrdc_Mon_Jul_15_17_32_46_UTC_2024.zip
167+
</pre>
168+
> Note: You would be uploading the *.zip* files to Oracle Support when you have to raise a Support Request for the Incidents
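If you would rather inspect a collection from a script than with `unzip -l`, Python's standard `zipfile` module can produce the same listing. This is a minimal sketch; the function names `list_zip` and `largest_members` are made up for the example.

```python
import zipfile


def list_zip(path):
    """Return (name, uncompressed_size_in_bytes) for each member of the zip."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def largest_members(path, count=10):
    """Return the `count` largest members, biggest first."""
    return sorted(list_zip(path), key=lambda item: item[1], reverse=True)[:count]
```

Calling `largest_members` on a node's zip file is a quick way to see which logs or traces account for most of the collection size.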
169+
170+
171+
## Task 4: Understand Manual Diagnostic Collections for a specific incident type
172+
![Manual Diagnostic Collections](./images/manual_collect.png =40%x*)
173+
AHF has manual collections for :-
174+
- When customers do not want Automatic Collections enabled.
175+
- Incidents AHF does not detect automatically such as install or some performance issues.
176+
177+
Manual collections are more configurable (through CLI options), allowing the addition of certain components and uploads to remote endpoints such as My Oracle Support.
178+
Manual collections still work cross nodes and bring back all collected data to the originating node.
179+
They also in most cases gather an AHF Insights report but we will talk about those in a later Lab.
180+
181+
![SRDC Diagnostic Collections](./images/srdc_collect.png =40%x*)
182+
Previously when raising a Support request with Oracle Support you would have been guided to a set of instructions to gather diagnostics
183+
for your specific problem. The list of actions could be long and complicated which meant that often required data was missed. The list of
184+
actions was known as a **Support Request Driven Collection (SRDC)**. AHF has taken the list of actions and integrated them into a single command for
185+
the incident type.
186+
> These Support Request Driven Collections are generated using the `tfactl diagcollect -srdc` command as shown.
187+
188+
![SRDC Diagnostic Collections](./images/srdc_dbperf.png =40%x*)
189+
In the slide above you can see the comparison of collecting performance diagnostics through a command list and running the `tfactl diagcollect -srdc dbperf` command.
190+
191+
192+
## Task 5: Understand Manual Diagnostic Collections with the problem chooser
193+
![Problem Chooser Diagnostic Collections](./images/problem_choose.png =40%x*)
194+
195+
Before the problem chooser, running a default AHF diagnostic collection would mean that the full default collection would be taken.
196+
This would mean gathering diagnostics from every diagnostic location that AHF had detected on a system for many possible Oracle products.
197+
Collections could be very large and take a long time to gather unless you knew exactly how to filter what was collected by providing specific
198+
components to collect such as 'CRS' or 'ASM'.
199+
200+
The problem chooser takes the options you provide on the command line, such as the time range, and tries to find any issues AHF has detected.
201+
You will be prompted to choose:-
202+
* Whether one of those options is what you want.
203+
* To choose a category of problem rather than a detected problem.
204+
* Try a different time
205+
* Carry on with a default collection but after you type in the problem this collection is for.
206+
207+
> Remember we are trying to ensure we collect the minimum required to diagnose the problem.
208+
> If you choose the last option we have to collect everything in the hope we get what you want.
209+
210+
211+
## Task 6: Generate a manual collection using problem chooser
212+
1. Simply run the `tfactl diagcollect` command and let the problem chooser guide you.
213+
```
214+
<copy>
215+
tfactl diagcollect
216+
</copy>
```
217+
218+
Command Output:
219+
220+
221+
2. Generate Incidents
14222

15-
## Task 1: TODO
16-
1. TODO
17-
2. TODO
18223
## Task 2: TODO
19224

20225
1. TODO
@@ -28,6 +233,6 @@ Estimated Lab Time: 20 Minutes
28233
You may now *proceed to the next lab*.
29234

30235
## Acknowledgements
31-
* **Authors** - Troy Anthony, Bill Burton
32-
* **Contributors** -
236+
* **Authors** - Bill Burton
237+
* **Contributors** - Troy Anthony, Gareth Chapman
33238
* **Last Updated By/Date** - Bill Burton, July 2024