XXX Log to perfdb.dcol1.delphix.com by default #93

Draft
wants to merge 48 commits into base: develop

Commits
8049b85
stbtrace can't find input files for package installations, collection…
brad-lewis Nov 19, 2019
247f1cc
Fix backend io collector post 5.x kernel upgrade (#18) (#20)
prashks Dec 6, 2019
b8e0e49
port txg dtrace script to ebpf
brad-lewis Jan 18, 2020
817bc78
Merge pull request #35 from brad-lewis/txg601
brad-lewis Feb 20, 2020
a4646a4
DLPX-69040 [Backport of DLPX-68396 to 6.0.2.0] port metaslab_alloc dt…
Mar 24, 2020
f19c981
DLPX-69162 [Backport of DLPX-69120 6.0.2.0] running estat against zvo…
Apr 9, 2020
e7b1156
stbtrace should have nonzero exit code on failure [Backport of #17 t…
May 11, 2020
fff5d57
commas in the output of keys in estat [Backport of #22 to 6.0.3.0] (#41)
May 11, 2020
9cf16a2
Compilation failures due to fentry issue [Backport of #33 to 6.0.3.0]…
May 14, 2020
607e52f
Switch to Python 3 [Backport of #9 to 6.0.3.0] (#43)
May 19, 2020
8e746fa
[Backport of DLPX-68397 to 6.0.3.0] need nfs threads to replace dtrac…
May 22, 2020
c27002d
Fix ZPL Collectors after ZFS changes to sync flags and include files …
brad-lewis Dec 6, 2019
c817b30
Fixes for estat usage and arg parsing error messages (#26)
gllghr Jan 10, 2020
e57bac9
Use copy of bcc_helper from performance-diagnositics
gllghr Jan 10, 2020
d80a17d
ZIL fixes
gllghr Jan 14, 2020
aba6e7f
Throughput values are not normalized
brad-lewis Jan 3, 2020
7835d1f
Invalid mem errors when using latest BCC
gllghr Feb 20, 2020
1e722e3
Add arc_prefetch script. (#34)
brad-lewis Mar 13, 2020
43b0a65
DLPX-72430 Estat zil fails to run because of unknown types (#50)
brad-lewis Oct 26, 2020
ccd9920
DLPX-72556 estat warning messages (#52)
brad-lewis Nov 2, 2020
2c79d81
Update estat iscsi, zvol, and zpl scripts. (#55)
brad-lewis Feb 3, 2021
a860fe3
Merge branch 'brad-lewis-estat67' into 6.0/stage
brad-lewis Feb 4, 2021
f0bb551
estat zil script compilation error on 6.1
brad-lewis Feb 23, 2021
b095412
estat zil script always reports an average of 10 allocations
brad-lewis Feb 23, 2021
8d5620b
DLPX-75711 [Backport of DLPX-75405 to 6.0.9.0] nfs_threads script sho…
May 19, 2021
ac43e68
DLPX-75470 estat zpl and arc_prefetch scripts need znode parameter (#…
brad-lewis Jun 11, 2021
3f98846
TOOL-11951 [Backport of TOOL-11731] performance-diagnostics: build-de…
pzakha Aug 3, 2021
5cb1777
estat backend-io script complains about missing blk_start_request kpr…
mr-t-73 Dec 9, 2021
fa44f67
DLPX-72683 [Backport of DLPX-72683 to 6.0.12.0] estat utility's help …
brad-lewis Dec 9, 2021
3d0e3f2
DLPX-77845 [Backport of DLPX-77532 to 6.0.12.0] The iscsi estat scrip…
brad-lewis Dec 9, 2021
729786d
DLPX-78812 Disk IO analytics collector not running on aws (#73)
brad-lewis Dec 21, 2021
2bb739d
DLPX-78888 [Backport of DLPX-78812 to 6.0.12] Disk IO analytics colle…
brad-lewis Dec 22, 2021
7a1656a
DLPX-78745 Apply estat iscsi approach to stbtrace script (#72)
brad-lewis Jan 21, 2022
d51210a
DLPX-78891 stbtrace zpl fails due to uid_t (#76)
brad-lewis Jan 21, 2022
ded9672
DLPX-79245 - [Backport of DLPX-78745 to 6.0.13.0] Apply estat iscsi a…
brad-lewis Jan 21, 2022
fe681cb
DLPX-79246 - [Backport of DLPX-78891 to 6.0.13.0] stbtrace zpl fails …
brad-lewis Feb 3, 2022
e20d662
TOOL-13470 add telegraf dependency to the performance-diagnostic pack…
grwilson Apr 13, 2022
8cd3f7a
TOOL-13515 [Backport of TOOL-13470 to 6.0.14.0] add telegraf dependen…
grwilson Apr 15, 2022
9a60780
Revert "TOOL-13515 [Backport of TOOL-13470 to 6.0.14.0] add telegraf …
sebroy Apr 16, 2022
6a20764
TOOL-13515 [Backport of TOOL-13470 to 6.0.15.0] add telegraf dependen…
grwilson Apr 20, 2022
18023e7
CP-8403 add telegraf-based metric collection
scottmdlpx Apr 15, 2022
92b0686
Merge pull request #81 from scottmdlpx/master
scottmdlpx Jun 28, 2022
4c0716d
CP-8525 [Backport of CP-8403 to 6.0.16.0] add telegraf-based metric c…
scottmdlpx Jul 19, 2022
e90f781
Merge remote-tracking branch 'origin/master' into 6.0/stage
Jul 19, 2022
9bbb16a
Merge pull request #87 from delphix/origin/projects/merge
Jul 20, 2022
ecf228d
DLPX-82298 Telegraf needs a restart delay for external programs (#90)
scottmdlpx Aug 19, 2022
1340e41
DLPX-82457 Telegraf: add zio-queue and metaslab-alloc to playbook col…
scottmdlpx Aug 19, 2022
07c6709
XXX Log to perfdb.dcol1.delphix.com by default
Mar 28, 2023
3 changes: 1 addition & 2 deletions bpf/estat/backend-io.c
@@ -29,7 +29,6 @@ typedef struct {

BPF_HASH(io_base_data, u64, io_data_t);

// @@ kprobe|blk_start_request|disk_io_start
// @@ kprobe|blk_mq_start_request|disk_io_start
int
disk_io_start(struct pt_regs *ctx, struct request *reqp)
@@ -44,7 +43,7 @@ disk_io_start(struct pt_regs *ctx, struct request *reqp)
return (0);
}

// @@ kprobe|blk_account_io_completion|disk_io_done
// @@ kprobe|blk_account_io_done|disk_io_done
int
disk_io_done(struct pt_regs *ctx, struct request *reqp)
{
1 change: 0 additions & 1 deletion bpf/estat/zvol.c
@@ -31,7 +31,6 @@
#define POOL (OPTARG)
#endif


// Structure to hold thread local data
typedef struct {
u64 ts;
2 changes: 1 addition & 1 deletion bpf/stbtrace/io.st
@@ -128,7 +128,7 @@ b = BPF(text=bpf_text)
if BPF.get_kprobe_functions(b'blk_start_request'):
b.attach_kprobe(event="blk_start_request", fn_name="disk_io_start")
b.attach_kprobe(event="blk_mq_start_request", fn_name="disk_io_start")
b.attach_kprobe(event="blk_account_io_completion", fn_name="disk_io_done")
b.attach_kprobe(event="blk_account_io_done", fn_name="disk_io_done")


helper = BCCHelper(b, BCCHelper.ANALYTICS_PRINT_MODE)
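The attach logic in this hunk reduces to a small decision: the legacy `blk_start_request` probe is attached only when the kernel exposes it, while the multi-queue start probe and the completion probe are always attached. The sketch below is a rough model of that, not the script's code. It assumes the hunk replaces `blk_account_io_completion` with `blk_account_io_done` (the +/- markers are not visible here, so the direction is an assumption), and `plan_kprobes` is a hypothetical helper that takes a set of function names instead of querying BCC.

```python
def plan_kprobes(available_functions):
    """Return (event, fn_name) pairs in the order they would be attached.

    Hypothetical helper for illustration only; the script itself calls
    b.attach_kprobe() directly after checking BPF.get_kprobe_functions().
    """
    plan = []
    # The legacy single-queue entry point is optional on newer kernels.
    if "blk_start_request" in available_functions:
        plan.append(("blk_start_request", "disk_io_start"))
    # These two are attached unconditionally in the hunk above.
    plan.append(("blk_mq_start_request", "disk_io_start"))
    plan.append(("blk_account_io_done", "disk_io_done"))
    return plan
```

Keeping the decision separate from the attach calls makes it possible to check the probe list without loading a BPF program.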
75 changes: 46 additions & 29 deletions bpf/stbtrace/iscsi.st
@@ -38,8 +38,8 @@ bpf_text += """
#define OP_NAME_LEN 6
typedef struct {
u64 ts;
u64 flags;
u64 size;
u32 direction;
} iscsi_data_t;

// Key structure for scalar aggegations maps
@@ -52,7 +52,8 @@ typedef struct {

HIST_KEY(iscsi_hist_key_t, iscsi_key_t);

BPF_HASH(iscsi_base_data, u64, iscsi_data_t);
BPF_HASH(iscsi_start_ts, u64, u64);
BPF_HASH(iscsi_base_data, u32, iscsi_data_t);
$maps:{map|
BPF_HASH($map.name$, iscsi_key_t, $map.type$);
}$
@@ -64,13 +65,31 @@ BPF_HASH($hist.name$, iscsi_hist_key_t, u64);
int iscsi_target_start(struct pt_regs *ctx, struct iscsi_conn *conn,
struct iscsi_cmd *cmd, struct iscsi_scsi_req *hdr)
{
iscsi_data_t data = {};
data.ts = bpf_ktime_get_ns();
data.flags = hdr->flags;
data.size = hdr->data_length;
iscsi_base_data.update((u64 *) &cmd, &data);
u64 ts = bpf_ktime_get_ns();
iscsi_start_ts.update((u64 *) &cmd, &ts);

return 0;
return (0);
}

int iscsi_target_response(struct pt_regs *ctx, struct iscsi_conn *conn,
struct iscsi_cmd *cmd, int state)
{
u32 tid = bpf_get_current_pid_tgid();
iscsi_data_t data = {};

u64 *tsp = iscsi_start_ts.lookup((u64 *) &cmd);
if (tsp == 0) {
return (0); // missed issue
}

data.ts = *tsp;
data.size = cmd->se_cmd.data_length;
data.direction = cmd->data_direction;

iscsi_base_data.update(&tid, &data);
iscsi_start_ts.delete((u64 *) &cmd);

return (0);
}

static int aggregate_data(iscsi_data_t *data, u64 ts, char *opstr)
@@ -99,33 +118,31 @@ static int aggregate_data(iscsi_data_t *data, u64 ts, char *opstr)
return 0;
}

int iscsi_target_end(struct pt_regs *ctx, struct iscsi_cmd *cmd)
int iscsi_target_end(struct pt_regs *ctx)
{
u64 ts = bpf_ktime_get_ns();
iscsi_data_t *data = iscsi_base_data.lookup((u64 *) &cmd);
u64 delta;
iscsi_key_t key = {};
char *opstr;

if (data == 0) {
return 0; // missed issue
}

if (data->flags & ISCSI_FLAG_CMD_READ) {
aggregate_data(data, ts, READ_STR);
} else if (data->flags & ISCSI_FLAG_CMD_WRITE) {
aggregate_data(data, ts, WRITE_STR);
}
iscsi_base_data.delete((u64 *) &cmd);

return 0;
u64 ts = bpf_ktime_get_ns();
u32 tid = bpf_get_current_pid_tgid();
iscsi_data_t *data = iscsi_base_data.lookup(&tid);

if (data == 0) {
return (0); // missed issue
}

if (data->direction == DMA_FROM_DEVICE) {
aggregate_data(data, ts, READ_STR);
} else if (data->direction == DMA_TO_DEVICE) {
aggregate_data(data, ts, WRITE_STR);
}
iscsi_base_data.delete(&tid);

return (0);
}

""" # noqa: W293
b = BPF(text=bpf_text)

b.attach_kprobe(event="iscsit_process_scsi_cmd", fn_name="iscsi_target_start")
b.attach_kprobe(event="iscsit_build_rsp_pdu", fn_name="iscsi_target_end")
b.attach_kprobe(event="iscsit_response_queue", fn_name="iscsi_target_response")
b.attach_kretprobe(event="iscsit_response_queue", fn_name="iscsi_target_end")

helper = BCCHelper(b, BCCHelper.ANALYTICS_PRINT_MODE)
$maps:{map|
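The reworked iSCSI probes hand state across two maps: the start timestamp is keyed by the command pointer, then moved into a per-thread record when `iscsit_response_queue` is entered, and consumed when that function returns. The following is a plain-Python model of that handoff, offered as a sketch only. The class name, the `clock` parameter, and the assumption that latency is computed as end time minus start time are illustrative and not taken from the diff; the direction constants mirror the kernel's `enum dma_data_direction`.

```python
DMA_TO_DEVICE = 1    # values as in the kernel's enum dma_data_direction
DMA_FROM_DEVICE = 2


class IscsiTraceModel:
    """Illustrative model of the three probe handlers (not BPF code)."""

    def __init__(self, clock):
        self.clock = clock     # callable returning a timestamp
        self.start_ts = {}     # cmd id -> start time (models iscsi_start_ts)
        self.base_data = {}    # thread id -> record (models iscsi_base_data)
        self.completed = []    # (op, size, latency) tuples

    def target_start(self, cmd_id):
        # kprobe on iscsit_process_scsi_cmd: remember when the command began.
        self.start_ts[cmd_id] = self.clock()

    def target_response(self, tid, cmd_id, size, direction):
        # kprobe on iscsit_response_queue: move state to a per-thread slot.
        ts = self.start_ts.pop(cmd_id, None)
        if ts is None:
            return  # missed issue
        self.base_data[tid] = {"ts": ts, "size": size, "direction": direction}

    def target_end(self, tid):
        # kretprobe on iscsit_response_queue: no arguments, so look up by thread.
        data = self.base_data.pop(tid, None)
        if data is None:
            return  # missed issue
        if data["direction"] == DMA_FROM_DEVICE:
            op = "read"
        elif data["direction"] == DMA_TO_DEVICE:
            op = "write"
        else:
            return
        self.completed.append((op, data["size"], self.clock() - data["ts"]))
```

The per-thread key matters because a return probe does not receive the function's arguments, so the command pointer is no longer available at that point.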
11 changes: 6 additions & 5 deletions bpf/stbtrace/zpl.st
@@ -28,7 +28,8 @@ bpf_text += """
#include <linux/bpf_common.h>
#include <uapi/linux/bpf.h>

#include <sys/uio.h>
#include <sys/xvattr.h>
#include <sys/zfs_znode.h>

// Definitions for this script
#define READ_STR "read"
@@ -67,9 +68,9 @@ BPF_HASH($hist.name$, zpl_hist_key_t, u64);
}$

// Probe functions to initialize thread local data
int zfs_read_start(struct pt_regs *ctx, void *inode, uio_t *uio)
int zfs_read_start(struct pt_regs *ctx, struct znode *zn, zfs_uio_t *uio,
int flags)
{

u32 pid = bpf_get_current_pid_tgid();
zpl_data_t data = {};
data.ts = bpf_ktime_get_ns();
@@ -81,9 +82,9 @@ int zfs_read_start(struct pt_regs *ctx, void *inode, uio_t *uio)
}

// Probe functions to initialize thread local data
int zfs_write_start(struct pt_regs *ctx, void *inode, uio_t *uio)
int zfs_write_start(struct pt_regs *ctx, struct znode *zn, zfs_uio_t *uio,
int flags)
{

u32 pid = bpf_get_current_pid_tgid();
zpl_data_t data = {};
data.ts = bpf_ktime_get_ns();
24 changes: 16 additions & 8 deletions cmd/estat.py
@@ -89,6 +89,7 @@ def die(*args, **kwargs):
-q/-Q enable/disable latency histograms by size (default: off)
-y/-Y enable/disable the summary output (default: on)
-t/-T enable/disable emitting the summary total (default: on)
-j set output mode to JSON
-d LEVEL set BCC debug level
-e emit the resulting eBPF script without executing it

@@ -111,7 +112,6 @@ def die(*args, **kwargs):
particular the time spent allocating a block and time spent waiting for
the write I/O to complete. If POOL is not specified, defaults to tracing
the pool 'domain0'.

"""


@@ -149,6 +149,7 @@ def usage(msg):
script_arg = None
debug_level = 0
dump_bpf = False
output_mode = BCCHelper.ESTAT_PRINT_MODE


class Args:
@@ -161,6 +162,7 @@ class Args:
setattr(args, "latsize_hist", False)
setattr(args, "summary", True)
setattr(args, "total", True)
setattr(args, "json", False)

#
# We use getopt rather than argparse because it is very difficult to get
@@ -170,7 +172,7 @@ class Args:
# arguments.
#
try:
opts, rem_args = getopt.getopt(sys.argv[2:], "hmMa:lLzZqQyYnNtTd:e")
opts, rem_args = getopt.getopt(sys.argv[2:], "hmMa:lLjzZqQyYnNtTd:e")
except getopt.GetoptError as err:
die(err)

@@ -194,6 +196,7 @@ class Args:
dump_bpf = True
else:
switches = {'-l': "lat_hist",
'-j': "json",
'-z': "size_hist",
'-q': "latsize_hist",
'-y': "summary",
@@ -219,6 +222,9 @@ class Args:
if not (args.lat_hist or args.size_hist or args.latsize_hist):
args.lat_hist = True

if args.json:
output_mode = BCCHelper.ANALYTICS_PRINT_MODE

# Now that we are done parsing arguments, construct the text of the BPF program
try:
with open(base_dir + 'bpf/estat/' + program + '.c', 'r') as prog_file:
@@ -443,7 +449,7 @@ class Args:
probe_type + "'")

if args.lat_hist or args.size_hist or args.summary:
helper1 = BCCHelper(b, BCCHelper.ESTAT_PRINT_MODE)
helper1 = BCCHelper(b, output_mode)
helper1.add_key_type("name")
helper1.add_key_type("axis")

@@ -465,23 +471,24 @@ class Args:
"bytes")

if args.latsize_hist:
helper2 = BCCHelper(b, BCCHelper.ESTAT_PRINT_MODE)
helper2 = BCCHelper(b, output_mode)
helper2.add_aggregation("latsq", BCCHelper.LL_HISTOGRAM_AGGREGATION,
"microseconds")
helper2.add_key_type("size")
helper2.add_key_type("name")
helper2.add_key_type("axis")

if args.summary and args.total:
helper3 = BCCHelper(b, BCCHelper.ESTAT_PRINT_MODE)
helper3 = BCCHelper(b, output_mode)
helper3.add_aggregation("opst", BCCHelper.COUNT_AGGREGATION, "iops(/s)")
helper3.add_aggregation("datat", BCCHelper.SUM_AGGREGATION,
"throughput(k/s)")
helper3.add_key_type("name")

# Need real time;
print("%-16s\n" % strftime("%D - %H:%M:%S %Z")) # TODO deduplicate this line
print(" Tracing enabled... Hit Ctrl-C to end.")
if not args.json:
print("%-16s\n" % strftime("%D - %H:%M:%S %Z")) # TODO deduplicate line
print(" Tracing enabled... Hit Ctrl-C to end.")

# output
if monitor:
@@ -508,7 +515,8 @@ class Args:
helper1.printall(clear_data)
if args.summary and args.total:
helper3.printall(clear_data)
print("%-16s\n" % strftime("%D - %H:%M:%S %Z"))
if not args.json:
print("%-16s\n" % strftime("%D - %H:%M:%S %Z"))
except Exception as e:
die(e)
else:
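The `-j` handling in `cmd/estat.py` amounts to adding `j` to the getopt string and switching the print mode passed to each `BCCHelper`. A minimal sketch of that selection follows. The two string constants are placeholders standing in for `BCCHelper.ESTAT_PRINT_MODE` and `BCCHelper.ANALYTICS_PRINT_MODE`, and `parse_output_mode` is a hypothetical function, since the script itself does this inline alongside its other switches.

```python
import getopt

# Placeholder values standing in for the BCCHelper print-mode constants.
ESTAT_PRINT_MODE = "estat"
ANALYTICS_PRINT_MODE = "analytics"


def parse_output_mode(argv):
    """Pick the print mode from estat-style short options.

    Hypothetical helper; the option string is the one from the diff.
    """
    opts, _remaining = getopt.getopt(argv, "hmMa:lLjzZqQyYnNtTd:e")
    json_enabled = any(opt == "-j" for opt, _value in opts)
    return ANALYTICS_PRINT_MODE if json_enabled else ESTAT_PRINT_MODE
```

For example, `parse_output_mode(["-l", "-j"])` selects the analytics mode, while an argument list without `-j` keeps the default estat mode.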
2 changes: 1 addition & 1 deletion debian/control
@@ -13,6 +13,6 @@ Standards-Version: 4.1.2

Package: performance-diagnostics
Architecture: any
Depends: python3-bcc, python3-minimal, python3-psutil
Depends: python3-bcc, python3-minimal, python3-psutil, telegraf
Description: eBPF-based Performance Diagnostic Tools
A collection of eBPF-based tools for diagnosing performance issues.
3 changes: 3 additions & 0 deletions debian/rules
@@ -22,3 +22,6 @@ override_dh_auto_install:
dh_install build/cmd/* /usr/bin
dh_install lib/* /usr/share/performance-diagnostics/lib
dh_install bpf/* /usr/share/performance-diagnostics/bpf
dh_install telegraf/delphix-telegraf-service telegraf/perf_playbook /usr/bin
dh_install telegraf/delphix-telegraf.service /lib/systemd/system
dh_install telegraf/telegraf* telegraf/*.sh /etc/telegraf
34 changes: 34 additions & 0 deletions telegraf/delphix-telegraf-service
@@ -0,0 +1,34 @@
#!/bin/bash
BASE_CONFIG=/etc/telegraf/telegraf.base
DOSE_INPUTS=/etc/telegraf/telegraf.inputs.dose
PLAYBOOK_INPUTS=/etc/telegraf/telegraf.inputs.playbook
PLAYBOOK_FLAG=/etc/telegraf/PLAYBOOK_ENABLED
TELEGRAF_CONFIG=/etc/telegraf/telegraf.conf


function engine_is_object_based() {
zdb -C | grep "type: 'object_store'" >/dev/null
[[ "$?" == "0" ]]
}

function playbook_is_enabled() {
[[ -f $PLAYBOOK_FLAG ]]
}

rm -f $TELEGRAF_CONFIG

if engine_is_object_based; then
if playbook_is_enabled; then
cat $PLAYBOOK_INPUTS $DOSE_INPUTS $BASE_CONFIG > $TELEGRAF_CONFIG
else
cat $DOSE_INPUTS $BASE_CONFIG > $TELEGRAF_CONFIG
fi
else
if playbook_is_enabled; then
cat $PLAYBOOK_INPUTS $BASE_CONFIG > $TELEGRAF_CONFIG
else
cat $BASE_CONFIG > $TELEGRAF_CONFIG
fi
fi

/usr/bin/telegraf -config $TELEGRAF_CONFIG
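The four `cat` branches in `delphix-telegraf-service` all follow one ordering rule: playbook inputs first (if enabled), then the object-store inputs (if the engine is object based), then the base config. A compact restatement of that rule is sketched below. The function name `telegraf_config_parts` and its two boolean parameters are hypothetical; the paths are the ones defined at the top of the script.

```python
def telegraf_config_parts(object_based, playbook_enabled):
    """List the fragment files in the order the script concatenates them.

    Hypothetical helper for illustration; the script decides the two
    flags with `zdb -C` and a check for the PLAYBOOK_ENABLED flag file.
    """
    parts = []
    if playbook_enabled:
        parts.append("/etc/telegraf/telegraf.inputs.playbook")
    if object_based:
        parts.append("/etc/telegraf/telegraf.inputs.dose")
    # The base config is always included and always comes last.
    parts.append("/etc/telegraf/telegraf.base")
    return parts
```

Expressed this way, the nested if/else in the shell script collapses to two independent conditions, which may be easier to extend if more optional input files are added.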
18 changes: 18 additions & 0 deletions telegraf/delphix-telegraf.service
@@ -0,0 +1,18 @@
[Unit]
Description=Delphix Telegraf Metric Collection Agent
Documentation=https://github.com/influxdata/telegraf
PartOf=delphix.target
After=delphix-platform.service
PartOf=delphix-platform.service

[Service]
EnvironmentFile=-/etc/default/telegraf
User=root
ExecStart=/usr/bin/delphix-telegraf-service
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartForceExitStatus=SIGPIPE
KillMode=control-group

[Install]
WantedBy=delphix.target
3 changes: 3 additions & 0 deletions telegraf/nfs-threads.sh
@@ -0,0 +1,3 @@
#!/bin/sh
nfs_threads | egrep --line-buffered -v "thr"
