AIX Tips1: KDB Commands

KDB:

The KDB kernel debugger and the kdb command are useful for debugging device drivers, kernel extensions, and the kernel itself. Although they appear similar, the KDB kernel debugger and the kdb command are two separate tools:

KDB KERNEL DEBUGGER:

It is integrated into the kernel and allows full control of the system while a debugging session is in progress.

KDB AS A COMMAND:

It is implemented as an ordinary user-space program and can be used for analyzing the following:

1. A running system: When used to analyze a running system, the kdb command opens the /dev/pmem special file, which allows direct access to the system's physical memory. The kdb command performs its own address translation internally using the same algorithms as the KDB kernel debugger.

2. A system dump file produced by a system that previously crashed: A system dump contains certain critical data structures. Only the memory belonging to the process that was running on the processor that created the dump image can be included in the dump file. When you work with a system dump, any subcommands that modify memory are not valid, because the system dump is merely a snapshot of the real memory in a system.

When you are analyzing a system dump file, the kdb command must be started with arguments that specify the location of the dump file and the kernel file:

# kdb /var/adm/ras/vmcore.0 /unix

(The kernel file is used by the kdb command to resolve symbol names from the dump file.)

------------------------------------

A very valuable benefit of kdb is that a device setting stored in the ODM (as shown by lsattr) can be compared with the real-time value used by the running kernel.

------------------------------------

KDB COMMAND:

help display context          lists subcommands with the context "display"
p -?                          list of parameters for the p subcommand and a brief description
! <command>                   shell escape (a convenient way to run UNIX commands without leaving kdb)
hi                            print history
lke                           list loaded extensions

pvol -M <major> -m <minor>    display physical volume info

stat                          system status info
status                        processor status
e                             exit from kdb

------------------------------------

echo vfcs fcs0 | kdb | grep num_cmd_elems   <--shows num_cmd_elems in hex on a VIO client with NPIV (compare with the ODM value: lsattr -El fcs0)

(if you change num_cmd_elems with chdev, you can check in kdb whether it has really been changed)

echo scsidisk hdisk0 | kdb | grep queue_depth   <--shows the real-time value of queue_depth for the given disk, in hex
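kdb prints these values in hex, while lsattr shows them in decimal, so the comparison needs a conversion. Below is a minimal Python sketch of that comparison. It assumes the kdb line has already been captured as text. The line format (field name, then separators such as ':' or '=', then a hex number with or without 0x) and the helper names are assumptions, not documented kdb output.

```python
import re


def kdb_field_value(line: str, field: str) -> int:
    """Return the value of `field` from one line of kdb output as an int.

    Assumed (hypothetical) format: the field name, then non-word separators
    such as ':', '=' or '....', then a hex number with or without '0x'.
    """
    match = re.search(rf"{re.escape(field)}\W*(?:0x)?([0-9A-Fa-f]+)", line)
    if match is None:
        raise ValueError(f"{field!r} not found in {line!r}")
    return int(match.group(1), 16)


def odm_matches_kernel(odm_value: str, kdb_line: str, field: str) -> bool:
    """Compare the decimal value from lsattr with the hex value from kdb."""
    return int(odm_value) == kdb_field_value(kdb_line, field)


# Example with a made-up kdb line: 0xC8 is 200 decimal.
print(odm_matches_kernel("200", "num_cmd_elems: 0x000000C8", "num_cmd_elems"))  # prints True
```

The check assumes the kernel stores the same number that lsattr reports. The physical FC adapter walk-through below is an example where an extra adjustment is needed.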

------------------------------------

Check VSCSI adapter mapping:

(run this on the VIO client, not on the VIO server)

root@um_lpar: / # echo "cvai" | kdb | grep vscsi <--cvai is a kdb subcommand

read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01A83C0

vscsi0 0x000007 0x0000000000 0x0 aix-vios1->vhost2 <--shows which vhost is used on which vio server for this client

vscsi1 0x000007 0x0000000000 0x0 aix-vios1->vhost1

vscsi2 0x000007 0x0000000000 0x0 aix-vios2->vhost2
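With many adapters, the cvai rows can be turned into a lookup table. The Python sketch below assumes the output was saved as text, for example with echo cvai | kdb > cvai.txt. The row layout (adapter name first, "vios->vhost" last) is taken from the example above, and parse_cvai is a hypothetical name.

```python
def parse_cvai(output: str) -> dict[str, tuple[str, str]]:
    """Map each client vscsi adapter to its (VIO server, vhost) pair."""
    mapping: dict[str, tuple[str, str]] = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with the adapter name and end with "vios->vhost";
        # other lines (e.g. "read vscsi_scsi_ptrs OK ...") are skipped.
        if len(fields) >= 2 and fields[0].startswith("vscsi") and "->" in fields[-1]:
            vios, vhost = fields[-1].split("->", 1)
            mapping[fields[0]] = (vios, vhost)
    return mapping


sample = """\
read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01A83C0
vscsi0 0x000007 0x0000000000 0x0 aix-vios1->vhost2
vscsi1 0x000007 0x0000000000 0x0 aix-vios1->vhost1
vscsi2 0x000007 0x0000000000 0x0 aix-vios2->vhost2
"""
for adapter, (vios, vhost) in parse_cvai(sample).items():
    print(f"{adapter}: {vios} {vhost}")
```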

Check NPIV adapter mapping:

(run this on the VIO client, not on the VIO server)

root@um_lpar: / # echo "vfcs" | kdb <--vfcs is a kdb subcommand

...

NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE

fcs0 0xF1000A000033A000 0x0008 aix-vios1 vfchost8 0x01 0x0000 <--shows which vfchost is used on vio server for this client

fcs1 0xF1000A0000338000 0x0008 aix-vios2 vfchost6 0x01 0x0000
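The same idea works for the vfcs table. The sketch keys each row by the header line, so the columns are picked up by name rather than by position. It assumes each data row has exactly as many columns as the header, as in the example above. parse_vfcs is a hypothetical helper.

```python
def parse_vfcs(output: str) -> dict[str, dict[str, str]]:
    """Return one dict per fcs adapter, keyed by the column names of vfcs."""
    rows: dict[str, dict[str, str]] = {}
    header: list[str] = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME":
            header = fields  # NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
        elif header and fields[0].startswith("fcs") and len(fields) == len(header):
            row = dict(zip(header, fields))
            rows[row["NAME"]] = row
    return rows


sample = """\
NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
fcs0 0xF1000A000033A000 0x0008 aix-vios1 vfchost8 0x01 0x0000
fcs1 0xF1000A0000338000 0x0008 aix-vios2 vfchost6 0x01 0x0000
"""
for name, row in parse_vfcs(sample).items():
    print(name, "->", row["HOST"], row["HOST_ADAP"])
```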

------------------------------------

Check physical FC adapter settings (not in a virtual environment):

(dyntrk, fc_err_recov, num_cmd_elems)

These are the settings we would like to verify:

----------

root@um_lpar: / # lsattr -El fscsi0| egrep 'dyntrk|fc_err_recov'
dyntrk yes Dynamic Tracking of FC Devices True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True

root@um_lpar: / # lsattr -El fcs0| grep num_cmd_elems
num_cmd_elems 200 Maximum number of COMMANDS to queue to the adapter True

----------

Verifying the settings from kernel:

1. root@um_lpar: / # echo efscsi fscsi0 | kdb | grep efscsi_ddi

    struct efscsi_ddi ddi = 0xF1000A06007FA080                          <--this hex value will be used

2. root@um_lpar: / # echo dd 0xF1000A06007FA080+20 2 | kdb   <--append "+20 2" to the hex value above
(+20 is a hex offset, 0x20, into the structure; 2 is the number of doublewords to display)
...
F1000A06007FA0A0: 0101020202010200 000000B400000028 ...............(   <--decode the values at the positions marked below
                          FFDD     NNNNNNNN

FF = fc_err_recov (we have "02" in this example, which is fast_fail):

01 = delayed_fail

02 = fast_fail

DD = dyntrk (we have "01" in this example, which means "yes"):

00 = disabled (no)

01 = enabled (yes)

NNNNNNNN = num_cmd_elems (we have "000000B4" in this example, but some calculation is still needed):

1. convert to decimal: 000000B4 --> 180

2. add 20 to the decimal number: 180 + 20 = 200

(you must always add 20, in decimal, to the value you get; this is not the same as the hex offset +20 used in the dd command)
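The decoding steps above can be scripted. The Python sketch below works on one captured dd output line. The byte positions are inferred from this single example: FF at offset +0x24, DD at +0x25, and NNNNNNNN at +0x28 through +0x2B. They are an assumption and may differ between driver levels. decode_efscsi_dd is a hypothetical helper.

```python
FC_ERR_RECOV = {0x01: "delayed_fail", 0x02: "fast_fail"}
DYNTRK = {0x00: "no", 0x01: "yes"}


def decode_efscsi_dd(line: str) -> dict[str, object]:
    """Decode one line of 'dd <efscsi_ddi>+20 2' output.

    Byte positions are inferred from the example in the text and are an
    assumption: FF at +0x24, DD at +0x25, NNNNNNNN at +0x28..+0x2B.
    """
    _address, rest = line.split(":", 1)
    first, second = rest.split()[:2]      # the two doublewords
    word0 = bytes.fromhex(first)          # bytes +0x20 .. +0x27
    word1 = bytes.fromhex(second)         # bytes +0x28 .. +0x2F
    fc_err_recov = word0[4]               # "FF"
    dyntrk = word0[5]                     # "DD"
    raw_elems = int.from_bytes(word1[:4], "big")  # "NNNNNNNN"
    return {
        "fc_err_recov": FC_ERR_RECOV.get(fc_err_recov, hex(fc_err_recov)),
        "dyntrk": DYNTRK.get(dyntrk, hex(dyntrk)),
        "num_cmd_elems": raw_elems + 20,  # the "+20 (decimal)" rule from the text
    }


line = "F1000A06007FA0A0: 0101020202010200 000000B400000028 ...............("
print(decode_efscsi_dd(line))
# {'fc_err_recov': 'fast_fail', 'dyntrk': 'yes', 'num_cmd_elems': 200}
```

Unknown byte values are returned as hex strings instead of raising an error, so an unexpected layout shows up in the output instead of being hidden.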

------------------------------------

Volume group and lv info:

The volgrp subcommand displays information about a volume group (VG) and its logical volumes (LVs).

The volgrp structure addresses are registered in the devsw table, in the DSDPTR field.

(devsw: a kdb subcommand that displays the device switch table entries)

root@um_lpar: /dev # echo devsw | kdb | grep dsdptr | grep -v 00000000

dsdptr: F1000A0600751800 <--this will be used for "volgrp" command

dsdptr: 05A50280

dsdptr: F1000A0600751400

root@um_lpar: /dev # echo volgrp F1000A0600751800| kdb <--displays info about given volgrp

...

VOLGRP............. F1000A0600751800

vg_eyec............ 4C564D766F6C6772 (LVMvolgr)

vg_name............ rootvg

vg_ras_name........ rootvg

vg_id.............. 00080E820000D900000001335FBB8276

vg_lock.......... @ F1000A0600751868 vg_lock............ 0000000000000000

major_num.......... 0000000A flags.............. 00040001

snapshot_copy...... 0000 partshift.......... 0012 (128M)

ltg_shift.......... 0001 (256K) open_count......... 000A

max_lvs............ 0100 max_pvs............ 0020

....
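The devsw filtering and two of the volgrp fields can also be decoded by hand. The Python sketch below assumes the devsw and volgrp output has been captured as text, and the function names are hypothetical. The partshift interpretation (log2 of the partition size in 512-byte blocks) is inferred from the single example above, where 0012 is shown as 128M.

```python
import re


def nonzero_dsdptrs(devsw_output: str) -> list[str]:
    """Return the non-zero dsdptr values from 'devsw' output.

    Unlike 'grep -v 00000000', this tests the whole value for zero instead of
    dropping any line that merely contains eight zeros.
    """
    values = re.findall(r"dsdptr:\s*([0-9A-Fa-f]+)", devsw_output)
    return [v for v in values if int(v, 16) != 0]


def decode_eyecatcher(hex_text: str) -> str:
    """vg_eyec is ASCII stored as hex: 4C564D766F6C6772 -> 'LVMvolgr'."""
    return bytes.fromhex(hex_text).decode("ascii")


def partition_size_bytes(partshift_hex: str) -> int:
    """Assumption: partshift is log2 of the partition size in 512-byte blocks.

    This matches the example, where kdb shows partshift 0012 as (128M).
    """
    return (1 << int(partshift_hex, 16)) * 512


print(nonzero_dsdptrs("dsdptr: 00000000\ndsdptr: F1000A0600751800\n"))
print(decode_eyecatcher("4C564D766F6C6772"))
print(partition_size_bytes("0012") // (1024 * 1024), "MB")
```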

------------------------------------

Check hcheck_interval value of a disk:

1. root@um_lpar: / # echo lke | kdb | grep pcm

59 F1000000A063D200 05A60000 00030000 02080242 /usr/lib/drivers/aixdiskpcmke <--this shows the slot number we will use (here 59)

2. root@um_lpar: / # echo "lke -s 59" | kdb | grep le_data

le_data........ 0000000005A80000 le_datasize.... 0000000000002828 <--this shows the le_data value
(we will use this with the adevq subcommand)

3. root@um_lpar: / # kdb

(0)> adevq

Unable to find <pcm_info>

Enter the pcm_info address (in hex): 0000000005A80000   <--enter the le_data value from step 2 here
NAME      ADDR               STATE MACHINE ACTIVE_IO    <--then we will see the list of hdisks
hdisk1    0xF1000A0600740400 0x0   0x      0            <--choose the address of a disk and run adevq against it
NAME      ADDR               STATE MACHINE ACTIVE_IO
hdisk2    0xF1000A0600740E00 0x0   0x      0
NAME      ADDR               STATE MACHINE ACTIVE_IO
hdisk3    0xF1000A0600741800 0x0   0x      0
NAME      ADDR               STATE MACHINE ACTIVE_IO
hdisk0    0xF1000A0600742200 0x0   0x      0

4. (0)> adevq 0xF1000A0600740400 | grep hcheck <--this shows the address of the hcheck structure, which we will use

hcheck_t &hcheck = 0xF1000A0600740470

5. (0)> ahcheck 0xF1000A0600740470 | grep interval

uint interval = 0x0 <--this shows hcheck_interval value in hex (we have 0)
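The values passed between the steps above can be picked out of captured output with a few small helpers. This Python sketch assumes the lke, adevq and ahcheck output has been saved as text, with the layouts shown in the example. The helper names are hypothetical.

```python
import re


def lke_slot(lke_output: str, module: str) -> str:
    """Return the slot column of the lke line whose path ends with `module`."""
    for line in lke_output.splitlines():
        fields = line.split()
        if fields and fields[-1].endswith(module):
            return fields[0]
    raise LookupError(f"{module} not loaded")


def parse_adevq_list(output: str) -> dict[str, int]:
    """Map hdisk names to the addresses listed by adevq."""
    disks: dict[str, int] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and re.fullmatch(r"hdisk\d+", fields[0]):
            disks[fields[0]] = int(fields[1], 16)  # int() accepts the 0x prefix
    return disks


def hcheck_interval(line: str) -> int:
    """Convert a line such as 'uint interval = 0x0' to a decimal value."""
    return int(line.split("=", 1)[1].strip(), 16)


lke = "59 F1000000A063D200 05A60000 00030000 02080242 /usr/lib/drivers/aixdiskpcmke"
print("lke -s", lke_slot(lke, "aixdiskpcmke"))
print(hcheck_interval("uint interval = 0x0"))
```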

------------------------------------

Check for a process which is using a specific network port:

1. root@um_lpar: / # netstat -Aan | grep '\.22 ' <--find the address (first column) of the socket on port 22

f1000e000330ebb8 tcp4 0 0 *.22 *.* LISTEN

2. root@um_lpar: / # kdb

(0)> sockinfo f1000e000330ebb8 tcpcb | grep pvproc <--feed the address to the sockinfo subcommand (grep for pvproc)

pvproc+016000 88*sshd ACTIVE 058000E 03A00A2 000000083846E480 0 0001

3. (0)> hcal 058000E <--calculate decimal value (this is the pid of the process)

Value hexa: 0058000E Value decimal: 5767182

(0)> e <--exit from kdb

4. root@um_lpar: / # ps -fp 5767182 <--shows the process of a given pid

UID PID PPID C STIME TTY TIME CMD

root 5767182 3801250 0 May 09 - 0:00 /usr/sbin/sshd
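The hcal step can also be done outside kdb. The Python sketch below reads the pid and ppid straight from the pvproc line. The column positions are an assumption based on the example above. The decoded ppid (0x3A00A2 = 3801250) matches the PPID column that ps prints, which supports that reading. pids_from_pvproc is a hypothetical helper.

```python
def pids_from_pvproc(line: str) -> tuple[int, int]:
    """Return (pid, ppid) in decimal from a sockinfo 'pvproc+...' line.

    Column positions are an assumption based on the example:
    slot address, slot*command, state, pid (hex), ppid (hex), ...
    """
    fields = line.split()
    return int(fields[3], 16), int(fields[4], 16)  # same conversion as hcal


line = "pvproc+016000 88*sshd ACTIVE 058000E 03A00A2 000000083846E480 0 0001"
pid, ppid = pids_from_pvproc(line)
print(f"ps -fp {pid}")  # prints: ps -fp 5767182
```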

Tuesday, 28 July 2015
