This article describes
general step to collect necessary info in any session stuck/hung issue (Teradata, 2019).
1) Collect perflook (/opt/teradata/gsctools/bin/perflook)
Ideally, please collect perflook twice with 2~3 min interval.
We can
see transition of session/system state(really hang or slowly moving) if we have
multiple perflooks.
Example:
#
/opt/teradata/gsctools/bin/perflook
2020-01-17 15:43:38 - Running
/opt/teradata/gsctools/bin/perflook version # 01.00.00.12
2020-01-17 15:43:39 - Running node level collects
2020-01-17 15:44:16 - Running some automated checks
for commonly known performance problems
Archive and compressing perflook.2020-01-17-1543_1-1
files for downloading...
/var/opt/teradata/gsctools/perflook/perflook.2020-01-17-1543_1-1.zip
is ready to download.
2020-01-17 15:44:16 - Finished...
The zipped output file is created in the following directory:
- /var/opt/teradata/gsctools/perflook/
2) Collect session
status with qrysessn
qrysessn
output is collected by perflook.
Please
make sure hang session is captured there.
Example of collect session status with qrysessn:
# mkdir /tmp/RECJ8R7MN
# script /tmp/RECJ8R7MN/qrysessn.txt
Script
started, file is /tmp/RECJ8R7MN/qrysessn.txt
# cnsterm 6
Attempting
to connect to CNS...Completed.
Hello
Input
Supervisor Command:
> start qrysessn
start
qrysessn
Started
'qrysessn' in window 2
at Thu Jan 16 09:55:07 2020
Input
Supervisor Command:
> ^C
# cnsterm 2
Attempting
to connect to CNS...Completed.
Hello
There
are currently no host utility locks in the DBC
--ShowLocks Processing Complete--
_______
| | |
|
___ __ ____ |
____ __|__ ____
|
/ |/ \
____| ____| ____|
| ____|
|
--- | /
| / |
/ | |
/ |
|
\___ | \____|
\____| \____| |__
\____|
Release 16.20.32.11 Version 16.20.32.11
SESSION STATUS Utility (Dec 94)
Session state Query Utility : 20/01/16 09:55:07
Please
enter a logical host id (? for help):
> 1
1
Please enter session ids (? for help):
> 33248872
33248872
Please enter session ids (? for help):
>
Is
detail information needed if the session is involved in
HUT/FASTLOAD/MLOAD/EXPORT/Concurrency?
y-yes,
n-no
> y
y
QUERYSESSION
will run in Workload Def: WD-Default
Session state query results : 20/01/16 11:56:35
Host
Session PE DBC User ID
-----
-------- ----- -------------------------------
1
33248872 30719 IT_AXBO_STG_SOI004_00001
State details : ABORTING
Statement
Code Time CPU Usage Accesses
---------
----- -------- ------------- ------------
1
0 04:00:54 N/A N/A
===================================================================
End of session state report
Please enter a logical host id (? for help):
> ^C
# exit
exit
Script
done, file is /tmp/RECJ8R7MN/qrysessn.txt
3) Collect session
status with gtwglobal (/usr/tgtw/bin/gtwglobal)
Please
collect this info before trying to abort session if you did not try to abort
yet. (session state in gtwglobal may change due to abort attempt)
(run following
command on TPA node)
#
/usr/tgtw/bin/xgtwglobal
_______
| | |
| ___
__ ____ |
____ __|__ ____
| /
|/ \ ____|
____| ____| |
____|
| ---
| / |
/ | /
| | /
|
| \___
| \____| \____|
\____| |__ \____|
Enter gateway
command or enter h for Help:
se ho
1
Host 1 has been selected.
Enter gateway
command or enter h for Help:
1>di
se 33248872 long
Session 33248872 connected to GTW 22528 is assigned to
PE 30719
of host 1
User Name
Account IP Addr Port
-------------------- --------------------
--------------------------------------------- ----------
IT_AXBO_STG_SOI004_0 194.14.159.156 59206
State Event Action
---------------------------------------
------------------------ ------------------------
CS_WAITSTARTDBSRSPFORCED CE_KILL CA_NOP
Partition
Authentication
---------------- --------------
DBC/SQL
DATABASE
Client Security Policy Proxy Security Policy
--------------------------------
--------------------------------
No Policy
Recoverable Network Protocol
--------------------------------
Not Supported
StrMbx: 01 00
77ff 00000058 0000 00 00
CntMbx: 02 00
77ff 00000000 040d 00 00
AbtMbx: 02 00
77ff 00000000 030d 00 00
HostMessageReads
: 22
HostBlockReads
: 11
HostReadBytes
: 4501
DbsMessageReads
: 8
HostMessageWrites : 10
HostBlockWrites
: 10
HostWriteBytes
: 22296
DbsMessageWrites
: 10
4) Run lokdisp
and collect blocking info
Blocking
info may be helpful even when session is NOT in BLOCKED status.
Session status remains in PARSING when express request is blocked.
(express request is for PE to retrieve data from AMP during optimizing process)
Type following command in lokdisp:
> bl
<<<< display blocking info
> a
<<<< request to all AMP
> tr <<<< display transaction lock
> (AMP
No) <<<< specify AMP # where blocking is
found
Please
repeat "tr" command on all AMPs where blocking is
found.
Example:
#
lokdisp
Amp Utility
_______
| | |
| ___
__ ____ |
____ __|__ ____
| /
|/ \ ____|
____| ____| |
____|
| ---
| / |
/ | /
| | /
|
| \___
| \____| \____|
\____| |__ \____|
Release
16.20.32.11 Version 16.20.32.11
LOCK DISPLAY
UTILITY (June 2000)
LOCK DISPLAY UTILITY Command String Syntax:
Help or ?
TRan [ProcId Uniq1 Uniq2] | [ALL]
Db [DBname] | [ALL]
TAble [DBname.Tablename] | [ALL]
ROWRange [DBname.Tablename
TypeAndIndex] | [ALL]
ROWKey [DBname.Tablename TypeAndIndex, Partition
RowHash1 RowHash2] |
[ALL]
Partition
[DBname.Tablename TypeAndIndex] | [ALL]
Blockers [TRAN [ProcId Uniq1
Uniq2] | [ALL]] | [LIMIT [Number] | [NONE]]
Quit
-> Please enter your selection from the list:
bl
- The following amps are available:
0 1
2 3 4
5 6 7
8 9 10
11 12 13
14 15
16 17 18
19
-> Which amp(s) do you want to request on
(S=Sampling/A=all/C=cancel/Q=quit):
a
*** No lock
request was found.
-------------< END OF OUTPUT >-------------
LOCK DISPLAY UTILITY Command String Syntax:
Help or ?
TRan [ProcId Uniq1 Uniq2] | [ALL]
Db [DBname] | [ALL]
TAble [DBname.Tablename] | [ALL]
ROWRange [DBname.Tablename
TypeAndIndex] | [ALL]
ROWKey [DBname.Tablename TypeAndIndex, Partition
RowHash1 RowHash2] |
[ALL]
Partition
[DBname.Tablename TypeAndIndex] | [ALL]
Blockers [TRAN [ProcId Uniq1 Uniq2] | [ALL]] | [LIMIT
[Number] | [NONE]]
Quit
-> Please enter your selection from the list:
tr
- The following amps are available:
0 1
2 3 4
5 6 7
8 9 10
11 12 13
14 15
16 17 18
19
-> Which amp(s) do you want to request on
(S=Sampling/A=all/C=cancel/Q=quit):
0
---------------- AMP 0 REPORTS 67 LOCK ENTRIES
-------------
GRANTED LOCK REQUEST(S):
Tran: 30718 47F21C78
Hash Locks
: 0
Range Locks
: 0
Host: 1025
Session: 02011325 (33624869) Mode: Ac
User: STATSMAN
Database: PDCRDATA
Table: DBQLOGTBL_HST
Tran: 30718 47F43302
Hash Locks
: 0
Range Locks
: 0
Host: 2049
Session: 02012A68 (33630824) Mode: WR
User: IT_COMMON_TRF_00001
Database: ITSTGSOFTONET Table: S1_SOFTONE_TRANSACTIONS_STOREID
Row Hash Lock Subtable ID: 1024
Row Hash1:
3097,32681
Host: 2049
Session: 02014FF0 (33640432) Mode: EX
User: IT_COMMON_TRF_00001
Database: ITSTGAXBOT
Table: WT_S0_AXBO_POS_LINEITEM
Tran: 30719 48E6A118
Hash Locks
: 0
Range Locks
: 0
Host: 2049
Session: 02014FFB (33640443) Mode: EX
User: IT_COMMON_TRF_00001
Database: ITSTGAXBOT
Table: WT_S1_AXBO_POS_RETAILPRICEMOD
Tran: 30719 48E6A34A
Hash Locks :
0
Range Locks
: 0
Host: 2049
Session: 0201500F (33640463) Mode: EX
User: IT_COMMON_TRF_00001
Database: ITSTGAXBOT
Table: WT_S0_AXBO_POS_TOTAL
-------------< END OF OUTPUT >-------------
LOCK DISPLAY UTILITY Command String Syntax:
Help or ?
TRan [ProcId Uniq1 Uniq2] | [ALL]
Db [DBname] | [ALL]
TAble [DBname.Tablename] | [ALL]
ROWRange [DBname.Tablename TypeAndIndex]
| [ALL]
ROWKey [DBname.Tablename TypeAndIndex, Partition
RowHash1 RowHash2] |
[ALL]
Partition
[DBname.Tablename TypeAndIndex] | [ALL]
Blockers [TRAN [ProcId Uniq1
Uniq2] | [ALL]] | [LIMIT [Number] | [NONE]]
Quit
5) (If session is in
aborting state) See if rollback is running
Aborted
session will remain in aborting state until rollback is completed.
Run this
command in rcvmanager utility:
> LIST ROLLBACK TABLES;
Example:
# cnsterm 6
Attempting to connect to CNS...Completed.
Hello
>
start rcvmanager
start rcvmanager
Started 'rcvmanager' in window 1
at Thu Jan 16
08:57:30 2020
Input Supervisor Command:
>
^C
# cnsterm 1
Attempting
to connect to CNS...Completed.
Hello
(A Large Cylinder is 6 times the size of a
small cylinder)
Ferret ==>
quit
Waiting
for Ferret Slave Tasks to exit
Ferret
Exited
_______
| | |
| ___
__ ____ |
____ __|__ ____
|
/ |/ \
____| ____| ____|
| ____|
|
--- | /
| / |
/ | |
/ |
|
\___ | \____|
\____| \____| |__
\____|
Release 16.20.32.11 Version 16.20.32.11
RCVMANAGER Utility (Jan 96)
Enter
command, "QUIT;" or "HELP;" :> list
rollback tables;
list rollback tables;
TABLES
BEING ROLLED BACK AT 09:00:02 20/01/16
ONLINE
USER ROLLBACK TABLE LIST
Host Session
User ID Performance Group AMP W/Count
---- --------
--------- ------------------------------ -----------
TJ
Rows Left TJ Rows Done Time Est.
------------- -------------
---------
Table
ID Name
---------
------------------------------------------------------------------
SYSTEM
RECOVERY ROLLBACK TABLE LIST
Host Session
TJ Row Count
---- --------
-------------
Table
ID Name
---------
------------------------------------------------------------------
6) See if ABORT
SESSION can abort hang session
Please
get customer approval before running ABORT SESSION.
If
ABORT SESSION works, you don't need to move on to next step.
Example:
#puma -TAXO | grep 33248872
9606 30719
13 9833 0x16007100 0 0 0 msgnowk idle disstart
168 2049 33248872
6 0-0 255 18:1 -
-
#puma
-b -u 9833
Stack Trace: Thread 9833 (0x2669), Process 9606
(0x2586), disstart
00 0x00007fffeaa3b9c7 Frame<00007fffeaa3b9c7>
Return<0000000000000000> libc.so.6!ioctl + 0x7
01 0x00007ffff05697b9 Frame<00007ffff05697b9>
Return<0000000000000000> syscalldriver + 0xf9 syscall.c:217
02 0x00007ffff0581ca4 Frame<00007ffff0581ca4>
Return<0000000000000000> msgrxseg + 0x54 msglib.c:642
03 0x00007ffff187af75 Frame<00007ffff187af75>
Return<0000000000000000> disprreq + 0x11f3 disdrivr.c:3064
04 0x00007ffff187e633 Frame<00007ffff187e633>
Return<0000000000000000> Process_Request + 0x49a disdrivr.c:5028
05 0x00007ffff187e71e Frame<00007ffff187e71e>
Return<0000000000000000> sqldrivr_thread + 0xc1 disdrivr.c:5079
06 0x00007ffff055e121 Frame<00007ffff055e121>
Return<0000000000000000> tsknewthread + 0x3f1 tskport.c:1130
07 0x00007fffef5a1806 Frame<00007fffef5a1806>
Return<0000000000000000> libpthread.so.0!start_thread + 0xe6
08 0x00007fffeaa4369d Frame<00007fffeaa4369d>
Return<0000000000000000> libc.so.6!clone + 0x6d
#cnsterm
6
Attempting to connect to CNS...Completed.
Hello
Input Supervisor Command:
>
abort session 1:33248872 logoff list
abort session 1:33248872 logoff list
HostID:SessNo
Status/UserName
------------------ ------------------------
1: 33248872
/IT_AXBO_STG_SOI004_00001
Input Supervisor Command:
>
^C
7) Run tpareset -d if
it's unable to abort session.
Please make sure to get customer approval before tpareset.
Engage
GSO/DBS and let them review system before tpareset so we can identify the node
to be uploaded.
8) Upload dump
Don't upload full dump if system has more than 4 nodes. Please
upload node(s) with hang session tasks only.
References
Teradata. (2019). How to collect necessary info
for session stuck/hang issue? Retrieved from
https://support.teradata.com/knowledge?id=kb_article_view&sys_kb_id=bbd554bb1b173f00682ca8233a4bcb5a