How to collect necessary info for session stuck/hang issue in Teradata

This article describes general step to collect necessary info in any session stuck/hung issue (Teradata, 2019).

1) Collect perflook (/opt/teradata/gsctools/bin/perflook)

Ideally, please collect perflook twice with 2~3 min interval.

We can see transition of session/system state(really hang or slowly moving) if we have multiple perflooks.

Example:

# /opt/teradata/gsctools/bin/perflook

2020-01-17 15:43:38 - Running /opt/teradata/gsctools/bin/perflook version # 01.00.00.12

2020-01-17 15:43:39 - Running node level collects

2020-01-17 15:44:16 - Running some automated checks for commonly known performance problems

Archive and compressing perflook.2020-01-17-1543_1-1 files for downloading...

/var/opt/teradata/gsctools/perflook/perflook.2020-01-17-1543_1-1.zip is ready to download.

2020-01-17 15:44:16 - Finished...

The zipped output file is created in the following directory:

  • /var/opt/teradata/gsctools/perflook/

2) Collect session status with qrysessn

qrysessn output is collected by perflook.

Please make sure hang session is captured there.

Example of collect session status with qrysessn:

# mkdir /tmp/RECJ8R7MN

# script /tmp/RECJ8R7MN/qrysessn.txt

Script started, file is /tmp/RECJ8R7MN/qrysessn.txt

# cnsterm 6

Attempting to connect to CNS...Completed.

Hello

Input Supervisor Command:

> start qrysessn

start qrysessn

Started 'qrysessn' in window 2

at Thu Jan 16 09:55:07 2020

Input Supervisor Command:

> ^C

# cnsterm 2

Attempting to connect to CNS...Completed.

Hello

There are currently no host utility locks in the DBC

--ShowLocks Processing Complete--

_______

| | |

| ___ __ ____ | ____ __|__ ____

| / |/ \ ____| ____| ____| | ____|

| --- | / | / | / | | / |

| \___ | \____| \____| \____| |__ \____|

Release 16.20.32.11 Version 16.20.32.11

SESSION STATUS Utility (Dec 94)

Session state Query Utility : 20/01/16 09:55:07

Please enter a logical host id (? for help):

> 1

1

Please enter session ids (? for help):

> 33248872

33248872

Please enter session ids (? for help):

>

Is detail information needed if the session is involved in HUT/FASTLOAD/MLOAD/EXPORT/Concurrency?

y-yes, n-no

> y

y

QUERYSESSION will run in Workload Def: WD-Default

Session state query results : 20/01/16 11:56:35

Host Session PE DBC User ID

----- -------- ----- -------------------------------

1 33248872 30719 IT_AXBO_STG_SOI004_00001

State details : ABORTING

Statement Code Time CPU Usage Accesses

--------- ----- -------- ------------- ------------

1 0 04:00:54 N/A N/A

===================================================================

End of session state report

Please enter a logical host id (? for help):

> ^C

# exit

exit

Script done, file is /tmp/RECJ8R7MN/qrysessn.txt

3) Collect session status with gtwglobal (/usr/tgtw/bin/gtwglobal)

Please collect this info before trying to abort session if you did not try to abort yet. (session state in gtwglobal may change due to abort attempt)

(run following command on TPA node)

# /usr/tgtw/bin/xgtwglobal

_______

| | |

| ___ __ ____ | ____ __|__ ____

| / |/ \ ____| ____| ____| | ____|

| --- | / | / | / | | / |

| \___ | \____| \____| \____| |__ \____|

Enter gateway command or enter h for Help:

se ho 1

Host 1 has been selected.

Enter gateway command or enter h for Help:

1>di se 33248872 long

Session 33248872 connected to GTW 22528 is assigned to PE 30719

of host 1

User Name Account IP Addr Port

-------------------- -------------------- --------------------------------------------- ----------

IT_AXBO_STG_SOI004_0 194.14.159.156 59206

State Event Action

--------------------------------------- ------------------------ ------------------------

CS_WAITSTARTDBSRSPFORCED CE_KILL CA_NOP

Partition Authentication

---------------- --------------

DBC/SQL DATABASE

Client Security Policy Proxy Security Policy

-------------------------------- --------------------------------

No Policy

Recoverable Network Protocol

--------------------------------

Not Supported

StrMbx: 01 00 77ff 00000058 0000 00 00

CntMbx: 02 00 77ff 00000000 040d 00 00

AbtMbx: 02 00 77ff 00000000 030d 00 00

HostMessageReads : 22

HostBlockReads : 11

HostReadBytes : 4501

DbsMessageReads : 8

HostMessageWrites : 10

HostBlockWrites : 10

HostWriteBytes : 22296

DbsMessageWrites : 10

4) Run lokdisp and collect blocking info

Blocking info may be helpful even when session is NOT in BLOCKED status.

Session status remains in PARSING when express request is blocked.

(express request is for PE to retrieve data from AMP during optimizing process)

Type following command in lokdisp:

> bl <<<< display blocking info

> a <<<< request to all AMP

> tr <<<< display transaction lock

> (AMP No) <<<< specify AMP # where blocking is found

Please repeat "tr" command on all AMPs where blocking is found.

Example:

# lokdisp

Amp Utility

_______

| | |

| ___ __ ____ | ____ __|__ ____

| / |/ \ ____| ____| ____| | ____|

| --- | / | / | / | | / |

| \___ | \____| \____| \____| |__ \____|

Release 16.20.32.11 Version 16.20.32.11

LOCK DISPLAY UTILITY (June 2000)

LOCK DISPLAY UTILITY Command String Syntax:

Help or ?

TRan [ProcId Uniq1 Uniq2] | [ALL]

Db [DBname] | [ALL]

TAble [DBname.Tablename] | [ALL]

ROWRange [DBname.Tablename TypeAndIndex] | [ALL]

ROWKey [DBname.Tablename TypeAndIndex, Partition RowHash1 RowHash2] |

[ALL]

Partition [DBname.Tablename TypeAndIndex] | [ALL]

Blockers [TRAN [ProcId Uniq1 Uniq2] | [ALL]] | [LIMIT [Number] | [NONE]]

Quit

-> Please enter your selection from the list:

bl

- The following amps are available:

0 1 2 3 4 5 6 7 8 9 10 11 12 13

14 15 16 17 18 19

-> Which amp(s) do you want to request on (S=Sampling/A=all/C=cancel/Q=quit):

a

*** No lock request was found.

-------------< END OF OUTPUT >-------------

LOCK DISPLAY UTILITY Command String Syntax:

Help or ?

TRan [ProcId Uniq1 Uniq2] | [ALL]

Db [DBname] | [ALL]

TAble [DBname.Tablename] | [ALL]

ROWRange [DBname.Tablename TypeAndIndex] | [ALL]

ROWKey [DBname.Tablename TypeAndIndex, Partition RowHash1 RowHash2] |

[ALL]

Partition [DBname.Tablename TypeAndIndex] | [ALL]

Blockers [TRAN [ProcId Uniq1 Uniq2] | [ALL]] | [LIMIT [Number] | [NONE]]

Quit

-> Please enter your selection from the list:

tr

- The following amps are available:

0 1 2 3 4 5 6 7 8 9 10 11 12 13

14 15 16 17 18 19

-> Which amp(s) do you want to request on (S=Sampling/A=all/C=cancel/Q=quit):

0

---------------- AMP 0 REPORTS 67 LOCK ENTRIES -------------

GRANTED LOCK REQUEST(S):

Tran: 30718 47F21C78

Hash Locks : 0

Range Locks : 0

Host: 1025 Session: 02011325 (33624869) Mode: Ac User: STATSMAN

Database: PDCRDATA Table: DBQLOGTBL_HST

Tran: 30718 47F43302

Hash Locks : 0

Range Locks : 0

Host: 2049 Session: 02012A68 (33630824) Mode: WR User: IT_COMMON_TRF_00001

Database: ITSTGSOFTONET Table: S1_SOFTONE_TRANSACTIONS_STOREID

Row Hash Lock Subtable ID: 1024

Row Hash1: 3097,32681

Host: 2049 Session: 02014FF0 (33640432) Mode: EX User: IT_COMMON_TRF_00001

Database: ITSTGAXBOT Table: WT_S0_AXBO_POS_LINEITEM

Tran: 30719 48E6A118

Hash Locks : 0

Range Locks : 0

Host: 2049 Session: 02014FFB (33640443) Mode: EX User: IT_COMMON_TRF_00001

Database: ITSTGAXBOT Table: WT_S1_AXBO_POS_RETAILPRICEMOD

Tran: 30719 48E6A34A

Hash Locks : 0

Range Locks : 0

Host: 2049 Session: 0201500F (33640463) Mode: EX User: IT_COMMON_TRF_00001

Database: ITSTGAXBOT Table: WT_S0_AXBO_POS_TOTAL

-------------< END OF OUTPUT >-------------

LOCK DISPLAY UTILITY Command String Syntax:

Help or ?

TRan [ProcId Uniq1 Uniq2] | [ALL]

Db [DBname] | [ALL]

TAble [DBname.Tablename] | [ALL]

ROWRange [DBname.Tablename TypeAndIndex] | [ALL]

ROWKey [DBname.Tablename TypeAndIndex, Partition RowHash1 RowHash2] |

[ALL]

Partition [DBname.Tablename TypeAndIndex] | [ALL]

Blockers [TRAN [ProcId Uniq1 Uniq2] | [ALL]] | [LIMIT [Number] | [NONE]]

Quit

5) (If session is in aborting state) See if rollback is running

Aborted session will remain in aborting state until rollback is completed.

Run this command in rcvmanager utility:

> LIST ROLLBACK TABLES;

Example:

# cnsterm 6

Attempting to connect to CNS...Completed.

Hello

> start rcvmanager

start rcvmanager

Started 'rcvmanager' in window 1

at Thu Jan 16 08:57:30 2020

Input Supervisor Command:

> ^C

# cnsterm 1

Attempting to connect to CNS...Completed.

Hello

(A Large Cylinder is 6 times the size of a small cylinder)

Ferret ==>

quit

Waiting for Ferret Slave Tasks to exit

Ferret Exited

_______

| | |

| ___ __ ____ | ____ __|__ ____

| / |/ \ ____| ____| ____| | ____|

| --- | / | / | / | | / |

| \___ | \____| \____| \____| |__ \____|

Release 16.20.32.11 Version 16.20.32.11

RCVMANAGER Utility (Jan 96)

Enter command, "QUIT;" or "HELP;" :> list rollback tables;

list rollback tables;

TABLES BEING ROLLED BACK AT 09:00:02 20/01/16

ONLINE USER ROLLBACK TABLE LIST

Host Session User ID Performance Group AMP W/Count

---- -------- --------- ------------------------------ -----------

TJ Rows Left TJ Rows Done Time Est.

------------- ------------- ---------

Table ID Name

--------- ------------------------------------------------------------------

SYSTEM RECOVERY ROLLBACK TABLE LIST

Host Session TJ Row Count

---- -------- -------------

Table ID Name

--------- ------------------------------------------------------------------

6) See if ABORT SESSION can abort hang session

Please get customer approval before running ABORT SESSION.

If ABORT SESSION works, you don't need to move on to next step.

Example:

#puma -TAXO | grep 33248872

9606 30719 13 9833 0x16007100 0 0 0 msgnowk idle disstart 168 2049 33248872 6 0-0 255 18:1 - -

#puma -b -u 9833

Stack Trace: Thread 9833 (0x2669), Process 9606 (0x2586), disstart

00 0x00007fffeaa3b9c7 Frame<00007fffeaa3b9c7> Return<0000000000000000> libc.so.6!ioctl + 0x7

01 0x00007ffff05697b9 Frame<00007ffff05697b9> Return<0000000000000000> syscalldriver + 0xf9 syscall.c:217

02 0x00007ffff0581ca4 Frame<00007ffff0581ca4> Return<0000000000000000> msgrxseg + 0x54 msglib.c:642

03 0x00007ffff187af75 Frame<00007ffff187af75> Return<0000000000000000> disprreq + 0x11f3 disdrivr.c:3064

04 0x00007ffff187e633 Frame<00007ffff187e633> Return<0000000000000000> Process_Request + 0x49a disdrivr.c:5028

05 0x00007ffff187e71e Frame<00007ffff187e71e> Return<0000000000000000> sqldrivr_thread + 0xc1 disdrivr.c:5079

06 0x00007ffff055e121 Frame<00007ffff055e121> Return<0000000000000000> tsknewthread + 0x3f1 tskport.c:1130

07 0x00007fffef5a1806 Frame<00007fffef5a1806> Return<0000000000000000> libpthread.so.0!start_thread + 0xe6

08 0x00007fffeaa4369d Frame<00007fffeaa4369d> Return<0000000000000000> libc.so.6!clone + 0x6d

#cnsterm 6

Attempting to connect to CNS...Completed.

Hello

Input Supervisor Command:

> abort session 1:33248872 logoff list

abort session 1:33248872 logoff list

HostID:SessNo Status/UserName

------------------ ------------------------

1: 33248872 /IT_AXBO_STG_SOI004_00001

Input Supervisor Command:

> ^C

7) Run tpareset -d if it's unable to abort session.

Please make sure to get customer approval before tpareset.

Engage GSO/DBS and let them review system before tpareset so we can identify the node to be uploaded.

8) Upload dump

Don't upload full dump if system has more than 4 nodes. Please upload node(s) with hang session tasks only.

References

Teradata. (2019). How to collect necessary info for session stuck/hang issue? Retrieved from https://support.teradata.com/knowledge?id=kb_article_view&sys_kb_id=bbd554bb1b173f00682ca8233a4bcb5a