How to collect necessary info for session stuck/hang issue in Teradata

This article describes general step to collect necessary info in any session stuck/hung issue (Teradata, 2019).

 

1) Collect perflook (/opt/teradata/gsctools/bin/perflook)
   Ideally, please collect perflook twice with 2~3 min interval.

   We can see transition of session/system state(really hang or slowly moving) if we have multiple perflooks.

Example:

# /opt/teradata/gsctools/bin/perflook

2020-01-17 15:43:38 - Running /opt/teradata/gsctools/bin/perflook version # 01.00.00.12

2020-01-17 15:43:39 - Running node level collects

2020-01-17 15:44:16 - Running some automated checks for commonly known performance problems

Archive and compressing perflook.2020-01-17-1543_1-1 files for downloading...

/var/opt/teradata/gsctools/perflook/perflook.2020-01-17-1543_1-1.zip is ready to download.

2020-01-17 15:44:16 - Finished...

 

The zipped output file is created in the following directory:

  • /var/opt/teradata/gsctools/perflook/

2) Collect session status with qrysessn

   qrysessn output is collected by perflook.

   Please make sure hang session is captured there. 

 Example of collect session status with qrysessn:

# mkdir /tmp/RECJ8R7MN

# script /tmp/RECJ8R7MN/qrysessn.txt

Script started, file is /tmp/RECJ8R7MN/qrysessn.txt

# cnsterm 6

Attempting to connect to CNS...Completed.

Hello

 Input Supervisor Command:

> start qrysessn

start qrysessn

 Started 'qrysessn' in window 2

 at Thu Jan 16 09:55:07 2020

 Input Supervisor Command:

> ^C

 # cnsterm 2

Attempting to connect to CNS...Completed.

Hello

 There are currently no host utility locks in the DBC

              --ShowLocks Processing Complete--

 _______

    |                                |              |

    |    ___     __     ____         |    ____    __|__    ____

    |   /      |/  \    ____|    ____|    ____|     |      ____|

    |   ---    |       /    |   /    |   /    |     |     /    |

    |   \___   |       \____|   \____|   \____|     |__   \____|

 

    Release 16.20.32.11 Version 16.20.32.11

    SESSION STATUS Utility (Dec 94)

  Session state Query Utility : 20/01/16  09:55:07

 Please enter a logical host id (? for help):

 > 1

1

  Please enter session ids (? for help):

> 33248872

33248872

  Please enter session ids (? for help):

Is detail information needed if the session is involved in HUT/FASTLOAD/MLOAD/EXPORT/Concurrency?

y-yes, n-no

> y

y

QUERYSESSION will run in Workload Def: WD-Default

 Session state query results : 20/01/16  11:56:35

 Host   Session   PE     DBC User ID

 -----  --------  -----  -------------------------------

     1  33248872  30719  IT_AXBO_STG_SOI004_00001

 State details : ABORTING

 Statement  Code   Time         CPU Usage     Accesses

 ---------  -----  --------     ------------- ------------

         1      0  04:00:54     N/A          N/A

 ===================================================================

 End of session state report

 Please enter a logical host id (? for help):

> ^C

# exit

exit

Script done, file is /tmp/RECJ8R7MN/qrysessn.txt

3) Collect session status with gtwglobal (/usr/tgtw/bin/gtwglobal)

    Please collect this info before trying to abort session if you did not try to abort yet. (session state in gtwglobal may change due to abort attempt)

 (run following command on TPA node)

# /usr/tgtw/bin/xgtwglobal

 _______

    |                                |              |

    |    ___     __     ____         |    ____    __|__    ____

    |   /      |/  \    ____|    ____|    ____|     |      ____|

    |   ---    |       /    |   /    |   /    |     |     /    |

    |   \___   |       \____|   \____|   \____|     |__   \____|

 

 Enter gateway command or enter h for Help:

se ho 1

Host 1 has been selected.

 Enter gateway command or enter h for Help:

1>di se 33248872 long

Session 33248872 connected to GTW 22528 is assigned to PE 30719

of host 1

User Name            Account              IP Addr                                       Port

-------------------- -------------------- --------------------------------------------- ----------

IT_AXBO_STG_SOI004_0                      194.14.159.156                                59206

State                                   Event                    Action

--------------------------------------- ------------------------ ------------------------

CS_WAITSTARTDBSRSPFORCED                CE_KILL                  CA_NOP

 

Partition        Authentication

---------------- --------------

DBC/SQL          DATABASE

Client Security Policy           Proxy Security Policy

-------------------------------- --------------------------------

No Policy

Recoverable Network Protocol

--------------------------------

Not Supported

StrMbx:  01 00 77ff 00000058 0000 00 00

CntMbx:  02 00 77ff 00000000 040d 00 00

AbtMbx:  02 00 77ff 00000000 030d 00 00

HostMessageReads  :                  22

HostBlockReads    :                  11

HostReadBytes     :                4501

DbsMessageReads   :                   8

HostMessageWrites :                  10

HostBlockWrites   :                  10

HostWriteBytes    :               22296

DbsMessageWrites  :                  10

 

4) Run lokdisp and collect blocking info

    Blocking info may be helpful even when session is NOT in BLOCKED status.

    Session status remains in PARSING when express request is blocked.

    (express request is for PE to retrieve data from AMP during optimizing process)

     Type following command in lokdisp:     

    > bl                            <<<< display blocking info

    > a                              <<<< request to all AMP

    > tr                            <<<< display transaction lock

    > (AMP No)          <<<< specify AMP # where blocking is found

 

     Please repeat  "tr" command on all AMPs where blocking is found.

Example:

# lokdisp

Amp Utility

_______

    |                                |              |

    |    ___     __     ____         |    ____    __|__    ____

    |   /      |/  \    ____|    ____|    ____|     |      ____|

    |   ---    |       /    |   /    |   /    |     |     /    |

    |   \___   |       \____|   \____|   \____|     |__   \____|

 

    Release 16.20.32.11 Version 16.20.32.11

    LOCK DISPLAY UTILITY (June 2000)

 

LOCK DISPLAY UTILITY Command String Syntax:

    Help or ?

    TRan      [ProcId Uniq1 Uniq2] | [ALL]

    Db        [DBname] | [ALL]

    TAble     [DBname.Tablename] | [ALL]

    ROWRange  [DBname.Tablename TypeAndIndex] | [ALL]

    ROWKey    [DBname.Tablename TypeAndIndex, Partition RowHash1 RowHash2] |

              [ALL]

    Partition [DBname.Tablename TypeAndIndex] | [ALL]

    Blockers  [TRAN [ProcId Uniq1 Uniq2] | [ALL]] | [LIMIT [Number] | [NONE]]

    Quit

-> Please enter your selection from the list:

 bl

- The following amps are available:

    0     1     2     3     4     5     6     7     8     9    10    11    12    13

   14    15    16    17    18    19

-> Which amp(s) do you want to request on (S=Sampling/A=all/C=cancel/Q=quit):

 a

*** No lock request was found.

-------------< END OF OUTPUT >-------------

LOCK DISPLAY UTILITY Command String Syntax:

    Help or ?

    TRan      [ProcId Uniq1 Uniq2] | [ALL]

    Db        [DBname] | [ALL]

    TAble     [DBname.Tablename] | [ALL]

    ROWRange  [DBname.Tablename TypeAndIndex] | [ALL]

    ROWKey    [DBname.Tablename TypeAndIndex, Partition RowHash1 RowHash2] |

              [ALL]

    Partition [DBname.Tablename TypeAndIndex] | [ALL]

    Blockers  [TRAN [ProcId Uniq1 Uniq2] | [ALL]] | [LIMIT [Number] | [NONE]]

    Quit

-> Please enter your selection from the list:

 tr

- The following amps are available:

    0     1     2     3     4     5     6     7     8     9    10    11    12    13

   14    15    16    17    18    19

-> Which amp(s) do you want to request on (S=Sampling/A=all/C=cancel/Q=quit):

 0

---------------- AMP 0 REPORTS 67 LOCK ENTRIES -------------

                   GRANTED LOCK REQUEST(S):

Tran: 30718 47F21C78

Hash Locks     :          0

Range Locks    :          0

Host:  1025 Session: 02011325 (33624869) Mode: Ac   User: STATSMAN

Database: PDCRDATA  Table: DBQLOGTBL_HST

Tran: 30718 47F43302

Hash Locks     :          0

Range Locks    :          0

Host:  2049 Session: 02012A68 (33630824) Mode: WR   User: IT_COMMON_TRF_00001

Database: ITSTGSOFTONET  Table: S1_SOFTONE_TRANSACTIONS_STOREID

Row Hash Lock                       Subtable ID:  1024

Row Hash1:  3097,32681

Host:  2049 Session: 02014FF0 (33640432) Mode: EX   User: IT_COMMON_TRF_00001

Database: ITSTGAXBOT  Table: WT_S0_AXBO_POS_LINEITEM

Tran: 30719 48E6A118

Hash Locks     :          0

Range Locks    :          0

Host:  2049 Session: 02014FFB (33640443) Mode: EX   User: IT_COMMON_TRF_00001

Database: ITSTGAXBOT  Table: WT_S1_AXBO_POS_RETAILPRICEMOD

Tran: 30719 48E6A34A

Hash Locks     :          0

Range Locks    :          0

Host:  2049 Session: 0201500F (33640463) Mode: EX   User: IT_COMMON_TRF_00001

Database: ITSTGAXBOT  Table: WT_S0_AXBO_POS_TOTAL

-------------< END OF OUTPUT >-------------

LOCK DISPLAY UTILITY Command String Syntax:

    Help or ?

    TRan      [ProcId Uniq1 Uniq2] | [ALL]

    Db        [DBname] | [ALL]

    TAble     [DBname.Tablename] | [ALL]

    ROWRange  [DBname.Tablename TypeAndIndex] | [ALL]

    ROWKey    [DBname.Tablename TypeAndIndex, Partition RowHash1 RowHash2] |

              [ALL]

    Partition [DBname.Tablename TypeAndIndex] | [ALL]

    Blockers  [TRAN [ProcId Uniq1 Uniq2] | [ALL]] | [LIMIT [Number] | [NONE]]

    Quit

 

5) (If session is in aborting state) See if rollback is running

    Aborted session will remain in aborting state until rollback is completed.

    Run this command in rcvmanager utility:

    > LIST ROLLBACK TABLES; 

Example:

# cnsterm 6

Attempting to connect to CNS...Completed.

Hello

> start rcvmanager

start rcvmanager

Started 'rcvmanager' in window 1

 at Thu Jan 16 08:57:30 2020

Input Supervisor Command:

> ^C

# cnsterm 1

Attempting to connect to CNS...Completed.

Hello

  (A Large Cylinder is 6 times the size of a small cylinder)

Ferret  ==>

 quit

Waiting for Ferret Slave Tasks to exit

Ferret Exited

 _______

    |                                |              |

    |    ___     __     ____         |    ____    __|__    ____

    |   /      |/  \    ____|    ____|    ____|     |      ____|

    |   ---    |       /    |   /    |   /    |     |     /    |

    |   \___   |       \____|   \____|   \____|     |__   \____|

 

    Release 16.20.32.11 Version 16.20.32.11

    RCVMANAGER Utility (Jan 96)

 Enter command, "QUIT;" or "HELP;" :>  list rollback tables;

 list rollback tables;

TABLES BEING ROLLED BACK AT 09:00:02 20/01/16

ONLINE USER ROLLBACK TABLE LIST

Host  Session   User ID     Performance Group               AMP W/Count

----  --------  ---------   ------------------------------  -----------

TJ Rows Left   TJ Rows Done   Time Est.

-------------  -------------  ---------

Table ID   Name

---------  ------------------------------------------------------------------

SYSTEM RECOVERY ROLLBACK TABLE LIST

Host  Session   TJ Row Count

----  --------  -------------

Table ID   Name

---------  ------------------------------------------------------------------

6) See if ABORT SESSION can abort hang session

   Please get customer approval before running ABORT SESSION.

   If ABORT SESSION works, you don't need to move on to next step.


Example:

#puma -TAXO | grep 33248872

    9606 30719   13     9833 0x16007100  0  0  0 msgnowk idle  disstart    168  2049  33248872    6           0-0               255 18:1   -      -

#puma -b -u 9833

Stack Trace: Thread 9833 (0x2669), Process 9606 (0x2586), disstart

00 0x00007fffeaa3b9c7 Frame<00007fffeaa3b9c7> Return<0000000000000000> libc.so.6!ioctl + 0x7

01 0x00007ffff05697b9 Frame<00007ffff05697b9> Return<0000000000000000> syscalldriver + 0xf9 syscall.c:217

02 0x00007ffff0581ca4 Frame<00007ffff0581ca4> Return<0000000000000000> msgrxseg + 0x54 msglib.c:642

03 0x00007ffff187af75 Frame<00007ffff187af75> Return<0000000000000000> disprreq + 0x11f3 disdrivr.c:3064

04 0x00007ffff187e633 Frame<00007ffff187e633> Return<0000000000000000> Process_Request + 0x49a disdrivr.c:5028

05 0x00007ffff187e71e Frame<00007ffff187e71e> Return<0000000000000000> sqldrivr_thread + 0xc1 disdrivr.c:5079

06 0x00007ffff055e121 Frame<00007ffff055e121> Return<0000000000000000> tsknewthread + 0x3f1 tskport.c:1130

07 0x00007fffef5a1806 Frame<00007fffef5a1806> Return<0000000000000000> libpthread.so.0!start_thread + 0xe6

08 0x00007fffeaa4369d Frame<00007fffeaa4369d> Return<0000000000000000> libc.so.6!clone + 0x6d

 

#cnsterm 6

Attempting to connect to CNS...Completed.

Hello

Input Supervisor Command:

> abort session 1:33248872 logoff list

abort session 1:33248872 logoff list

HostID:SessNo      Status/UserName

------------------ ------------------------

     1:   33248872  /IT_AXBO_STG_SOI004_00001

Input Supervisor Command:

> ^C

 

7) Run tpareset -d if it's unable to abort session.

    Please make sure to get customer approval before tpareset.

    Engage GSO/DBS and let them review system before tpareset so we can identify the node to be uploaded. 


8) Upload dump
    Don't upload full dump if system has more than 4 nodes. Please upload node(s) with hang session tasks only.

 

References

Teradata. (2019). How to collect necessary info for session stuck/hang issue? Retrieved from https://support.teradata.com/knowledge?id=kb_article_view&sys_kb_id=bbd554bb1b173f00682ca8233a4bcb5a 

Comments