

**HARRIS 2023** 

## Graph Neural Network for Circuit Netlist Analysis

LIN Tong

Senior Research Scientist, School of Electrical and Electronic Engineering

January 2023



# Outline

- Netlist Analysis Tasks
  - 'Divide-and-conquer' approach consisting of netlist partition and identification
- Netlist Partition
  - The Problem
  - Graph Neural Network (GNN) for netlist partition
- Netlist Identification
  - The Problem
  - GNN for netlist identification
- Conclusions & Discussions

### **Netlist Analysis Tasks**

- Modern SoC netlists consist of many functional blocks and sub-circuits:
  - Difficult to analyse as a whole.
  - Not all functional blocks or sub-circuits are of interest.
- A 'divide-and-conquer' approach is usually adopted, which consists of:
  - **Netlist Partition**: to partition a large circuit netlist into smaller sub-circuits.
  - **Netlist Identification**: to identify the functionality of a sub-circuit.



[1] X. Hong, T. Lin, Y. Shi and B. H. Gwee, "GraphClusNet: A Hierarchical Graph Neural Network for Recovered Circuit Netlist Partitioning," in *IEEE Transactions on Artificial Intelligence*, 2022, doi: 10.1109/TAI.2022.3198930.

## **Netlist Partition: The Problem**

- To solve the 'Normalized-cut' (N-cut) graph partition/clustering problem:
  - Observation: sub-circuits have more connections within themselves than between each other.
  - Goal: 'cut' as few connections as possible while keeping each partition at a meaningful size.

$$n\text{-}cut = \frac{1}{k} \sum_{i=1}^{k} \frac{link\left(V_i, V \setminus V_i\right)}{link(V_i, V)}$$
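The n-cut definition can be made concrete with a short numpy sketch. It assumes an undirected graph given as a symmetric adjacency matrix and reads link(A, B) as the total edge weight between node sets A and B, so link(V_i, V) is the summed degree (volume) of V_i. The helper name `n_cut` and the toy graph of two triangles joined by one edge are hypothetical, for illustration only.

```python
import numpy as np

def n_cut(adj: np.ndarray, labels: np.ndarray) -> float:
    """Normalized cut of a hard partition, averaged over the k clusters.

    adj    -- symmetric (N, N) adjacency matrix of edge weights
    labels -- length-N array of cluster ids
    """
    degrees = adj.sum(axis=1)
    clusters = np.unique(labels)
    total = 0.0
    for c in clusters:
        in_c = labels == c
        cut = adj[np.ix_(in_c, ~in_c)].sum()   # link(V_i, V \ V_i): edges leaving the cluster
        vol = degrees[in_c].sum()              # link(V_i, V): total degree of the cluster
        total += cut / vol
    return total / len(clusters)               # the 1/k average

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

print(n_cut(A, np.array([0, 0, 0, 1, 1, 1])))  # split along the bridge: 1/7
print(n_cut(A, np.array([0, 1, 0, 1, 0, 1])))  # interleaved split: 5/7
```

Splitting along the bridge cuts one edge per cluster out of a volume of 7, whereas the interleaved split cuts five edges per cluster, so it scores much worse.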

- Existing methods and issues:
  - The N-cut problem is NP-hard, so its solution is usually approximated.
  - Existing methods either do not optimize n-cut directly (e.g. spectral clustering) or may get stuck at local minima (e.g. methods based on iterative search algorithms).
  - Further, existing methods leverage only connectivity, not node features.
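For contrast, spectral clustering optimizes a continuous relaxation of n-cut rather than n-cut itself. The sketch below is the textbook normalized spectral bipartition (eigenvector of the second-smallest eigenvalue of the symmetric normalized Laplacian, split by sign, in the style of Shi and Malik). Whether this matches every detail of the SC [6] baseline is an assumption, and the toy graph is hypothetical.

```python
import numpy as np

def spectral_bipartition(adj: np.ndarray) -> np.ndarray:
    """Approximate a 2-way n-cut through the normalized-Laplacian relaxation.

    Assumes no isolated nodes (every degree > 0).
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # L_sym = I - D^-1/2 A D^-1/2
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)            # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]        # relaxed indicator y = D^-1/2 z
    return (fiedler > 0).astype(int)            # round the relaxation by sign

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

print(spectral_bipartition(A))  # separates {0,1,2} from {3,4,5}; label values depend on eigenvector sign
```

On this easy graph the rounded eigenvector recovers the obvious split, but rounding a relaxed solution carries no optimality guarantee for the discrete n-cut in general.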



#### Graph Neural Network (GNN) for Netlist Partition

- Advantages of GNN:
  - GNN leverages both connectivity and node features.
  - It can optimize an objective function (e.g. N-cut) directly as its loss function (unsupervised setting).

#### Challenges of GNN:

- GNN is inherently **local**, and deep GNN architectures are difficult to train.
- Need to find a meaningful node feature for the intended task.

#### We propose a novel GNN for netlist partition named 'GraphClusNet' <sup>[1]</sup>:

- A novel **hierarchical architecture** which finds clusters from local to global.
- An n-cut-based loss function to optimize for the objective function directly.
- A location-based node feature which suits the partition task and avoids local minima.




#### **Proposed Architecture**

#### Multi-stage hierarchical architecture:

- Intuition: sub-circuits group hierarchically into larger circuits.
- Optimize for 'n-cut' loss at each stage.
- Final stage can perform either **bipartition** or **multiway** partition.
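DiffPool-style coarsening is one plausible way to build a multi-stage hierarchy. In this scheme, a soft assignment matrix S maps the nodes of one stage onto clusters, and those clusters become the nodes of the next stage. The deck does not specify GraphClusNet's pooling operator, so this is an assumption. The sizes (100 gates, 10 local clusters, 2 final partitions) and the helper names are hypothetical, and random logits stand in for the GNN that would produce S.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax so each node's cluster memberships sum to 1."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def coarsen(adj, feats, s):
    """Pool one stage with a soft assignment S (N x K):
    A' = S^T A S (cluster-to-cluster weights), X' = S^T X (cluster features)."""
    return s.T @ adj @ s, s.T @ feats

n, f = 100, 8                                  # hypothetical: 100 gates, 8 features each
A = np.triu((rng.random((n, n)) < 0.05).astype(float), 1)
A = A + A.T                                    # symmetric, no self-loops
X = rng.random((n, f))

# In a trained model each S would come from a GNN, with an n-cut loss applied
# at every stage; random logits stand in here.
S1 = softmax_rows(rng.normal(size=(n, 10)))    # stage 1: gates -> 10 local clusters
A1, X1 = coarsen(A, X, S1)
S2 = softmax_rows(rng.normal(size=(10, 2)))    # stage 2: 10 clusters -> 2 partitions
A2, X2 = coarsen(A1, X1, S2)

final = (S1 @ S2).argmax(axis=1)               # each gate's final partition (bipartition)
print(A1.shape, A2.shape, np.bincount(final, minlength=2))
```

Each row of S sums to 1, so coarsening preserves the total edge weight. The product S1 @ S2 gives every original gate a soft membership in the final partitions.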



#### Architecture of GraphClusNet

### **Proposed Loss Function**

- 'N-cut' based loss function:
  - The numerator computes the **intra-cluster** connections of each cluster.
  - The denominator computes the **total connections** of each cluster.
  - Effectively searches for clusters that have more connections within and fewer connections in between.

$$\mathcal{L}_{ncut} = 1 - \frac{\text{Diag}(S^T A S)}{\text{Diag}(S^T D S)}$$

where *A* is the adjacency matrix, *D* is the degree matrix, and *S* is the cluster assignment matrix.

This allows direct optimization of the N-cut objective function.
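The loss can be sketched in numpy as follows. This sketch reads Diag(·)/Diag(·) as elementwise division of the two diagonals. It then averages the per-cluster terms, which mirrors the 1/k in the n-cut definition. The paper's exact reduction may differ, and `ncut_loss` is a hypothetical name. A real implementation would be written in an autodiff framework so that gradients reach S; numpy here shows only the forward computation.

```python
import numpy as np

def ncut_loss(adj: np.ndarray, s: np.ndarray) -> float:
    """n-cut loss for an (N, k) cluster assignment matrix S.

    diag(S^T A S)[i]: intra-cluster connections of cluster i
    diag(S^T D S)[i]: total connections (volume) of cluster i
    The per-cluster terms 1 - intra/total are averaged over the k clusters.
    """
    d = np.diag(adj.sum(axis=1))               # degree matrix D
    intra = np.diag(s.T @ adj @ s)
    total = np.diag(s.T @ d @ s)
    return float(np.mean(1.0 - intra / total))

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

S_good = np.eye(2)[[0, 0, 0, 1, 1, 1]]         # one-hot: split along the bridge
S_bad = np.eye(2)[[0, 1, 0, 1, 0, 1]]          # one-hot: interleaved split
print(ncut_loss(A, S_good))                    # 1/7
print(ncut_loss(A, S_bad))                     # 5/7
```

With one-hot S, diag(S^T A S) counts each intra-cluster edge twice and diag(S^T D S) is the cluster volume. Each term 1 - intra/total therefore reduces to link(V_i, V \ V_i)/link(V_i, V), and the loss equals the hard n-cut.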

### **Proposed Node Feature**

Location-based node feature:



Location-based Node Feature

- Intuition: logic gates from the same sub-circuit tend to be located close to each other on the floorplan.
- Divide the floorplan into squares of different sizes.
- Assign each node a feature based on the index of the square it falls in, at each square size.
- This yields node features in which nodes close to each other have more similar entries.
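The location-based feature can be sketched as follows. The sketch assumes that gate (x, y) placement coordinates are available and that a row-major square index serves as the feature entry at each scale. The square sizes are illustrative. The deck specifies none of these details, and `location_features` is a hypothetical name. In a model, these integer indices would act as categorical labels rather than magnitudes.

```python
import numpy as np

def location_features(xy: np.ndarray, width: float, height: float,
                      square_sizes=(50.0, 100.0, 200.0)) -> np.ndarray:
    """Hypothetical location feature: one square index per grid scale.

    xy           -- (N, 2) gate coordinates on the floorplan
    square_sizes -- side lengths of the grid squares (illustrative values)
    Returns an (N, len(square_sizes)) integer array; entry j is the row-major
    index of the square containing the gate at scale square_sizes[j].
    """
    feats = []
    for size in square_sizes:
        cols = int(np.ceil(width / size))      # squares per row
        rows = int(np.ceil(height / size))     # squares per column
        col = np.minimum((xy[:, 0] // size).astype(int), cols - 1)
        row = np.minimum((xy[:, 1] // size).astype(int), rows - 1)
        feats.append(row * cols + col)
    return np.stack(feats, axis=1)

gates = np.array([[10.0, 10.0],     # gate A
                  [60.0, 20.0],     # gate B, near A
                  [390.0, 380.0]])  # gate C, far corner
F = location_features(gates, width=400.0, height=400.0)
print(F)  # [[0 0 0] [1 0 0] [63 15 3]]
```

Gates A and B differ only at the finest scale and share both coarser entries, while the distant gate C shares none. This matches the intuition that nearby gates receive more similar feature vectors.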

### Partition Results: Bipartition on SoC Netlists

- Performed bipartition on real FPGA SoC circuit netlists:
  - To extract a major functional block from a netlist.
  - Our proposed GraphClusNet achieved the highest NMI and usually the lowest n-cut among the competing methods.
  - It avoided local minima and obtained more meaningful partitions.

| FPGA Circuits | Metrics | Ground Truth | SC [6] | Louvain <sup>2</sup> [10] | Graclus [7] | ARVGA [18] | GraphClusNet-RI | GraphClusNet | GraphClusNet-LR |
|---|---|---|---|---|---|---|---|---|---|
| 8051 SoC | NMI | 1 | 0.579 ± 0.297 | 0.891 ± 0.023 | 0.752 ± 0.248 | 0.823 ± 0.018 | 0.867 ± 0.232 | 0.967 ± 0.030 | 0.965 ± 0.032 |
|  | n-cut | 1.060 | 2.664 ± 1.574 | 1.289 ± 0.082 | 1.289 ± 0.079 | 3.227 ± 0.441 | 1.037 ± 0.041 | 1.009 ± 0.065 | 1.026 ± 0.084 |
| ARM CORTEX SoC | NMI | 1 | 0.982 ± 0.002 | 0.946 ± 0.004 | 0.986 ± 0.003 | 0.858 ± 0.0017 | 0.963 ± 0.038 | 0.987 ± 0.006 | 0.990 ± 0.000 |
|  | n-cut | 1.376 | 1.397 ± 0.041 | 2.511 ± 1.771 | 1.364 ± 0.000 | 3.170 ± 0.422 | 1.511 ± 0.211 | 1.362 ± 0.000 | 1.356 ± 0.000 |
| RISC-V-I SoC | NMI | 1 | 0.858 ± 0.101 | 0.838 ± 0.018 | 0.805 ± 0.055 | 0.581 ± 0.055 | 0.886 ± 0.070 | 0.928 ± 0.009 | 0.921 ± 0.008 |
|  | n-cut | 2.940 | 3.557 ± 0.962 | 5.851 ± 3.312 | 3.145 ± 0.251 | 9.132 ± 1.032 | 2.867 ± 0.114 | 2.794 ± 0.046 | 2.787 ± 0.025 |
| RISC-V-IMSU SoC | NMI | 1 | 0.850 ± 0.016 | 0.869 ± 0.083 | 0.798 ± 0.076 | 0.210 ± 0.067 | 0.847 ± 0.034 | 0.857 ± 0.064 | 0.896 ± 0.075 |
|  | n-cut | 2.775 | 3.775 ± 0.288 | 11.55 ± 14.76 | 3.010 ± 0.090 | 27.34 ± 13.16 | 3.607 ± 0.545 | 3.629 ± 0.284 | 2.883 ± 0.090 |
| RISC-V-IMZICSR SoC | NMI | 1 | 0.865 ± 0.055 | 0.886 ± 0.005 | 0.856 ± 0.078 | 0.349 ± 0.122 | 0.930 ± 0.055 | 0.986 ± 0.005 | 0.988 ± 0.005 |
|  | n-cut | 2.254 | 2.871 ± 0.539 | 5.149 ± 7.039 | 2.603 ± 0.246 | 17.11 ± 6.762 | 2.539 ± 1.089 | 2.268 ± 0.046 | 2.257 ± 0.043 |
| openFPU | NMI | 1 | 0.792 ± 0.005 | 0.776 ± 0.089 | 0.812 ± 0.136 | 0.318 ± 0.110 | 0.782 ± 0.162 | 0.865 ± 0.128 | 0.874 ± 0.123 |
|  | n-cut | 4.929 | 5.675 ± 0.080 | 6.180 ± 0.661 | 5.802 ± 0.963 | 67.12 ± 33.09 | 5.117 ± 0.241 | 5.305 ± 0.879 | 5.280 ± 0.870 |
| aoOCS <sup>3</sup> | NMI | 1 | 0.542 ± 0.066 | 0.542 ± 0.032 | 0.777 ± 0.096 | 0.419 ± 0.003 | 0.638 ± 0.082 | 0.906 ± 0.083 | 0.906 ± 0.083 |
|  | n-cut | 1.605 | 38.95 ± 4.239 | 24.33 ± 1.813 | 1.730 ± 0.876 | 107.5 ± 0.666 | 2.788 ± 0.479 | 1.756 ± 0.785 | 1.739 ± 0.771 |
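The tables score each partition against the ground truth using NMI. The deck does not state which NMI normalization was used. This sketch assumes arithmetic-mean normalization, which is also the default of scikit-learn's `normalized_mutual_info_score`. NMI is 1 for any partition that matches the ground truth up to relabeling, which is why the Ground Truth column reads 1.

```python
import numpy as np

def nmi(labels_true, labels_pred) -> float:
    """NMI = I(T; P) / ((H(T) + H(P)) / 2), using natural logarithms."""
    t = np.asarray(labels_true)
    p = np.asarray(labels_pred)
    _, t_idx = np.unique(t, return_inverse=True)
    _, p_idx = np.unique(p, return_inverse=True)
    # Contingency table: node counts per (true block, predicted cluster).
    cont = np.zeros((t_idx.max() + 1, p_idx.max() + 1))
    np.add.at(cont, (t_idx, p_idx), 1)
    pij = cont / len(t)
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)
    nz = pij > 0
    mi = np.sum(pij[nz] * np.log(pij[nz] / np.outer(pi, pj)[nz]))
    h_t = -np.sum(pi * np.log(pi))
    h_p = -np.sum(pj * np.log(pj))
    denom = (h_t + h_p) / 2
    return float(mi / denom) if denom > 0 else 1.0   # both trivial: identical partitions

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same split, labels swapped: 1.0
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent split: 0.0
```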

## Partition Results: Multiway Partition

- Performed multiway partition on 8051 microcontroller core netlist:
  - To extract multiple functional blocks from a netlist.
  - Our proposed GraphClusNet achieved the highest NMI and F1-scores among the competing methods.



#### 8051 Core Circuit

| Functional Blocks/IC | No. of Nodes | Metric | SC [6] | Louvain [10] | Graclus [7] | ARVGA | GraphClusNet |
|---|---|---|---|---|---|---|---|
| ALU | 456 | F1-score | 0.8896 ± 0.0387 | 0.9251 ± 0.0294 | 0.9270 ± 0.0074 | 0.6661 ± 0.0109 | 0.9339 ± 0.0322 |
| SFR | 1027 | F1-score | 0.8968 ± 0.0263 | 0.7658 ± 0.1074 | 0.8869 ± 0.1013 | 0.7216 ± 0.0110 | 0.9431 ± 0.0048 |
| Memory Interface | 494 | F1-score | 0.5805 ± 0.0986 | 0.5489 ± 0.1242 | 0.7125 ± 0.1324 | 0.5767 ± 0.0188 | 0.8060 ± 0.0661 |
| Decoder | 252 | F1-score | 0.6738 ± 0.1434 | 0.6426 ± 0.0810 | 0.8221 ± 0.1580 | 0.5928 ± 0.0070 | 0.9260 ± 0.0103 |
| 8051 Core | 2229 | NMI | 0.5966 ± 0.0574 | 0.5621 ± 0.0329 | 0.6574 ± 0.0683 | 0.4742 ± 0.0070 | 0.7176 ± 0.0429 |
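The deck reports a per-block F1-score but does not say how predicted clusters are matched to functional blocks. This sketch assumes each ground-truth block is scored against the predicted cluster that gives it the highest F1. The function name `block_f1` and the toy labels are hypothetical.

```python
import numpy as np

def block_f1(block_mask: np.ndarray, labels_pred: np.ndarray) -> float:
    """Best F1 between one ground-truth block and any single predicted cluster.

    block_mask  -- boolean (N,) array, True for nodes of the functional block
    labels_pred -- (N,) predicted cluster ids
    """
    best = 0.0
    for c in np.unique(labels_pred):
        pred_mask = labels_pred == c
        tp = np.sum(block_mask & pred_mask)    # nodes correctly placed together
        if tp == 0:
            continue
        precision = tp / pred_mask.sum()
        recall = tp / block_mask.sum()
        best = max(best, 2 * precision * recall / (precision + recall))
    return float(best)

truth = np.array([0, 0, 0, 0, 1, 1, 1, 1])     # two toy "functional blocks"
pred = np.array([0, 0, 0, 1, 1, 1, 1, 1])      # one node of block 0 misassigned
for block in (0, 1):
    print(block, round(block_f1(truth == block, pred), 4))
```

Here block 0 scores 6/7 (perfect precision, recall 3/4). Block 1 scores 8/9 (perfect recall, precision 4/5).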

### **Visualization of Partition Results**

- We visualized node embeddings after each stage of the GNN:
  - Local clusters were merged into higher-level clusters.
  - Cluster purity also improved at higher levels.



t-SNE Visualization of Node Embeddings after Each Stage of GNN (8051 SoC)

## Netlist Identification: The Problem

#### To identify the functionality of a flattened netlist:

- Traditionally done manually, relying on expert knowledge.
- Observation: different circuit graphs have distinctive structures and gate compositions.
- The netlist identification problem may thus be formulated as a graph classification problem and solved with machine-learning methods.



## **GNN for Netlist Identification**

#### Train a GNN to classify unknown netlists into known classes:

- The input is a circuit graph with gate type as the node feature; the output is a class label indicating the circuit type.
- Our GNN consists of two Graph Convolutional Network (GCN) layers.
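A forward pass of a two-layer GCN graph classifier can be sketched in numpy. The deck specifies only two GCN layers, gate type as the node feature, and a class label as the output. The standard GCN propagation D^-1/2 (A + I) D^-1/2, the ReLU, the mean-pooling readout, the hidden size, and the linear classifier below are all assumptions. The weights are random (untrained), so the predicted class is meaningless. Training would fit them with a classification loss on the labeled netlists.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(adj: np.ndarray) -> np.ndarray:
    """GCN propagation matrix D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(len(adj))
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

def gcn_classify(adj, x, w1, w2, w_out):
    """Two GCN layers -> mean-pool readout -> softmax over circuit classes."""
    a_norm = normalize_adj(adj)
    h = np.maximum(a_norm @ x @ w1, 0.0)       # GCN layer 1 + ReLU
    h = np.maximum(a_norm @ h @ w2, 0.0)       # GCN layer 2 + ReLU
    g = h.mean(axis=0)                         # graph embedding (assumed readout)
    logits = g @ w_out
    e = np.exp(logits - logits.max())
    return e / e.sum()                         # class probabilities

# Hypothetical sizes: 5 gate types, 16 hidden units, 4 adder classes.
n_types, hidden, n_classes = 5, 16, 4
w1 = rng.normal(scale=0.5, size=(n_types, hidden))
w2 = rng.normal(scale=0.5, size=(hidden, hidden))
w_out = rng.normal(scale=0.5, size=(hidden, n_classes))

# Toy 3-gate chain with one-hot gate types.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.eye(n_types)[[0, 2, 1]]
probs = gcn_classify(adj, x, w1, w2, w_out)
print(probs.round(3), probs.argmax())
```

Mean pooling makes the output independent of gate ordering, which matters because a netlist has no canonical node order.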



#### Our Proposed GNN for Netlist Identification <sup>[2]</sup>

[2] X. Hong, T. Lin, Y. Shi and B. H. Gwee, "ASIC Circuit Netlist Recognition Using Graph Neural Network," in *Proc. 2021 IEEE International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA)*, 2021, pp. 1-5, doi: 10.1109/IPFA53173.2021.9617311.

#### Case Study: Adder Circuit Classification

#### Classify four types of adder circuits:

- Four adder structures: Ripple Carry Adder (RCA), Carry Look-Ahead Adder (CLA), Carry Select Adder (CSLA), and Carry Skip Adder (CSKA).



12-bit Ripple Carry Adder (RCA)



12-bit Carry Look-Ahead Adder (CLA)




12-bit Carry Select Adder (CSLA)



12-bit Carry Skip Adder (CSKA)

#### **Data Preparation**

- Synthesized circuit netlists of varying bit-widths as training and testing data:
  - Synthesized 4 types of adder circuits from 5-bit to 64-bit (60 bit-widths per type), resulting in a total of 240 netlists.
  - Used 40 netlists to train the GNN and the remaining 200 netlists for testing.
  - Used one-hot encoded gate types as node features.
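Converting a gate-level netlist into a circuit graph with one-hot gate-type features can be sketched as follows. The deck shows gates as graph nodes but does not spell out the conversion. This sketch assumes an undirected edge between each gate driving a net and every gate that reads that net. It uses a hand-written 1-bit full adder (not one of the synthesized netlists) and an assumed four-type cell vocabulary. Real synthesized netlists would use the cell library's names.

```python
import numpy as np

# Hypothetical 1-bit full adder: gate name -> (gate type, input nets, output net)
netlist = {
    "g1": ("XOR", ["a", "b"], "n1"),
    "g2": ("XOR", ["n1", "cin"], "sum"),
    "g3": ("AND", ["a", "b"], "n2"),
    "g4": ("AND", ["n1", "cin"], "n3"),
    "g5": ("OR", ["n2", "n3"], "cout"),
}
GATE_TYPES = ["AND", "OR", "XOR", "INV"]       # assumed cell vocabulary

def netlist_to_graph(netlist):
    """Nodes are gates; an undirected edge joins the gate driving a net to each
    gate that reads it. Returns (adjacency matrix, one-hot gate-type features)."""
    names = list(netlist)
    index = {name: i for i, name in enumerate(names)}
    driver = {out: name for name, (_, _, out) in netlist.items()}
    adj = np.zeros((len(names), len(names)))
    for name, (_, inputs, _) in netlist.items():
        for net in inputs:
            if net in driver:                   # primary inputs have no driving gate
                u, v = index[driver[net]], index[name]
                adj[u, v] = adj[v, u] = 1.0
    types = [GATE_TYPES.index(netlist[name][0]) for name in names]
    feats = np.eye(len(GATE_TYPES))[types]     # one-hot gate type per node
    return adj, feats

A, X = netlist_to_graph(netlist)
print(A.sum() / 2)  # 4.0 edges: g1-g2, g1-g4, g3-g5, g4-g5
print(X)
```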



Netlist of a 4-bit RCA



**Circuit Graph of 4-bit RCA** 

Different node colours represent different gate types

## **Graph Visualization**



#### **Graph Visualization of Adder Circuit Netlists**

### **Netlist Classification Results**

- Our GNN achieved high classification accuracy on unseen test data:
  - Classification accuracy of **99%** on the 200 unseen test netlists.
  - Graph embeddings of netlists from different classes become increasingly separated after each GNN layer, demonstrating the GNN's discriminative power.



#### **Conclusions & Discussions**

- GNN has demonstrated some unique advantages over conventional machine-learning methods for netlist analysis, including its ability to process graph connectivity together with node features.
- GNN can **automate** certain analysis tasks such as netlist identification.
- A major limitation of GNN, i.e. its inherently local nature and difficulty with deep architectures, can be alleviated by introducing a **hierarchical** architecture and a clustering objective.
- While a shallow GNN seems effective at identifying simple circuits such as adders, a deeper GNN (with a hierarchical architecture) may eventually be needed to reason at higher structural levels for more complex circuits such as microcontrollers.