

Songklanakarin J. Sci. Technol. 45 (4), 464–469, Jul. – Aug. 2023



**Original Article**

# Low power design of 16-bit synchronous counter by introducing effective clock monitoring circuits

Vivek Kumar Singh\*, Abhishek Nag, Apangshu Das, and Sambhu Nath Pradhan

Department of Electronics and Communication Engineering, National Institute of Technology, Agartala, Tripura, 799046 India

Received: 11 June 2022; Revised: 30 May 2023; Accepted: 6 June 2023

## Abstract

Most system-level designs contain sequential circuits. Power optimization of these circuits at many levels is required to build a portable device with a long battery life. A dynamic clock gating technique was used in this work to reduce the power and temperature of a 16-bit counter. The simulation was performed with Cadence tools on SCL 180 nm technology, for a supply voltage of 1.8 V at a frequency of 500 MHz. With the proposed approach, a 77.16% power reduction was achieved at the cost of a 14.83% area overhead. Moreover, the layout of the circuits was also designed in the Innovus tool to obtain a more accurate silicon area and gate count. The Innovus output files (".flp" and ".pptrace") were used as inputs to the HotSpot tool for determining the absolute temperature of the integrated circuits (ICs). The obtained temperature results were compared with those of the ordinary 16-bit counter, and it was found that the proposed approach reduced temperature by 14.34%.

Keywords: clock gating, sequential circuit, hotspot, low power, and chip temperature

## 1. Introduction

The majority of system-level designs are made up of sequential circuits, and the design of these circuits is crucial in lowering the system's total power consumption. Counters are fundamental building blocks in many VLSI applications, including timers, memories, ADCs and DACs, frequency dividers, and other electronic gadgets. Nowadays, most devices are portable, so battery life can be extended by reducing the power consumption of the various ICs inside an electronic gadget. Moreover, the performance of today's processors depends in part on the performance of counters (Rodrigues, Annamalai, Koren, & Kundu, 2013). Present CPUs include many such counters to keep track of different CPU and memory metrics, including usage, occupancy, bandwidth, and page, cache, and branch buffer hit rates. A binary counter with N bits may be considered a 2<sup>N</sup>-state state machine (Katreepalli & Haniotakis, 2019). The Nth bit in the binary counter (counting the least significant bit as bit 0) has a switching activity of $1/2^N$, which implies that the least significant bit has the highest switching activity of 1, and the switching activity of the remaining counter bits decreases by half as their significance increases (Kim *et al.*, 2009). In the conventional counter, new data for all counter bits is captured by clocked storage elements, the flip-flops (FFs), at every triggering edge of the clock, so every FF receives the clock regardless of whether its stored value changes, as shown in Figure 1 for the 4-bit counter. As a result, clock pulses are wasted on storage elements whose values remain the same as before, particularly for the higher-order counter bits.

\*Corresponding author

Email address: erviveksingh77@gmail.com
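The switching activity figures quoted above can be checked by brute force. The following is a minimal sketch, not taken from the paper: the function names `toggle_counts` and `switching_activity` are hypothetical, and the model simply counts how often each bit of an up-counter changes over one full count cycle (including the wrap-around to zero).

```python
def toggle_counts(n_bits: int) -> list[int]:
    """Count how often each bit changes over one full 2**n_bits count cycle."""
    counts = [0] * n_bits
    modulus = 1 << n_bits
    for value in range(modulus):
        # Bits that differ between this count and the next one (with wrap-around)
        changed = value ^ ((value + 1) % modulus)
        for bit in range(n_bits):
            counts[bit] += (changed >> bit) & 1
    return counts


def switching_activity(n_bits: int) -> list[float]:
    """Toggles per clock cycle for each bit; index 0 is the LSB."""
    cycles = 1 << n_bits
    return [count / cycles for count in toggle_counts(n_bits)]


if __name__ == "__main__":
    for bit, activity in enumerate(switching_activity(4)):
        print(f"Q{bit}: {activity}")
```

In this model, bit *k* changes on a fraction 1/2<sup>k</sup> of the clock edges, so the remaining fraction of edges delivered to that flip-flop in a conventional counter produces no state change.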

Figure 1. Conventional 4-bit counter design with reset pin

To avoid these superfluous transitions, all unnecessary clock cycles to the most significant bit (MSB) FFs are cut by detecting when a data transition is actually required in those FFs. Only the storage elements whose outputs are to be updated should be activated, using a set of local control signals generated by clock monitoring circuits.

Moreover, temperature analysis is also very important in nanoscale IC design. High-temperature hotspots drastically shorten the lifetime of ICs (Koren & Krishna, 2011). According to the survey by Koren and Krishna (2011), the life span of a VLSI circuit is reduced by 50% for a 10 °C increase in temperature. High clock speeds necessitate high power consumption, and high packing densities lead to high power density (power per unit area). Increased power density in turn causes a large rise in on-chip temperature (Cheng & Kang, 2000). Heating due to power dissipation is also part of a positive feedback loop.

Dynamic power dominates the overall power dissipation in ICs (Kim *et al.*, 2009; Peddersen & Parameswaran, 2008). Various methods have been proposed to reduce total power. The most widely used is supply voltage reduction (Alioto, 2012; Hwang & Lin, 2011; Kawaguchi & Yachi, 2015; Lin, Hu, & Chen, 2011). However, a reduced supply voltage increases the circuit's delay (Moss, Boland, & Leong, 2018) and degrades overall performance (Park, Kim, & Baek, 2010). It is observed that clock signals in counter applications consume about 40% of total power (Katreepalli & Haniotakis, 2019; Sharma & Kumre, 2021). Stopping the clock in sequential circuits reduces the total power at various levels of the circuit's architecture (Bhattacharjee & Majumder, 2019; Majumder, Bhattacharjee, & Das, 2018; Mohamed, Jaison, & Anto, 2021; Singh, Nag, & Pradhan, 2022) without degrading the overall performance.

This work introduces a dynamic clock gating technique that removes the unnecessary clock signals from the flip flops (FFs) of a 16-bit counter to reduce dynamic power dissipation during operation. The proposed technique uses a clock enable circuit and a dynamic clock gating circuit. The clock enable sub-circuit continually monitors whether a clock is required for an MSB transition. The clock is then cut or passed to the flip flop for the MSB transition, depending on the enable output. Moreover, the dependency of temperature on power and area is also calculated and simulated in the HotSpot tool (Huang *et al.*, 2006) for the proposed approach.

This paper is organized as follows: Section 2 discusses the related work. Section 3 discusses the proposed dynamic clock gating approach. The results and discussion are in Section 4, and the final conclusion is in Section 5.

## 2. Related Work

Various studies have been done on counter circuits in the recent past to reduce power during operation. For example, the authors of (Alioto, 2012; Hwang & Lin, 2011; Kawaguchi & Yachi, 2015; Lin, Hu, & Chen, 2011) reduced the dynamic power by supply voltage scaling, but reducing the supply voltage affects the overall performance of the circuits.

The studies (Katreepalli & Haniotakis, 2019; Kim *et al.*, 2009; Mohamed, Jaison, & Anto, 2021) used clock gating to reduce the dynamic power at different technology nodes. Katreepalli and Haniotakis (2019) used a simple gate and control transistor in a 45 nm technology node to reduce power by 47.74% with a small area overhead of 5.76% for an 8-bit up-down counter. Kim *et al.* (2009) proposed CMOS clock gating in a 16-bit synchronous counter at a 180 nm technology node to reduce the transition states of the MSB flip-flops. That work reported a 64% power reduction with 15% fewer devices. Similar work using a 16 nm technology node was reported in (Mohamed, Jaison, & Anto, 2021), which used programmable combinational logic with integrated clock gating to reduce the transition states of the MSBs. Table 7 of (Mohamed, Jaison, & Anto, 2021) shows that the conventional 16-bit counter has a power consumption of 0.04535 mW, which the authors reduced to 0.02611 mW with their proposed low-power design. The reported data thus show a total power reduction of 42.43% relative to an ordinary 16-bit counter.
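The 42.43% figure follows directly from the two power values quoted from that study; written out as a check:

$$\frac{0.04535 - 0.02611}{0.04535} \times 100\% = \frac{0.01924}{0.04535} \times 100\% \approx 42.43\%$$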

Moreover, temperature is also a very important concern with the growing packaging density of VLSI circuits (Cheng & Kang, 2000; Pedram & Nazarian, 2006). The temperature is directly related to a circuit's power density, so temperatures can be limited by setting power density as a logic constraint. Koren and Krishna (2011) conducted a survey on the effects of temperature on the lifespan of ICs. The studies (Choudhury, Manna, Rai, & Pradhan, 2019; Das & Pradhan, 2020; Das, Hareesh, & Pradhan, 2020; Das, Singh, & Pradhan, 2022; Sarkar, Chakraborty, Paul, & Pradhan, 2017) calculated the chip temperature based on the total power and chip area as in Equation 1.

$$Temp_{chip} = Temp_{amb} + R_{th} \left[\frac{Total_{power}}{Total_{area}}\right] \tag{1}$$

Here '*Temp<sub>chip</sub>*' represents the average chip temperature, '*Temp<sub>amb</sub>*' the ambient temperature (a typical CPU ambient temperature is *Temp<sub>amb</sub>* = 45 °C), 'R<sub>th</sub>' the thermal resistance of the silicon substrate and heat sink (constant for a specific element), '*Total<sub>power</sub>*' the total power consumption, and '*Total<sub>area</sub>*' the chip area.
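Equation 1 can be written as a one-line function. The sketch below is illustrative only: the function name `chip_temperature` is hypothetical, the source gives no numerical value for R<sub>th</sub>, and the units assumed here (watts, square metres, and an area-specific R<sub>th</sub> in K·m²/W) are an assumption chosen so that the product has units of temperature.

```python
def chip_temperature(total_power_w: float, total_area_m2: float,
                     r_th: float, temp_amb_c: float = 45.0) -> float:
    """Equation 1: Temp_chip = Temp_amb + R_th * (Total_power / Total_area).

    r_th is assumed to be area-specific (K*m^2/W); the paper does not
    state its value, so the caller must supply one.
    """
    if total_area_m2 <= 0:
        raise ValueError("total_area_m2 must be positive")
    power_density = total_power_w / total_area_m2  # W/m^2
    return temp_amb_c + r_th * power_density
```

With R<sub>th</sub> and the ambient temperature fixed, the model makes the trade-off explicit: a power reduction lowers *Temp<sub>chip</sub>*, while an area increase at the same power lowers the power density and therefore the temperature rise as well.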

The suggested method is validated using calculations of absolute temperature, silicon chip area, and overall power usage. Cadence's Innovus tool (with the SCL 180 nm technology library file) is used to report overall power consumption and silicon chip area. In addition, the studies (Das & Pradhan, 2020; Das, Hareesh, & Pradhan, 2020) obtained the temperature profiles for various benchmark circuits using the HotSpot tool. In the current work, the HotSpot tool was used to obtain the temperature profile of the 16-bit counter circuit.

## 3. Proposed Technique for Dynamic Power and *Temp*<sub>chip</sub> Reduction

The proposed work is based on dynamic clock gating in a 16-bit counter by continuously monitoring the required bit transitions. Figure 2 shows the complete set-up of the proposed clock gating to reduce the dynamic power of the counter. The dynamic clock gating is achieved by using two separate circuits, the clock enable circuit (CLK\_En) and the integrated clock gating circuit (ICG). The ICG uses the enable outputs 'En2' to 'En15' to gate the clocks of flip flops "FF\_2" to "FF\_15". The first enable output "En2" is generated by an AND gate whose inputs are the output states of "FF\_0" and "FF\_1". Each of the remaining enable outputs "En3" to "En15" is generated from the previous enable output (En2 to En14) and the state of the previous FF (Q<sub>2</sub> to Q<sub>14</sub>). The first two LSB states of the counter (Q<sub>0</sub> and Q<sub>1</sub>) have high transition frequencies, so clock gating for these two is not beneficial. The design was implemented using the Cadence Genus tool and is presented in Figure 6 (b).

Figure 2. The proposed clock gating technique in 16-bit counter

Case 1: With the reset inactive ("RST = 0"), the LSB ( $Q_0$ ) and the next bit ( $Q_1$ ) may each be 0 or 1, and the clock enable output 'En2' is evaluated from them.

When  $En2=Q_0.Q_1=0$ , the clock is cut for all following flip flops from "FF\_2" to "FF\_15", where En2 is the clock enable output for the  $Q_2$  flip-flop.

When  $En2=Q_0.Q_1 = 1$ , the clock is enabled for "FF\_2", allowing its state transition, and the circuit simultaneously checks whether the next enable output 'En3' is high. The clocks for "FF\_3" to "FF\_15" remain disabled unless 'En3' is also high.

Case 2: When En2 is high (that is,  $Q_0$  and  $Q_1$  are high) and  $Q_2$  is also high, the clock enable circuit drives 'En3' high and allows the state transition of the "FF\_3" flip flop. At that instant, the clocks for "FF\_4" to "FF\_15" remain disabled unless their own enable outputs are high. This behaviour can be observed in Figure 6 (b). The Boolean expression "En3 = En2.Q<sub>2</sub>" is used for generating the clock enable output for the  $Q_3$  flip flop.

Similarly, clock-gated transitions are achieved for all higher-order flip flops (Q<sub>4</sub> to Q<sub>15</sub>). The generalized equation for the clock enable circuits is as follows:

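$$E_n = E_{n-1} \cdot Q_{n-1}, \quad n = 3, \dots, 15, \qquad E_2 = Q_0 \cdot Q_1 \tag{2}$$

Here $E_n$ denotes the clock enable output 'En*n*' that gates the clock of flip flop "FF\_*n*".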
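The enable chain described in Cases 1 and 2 can be exercised with a small behavioural model. This is a sketch under stated assumptions, not the authors' Verilog: it assumes toggle-type flip flops, an up-counter, and ideal gating, and the names `enable_signals`, `clock_edge`, and `run_full_cycle` are hypothetical. It checks that the gated counter still counts correctly and tallies how many flip flops receive a clock edge, which is a proxy for clock activity only and not a power estimate.

```python
N_BITS = 16


def enable_signals(q: list[int]) -> dict[int, int]:
    """Return {n: En_n} for n = 2..15, with q[0] as the LSB."""
    en = {2: q[0] & q[1]}                  # En2 = Q0 . Q1
    for n in range(3, N_BITS):
        en[n] = en[n - 1] & q[n - 1]       # En_n = En_(n-1) . Q_(n-1)
    return en


def clock_edge(q: list[int]) -> tuple[list[int], int]:
    """Apply one clock edge; return the next state and the number of FFs clocked."""
    en = enable_signals(q)
    nxt = q.copy()
    nxt[0] = q[0] ^ 1                      # FF_0: always clocked, toggles every edge
    nxt[1] = q[1] ^ q[0]                   # FF_1: always clocked, toggles when Q0 = 1
    clocked = 2
    for n in range(2, N_BITS):
        if en[n]:                          # gated clock reaches FF_n only when En_n = 1
            nxt[n] = q[n] ^ 1
            clocked += 1
    return nxt, clocked


def run_full_cycle() -> tuple[bool, int, int]:
    """Step through all 2**16 states; check the count and tally clocked FFs."""
    q = [0] * N_BITS
    gated_total = 0
    correct = True
    for i in range(1 << N_BITS):
        q, clocked = clock_edge(q)
        gated_total += clocked
        value = sum(bit << k for k, bit in enumerate(q))
        correct &= value == (i + 1) % (1 << N_BITS)
    ungated_total = N_BITS * (1 << N_BITS)  # every FF clocked on every edge
    return correct, gated_total, ungated_total


if __name__ == "__main__":
    ok, gated, ungated = run_full_cycle()
    print(f"sequence correct: {ok}, clocked FF edges: {gated} of {ungated}")
```

Because each enable reuses the previous one, En*n* is high exactly when all lower-order bits are 1, which is the condition under which bit *n* of a synchronous up-counter must toggle.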

The working of the proposed method is better understood by considering the timing of the first four LSB flip flop states (Q<sub>0</sub> to Q<sub>3</sub>) and the clock enable circuit outputs. Figure 3 shows the timing diagram of the clock enable outputs and the counter's flip flop (FF) states during operation. The 16-bit counter consists of 16 FFs. In the proposed technique, Q<sub>0</sub> and Q<sub>1</sub> are connected to the continuous clock supply because their states change frequently. Based on the states of Q<sub>0</sub> and Q<sub>1</sub>, the first clock enable output 'En2' is generated and used as an input to the ICG. When both Q<sub>0</sub> and Q<sub>1</sub> are high, 'En2' is high and passes the clock to "FF\_2", allowing its state transition. Similarly, when 'En2' and Q<sub>2</sub> are both high, the next clock enable output 'En3' is high and allows the transition of the "FF\_3" (Q<sub>3</sub>) flip flop. The same scheme extends across the 16-bit counter, reducing unnecessary transitions and hence dynamic power consumption.



Figure 3. Timing diagram for proposed method of clock gating for  $(Q_0-Q_3)$ 
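The relationship between the counter state and the enable outputs for the first four bits can also be tabulated. The sketch below is a hypothetical helper (`timing_rows` is not from the paper) that lists Q<sub>3</sub> to Q<sub>0</sub> together with En2 and En3 for successive count values, using only the Boolean expressions En2 = Q<sub>0</sub>.Q<sub>1</sub> and En3 = En2.Q<sub>2</sub> given in the text.

```python
def timing_rows(cycles: int = 16) -> list[tuple[int, int, int, int, int, int]]:
    """Rows of (Q3, Q2, Q1, Q0, En2, En3) for successive count values."""
    rows = []
    for count in range(cycles):
        q0, q1, q2, q3 = ((count >> k) & 1 for k in range(4))
        en2 = q0 & q1          # En2 = Q0 . Q1
        en3 = en2 & q2         # En3 = En2 . Q2
        rows.append((q3, q2, q1, q0, en2, en3))
    return rows


if __name__ == "__main__":
    print("count Q3 Q2 Q1 Q0 En2 En3")
    for count, row in enumerate(timing_rows()):
        print(f"{count:5d}  " + "  ".join(str(v) for v in row))
```

En2 is high in the state just before Q<sub>2</sub> must change, and En3 in the state just before Q<sub>3</sub> must change, so the gated clock edge arrives exactly when a transition is due.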

Figure 4 shows the overall flow chart of the proposed work. The Xilinx Vivado tool was used to design the register-transfer level (RTL) description of the 16-bit counter, with and without clock gating, in Verilog, and a test case was also written in Verilog to verify the design. Synthesis was done using the Cadence Genus tool to generate the netlist and obtain the overall power, area, and cell count. The layout design was performed using the Cadence Innovus tool. The Innovus report produces floor-plan information (.flp file) and power profile information (.pptrace file), which are used as inputs to the HotSpot tool for determining the absolute temperature of the silicon chip.
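To make the two HotSpot inputs concrete, the following sketch writes a one-block floorplan and a one-sample power trace. It is an illustration under several assumptions and does not reproduce the Innovus-generated files: the whole counter is treated as a single square block, the helper name `write_single_block_inputs` and the unit name are hypothetical, and the file layouts follow the HotSpot conventions as commonly documented (floorplan lines of unit name, width, height, left x, and bottom y in metres; a power trace with a header row of unit names followed by rows of power in watts). The `.ptrace` extension used here is likewise an assumption.

```python
import math
from pathlib import Path


def write_single_block_inputs(out_dir: str, unit: str,
                              area_um2: float, power_w: float) -> tuple[Path, Path]:
    """Write a one-block floorplan and a one-sample power trace."""
    side_m = math.sqrt(area_um2) * 1e-6          # square block; 1 um = 1e-6 m
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    flp = out / f"{unit}.flp"
    # Floorplan line: <unit-name> <width> <height> <left-x> <bottom-y>, in metres
    flp.write_text(f"{unit}\t{side_m:.6e}\t{side_m:.6e}\t0.0\t0.0\n")

    ptrace = out / f"{unit}.ptrace"
    # Power trace: header row of unit names, then one row of power values in watts
    ptrace.write_text(f"{unit}\n{power_w:.6e}\n")
    return flp, ptrace


if __name__ == "__main__":
    # Area and power of the proposed counter as reported in the paper
    # (2113.70 um^2 and 0.009291383 mW = 9.291383e-6 W)
    print(write_single_block_inputs("hotspot_inputs", "counter16",
                                    2113.70, 9.291383e-6))
```

A real layout would contain many blocks with their own coordinates and per-block power values; the single-block case is only meant to show what kind of information each file carries.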

## 4. Simulation, Results and Discussion

The ordinary 16-bit counter and the clock-gated synchronous counter were coded in Verilog HDL using Xilinx Vivado 2017.4. Each design was then simulated at the behavioral level at a 500 MHz clock frequency for verification. The output waveforms obtained for the ordinary and clock-gated counters under the same clock and reset conditions were identical. Figure 5 depicts the output waveform of the behavioral level simulation. Here, "t" is the enable input of the counter, 'clk' is the clock, "rst" is the reset, and 'Q' is the state of the counter.

After the verification at the behavioral level, the design was simulated in Cadence Genus for design rule checking. No violations were found for the Max transition, Max capacitance, and Max fan-out design rules. During simulation, SCL 180 nm technology libraries were used for schematic design in Genus. The obtained schematics for the ordinary and gated counters are given in Figure 6(a) and 6(b), respectively.

After completing the Genus simulation, the layout was also designed using the Innovus tool of the Cadence suite, for the conventional and clock-gated 16-bit counters. Innovus uses the synthesized netlists generated during the Genus simulation.

The Innovus report produces floor-plan information (.flp file) and power profile information (.pptrace file), which are used as inputs to the HotSpot tool (Huang *et al.*, 2006) for determining absolute chip temperature. Table 1 shows all the essential parameters configured in the HotSpot tool for generating the temperature profile of ICs. The gate count file is also generated by the Innovus tool for the proposed circuit; it reports a gate area of 9.4080 $\mu m^2$, gates = 224, cells = 93, and a total area of 2113.7 $\mu m^2$.

Figure 4. Complete flow chart of the proposed work

Figure 5. Behavioral level simulated output for 16-bit counter in Xilinx Vivado

Figure 6. (a) Genus schematic design of conventional 16-bit counter, and (b) its equivalent with clock gating

Table 1. Thermal package information for HotSpot tool

| Parameter                            | Value     |
|--------------------------------------|-----------|
| Chip thickness                       | 0.15 mm   |
| Heat sink side                       | 60 mm     |
| Convection capacitance               | 140.4 J/K |
| Ambient temperature                  | 45 °C     |
| Convection resistance                | 0.1 K/W   |
| Spreader thickness                   | 1 mm      |
| Spreader side                        | 30 mm     |
| Chip-to-spreader interface thickness | 0.020 mm  |
| Heat sink thickness                  | 6.9 mm    |
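HotSpot is configured in SI units, so the values in Table 1 have to be converted (millimetres to metres, degrees Celsius to kelvin) before use. The sketch below shows one way to do this. The parameter names such as `t_chip` and `r_convec` are given as recalled from HotSpot's default configuration file and should be treated as assumptions to verify against the installed version; `to_si` and `config_lines` are hypothetical helpers.

```python
# Values from Table 1; key names are assumed HotSpot configuration names.
TABLE_1 = {
    "t_chip": (0.15, "mm"),        # chip thickness
    "s_sink": (60.0, "mm"),        # heat sink side
    "c_convec": (140.4, "J/K"),    # convection capacitance
    "ambient": (45.0, "degC"),     # ambient temperature
    "r_convec": (0.1, "K/W"),      # convection resistance
    "t_spreader": (1.0, "mm"),     # spreader thickness
    "s_spreader": (30.0, "mm"),    # spreader side
    "t_interface": (0.020, "mm"),  # chip-to-spreader interface thickness
    "t_sink": (6.9, "mm"),         # heat sink thickness
}


def to_si(value: float, unit: str) -> float:
    """Convert a Table 1 value to SI (metres, kelvin); other units pass through."""
    if unit == "mm":
        return value * 1e-3
    if unit == "degC":
        return value + 273.15
    return value


def config_lines(table: dict[str, tuple[float, str]]) -> list[str]:
    """Format each parameter as '-name<TAB>value' with the value in SI units."""
    return [f"-{name}\t{to_si(value, unit):g}"
            for name, (value, unit) in table.items()]


if __name__ == "__main__":
    print("\n".join(config_lines(TABLE_1)))
```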

### 4.1 Analyses of power, area, and gate count

The power, area, and cell counts were noted for the ordinary and clock-gated counters and are given in Table 2, which shows the results from simulation in Cadence Genus using the SCL 180 nm technology library. The cell counts of the conventional and gated 16-bit counters were 79 and 93, respectively, as per the log file generated after simulation in the Genus tool. The maximum power reduction obtained by the proposed method was about 77.16% compared to a conventional 16-bit counter, at the cost of a 14.83% area overhead. The area overhead is due to the clock enable circuits and dynamic clock gating circuits. Moreover, *Temp<sub>chip</sub>* was also calculated using the HotSpot tool, and the values are given in the last column of Table 2. Using the proposed technique, *Temp<sub>chip</sub>* was reduced by 14.34% compared to the ordinary counter design. Table 2 also shows a comparison of the results obtained by the proposed method and the existing design. The negative values in the improvement row indicate an increase in gate count and area due to the clock monitoring and gating circuits.
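The improvement percentages can be recomputed from the conventional and proposed values reported in Table 2. The snippet below is a small arithmetic check; `improvement_pct` and `TABLE_2` are hypothetical names, and the values are copied from the table.

```python
def improvement_pct(baseline: float, proposed: float) -> float:
    """Percentage reduction relative to the baseline; negative means an increase."""
    return (baseline - proposed) / baseline * 100.0


# (conventional, proposed) pairs as reported in Table 2
TABLE_2 = {
    "gates": (195, 224),
    "power_mW": (0.040673727, 0.009291383),
    "area_um2": (1840.80, 2113.70),
    "temp_C": (76.16, 65.24),
}

if __name__ == "__main__":
    for name, (conventional, proposed) in TABLE_2.items():
        print(f"{name}: {improvement_pct(conventional, proposed):.2f} %")
```

Note that the temperature percentage is computed on the Celsius values, as in Table 2; the same difference expressed relative to the absolute (kelvin) temperature would be a smaller percentage.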

The existing 16-bit counter design by Kim *et al.* (2009) was also implemented at the 180 nm technology node, but with a supply voltage of 2.5 V, much higher than the 1.8 V used in the proposed work. Consequently, the power dissipation reported by Kim *et al.* (2009) is 1.7 mW, much higher than the 9291.383 nW of the proposed design. Hence, in this manuscript, only the percentage improvement was compared. The proposed design achieves a 77.16% power reduction against the 64% reported by Kim *et al.* (2009), that is, about 13.16 percentage points more.

## 5. Conclusions

In this work, a low-power design of a 16-bit synchronous counter was developed in the Xilinx Vivado 2017.4 EDA tool. The unnecessary transitions of the MSBs of the counter were identified by the clock enable circuit and stopped by using dynamic clock gating circuits. The design was first verified by behavioral level simulation in Xilinx Vivado. Then, the ordinary and clock-gated 16-bit counters were simulated in the Cadence Genus EDA tool using the SCL 180 nm technology library to estimate power, area, and cell count. The results showed that the proposed method could reduce total power consumption by 77.16% from that of a conventional 16-bit counter. The circuit layout was also designed using Innovus, and the report produced floor-plan information (.flp file) and power profile information (.pptrace file), used as inputs to the HotSpot tool for determining absolute temperature. The result was a 14.34% lower absolute chip temperature than in the ordinary 16-bit counter design. Hence, the proposed design is competitive when temperature, power consumption, and heating of the device are the main concerns.

## Acknowledgements

This work was supported by the project SMDP-C2SD funded by MeitY, Government of India.

## References

- Alioto, M. (2012). Ultra-low power VLSI circuit design demystified and explained: A tutorial. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 59(1), 3-29.
- Bhattacharjee, P., & Majumder, A. (2019). A variation-aware robust gated flip-flop for power-constrained FSM application. *Journal of Circuits, Systems and Computers*, 28(07), 1950108.
- Cheng, Y. K., & Kang, S. M. (2000). A temperature-aware simulation environment for reliable ULSI chip design. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 19(10), 1211-1220.
- Choudhury, P., Manna, K., Rai, V., & Pradhan, S. N. (2019). Thermal-aware partitioning and encoding of power-gated FSM. *Journal of Circuits, Systems and Computers*, 28(09), 1950144.
- Das, A., Kumar Singh, V., & Nath Pradhan, S. (2022). Shared reduced ordered binary decision diagram-based thermal-aware network synthesis. *International Journal of Circuit Theory and Applications*, 50(6), 2271-2286.
- Das, A., & Pradhan, S. N. (2020). An elitist non-dominated multi-objective genetic algorithm based temperature aware circuit synthesis. *International Journal of Interactive Multimedia and Artificial Intelligence*, 6(4).
- Das, A., Hareesh, Y. C., & Pradhan, S. N. (2020). NSGA-II based thermal-aware mixed polarity dual Reed-Muller network synthesis using parallel tabular technique. *Journal of Circuits, Systems and Computers*, 29(15), 2020008.
- Huang, W., Ghosh, S., Velusamy, S., Sankaranarayanan, K., Skadron, K., & Stan, M. R. (2006). HotSpot: A compact thermal modeling methodology for early-stage VLSI design. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 14(5), 501-513.
- Hwang, Y. T., & Lin, J. F. (2011). Low voltage and low power divide-by-2/3 counter design using pass transistor logic circuit technique. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 20(9), 1738-1742.

Table 2. Simulated results and their comparison with prior work for total power, area, and cell count of a 16-bit counter

| Circuit                                         | Number of gates used | Power obtained by Genus (mW) | Area after layout $(\mu m^2)$ | Temperature (°C) |
|-------------------------------------------------|----------------------|------------------------------|-------------------------------|------------------|
| Conventional counter design by Kim et al., 2009 | 607                  | 4.7                          | Not reported                  | Not reported     |
| Low power counter design by Kim et al., 2009    | 517                  | 1.7                          | Not reported                  | Not reported     |
| Improvement                                     | ~15%                 | ~64%                         | NA                            | NA               |
| Conventional counter design in this manuscript  | 195                  | 0.040673727                  | 1840.80                       | 76.16            |
| Proposed counter                                | 224                  | 0.009291383                  | 2113.70                       | 65.24            |
| Improvement                                     | -14.87 %             | 77.16 %                      | -14.83 %                      | 14.34 %          |

- Katreepalli, R., & Haniotakis, T. (2019). Power efficient synchronous counter design. *Computers and Electrical Engineering*, 75, 288-300.
- Kawaguchi, S., & Yachi, T. (2015). Adaptive power efficiency control by computer power consumption prediction using performance counters. *IEEE Transactions on Industry Applications*, 52(1), 407-413.
- Kim, Y. W., Kim, J. S., Oh, J. H., Park, Y. S., Kim, J. W., Park, K. I., . . . Jun, Y. H. (2009). Low-power CMOS synchronous counter with clock gating embedded into carry propagation. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 56(8), 649-653.
- Koren, I., & Krishna, C. M. (2011). Temperature-aware computing. *Sustainable Computing: Informatics and Systems*, 1(1), 46-56.
- Lin, J., Hu, J., & Chen, Q. (2011). Low voltage adiabatic flip-flops based on power-gating CPAL circuits. *Procedia Engineering*, 15, 3144-3148.
- Majumder, A., Bhattacharjee, P., & Das, T. D. (2018). A novel gating approach to alleviate power and ground noise in silicon chips. *Journal of Circuits, Systems and Computers*, 27(09), 1850146.
- Mohamed Sulaiman, S., Jaison, B., & Anto Bennet, M. (2021). Design of low power 16-bit counter with Programmable Combinational Logic and Integrated Clock Gating using 16-nm technology. *International Journal of Electronics*, 108(2), 163-179.
- Moss, D. J., Boland, D., & Leong, P. H. (2018). A two-speed, radix-4, serial-parallel multiplier. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 27(4), 769-777.
- Park, J., Kim, S., & Baek, K. H. (2010). A low-power MDDI-client architecture using on-off byte counter. *IEEE Transactions on Consumer Electronics*, 56(3), 1283-1287.
- Peddersen, J., & Parameswaran, S. (2008). Low-impact processor for dynamic runtime power management. *IEEE Design and Test of Computers*, 25(1), 52-62.
- Pedram, M., & Nazarian, S. (2006). Thermal modeling, analysis, and management in VLSI circuits: Principles and methods. *Proceedings of the IEEE*, 94(8), 1487-1501.
- Rodrigues, R., Annamalai, A., Koren, I., & Kundu, S. (2013). A study on the use of performance counters to estimate power in microprocessors. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 60(12), 882-886.
- Sarkar, T., Chakraborty, S., Paul, B., & Pradhan, S. N. (2017). Thermal aware SOC testing by introducing cooling period. *IETE Technical Review*, 34(2), 113-121.
- Sharma, T., & Kumre, L. (2021). Design of unbalanced ternary counters using shifting literals based D-Flip-Flops in carbon nanotube technology. *Computers and Electrical Engineering*, 93, 107249.
- Singh, V. K., Nag, A., & Pradhan, S. N. (2022). Design and analysis of a low power strategy in finite state machines implemented in configurable logic blocks. *International Journal of Embedded Systems*, 15(4), 326-332.