

Songklanakarin J. Sci. Technol. 45 (2), 182–188, Mar. – Apr. 2023



**Original Article** 

# A power-efficient pipeline based clock gating FIFO for a dual ported memory array

S. Dhanasekar<sup>1\*</sup>, V. Govindaraj<sup>2</sup>, P. Malin Bruntha<sup>3</sup>, and L. Jubair Ahmed<sup>4</sup>

<sup>1</sup> Department of Electronics and Communication Engineering, Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu, 641202 India

<sup>2</sup> Department of Electronics and Communication Engineering, Dr.NGP Institute of Technology, Coimbatore, Tamil Nadu, 641048 India

<sup>3</sup> Department of Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, 641114 India

<sup>4</sup> Department of Electronics and Communication Engineering, Akshaya College of Engineering and Technology, Coimbatore, Tamil Nadu, 642109 India

Received: 8 November 2021; Revised: 27 August 2022; Accepted: 31 January 2023

### Abstract

A FIFO is a special type of buffer that controls the data flow between a sender and a receiver. It is used to regulate serial data flow and to avoid mismatch conditions. In general, dual-ported memory cell arrays suffer from dynamic power dissipation. In this research article, a 128 x 128-bit synchronous First-In-First-Out (FIFO) buffer with a pipeline architecture and clock gating is designed for a dual-ported memory cell array, which reduces power dissipation significantly. The FIFO-based dual-ported memory cell array stores a large amount of data and minimizes clock skew. The circular FIFO used in the dual-ported memory is organized as a circular queue with two pointers, one for write and one for read. Conventional FIFO designs consume more power and occupy more silicon area. The FIFO-based pipelining and clock gating approach improves throughput while reducing dynamic power. The proposed FIFO design is simulated and implemented with the Cadence Encounter tool in 180 nm and 45 nm technologies, and its power consumption, cell utilization, and clock frequency are analyzed. The synchronous FIFO design reduces area by 70.3% and power dissipation by 10.6%, and operates at clock frequencies up to 322 MHz.

Keywords: synchronous FIFO, pipeline, clock gating, read, write

# 1. Introduction

Today, electronic devices and their components are continually shrinking in size because customers demand nanodevices with high performance and low cost. Hence, many researchers are pursuing smaller transistor sizes to build miniaturized ICs (Nagendran & Subramaniyam, 2020; Subramaniyam & Mani, 2022).

Digitization is becoming pervasive across all engineering disciplines, and it has a growing impact on wideband wireless communication for high-speed data transmission (Subramaniyam, 2018). In wideband wireless applications, OFDM transceiver ICs support high-speed wireless data transmission (Subramaniyam, Paul & Kulandaivel, 2019). A FIFO is a data buffer that behaves like a queue with a first-come, first-served (FCFS) discipline. It is built from a series of flip-flops or read/write memory cells that store data, and the data are transferred from one clock domain to another on request.

<sup>\*</sup>Corresponding author

Email address: dhanasekar.sm@gmail.com

Generally, the clock domain that supplies data to the FIFO is known as the write (input) side, and the clock domain that reads data from the FIFO is known as the read (output) side (Yantchev, Huang & Josephs, 1995). Shilpi Maurya (2016) proposed an RTL-synthesizable 32-bit FIFO memory with a small storage capacity. Hafeez and Ross (2021) proposed a reconfigurable 64-bit FIFO memory circuit for synchronous and asynchronous communication that operates at a clock frequency of 1 GHz while consuming 2 mW. Rafi and Venkateshwara Rao (2014) presented an optimal FIFO design methodology based on the input and output data transfer rates; that FIFO, however, offers limited data transmission. In 180 nm CMOS technology, a novel asynchronous FIFO has been synthesized with a maximum throughput of 10.88 Gbps at an operating frequency of 340 MHz (Nguyen & Tran, 2014). A 28 nm ultra-low-power FIFO memory was designed for a multi-bio-signal sensing platform, achieving reduced area and power (Hsu, Huang & Wu, 2016). Several other FIFO memory designs achieve low power consumption in various applications (Chang, 2008, 2009; Chiu, 2010; Du, 2011).

In high-performance digital systems, low-power design is a major concern, and designers constantly optimize circuit power to meet their power constraints. Clock gating is among the most widely used VLSI power optimization techniques and is routinely applied by gate-level power synthesis tools (Attaoui, 2021). The main challenge of clock gating is deciding when and where to insert the gating logic in a digital or VLSI circuit to optimize power. Clock gating reduces switching activity significantly (Srinivasana, 2015): power is saved by reducing the switched capacitance of the clock network and by suppressing switching activity while the clocked logic is idle. Various clock gating techniques have been proposed to optimize power (Prakash, 2013; Weng & Weng, 2012). Clock gating can be realized with three alternative designs: 1) flip-flop based, 2) gate based, or 3) latch based. Sharma and Rana (2013) proposed a gate-based clock gating design, which is preferable to flip-flop-based or latch-based designs when power is the major constraint.
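The power argument above rests on the standard dynamic power relation $P_{dyn} = \alpha C V_{DD}^2 f$, where $\alpha$ is the activity factor, $C$ the switched capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency. The short sketch below, a minimal illustration rather than the authors' power model, shows how gating a clock net lowers its effective activity factor. The capacitance, voltage, frequency, and enable ratio are hypothetical values, and the overhead of the gating cell itself is ignored.

```python
def dynamic_power(alpha, cap_farads, vdd_volts, freq_hz):
    """Switching power P = alpha * C * Vdd^2 * f, in watts."""
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz


# Hypothetical clock-net parameters, for illustration only.
CLOCK_CAP = 2e-12   # 2 pF of clock load behind the gate
VDD = 1.0           # supply voltage in volts
FREQ = 322e6        # clock frequency in Hz

# An ungated clock net switches every cycle, so alpha = 1.
p_ungated = dynamic_power(1.0, CLOCK_CAP, VDD, FREQ)

# Hypothetical case: the gated clock is enabled in 3 of every 8 cycles,
# so the effective activity factor of the gated net drops to 3/8.
enable_ratio = 3 / 8
p_gated = dynamic_power(enable_ratio, CLOCK_CAP, VDD, FREQ)

print(f"ungated: {p_ungated * 1e3:.3f} mW, gated: {p_gated * 1e3:.3f} mW")
```

Because $P_{dyn}$ is linear in $\alpha$, the saving on the gated net is proportional to the fraction of cycles in which the clock is suppressed.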

Several FIFO memory designs have been reviewed above with respect to optimizing power in digital circuits. Existing FIFO designs consume more power and occupy more silicon area, so a power-optimized FIFO design is needed. To this end, a low-power clock gating FIFO for dual-port memory is proposed.

In this article, a novel 128 x 128-bit synchronous FIFO is designed and implemented in a dual-ported memory cell array using pipelining and clock gating techniques. In low-power design, dynamic power, which is caused by switching activity, is the major source of power consumption. Combining pipelining and clock gating in the FIFO increases throughput and reduces dynamic power. AND-gate-based clock gating is used to reduce the glitching activity of the device (Jaiswal, Paul & Mahto, 2014). Datta and Lin (2021) introduced a FIFO synchronizer with a clock-domain crossing interface that provides unidirectional and bidirectional data transfer with minimal clock frequency degradation. In this work, pipelining is used to reduce the cell count, which in turn enhances the speed of operation (Jeon & Seok, 2012). FPGA technology provides a reliable and flexible platform for testing digital electronic circuits (Subramaniyam & Jayabalan, 2015, 2018).

This paper describes the synchronous FIFO in dual-ported memory (Section 2), presents the proposed pipeline-based clock gating synchronous FIFO (Section 3), and compares the 128-bit synchronous FIFO with existing FIFO designs (Section 4), showing significant improvements in area and power.

# 2. Materials and Methods

## 2.1 Synchronous FIFO in dual ported memory

In this article, a circular FIFO is designed with memory locations arranged as a circular queue with two pointers, one for write and one for read (Zhao, 2011). The write pointer marks the tail of the queue, where the next word is written, and the read pointer marks its head, where the next word is read. Each pointer is incremented after its own operation and wraps around at the end of the buffer. The buffer has two flags, FULL and EMPTY, which indicate whether the FIFO is EMPTY (cannot be read from) or FULL (cannot be written to). The FULL and EMPTY flags generated in a circular FIFO buffer are shown in Figure 1 (Apperson, 2007). A circular FIFO can store a larger amount of data than a linear FIFO, because locations freed by reads are reused by later writes. Read and write operations in the circular FIFO may address the same location or different locations. The circular FIFO is mainly used to represent the full and empty conditions of the circuit (Yang & Kim, 2009). In the conventional method, a large amount of data is stored in the dual-ported memory cell array using a circular buffer instead of a register file.
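A minimal behavioral sketch of the circular buffer described above is shown below. It is not the authors' RTL: the class name `CircularFIFO` is hypothetical, and the FULL and EMPTY flags are derived from an occupancy count purely for readability, whereas hardware typically derives them by comparing the pointers. Writes are ignored when the buffer is FULL and reads when it is EMPTY, which is how the model avoids overflow and underflow.

```python
class CircularFIFO:
    """Behavioral sketch of a circular FIFO with separate write and read pointers."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth
        self.wr_ptr = 0   # location of the next write (tail)
        self.rd_ptr = 0   # location of the next read (head)
        self.count = 0    # number of stored words (used here only to derive the flags)

    @property
    def empty(self):
        return self.count == 0

    @property
    def full(self):
        return self.count == self.depth

    def write(self, word):
        """Store a word; returns False (and stores nothing) when FULL."""
        if self.full:
            return False
        self.mem[self.wr_ptr] = word
        self.wr_ptr = (self.wr_ptr + 1) % self.depth   # wrap around the circle
        self.count += 1
        return True

    def read(self):
        """Remove the oldest word; returns None when EMPTY."""
        if self.empty:
            return None
        word = self.mem[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth   # wrap around the circle
        self.count -= 1
        return word


fifo = CircularFIFO(depth=4)
for word in (10, 20, 30, 40):
    fifo.write(word)
print(fifo.full, fifo.write(50))   # True False
print(fifo.read(), fifo.wr_ptr)    # 10 0
```

After the fourth write the write pointer has wrapped back to location 0, so the next write reuses the slot freed by the first read.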

The dual-port RAM component allows data to be read and written at the same time. This special kind of RAM has two unidirectional data ports, an input port for writing data and an output port for reading data, each with its own address and data buses (Kline & Xu, 2018). The ports are controlled by two signals, READ and WRITE: the READ signal enables the data output, and the WRITE signal allows data to be written (Das & Basu, 2022). To write data into the FIFO buffer, the write-enable line is first asserted and the data are applied on the write-data line. The data are then written into the dual-port memory array at the address held in the write pointer. Similarly, to read data, the read-enable line is asserted and the read pointer addresses the dual-port memory array (Hafeez & Otoom, 2022); the data stored at that address then appear on the read-data line (Dong, 2012). This is the basic operation of a synchronous FIFO with a dual-port memory array structure.
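The behavioral sketch below illustrates a dual-port memory with one write port and one read port, each with its own address and data bus, both serviced on the same rising clock edge. The class name `DualPortRAM`, the 128 x 128 default size, and the read-before-write behavior when both ports address the same location are assumptions for illustration, not details taken from the paper.

```python
class DualPortRAM:
    """Simple dual-port RAM model: one write-only port and one read-only port."""

    def __init__(self, depth=128, width=128):
        self.mask = (1 << width) - 1   # stored words are limited to `width` bits
        self.mem = [0] * depth

    def rising_edge(self, write_en, write_addr, write_data, read_en, read_addr):
        """Service both ports on one clock edge and return the read-data line value."""
        # Read port: sampled before the write, so a same-address access returns
        # the old word (assumed read-before-write policy; real RAMs differ).
        read_data = self.mem[read_addr] if read_en else None
        # Write port: independent address and data bus.
        if write_en:
            self.mem[write_addr] = write_data & self.mask
        return read_data


ram = DualPortRAM()
ram.rising_edge(True, 5, 0xAB, False, 0)          # write only
print(hex(ram.rising_edge(True, 6, 0xCD, True, 5)))  # read 5 while writing 6: 0xab
```

The design choice worth noting is that the read and write ports never compete for a shared bus, which is what lets the FIFO push and pop in the same cycle.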

The dual-ported memory cell array architecture is described in this section. As the name implies, one port of the dual-port memory is used for the read operation and the other for the write operation, as shown in Figure 2. In a synchronous FIFO, both read and write operations are performed only on the rising edge of the clock. Initially, the buffer is in the empty state because no data have been written into the FIFO buffer, so the memory asserts the EMPTY flag (Wei & Jin, 2022).

Figure 1. Implementation of FIFO using circular buffer

Figure 2. Block diagram for Synchronous FIFO with dual ported memory cell array

To write data into the FIFO, the write-enable pin is asserted, and the write operation is performed on the rising edge of the clock. For a read operation, the read-enable pin is asserted, and the read operation is performed on the output side of the FIFO buffer (Zhao, 2013; Chelcea & Nowick, 2000; Sharma, 2012). Two pointers store the addresses of the input and output data, and compare logic in the FIFO buffer compares the read and write pointers. When the write pointer catches up with the read pointer, the FIFO buffer indicates that the memory is FULL; conversely, when the read pointer catches up with the write pointer, it indicates that the memory is EMPTY (Huang & Chang, 2007; Ross, 2019).
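Because equal pointers can mean either an empty buffer or a completely full one, the compare logic needs a way to tell the two cases apart. The paper does not specify its comparator; one common approach, assumed here, widens each pointer by one wrap bit, so a 128-deep FIFO uses 8-bit pointers whose low 7 bits address the memory. The function names `fifo_flags` and `advance` are hypothetical.

```python
def fifo_flags(wr_ptr, rd_ptr, addr_bits=7):
    """Return (full, empty) from two (addr_bits + 1)-bit pointers.

    The low addr_bits select a memory location (2**7 = 128 locations);
    the extra most-significant bit toggles each time a pointer wraps.
    """
    addr_mask = (1 << addr_bits) - 1
    wrap_bit = 1 << addr_bits
    same_location = (wr_ptr & addr_mask) == (rd_ptr & addr_mask)
    # Same location and same lap: the read pointer has caught up -> EMPTY.
    empty = wr_ptr == rd_ptr
    # Same location, different lap: the write pointer is one lap ahead -> FULL.
    full = same_location and (wr_ptr & wrap_bit) != (rd_ptr & wrap_bit)
    return full, empty


def advance(ptr, addr_bits=7):
    """Increment a pointer modulo 2**(addr_bits + 1), including the wrap bit."""
    return (ptr + 1) % (1 << (addr_bits + 1))


wr = 0
for _ in range(128):      # 128 writes, no reads
    wr = advance(wr)
print(fifo_flags(wr, 0))  # (True, False): same location, different lap
```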

In existing dual-ported memory cell arrays, the speed of read and write operations has been improved and output accuracy maintained, but at the cost of increased power consumption. Hence, a low-power dual-ported memory cell array is required (Wimer, 2012; Kaushik & Gulhane, 2013; Duzer, 1995). The proposed pipeline-based clock gating FIFO used in the dual-ported memory cell array reduces power consumption.

# 3. Proposed Pipeline Based Clock Gating in Synchronous FIFO

Pipelining is used to improve the performance of computation circuits by overlapping successive operations, which increases throughput. In the proposed synchronous FIFO, flip-flops are the basic pipelining component, and data flow is synchronized during computation. The components used in the pipelined approach are selected for low latency, high throughput, and low power consumption. Here, pipelining is mainly used to predict an event one clock cycle before it produces its result: during the preceding cycle, the output values are computed and stored in flip-flops (usually D-type), and they appear at the output on the next clock cycle, when the event actually occurs. The latch-free clock gating method uses a simple AND gate and XOR gate, as shown in Figure 3, and the simulated waveforms for the clock-gated D flip-flop are shown in Figure 4.
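To illustrate computing a result one cycle early and holding it in a D flip-flop, the sketch below models a registered look-ahead EMPTY flag: the flag's next value is computed during the current cycle and captured on the rising edge together with the occupancy count, so the flag is available directly from a flip-flop output. This is an assumed example of the pipelining principle; the paper does not state which signals its pipeline registers hold, and the function name `registered_empty_flag` is hypothetical.

```python
def registered_empty_flag(ops, depth=128):
    """Cycle-level model of a FIFO whose EMPTY flag is held in a D flip-flop.

    ops: sequence of (write_en, read_en) pairs, one per clock cycle.
    Returns the registered EMPTY flag value after each rising edge.
    """
    count = 0      # occupancy register
    empty_q = 1    # Q output of the EMPTY flip-flop; reset value: FIFO empty
    trace = []
    for write_en, read_en in ops:
        # Combinational logic evaluated during the current cycle.
        do_write = bool(write_en) and count < depth   # block writes when full
        do_read = bool(read_en) and count > 0         # block reads when empty
        next_count = count + int(do_write) - int(do_read)
        next_empty = int(next_count == 0)             # next cycle's flag, computed early
        # Rising clock edge: both registers capture their precomputed inputs.
        count, empty_q = next_count, next_empty
        trace.append(empty_q)
    return trace


print(registered_empty_flag([(1, 0), (0, 1), (1, 1), (0, 0)]))   # [0, 1, 0, 0]
```

In the third cycle the read request is blocked because the FIFO is empty at that edge, so only the write takes effect and the registered flag drops to 0.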

The XOR output goes high when the D and Q values differ. The clock signal and the XOR output are the inputs of the AND gate, and the gated clock (gclk) produced at the AND gate output drives the clock input of the D flip-flop. A gclk pulse is therefore produced only when the XOR output is high (D differs from Q) and the clock is high. As a result, the gated clock has fewer pulses than the original clock. The proposed clock gating technique reduces dynamic power and increases throughput in the synchronous FIFO.
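The behavior of the XOR/AND gated D flip-flop can be checked with an idealized cycle-level model. The sketch below assumes zero-delay gates and a D input that is stable before each rising clock edge, so it ignores the glitch and pulse-truncation hazards that a real latch-free gating circuit must be designed around; `simulate_gated_dff` is a hypothetical name.

```python
CLK_HIGH = 1   # the clock input of the AND gate during its high phase


def simulate_gated_dff(d_values, q_init=0):
    """Idealized XOR/AND clock-gated D flip-flop, one entry of d_values per clock cycle.

    Returns (q_trace, gclk_pulses): Q after each cycle and the number of gated-clock pulses.
    """
    q = q_init
    gclk_pulses = 0
    q_trace = []
    for d in d_values:
        enable = d ^ q              # XOR: high only when D differs from Q
        gclk = CLK_HIGH & enable    # AND: clock passes only when enabled
        if gclk:                    # rising edge of the gated clock
            q = d                   # D flip-flop captures D
            gclk_pulses += 1
        q_trace.append(q)
    return q_trace, gclk_pulses


d_in = [0, 1, 1, 1, 0, 0, 1, 1]
q_out, pulses = simulate_gated_dff(d_in)
print(q_out, pulses)   # [0, 1, 1, 1, 0, 0, 1, 1] 3
```

Q tracks D exactly as an ungated flip-flop would, but the gated clock fires in only 3 of the 8 cycles, which is where the clock-power saving comes from.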



Figure 4. Simulation waveforms for clock gating D flip flop

# 4. Results and Discussion

The proposed synchronous FIFO was implemented with the Cadence Encounter tool in 45 nm and 180 nm technologies. The functional simulations of the 128 x 128-bit synchronous FIFO without and with the pipeline are shown in Figure 5 and Figure 6, respectively. The output waveform of the 128 x 128-bit synchronous FIFO shows both the full and empty conditions of the circular FIFO buffer. Once the buffer is FULL, no further data are written into the FIFO buffer. In general, if data are read after the EMPTY indication or written after the FULL indication, the FIFO enters an underflow or overflow condition, respectively. This problem is avoided by using a circular FIFO with a dual-ported memory cell array.

Figure 5. Simulation of 128 x 128 FIFO buffer in empty state without pipeline

Figure 6. Simulation of 128 x 128 FIFO buffer in empty state with pipeline

In the waveforms, the counter pin denotes the number of data words written into the FIFO buffer, and the temp data pin denotes the number of data words popped (read) from the synchronous FIFO buffer. The simulation results show that the 128 x 128 synchronous FIFO without the pipeline exhibits a 2 ns delay in the empty state, whereas the synchronous FIFO with the pipeline shows no delay. The proposed 128 x 128 synchronous FIFO implemented in 45 nm technology occupies 14,069 logic cells in total, giving lower area and delay than the 180 nm implementation. The RTL diagram of the synchronous FIFO in 45 nm technology is shown in Figure 7.

Table 1 compares the performance of various FIFOs, and Figure 8 shows the area, delay, and power of the conventional and proposed synchronous FIFOs. The proposed 128-bit synchronous FIFO implemented in 45 nm technology uses 25.2% less power and 59.1% less area than the existing FIFO of Ross (2019). Compared with the FIFO of Hsu, Huang and Wu (2016), the clock-gating-based 128-bit FIFO achieves a 77.6% area reduction, a 35.7% decrease in power consumption, and 75.7% less gate delay. The proposed FIFO implemented in 180 nm technology occupies 70.3% less area, consumes 10.6% less power, and reduces the gate delay by 38% compared with the existing FIFO of Nguyen and Tran (2014). Table 2 shows the hardware utilization of the 128 x 128-bit synchronous FIFO. The 128-bit synchronous FIFO and the clock-gating-based synchronous FIFO were also implemented on an FPGA using Quartus II, where the normal synchronous FIFO uses 4,608 memory bits. Compared with the normal synchronous FIFO, the clock-gating-based synchronous FIFO uses 11.1% fewer memory bits.
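The percentages quoted above can be recomputed from the entries of Table 1 and Table 2 as $100 \times (\text{baseline} - \text{proposed}) / \text{baseline}$. The short script below does this; its output suggests that the reported figures are truncated, rather than rounded, to one decimal place (for example, 59.18% is reported as 59.1%). The dictionary labels and helper names are ours, not the paper's.

```python
import math


def reduction_pct(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


def truncate_1dp(x):
    """Truncate (not round) to one decimal place."""
    return math.floor(x * 10) / 10


# (baseline, proposed) pairs copied from Table 1 and Table 2.
comparisons = {
    "45 nm power vs Ross":       (7.8, 5.83),
    "45 nm area vs Ross":        (34470, 14069),
    "45 nm area vs Hsu et al.":  (63083, 14069),
    "45 nm power vs Hsu et al.": (9.07, 5.83),
    "45 nm delay vs Hsu et al.": (4.23, 1.027),
    "180 nm area vs Nguyen":     (48406, 14336),
    "180 nm power vs Nguyen":    (14.025, 12.53),
    "180 nm delay vs Nguyen":    (3.23, 2),
    "FPGA memory bits":          (4608, 4096),
}

for label, (base, prop) in comparisons.items():
    pct = reduction_pct(base, prop)
    print(f"{label:26s} {pct:6.2f} %   truncated: {truncate_1dp(pct)}")
```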



Figure 7. RTL diagram using 45nm technology



Figure 8. Implementation of area, delay and power parameters for conventional and proposed synchronous FIFO

Table 2. Hardware utilization of 128 x 128 bit synchronous FIFO

| Logic requirement             | 128-bit synchronous FIFO buffer | Proposed 128-bit clock gating synchronous FIFO buffer |
|-------------------------------|---------------------------------|-------------------------------------------------------|
| Total combinational functions | 192                             | 158                                                   |
| Total registers               | 275                             | 275                                                   |
| Total pins                    | 270                             | 270                                                   |
| Total memory bits             | 4608                            | 4096                                                  |

Table 1. Performance comparison of various FIFOs

## 4.1 Physical level implementation of 128 x 128 synchronous FIFO

Metal-fill insertion is a manufacturability step at advanced nodes. Metal fill consists of dummy metal pieces added to avoid minimum-density problems, as shown in Figure 9. If the on-chip metal density is below the specified value, chemical-mechanical planarization (CMP) can cause dishing, which in turn degrades the planarity of subsequent layers. Because the fill pieces are electrically unconnected dummies, they do not change circuit function and are intended to have negligible impact on timing and crosstalk.
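The minimum-density check that drives fill insertion reduces to a ratio: metal density in a window is the metal area divided by the window area, and the fill needed is whatever brings that ratio up to the rule limit. The sketch below illustrates only this arithmetic; the window size, routed metal area, and 30% minimum density are hypothetical values (real limits come from the foundry rule deck), and production fill tools also honor maximum-density and spacing rules that are not modeled here.

```python
def fill_needed(window_area, metal_area, min_density):
    """Dummy-metal area to add so one density window meets min_density.

    Areas must use the same units (e.g. square micrometres).
    """
    density = metal_area / window_area
    if density >= min_density:
        return 0.0                       # window already meets the rule
    return min_density * window_area - metal_area


# Hypothetical 50 um x 50 um window with 500 um^2 of routed metal
# and a hypothetical 30 % minimum-density rule.
window = 50.0 * 50.0
print(fill_needed(window, 500.0, 0.30))  # about 250 um^2 of fill
```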

# 5. Conclusions

In this paper, a pipelining-based clock gating technique is introduced to obtain area efficiency and low power in a synchronous FIFO memory design, as normally used for multiple read and write operations in a single clock domain. The work has discussed the role of the FIFO in synchronizing input and output data. A synchronous FIFO is mainly used to avoid overflow and underflow conditions, which it does by means of FULL and EMPTY flags in the buffer circuit. The proposed synchronous FIFO design was implemented with the Cadence Encounter tool in 180 nm and 45 nm technologies at supply voltages of 1.8 V and 1 V, respectively. The 180 nm implementation reduces power dissipation by 10.6% and area by 70.3% compared with the FIFO of Nguyen and Tran (2014). Compared with existing FIFO buffers, the proposed 128-bit synchronous FIFO attains lower power and area and operates at clock frequencies up to 322 MHz.



Figure 9. Metal fill of 128 x 128 bit synchronous FIFO buffer

| Parameters       | Proposed FIFO | Proposed FIFO | Ross (2019) | Hafeez & Ross (2021) | Hsu, Huang & Wu (2016) | Nguyen & Tran (2014) |
|------------------|---------------|---------------|-------------|----------------------|------------------------|----------------------|
| Technology       | 180 nm        | 45 nm         | 65 nm       | 65 nm                | 28 nm                  | 180 nm               |
| System clock     | 322 MHz       | 324 MHz       | 1 GHz       | 1.15 GHz             | 10 MHz                 | 340 MHz              |
| Cell utilization | 14336         | 14069         | 34470       | *                    | 63083                  | 48406                |
| Power (mW)       | 12.53         | 5.83          | 7.8         | 2                    | 9.07                   | 14.025               |
| Gate delay (ns)  | 2             | 1.027         | 1.035       | *                    | 4.23                   | 3.23                 |

\* denotes not reported

\* All entries are estimated values obtained with the Quartus II synthesis tool for the Cyclone II family (device EP2C70F89618).

# References

- Apperson, R. (2007). A scalable dual-clock FIFO for data transfers between arbitrary and haltable clock domains. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 15, 1125–1134.
- Attaoui, Y., Chentouf, M., & Ismaili, Z. (2021). Clock gating efficiency and impact on power optimization during synthesis flow. *International Conference on Microelectronics (ICM)*. doi:10.1109/ICM52667.2021.9664896
- Chang, I. J. (2009). A 32 kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS. *IEEE Journal of Solid-State Circuits*, 650-658.
- Chang, M. T. (2008). A robust ultra-low power asynchronous FIFO memory with self-adaptive power control. *IEEE System-on-Chip Conference*, 175-178.
- Chiu, Y.T. (2010). Subthreshold asynchronous FIFO memory for wireless body area networks (WBANs). International Symposium on Medical Information and Communication Technology (ISMICT).
- Chelcea, T., & Nowick, S. M. (2000). A low-latency FIFO for mixed-clock systems. *Proceedings IEEE Computer Society Workshop on VLSI*, 119–126.
- Das, S., & Basu, U. (2022). FPGA implementation of asynchronous FIFO. *Proceedings of International Conference on Industrial Instrumentation and Control*, 399–407. doi:10.1007/978-981-16-7011-4\_39
- Datta, G., & Lin, S. (2021). High performance and robust FIFO synchronizer-interface for crossing clock domains in SFQ logic. *IEEE Transactions on Circuits and Systems-II*. doi:10.48550/arXiv.2108.03719
- Du, W. H. (2011). An energy-efficient 10T SRAM-based FIFO memory operating in near-/sub-threshold regions. *IEEE System-on-Chip Conference*, 19-23.
- Dong, X. (2012). NVSim: A circuit-level performance, energy, and area model for emerging nonvolatile memory. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 31, 994–1007.
- Duzer, T. (1995). Hybrid Josephson-CMOS FIFO. *IEEE Transactions on Applied Superconductivity*, 5, 2648-2651.
- Hafeez, S., & Otoom, S. (2022). Design of memory Alias Table based on the SRAM 8T-Cell. *International Journal of Circuit Theory and Applications*. doi:10.1002/cta.3284
- Hafeez, S., & Ross, A. (2021). Reconfigurable FIFO memory circuit for synchronous and asynchronous communication. *International Journal of Circuit Theory and Applications*, 49, 938-952.
- Huang, P. K., & Chang, C. (2007). Recursive constructions of parallel FIFO and LIFO queues with switched delay lines. *IEEE Transactions on Information Theory*, 53, 1778-1798.
- Hsu, W., Huang, P., & Wu, S. (2016). 28nm ultra-low power near-/sub-threshold first-in-first-out (FIFO) memory for multi-bio-signal sensing platforms. *International Symposium on VLSI Design, Automation and Test (VLSI-DAT)*. doi:10.1109/VLSI-DAT.2016.7482551

- Jaiswal, R., Paul, R., & Mahto, V. (2014). Power reduction in CMOS technology by using tri-state buffer and clock gating. *International Journal of Advanced Research in Computer Engineering and Technology* (IJARCET), 3, 1853-1860.
- Jeon, D., & Seok, M. (2012). A super-pipelined energy efficient subthreshold 240 MS/s FFT core in 65 nm CMOS. *IEEE Journal of Solid-State Circuits*, 47, 23-34.
- Kaushik, P., & Gulhane, S. (2013). Dynamic power reduction of digital circuits by clock gating. *International Journal of Advancements in Technology*, 4, 79-88.
- Kline, D., & Xu, H. (2018). Racetrack queues for extremely low-energy FIFOs. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 26, 1531-1544.
- Nagendran, A., & Subramaniyam, D. (2020). An ultra-low-power static random-access memory cell using tunneling field effect transistor. *International Journal of Engineering, Transactions B: Applications*, 33, 2215-2221.
- Nguyen, T., & Tran, X. T. (2014). A novel asynchronous first-in-first-out adapting to multi-synchronous network-on-chips. *International Conference on Advanced Technologies for Communications*. doi:10.1109/ATC.2014.7043413
- Prakash, N. (2013). Clock gating for dynamic power reduction in synchronous circuits. *International Journal of Engineering Trends and Technology*, 4(5).
- Rafi, S., & Venkateshwara Rao, K. (2014). FIFO design methodology based on input and output data transfer rate. *Third National Conference on Latest Trends in Signal Processing, VLSI and Embedded Systems*, 133-135.
- Ross, A. (2019). A one-cycle FIFO buffer for memory management units in manycore systems. *IEEE Computer Society Annual Symposium on VLSI* (*ISVLSI*). doi:10.1109/ISVLSI.2019.00056
- Sharma, D. (2012). Effects of different clock gating techniques on design. *International Journal of Scientific and Engineering Research, 3.*
- Sharma, H., & Rana, C. (2013). Designing of 8-bit synchronous FIFO memory using register file. *International Journal of Computer Applications*, 63, 23-26.
- Shilpi Maurya (2016). Design of RTL synthesizable 32-bit FIFO memory. *International Journal of Engineering Research and Technology*, 5, 591-593.
- Srinivasana, N. (2015). Power reduction by clock gating technique. *Procedia Technology*, 21, 631-635.
- Subramaniyam, D., & Jayabalan, R. (2018). VLSI implementation of variable bit rate OFDM transceiver system with multi-radix FFT/IFFT processor for wireless applications. *Journal of Electrical Engineering, 18*, Article 18.1.22.
- Subramaniyam, D., & Jayabalan, R. (2015). FPGA implementation of variable bit rate 16 QAM transceiver system. *International Journal of Applied Engineering Research*, 10, 26497-26507.
- Subramaniyam, D. (2018). A fast and compact multiplier for digital signal processors in sensor driven smart vehicles. *International Journal of Mechanical Engineering and Technology*, 9, 157–167.

- Subramaniyam, D., Paul, M., & Kulandaivel, M. (2019). An improved area efficient 16-QAM transceiver design using vedic multiplier for wireless applications. *International Journal of Recent Technology and Engineering*, 8, 4419-4425.
- Subramaniyam, D., & Mani, J. (2022). Study of polymer matrix composites for electronics applications. *Journal of Nanomaterials*, 2022, Article ID 8605099, 1-7. doi:10.1155/2022/8605099
- Wei, H., & Jin, X. (2022). A circuit model for working memory based on hybrid positive and negative-derivative feedback mechanism. *Brain Sciences*, 12(5), 547. doi:10.3390/brainsci12050547
- Weng, S., & Weng, H. (2012). Timing optimization in sequential circuit by exploiting clock-gating logic. *ACM Transactions on Design Automation of Electronic Systems*, 17(2), 1-15.
- Wimer, S. (2012). The optimal fan-out of clock network for power minimization by adaptive gating. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 1772-1780.
- Yang, H., & Kim, S. (2009). Low power input/output port design using clock gating technique. *Recent Advances in Networking, VLSI and Signal Processing*, 63-66.
- Yantchev, T., Huang, C. G., & Josephs, M. B. (1995). Low latency asynchronous FIFO buffers. *Proceedings Second Working Conference on Asynchronous Design Methodologies*, 24–31.
- Zhao, W. (2011). Domain wall shift register-based reconfigurable logic. *IEEE Transactions on Magnetics*, 47(10), 2966–2969.
- Zhao, W. (2013). Racetrack memory based reconfigurable computing. *IEEE Faible Tension Faible Consommation*, 1–4. doi:10.1109/FTFC.2013.6577771
