discharge process of precharge-line PL1. Although there is a large shift in time between the two waveforms (PL1 and PL2), the gate output rises dramatically due to the steepest discharge process in precharge-line PL1.

![Waveform Diagram](image)

- **Fig. 2**: Simulated waveforms of precharge lines and gate outputs for NMOS = 20µm and driver = 30µm with £ PDP = 5
  - **a**: Simulated waveforms of precharge lines
  - **b**: Latency comparison of gate outputs

<table>
<thead>
<tr>
<th>Driver size</th>
<th>10 µm</th>
<th>20 µm</th>
<th>30 µm</th>
</tr>
</thead>
<tbody>
<tr>
<td>NMOS</td>
<td>SD</td>
<td>ID</td>
<td>SUP</td>
</tr>
<tr>
<td>40 µm</td>
<td>0.45</td>
<td>0.62</td>
<td>38 %</td>
</tr>
<tr>
<td>80 µm</td>
<td>0.48</td>
<td>0.71</td>
<td>47 %</td>
</tr>
</tbody>
</table>

Further simulations of the steepest-descent domino logic and improved domino logic [3] circuits were carried out with other possible transistor size combinations. Table 1 shows a latency comparison between the steepest-descent domino logic circuit and the improved domino logic circuit [3]. In this Table, SD and ID represent the steepest-descent domino and improved domino logic circuits, respectively. SUP is a figure of merit defined as the latency of the improved domino logic circuit divided by the latency of the steepest-descent domino logic circuit. As this Table shows, the steepest-descent domino logic technique outperforms the improved domino logic technique [3]. Note that the steepest-descent domino logic circuit with driver = 40µm and NMOS = 10µm achieves the same latency as the improved domino logic circuit with driver = 40µm and NMOS = 30µm. Hence, the steepest-descent domino logic technique can be used to obtain either higher-speed operation or a smaller area.
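The SUP figure of merit can be checked against the tabulated latencies with a few lines of arithmetic. The sketch below is illustrative only: the function name `speedup` is hypothetical, and it assumes that the percentages in the SUP column express how far the ID/SD latency ratio exceeds 1, which the source does not state explicitly.

```python
def speedup(latency_id: float, latency_sd: float) -> float:
    """Ratio of improved-domino (ID) latency to steepest-descent (SD) latency."""
    if latency_sd <= 0 or latency_id <= 0:
        raise ValueError("latencies must be positive")
    return latency_id / latency_sd


# (SD latency, ID latency) pairs copied from the two data rows of the table.
# The row labels and units are reproduced as they appear in the source.
rows = {"40 µm": (0.45, 0.62), "80 µm": (0.48, 0.71)}

for label, (sd, id_) in rows.items():
    ratio = speedup(id_, sd)
    # Assumption: the SUP column reports (ratio - 1) as a percentage.
    print(f"{label}: ID/SD = {ratio:.3f}, excess over 1 = {100 * (ratio - 1):.1f} %")
```

Under that assumption the two rows give roughly 38 % and 48 %, close to the 38 % and 47 % listed in the table.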

**Conclusion**: A new structure for high-speed single-end domino logic has been proposed and its underlying processes, transition-forwarding and the steepest-descent technique, have been described. A timing comparison was made between two different types of domino logic circuit. In the domino logic circuit based on transition-forwarding and the steepest-descent technique, the latency is minimised without leading to an area penalty.

© IEE 1999

Electronics Letters Online No: 19991024
DOI: 10.1049/el:19991024

Jae Hun Choi and Guanghui Hu (Austin Design Center, Equator Technologies, 6850 Austin Center Blvd., Suite 520, Austin, TX 78731, USA)
E-mail: jchoi@equator.com

---

**Non-volatile programmable pulse computation cell**


Analog VLSI circuit elements for producing and processing pulses with programmable time delays are described. The elements can be interconnected to implement pulse computations or spiking neurons. As an example, the basic operation of a scale-invariant template matching system using coincidence detection is demonstrated.

**Introduction**: In this Letter, a compact pulse computation cell with programmable pulse time delays over six orders of magnitude is presented. The pulse time delays are set using a programmable current source, where the current is controlled by a floating gate device. By encoding the input signals using pulse time delays and applying coincidence detection, scale invariant pattern matching can be achieved using this cell.

(i) **Programmable current source**: Pulse time delays are controlled using a programmable current source that was implemented using a non-volatile floating-gate device [1] (Fig. 1). The floating gate forms one end of capacitor C1, which is connected to the feedback amplifier formed by transistors M1 and M2. Charge pushed onto or off the floating-gate capacitor C1 is seen as a voltage change at node $V_{out}$, determining the current $I_{delay}$ through transistor M3. Electrons are added to the floating gate using hot-electron injection via transistor M0, which has a bipolar transistor base implant (pbase) applied to its channel to enhance its injection rate. Hot-electron injection occurs when the drain of transistor M0 is set at 6V by the AND gate. Electrons are removed from the floating gate by gate-oxide tunnelling via node $V_{tun}$ (typically set at 25V for tunnelling).

![Diagram](image)

**Fig. 1**: Schematic diagram of programmable current source

(ii) **Programmable delay pulse generator**: The programmable current source was used to implement a pulse generator with programmable time delays. The schematic diagram for the programmable pulse generator (PPG) circuit is shown in Fig. 2. It produces a single delayed output pulse as a voltage $V_{out}$ or a summable current $I_{out}$ in response to a low level on the input signal $V_{in}$. The pulse time delay is controlled by the current $I_{delay}$ and the pulsewidth is adjustable via the bias voltage $V_{on/off}$.

In the untriggered state, the signal $V_{in}$ is high and the capacitor C1 is shorted to ground through transistor M1. The two inverters form a voltage comparator with the output being low if the voltage across capacitor C1 is below ~2.5V. Thus with signal $V_{in}$ high, the gate of M4 will be low, causing capacitor C2 to be pulled high. The output of the AND gate, $V_{out}$, will then be low because one input (the comparator output) is low while the other input (the voltage on capacitor C2) is high. Under this condition, transistor M6 will be off and the current $I_{out}$ will be zero.

![Schematic diagram of programmable pulse generator](image)

**Fig. 2** Schematic diagram of programmable pulse generator

When the signal \( V_{\text{in}} \) goes low, capacitor C1 will be charged by the programmable on-chip current \( I_{\text{delay}} \). The voltage across capacitor C1 will rise at a linear rate given by \( \frac{dV}{dt} = \frac{I_{\text{delay}}}{C_1} \). When the voltage reaches the transition point of the comparator, the gate of transistor M4, as well as both inputs to the AND gate, will be high. This causes the output \( V_{\text{out}} \) to go high and transistor M6 to be switched on, allowing the current \( I_{\text{out}} \) to flow at a rate set by the bias voltage \( V_{\text{bias}} \). In this manner \( I_{\text{delay}} \) controls the time delay between the falling edge of the input and the onset of the output pulse.
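Because the ramp on C1 is linear, the delay follows directly from \( \frac{dV}{dt} = \frac{I_{\text{delay}}}{C_1} \): the time to reach the comparator transition voltage is \( C_1 V_{\text{th}} / I_{\text{delay}} \). The sketch below works through that arithmetic. The capacitance of 1 pF is a hypothetical value chosen for illustration (the source does not give C1), the threshold uses the approximate 2.5 V figure quoted for the comparator, and the function names are not from the source.

```python
def ramp_delay(i_delay: float, c1: float, v_threshold: float = 2.5) -> float:
    """Time (s) for a constant current to charge C1 from 0 V to the threshold."""
    if i_delay <= 0 or c1 <= 0:
        raise ValueError("current and capacitance must be positive")
    return c1 * v_threshold / i_delay


def current_for_delay(t_delay: float, c1: float, v_threshold: float = 2.5) -> float:
    """Charging current (A) needed to obtain a given delay; inverse of ramp_delay."""
    if t_delay <= 0 or c1 <= 0:
        raise ValueError("delay and capacitance must be positive")
    return c1 * v_threshold / t_delay


C1 = 1e-12  # hypothetical 1 pF; the actual value is not stated in the source

# Delays in powers of ten; each decade of delay needs a decade less current.
for exponent in range(-6, 1):
    t = 10.0 ** exponent
    print(f"delay {t:.0e} s -> I_delay {current_for_delay(t, C1):.2e} A")
```

With these assumed values, each tenfold increase in delay requires a tenfold smaller current, which illustrates why a current source programmable over many decades is needed for a wide delay range.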

When both inputs to the AND gate are high, transistor M4 is switched off and capacitor C2 is discharged through transistor M3 at a rate set by the bias voltage \( V_{\text{bias}} \), thus controlling the pulselength. When the voltage on capacitor C2 falls below the transition voltage of the AND gate, \( V_{\text{out}} \) goes low, turning off both transistor M6 and the output current \( I_{\text{out}} \).

**Results:** A test chip with the PPG circuit was designed and fabricated using the Orbit CMOS N-well double-metal, double-poly (with NPN transistor option) process. The NPN transistor option is required for the floating-gate devices. The size of the PPG cell (including the programmable current source) in 2µm scalable CMOS technology is 140 × 114µm.

(i) **Pulse delay accuracy:** The precision of the pulse delay timing for a PPG cell was measured over several orders of magnitude by applying a square wave input to the cell. The pulse delay was adjusted from 1µs to 1s (in powers of 10) by hot-electron injection and tunnelling. Over this range, which spans six orders of magnitude, the statistical functions of a LeCroy 9314 digital oscilloscope were used to automatically measure the time from the falling edge of the input signal to the start of the pulse output. The maximum standard deviation over this range was < 1%. Since the timing of the PPG cell relies primarily on the integration of a current, the effects of uncorrelated noise are minimised.

![Circuit diagram for scale-invariant pattern matching](image)

**Fig. 3** Circuit diagram for scale-invariant pattern matching

(ii) **Scale-invariant pattern recognition:** Scale-invariant pattern recognition is possible using pulse computations [2]. Such a system is shown in Fig. 3. \( V_{\text{exp}} \) is a periodic, falling-exponential signal shown as the top oscilloscope trace in Fig. 4. The input signal was encoded as the voltage values \( V_1 \) and \( V_2 \), i.e. a two-dimensional analogue vector \( (V_1, V_2) \). The voltage comparators compare \( V_1 \) and \( V_2 \) with the falling exponential. The comparators output a low signal when \( V_{\text{exp}} \) is less than the respective signal voltages, triggering the PPG cells which then, after an appropriate time delay, produce current pulses. For \( V_1 \) and \( V_2 \) at arbitrary voltage levels, the resulting output seen with an oscilloscope is two non-overlapping pulses (second trace of Fig. 4).

![Oscilloscope trace demonstrating scale-invariant pattern matching](image)

**Fig. 4** Oscilloscope trace demonstrating scale-invariant pattern matching

‘Pattern recognition’ is achieved when the time delays are set such that the pulses overlap. The third oscilloscope trace shows the case where \( V_1 \) and \( V_2 \) were set to a particular pattern (in this example, \( V_1 = 2V \) and \( V_2 = 4V \)) with the pulses suitably delayed so that they coincide, producing a current pulse twice the size of the original pulses. The primary advantage of this encoding is that it achieves scale-invariant pattern recognition in a simple manner. Let the background oscillation be defined by \( V_{\text{bg}} = Ae^{-t/\tau} \), where \( A \) and \( \tau \) are the amplitude and time constant of the background exponential oscillation. Scaling all the inputs by a constant \( k \) adds a constant time shift, \( \Delta t = -\tau \log(k) \), to the original time delay, \( t_i = -\tau \log(V_i/A) \), of the \( i \)th input, \( V_i \). Thus the signals remain coincident. The fourth oscilloscope trace shows the result of halving the amplitudes of both inputs, \( V_1 \) and \( V_2 \). Both output pulses shift by the same amount so that coincidence still occurs. This coincidence can be detected using a voltage or current comparator set to an appropriate level. The precision of the coincidence required for a match is adjustable via the pulselength of the PPGs.
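The scale-invariance argument can be checked numerically. The sketch below computes the times at which a falling exponential crosses each input voltage, using \( t_i = -\tau \log(V_i/A) \), and confirms that scaling both inputs by the same factor moves both crossings by the same amount, so their separation is unchanged. The input voltages of 2 V and 4 V and the halving factor come from the text; the amplitude of 5 V, the time constant of 1 ms, and the function name `crossing_time` are assumptions made for illustration only.

```python
import math


def crossing_time(v: float, amplitude: float, tau: float) -> float:
    """Time at which amplitude * exp(-t / tau) has fallen to the voltage v."""
    if not 0 < v <= amplitude:
        raise ValueError("v must lie in (0, amplitude]")
    return -tau * math.log(v / amplitude)


A, TAU = 5.0, 1e-3   # hypothetical amplitude (V) and time constant (s)
V1, V2 = 2.0, 4.0    # input pattern used in the text
K = 0.5              # both inputs halved, as in the fourth oscilloscope trace

t1, t2 = crossing_time(V1, A, TAU), crossing_time(V2, A, TAU)
s1, s2 = crossing_time(K * V1, A, TAU), crossing_time(K * V2, A, TAU)

print(f"separation before scaling: {t1 - t2:.6e} s")
print(f"separation after scaling:  {s1 - s2:.6e} s")
print(f"shift of input 1: {s1 - t1:.6e} s, shift of input 2: {s2 - t2:.6e} s")
```

Both shifts have magnitude \( \tau |\log k| \), independent of the input voltage, so PPG delays that made the pulses coincide for the original pattern still make them coincide for the scaled one.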

**Conclusion:** A programmable pulse generator circuit was implemented in which the pulse delay is programmable via a floating-gate device. Measurements demonstrated the feasibility of using this circuit to implement a scale-invariant pattern recognition system.

© IEE 1999

Electronics Letters Online No: 19990098

DOI: 10.1049/el:19990098

C.T. Jin (Computer Engineering Laboratory and The Auditory Neuroscience Laboratory, Blg 303, Department of Electrical Engineering, The University of Sydney, Sydney, Australia)

E-mail: craig@sedal.usyd.edu.au

P.L. Rolandi (SGS Thomson Microelectronics, via C. Olivetti 2, 20041 Agrate Brianza (MI), Italy)

P.H.W. Leong (Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin NT, Hong Kong)

References
