# Voltage-Follower Coupling Quadrature Oscillator with Embedded Phase-Interpolator in 16nm FinFET

Xi Chen<sup>1</sup>, Sanquan Song<sup>1</sup>, John Poulton<sup>2</sup>, Nikola Nedovic<sup>1</sup>, Brian Zimmer<sup>1</sup>, Stephen Tell<sup>2</sup>, and C. Thomas Gray<sup>2</sup> <sup>1</sup>NVIDIA Corporation, Santa Clara, CA, 95051, <sup>2</sup>NVIDIA Corporation, Durham, NC, 27713 Email: xich@nvidia.com

Abstract—High-speed serial links require a very high frequency clock source. A multi-ring oscillator is a practical solution to this challenge. A new quadrature oscillator with an embedded phase interpolator (PI-OSC) was designed and tested. Voltage-follower based cross-coupling loops create reliable and tunable phase relationships among the OSC rings. The measurement results show that the proposed PI-OSC provides 1.25/0.97 LSB INL/DNL performance at 24GHz while consuming only 8.1mW. This compact oscillator is suitable for clock generation in high-speed low-power links.

Keywords—Quadrature Oscillator; Phase Interpolator; High-Speed Serial Link; Cross-Coupling

# I. INTRODUCTION

Advances in wireline communications require ever-increasing data rates. With multi-level signaling or NRZ, the baud rate alone is approaching the 50G range. In SerDes systems, reliable clock sources providing either half-rate differential or quarter-rate quadrature clocks are usually required. For low-power designs, clock forwarding is a promising technique [1-3], which avoids clock recovery at the RX end and relaxes the jitter budget of the clock source. However, a forwarded clock normally requires both I and Q phases from the source, or doubles the frequency target of the oscillator (OSC) if a full-rate PLL is used [3]. Compared to an extremely fast OSC, a quadrature oscillator is a more practical and energy-efficient choice.

One popular way to generate a quadrature clock is to use an injection-locked OSC. In [4], a four-stage multi-path Ring-OSC was used as the quadrature clock generator, for frequencies up to 16GHz. Multi-path OSCs rely on the strength of the feed-forward inverters to oscillate at high frequency but carry the risk of multi-mode (low-speed) oscillation [6]. Extra calibration loops are also required to extend the frequency range and constrain phase errors. Cascading tetrahedral oscillators is another way to convert differential clocks to quadrature [5]. This gradual phase-pulling method provides better frequency stability. However, it comes with power and area penalties from the additional OSC stages. Also, the phase errors are still sensitive to frequency changes.

In a forwarded clock link, Phase-Interpolators (PIs) are important for both performance characterization and dynamic mismatch compensation between data and clock lanes. However, PIs are quite expensive in terms of power and area, when good resolution is required. These costs increase for multiphase clocking, driven by the need for matched delays across all phases.



Fig.1. (a) Conventional Ring-OSC, (b) Voltage-Follower coupling differential OSC, and (c) VF coupling Quadrature OSC

This paper presents a high-speed multi-ring quadrature oscillator design with the PI function embedded. Voltage-Follower (VF) coupling devices are used to constrain the phase relationship and improve the speed of the OSC rings. The phase coupling loops, running in an orthogonal topology, enable multiphase clock generation with very low power overhead. A PI realized by tuning the OSC coupling strength provides better linearity, at significantly lower area and power costs, compared to conventional designs. Overall, the proposed OSC design is a good candidate for a clock generator in high-speed forwarded clock links, especially those with low power budgets. The very compact size also makes it suitable for other applications.

# II. VOLTAGE-FOLLOWER COUPLING

Conventional Ring-OSCs normally use inverter-based coupling, as shown in Fig.1(a), which keeps two rings in opposite phases. Inverter coupling is only perfect when its insertion delay is zero. In real circuits, however, the unavoidable delay causes drive-strength contention between the main loops and the coupling elements, which reduces Ring-OSC speed and wastes power.

A faster and more energy-efficient way to cross-couple the differential OSCs is illustrated in Fig.1(b). This feed-forward topology requires the coupling elements to be non-inverting, and the phase step across each stage is 60 degrees. A circuit structure called the Voltage-Follower (VF) is proposed for this requirement. The VF circuit has two complementary source-follower devices stacked together, and it provides unity voltage gain over most of the supply range. Due to the threshold limitation, the effective working range of the VF is smaller than the full supply. This is not a problem because the inverters in the main loops pull the voltage to the rails. Also, because the VF circuit is naturally faster than an inverter (there is no Miller capacitance in the VF circuit), it increases the Ring-OSC frequency instead of reducing it. The feed-forward coupling could work even if passive components (resistors or capacitors) were used. However, VFs take output current (mostly) from the rails instead of the input nodes, therefore injecting energy into the main loops and boosting the OSC frequency. Fig.2 shows the speed and current comparison among various types of cross-couplings, where VF1/2 denotes a two-fin device and VF1/INV1 have four fins. Data were obtained from post-layout simulations. Compared to INV coupling, a minimum-sized VF (1-finger 2-fin device) could contribute a 10% speed improvement at about the same power.

Fig.2. Comparison among different types of OSC cross-couplings (post-sim)
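The 60-degree phase step quoted for the differential VF-coupled OSC can be checked with an idealized phase budget. The sketch below assumes two ideal three-stage inverter rings running in anti-phase, with every stage contributing an inversion plus an equal share of delay. The function names and the choice that a follower assists the node lagging by one step are illustrative assumptions, not details taken from Fig.1(b).

```python
# Idealized node phases for two three-stage rings running in anti-phase.
# Assumption: identical stages, no coupling-induced phase error.
STAGES = 3


def ring_phases(offset_deg, stages=STAGES):
    """Return node phases (degrees) of one ideal inverter ring.

    Each stage adds 180 deg (inversion) plus 180/stages deg of delay,
    so the loop closes after a whole number of cycles.
    """
    step = 180.0 + 180.0 / stages
    return [(offset_deg - k * step) % 360.0 for k in range(stages)]


def lagging_partner(phase_deg, candidates, lag_deg=60.0):
    """Pick the candidate node closest to (phase - lag) on the circle."""
    target = (phase_deg - lag_deg) % 360.0
    return min(
        candidates,
        key=lambda p: abs((p - target + 180.0) % 360.0 - 180.0),
    )


ring_a = ring_phases(0.0)      # hypothetical ring A
ring_b = ring_phases(180.0)    # hypothetical ring B, opposite phase
all_phases = sorted(ring_a + ring_b)
spacing = [b - a for a, b in zip(all_phases, all_phases[1:])]

print("ring A:", ring_a)
print("ring B:", ring_b)
print("spacing between adjacent phases:", spacing)
print("node 60 deg behind A0:", lagging_partner(ring_a[0], ring_b))
```

In this model the six nodes fall on a 60-degree grid, so a non-inverting element can always connect a node in one ring to a node in the other ring that is exactly one step away, which an inverting element cannot do without fighting the main loop.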

# III. MULTI-RING QUADRATURE OSCILLATOR

Instead of challenging the speed limit of the OSC rings, a more realistic way to reach a higher equivalent frequency is to correlate the phases among multiple rings. As first shown in [7], multi-ring OSC designs avoid the problem of distributing high-quality ultra-fast clock signals. The method we propose is to group the cross-coupling elements as assistant loops in an orthogonal topology and lock the phase relations among single-loop OSCs. The coupling loops need neither to be self-oscillating nor to guarantee the full voltage swing, but only to constrain variations. A multi-ring OSC design is a good choice because it can operate at the natural frequency of a single ring and at about the same energy efficiency.

An example of a quadrature OSC with VF-coupling is shown in Fig.1(c). There are 24 minimum-sized VFs used in this topology as cross-coupling devices. For easier reading, we can split them into two groups: 1) feed-forward devices (all VFs heading down) and 2) feedback devices (all heading right). The feed-forward (FF) VFs perform a function similar to those used in the differential OSC, except that there are three orthogonal coupling loops, and the phase step across each FF VF device is 90 degrees. All these FF VFs together lock the phase relationships among the four single rings in quadrature while helping to increase the oscillating frequency. The feedback (FB) VFs added here are not always necessary, but they work with the FF VFs as distributive interpolators, which constrain possible phase disturbances and prevent the FF coupling loops from locking at half-frequency. This multi-ring OSC can provide the quadrature clock signals at the highest possible frequency defined by a single three-stage ring. By adding a MUX stage, we can effectively generate twelve evenly spaced phases out of the quadrature OSC, which is very useful for any further phase interpolation. Although inverter coupling may also work in this quadrature topology (with opposite phases in loops B and D), VF coupling has obvious speed and power advantages. Moreover, the distributive coupling structure provides considerable design flexibility. An embedded PI function can be built into the OSC with small overhead.

Fig.3. Phase-Interpolator (PI) OSC (a) topology and (b) phase tuning strategies
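The claim of twelve evenly spaced phases follows from combining four rings locked in quadrature with three nodes per ring. A minimal sketch of that bookkeeping is given below; the ring-to-offset assignment (A=0, B=90, C=180, D=270 degrees) and the node names are assumptions chosen for illustration, and only the pairing of A/C and B/D as differential pairs comes from the text.

```python
# Ideal phase map of a four-ring, three-stage quadrature oscillator.
# Assumption: ring offsets are exactly 0/90/180/270 degrees.
RING_OFFSETS = {"A": 0.0, "B": 90.0, "C": 180.0, "D": 270.0}
NODES_PER_RING = 3
NODE_STEP = 120.0  # node-to-node spacing inside one three-stage ring


def quadrature_phases():
    """Return {node_name: phase_deg} for all ring nodes."""
    phases = {}
    for ring, offset in RING_OFFSETS.items():
        for node in range(NODES_PER_RING):
            phases[f"{ring}{node}"] = (offset + node * NODE_STEP) % 360.0
    return phases


phases = quadrature_phases()
ordered = sorted(phases.items(), key=lambda kv: kv[1])
steps = sorted({round(b[1] - a[1], 6) for a, b in zip(ordered, ordered[1:])})

for name, phase in ordered:
    print(f"{name}: {phase:5.1f} deg")
print("distinct step sizes:", steps)
```

Because 120 degrees and 90 degrees share no common multiple below 360 degrees other than through their 30-degree greatest common divisor, the twelve nodes land on a uniform 30-degree grid without any extra interpolation stage.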

# IV. PHASE-INTERPOLATOR OSCILLATOR

In a multi-ring OSC (e.g. a quadrature OSC), it is possible to manipulate the strength of the couplings and hence the phase relationships among the main loops. This effectively integrates the PIs into the OSC, with very low speed and power overhead. The PI-OSC is much cheaper than conventional PIs for high-speed links, because 1) there is no need to replicate circuits only for matching among phases, and 2) the fan-out (power) requirement is alleviated since the PI is not in the distribution path.

The idea proposed in this work is to build the fine-tune portion of the phase interpolation from tunable VF devices, and to use the quadrature OSC as two pairs of differential OSCs. The PI-OSC can therefore directly generate clocks for the data and clock lanes in a forwarded clock link. The circuit topology and phase tuning strategies of the PI-OSC are illustrated in Fig.3. As shown in the diagram, we can split the feed-forward VF controls into two complementary groups, marked as "Pha A" (red) and "Pha B" (blue). By sweeping the strength ratio between the two groups of VFs, we can move the phases of all nodes in Rings B & D up or down relative to the nodes in Rings A & C, ideally over a +/- 30-degree range. The tuning steps within this regional phase sweep are defined as "Fine tune". Since we divided the twelve internal nodes into two groups (shown as Green and Purple lines in the polar diagram), the "Coarse tune" function can be implemented as two 6-to-2 MUXes following the OSC. If more phases are needed (for example, for a 4-to-1 TX), dividers can be used to generate quadrature clocks for the data and clock lanes separately. Placing the Fine-tune stage before the Coarse-tune stage is a major difference compared to conventional PI designs. It avoids duplicating phase mixers for each phase of the clock. Integrating the phase interpolator into the OSC resolves the conflict between clock fan-out and PI resolution. The smaller, distributive PI structure saves area and power, and reduces variations statistically.



Fig.4. Application of PI-OSC in high-speed forwarded clock serial link

To further extend the phase tuning resolution, we can use the "Fractional tune" control. In integer steps, all VFs in the same phase control portion (same color in Fig.3) are set to an identical setting. However, it is also convenient to assign the control values for each VF column separately. If we set one of the three VF columns (e.g. Col.1) with a +/- 1LSB offset compared to the other two columns, this creates a +/- 1/3 Fine-tune step, or a "Fractional-tune" step. "Fine+Fractional" tunes provide 12 steps over the 60-degree range, with very good linearity (INL<1.5°). The total number of tuning steps in one OSC cycle is 6x4x3=72. Higher PI resolution is also possible by adding Fine-tune steps, although this may be limited by layout complexity.
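The step count above can be written out as a small arithmetic sketch: six coarse regions, four Fine-tune settings per region, and three Fractional-tune positions per Fine-tune step. The flat code ordering used here (coarse as the most significant field, fractional as the least) and the function names are hypothetical conveniences; the paper specifies only the counts, not a control-word format.

```python
# Step bookkeeping for the PI-OSC tuning hierarchy.
# Assumption: a flat 0..71 code with coarse/fine/fractional fields.
COARSE_STEPS = 6   # 60-degree regions selected by the MUXes
FINE_STEPS = 4     # integer Fine-tune settings per region
FRAC_STEPS = 3     # Fractional-tune positions per Fine-tune step

TOTAL_STEPS = COARSE_STEPS * FINE_STEPS * FRAC_STEPS
LSB_DEG = 360.0 / TOTAL_STEPS  # ideal step in degrees of one OSC cycle


def decode(code):
    """Split a flat PI code into (coarse, fine, fractional) indices."""
    if not 0 <= code < TOTAL_STEPS:
        raise ValueError(f"code must be in 0..{TOTAL_STEPS - 1}")
    coarse, rest = divmod(code, FINE_STEPS * FRAC_STEPS)
    fine, frac = divmod(rest, FRAC_STEPS)
    return coarse, fine, frac


def ideal_phase_deg(code):
    """Ideal output phase for a flat PI code, in OSC-cycle degrees."""
    return code * LSB_DEG


print("total steps:", TOTAL_STEPS)
print("ideal LSB (deg):", LSB_DEG)
print("code 13 ->", decode(13), "at", ideal_phase_deg(13), "deg")
```

With these counts the ideal LSB is 5 degrees of the OSC cycle, and each 60-degree coarse region contains the 12 Fine+Fractional steps mentioned in the text.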

The two groups of clock signals generated by the PI-OSC can be used either as half-rate differential clocks for a 2-to-1 TX or as quarter-rate quadrature clocks (after dividers) for a 4-to-1 TX. Fig.4 demonstrates an application case where the PI-OSC is used as the clock source for a high-speed forwarded clock transceiver.

Differential coupling as in Fig.1(b) can also be added in the quadrature OSC. The benefit is to lock the phase relationships within differential OSC pairs (rings A,C and B,D), to avoid the very small chance of coupling loops locking at half-frequency.

# V. CIRCUIT IMPLEMENTATIONS

The circuits implemented in the test chip are illustrated in Fig.5, which includes the core OSC circuits under the $V_{OSC}$ supply and other building blocks for a PLL under 1V. Blocks in gray (PFD, LPF, and REG) are shared with other experiments. For application in a serial link, the clock distribution and transceivers will be powered by the regulator, which takes $V_{OSC}$ as its reference. This topology uses the OSC as a process indicator and adjusts the $V_{REG}$ supply to flatten the performance variation across corners. In the test circuits, $V_{REG}$ is only used to power the on-die measurement block. Frequency measurement is done through a counter in the digital domain. Phase linearity and duty-cycle measurements are realized by on-die asynchronous samplers triggered by an off-chip clock source. Supply voltages across several domains can also be probed and measured.

Tuning capability of the Voltage-Follower blocks is realized by adding digitally trimmed tail devices in each cell, and then placing four cells in parallel for each feed-forward VF element (marked "A"). We reserved two of the four thermal tuning bits in each VF column for the fractional-tune function and connected the remaining two bits across VF rows together to limit routing complexity. All feedback VFs (marked "B") are hard-wired as a 1-bit constant. Two 6-to-2 MUXes perform the coarse tune of phase interpolation between the I path and the Q path. The two sets of half-rate differential outputs are further divided down to quarter-rate quadrature clock signals before being sampled by the on-die measurement circuits. The PI-OSC circuits can be configured in either open-loop free-run mode or PLL mode.

Fig.5. Schematic of the PI-OSC test circuits

# VI. EXPERIMENTAL RESULTS

The test chip was fabricated in TSMC 16nm FinFET CMOS. Fig.6(a) shows the measured OSC frequency versus effective supply voltage in the free-run mode. The measured maximum frequency of the quadrature PI-OSC is 25.9GHz at 0.946V. The frequency drops faster when $V_{OSC}$ moves below 0.48V because the quadrature coupling VFs start to lose effect at low voltage. The frequency sweep in PLL locked mode is plotted in Fig.6(b). When the PLL is locked, the OSC frequency tracks the reference (1/16 F<sub>OSC</sub>) smoothly over a wide range and saturates at ~25.5GHz, which is slightly lower than the free-run frequency.

Fig.7 shows the linearity measurements at 22GHz, over 66 steps. We removed the left-most step from each coarse-tune (60-degree) region because parasitic capacitances make the PI VFs slower than in simulation, which makes each coarse-tune region slightly larger than the ideal 60-degree range. The easy fix is to skip the first step in each region and use 66 steps instead of 72. The LSB of the PI-OSC is ~0.76% of a whole cycle after divide-by-2. Both INL and DNL variations across the phase sweep are well within +/-1LSB.
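The LSB figure and the INL/DNL metrics quoted here can be reproduced with a few lines of arithmetic. The sketch below uses the common endpoint-fit definitions of DNL and INL, which is an assumption since the paper does not state which definition was used, and the phase list is synthetic illustration data rather than measured results.

```python
# LSB of the 66-step PI referred to the divided-by-2 output cycle,
# plus endpoint-fit DNL/INL (assumed definitions, synthetic data).
STEPS_PER_OSC_CYCLE = 66
DIVIDE_RATIO = 2

lsb_fraction = 1.0 / (STEPS_PER_OSC_CYCLE * DIVIDE_RATIO)


def dnl_inl(phases, lsb=None):
    """Return (dnl, inl) in LSB units for a list of phase readings.

    If lsb is None, it is taken from the endpoints of the sweep.
    """
    n = len(phases)
    if n < 2:
        raise ValueError("need at least two phase readings")
    if lsb is None:
        lsb = (phases[-1] - phases[0]) / (n - 1)
    dnl = [(phases[i + 1] - phases[i]) / lsb - 1.0 for i in range(n - 1)]
    inl = [(phases[i] - phases[0]) / lsb - i for i in range(n)]
    return dnl, inl


# Synthetic example: one step is 0.5 LSB too long, the next 0.5 too short.
example_phases = [0.0, 1.5, 2.0, 3.0]
dnl, inl = dnl_inl(example_phases)

print(f"LSB = 1/{STEPS_PER_OSC_CYCLE * DIVIDE_RATIO} = {lsb_fraction:.4%}")
print("DNL (LSB):", dnl)
print("INL (LSB):", inl)
```

For a cyclic phase sweep, passing `lsb` explicitly as one full cycle divided by the number of steps avoids depending on the first and last readings.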

Fig.8 shows the maximum and RMS values of INL and DNL measured over varying frequency. The best-performing frequency in this test chip is around 20GHz, but the total variation over frequency is small. With more layout effort to reduce the parasitics in the coupling devices, the performance could be made more uniform over frequency.

Fig.6. OSC frequency (GHz): (a) free-run mode, (b) PLL locked mode.

Fig.7. (a) INL and (b) DNL performance at 22GHz (LSB=0.0076 or 1/132)

Fig.8. Linearity performance over OSC frequency

The OSC's frequency range differs between coupling modes. The differential mode (using only the A/C-rings or B/D-rings as a differential OSC) can reach very low frequency (<1.6GHz as in Fig.6(b)) and extends up to PLL saturation. But at low frequency (low V<sub>OSC</sub>), the I/Q relationship between ring-pairs is not guaranteed, because the larger phase step (90-degree) across the quadrature-coupling VFs requires more voltage headroom. The lowest frequency for quadrature mode is ~12.8GHz. The frequency range for best PI linearity (DNL<1LSB) is about 15GHz~24GHz. However, low-speed application is still possible in feed-forwarded clock links. For a 2-to-1 TX slower than 25Gbps, clock signals after div-by-2 can be used as differential clocks, and the PI tuning range is still enough to cover one UI.
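The statement that the PI range still covers one UI after div-by-2 can be checked with a short calculation. The sketch below assumes that a 2-to-1 TX sends two bits per clock period and that the PI sweeps exactly one full OSC period, as implied by the step count per OSC cycle; the function name and the example frequency are illustrative.

```python
# PI tuning range expressed in unit intervals (UI) for a 2-to-1 TX.
# Assumptions: two bits per clock period; PI range = one OSC period.

def pi_range_in_ui(f_osc_ghz, divide_by_2):
    """Return (data_rate_gbps, pi_range_ui) for a given OSC frequency."""
    f_clk_ghz = f_osc_ghz / 2 if divide_by_2 else f_osc_ghz
    data_rate_gbps = 2 * f_clk_ghz          # half-rate clocking
    ui_ps = 1000.0 / data_rate_gbps         # one bit time in ps
    pi_range_ps = 1000.0 / f_osc_ghz        # one full OSC period in ps
    return data_rate_gbps, pi_range_ps / ui_ps


direct_rate, direct_ui = pi_range_in_ui(20.0, divide_by_2=False)
divided_rate, divided_ui = pi_range_in_ui(20.0, divide_by_2=True)

print(f"direct:  {direct_rate} Gbps, PI range = {direct_ui} UI")
print(f"divided: {divided_rate} Gbps, PI range = {divided_ui} UI")
```

Under these assumptions, using the OSC outputs directly gives two UI of PI range, while the divided clock gives exactly one UI. With the quadrature mode bottoming out near 12.8GHz, the direct half-rate clock supports about 25.6Gbps and above, and slower rates fall to the divided clock.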

Fig.9 shows the chip micrograph and more details of the base-layer layout. The 16nm FinFET layouts were strictly done in the standard-height (9-track) gate-array style to minimize variations. All circuits in the OSC\_Core block (PI-OSC, MUXes, and high-speed dividers) fit comfortably into one 20um x 20um pitch. This is a very compact implementation compared to other state-of-the-art designs.

From measurement results, the proposed quadrature PI-OSC shows good speed and power advantages, compared to previously published results. More performance data and comparisons are summarized in Table I.
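As a rough cross-check of the summary figures, the sketch below converts the 20um x 20um footprint into mm<sup>2</sup> and normalizes the quoted power numbers by operating frequency. The mW/GHz figure of merit is an illustrative choice, not a metric used in the paper, and the entries are not like-for-like because each design's power figure covers a different set of blocks (see the Table I footnotes).

```python
# Cross-check of area and a rough power-per-GHz normalization.
# Assumption: mW/GHz is used here only as an illustrative metric.
SIDE_UM = 20.0
area_mm2 = (SIDE_UM * 1e-3) ** 2  # um -> mm, then square

# (power in mW, frequency in GHz) as quoted in Table I
designs = {
    "This work (OSC)": (8.1, 24.0),
    "[4]": (48.0, 16.0),
    "[5]": (18.0, 14.0),
}

mw_per_ghz = {name: p / f for name, (p, f) in designs.items()}

print(f"OSC_Core area: {area_mm2:.4f} mm^2")
for name, value in mw_per_ghz.items():
    print(f"{name}: {value:.3f} mW/GHz")
```

The area result matches the 0.0004 mm<sup>2</sup> active-area entry for this work.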

# VII. CONCLUSION

We have demonstrated a phase-interpolator-embedded quadrature OSC based on a multi-ring topology with voltage-follower coupling. Test circuits were fabricated in a 16nm FinFET process. The measurement results show clear advantages in speed, power, and silicon area cost. The embedded PI provides good linearity performance over frequency. The proposed design could be used as a clock source for high-speed low-power serial links.

Fig.9. Chip micrograph and base layer layout (labeled blocks include 1V blocks, OSC_Core, and Meas.)

TABLE I. PERFORMANCE COMPARISON

|                 | This work                                               | [4]                       | [5]                             |
|-----------------|---------------------------------------------------------|---------------------------|---------------------------------|
| Architecture    | PLL                                                     | QLL                       | QLL                             |
| Oscillator      | QOSC                                                    | IL-QOSC                   | IL-QOSC                         |
| Technology      | 16nm FinFET                                             | 7nm FinFET                | 10nm FinFET                     |
| Frequency Range | 13 – 25GHz                                              | 4 – 16GHz                 | 9 – 20GHz <sup>(c)</sup>        |
| Active Area     | 0.0004 mm<sup>2</sup>                                   | 0.105 mm<sup>2</sup>      | N/A                             |
| Supply          | 1V/0.85V <sup>(a)</sup>                                 | 1.2V/0.88V                | 1V                              |
| Power           | 8.1mW (OSC) <sup>(b)</sup>,<br>5mW (2 Div)<br>at 24GHz  | 48mW<br>at 16GHz          | 18mW<br>at 14GHz <sup>(d)</sup> |
| PI INL/DNL      | 1.25/0.97 LSB<br>at 24GHz                               | 1.44/0.87 LSB<br>at 16GHz | N/A                             |

<sup>(a)</sup> $V_{OSC}$ measured at 24GHz; <sup>(b)</sup> PI-OSC without MUXes consumes 5.7mW, all calculated with 1V; <sup>(c)</sup> Simulated; <sup>(d)</sup> DCO+ILO, no PI

# REFERENCES

- [1] A. Shokrollahi et al., "A Pin-Efficient 20.83Gb/s/wire 0.94pJ/bit Forwarded Clock CNRZ-5-Coded SerDes up to 12mm for MCM Packages in 28nm CMOS," Proc ISSCC, vol. 10.1, pp. 182-183, 2016.
- [2] W.-S. Choi et al., "A 0.45-0.7V 1-6Gb/s 0.29-to-0.58pJ/b Source-Synchronous Transceiver Using Automatic Phase Calibration in 65nm CMOS," Proc ISSCC, vol. 3.8, pp. 66-67, 2015.
- [3] J. Wilson et al., "A 1.15pJ/b 25Gb/s/pin Ground-Referenced Single-Ended Serial Link for Off- and On-Package Communication in 16nm CMOS using a Process- and Temperature-Adaptive Voltage Regulator," Proc ISSCC, pp. 276-277, 2018.
- [4] S. Chen et al., "A 4-to-16GHz Inverter-Based Injection-Locked Quadrature Clock Generator with Phase Interpolators for Multi-Standard I/Os in 7nm FinFET," Proc ISSCC, pp. 390-392, 2018.
- [5] J. Kim et al., "A 112 Gb/s PAM-4 56 Gb/s NRZ Reconfigurable Transmitter With Three-Tap FFE in 10-nm FinFET," IEEE JSSC, 2018.
- [6] A. Hafez and C.-K. Yang, "Design and Optimization of Multipath Ring Oscillators," IEEE TCAS, vol. 58.10, pp. 2332-2345, 2011.
- [7] J. Maneatis and M. Horowitz, "Precise Delay Generation Using Coupled Oscillators," IEEE JSSC, vol. 28, no. 12, pp. 1273-1282, Dec. 1993.