Integrated Transceivers for Millimeter Wave and Cellular Communication

Tobias Tired

DOCTORAL DISSERTATION

by due permission of the Faculty of Engineering, Department of Electrical and Information Technology, Lund University, Sweden.

To be defended on Wednesday, November 9, 2016 at 10.15 in lecture hall E:1406, Department of Electrical and Information Technology, Ole Römers väg 3, 223 63 Lund, Sweden.

Faculty opponent
Andrea Bevilacqua
Abstract:

This doctoral thesis addresses two topics in integrated circuit design: multiband direct conversion receivers for cellular frequencies and beam steering transmitters for millimeter wave communication in the cellular backhaul. The trend towards cellular terminals supporting an ever larger number of frequency bands has resulted in complex radio frontends with a large number of RF inputs. For performance reasons, receivers have in the past commonly used differential RF inputs. However, as shown in the thesis, with novel design techniques it is possible to achieve adequate performance with a single-ended frontend architecture, thereby reducing the complexity and pin count. The development of millimeter wave integrated circuits has previously not been subject to the mass production requirements that have been put on chipsets for cellular terminals, i.e. a minimum number of circuits, low supply voltage and power consumption, together with programmability to handle process spread and performance fine tuning. However, in the near future, when 5G networks are deployed and the number of small pico- and femtocell base stations explodes, there will be a strong demand for low cost, high performance single-chip millimeter wave beam steering transceivers. The millimeter wave circuits presented in this work have been designed in a SiGe bipolar technology. Traditionally, SiGe designs use a higher supply voltage than CMOS designs. In this work, however, it is shown that millimeter wave transceivers can be designed using a low supply voltage, thereby reducing the power consumption and eliminating the need for dedicated voltage regulators.

Paper I presents a 28 GHz QVCO with I/Q phase error tuning and detection. In paper II, a 28 GHz beam steering PLL is presented together with measurement results for the design in paper I. Measurement results for the beam steering PLL are shown in paper III. Simulation results for a two-stage 81-86 GHz power amplifier are provided in paper IV. Paper V shows measurement results for two E-band power amplifiers. In paper VI, simulation results are presented for a complete E-band transmitter including a three-stage power amplifier. A reconfigurable single-ended CMOS LNA for different cellular frequency bands is presented in paper VII. A single-ended multiband RF amplifier and mixer with DC-offset and second order distortion suppression in BiCMOS technology is presented in paper VIII.

Key words: receiver, LNA, mixer, bipolar, BiCMOS, SiGe, mm-wave, E-band, transmitter, PLL, PA, beam steering

Integrated Transceivers for Millimeter wave and Cellular Communication

Doctoral Thesis

Tobias Tired

Department of Electrical and Information Technology
Faculty of Engineering, Lund University
Lund, Sweden 2016
Popular Science Summary (Populärvetenskaplig sammanfattning)


# Table of Contents

Abstract
Popular Science Summary (Populärvetenskaplig sammanfattning)

Table of Contents
Preface
Acknowledgements
List of Acronyms
List of Symbols

Part I General Introduction

1 Introduction
1.1 Motivation
1.2 Millimeter wave frequency bands
1.3 Millimeter wave applications
1.4 The future wireless backhaul
1.5 Thesis structure

2 Process technology and modeling for mm-wave circuits
2.1 Introduction
2.2 SiGe active devices
2.3 Passive components for mm-wave designs
  2.3.1 On-chip inductors and transformers
  2.3.2 The ADS Momentum electromagnetic simulator
  2.3.3 Capacitors, varactors and resistors
2.4 The b7hf200 BiCMOS process
  2.4.1 Active devices
  2.4.2 Passive devices and metal stack
  2.4.3 Device matching

3 mm-wave TX carrier generation
3.1 Introduction
3.2 Direct conversion TX architectures
3.3 Sliding-IF TX architectures

4 Beam forming for mm-wave transmitters
4.1 Introduction
4.2 Linear timed and Phased Array Antennas
4.3 Beam steering transmitter architectures
  4.3.1 Introduction
  4.3.2 Digital beam forming
10 Future work
  10.1 Duplexer elimination in FDD systems
11 Conclusions
References
Part II Included papers
Summary of included papers

Paper I: A 28 GHz SiGe QVCO with an I/Q phase error detector for an 81-86 GHz E-band transceiver

Paper II: A 28 GHz SiGe PLL for an 81-86 GHz E-band beam steering transmitter and an I/Q phase imbalance detection and compensation circuit

Paper III: A 1.5 V 28 GHz beam steering SiGe PLL for an 81-86 GHz E-band transmitter

Paper IV: A 1 V power amplifier for 81-86 GHz E-band

Paper V: Comparison between two 2-stage SiGe E-band Power Amplifiers

Paper VI: System simulations of a 1.5 V SiGe 81-86 GHz E-band transmitter

Paper VII: Single-Ended Low Noise Multiband LNA with Programmable Integrated Matching and High isolation Switches

Paper VIII: A BiCMOS single ended multiband RF-amplifier and mixer with DC-offset and second order distortion compensation
Preface

Today I have over 20 years of experience in silicon process technology and integrated circuit design. After receiving my degree in Engineering Physics at Lund University, I first moved to Stockholm in 1993 for a position as a process engineer at Ericsson Microelectronics. The circuits developed in that process technology were designed to handle speech frequencies over long distance fixed telephony wires. The breakdown voltage was more than 90 V. In 1996 I moved back to Lund and joined Ericsson Mobile Communications as a designer of radio frequency circuits for mobile terminals operating at 900 MHz. My academic work started quite late, in 2011, when Pietro Andreani at the Department of Electrical and Information Technology at LTH asked me if I was interested in writing a licentiate thesis as an industry PhD student. After having finalized the licentiate work, I decided to continue with a doctoral degree as a regular PhD student in 2012. The research topic was, however, quite different: millimeter wave beam steering transmitter circuits for E-band communication.

The thesis is divided into two parts. The first part consists of a motivation for and an introduction to the two topics of my research: transmitter circuits for millimeter wave frequencies and integrated direct conversion receivers for cellular frequencies. The second part contains the published research papers.

Included publications


Related publications


Acknowledgements

First of all, I would like to thank Professor Henrik Sjöland for his efforts as main supervisor for the millimeter wave part of this thesis. Henrik has a wide knowledge of RF design and has, throughout the project, always been very helpful in solving technical issues. I would also like to thank my former supervisor Pietro Andreani at EIT for arranging the position as industry PhD student for me, resulting in a Licentiate degree. Furthermore, I would like to thank my assisting supervisors Markus Törmänen and Johan Wernehag for reviewing the thesis. Göran Jönsson's efforts in helping me to set up the millimeter wave measurement equipment for the power amplifiers were highly appreciated. Finally, I am grateful to my former colleague Per Sandrup at Ericsson in Lund for his much appreciated support in discussions regarding circuit design and MATLAB coding.

I am also extremely grateful to my wife, Maria, and my three children, Axel, Nils and Signe, for allowing me to spend time designing circuits and working on the included papers and the thesis during weekends and evenings.

Tobias Tired

Lund, 2016-10-04
## List of Acronyms

<table>
<thead>
<tr>
<th>Acronym</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>ABF</td>
<td>Analog Baseband Beamforming</td>
</tr>
<tr>
<td>ACPR</td>
<td>Adjacent Channel Power Ratio</td>
</tr>
<tr>
<td>AM</td>
<td>Amplitude Modulation</td>
</tr>
<tr>
<td>BER</td>
<td>Bit-error-rate</td>
</tr>
<tr>
<td>BiCMOS</td>
<td>Bipolar Complementary Metal Oxide Semiconductor</td>
</tr>
<tr>
<td>BPSK</td>
<td>Binary Phase-Shift Keying</td>
</tr>
<tr>
<td>BS</td>
<td>Base Station</td>
</tr>
<tr>
<td>CB</td>
<td>Common Base</td>
</tr>
<tr>
<td>CE</td>
<td>Common Emitter</td>
</tr>
<tr>
<td>CG</td>
<td>Common Gate</td>
</tr>
<tr>
<td>CM</td>
<td>Common Mode</td>
</tr>
<tr>
<td>CS</td>
<td>Common Source</td>
</tr>
<tr>
<td>CW</td>
<td>Continuous Wave</td>
</tr>
<tr>
<td>DBF</td>
<td>Digital Beamforming</td>
</tr>
<tr>
<td>DCR</td>
<td>Direct Conversion Receiver</td>
</tr>
<tr>
<td>DSP</td>
<td>Digital Signal Processor</td>
</tr>
<tr>
<td>CML</td>
<td>Current Mode Logic</td>
</tr>
<tr>
<td>CMOS</td>
<td>Complementary Metal Oxide Semiconductor</td>
</tr>
<tr>
<td>DM</td>
<td>Differential Mode</td>
</tr>
<tr>
<td>EDGE</td>
<td>Enhanced Data rates for GSM Evolution</td>
</tr>
<tr>
<td>FEM</td>
<td>Front End Module</td>
</tr>
<tr>
<td>FoM</td>
<td>Figure of merit</td>
</tr>
<tr>
<td>FSK</td>
<td>Frequency Shift Keying</td>
</tr>
<tr>
<td>LTCC</td>
<td>Low Temperature Co-fired Ceramics</td>
</tr>
<tr>
<td>LTE FDD</td>
<td>Long Term Evolution Frequency Division Duplex</td>
</tr>
<tr>
<td>LTE TDD</td>
<td>Long Term Evolution Time Division Duplex</td>
</tr>
<tr>
<td>GPRS</td>
<td>General Packet Radio Service</td>
</tr>
<tr>
<td>LO</td>
<td>Local Oscillator</td>
</tr>
<tr>
<td>MIM</td>
<td>Metal Insulator Metal</td>
</tr>
<tr>
<td>MOM</td>
<td>Metal Oxide Metal</td>
</tr>
<tr>
<td>NF</td>
<td>Noise Figure</td>
</tr>
<tr>
<td>OCP<sub>1dB</sub></td>
<td>1 dB output compression point</td>
</tr>
<tr>
<td>OFDM</td>
<td>Orthogonal frequency division multiplexing</td>
</tr>
<tr>
<td>OOK</td>
<td>On-Off Keying</td>
</tr>
<tr>
<td>PA</td>
<td>Power Amplifier</td>
</tr>
<tr>
<td>PAR</td>
<td>Peak-to-average ratio</td>
</tr>
<tr>
<td>PCB</td>
<td>Printed Circuit Board</td>
</tr>
<tr>
<td>PD</td>
<td>Pull Down</td>
</tr>
<tr>
<td>PDK</td>
<td>Process Design Kit</td>
</tr>
<tr>
<td>PLL</td>
<td>Phase Locked Loop</td>
</tr>
<tr>
<td>PFD</td>
<td>Phase Frequency Detector</td>
</tr>
<tr>
<td>PN</td>
<td>Phase Noise</td>
</tr>
<tr>
<td>P<sub>sat</sub></td>
<td>Saturated output power</td>
</tr>
<tr>
<td>PU</td>
<td>Pull Up</td>
</tr>
<tr>
<td>Q</td>
<td>Quality factor</td>
</tr>
<tr>
<td>QAM</td>
<td>Quadrature Amplitude Modulation</td>
</tr>
<tr>
<td>QILO</td>
<td>Quadrature Injection Locked Oscillator</td>
</tr>
<tr>
<td>QPSK</td>
<td>Quadrature Phase Shift Keying</td>
</tr>
<tr>
<td>QVCO</td>
<td>Quadrature Voltage Controlled Oscillator</td>
</tr>
<tr>
<td>RF</td>
<td>Radio Frequency</td>
</tr>
<tr>
<td>RFIC</td>
<td>Radio Frequency Integrated Circuit</td>
</tr>
<tr>
<td>RMS</td>
<td>Root mean square</td>
</tr>
<tr>
<td>RX</td>
<td>Receiver</td>
</tr>
<tr>
<td>SAW</td>
<td>Surface Acoustic Wave</td>
</tr>
<tr>
<td>SiGe</td>
<td>Silicon Germanium</td>
</tr>
<tr>
<td>SNR</td>
<td>Signal-to-noise ratio</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Acronym</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>TX</td>
<td>Transmitter</td>
</tr>
<tr>
<td>VGA</td>
<td>Variable Gain Amplifier</td>
</tr>
<tr>
<td>VCO</td>
<td>Voltage Controlled Oscillator</td>
</tr>
<tr>
<td>Wi-Fi</td>
<td>Wireless Fidelity</td>
</tr>
<tr>
<td>WLAN</td>
<td>Wireless Local Area Network</td>
</tr>
</tbody>
</table>
## List of Symbols

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>$c$</td>
<td>Speed of light in vacuum</td>
</tr>
<tr>
<td>$\omega$</td>
<td>Angular frequency</td>
</tr>
<tr>
<td>$\omega_T$</td>
<td>Transit Angular frequency</td>
</tr>
<tr>
<td>$\omega_0$</td>
<td>Angular operating frequency</td>
</tr>
<tr>
<td>$F_{\text{min}}$</td>
<td>Minimum noise factor</td>
</tr>
<tr>
<td>$f_T$</td>
<td>Transit frequency</td>
</tr>
<tr>
<td>$f_{\text{max}}$</td>
<td>Maximum oscillation frequency</td>
</tr>
<tr>
<td>$f_{\text{LO}}$</td>
<td>Mixer clock frequency</td>
</tr>
<tr>
<td>$f_{\text{VCO}}$</td>
<td>Voltage controlled oscillator frequency</td>
</tr>
<tr>
<td>$Z_{\text{in}}$</td>
<td>Input impedance</td>
</tr>
<tr>
<td>$g_m$</td>
<td>Transconductance</td>
</tr>
<tr>
<td>$\sigma_\phi$</td>
<td>RMS phase error</td>
</tr>
<tr>
<td>$I_E$</td>
<td>Total emitter current</td>
</tr>
<tr>
<td>$I_{\text{ne}}$</td>
<td>Electron emitter current</td>
</tr>
<tr>
<td>$I_{\text{pe}}$</td>
<td>Hole emitter current</td>
</tr>
<tr>
<td>$\gamma$</td>
<td>Emitter efficiency</td>
</tr>
<tr>
<td>$\alpha$</td>
<td>Common-base current gain</td>
</tr>
<tr>
<td>$\beta$</td>
<td>Common-emitter current gain</td>
</tr>
</tbody>
</table>
Part I

General Introduction
CHAPTER 1

1 Introduction

### 1.1 Motivation

In this thesis, integrated radio frequency receiver and transmitter circuits have been designed and manufactured, spanning two different silicon processing technologies and two different operating frequency bands. The transmitter (TX) circuits were designed in a SiGe bipolar [1] technology for a wireless beam steering [2] transmitter operating in the E-band at 81-86 GHz [2], [3]. The receiver (RX) circuits were designed in BiCMOS and CMOS [4] technologies for cellular system operating frequencies of 900 MHz and 2 GHz, respectively. In the three-dimensional space defined by technology, operating frequency and RX/TX operation, three different points out of eight possible have been studied. Each point in the design space is associated with different technical challenges. To be competitive, a cellular transmitter has high requirements on e.g. efficiency, while this requirement is more relaxed for an E-band transmitter, due to limitations in silicon process technology. For high RX sensitivity, it is more difficult to design a receiver at millimeter wave frequencies than at cellular frequencies. This is due to both the limited device gain and the higher device noise figure at millimeter wave frequencies.

For integrated circuits developed for the cellular frequencies around 2 GHz, CMOS [4] is nowadays the dominating technology. However, this was not the situation back in the late 1990s, when competitive circuits were still developed in BiCMOS technology [5], using bipolar devices for the radio part of the design. The integration level of early E-band transceivers was quite low. In many respects, integrated transceiver circuits for millimeter-wave communication are now undergoing the same transition as circuits for cellular communication did, i.e. there will be a race towards higher integration level, lower supply voltages and lower current consumption. This is the common development when the manufacturing volumes for products increase, which is now the situation for E-band transceivers. The future 5G networks [6] will look quite different from the 4G networks of today. In order to support the increasing demand for higher data rates, the cell sizes in densely populated areas, with many users, will decrease. The best way to connect the smaller cells, so-called micro, pico and femto cells [2], is through millimeter-wave radio links. With many small cells, it will no longer be cost-effective to deploy a fixed wired connection for the base station to the backhaul in each cell. This is why wireless radio links have an excellent opportunity.

For a given circuit topology, due to differences in device operation, a design using bipolar devices needs a higher supply voltage than one using CMOS devices. However, as seen in this thesis, it is possible to design high performance bipolar mm-wave circuits using a low supply voltage of only 1.5 V. A fundamental difference between CMOS and bipolar heterojunction devices is that a faster bipolar transistor is not always accompanied by a lower breakdown voltage [7], which is always the case with newer and faster CMOS technology. For the author of the thesis, having a long background in the cellular wireless IC industry designing BiCMOS and CMOS transceivers, the research on mm-wave transmitters using SiGe HBT technology [7]-[12] was an interesting opportunity to use bipolar devices again, but now for circuits operating at a much higher frequency, i.e. 81-86 GHz, enabling Gb/s communication.

### 1.2 Millimeter wave frequency bands

An overview of the microwave and millimeter wave frequency bands used for wireless communication is given in Fig. 1.1 [13]. Traditional point-to-point radio links use microwave frequencies below 40 GHz.

![Wireless communication frequency bands](image)

**Fig. 1.1.** Wireless communication frequency bands [13]

In this thesis, integrated transmitter circuits have been developed in SiGe technology for the E-band, located at 71-76, 81-86, and 92-95 GHz, offering a total bandwidth of 13 GHz [2], [3]. The attenuation in air at sea level for mm-wave frequencies, measured in dB/km, is illustrated in Fig. 1.2 [14]. For the three sub-bands located between 71 and 95 GHz, the attenuation is less than 0.5 dB/km, thereby enabling long range communication. The absorption peaks in Fig. 1.2 are due to resonances in the O₂ and H₂O molecules [14], [15], [16]. The band between 57 and 64 GHz, with an attenuation of up to more than 15 dB/km, is used mainly for short range wireless communication, e.g. Wireless HD [14], [17] and WiGig [14].
For wireless short distance radio links, a high absorption is actually advantageous, since it allows for densely spaced unlicensed wireless communication: the high attenuation prevents interference between different systems. The low attenuation of the E-band, in contrast, enables high speed long range wireless communication without requiring an excessive transmitter output power. The power amplifiers presented in papers IV, V and VI have a saturated output power exceeding +15 dBm. Used together with beam steering, as shown in papers II and III, this output power is enough to establish a radio link with a range exceeding 1 km [18].
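
As a rough plausibility check of the propagation figures above, the free-space path loss and the atmospheric attenuation can be added up in a few lines. The sketch below is illustrative only: the 83.5 GHz mid-band carrier, the 1 km distance, the 0 dBi antennas and the function names are assumptions made for this example. Only the +15 dBm output power and the 0.5 dB/km attenuation bound are taken from the text, and the antenna gains and receiver sensitivity needed for a complete link budget are deliberately left out.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def free_space_path_loss_db(distance_m: float, freq_hz: float) -> float:
    """Friis free-space path loss between isotropic antennas, in dB."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)


def isotropic_rx_power_dbm(p_tx_dbm: float, distance_m: float,
                           freq_hz: float, atm_db_per_km: float) -> float:
    """Received power assuming 0 dBi antennas at both ends (assumption)."""
    atm_loss_db = atm_db_per_km * distance_m / 1000.0
    return p_tx_dbm - free_space_path_loss_db(distance_m, freq_hz) - atm_loss_db


if __name__ == "__main__":
    f = 83.5e9   # assumed carrier in the middle of the 81-86 GHz sub-band
    d = 1000.0   # assumed link distance, m
    print(f"FSPL:     {free_space_path_loss_db(d, f):.1f} dB")
    print(f"RX power: {isotropic_rx_power_dbm(15.0, d, f, 0.5):.1f} dBm")
```

Under these assumptions the free-space loss over 1 km is roughly 131 dB, while the atmosphere adds at most 0.5 dB. The path loss, not the atmospheric absorption, therefore dominates in the E-band, which illustrates why high antenna gain is needed for such links.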

### 1.3 Millimeter wave applications

There are many different applications for millimeter waves besides point-to-point links. In satellite communication systems, mm-wave technology is used for establishing radio links with earth as well as for communication between different satellites. During World War II, there were large research efforts targeted towards developing military defense radars. Nowadays, radar has become a mass-market product with the introduction of automotive radars for cars. There are both short and long range automotive radars, using 24 and 77 GHz carrier frequencies, respectively [19], [20]. Short range automotive radars are typically used for parking assistance, while long range radar is used for collision avoidance. With a higher frequency, smaller objects can be detected due to the higher resolution. Driving safety has been improved with new systems such as adaptive cruise control, collision warning and automatic braking [14]. Today, single chip radars operating in either the 24-24.25 GHz or the 76-81 GHz band are available. Typical applications, besides automotive, for radars in the lower frequency bands are door openers and home security [14].

Another application for mm-wave technology is in radio astronomy [16], for detecting mm-wave signals from objects in space. Millimeter waves can also be used in remote sensing of the atmosphere of the earth. Temperature measurements of the atmosphere can be performed, utilizing temperature dependent changes in resonance frequencies of the oxygen and water molecules around 60 GHz [16]. One growing application is in airport security systems for scanning people for concealed weapons [21], [22]. Using mm-wave imaging technology, objects that are hard to detect using X-rays, e.g. plastic items, can easily be found. Applications for the Wireless HD [14], [17] and WiGig [14] standards are typically wireless connections from a set-top box, DVD player or PC to an HDTV screen [14]. One interesting application with even shorter range is PCB-to-PCB or chip-to-chip wireless communication [14]. With higher clock frequencies, the losses and costs associated with PCB traces, cables and connectors increase. At some point a mm-wave wireless connection becomes attractive.

For point-to-point radio links [2], which are the topic of this thesis, the traditional microwave bands between 6 and 38 GHz have become increasingly overcrowded, especially in densely populated urban areas. There is also a strong need for higher link data rates, but these bands have an allocated channel bandwidth that does not exceed 56 MHz [3]. Even if higher order modulation schemes like 256 QAM [2], [3], [14] are used, the link capacity is still too low. Using the bandwidth available in the E-band, located at 71-76 GHz, 81-86 GHz and 92-95 GHz, the data link capacity can be increased to several Gb/s even without utilizing higher order modulation schemes. This is beneficial since the link will then be more robust [2].
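
The capacity argument above can be made concrete with a first-order estimate. The sketch below assumes an idealized link in which the symbol rate equals the channel bandwidth and coding and filter overhead are ignored, so the numbers are upper bounds rather than achievable rates. The function name is chosen for this example only, and treating the whole 5 GHz wide 81-86 GHz sub-band as a single channel is a simplification.

```python
import math


def raw_bit_rate(bandwidth_hz: float, constellation_size: int) -> float:
    """Idealized bit rate in b/s.

    Assumes one symbol per second per hertz of bandwidth and
    log2(M) bits per symbol, with no coding or roll-off overhead.
    """
    bits_per_symbol = math.log2(constellation_size)
    return bandwidth_hz * bits_per_symbol


if __name__ == "__main__":
    microwave = raw_bit_rate(56e6, 256)  # 56 MHz channel, 256 QAM
    e_band = raw_bit_rate(5e9, 4)        # 5 GHz sub-band, QPSK
    print(f"56 MHz, 256 QAM: {microwave / 1e6:.0f} Mb/s")
    print(f"5 GHz, QPSK:     {e_band / 1e9:.0f} Gb/s")
```

Even with 8 bits per symbol, the 56 MHz channel stays below 0.5 Gb/s in this idealized model, whereas a 5 GHz channel reaches the Gb/s range with only 2 bits per symbol.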

### 1.4 The future wireless backhaul

An overview of a future wireless backhaul supporting different systems for radio communication is shown in Fig. 1.3 [23]. Here, base stations for cellular and Wi-Fi [24] networks communicate with the fiber connected main switching office using point-to-point wireless radio links. An important advantage of a wireless backhaul over a wired one is how quickly it can be deployed. Installation of an optical fiber for a base station requires long planning before the base station can be operational. A wireless connection, on the other hand, can be installed in quite a short time. Electrical beam steering, i.e. automatic alignment of the radio link, which is presented in papers II and III, further simplifies the installation. Due to the high antenna gain, the mm-wave signal has the property of a “pencil beam”, minimizing interference with other radio links in the same area. There will therefore nearly always be spectrum available, even in densely populated areas. Another desirable effect of the narrow beam is enhanced security, since it makes it difficult for an unauthorized user to get access to the data.
In future 5G networks there will be many different cell sizes [2], [25], i.e. the networks will evolve into heterogeneous ones. In particular, there will be a large number of small cells, and therefore the number of base stations will increase significantly. There are three main drivers for implementing small cells in a network [26]:

- Increasing the capacity where there are many users in a small area. Such a place could be a train station or an arena.
- Improving the coverage in densely populated areas at the cell edges, or improving the quality of service inside the cell if obstacles degrade it.
- Reducing the power consumption of the terminals. Establishing a high data-rate wireless link between a terminal and base station requires more power the greater the distance is.

Early E-band transceivers used simple modulation schemes, such as on-off keying (OOK) and binary phase shift keying (BPSK) [2]. While easy to design, a wireless link using these schemes has a low spectral efficiency [27], measured in bits/second/Hertz (b/s/Hz). With an increasing number of users, it becomes more important, even at the E-band, to use a given bandwidth more effectively. Therefore, today’s commercial E-band systems use higher order quadrature amplitude modulation (QAM) [28]-[31], such as 16 QAM or 64 QAM. However, there is a penalty for adding more symbols to the constellation diagram, as in moving from 64 QAM to 256 QAM, since the required signal-to-noise ratio (SNR) for a given bit-error-rate (BER) is increased [28]-[31]. Therefore, the maximum range of the link decreases with higher modulation order.

Since the beam is narrow at E-band frequencies, alignment of the antennas is crucial for reliable link operation. Early link installations depended on electrical motors that could mechanically rotate the antenna in two directions [32]. An important target for this thesis was to design circuits that could be used for electrical beam steering [32], [33], which is much cheaper than mechanical rotation. The results are presented in papers II and III. In an electrical beam steering transmitter, multiple transmitters are connected to an array of antennas, i.e. a phased array antenna [33]. The carrier phases are then altered to create constructive interference in the desired direction of the beam.
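
To illustrate how the carrier phases set the beam direction, the sketch below computes the per-element phase shifts of a uniform linear array and evaluates the resulting array factor. It is a textbook model, not the circuit implementation of papers II and III: the four-element array, the half-wavelength element spacing, the 20 degree steering angle and all function names are assumptions chosen for illustration.

```python
import cmath
import math


def element_phases(n_elements: int, spacing_wavelengths: float,
                   steer_deg: float) -> list:
    """Carrier phase (radians) for each element so that the radiated
    fields add in phase in the direction steer_deg (from broadside)."""
    sin_steer = math.sin(math.radians(steer_deg))
    delta = -2.0 * math.pi * spacing_wavelengths * sin_steer
    return [k * delta for k in range(n_elements)]


def array_factor(phases: list, spacing_wavelengths: float,
                 look_deg: float) -> float:
    """Normalized magnitude of the array factor towards look_deg."""
    sin_look = math.sin(math.radians(look_deg))
    psi = 2.0 * math.pi * spacing_wavelengths * sin_look
    total = sum(cmath.exp(1j * (k * psi + p)) for k, p in enumerate(phases))
    return abs(total) / len(phases)


if __name__ == "__main__":
    phases = element_phases(4, 0.5, 20.0)
    for angle in (-20.0, 0.0, 20.0):
        af = array_factor(phases, 0.5, angle)
        print(f"{angle:+5.1f} deg: |AF| = {af:.3f}")
```

In this model the normalized array factor reaches its maximum of 1 in the steered direction, where the propagation phase of each element is exactly cancelled by its carrier phase, and is lower in the other directions evaluated.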

### 1.5 Thesis structure

The first part of the thesis gives an introduction to wireless communication and a motivation for the research of the thesis, i.e. mm-wave beam steering transmitters for E-band frequencies and receivers for cellular frequencies. In chapter 2, the Silicon Germanium (SiGe) process technology is described together with challenges in the design of mm-wave integrated circuits. Architectures for mm-wave TX carrier generation are analyzed in chapter 3. The concept of beam steering is discussed in chapter 4. In chapter 5, E-band point-to-point link transmitter requirements are analyzed. Chapter 6 describes the different building blocks in a mm-wave transmitter. In chapter 7, an introduction to the licentiate thesis topic is provided, i.e. integrated cellular radio front-ends and their requirements. Chapter 8 describes the receiver LNA properties. An overview of mixer topologies is given in chapter 9. Some future front-end architectures are presented in chapter 10. The conclusions of the thesis are given in chapter 11.

The organization of the thesis is as follows:

**Chapter 1:** The first chapter of the thesis starts with a motivation for why the research topic, mm-wave transmitters for E-band frequencies, is important. An overview of other applications and products driving mm-wave IC development is given. The atmospheric propagation properties of the different mm-wave frequency bands are discussed. Finally, an overview of a future 5G wireless backhaul is provided.

**Chapter 2:** This chapter starts with a description of the advantages of the Silicon Germanium (SiGe) Heterojunction Bipolar Transistor (HBT) compared to the common Bipolar Junction Transistor (BJT). The difference in band structure, which enhances the performance of the HBT device, is explained. Properties of passive components for mm-wave design are discussed together with an overview of the ADS Momentum electromagnetic simulator, used for designing the inductors and transformers of this thesis. Finally, an overview of the devices available in the SiGe process technology used in this thesis, the Infineon b7hf200, is provided.

**Chapter 3:** In this chapter, different architectures for generating an E-band transmit signal at 84 GHz are analyzed. The TX frequency synthesis in this thesis is based on a 28 GHz quadrature voltage controlled oscillator (QVCO). The baseband signal is first mixed with the 28 GHz QVCO output, and then with the QVCO tail harmonic at 56 GHz, resulting in an 84 GHz TX carrier. The other architectures discussed use techniques such as direct conversion, frequency tripling, injection locking and N-push VCOs. Advantages and drawbacks of each architecture for mm-wave carrier generation are discussed.

**Chapter 4:** This chapter starts by describing the beam forming concept. An overview of different beam forming techniques that can be used for mm-wave transmitters is provided. Finally, the technique presented in papers II and III of this thesis, i.e. beam steering by injecting DC current into the load of the phase detector of a phase locked loop (PLL), is described.

**Chapter 5:** An introduction to mm-wave transmitter requirements is provided in this chapter. The Error Vector Magnitude (EVM) definition is discussed. An overview is given of the simulation setup presented in paper VI, i.e. a complete transmitter that upconverts a 16 QAM baseband signal. The Symbol Error Rate (SER) dependency on the Energy per Symbol to Noise Power Spectral Density ratio, $E_s/N_0$, for different modulation schemes and I/Q phase imbalances is discussed. Finally, results are presented for the circuit in papers I and II, a QVCO with I/Q phase error detection and control.

**Chapter 6:** This chapter describes the main properties of the different building blocks of mm-wave transmitters. The chapter starts with a description of the 28 GHz PLL that is presented in papers II and III. Next, the properties of a common voltage controlled oscillator (VCO) and the low supply voltage quadrature voltage controlled oscillator (QVCO) in papers I, II and III are described. The properties of different frequency divider topologies, besides the Current-Mode-Logic (CML) divider that was selected for the designs in papers II and III, are discussed. Next, an overview of the functionality of the Phase-Frequency-Detector (PFD) and Charge Pump (CP), plus the Phase Detector (PD) used in the PLL of papers II and III, is provided. The topology of the common passive loop filter and the active loop filter in papers II and III is discussed briefly. Next, the architecture of the passive and active mixers is described. The last section is about power amplifiers (PAs), which is the topic of papers IV and V. The main challenges in designing an E-band PA are described. Different ways of linearizing PAs are also mentioned. The architecture of the two E-band PAs presented in paper V is described.

**Chapter 7:** This chapter gives an introduction to the radio front-end of cellular receivers, which is the topic of papers VII and VIII included in the licentiate thesis. Performance issues related to frequency domain duplexing (FDD) in a multiband frontend direct conversion receiver (DCR) are discussed. An overview of the requirements for the DCR is provided. The drawbacks of the DCR operating in frequency domain duplexing are analyzed. The impact of second and third order distortion and of cross-modulation of the LO-leakage due to TX-leakage into the receiver LNA is described. This is also addressed in paper VIII.

**Chapter 8**: This chapter provides an overview of the receiver LNA. The properties of the MOS common source (CS) LNA with inductive degeneration, used in paper VII, are described. The effect of the design parameters that determine the input matching bandwidth of the MOS LNA is presented. A noise model for the CS LNA with inductive degeneration is outlined.

**Chapter 9**: In this chapter, the performance of different receiver mixers for DCRs is described. The mixer architecture can be either active or passive. It can be designed as single-ended or double balanced. In paper VIII, an active single ended mixer with a mismatch compensation feedback loop is presented.

**Chapter 10**: Interesting future radio front-end architectures, targeted for duplexer elimination in FDD systems or SAW filter elimination in E-GSM systems, are presented.

**Chapter 11**: This chapter gives the conclusions of the thesis.

## 2 Process technology and modeling for mm-wave circuits

### 2.1 Introduction

Millimeter-wave integrated circuits are today fabricated using three main technologies: Silicon CMOS [4], Gallium Arsenide (GaAs) [34] and Silicon Germanium (SiGe) [7]-[12]. For quite some time, GaAs was the technology of choice for mm-wave devices. Compared to Silicon, the material properties of GaAs have several advantages, which make it especially suitable for mm-wave designs. First, the charge carrier mobility, $\mu$, is much higher in GaAs, so that carriers can respond faster to changes in an applied electric field [34]. Secondly, the saturated carrier drift velocity is higher in GaAs, resulting in shorter transit times and thereby faster devices [34]. When designing the power amplifier of a transmitter, GaAs devices have higher power handling capability, since they can sustain higher electric fields before breakdown and also have a higher drift velocity [34]. In mm-wave designs, the quality of the on-chip inductors [35]-[37] is important for the overall circuit performance. Here, GaAs offers yet another advantage by being an excellent electrical insulator [34], reducing loss due to magnetically induced substrate currents, so-called eddy currents.

In spite of the above advantages, GaAs will not be the dominating technology for low cost and high volume applications. The GaAs material itself is more expensive than silicon, and for yield reasons the level of integration is lower. For small volume applications requiring the best possible device performance, however, other technologies cannot compete with GaAs. For large scale mm-wave applications, such as automotive radars, wireless HD for TV sets and small cell mm-wave radio links, low cost technologies such as CMOS and SiGe have taken over and will continue to dominate in the future. Compared to GaAs technology, the integration level is higher using SiGe BiCMOS, since the digital parts can be implemented on the same die. This is important as increased integration level is a key driver for cost reduction. In this thesis, one of the key targets was to design an E-band transmitter, as shown in papers I-VI, for a supply voltage as low as 1.5 V. Such a transmitter can then be integrated with the digital part, sharing one common supply voltage. Comparing SiGe BiCMOS and Silicon CMOS technology, CMOS is cheaper due to fewer masks and processing steps. For a mm-wave transmitter, however, bipolar devices are advantageous since, compared to CMOS devices with similar high frequency performance, they can sustain higher supply voltages without breaking down. This increases the achievable output power at mm-wave frequencies. The metal layers in a standard digital CMOS process are not well-suited for implementing low loss inductors and transformers. In an RF CMOS process, additional thick metal layers are therefore added in the Back End of Line (BEOL) processing. This also applies for a BiCMOS process.

### 2.2 SiGe active devices

In general, silicon bipolar transistors for mm-wave applications require short base transit times, low base resistance and small capacitive parasitics. A SiGe Heterojunction Bipolar Transistor (HBT) is in its structure very similar to a common bipolar silicon transistor. However, the composition of the base is different, since it contains germanium. As seen in Fig. 2.1 [38], the band gap, i.e. the energy distance between the valence and conduction band, is smaller in the base than in the emitter and collector [34], [38]. The energy barrier for injection of electrons from the emitter to the base is therefore reduced. This results in an increased electron injection current and also an increased emitter efficiency, $\gamma$ [34]. The emitter efficiency is defined in (2.1) as the ratio between the electron injection current and the total emitter current, $I_E$, equal to the sum of the electron current, $I_{nE}$, and the hole current, $I_{pE}$ [34].

$$\gamma = \frac{\partial I_{nE}}{\partial I_E} \tag{2.1}$$

The common base current gain, $\alpha$, depends on both the emitter efficiency and the doping level of the base. It increases with higher emitter efficiency, but is reduced when the base doping increases. In the common-emitter configuration, the current gain, $\beta$, is defined in (2.2) from the common base current gain, $\alpha$ [34].

$$\beta = \frac{\alpha}{1-\alpha} \tag{2.2}$$
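Relation (2.2) makes the current gain very sensitive to small changes in $\alpha$ when $\alpha$ is close to one. The short sketch below evaluates (2.2) for a few values of $\alpha$; the function name and the chosen values are illustrative assumptions, not device data from [34] or [46].

```python
def common_emitter_gain(alpha):
    """Common-emitter current gain from the common-base gain, as in (2.2)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    return alpha / (1.0 - alpha)


# Illustrative values of alpha (assumed, not measured)
for alpha in (0.98, 0.99, 0.995):
    print(f"alpha = {alpha:.3f}  ->  beta = {common_emitter_gain(alpha):.0f}")
```

Raising $\alpha$ from 0.99 to 0.995 roughly doubles $\beta$, which indicates why an improved emitter efficiency leaves room for a higher base doping.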

The benefit of introducing germanium in the base is that, due to the higher electron injection current, $I_{nE}$, the base can be more heavily doped while still achieving a high current gain. A more heavily doped base is advantageous, since it reduces the base resistance and increases the $f_{\text{max}}$ of the device. Further, the Early voltage is increased, which is beneficial for the device gain and linearity [34].

The amount of germanium in the base is graded in order to create an accelerating electric field in the base [38], thereby reducing the base transit time for minority carriers and increasing the transit frequency, $f_T$. The current gain is also higher for a SiGe device compared to a Si device.
The SiGe device also has lower 1/f noise compared to Si devices [34]. In CMOS devices, 1/f noise is created by carrier trapping and de-trapping at interface states in the gate oxide [4]. Excess 1/f noise in CMOS devices makes it difficult to design on-chip oscillators with a sufficiently low phase noise at low frequency offsets from the carrier. It also degrades the performance of active mixers in direct conversion transmitters and receivers [4]. Another SiGe HBT device advantage is the high linearity efficiency, defined as the device output referred third order intercept point, $OIP_3$, divided by the DC power consumption [19].

A comprehensive analysis of the small signal model of the bipolar junction transistor (BJT) is given in [39]. The most important parasitic capacitances, limiting the high frequency gain of the device, are the base-emitter capacitance, $C_{be}$, the base-collector capacitance, $C_{bc}$, and the collector-substrate capacitance, $C_{cs}$ [39].

A simplified equation for the large signal current of a SiGe transistor in the forward-active region is given in (2.3), defining the relation between the collector current, $I_C$, the base-emitter voltage, $V_{BE}$, the saturation current, $I_S$, the collector-emitter voltage, $V_{CE}$, and the thermal voltage, $V_T$ [39]. The Early voltage, $V_A$, determines the collector current dependency on $V_{CE}$.

$$I_C = I_S \left(1 + \frac{V_{CE}}{V_A} \right) \exp \left(\frac{V_{BE}}{V_T}\right) \tag{2.3}$$

The large signal output characteristic of a typical SiGe transistor, including breakdown effects, is outlined in Fig. 2.2 [20], [39]. The collector current, $I_C$, dependency on collector-emitter voltage, $V_{CE}$, for different values of base-emitter voltage, $V_{BE}$, is shown. As indicated by (2.3), in the forward active region the collector current increases with higher values of $V_{BE}$. The point where the extrapolated characteristics (dashed lines) intersect the $V_{CE}$-axis defines the Early voltage, $V_A$. At high $V_{CE}$ voltages, there will be an avalanche breakdown in the base-collector junction. This must be avoided since the induced high current can damage the device. In Fig. 2.2 [20], the breakdown voltage is specified as $BV_{CEO}$, the breakdown voltage with an open base. However, in an actual design, the base terminal is not open, but connected with a resistor or an inductor to a bias voltage. Depending on the resistor value, this yields a higher collector-emitter breakdown voltage. For a cascode topology, using a common-base connected output device, the breakdown is instead limited by the $BV_{CBO}$ parameter, which is several times higher than $BV_{CEO}$ [40]. For a PA, a high breakdown voltage is important, since it limits the maximum output power that can be achieved.
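The extrapolation of the forward-active characteristic to the $V_{CE}$-axis can be checked numerically from (2.3). The sketch below is a minimal illustration only: the values of $I_S$, $V_A$ and $V_T$ are hypothetical and are not parameters of the b7hf200 devices, and breakdown is not modeled.

```python
import math


def collector_current(v_be, v_ce, i_s=1e-17, v_a=50.0, v_t=0.02585):
    """Forward-active collector current according to (2.3).

    i_s, v_a and v_t are assumed example values.
    """
    return i_s * (1.0 + v_ce / v_a) * math.exp(v_be / v_t)


def extrapolated_intercept(v_be, v_ce_1=0.5, v_ce_2=1.0):
    """V_CE at which the straight line through two bias points reaches I_C = 0."""
    i_1 = collector_current(v_be, v_ce_1)
    i_2 = collector_current(v_be, v_ce_2)
    slope = (i_2 - i_1) / (v_ce_2 - v_ce_1)
    return v_ce_1 - i_1 / slope


# The intercept is independent of V_BE and lies at V_CE = -V_A
for v_be in (0.75, 0.80, 0.85):
    print(f"V_BE = {v_be:.2f} V  ->  intercept = {extrapolated_intercept(v_be):.2f} V")
```

All curves extrapolate to the same point at $V_{CE} = -V_A$, which is how the Early voltage is read from an output characteristic.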

### 2.3 Passive components for mm-wave designs

#### 2.3.1 On-chip inductors and transformers

The quality factor (Q) [35]-[37] of on-chip inductors and transformers is a key performance parameter in mm-wave IC design. In amplifiers and buffers, inductors are used to cancel the imaginary part of both the input and output impedance at the operating frequency. Using high Q inductors reduces the losses, resulting in higher gain. In oscillators, the Q-value of the tank inductor is of fundamental importance to the achievable phase noise. For a transformer used at the output of a power amplifier, the losses have a strong impact on the power added efficiency (PAE). The losses in an on-chip inductor or transformer are due to several different mechanisms, which depend on process and design parameters. One loss mechanism is magnetically induced currents in the substrate, so-called eddy currents [35]-[37]. If the substrate resistivity is low, the induced currents can degrade the Q-value significantly. By increasing the substrate resistivity, the eddy currents are reduced. Another loss mechanism is capacitive coupling between the inductor metal and the lossy substrate. This loss is mitigated by designing the inductors in the top layer of the metal stack, and reducing the metal trace area. Capacitively coupled losses can also be mitigated by a patterned ground shield [41]. A third loss source is resistive losses in the metal of the inductor traces. These losses are augmented by the skin and proximity effects [35]-[37]. Due to these effects, the current density, $J$, of a high frequency signal is highest at the surface of a conductor and decreases exponentially with the distance $x$ from the surface as given by equation (2.4). For this reason, doubling the metal thickness does not reduce the resistive losses by a factor of two. $J_s$ is the current density at the surface of the conductor and $\delta$ is the so-called skin depth given by (2.5).

$$J = J_se^{-x/\delta} \tag{2.4}$$

$$\delta = \sqrt{\frac{2\rho}{\omega \mu_r \mu_0}} \tag{2.5}$$
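To give a feeling for the magnitudes involved, the sketch below evaluates (2.5) and (2.4) numerically. It assumes a non-magnetic conductor ($\mu_r = 1$) with a resistivity of $1.7 \times 10^{-8}$ Ωm, an approximate textbook value for bulk copper rather than a figure from the process documentation [46]; the function names are hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi  # free space permeability (H/m)


def skin_depth(resistivity, frequency_hz, mu_r=1.0):
    """Skin depth in metres according to (2.5)."""
    omega = 2.0 * math.pi * frequency_hz
    return math.sqrt(2.0 * resistivity / (omega * mu_r * MU_0))


def relative_current_density(x, delta):
    """J / J_s at depth x below the conductor surface according to (2.4)."""
    return math.exp(-x / delta)


RHO_CU = 1.7e-8  # assumed bulk copper resistivity (ohm metre)

for f_ghz in (28.0, 56.0, 84.0):
    delta = skin_depth(RHO_CU, f_ghz * 1e9)
    print(f"{f_ghz:4.0f} GHz: skin depth = {delta * 1e6:.3f} um")
```

Under these assumptions the skin depth at 84 GHz is roughly 0.23 µm, and the current density has dropped to about 37 % of its surface value one skin depth into the metal. A metal layer that is several skin depths thick therefore carries most of its current near the surface.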

$\rho$ is equal to the conductor resistivity, $\omega$ is the angular frequency of the signal, $\mu_r$ is the conductor relative magnetic permeability, and $\mu_0$ is the free space permeability [37]. A simplified lumped model of a differential transformer, excluding losses and parasitic capacitances [42], with an arbitrary turns ratio, $n$, is given in Fig. 2.3. Center tap biasing is used on both the primary and secondary side.

![Transformer lumped model](image)

**Fig. 2.3.** Transformer lumped model

Voltages and currents of the primary and secondary side are transformed by the turns ratio, $n$, as given by (2.6) [35]-[37].

$$n = \frac{v_s}{v_p} = \frac{i_p}{i_s} = k_m \sqrt{\frac{L_s}{L_p}} \tag{2.6}$$

The magnetic coupling between the primary and secondary is given by the $k$-factor, $k_m$, defined in (2.7), where $M$ is the mutual inductance between the primary and secondary side [35].

$$k_m = \frac{M}{\sqrt{L_pL_s}} \tag{2.7}$$

In an on-chip transformer there will be leakage of the magnetic flux, resulting in a mutual inductance, $M$, that is less than $\sqrt{L_pL_s}$, i.e. the $k$-factor will have a value of less than one. By altering the turns ratio, transformers can be used for impedance transformation. This is useful when for instance matching an amplifier with an input impedance of 200 Ω to a 50 Ω source. The transformer could then be realized with one turn at the primary side and two turns at the secondary. However, at E-band frequencies it is difficult to design transformers with turns ratios differing from one. This is due to interwinding capacitance [43], resulting in phase imbalance at the secondary side. Therefore, all transformers in papers I-VI have been designed with a turns ratio of 1:1. Since the two windings are galvanically separated, the transformer blocks DC current, which can be conveniently used for biasing of the active parts of the design.
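The 200 Ω to 50 Ω example and the definition of the $k$-factor in (2.7) can be illustrated with a few lines of code. The sketch assumes an ideal, lossless transformer for the impedance transformation, and the inductance values in the second part are hypothetical.

```python
import math


def turns_ratio_for_match(z_secondary, z_primary):
    """Turns ratio n (secondary : primary) of an ideal transformer that
    presents z_primary at the primary when loaded by z_secondary."""
    return math.sqrt(z_secondary / z_primary)


def coupling_factor(mutual, l_primary, l_secondary):
    """Magnetic coupling factor k_m according to (2.7)."""
    return mutual / math.sqrt(l_primary * l_secondary)


# Amplifier input of 200 ohm matched to a 50 ohm source
n = turns_ratio_for_match(200.0, 50.0)
print(f"required turns ratio n = {n:.1f}")

# Hypothetical 1:1 transformer: L_p = L_s = 100 pH, M = 80 pH
k_m = coupling_factor(80e-12, 100e-12, 100e-12)
print(f"k_m = {k_m:.2f}")
```

The impedance ratio of four corresponds to a turns ratio of two, in agreement with the one-turn primary and two-turn secondary mentioned above.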

By grounding one side at either the primary or secondary side, a transformer balun is formed. These are usually required at the input and output of a circuit, since measurement equipment often has a single-ended input. For minimum loss, the primary and secondary side should be in resonance, i.e. a tuning capacitor is required at the input and output.
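As a rough sizing aid for the tuning capacitors, the resonance condition $\omega^2 LC = 1$ can be evaluated directly. The sketch below assumes an ideal parallel LC resonance and a hypothetical winding inductance of 100 pH; in an actual design the device and interconnect parasitics supply part of this capacitance.

```python
import math


def tuning_capacitance(inductance, frequency_hz):
    """Capacitance that resonates with the given inductance at frequency_hz."""
    omega = 2.0 * math.pi * frequency_hz
    return 1.0 / (omega ** 2 * inductance)


# Hypothetical 100 pH winding tuned to the 84 GHz carrier
c_tune = tuning_capacitance(100e-12, 84e9)
print(f"C = {c_tune * 1e15:.1f} fF")
```

With these assumed values the required capacitance is about 36 fF, which shows that only small capacitances are needed at E-band frequencies.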

#### 2.3.2 The ADS Momentum electromagnetic simulator

All transformers in the designs presented in this thesis have been designed using the ADS Momentum electromagnetic (EM) simulator. The simulator provides accurate models of the skin effect, substrate coupling, current distribution in thick metals, and effects of multiple dielectrics. It uses the Method of Moments (MoM) numerical discretization technique to solve Maxwell's electromagnetic equations for the transformer structures [44]. Momentum has two simulation modes, Microwave (full-wave) mode and RF (quasi-static) mode. The difference between the two modes is how the Green functions are calculated. The microwave mode uses frequency dependent Green functions to characterize the substrate, which results in frequency dependent and complex L and C elements. The RF mode on the other hand uses frequency independent Green functions, resulting in real L and C elements that are frequency independent. In RF mode, the Green functions only have to be calculated for one frequency and the simulation time is therefore decreased. The RF mode should typically only be used for structures that are significantly smaller than the electromagnetic wavelength, and it loses accuracy when the frequency of operation is increased [44]. In this thesis, for high modeling accuracy, all inductors and transformers were simulated using the microwave mode. The simulation time and required memory depend strongly on how the structure is meshed and on the number of defined ports. Therefore, it is important to define a mesh that is neither too coarse, resulting in a reduced accuracy, nor too dense, resulting in an increased simulation time. If distortion is important in the circuit using the transformer, the s-parameters also need to be extracted for harmonics of the operating frequency, thereby further increasing the simulation time. Simulation time can be decreased, however, by using techniques such as mesh reduction and adaptive frequency [44].
The simulation time of the transformers is a factor that restricts how fast a mm-wave design can be completed. The design process is typically iterative: the diameter and trace width are altered, after which the transformer is re-simulated, and this is repeated several times to optimize the design. The output from the Momentum simulator is an s-parameter file, i.e. a frequency domain model. In this thesis, the file is then imported into the Cadence Spectre simulation environment using the spectre Nport component. A model in the frequency domain works well for an AC-analysis and for the harmonic balance simulator in Spectre. In time domain simulations, such as transient analysis and Periodic Steady State (PSS) analysis using the shooting method, however, a frequency domain model can result in convergence difficulties. In paper VI, EVM simulations based on transient simulations for a complete E-band transmitter including a PA are presented. Due to convergence difficulties, a lumped equivalent model of each transformer was extracted, thereby also significantly decreasing the simulation time. The lumped equivalent model consists of a network of resistors, inductors and capacitors, which has a frequency response similar to the s-parameter model.

#### 2.3.3 Capacitors, varactors and resistors

A millimeter wave design kit contains accurate high frequency models for the capacitors, varactors and resistors. No specific parasitic extraction using an electromagnetic simulation tool is then required. When designing a VCO, the Q-value of the resonance tank inductor and varactor are both critical to obtain a low phase noise. Both varactors and capacitors have an internal series resistance, \( R_s \), that degrades the Q-value according to (2.8) [45].

\[
Q = \frac{1}{\omega_0 CR_s} \tag{2.8}
\]

As can be seen, the Q-value of a capacitor or a varactor decreases with operating frequency. One way to increase it is to split the structures into multiple segments. An integrated resistor has a parasitic capacitance to the substrate, which will often dominate the impedance at high frequencies. The capacitance can be reduced by using smaller width resistors. With reduced area, the device matching is compromised though.
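The frequency dependence in (2.8) can be made concrete with a small numerical sketch. It assumes a fixed capacitance and a frequency independent series resistance; the values of 100 fF and 1 Ω are hypothetical and are not taken from the design kit.

```python
import math


def capacitor_q(capacitance, series_resistance, frequency_hz):
    """Q-value of a capacitor with series resistance according to (2.8)."""
    omega_0 = 2.0 * math.pi * frequency_hz
    return 1.0 / (omega_0 * capacitance * series_resistance)


C = 100e-15  # assumed capacitance (F)
R_S = 1.0    # assumed series resistance (ohm)

for f_ghz in (28.0, 84.0):
    print(f"{f_ghz:4.0f} GHz: Q = {capacitor_q(C, R_S, f_ghz * 1e9):.1f}")
```

Under these assumptions the Q-value drops by a factor of three between 28 GHz and 84 GHz. Splitting the structure into parallel segments lowers the effective $R_s$ and thereby recovers part of this loss.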

### 2.4 The b7hf200 BiCMOS process

#### 2.4.1 Active devices

The b7hf200 SiGe:C process technology from Infineon Technologies [19], [46] uses a double-polysilicon self-aligned transistor [47]. A cross-section of an NPN SiGe device is shown in Fig. 2.4. Epitaxial processing technology is used to grow the base. The epitaxial base is contacted using a silicided p+ doped polysilicon region. The mono-crystalline emitter is contacted through n+ doped polysilicon, thereby giving a low emitter resistance. The collector is contacted through a highly doped buried layer [19]. During processing, spacers next to the n+ poly in Fig. 2.4 are used to reduce the emitter size, thereby reducing parasitic capacitances and increasing the transit frequency. For a drawn width of 0.35 μm, the effective width equals 0.18 μm [46]. The transistors have a mono-crystalline emitter contact [19] to achieve a small emitter resistance and a production stable interface between emitter contact and active silicon.

![Cross section of b7hf200 NPN-device](image)

Fig. 2.4. Cross section of b7hf200 NPN-device [19]

Besides germanium (Ge), the highly doped base also contains carbon (C). The carbon is used to prevent outdiffusion of boron from the base [19], [48]. The carbon content used is in the range of 0.2% and does not considerably change the band structure of the HBT device. One difficulty in SiGe processing is to keep the boron doping profile in the base of the SiGe device unchanged during succeeding thermal processing steps. High temperatures cause a risk that the boron of the base diffuses into adjacent silicon regions. Undesired boron can cause barriers in the conduction band and reduce the device performance [48].

There are three types of NPN devices available, having different breakdown voltages and transit frequencies $f_T$ [46]. The fastest device is the ultra-high speed device (UHS-type) with an $f_T$ of 200 GHz and an open-base collector emitter breakdown voltage, $BV_{CEO}$, of 1.5 V. In this thesis, this device has been used for all parts having a supply voltage of 1.5 V. The high speed device (HS-type) has an $f_T$ of 170 GHz and a $BV_{CEO}$ equal to 1.7 V. There is also a high voltage device (HV-type) with an $f_T$ of 35 GHz and a $BV_{CEO}$ of 4.0 V which is used in the active loop filter of the PLL presented in papers II and III. The PNP device has an $f_T$ of only 3.5 GHz [46] and can only be used for biasing in a design targeted for E-band applications. The main device parameters for the different device types are summarized in Table 2.1 [46].

Table 2.1. Nominal active parameters for UHS, HS and HV NPNs plus the V PNP [46]

<table>
<thead>
<tr>
<th>Device type</th>
<th>BVCEO (V)</th>
<th>BVCES (V)</th>
<th>fT (GHz)</th>
<th>fmax (GHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>UHS NPN</strong></td>
<td>1.5</td>
<td>5.8</td>
<td>170</td>
<td>200</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>170</td>
<td>250</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>170</td>
<td>250</td>
</tr>
<tr>
<td><strong>HS NPN</strong></td>
<td>1.7</td>
<td>6.5</td>
<td>170</td>
<td>200</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>170</td>
<td>250</td>
</tr>
<tr>
<td><strong>HV NPN</strong></td>
<td>4.0</td>
<td>14.5</td>
<td>320</td>
<td>35</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>120</td>
<td>120</td>
</tr>
<tr>
<td><strong>V PNP</strong></td>
<td>-10</td>
<td>-16</td>
<td>45</td>
<td>3.5</td>
</tr>
</tbody>
</table>

A Gummel-Poon core model [49] is used for the HBT devices in the b7hf200 process. Additional parasitic resistors and capacitances have been added to the model to improve the modeling of the high frequency behavior [19]. An external base-emitter capacitance is required to model the capacitance due to overlap between the base and emitter contacts. External base and collector resistances are needed to model the base and collector polysilicon and buried layer resistances [19]. An external base-collector capacitance is required to model the capacitance between the base contact and the connection to the internal collector.

In the b7hf200 process there are many different contact configurations available for the NPN devices. Besides the basic configuration with only one contact for each terminal, i.e. the BEC configuration, there are several configurations with multiple contacts that can be used for design performance optimization [19], [46]. In a low noise amplifier (LNA) used in, for example, a receiver, a low base resistance, $R_b$, is important to achieve a low noise figure. Typically, in such applications, a double base contact configuration, BEBC, can be used. With two base contacts, however, there is a larger overlap between the base and collector contacts, resulting in an increased base-collector capacitance, $C_{bc}$ [19]. At high operating frequencies this configuration is therefore less suitable. In Fig. 2.5, a layout view of a device with the CEBEC configuration is shown.
A parallel connection of several devices with this configuration has been used in the power amplifiers presented in papers V and VI. In a mm-wave PA, it is important to have a low parasitic inductance from emitter to ground, since the device will otherwise be inductively degenerated and have a lower gain. The effective inductance to ground is reduced by having many emitters contacted in parallel. At the collector terminal, it is desired to have a low collector series resistance, $R_c$. Any excess resistance will cause a collector voltage drop, that for a large current swing at the collector terminal can forward bias the base-collector pn-junction. Forward biasing will result in gain compression and increased distortion and noise. As the number of device contacts increases, however, the area of the devices becomes larger, thereby increasing the collector-substrate capacitance, $C_{cs}$. At mm-wave frequencies, device contact configuration is therefore an important design parameter.

#### 2.4.2 Passive devices and metal stack

The main nominal design parameters for the passive devices are listed in Table 2.2 [46]. Since the b7hf200 process is used for manufacturing of ICs for 77 GHz radar, the varactor of the process is optimized for high Q-values at mm-wave frequencies. In this thesis, the low-ohmic TaN resistor [46] is typically used for resistive degeneration. The MIM capacitor has a Q-value of 50 at 2 GHz [46], but this is degraded at the transmit frequency of 84 GHz. Therefore, custom-designed Metal-Oxide-Metal (MOM) capacitors have been used to improve the decoupling.

<table>
<thead>
<tr>
<th>Device type</th>
<th>Parameter</th>
<th>Nominal value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Poly resistor, p⁺ doping</td>
<td>R<sub>s</sub></td>
<td>150 Ω/□ ±10%</td>
</tr>
<tr>
<td>Poly resistor, p⁻ doping</td>
<td>R<sub>s</sub></td>
<td>1000 Ω/□ ±10%</td>
</tr>
<tr>
<td>TaN resistor</td>
<td>R<sub>s</sub></td>
<td>20 Ω/□ ±10%</td>
</tr>
<tr>
<td>Varactor, A = 10 x 10 x 30 µm²</td>
<td>BV<sub>CA</sub></td>
<td>7.7 V</td>
</tr>
<tr>
<td></td>
<td>Spec. capacitance, C<sub>VAR</sub></td>
<td>2.3 fF/µm² @ V<sub>PN</sub> = 0 V</td>
</tr>
<tr>
<td></td>
<td>Cap. ratio (0 V/-5.0 V)</td>
<td>2.2 @ 77 GHz</td>
</tr>
<tr>
<td></td>
<td>Q-value @ 77 GHz</td>
<td>8</td>
</tr>
<tr>
<td>Junction capacitor, A = 10 x 30 µm²</td>
<td>C<sub>J</sub>(V<sub>J</sub> = 0 V)</td>
<td>605</td>
</tr>
<tr>
<td></td>
<td>C<sub>J</sub>(V<sub>J</sub> = -1 V)</td>
<td>418</td>
</tr>
<tr>
<td></td>
<td>C<sub>J</sub>(V<sub>J</sub> = -5 V)</td>
<td>227</td>
</tr>
<tr>
<td>Metal-Insulator-Metal (MIM) capacitor</td>
<td>Spec. capacitance, C<sub>MIM</sub></td>
<td>1.4 fF/µm² ±10%</td>
</tr>
<tr>
<td></td>
<td>Q-value @ 2 GHz</td>
<td>50</td>
</tr>
</tbody>
</table>

Table 2.2. Nominal passive device performance in the b7hf200 process [46]

A cross-section of the four-layer metal stack [19], [46] is outlined in Fig. 2.6. M1 to M4 are made from copper and Pad from aluminum. In this thesis, the two thick copper top metal layers (M3 and M4) are used for all transformers. M4 alone is used for all inductors.

![Fig. 2.6. Metal stack cross-section in the b7hf200 process [19], [46]](image)

In the thesis, the custom designed MOM capacitors use the oxide between metal 2 and 3 as a dielectric. The top and bottom plates are formed by joining the M4 and M3 layers as well as the M2 and M1 layers, thereby reducing the series resistance in the plates and increasing the Q-value. The electrical properties of the metal layers are specified in Table 2.3 [46]. When designing a transformer in M3 and M4, considering the differences in perimeter shown in Fig. 2.7, the losses in the M3 layer will clearly dominate.

<table>
<thead>
<tr>
<th>Metal layer</th>
<th>Conductivity (S/m)</th>
<th>$R_{\text{sheet}}$ (mΩ/□)</th>
</tr>
</thead>
<tbody>
<tr>
<td>M4</td>
<td>$5.6 \times 10^{7}$</td>
<td>6.5</td>
</tr>
<tr>
<td>M3</td>
<td>$5.6 \times 10^{7}$</td>
<td>18</td>
</tr>
<tr>
<td>M2</td>
<td>$5.6 \times 10^{7}$</td>
<td>26</td>
</tr>
<tr>
<td>M1</td>
<td>$5.6 \times 10^{7}$</td>
<td>30</td>
</tr>
</tbody>
</table>

Table 2.3. Metal layer conductivities and sheet resistances in the b7hf200 process [46]

In paper V, two different 2-stage power amplifier topologies are presented. To increase the gain, the first design uses capacitive cross-coupling [50]-[52], while the second design instead uses a common cascode topology [53], [54]. The circuits contain several transformers. As an example, a Momentum view of the output balun of the cross-coupled PA is given in Fig. 2.7. The transformer has an inner diameter of 24 µm and a trace width of 5.5 µm. To reduce the effect of capacitive coupling to the substrate, the secondary side, at which the signal voltage is the highest, is implemented in the top metal layer (yellow). To maximize the magnetic coupling between primary and secondary, the primary side is implemented in the M3 layer (green). The supply voltage is connected to a center tap on the primary side.

An important aspect in designing transformers for E-band frequencies is the interconnect wiring from the transformer to the active device. If the active device is wide, as in a PA biased with a large collector current, the interconnect wiring will be long compared to the diameter of the transformer. In Fig. 2.7, the connection to the collector of the active device is implemented in the M2 layer (red). In the Momentum simulation, the interconnect is modeled to the middle of the device, which gives a correct average distance from the collector terminals to the actual transformer structure. The minimum transformer trace width is constrained by the maximum allowed DC current density to avoid electromigration. This is especially important for the output transformer in PA designs, since the active devices of the output stage have a large DC current.

2.4.3 Device matching

Millimeter-wave circuits often use differential topologies, since trace and ground plane inductances make it difficult to obtain well-defined signal grounds. Poorly defined grounds can result in large discrepancies between simulated and measured performance. With differential circuits this problem is eliminated. In this thesis, all designed blocks therefore use a differential topology. Device matching is, however, important for the performance of a differential circuit. Assuming a normal distribution of the process spread, the worst-case values of the realized device parameter, \( P_r \), in (2.9) can be taken as the nominal value, \( P_n \), plus or minus the \( 3\sigma \) spread, where \( \sigma \) is the standard deviation of the normal distribution [46].

\[
P_r = P_n \pm 3\sigma
\]  

(2.9)

For the collector current relative spread (2.10) applies [46], where \( \omega_{E,\text{eff}} \) and \( l_{E,\text{eff}} \) are the effective width and length of the emitter, i.e. the size of the spacer has been subtracted from the drawn emitter size [46]. The matching constant is given by the \( c_I \) parameter.

\[
3\sigma \left( \frac{\Delta I_c}{I_c} \right) = \frac{c_I}{\sqrt{\omega_{E,\text{eff}}l_{E,\text{eff}}}}
\]  

(2.10)

A similar relation in (2.11) applies for the forward gain, \( \beta \), but with a matching constant \( c_\beta \) [46].

\[
3\sigma \left( \frac{\Delta \beta}{\beta} \right) = \frac{c_\beta}{\sqrt{\omega_{E,\text{eff}}l_{E,\text{eff}}}}
\]  

(2.11)

For a resistor with width and length \( \omega_R \) and \( l_R \), (2.12) applies with a matching constant \( c_R \) [46].

\[
3\sigma \left( \frac{\Delta R}{R} \right) = \frac{c_R}{\sqrt{\omega_Rl_R}}
\]  

(2.12)

The matching of the TaN resistor, having \( c_R \) equal to 5 %/\( \mu \)m [46], is superior to that of the \( p^+ \) and \( p^- \) resistors, having \( c_R \) equal to 20 %/\( \mu \)m and 25 %/\( \mu \)m, respectively. When optimizing a design for low spread, the three relations above are important to keep in mind. As matching improves with device size, the I/Q phase error detector presented in papers I and II uses device up-scaling to reach a performance where the internal spread in the detector is far less than in the QVCO. Since the active mixer that constitutes the detector has a DC output, both the load resistors and the NPN devices can be scaled up significantly without performance loss. The active devices were sized a factor of 7 and 5 times the size for maximum $f_T$ for the mixer transconductance and switching pair, respectively.
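
The area dependence in (2.10)-(2.12) can be illustrated numerically. The following is a minimal sketch, not taken from the thesis: the resistor dimensions are hypothetical, the matching constants are the values quoted above, and they are treated as percent times micrometre so that the result comes out in percent (an assumption about the units).

```python
import math

def three_sigma_spread(c, width_um, length_um):
    """3-sigma relative spread in percent, following the form of (2.12)."""
    return c / math.sqrt(width_um * length_um)

# Matching constants quoted in the text (unit interpretation is an assumption).
C_R = {"TaN": 5.0, "p+": 20.0, "p-": 25.0}

# Hypothetical 2 um x 10 um resistor, chosen only for illustration.
w, l = 2.0, 10.0
for name, c in C_R.items():
    print(f"{name}: {three_sigma_spread(c, w, l):.2f} %")

# Scaling the device area by a factor k improves the matching by sqrt(k).
base = three_sigma_spread(C_R["TaN"], w, l)
scaled = three_sigma_spread(C_R["TaN"], w, 7 * l)
print(f"improvement for 7x area: {base / scaled:.3f}")
```

If the up-scaling factor is read as an area factor, a device scaled by 7 would show roughly a 2.6 times lower spread, which is why up-scaling is attractive when the signal of interest is at DC.
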
3 mm-wave TX carrier generation

3.1 Introduction
There are several architectures to choose from for generating a mm-wave, or especially an E-band, TX signal. Direct conversion architectures [55], [56] are commonly used. Another solution is to use a sliding-IF topology [57], [58]. The differences between the architectures mainly relate to the frequency upconversion and the LO generation. The preferred architecture depends, besides on the mm-wave operating frequency, on the key performance of the devices in the semiconductor process, as well as on the radio system requirements. Important active device properties are high-frequency gain, noise, and nonlinearity. For the passive devices, the Q-value of inductors, capacitors, and varactors constrains the choice of architecture. For both active and passive devices, the matching is critical for the performance of differential and quadrature mm-wave building blocks. Radio system parameters to consider are typically EVM, spurious emissions, and output power. Power consumption, die area, and integration level, i.e. factors determining the price level of the mm-wave electronics, must also be taken into account. In the following sections on different transmitter architectures, a TX carrier at 84 GHz in the 81-86 GHz sub band is assumed. The transmitter is intended for linear modulation, with quadrature amplitude modulation (QAM) of the baseband signal for high spectral efficiency.

3.2 Direct conversion TX architectures
In the TX architecture shown in Fig. 3.1, the PLL is locked at the fundamental carrier frequency. Due to its simplicity with a single frequency conversion step, this is an attractive way for generation of the 84 GHz carrier. It is common in mobile terminals for cellular communication at lower carrier frequencies at around 2 GHz.
However, for carrier generation at mm-wave frequencies the method has important drawbacks. It is difficult to design a QVCO at 84 GHz with sufficiently low phase noise, since the Q-value of the varactors is typically inversely proportional to frequency and will be very low at 84 GHz. Another equally important drawback is the phase and amplitude mismatch of the four QVCO signals, which tends to increase with frequency. For higher order QAM there are stringent requirements on I/Q phase error, which even when operating the QVCO at a fraction of 84 GHz are difficult to fulfill without adaptive tuning. One tuning topology is proposed in papers I and II, where the phases of a 28 GHz QVCO can be adjusted by using four programmable varactors. This tuning could also be applied to the 84 GHz QVCO, shown in Fig. 3.1.

The 84 GHz LO signal can be generated using an architecture based on a 28 GHz QVCO. In [59], a 90 GHz carrier is generated from a 30 GHz VCO using either injection locked frequency triplers (ILFT) or harmonic-based triplers (HBFT). In Fig. 3.2, the LO signals for the 84 GHz up conversion mixers are created using tripler circuits. An advantage of this architecture is that the PLL is locked at $f_{TX}/3$, i.e. 28 GHz, rather than at $f_{TX}$, thereby achieving higher Q in the varactor and requiring less speed in the PLL feedback path divider. However, at mm-wave frequencies, tripler circuits consume a lot of power. They also produce spurs at integer multiples of $f_{TX}/3$.

![Fig. 3.1. Direct conversion E-band TX architecture](image)
The 84 GHz TX carrier can also be generated using injection locked frequency multiplier and quadrature generation techniques [60] as outlined in Fig. 3.3, where an 84 GHz quadrature injection locked oscillator (QILO) is injection locked by a 28 GHz VCO. The QILO is then used as a frequency tripler. In [60] a low phase noise 60 GHz quadrature LO generation including a PLL using this technique is presented. The design, manufactured in 65 nm CMOS, achieves a measured phase noise of -95 dBc/Hz @ 1 MHz offset while consuming 80 mW using a 1.2 V supply.

In [61], [62] a 60 GHz PLL based on an N-push 20 GHz VCO is presented. This scheme can be adapted to an E-band transmitter, see Fig. 3.4. A 3-push VCO consists of three coupled VCOs with a phase difference of 120° between them. By combining the VCO outputs, the fundamental tone and the second harmonic can be cancelled, leaving only the third harmonic. Since the 3-push VCO operates at one third of the desired frequency, the gain of the VCO active devices is higher, as is the Q-value of the varactors [61], [62]. The phase noise performance and the tuning range of the N-push VCO are therefore improved, compared to a VCO that oscillates directly at the desired frequency.
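
The harmonic cancellation in a 3-push VCO can be checked with a few lines of phasor arithmetic. This is an idealized sketch that assumes three identical oscillator outputs spaced exactly 120° apart and a lossless combiner; the function name is hypothetical.

```python
import cmath
import math

def n_push_sum(harmonic, n=3):
    """Sum of the given harmonic over n oscillators spaced 360/n degrees apart."""
    return sum(cmath.exp(1j * harmonic * 2 * math.pi * m / n) for m in range(n))

# Harmonics 1 and 2 cancel, while harmonic 3 adds in phase.
for k in (1, 2, 3):
    print(f"harmonic {k}: combined amplitude {abs(n_push_sum(k)):.6f}")
```

Any amplitude or phase mismatch between the three cores would leave a residual fundamental and second harmonic, which the ideal sum above does not capture.
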

![Fig. 3.4. E-band TX architecture based on a 28 GHz 3-push VCO](image)

An important drawback is that since the desired frequency is a harmonic of the oscillation frequency, the N-push VCO has a low output power. The output power can be further reduced due to losses of the signal combiner. An E-band TX architecture based on a 3-push 28 GHz VCO is outlined in Fig. 3.4. In phase and quadrature phase LO signals are generated either with a polyphase filter (PPF) or hybrid coupler (HC). However, at 84 GHz, the I/Q mismatch of both PPF and HC will be significant. Their losses will also further reduce the LO signal power.

### 3.3 Sliding-IF TX architectures

In a sliding-IF architecture the TX carrier is generated using two upconversion stages, as in a common IF transmitter architecture using a fixed IF frequency. A large advantage is that, compared to a fixed IF architecture, only one PLL is required in the sliding-IF topology. A drawback of the sliding-IF topology is that it is less suitable for covering large frequency ranges, since a sharp IF filter is required to remove noise outside the channel. A second advantage is that operating the I/Q mixer at a lower frequency results in improved I/Q balance. A too low IF frequency will, however, make it difficult to attenuate the image frequency. In papers I, II, III and VI of this thesis a PLL with a QVCO, see Fig. 3.5, is locked at $f_{TX}/3$, i.e. 28 GHz. The baseband signals are first up converted by 28 GHz I/Q mixers. The 56 GHz second order tail harmonic, present at the emitters of the QVCO core devices, is used as the LO signal for a second mixer, providing up conversion to an 84 GHz carrier frequency. A similar architecture was used in a 60 GHz receiver presented in [57].
One advantage is that no additional frequency doubling circuits are required to create the 56 GHz LO signal, since it is inherently available from the QVCO. Operating the QVCO at 28 GHz improves the phase noise performance, tuning range, and I/Q imbalance. A drawback is the more complicated transmitter layout, since compared to the other described architectures, a third mixer is required. As can be seen in Fig. 3.7, the distance from the QVCO to this mixer is quite long, requiring LO signal buffering. A more detailed view of the TX architecture presented in paper VI, including LO buffers, but without the PLL, is outlined in Fig. 3.6. Excluding the active low pass filter, the design uses a single supply voltage of only 1.5 V.
The chip layout of the architecture given in Fig. 3.6 is shown in Fig. 3.7. The QVCO is located to the left in the figure, followed by the 28 GHz and 56 GHz mixers. The output of the 56 GHz mixer is connected to a transformer coupled to a three-stage differential PA. Each stage in the PA uses capacitive cross-coupling [50]-[52] to increase the gain. Due to the limited output power of the 56 GHz mixer it was necessary to introduce a pre-amplifier before the original two-stage PA, presented in papers IV and V. The three-stage PA provides a simulated gain of 21 dB at 84 GHz, thereby relaxing the output power requirement on the TX mixer. The size of the design excluding bond pads equals 890 µm x 450 µm. The design consumed 131 mA from a 1.5 V supply. The simulated saturated output power in the 81-86 GHz frequency band is between 15 and 16 dBm, with an output 1 dB compression point, $OCP_{1\text{dB}}$, between 10 and 11 dBm. From transient simulations with a 16 QAM baseband signal with 1 GHz RF bandwidth, the transmitter achieves 7.2% EVM at 7.5 dBm average output power.

![Complete E-band TX transmitter layout](image)

Fig. 3.7. Complete E-band TX transmitter layout

In [58], a sliding-IF transmitter implemented in a 0.13 µm BiCMOS process, covering the 71-76 GHz band as well as the upper 81-86 GHz band, is presented. The architecture is outlined in Fig. 3.8. For the 81-86 GHz band the PLL is operating between 18 and 19.1 GHz. A frequency divide-by-two circuit generates I and Q LO signals at 9.0-9.6 GHz for the mixer that up-converts the baseband signal to \(f_{IF}\). The mixer output is then amplified and filtered in the IFVGA-block.

A frequency quadrupler circuit connected to the PLL output is used to create the LO signal for the final upconversion mixer. The mixer output signal at $f_{RF}$ is then amplified by the Driver and PA blocks for a saturated output power of 15.2 dBm. The quadrupler design is critical for the performance of the architecture. The most important drawbacks of frequency multipliers are low conversion gain and undesired frequency harmonics that require filtering. In [58], a balanced architecture was used to increase the odd harmonic suppression. With a 2.7 V supply, the quadrupler consumed 12 mA, giving a conversion gain for the upper E-band of -8.5 dB. The driver amplifier, consuming 42 mA, also provided a necessary L-C notch filter for the image frequency between 63 and 66.8 GHz. In comparison, the architecture in Fig. 3.5 has an image frequency between 27 and 28.7 GHz, which is farther from the carrier and therefore easier to suppress. The advantage of the design topology in [58] is that it is based on a low frequency PLL and can cover a large TX bandwidth, while the main drawbacks are related to the superheterodyne architecture requiring filtering of the image frequency.
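
The image frequencies quoted for the two sliding-IF architectures follow directly from their frequency plans. The sketch below reconstructs both plans from the description in the text; the function names are hypothetical, and the image is taken as the second LO minus the IF, which is an assumption about which sideband is used.

```python
def thesis_plan(f_tx_ghz):
    """Plan of Fig. 3.5: QVCO at f_TX/3, second LO from the 2nd harmonic."""
    f_qvco = f_tx_ghz / 3          # first LO, also the IF after the I/Q mixers
    f_lo2 = 2 * f_qvco             # second order tail harmonic
    f_image = f_lo2 - f_qvco       # lower sideband of the second mixer
    return f_qvco, f_lo2, f_image

def ref58_plan(f_tx_ghz):
    """Plan described for [58]: IF at f_PLL/2, second LO at 4*f_PLL."""
    f_pll = f_tx_ghz / 4.5         # f_TX = 4*f_PLL + f_PLL/2
    f_if = f_pll / 2               # divide-by-two I/Q generation
    f_lo2 = 4 * f_pll              # frequency quadrupler output
    f_image = f_lo2 - f_if
    return f_pll, f_if, f_lo2, f_image

for f_tx in (81.0, 86.0):
    print(f"f_TX = {f_tx} GHz")
    print("  Fig. 3.5 image: %.2f GHz" % thesis_plan(f_tx)[2])
    print("  [58] image:     %.2f GHz" % ref58_plan(f_tx)[3])
```

Under these assumptions, the image in the architecture of Fig. 3.5 sits at one third of the carrier, while in [58] it sits at 7/9 of the carrier, which explains why the latter needs a dedicated notch filter.
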

![Fig. 3.8. E-band sliding-IF architecture [58]](image)
CHAPTER 4

4 Beam forming for mm-wave transmitters

4.1 Introduction

Beam forming [63]-[65] is a technique to achieve spatial filtering of radio signals. Signals can then be received or transmitted in some desired directions, and suppressed in others. This can be realized using an array of isotropic antennas, with signals steerable in phase and amplitude. High antenna gain can be achieved for a certain direction, for both transmission and reception of radio signals. High gain can also be implemented using a single directional antenna. One example is a satellite receiver dish antenna [63]. Since the signal from space is weak, high antenna gain is required to receive and decode the signal with an adequate signal to noise ratio (SNR). With an increasing dish diameter the gain increases, but the central lobe becomes narrower. Direction control of a dish antenna is implemented mechanically, using electric motors. For the antenna array the direction of the beam can instead be controlled electrically, which is very fast and does not involve any moving parts. Interfering signals for a receiver can also be suppressed by tuning a receiver antenna array to have a very low gain for some directions. In the same way, the radiation pattern of a transmit array antenna can be tuned not to transmit in certain directions. Beam forming is not only used in communication systems, but also in radar and sonar technology.

During deployment of a wireless backhaul operating at mm-wave frequencies, alignment of the mm-wave beams is essential to achieve a high performance of the network, i.e. to establish mm-wave links that have a low bit-error-rate (BER). Applied to mm-wave links, electrical beam steering offers advantages compared to mechanical beam steering. In Fig. 4.1a a link using 2-axis mechanical beam steering is illustrated. In Fig. 4.1b an equivalent link, but with electrical beam steering in two directions, is shown [66].
Comparing the installations, the solution using electrical beam steering has a clear cost advantage, since mechanical rotating devices consist of many different expensive parts. In future systems for high speed cellular communication, e.g. 5G, there will be many small base stations serving pico and femto cells [6]. All these will require high speed backhaul. With large base station volumes, the unit price becomes more important for the manufacturers in order to be competitive.

### 4.2 Linear timed and Phased Array Antennas

The beamforming transmitter aligns the signals to the antenna elements in time, which gives coherent combination in one direction and suppression in other directions. The beam steering concept is illustrated in Fig. 4.2, showing an array of 8 linearly spaced antenna elements with a spacing $d$, and a plane wave that leaves the array at an angle $\alpha$ [64].
In order for the 8 signals to add constructively in direction $\alpha$, each signal $n$ must be time shifted by $\Delta t_n$, given by (4.1), where $c$ is equal to the speed of light and $d$ equals the antenna element spacing [64].

$$\Delta t_n = \frac{(n-1)\, d \sin\alpha}{c}$$

(4.1)

However, due to lack of good wideband controllable delay elements in both the analog and digital domain, a linear phased array is often instead used. Many applications use a small fractional bandwidth, i.e. the modulation bandwidth is small compared to the carrier frequency, and then the time delay can be approximated with a frequency independent phase shift. In transceivers, such phase shifts can be implemented efficiently. In papers II and III, the phase shift is introduced in the LO path by injecting a DC current into the PLL phase detector load.
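
As a numerical illustration of (4.1) and of the phase shift approximation, the sketch below evaluates the per-element delay and the equivalent carrier phase. The carrier frequency of 84 GHz follows the text; the half-wavelength spacing, the 45° steering angle, and the function names are assumptions made only for this example.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def element_delay(n, d, alpha_deg):
    """Time shift of element n (1-based) according to (4.1)."""
    return (n - 1) * d * math.sin(math.radians(alpha_deg)) / C

def equivalent_phase_deg(delay, f_carrier):
    """Carrier phase shift that replaces the delay in a phased array."""
    return math.degrees(2 * math.pi * f_carrier * delay)

def phase_error_deg(delay, f_offset):
    """Phase error at an offset from the carrier when a fixed phase is used."""
    return math.degrees(2 * math.pi * f_offset * delay)

f_c = 84e9
d = C / f_c / 2            # assumed half-wavelength element spacing
for n in range(1, 9):
    dt = element_delay(n, d, 45.0)
    print(f"element {n}: {dt * 1e12:6.3f} ps, "
          f"{equivalent_phase_deg(dt, f_c):7.2f} deg, "
          f"error at +0.5 GHz: {phase_error_deg(dt, 0.5e9):5.2f} deg")
```

The last column shows why the approximation holds for a small fractional bandwidth: the phase error at the band edge scales with the ratio of the frequency offset to the carrier frequency.
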

In Fig. 4.3, the radiation pattern from a phased array antenna consisting of 8 elements, radiating in one direction, placed $\lambda/2$ apart with a progressive phase shift of $0.7\pi$ is illustrated. This results in a 45 degree shift of the central beam to the left [63]. Compared to an isotropic antenna, the radiation pattern has a high level of directivity. A narrower beam can be generated if the number of elements in the antenna array is increased. By adding amplitude weights to each of the 8 antenna elements, the sidelobe levels and null directions can be optimized. With beam steering, only the phase is controlled. Controlling both amplitude and phase, as in beam forming, enables enhanced control of the side lobe levels and nulls in the radiation pattern. The circuits of this thesis, presented in papers II and III, have been designed for phase control only of the transmit signal. Adding amplitude control would, however, be possible by adding a variable gain amplifier in the signal paths of the transmitter.
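
The relation between the progressive phase shift and the beam direction can be reproduced with a simple array factor calculation. The sketch below assumes ideal isotropic elements with uniform amplitude and measures the angle from broadside; the sign convention for the steering direction and the function name are choices made for this example.

```python
import cmath
import math

def array_factor(theta_deg, n_elem=8, spacing_wl=0.5, prog_phase=0.7 * math.pi):
    """Normalised magnitude of the array factor of a uniform linear array."""
    psi = 2 * math.pi * spacing_wl * math.sin(math.radians(theta_deg)) - prog_phase
    return abs(sum(cmath.exp(1j * n * psi) for n in range(n_elem))) / n_elem

# Scan from -90 to +90 degrees in 0.1 degree steps and locate the main lobe.
angles = [a / 10 for a in range(-900, 901)]
peak = max(angles, key=array_factor)
print(f"main lobe at {peak:.1f} deg from broadside")
```

The main lobe lands where $\pi \sin\theta$ equals the progressive phase shift, i.e. at $\sin\theta = 0.7$, which is close to the 45 degrees stated above.
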
Due to high path loss, a directional antenna is necessary for a transceiver operating at mm-wave frequencies. An electronically steered phased array antenna, consisting of a number of small non-directional antenna elements, can be used to mimic a conventional large directional antenna [63]. Using a phased array antenna, the amount of interference at receivers that are not targeted is strongly attenuated. In a wireless cellular backhaul, beamforming is used to optimize the direction of the beam from the cell tower to the backhaul controller. By also controlling the direction of the nulls in the radiation pattern, interference to other radio systems can be reduced.

If the direction of the radiation lobe should be controllable in two directions, a two-dimensional array of isotropic antennas is utilized. The array factor, $F$, is a function of the geometry of the array and describes the antenna pattern for an array of identical isotropic antenna elements [67].

The far field for an antenna array, $E_{tot}$, is defined in (4.2) as the product of the field from a single antenna element, $E_{elem}$, and the array factor, $F$ [67].

$$E_{tot} = F \cdot E_{elem}$$  \hspace{1cm} (4.2)

This allows the gain to be evaluated in different directions also for arrays of non-isotropic antenna elements.

### 4.3 Beam steering transmitter architectures

#### 4.3.1 Introduction

A beam steering transmitter can be implemented in several ways. Depending on the transmitter operating frequency, number of antenna elements, output power, type of modulation, and signal bandwidth, some architectures are more suitable than others. At mm-wave frequencies, on-chip routing of TX and LO signals requires power consuming buffers and can introduce mismatch that needs to be mitigated by calibration. To avoid global routing of such high frequency signals, the beam steering concept presented in papers II and III is therefore based on LO phase control by DC current injection at the phase detector of the PLL. As will be seen in the comparison to other beam steering techniques in this chapter, that method is highly competitive.

4.3.2 Digital beam forming

Digital beam forming is the most flexible solution for beam forming. The architecture is outlined in Fig. 4.4, showing a simplified system with two channels. Like analog baseband beam forming, multiple beams can be supported [63]-[65], and in the digital beam forming case the number of beams is programmable. However, separate DACs are required for each transmit path. Especially for high speed mm-wave links that use modulation bandwidths in the range of 1 GHz, the current consumption of the DACs will be high. A Digital Signal Processor (DSP) performs beamforming in the digital domain [63]-[65]. Inside the DSP, complex multiplication with weight factors is used to generate a digital baseband signal with directional properties [63], i.e. both the amplitude and phase of each transmitter signal can be controlled. If the different transmitters share a common PLL, the high frequency LO signals need to be routed to each transmit mixer. Long routing can result in both amplitude and phase mismatch of the LO signals. The DSP can, however, also be used to calibrate the beamformer regarding effects of circuit mismatch as well as mismatch in the antenna elements. Despite the superior flexibility and programmability of digital beamforming, analog beamforming is the preferred choice for low cost applications due to its lower complexity.

Fig. 4.4. Digital baseband beamforming architecture
4.3.3 RF beam steering

With RF beam steering [56], [68], the phase shifting is implemented at the inputs to the PAs [56], as outlined in Fig. 4.5. An advantage of this technique is that only the phase shifting block needs to be duplicated for each TX path in the beam steering transmitter. The rest of the transmitter can be single-channeled. However, implementing a high performance phase shifter as in [56] is a difficult task. The phase shifters can be implemented with either passive switched or continuously tunable blocks. They can also be designed as active phase shifters as presented in [56].

![Fig. 4.5. RF beam steering](image)

With RF beamforming, implemented in a transmitter, the phase shifting takes place where the frequency and signal level are high [68]. This makes phase shifter linearity challenging. If the control switches for the phase control are not linear enough, intermodulation in the switches can deteriorate the adjacent channel power ratio, ACPR, of the transmitter. A second problem is that at mm-wave frequencies the on/off impedance ratio of the RF switches is low due to capacitive parasitics [45]. A third disadvantage is that there will be long routing of high frequency TX signals to each phase shifter and PA. Long on-chip routing of high frequency signals should be avoided, since the large parasitics associated with the wires will require power consuming buffers, especially since high linearity is required in the TX signal path. Since losses are high in the mm-wave phase shifters, due to both passive devices and switches, the TX buffers will require significant power consumption. The losses will reduce the combined gain of the phase shifter and the power amplifier. For lower frequencies, e.g. below 5 GHz, implementing RF beamforming is less attractive. As the wavelength is in the range of centimeters, the size of the phase shifting elements will be too large for cost-effective integration. For mm-wave frequencies, the wavelength is smaller, allowing for on-chip integration.
The loaded-line phase shifter [45] consists of tunable series and shunt reactive elements. Since it is difficult to design a variable inductor, the tunable inductor can be implemented as a fixed inductor in series with a varactor. Another topology is the switched-delay phase shifter. It consists of a cascade connection of unit cells, where each cell has two modes for the phase shift [45]. The total phase shift of the chain is the sum of the phase shift of the unit cells. Commonly, an LC network is used as the unit cell. In the reflective-type phase shifter [69], on the other hand, a phase shift is introduced using a coupler with two of the coupler ports connected to identical reflective loads. The reflective load consists of an inductor and capacitor connected in series. The phase shift is given by the phase of the reflection coefficient, determined by the characteristic impedance of the coupler, \( Z_O \), and the reflective load, \( Z_L \). Since the signal power is high, nonlinearities in the switches controlling the passive phase shifters can create intermodulation products that distort the TX spectrum. In [69] the coupler loss equals 3.3 dB.

In [56] an E-band RF beam steering transceiver with four phased array elements for the RX and TX part is presented. The RF signal is connected to a quadrature splitter [70] that creates in-phase (I) and quadrature phase (Q) signal components. Phase control is performed using vector summation of the RF signals to generate any phase between 0° and 360°. The beam steering transmitter architecture is outlined in Fig. 4.6 [56].

![Fig. 4.6. Block diagram of a transmitter using active phase shifters at RF frequency [56]](image)

The quadrature RF signals are supplied to two single-ended variable gain amplifiers (VGAs). For differential signal generation, the VGA output current is connected to a transformer, with the center tap on the secondary side grounded. Each current output from the transformer is then connected to two common-base (CB) devices having their collectors connected to either the positive or negative outputs of the phase interpolator. In the interpolator, the I and Q currents are summed passively using an inductor network. Which CB device is turned on is controlled with one digital selector control bit for the I and Q part, respectively. By altering the control bits it is therefore possible to change which signal from the transformer is connected to the positive or negative output of the phase interpolator, thereby enabling quadrant selection.

An output vector, \( RF_{\text{out}} \), is then formed by a summation. Bold letters indicate vector property. The I and Q RF signals, \( V_I \) and \( V_Q \) are defined in (4.3) and (4.4) respectively [56].

\[
V_I = \frac{RF_{\text{in}}}{\sqrt{2}} \tag{4.3}
\]

\[
V_Q = j \frac{RF_{\text{in}}}{\sqrt{2}} \tag{4.4}
\]

The weighted PA output signal, \( RF_{\text{out}} \) is defined in (4.5) [56].

\[
RF_{\text{out}} = A_I V_I + A_Q V_Q = A_I \frac{RF_{\text{in}}}{\sqrt{2}} + jA_Q \frac{RF_{\text{in}}}{\sqrt{2}} \tag{4.5}
\]

The amplitude and phase of each RF output in (4.6) depend on the selected gain in the VGAs [56].

\[
|RF_{\text{out}}| = \left| \frac{RF_{\text{in}}}{\sqrt{2}} \right| \sqrt{|A_I|^2 + |A_Q|^2} \tag{4.6}
\]

A 360° phase control in (4.7), is achieved with phase relations depending on the sign and magnitude of the selected gain in the VGAs, \( A_I \) and \( A_Q \) [56].

\[
\angle RF_{\text{out}} = \angle RF_{\text{in}} + \begin{cases} 
0^\circ + \tan^{-1}\left|\frac{A_Q}{A_I}\right| & A_I \geq 0, A_Q \geq 0 \\
180^\circ - \tan^{-1}\left|\frac{A_Q}{A_I}\right| & A_I < 0, A_Q \geq 0 \\
180^\circ + \tan^{-1}\left|\frac{A_Q}{A_I}\right| & A_I < 0, A_Q < 0 \\
360^\circ - \tan^{-1}\left|\frac{A_Q}{A_I}\right| & A_I \geq 0, A_Q < 0
\end{cases} \tag{4.7}
\]
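
The vector summation in (4.5) can be exercised numerically. The sketch below is an idealized model with perfect quadrature and no mismatch; choosing the VGA gains as the cosine and sine of the target phase is an assumption made for this example, and the function names are hypothetical.

```python
import cmath
import math

def vga_gains(phase_deg, magnitude=1.0):
    """Signed gains A_I and A_Q that place the output at the target phase."""
    phi = math.radians(phase_deg)
    return magnitude * math.cos(phi), magnitude * math.sin(phi)

def rf_out(rf_in, a_i, a_q):
    """Weighted sum of the quadrature signals, following (4.3)-(4.5)."""
    v_i = rf_in / math.sqrt(2)
    v_q = 1j * rf_in / math.sqrt(2)
    return a_i * v_i + a_q * v_q

# One target phase per quadrant; the sign of each gain selects the quadrant.
for target in (30.0, 120.0, 225.0, 315.0):
    a_i, a_q = vga_gains(target)
    out = rf_out(1 + 0j, a_i, a_q)
    phase = math.degrees(cmath.phase(out)) % 360
    print(f"target {target:5.1f} deg -> A_I={a_i:+.3f}, A_Q={a_q:+.3f}, "
          f"phase {phase:6.2f} deg, |RF_out| {abs(out):.4f}")
```

With gains chosen this way the output amplitude stays constant over all phase settings, so the phase can be steered without changing the transmitted power.
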

Due to mismatches, extensive calibration is required [56]. The design uses a transformer-based quadrature hybrid [56]. It is, however, difficult to achieve acceptable quadrature accuracy without calibration. In [56], varactor banks are added to the transformer structure, thereby enabling digital quadrature phase calibration.
4.3.4 Analog baseband beam forming

Compared to beam forming in the RF or LO path, analog baseband beam forming is advantageous, since the phase shifting takes place at a lower frequency [71], [72]. It is also possible to support multiple beams. A block diagram of an analog baseband beam former using a baseband vector phase shifter with two transmit paths is shown in Fig. 4.7.

In RF beam forming, the signal is split at the highest frequency of the transmitter, resulting in mismatch in phase and amplitude of the signals, and power losses that must be compensated for by expensive RF gain. This is avoided in analog baseband beam forming, where all signal splitting is instead performed at the lowest frequency [71], [72]. Long RF signal routing can also be avoided, reducing the sensitivity to inductive and capacitive layout parasitics. The four LO phases, however, still need to be distributed to the different transmitters across the die. In [72], demonstrating a 60 GHz transmitter with four antenna paths, long LO interconnects have been avoided by duplicating the entire PLL so that the LO signals only need to be distributed to two transmitters instead of four. Compared to digital baseband beam forming, there is also only one pair of DACs required for the complete beam forming transmitter, thereby reducing the power consumption. At the analog baseband interface after the DACs, the analog I and Q signals are available. If a certain antenna element should have a phase shift of $\phi$ degrees, the baseband beamformer should create output baseband signals $I_{out}$ and $Q_{out}$ from the input baseband signals, $I_{in}$ and $Q_{in}$, according to (4.8) and (4.9) [72]. This corresponds to a rotation of the baseband signals by $\phi$ degrees around the origin of the I/Q plane.
Variable gain amplifiers (VGAs), with a differential current output, are used to implement the weighting factors that are proportional to $\cos(\phi)$ and $\sin(\phi)$. Differential amplifiers are used to easily create accurate gain of both polarities using switches.

\[
I_{out} = I_{in} \cdot \cos(\phi) - Q_{in} \cdot \sin(\phi) \quad (4.8)
\]
\[
Q_{out} = I_{in} \cdot \sin(\phi) + Q_{in} \cdot \cos(\phi) \quad (4.9)
\]
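
Equations (4.8) and (4.9) amount to multiplying the complex baseband sample by $e^{j\phi}$. The following sketch checks this for one sample; the sample value, the phase, and the function name are arbitrary choices for illustration.

```python
import cmath
import math

def rotate_iq(i_in, q_in, phi_deg):
    """Baseband phase rotation according to (4.8) and (4.9)."""
    phi = math.radians(phi_deg)
    i_out = i_in * math.cos(phi) - q_in * math.sin(phi)
    q_out = i_in * math.sin(phi) + q_in * math.cos(phi)
    return i_out, q_out

# Arbitrary example sample and phase shift.
symbol = (3.0, -1.0)
i_out, q_out = rotate_iq(*symbol, 40.0)

# Reference: the same rotation written as a complex multiplication.
ref = complex(*symbol) * cmath.exp(1j * math.radians(40.0))
print(f"I_out={i_out:.4f}, Q_out={q_out:.4f}, ref={ref.real:.4f}{ref.imag:+.4f}j")
```

The four products in the rotation correspond to the four VGA weights, two proportional to $\cos(\phi)$ and two proportional to $\sin(\phi)$, with the sign of one sine term inverted.
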

A block diagram of a baseband vector phase shifter implementation is given in Fig. 4.8 [72]. The baseband input signals are first buffered and then supplied to the VGAs. The differential polarity switches operate at baseband frequencies and are thus easier to design with high performance compared to switches used in RF beam forming.

![Block diagram of a baseband vector phase shifter](image)

**Fig. 4.8.** Block diagram of a baseband vector phase shifter [72]

### 4.3.5 Local Oscillator Beam Steering

With local oscillator (LO) beam steering, the phase of the LO signal to each transmitter is altered. Changing the phase of the LO signal changes the phase of the TX signal in the same way as if the phase of the baseband signal had been altered. An advantage of LO beam forming is that there are no phase shifters in the signal path. This is important since phase shifters contain nonlinear devices that can create unwanted intermodulation products. Another advantage is that, if the amplitude of the LO signal at the mixer input is large enough, there will be no significant conversion gain mismatch between channels. In [73] a 52 GHz receiver in 90 nm CMOS is presented, using a QVCO to create the four LO phases for the phase interpolators, as shown in Fig. 4.9. Due to phase mismatch in the LO routing it is difficult to distribute a perfectly balanced LO signal to the different phase interpolators across the die. Calibration is therefore required [73].

Fig. 4.9. Transmitter architecture using LO beamforming with a phase interpolator

The phase interpolator topology is outlined in Fig. 4.10. It consists of four differential pairs, each with a tunable bias current set by the bias voltages $V_{c1}$ through $V_{c4}$ [73]. The gate terminals of the differential pairs are connected to the four LO phases from the QVCO as seen in Fig. 4.11.

Fig. 4.10. Phase interpolator topology

At the output, the currents are summed using a resonant load. By varying the bias current of a differential pair, the contribution to the phase of the interpolator output signal from that LO phase can be altered. In [73] the bias currents can be continuously varied, giving a phase control range of 360°.
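
The interpolation principle can be sketched as a weighted sum of four phasors. This is a strongly idealized model, not the circuit in [73]: it assumes that each differential pair contributes an output current proportional to its bias current and that the four LO phases are exactly 0°, 90°, 180° and 270°. All names are hypothetical.

```python
import cmath
import math

# Ideal LO phases delivered by the QVCO (assumption: no quadrature error).
LO_PHASES_DEG = (0.0, 90.0, 180.0, 270.0)

def interpolator_output(bias_currents):
    """Phasor sum where each LO phase is weighted by its pair's bias current."""
    return sum(i * cmath.exp(1j * math.radians(p))
               for i, p in zip(bias_currents, LO_PHASES_DEG))

def output_phase_deg(bias_currents):
    """Phase of the summed output in the range 0 to 360 degrees."""
    return math.degrees(cmath.phase(interpolator_output(bias_currents))) % 360

# Equal weights on two adjacent phases give the phase halfway between them.
print(output_phase_deg((1.0, 1.0, 0.0, 0.0)))
# Unequal weights move the output phase toward the more strongly biased pair.
print(output_phase_deg((0.0, 1.0, 0.5, 0.0)))
```

In a real interpolator the transconductance is not linear in the bias current, so the mapping from control voltage to phase would need calibration or a lookup table.
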
4.3.5.1 PLL beam forming

In papers II and III, a PLL beam forming technique, with an architecture depicted in Fig. 4.11 is presented. Here, one complete PLL, including a 28 GHz QVCO generating four LO phases, is placed locally at each transceiver. A low frequency reference signal, $f_{\text{ref}}$, is distributed across the die to the different PLLs [57]. This is advantageous, since routing of high frequency signals is avoided. Long interconnects carrying high frequency signals require buffers with high power consumption. Due to wiring mismatch and coupling to adjacent wires there will also be phase and amplitude mismatch between the signals.

![Fig. 4.11. Block diagram of a beam steering PLL.](image)

The PLL phase shift is implemented by injecting DC current into the load of a Gilbert type phase detector [74], [75], as shown in Fig. 4.12. The loop filter is designed as an active low pass filter together with a passive RC-link [74]. The divider ratio $N$ equals 16, giving a reference frequency, $f_{\text{ref}}$, of 1.75 GHz. In [57], linear PLL phase control is also demonstrated, but for a 60 GHz receiver based on a 20 GHz QVCO, designed in 90 nm CMOS technology. The QVCO phase is altered by injecting DC current into a passive loop filter [76]-[78], succeeding a conventional charge pump. A major advantage of the PLL beam forming technique is that it is modular. Other techniques, involving distribution of high frequency signals on-chip, are more difficult to modify if more TX paths are to be added to the architecture. For such techniques, careful electromagnetic modeling of the wires using e.g. ADS Momentum is required to minimize wiring mismatch. With PLL phase control, the entire high frequency part is copied for each TX path. The injected DC current is preferably digitally controlled through a DAC. The phase resolution is therefore determined by the phase control range and the number of bits in the DAC. A drawback of the technique is the increased die area, since the entire PLL is duplicated for each TX path. The schematic of the Gilbert type PD including the phase control is shown in Fig. 4.12, with the output signal from the divider driving the transconductance devices and the reference signal connected to the current-commutating devices.
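The relation between phase control range, DAC word length and phase resolution can be illustrated with a short sketch. It assumes an ideal linear DAC whose full code range spans exactly the 360° control range (so that code 0 and code $2^{bits}$ coincide), and it uses the 2.5°/µA sensitivity reported for paper III in this chapter to translate a phase step into an LSB current. The function names and the chosen word lengths are illustrative only and are not taken from the papers.

```python
def phase_step_deg(phase_range_deg, dac_bits):
    """Phase step per DAC code, assuming the codes span the range uniformly."""
    return phase_range_deg / (2 ** dac_bits)


def lsb_current_ua(step_deg, sensitivity_deg_per_ua):
    """DAC LSB current (in uA) needed for one phase step, assuming linear control."""
    return step_deg / sensitivity_deg_per_ua


# Hypothetical DAC word lengths; 360 deg range and 2.5 deg/uA are from the text.
for bits in (6, 8, 10):
    step = phase_step_deg(360.0, bits)
    print(f"{bits} bits: {step:.4f} deg/code, LSB = {lsb_current_ua(step, 2.5):.4f} uA")
```

Under these assumptions, each additional DAC bit halves both the phase step and the required LSB current.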

For the phase range used, the output current of the PD is close to proportional to the phase difference between its inputs. In locked condition the QVCO frequency is constant, and so is then its control voltage, and also the phase detector output voltage. To keep the loop locked when control current is injected, the phase detector must thus produce the opposite current. Given the characteristics of the PD, this requires that the phase of the QVCO changes proportionally, with respect to the reference signal. As shown in papers II and III, for a given injected DC phase control current, the phase change is proportional to the ratio between the phase control current, $I_{\text{phase-ctrl}}$, and the tail bias current, $I_{PD}$, of the phase detector as given by (4.10), where $N$ equals the divider ratio. The feedback mechanism of the PLL makes this phase control technique robust against process and temperature spread.

$$\delta \phi_{QVCO} = \pi N \frac{I_{\text{phase-ctrl}}}{I_{PD}} \tag{4.10}$$
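As a numerical illustration of (4.10), the sketch below evaluates the expression exactly as written, for the divider ratio of 16 and the PD bias current of 1.29 mA mentioned in this chapter. The function name and the choice of a 1 µA control current are assumptions made for the example.

```python
import math


def qvco_phase_shift_deg(i_phase_ctrl_a, i_pd_a, n=16):
    """Evaluate (4.10) and return the QVCO phase shift in degrees."""
    return math.degrees(math.pi * n * i_phase_ctrl_a / i_pd_a)


# 1 uA of injected control current with a 1.29 mA PD tail current.
shift = qvco_phase_shift_deg(1e-6, 1.29e-3)
print(f"{shift:.2f} degrees per uA")
```

Because the expression depends only on the ratio of two currents and on $N$, the result is unchanged if both currents scale together, which is consistent with the robustness argument above.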

A chip photo of the design presented in papers II and III is given in Fig. 4.13. The chip was implemented in a 0.18 µm bipolar SiGe process. The total die size equals 1448 µm x 928 µm. The different PLL blocks, i.e. the QVCO, divider, PD and active LF are outlined. More than 70% of the die size is used for on-chip decoupling. The chip consumes 52 mW from a 1.5 V supply plus a minimum of 7 mW from a dedicated variable supply for the active low pass filter, which is used to extend the locking range of the PLL. The measured PLL in band phase noise at 1 MHz offset equals -107 dBc/Hz, in good agreement with the simulated performance.
Linear phase control covering 360° with 2.5°/µA injected DC current has been verified in paper III for a PD bias current of 1.29 mA. The measured phase shift at the divider output versus injected DC current is provided in Fig. 4.14. A linear phase control is desired, since it eliminates the need for look-up tables that map a non-linear behavior to a linear one. Moreover, even with a look-up table, variations may still leave some residual non-linearity.
CHAPTER 5

5 mm-wave transmitter requirements

The E-band is located at 71-76 GHz, 81-86 GHz and 92-95 GHz and is used for point-to-point radio links. One application is wireless backhaul for cellular communication systems with Gb/s capacity. Compared to microwave links operating at lower frequencies, the large available bandwidth at the E-band frequencies enables much higher data rates. Each sub-band is divided into 250 MHz channels [2], [3]. An operator can combine several channels to increase the capacity. Since E-band wireless communication is licensed, interference from other deployed radio links in the area is minimized. First generation E-band links only supported simple modulation schemes, such as BPSK and OOK. Though easy to implement, these schemes do not use the available bandwidth efficiently, i.e. an unnecessary amount of the spectrum is used to transmit at a certain data rate [79]. For early deployed E-band links this was overlooked, since several GHz of bandwidth was available and there was hardly any risk of interference with other E-band links. As more cellular communication systems started to utilize E-band wireless backhaul, however, spectral efficiency became increasingly important. With an increasing number of users requiring high data rates, as in future 5G systems, and an increased cost for a licensed channel it becomes important to use modulation with a higher bandwidth efficiency, e.g. 16 QAM and 64 QAM. Using an M-QAM modulator [80], [81], the transmitted symbols can be represented with a constellation diagram [81], outlined in Fig. 5.1 for M equal to 16. In the constellation diagram the x-axis represents the in-phase (I) component and the y-axis represents the quadrature (Q) component, i.e. a component that has a 90 degree phase difference to the in-phase component. With $M=16$, four bits are mapped to each symbol. The symbols are distinguished by their combination of amplitude and phase.
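The structure of a square M-QAM constellation can be made concrete with a small sketch. It assumes the conventional square grid with odd-integer I and Q levels (for 16 QAM: -3, -1, 1, 3), which is a common textbook choice and not necessarily the scaling used in Fig. 5.1; the function name is illustrative.

```python
import math


def square_qam_points(m):
    """Return the points of a square M-QAM constellation on an odd-integer grid."""
    side = int(round(math.sqrt(m)))
    levels = [2 * k - (side - 1) for k in range(side)]  # e.g. -3, -1, 1, 3
    return [complex(i, q) for q in levels for i in levels]


points = square_qam_points(16)
bits_per_symbol = int(math.log2(len(points)))
amplitudes = sorted({round(abs(p), 6) for p in points})

print(len(points), "symbols,", bits_per_symbol, "bits per symbol")
print("distinct amplitudes:", amplitudes)
```

For this grid the 16 symbols share a small set of amplitude levels, so a symbol is identified by its amplitude and phase together.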
Besides supporting M-QAM modulation, these second generation systems also support different capacity enhancement techniques [79], such as adaptive modulation and adaptive coding/channel spacing. Adaptive modulation can be used to overcome reduced link quality due to difficult weather conditions. High speed networks for cellular radio communication, such as LTE and 5G networks, have stringent requirements on clock synchronization [79]. While the first commercial E-band systems only supported basic requirements, the second generation now supports the requirements specified in the LTE standards. An important limitation in wireless communication at E-band frequencies is the attenuation of the signal due to the free space loss, $LFS$ [45], defined in (5.1), where $\lambda$ is the wavelength and $d$ is the distance between the transmitter and receiver.

$$LFS = 20 \log_{10} \left( \frac{4\pi d}{\lambda} \right) \tag{5.1}$$
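To give a feeling for the magnitude of (5.1), the following sketch evaluates the free space loss for an 84 GHz carrier over a 1 km link, the frequency and distance that appear elsewhere in this chapter. The function name is illustrative, and antenna gains and atmospheric attenuation are deliberately left out.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_loss_db(distance_m, frequency_hz):
    """Free space loss according to (5.1), in dB."""
    wavelength = SPEED_OF_LIGHT / frequency_hz
    return 20 * math.log10(4 * math.pi * distance_m / wavelength)


loss = free_space_loss_db(1_000.0, 84e9)
print(f"{loss:.1f} dB")
```

Doubling the distance or the frequency adds about 6 dB to the loss, which is why high gain antennas are needed at mm-wave frequencies.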

The high path loss at mm-wave frequencies is mitigated by using high gain antennas, which are highly directional. Due to the directivity, interference is also mitigated, both in the down and up link. The quality of the radio link is defined by its Bit Error-Rate (BER) or Symbol Error-Rate (SER). These quantities depend on the Error Vector Magnitude (EVM), illustrated in Fig. 5.2 [28]-[31], showing the difference between an ideal transmitted constellation point and the actual transmitted point.
The rms error vector magnitude, $EVM_{rms}$, is defined in (5.2) [80], as a normalized root mean square value of the errors of all constellation points.

$$EVM_{rms} = \left[ \frac{1}{N} \sum_{r=1}^{N} |S_{ideal,r} - S_{meas,r}|^2 \right]^{0.5} \tag{5.2}$$
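A minimal sketch of the EVM calculation follows. With `normalize=False` it evaluates (5.2) as written; with `normalize=True` it additionally divides by the rms magnitude of the ideal constellation, which is one common normalization and is an assumption here, since the normalization reference is not spelled out in (5.2). The example symbols are invented for illustration.

```python
import math


def evm_rms(ideal, measured, normalize=True):
    """RMS error vector magnitude between ideal and measured complex symbols."""
    n = len(ideal)
    error_power = sum(abs(s - m) ** 2 for s, m in zip(ideal, measured)) / n
    if not normalize:
        return math.sqrt(error_power)
    reference_power = sum(abs(s) ** 2 for s in ideal) / n  # assumed normalization
    return math.sqrt(error_power / reference_power)


# Hypothetical QPSK symbols, each received with a 0.1 offset on the I axis.
ideal = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
measured = [s + 0.1 for s in ideal]

print(f"unnormalized: {evm_rms(ideal, measured, normalize=False):.4f}")
print(f"normalized:   {100 * evm_rms(ideal, measured):.2f} %")
```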

The EVM depends on several imperfections of the transmitter [29]. One source of error is the phase noise of the local oscillator, which adds random noise to each constellation point [31]. In papers I, II and III, a SiGe QVCO is presented, which has a measured phase noise of -100 dBc/Hz at 1 MHz offset from the carrier. For a certain phase noise profile, the rms phase error $\sigma_\varphi$ is given by (5.3) [74], where $L(f)$ equals the phase noise power density relative to the carrier. The lower and upper limits of integration are $a$ and $b$, respectively. Integrating between 100 kHz and 1 GHz gives a phase error of 2.6°.

$$\sigma_\varphi = \frac{180}{\pi} \sqrt{2 \cdot \int_a^b L(f) \cdot df} \tag{5.3}$$
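The integral in (5.3) can be evaluated numerically once a phase noise profile is chosen. The sketch below assumes a pure -20 dB/decade profile passing through -100 dBc/Hz at 1 MHz offset; this idealized shape is an assumption for illustration and ignores the flicker noise region, the PLL loop and any noise floor. The integration uses the trapezoidal rule on a logarithmic frequency grid.

```python
import math


def phase_noise_linear(f_hz, l_ref_dbc=-100.0, f_ref_hz=1e6):
    """Assumed profile: -20 dB/decade through l_ref_dbc at f_ref_hz (linear units)."""
    return 10 ** (l_ref_dbc / 10) * (f_ref_hz / f_hz) ** 2


def rms_phase_error_deg(a_hz, b_hz, profile=phase_noise_linear, steps=4000):
    """Evaluate (5.3) numerically, substituting f = exp(u) so that df = f du."""
    u0, u1 = math.log(a_hz), math.log(b_hz)
    h = (u1 - u0) / steps
    total = 0.0
    for k in range(steps + 1):
        f = math.exp(u0 + k * h)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * profile(f) * f * h
    return math.degrees(math.sqrt(2 * total))


sigma = rms_phase_error_deg(100e3, 1e9)
print(f"rms phase error: {sigma:.2f} degrees")
```

With this assumed profile the result lands close to the 2.6° quoted above, and the integral is dominated by the lowest offset frequencies.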

Converting the PLL phase noise into an rms phase error gives a quantity that can more easily be compared with other imperfections in the LO generation. Another source of error is the I/Q imbalance [30], [31]. In [31], an analytical model is developed for analyzing the QAM transceiver performance degradation due to amplitude and phase mismatch. The analysis results can be used to specify the maximum tolerable mismatch for a given modulation type. The Symbol-Error-Rate (SER) versus energy per symbol divided by noise power spectral density, $E_s/N_0$, for different modulation schemes and transmitter I/Q phase imbalance is simulated. Comparing QPSK modulation with 16 QAM and 64 QAM, with a 3° phase imbalance, results in $E_s/N_0$ requirements of 9, 16 and 25 dB respectively for a SER of $10^{-4}$. For the same SER and noise level, higher order modulation requires more energy per symbol. For a constant $E_s/N_0$ value of 20 dB and a phase error of 3°, SER degrades from $10^{-8}$ to $10^{-2}$ going from QPSK to 64 QAM. As expected, QPSK modulation is quite insensitive to I/Q phase error. For a SER of $10^{-4}$ for QPSK, 16 QAM, and 64 QAM, an increase of the phase error from 0° to 3° gives about 0 dB, 1 dB, and 3 dB degradation of $E_s/N_0$ respectively, i.e. using higher order QAM requires more stringent phase error calibration.

As described in papers I and II, the I/Q phase error depends on both static and dynamic mismatches. If a PA is integrated on-chip, thermal gradients can cause time varying (dynamic) mismatch between devices. Static errors are caused by device mismatch due to fluctuations in the semiconductor processing. For a mm-wave design, mismatches in capacitive parasitics become more important as the operating frequency increases. Therefore, in an E-band TX-architecture, it is difficult to design a QVCO, oscillating at the carrier frequency, that provides a sufficiently low I/Q phase error. A second reason for not selecting such an architecture, is that the Q-value of the varactor is reduced at higher operating frequency, thereby degrading the oscillator phase noise. In papers I and II, a 28 GHz QVCO with an I/Q phase error tuner, see Fig. 5.3, is presented. The topology could also be used for phase error tuning of QVCOs operating at higher frequencies.

![QVCO with I/Q phase imbalance detection and compensation](image)

**Fig. 5.3.** Architecture of the QVCO with I/Q phase imbalance detector and compensation

The phase error detector, outlined in Fig. 5.4, is designed as two cross-coupled double-balanced active mixers. The differential output voltage is proportional to the deviation of the phase difference between its inputs from 90 degrees.
Due to internal mixer phase shift, a symmetrical arrangement with two mixers is required to provide zero output voltage for 90 degrees phase difference between the inputs. Monte Carlo simulations of the detector were used to ensure that the internal error of the detector was below 1.0°. In Fig. 5.5, the simulated phase and magnitude of the differential detector output voltage versus a differential control voltage, $V_{\text{tune,ctrl}}$, is shown. The breakdown voltage of the varactor is equal to 7.7 V, i.e. the midpoint is set to 3.85 V. The $V_{\text{tune,ctrl}}$ voltage is related to the varactor voltages by setting $V_{\text{tune,I,p}} = V_{\text{tune,I,n}} = 3.85 - V_{\text{tune,ctrl}}$ and $V_{\text{tune,Q,p}} = V_{\text{tune,Q,n}} = 3.85 + V_{\text{tune,ctrl}}$.

The detector output voltage (green curve M1) equals 150 mV for an 8 degree I/Q phase error at the detector input (blue curve M2) with $V_{\text{tune,ctrl}}$ equal to 3.85 V. The phase error at the QVCO core is given by the red curve, marked M3. The chip photo of the design is shown in Fig. 5.6.
The larger part of the die area of 1.3 mm$^2$ is used for Metal-Insulator-Metal (MIM) and Metal-Oxide-Metal (MOM) decoupling capacitors and pads. In Paper VI, EVM simulation results of a complete E-band transmitter for the 81-86 GHz band, with a 16 QAM signal of 1 GHz bandwidth, are presented. The transmitter architecture is outlined in Fig. 3.6. In a QAM modulator, I and Q signals with discrete levels are created. These signals can, however, not be transmitted as they are, but must be shaped by filters to avoid spectrum broadening of the transmitted signal. Filtering is provided in both a digital Root-Raised-Cosine (RRC) filter [82] and in an analog filter. In Paper VI, the EVM performance of the transmitter architecture shown in Fig. 3.6 is analyzed using the simulation setup of Fig. 5.7. The four bits of each symbol in the constellation diagram are represented by four uncorrelated random bitstreams.
Baseband I and Q modulated signals are created in a 16 QAM modulator, represented in Verilog-A. After pulse shaping in a digital RRC filter and an analog filter, the baseband I and Q signals are up converted to 84 GHz by an I/Q modulator in the transmitter block. The output from the PA is then demodulated using an ideal quadrature LO signal at 84 GHz, without either phase noise or I/Q imbalance. After filtering with a replica of the baseband transmitter filters, the output signal is sampled at the symbol rate. By comparing the constellation diagram of the ideal transmitted symbols and the demodulated symbols, the EVM of the transmitter can be calculated. In paper VI, the EVM dependency on QVCO phase noise, I/Q phase imbalance, and TX output power is analyzed. Fig. 5.8 shows the EVM as a function of TX output power. As can be seen, the EVM is low for output powers far below the PA compression point of 11 dBm. The nonlinearities of the transmitter start to have an impact on the EVM already at power levels well below the compression point.
Above the compression point, the constellation diagram is increasingly distorted and the EVM increases rapidly. A third EVM source is LO-leakage [18], [29]. Calibration is therefore often required to suppress the leakage of the LO signal to the TX port. In the commercial E-band transmitter presented in [18], two DACs are used to adjust the I/Q modulator for minimum LO-leakage. The transceiver in [18] is designed using SiGe bipolar technology. One advantage compared to CMOS technology is the low flicker noise of the SiGe HBT devices. Using such devices, oscillators can be designed that suffer significantly less from increased phase noise at low frequency offsets from the carrier [18]. With higher order modulation schemes, such as 64 QAM, there is information at frequencies close to DC. Using SiGe technology, it is therefore easier to design both the up-link and down-link part of the transceiver. For the RX part, this applies in particular if a zero-IF or low-IF architecture is used. In [18], the phase noise of the internal VCO is better than -80 dBc/Hz at 100 kHz carrier offset.

The transceiver in [18] is packaged together with a GaAs PA, which provides a saturated output power of +21 dBm at the antenna. Losses in PCB traces, and in the diplexer used to separate the RX and TX signals, reduce the output from the module. Due to high PAR and strict EVM requirements, it is not possible to transmit at full power with higher order modulations. Instead, the transmitter must be linearized using so called power back off [18]. In TX performance measurements [18], a 3.18 Gb/s radio link was established using 256 QAM in a 500 MHz RF channel bandwidth. However, the antenna port power was then reduced to only +8 dBm. Even so, using a directional antenna, this output power is enough for a link distance of 1 km [18].

![Fig. 5.8. EVM versus PA output power for the complete transmitter](image-url)
6 Building blocks for mm-wave transmitters

6.1 Introduction
The design flow for an integrated mm-wave transceiver circuit is quite different from that of an IC for cellular frequencies. In cellular transceiver design, the work with the chip layout can start after the schematic design is finalized. In mm-wave circuit design the circuit schematic and layout are best developed simultaneously. To be successful in mm-wave design, chip floor planning is essential. If not taken into account at an early stage, interconnect wiring between circuit blocks can become too long, resulting in large trace inductances. If the wire inductance in combination with capacitive parasitics results in a wire self-resonance frequency that is lower than the operating frequency of the circuit, this can significantly deteriorate the performance. In general, a major difficulty in mm-wave transceiver design is that the operating frequency is high in comparison with the $f_T$ of the active devices. To maximize the gain, most circuit nodes must therefore be in resonance, i.e. the capacitive parasitics must be cancelled using either an inductor or a transformer. This adds a lot of die area and complicates the circuit floor plan. For a cellular transceiver, on the other hand, the operating frequency is far below the device $f_T$ and the penalty for not cancelling the capacitive parasitics is smaller. Compared to a mm-wave receiver, the high signal powers and DC current densities in a mm-wave transmitter restrict the minimum sizes that can be used for the active devices. The capacitive device parasitics are thus increased, further increasing the challenges of chip layout.

6.2 Phase Locked Loops
Most transceivers use a phase locked loop (PLL) [74]-[78], [83]-[85] to generate phase and frequency stable local oscillator signals for the mixers in the receive and transmit chains. The basic function of a PLL is to create frequencies that are multiples of a reference signal. The reference signal either comes from an external fixed frequency crystal, i.e. an XO, or from a crystal with tunable frequency, i.e. a VCXO. The reference signal should be as pure as possible, since its close-in phase noise is multiplied by the divider ratio [75]. In commercial transceivers, the fractional-N synthesizer [75], with variable non-integer effective divider ratio, is commonly used. However, within the time frame of this thesis, designing a complete fractional-N synthesizer was not possible. The 28 GHz PLL presented in papers II and III instead uses a fixed divider ratio, N, equal to 16. For arbitrary frequency generation, a programmable reference could be used. A possible solution is to use a fractional-N PLL that generates the reference frequency, as shown in Fig. 6.1, where an XO is used as a reference for PLL 1, generating the reference frequency, $f_{ref}$, for PLL 2. Since in this thesis, the PLL frequency is multiplied by a factor three to create the TX carrier, PLL 1 should generate reference frequencies between 1.688 GHz (for 81 GHz TX output) and 1.792 GHz (for 86 GHz TX output). The performance of PLL 1 regarding phase noise and spurs is critical for the overall system performance. However, this PLL is common for all transmitters on the die and therefore its performance can be improved by allowing for a higher current consumption. The bandwidth of PLL 1 should be much smaller than the 5 MHz of the PLL presented in papers II and III. Reference spurs will then be strongly attenuated before entering PLL 2, and the amount of in-band noise of PLL 1, multiplied by PLL 2, is minimized. In papers II and III, phase control for beam steering is implemented with DC current injection into the load of the phase detector, as indicated in Fig. 6.1.
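The reference frequency range quoted above follows directly from the frequency plan: the TX carrier is three times the QVCO frequency, and the QVCO frequency is $N = 16$ times the reference. The sketch below simply carries out that arithmetic; the constant and function names are illustrative.

```python
N_DIVIDER = 16      # fixed PLL divider ratio from the text
TX_MULTIPLIER = 3   # TX carrier = 3 x QVCO frequency, from the text


def reference_frequency_hz(f_tx_hz):
    """Reference frequency PLL 1 must deliver for a given TX carrier frequency."""
    return f_tx_hz / (TX_MULTIPLIER * N_DIVIDER)


for f_tx in (81e9, 84e9, 86e9):
    f_ref = reference_frequency_hz(f_tx)
    print(f"TX {f_tx / 1e9:.0f} GHz -> QVCO {f_tx / 3e9:.2f} GHz, f_ref {f_ref / 1e9:.4f} GHz")
```

For the 84 GHz carrier this gives the 28 GHz QVCO frequency and the 1.75 GHz reference used in papers II and III.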

The performance of the PLL presented in papers II and III was simulated using Cadence Spectre RF. Simulating an entire PLL in locked mode using PSS analysis is known to be a very difficult task, due to the large difference between the reference frequency, $f_{ref}$, and the QVCO frequency, $f_{QVCO}$. Therefore, in the work of paper II, a Verilog-A model of the QVCO with a phase noise profile and sensitivity that matched the real QVCO was developed. Using the Verilog-A model, it was possible to perform large signal PSS analysis together with PNOISE analysis. In paper III, the common small signal PLL model [74], [75], shown in Fig. 6.2, was used to analyze the contributions from the different blocks to the total PLL output noise. As shown in Fig. 6.2, a noise source, denoted $n_{ref}$, is inserted at the reference input to the phase detector. Noise sources corresponding to the output noise from the phase detector, low pass filter, QVCO and divider are inserted at the block outputs.
The phase detector has an output voltage that depends on the phase difference between its inputs, i.e. it has a fixed gain \( G_{PD} \) in volt/rad. The low pass filter has frequency dependent gain, \( F(s) \). Being an integrator, the gain of the QVCO is equal to \( K_{VCO}/s \) where \( K_{VCO} \) is the tuning sensitivity of the QVCO. The divider gain, \( H(s) \) is equal to 16. By simulating the output noise of each block of the PLL, and inserting an equivalent noise source at the block output, the noise contribution from each block at the PLL output, \( Y(s) \) in Fig. 6.2, can be found using the transfer function from the noise source to the PLL output.
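The way such a small signal model shapes the different noise contributions can be sketched for two of the sources. The example assumes the textbook linear PLL model in which the feedback path scales the output phase by $1/N$ with $N = 16$, so that reference noise reaches the output through $G(s)/(1 + G(s)/N)$ and QVCO noise through $1/(1 + G(s)/N)$, where $G(s) = G_{PD} F(s) K_{VCO}/s$. The numerical values of $G_{PD}$, $K_{VCO}$ and the single-pole loop filter are hypothetical and are not taken from the papers.

```python
import math

N = 16                          # divider ratio from the text
G_PD = 0.1                      # phase detector gain in V/rad (hypothetical)
K_VCO = 2 * math.pi * 800e6     # QVCO tuning sensitivity in rad/s/V (hypothetical)
F_POLE_HZ = 50e6                # pole of an assumed first-order loop filter (hypothetical)


def loop_filter(s):
    """Assumed loop filter F(s): a single real pole with unity DC gain."""
    return 1 / (1 + s / (2 * math.pi * F_POLE_HZ))


def noise_transfer(f_hz):
    """Return |reference -> output| and |QVCO -> output| at offset frequency f_hz."""
    s = 2j * math.pi * f_hz
    forward = G_PD * loop_filter(s) * K_VCO / s
    loop_gain = forward / N
    return abs(forward / (1 + loop_gain)), abs(1 / (1 + loop_gain))


for f in (1e3, 1e6, 5e6, 1e8, 1e10):
    ref_gain, vco_gain = noise_transfer(f)
    print(f"{f:10.0f} Hz  ref x{ref_gain:8.4f}  vco x{vco_gain:8.5f}")
```

Well inside the loop bandwidth the reference noise is multiplied by $N$ while the QVCO noise is suppressed; far outside the bandwidth the QVCO noise passes to the output unchanged.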

### 6.2.1 Voltage Controlled Oscillators

The Voltage Controlled Oscillator (VCO) [45] is a key component in PLL design for transceivers. The VCO, generating the LO signals, sets a limit on the performance of the complete transceiver. There are several available architectures that can be used for implementing an oscillator, among them the Colpitts oscillator [45], the ring oscillator [45], the N-push VCO [61], [62] and the cross-coupled VCO topology [45]. An advantage of the ring oscillator is that many phases of the oscillation signal are available simultaneously, a property that can be used in for instance vector combination beam steering.

The cross-coupled VCO topology [45], outlined in Fig. 6.3a and 6.3b, is common in transceivers designed using both CMOS and bipolar devices. In Fig. 6.3a, the bases of the core devices \( Q_i \) are AC coupled with capacitors \( C_i \) to the output nodes \( V_{out,p} \) and \( V_{out,n} \). If the VCO is designed using bipolar devices, this is necessary in order not to forward bias the base-collector junction of the \( Q_i \) devices for a part of the oscillation cycle. If the junction is forward biased, the noise from the core devices will be significantly increased. With CMOS devices AC coupling is not required. In papers I, II and III, presenting a QVCO, the supply voltage is reduced to 1.5 V, thereby reducing the voltage swing at the output nodes \( V_{out,p} \) and \( V_{out,n} \). The VCO core in these designs therefore has the topology shown in Fig. 6.3b, where the AC-coupling capacitors have been removed, resulting in a more compact core layout and reduced parasitics. For a 360 mV peak single-ended output signal, there is a maximum of 720 mV across the base-collector junction. The $Q_1$ devices are therefore close to the forward biasing limit.

The cross-coupled core devices create a negative resistance, which cancels the resistance in the VCO tank, consisting of the varactors and the inductors. Compared to a Colpitts topology, the cross-coupled VCO oscillates for a lower transconductance, $g_m$, of the core devices [45]. Increasing the bias current, increases the negative conductance of the core, thereby making it easier for oscillations to start. As shown in Fig. 6.3a and 6.3b, a varactor is used to control the oscillation frequency, $f_{osc}$, given in (6.1) [45], where $L$ equals two times $L_{VCO}$ and $C_{tot}$ is the total capacitance, i.e. the sum of the varactor capacitance and all parasitics at the collector nodes.

\[
f_{osc} = \frac{1}{2\pi \sqrt{L C_{tot}}} \tag{6.1}
\]
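As a rough numerical check of (6.1), the sketch below computes the total tank capacitance needed for oscillation at 28 GHz. It takes the 0.12 nH differential inductance quoted for the QVCO inductors later in this chapter as the value of $L$ in (6.1); how that figure maps onto $L = 2 L_{VCO}$ is an assumption made here, and the function names are illustrative.

```python
import math


def oscillation_frequency_hz(l_h, c_tot_f):
    """Oscillation frequency of an LC tank according to (6.1)."""
    return 1 / (2 * math.pi * math.sqrt(l_h * c_tot_f))


def required_capacitance_f(f_osc_hz, l_h):
    """Total capacitance that places the tank resonance at f_osc_hz."""
    return 1 / ((2 * math.pi * f_osc_hz) ** 2 * l_h)


L_TANK = 0.12e-9  # H, assumed to be the L of (6.1)

c_tot = required_capacitance_f(28e9, L_TANK)
print(f"C_tot = {c_tot * 1e15:.0f} fF")
print(f"check: f_osc = {oscillation_frequency_hz(L_TANK, c_tot) / 1e9:.2f} GHz")
```

A total capacitance of a few hundred femtofarads, shared between varactors and parasitics, illustrates why parasitic capacitance directly limits the tuning range at these frequencies.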

All oscillators exhibit phase noise [86], [87], i.e. a fluctuation of the phase of the oscillation signal in time. Being very close to the carrier frequency, this noise cannot be removed by filtering. The phase noise profile for a typical VCO is outlined in Fig. 6.4, where $L(\Delta\omega)$ is the phase noise in dBc/Hz at a distance $\Delta\omega$ from the carrier. For offset angular frequencies less than $\omega_0/2Q$ the oscillator loop gain amplifies the phase noise and gives a phase noise slope of 6 dB/octave [86], [87].

---

**Fig. 6.3.** Common bipolar VCO core (a) and the VCO core in papers II and III (b)
For low frequency offsets, the 1/f noise of the active devices in the oscillator dominates and the slope is increased to 9 dB/octave. Compared to devices in CMOS technology, SiGe bipolar devices exhibit much less 1/f noise [19], [34] and are therefore advantageous to use in VCO design.

For a receiver, the oscillator phase noise will e.g. result in reciprocal mixing [88], i.e. the phase noise of the receiver mixer LO signal will result in down conversion of an interferer to the frequency of the desired signal. For a transmitter, the phase noise will increase the noise of the sent symbols, i.e. the BER will increase. The level of transmitted power outside the modulation bandwidth will also increase, resulting in a possible violation of the spectrum mask.

In modern transceiver architectures, quadrature LO signals, with a 90 degree phase difference, are typically required. One way of generating such signals is to use a VCO that oscillates at twice the LO frequency, and supply the VCO signal to a divide-by-two circuit, such as a CML divider. Another way is to use a Quadrature Voltage Controlled Oscillator (QVCO) [89], as presented in papers I, II, III and VI. The QVCO architecture uses two VCO cores that are injection locked [89]-[92] to each other, e.g. as shown in Fig. 6.5b [89]. In mm-wave design the $f_T$ of the core devices and the frequency tuning range needed put a limit on the maximum fundamental oscillation frequency of the VCO. If a higher frequency is needed in the design, this limit can be circumvented by using the fact that in a cross-coupled VCO, a strong signal at twice the oscillation frequency is present at the emitters of the core devices. This is utilized in paper VI to extract a 56 GHz LO signal from the 28 GHz QVCO presented in papers I, II and III.

The VCO phase noise figure-of-merit (FoM) is defined by (6.2) [88], where $L(\Delta \omega)$ is the phase noise in dBc/Hz at offset frequency $\Delta \omega$ from the carrier frequency $\omega_0$, and $P_{DC}$ is the DC power consumption.
\[ \text{FoM} = L(\Delta \omega) - 20 \log\left(\frac{\omega_0}{\Delta \omega}\right) + 10 \log\left(\frac{P_{DC}}{1\,\text{mW}}\right) \tag{6.2} \]
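The figure of merit in its commonly used form, with the ratio of carrier frequency to offset frequency in the second term, can be evaluated as sketched below. The phase noise of -100 dBc/Hz at 1 MHz offset and the 28 GHz carrier are taken from the text; the DC power of 20 mW is a hypothetical placeholder, since the power consumption of the QVCO alone is not stated in this section.

```python
import math


def vco_fom_db(phase_noise_dbc_hz, f_carrier_hz, f_offset_hz, p_dc_w):
    """Phase noise figure of merit in dBc/Hz (more negative is better)."""
    return (phase_noise_dbc_hz
            - 20 * math.log10(f_carrier_hz / f_offset_hz)
            + 10 * math.log10(p_dc_w / 1e-3))


# p_dc_w = 20 mW is a hypothetical value used only to exercise the formula.
fom = vco_fom_db(-100.0, 28e9, 1e6, 20e-3)
print(f"FoM = {fom:.1f} dBc/Hz")
```

The frequency ratio term normalizes the phase noise to the carrier frequency, so oscillators at different frequencies and power levels can be compared on equal terms.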

In papers I and II, a QVCO with a mismatch compensation [92] and detection circuit is presented. As shown in Fig. 6.5a, tunable varactors, controlled by \( V_{\text{tune},p} \) and \( V_{\text{tune},n} \), are connected to each QVCO output. Varying the control voltages changes the capacitance of the varactors, so that mismatch of the active QVCO devices can be compensated. For layout symmetry reasons, the main varactors, controlled by \( V_{\text{ctrl}} \), have been split into two.

![Diagram of QVCO core schematic and architecture](image)

**Fig. 6.5.** 28 GHz QVCO core schematic (a) and architecture (b)

In paper VI, a complete 84 GHz E-band transmitter based on the frequency up-conversion architecture of Fig. 3.6 is presented. The differential 56 GHz LO signal is extracted from the tail nodes of the two QVCO cores, using a transformer as outlined in Fig. 6.6. The 28 GHz QVCO and the 56 GHz LO buffers, driving the 56 GHz mixer in Fig. 3.6, are biased through inductor center taps.
A Momentum view of the inductors and the 56 GHz transformer together with routing for the complete QVCO is shown in Fig. 6.7. The 56 GHz transformer with input signals $I_{56_p}$ and $I_{56_n}$ is located in the middle. The tank inductors for the 28 GHz QVCO with center tap supply are located to the left and right. The LO buffers for the 28 GHz mixers are connected to the four $Out_{28}$ nodes. The PLL divider is connected to two of the $Div_n$ nodes, however, for symmetry reasons routing has been included for all four outputs.

The QVCO inductors have an inner diameter of 50 μm and a trace width of 11 μm. Their differential inductance equals 0.12 nH and they have a Q value of 18 at 28 GHz. For a well-balanced QVCO, i.e. a QVCO with a small I/Q phase error, it is important to optimize the symmetry of the routing.
6.2.2 Dividers

6.2.2.1 Miller Divider

Miller dividers are commonly used as the first stage in a divider chain if the input frequency is high. In [85] a 100 GHz PLL was designed with a divide-by-64 divider, where a Miller divider was used as the first stage. In papers II and III of this thesis, the QVCO frequency is only 28 GHz and CML dividers could therefore be used in the entire divider chain without significant power consumption penalty. The architecture of the Miller divider, consisting of an active mixer and a low pass filter, is outlined in Fig. 6.8.

![Miller divider architecture](image)

The feedback signal from the output, is amplified in the active mixer transconductance stage before mixing. A low pass filter is typically added at the mixer output to prevent mixing of unwanted harmonics. The Miller divider can also be designed using a passive mixer followed by a low-pass filter and a separate amplifier [45]. At higher input frequencies, the Miller divider consumes less power compared to a static (CML) divider and is therefore often used as the first stage in a divider chain [45]. With \( f_o \) as the Miller divider output frequency and \( f_{in} \) as the divider input frequency, the divider function is defined by (6.3), with the integers \( n \) and \( m \) being the harmonics of the mixing products [45].

\[
n \cdot f_{in} \pm m \cdot f_o = f_o \quad (6.3)
\]

Solving (6.3) for the output frequency \( f_o \) gives (6.4) [45].

\[
f_o = n \cdot f_{in} / (1 \pm m) \quad (6.4)
\]

By setting both \( n = 1 \) and \( m = 1 \), the often desired divide-by-two function is obtained.

\[
f_o = 1/2 \cdot f_{in} \quad (6.5)
\]

A drawback of the Miller divider is that other combinations of \( n \) and \( m \) can result in unwanted behavior. For instance setting \( n \) equal to 3 and \( m \) equal to 1 gives an output frequency, \( f_o \), higher than \( f_{in} \) [45].

\[
f_o = 3/2 \cdot f_{in} \quad (6.6)
\]
This is the reason why the low pass filter in Fig. 6.8 is often required to attenuate signals at frequencies higher than $f_{in}$ [45]. The Miller divider core topology given in Fig. 6.9 has been verified in [85] for an input frequency of 130 GHz.
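The division ratios allowed by (6.4) can be enumerated with a few lines of code. The sketch evaluates $f_o / f_{in} = n / (1 \pm m)$ with exact fractions and returns `None` when the denominator vanishes; the function name and the `sign` argument are illustrative.

```python
from fractions import Fraction


def miller_ratio(n, m, sign=+1):
    """Return f_o / f_in = n / (1 + sign * m) from (6.4), or None if undefined."""
    denominator = 1 + sign * m
    if denominator == 0:
        return None
    return Fraction(n, denominator)


for n, m in ((1, 1), (3, 1), (2, 1), (1, 3)):
    print(f"n={n}, m={m}: f_o/f_in = {miller_ratio(n, m)}")
```

Only the $n = 1$, $m = 1$ solution gives the desired divide-by-two behavior; the $n = 3$, $m = 1$ solution lies above $f_{in}$ and is the kind of response that the low pass filter is meant to suppress.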

The input signal to the core is supplied to the bases of the switching pair devices. These devices are smaller and therefore have a lower parasitic capacitance [85]. In order to increase the operating frequency, shunt peaking inductors, i.e. inductor $L_I$ with series resistor $R_I$, are used instead of purely resistive mixer loads.

6.2.2.2 Current-Mode-Logic Divider

In Fig. 6.10a, a low supply voltage Current-Mode-Logic (CML) [45], [93]-[96] latch, presented in papers II and III, is shown. The regular CML latch has a tail current source, and emitter followers at the output [45]. In the presented work, these were eliminated in order for the divider to operate with a supply voltage as low as 1.5 V. By connecting two latches in a negative feedback configuration, as shown in Fig. 6.10b, a divide-by-two function is realized [45]. The data outputs of the second latch are then connected to the data inputs of the first. This feedback configuration effectively yields an injection locked differential 2-stage ring oscillator, which will self-oscillate at a certain frequency without an input signal [95]. By designing the self-oscillation frequency of the divider to be close to the desired frequency of the application, the CML divider can operate correctly for very low input power levels.
An important property of the CML divide-by-two circuit, is that both the in-phase (I) and quadrature phase (Q) signals are available. This is often utilized in the generation of LO signals for quadrature mixers, where the VCO then operates at twice the LO frequency. Quadrature signals can also be created using QVCOs or 90 degree hybrids, however, regarding bandwidth and accuracy the CML divider is superior [45]. The VCO plus divider arrangement also has less sensitivity to oscillator pulling. A block layout of the four stage divider in papers II and III, dividing an input signal of 28 GHz to a 1.75 GHz output, is shown in Fig. 6.11. Excluding decoupling capacitors, it has a size of 287 µm x 85 µm and consumes 20 mW from a 1.5 V supply.
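The divide-by-two action and the quadrature outputs can be illustrated with an idealized behavioral model. The sketch treats the two latches as level-sensitive and ideal: the first latch is transparent while the clock is high and takes the inverted output of the second latch, and the second latch is transparent while the clock is low and copies the first. This is a logic-level abstraction with one sample per clock half period, not a model of the low-voltage CML circuit in papers II and III.

```python
def divide_by_two(clock_levels):
    """Ideal two-latch divider; returns (I, Q) output levels for each clock level."""
    i_out, q_out = 0, 0
    outputs = []
    for clk in clock_levels:
        if clk:
            i_out = 1 - q_out   # latch 1 transparent, inverting feedback
        else:
            q_out = i_out       # latch 2 transparent
        outputs.append((i_out, q_out))
    return outputs


clock = [1, 0] * 4              # four clock periods, two samples per period
samples = divide_by_two(clock)
i_wave = [s[0] for s in samples]
q_wave = [s[1] for s in samples]
print("CLK:", clock)
print("I:  ", i_wave)
print("Q:  ", q_wave)
```

In this model both outputs repeat every two clock periods, and Q follows I with a delay of half a clock period, i.e. a quarter of the output period. Cascading four such stages corresponds to the division from 28 GHz to 1.75 GHz described above.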

6.2.2.3 Injection Locked Divider

An injection locked divider [45], outlined in Fig. 6.12, is an injection locked oscillator where the input signal, $V_i$, is at a harmonic of the oscillator free running frequency $\omega_0$. When the oscillator is locked, the output signal, $V_o$, will then be at a frequency $n$ times lower than that of the input, where $n$ is the harmonic. For the VCO in Fig. 6.12, the tail node, i.e. the emitters of devices $Q_1$, will have a dominant signal component at two times $\omega_0$ even without an input signal.

![Fig. 6.12. Injection locked divider schematic [45]](image)

Compared to Miller and CML dividers, injection-locked dividers can reach the highest operating frequencies [45]. The noise performance is also good due to the filtering in the resonance tank. An important drawback, however, is their narrow frequency lock range. This can be extended, at the expense of power consumption, by decreasing the Q-value of the resonance tank.

### 6.2.3 Phase detectors and charge pumps

The SiGe process used in this thesis did not include any CMOS devices. Therefore it was not possible to use a common phase-frequency detector (PFD) combined with a charge pump (CP). Commonly, the PFD is implemented in standard CMOS digital logic and used in conjunction with a CMOS charge pump, as outlined in Fig. 6.13. The charge pump sources current into, or sinks current from, a passive loop filter, depending on the phase and frequency difference between the PFD inputs [75]. Designing a charge pump using PNP and NPN devices for the source and sink current sources, respectively, was not feasible either, due to the low $f_T$ of PNP devices. With a significantly lower reference frequency than 1.75 GHz, using a larger divider ratio than 16, a traditional implementation of the CP would, however, have been possible. The PFD in Fig. 6.13 uses D flip-flops, responding to the rising clock edges of their input signals, $REF$ and $DIV$ [75]. Starting after both flip-flops have been reset, a rising edge on either the $REF$ or $DIV$ signal generates a 1 at the corresponding flip-flop output.

Fig. 6.13. Phase-frequency detector and charge pump

When the other clock signal then goes high, the other flip-flop output will also be set, i.e. equal to 1. Then with both flip-flops set, the AND gate resets both flip-flops to zero output [75]. The pulse length at either flip-flop output is therefore equal to the time difference between the rising edges of the two clocks. This is illustrated in Fig. 6.14 [97] showing how the pull-up (PU) and pull-down (PD) signals depend on the phase and frequency difference between the REF and DIV clocks.

When the PU signal goes high, the charge pump injects current into the loop filter, thereby increasing the voltage at its output, which is connected to the varactor of the VCO. A higher varactor control voltage increases the VCO frequency. When, on the other hand, the PD signal goes high, current is drawn from the loop filter and the VCO frequency is reduced. The low pass transfer function of the loop filter results in a VCO control voltage that depends on the average current from the CP [75].
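
The flip-flop and AND-gate behavior of the PFD can be sketched as a small event-driven model in Python. The function name `pfd_pulse_widths` is hypothetical, the reset is assumed to be instantaneous, and it is assumed that the flip-flop clocked by REF drives PU while the one clocked by DIV drives PD.

```python
def pfd_pulse_widths(ref_edges, div_edges):
    """Return the accumulated (PU, PD) pulse widths for lists of rising-edge times."""
    events = sorted([(t, "REF") for t in ref_edges] + [(t, "DIV") for t in div_edges])
    pu = pd = False
    pu_start = pd_start = 0.0
    pu_width = pd_width = 0.0
    for t, source in events:
        # A rising edge sets the corresponding flip-flop if it is not already set.
        if source == "REF" and not pu:
            pu, pu_start = True, t
        elif source == "DIV" and not pd:
            pd, pd_start = True, t
        # With both outputs high, the AND gate resets both flip-flops.
        if pu and pd:
            pu_width += t - pu_start
            pd_width += t - pd_start
            pu = pd = False
    return pu_width, pd_width


ref_leads = pfd_pulse_widths([0.0, 10.0, 20.0], [2.0, 12.0, 22.0])
div_leads = pfd_pulse_widths([2.0, 12.0, 22.0], [0.0, 10.0, 20.0])
print(ref_leads, div_leads)
```

When REF leads DIV by two time units in each cycle, only PU pulses are produced and their total width equals the accumulated edge time difference; swapping the inputs moves the pulses to PD.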

A bipolar PFD can be implemented with Emitter-Coupled Logic (ECL) as in [98]. However, since one of the targets of the thesis was to design a low supply voltage transmitter, using ECL was not an option. Therefore, as presented in papers II and III, an analog Gilbert type Phase Detector (PD) was designed, see Fig. 6.15. An advantage of this PD topology is that it has no dead zone [99] and therefore has high linearity for signals close to the output zero crossing. An important property of a PFD or a PD is the linear input range, i.e. the maximum phase difference range between the reference signal, $f_{ref}$, and the signal from the divider output, $f_{div}$, for which the phase detector provides a nearly proportional output. The PFD has a $4\pi$ linear range [100] compared to $\pi$ for the analog PD [83]. Apart from the smaller range, the PD also lacks the frequency discrimination property of the PFD, which helps the PLL to acquire lock. To acquire lock over a larger frequency range with the PD, a dedicated supply voltage was therefore used for an active loop filter, presented in papers II and III, succeeding the PD. In a practical application, as an acquisition aid, a circuit can be used that sweeps this supply voltage automatically, e.g. by charging or discharging a capacitor.
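
The $\pi$ linear range of a multiplying phase detector can be made plausible with a short numerical sketch. It is an idealization, not a simulation of the circuit in Fig. 6.15: both inputs are assumed to be hard-limited square waves of equal frequency, the detector output is taken as the average of their product over one period, and the function name `multiplier_pd_output` is hypothetical.

```python
import math


def multiplier_pd_output(phase_rad, n_samples=3600):
    """Average of the product of two +/-1 square waves with a given phase difference."""
    total = 0
    for k in range(n_samples):
        # Midpoint sampling avoids evaluating exactly at the square wave transitions.
        theta = 2.0 * math.pi * (k + 0.5) / n_samples
        ref = 1 if math.sin(theta) >= 0 else -1
        div = 1 if math.sin(theta + phase_rad) >= 0 else -1
        total += ref * div
    return total / n_samples


for deg in (0, 45, 90, 135, 180):
    print(deg, round(multiplier_pd_output(math.radians(deg)), 3))
```

Under these assumptions the normalized output falls approximately linearly from +1 at zero phase difference to -1 at $\pi$, crossing zero at $\pi/2$, and the characteristic then repeats with the opposite slope. The usable monotonic range is therefore $\pi$, and the output carries no information about which input has the higher frequency.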

![Gilbert type phase detector (PD)](image)

**Fig. 6.15.** Gilbert type phase detector (PD)

In paper II, the output voltages of the PD and the active loop filter were simulated versus phase difference between $f_{\text{div}}$ and $f_{\text{ref}}$, see Fig. 6.16. The PD had a gain of 0.55 mV/degree, which was amplified by the active filter to 8.6 mV/degree.

![PD and loop filter output voltage versus phase difference between $f_{\text{div}}$ and $f_{\text{ref}}$](image)

**Fig. 6.16.** PD and loop filter output voltage versus phase difference between $f_{\text{div}}$ and $f_{\text{ref}}$

### 6.2.4 Loop filter

The PLL loop filter can be either passive [75] or active [74]. A key PLL design parameter is the loop bandwidth [75], which determines the shape of the output noise spectrum and settling time. A passive loop filter is typically used together with the PFD and charge pump, where the charge pump output current is connected to node \( CP_{out} \) in Fig. 6.17. The node \( LP_{out} \) is the control voltage for the VCO.

![Passive loop filter](image)

**Fig. 6.17. Passive loop filter**

By selecting values for \( C_1, R_2 \) and \( C_2 \), the PLL stability and phase noise can be optimized [75]. With \( R_3 \) and \( C_3 \), an additional low pass link is formed, which can be used to further attenuate spurs from the reference clock [101]. The active loop filter [74] on the other hand is typically used together with the analog PD to increase the loop gain of the PLL, as described in papers II and III. The active loop filter of the presented PLL is given in Fig. 6.18.

![Active loop filter](image)

**Fig. 6.18. Active loop filter [74]**

The filter has a dedicated supply voltage, \( VCC_{LF} \), which is used to aid acquisition and to provide lock over a wider frequency range. Devices with a \( BV_{CEO} \) of 4 V were used as \( Q_1 \) and \( Q_2 \) to sustain the higher \( VCC_{LF} \) voltage. The locking range of the PLL presented in papers II and III is shown in Fig. 6.19. It was measured by altering the reference frequency, \( f_{ref} \), for fixed values of \( VCC_{LF} \). The locking range varies between 120 and 340 MHz, and the PLL can operate between 24.6 and 27.8 GHz, i.e. it has a total tuning range of 12 %.
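
The quoted tuning range can be checked with a few lines of Python. The sketch assumes that the relative tuning range is referred to the center of the operating band and that the divider ratio is 16, as stated for the four stage divider; the function names are hypothetical.

```python
def relative_tuning_range(f_low_hz, f_high_hz):
    """Tuning range relative to the center frequency of the band."""
    center = 0.5 * (f_low_hz + f_high_hz)
    return (f_high_hz - f_low_hz) / center


def reference_frequency(f_vco_hz, divider_ratio=16):
    """Reference frequency required for lock at a given VCO frequency."""
    return f_vco_hz / divider_ratio


tuning = relative_tuning_range(24.6e9, 27.8e9)
print(f"tuning range: {100 * tuning:.1f} %")
print(f"reference span: {reference_frequency(24.6e9) / 1e9:.4f} to "
      f"{reference_frequency(27.8e9) / 1e9:.4f} GHz")
```

With these assumptions the 3.2 GHz span around 26.2 GHz corresponds to roughly 12 %, in line with the figure given in the text.
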

## 6.3 Mixers

In the mixer block [102]-[104], the baseband signal is up converted to the transmit carrier frequency. Mixers can be either active [4], [88], [102], [104] or passive [4], [88], [103] and use a double or single balanced architecture [88]. Due to performance advantages, the double balanced architecture dominates in transceiver design. With bipolar technology, as in this thesis, an active mixer architecture is preferred. Passive mixers require low-ohmic switches, which are not easily designed using bipolar devices. In a BiCMOS technology, active mixers always use bipolar devices due to their lower 1/f noise and rapid current commutation. In a passive mixer, the DC current is zero, so the issue with excess 1/f noise is therefore eliminated [103]. At mm-wave frequencies, mixer up conversion is far from ideal. Parasitic capacitances will make the switching slow and increase both the distortion and mixer noise contribution. Compared to the single balanced version, the double balanced mixer suppresses LO feed-through as well as common mode $IM_2$ products from the transconductance stage [104]. Another advantage is that duty-cycle mismatch of the LO-signals does not result in either an increased LO feed through to the mixer output or an increased level of $IM_2$ products [104]. The double balanced mixer also suppresses noise from the LO driver [88].

In paper VI, a transmitter providing up conversion of a baseband I/Q signal to an E-band carrier at 84 GHz is presented. The transmit chain uses in total three mixers, two mixers in quadrature for up conversion of the baseband signal to a 28 GHz carrier, plus one mixer for up conversion of the 28 GHz signal to an 84 GHz carrier, using a 56 GHz LO signal. The schematic of the 56 GHz mixer is outlined in Fig. 6.20a. The four signals $RF_{p,L}$, $RF_{p,R}$, $RF_{n,L}$ and $RF_{n,R}$ are the output signals from the two quadrature 28 GHz mixers. For layout reasons, these signals are merged only after the AC-coupling capacitors, before the transconductance stage devices $Q_2$. The mixer must be driven with a sufficient 56 GHz LO amplitude, otherwise the conversion gain will be reduced and the third order nonlinearity due to the switching devices will increase. However, if the LO signal is large enough, the third order distortion is dominated by the transconductance stage. The degeneration resistor $R_1$, equal to 10 $\Omega$, increases the linearity of the mixer, without significantly degrading the voltage headroom.

![Up conversion mixer circuit](image)

**Fig. 6.20.** Up conversion mixer to 84 GHz carrier frequency, active implementation (a) and passive implementation (b)

For maximum power transfer to the PA, the imaginary part of the impedance in the interface between the mixer and PA must be cancelled. A single turn transformer with a mixer supply voltage center tap at the primary side, and a PA input stage bias center tap at the secondary side, was therefore designed. The up conversion mixer must handle a large signal at the output without compressing. A large bias current of 6.2 mA was therefore used in each device of the mixer core, which requires large size devices, thus increasing the parasitic output capacitance. This puts a limit on the inductance and then also on the size of the transformer. In Fig. 6.20b, a passive implementation of the mixer in paper VI is outlined for comparison. Compared to the active mixer, the passive mixer does not provide gain, thereby increasing the gain requirement on the PA or on the stages preceding the mixer.

## 6.4 Power Amplifiers

### 6.4.1 Introduction

The power amplifier (PA) is a critical block in the transmit chain. Since it often dominates the power consumption, it also determines the overall efficiency of the entire transmitter. The most important performance parameters of the PA are the power gain ($G_p$), the output compression point ($OCP_{1dB}$), the saturated output power ($P_{sat}$) and the output referred third order intercept point ($OIP_3$) [105]. One of the targets of this thesis has been to investigate the design of mm-wave transceiver blocks in SiGe that can operate with a supply voltage as low as 1.5 V. Compared to a design in CMOS technology [105]-[107], a design in SiGe usually requires a higher supply voltage [108]-[111]. For a circuit that should deliver high output power, e.g. a power amplifier (PA), the SiGe transistors have the advantage of combining high breakdown voltages with high $f_T$ [19], [112]. In contrast, for each new CMOS process generation, a higher $f_T$ is achieved at the cost of breakdown voltage reduction. For the parts of the transceiver that do not need to deliver high output power, however, a supply voltage reduction offers a great possibility to reduce the power consumption. A SiGe PA can actually be designed using a low supply voltage, as has been demonstrated in papers IV and V, by designing output transformers that perform both impedance transformation and power combination. However, SiGe mm-wave PA design with a supply voltage as low as 1.5 V poses several difficulties. Since the bias current of the output devices needs to be high, the size of the output devices must be large in order to fulfil the current density rules. Exceeding the current density limit would result in emitter crowding, and the device models would no longer be valid [46]. Unfortunately, the size of the parasitic capacitances scales with device size, which constitutes a difficulty for the output transformer design. For maximum power transfer, the imaginary part of the impedance in the interface between the output devices and the transformer should be cancelled. This is accomplished by adjusting the size of the transformer inductance. Large capacitive parasitics imply a small transformer inductance, i.e. a small inductor diameter. The losses of the transformer will then increase, thereby reducing the power added efficiency, PAE, of the power amplifier.
If high performance can be achieved for a low supply voltage SiGe PA, it would be possible to design a single-chip complete low voltage transmitter with an integrated PA. Such a high integration level architecture would be a very attractive solution for an E-band beam steering transmitter.

### 6.4.2 Linearity and efficiency

For a weak nonlinearity, where the third order nonlinearity dominates the gain compression up to the 1 dB compression point, equation (6.7) applies [113].

$$OIP_3 = OCP_{1dB} + 9.6 \, \text{dB} \quad (6.7)$$

Usually when performing mm-wave measurements, only one input signal source is available, preventing a two-tone measurement. Then (6.7) can be used for estimating the third order intercept point based on a compression point measurement. The power added efficiency (PAE) is defined in (6.8), where $P_{in}$ is the input signal power to the PA, $P_{out}$ is the power delivered to the load, and $P_{DC}$ is the total DC power consumption.

$$PAE = \frac{P_{out} - P_{in}}{P_{DC}} \quad (6.8)$$

The maximum achievable power added efficiency of the PA depends on what modulation is used for the signal. Early E-band systems used simple modulation schemes, such as OOK (on-off keying), BPSK (binary phase shift keying) and FSK (frequency shift keying). High efficiency could then be achieved at the expense of low spectral efficiency. Spectral efficiency, measured in bit/s/Hz, is defined as the net bitrate of a communication channel divided by the channel bandwidth. The net bitrate is defined as the useful data rate, excluding bits used for error correction coding. Modulation schemes with high spectral efficiency typically use M-QAM modulation [27]. The signal then has a non-constant envelope. The crest factor, $\xi$, for a certain modulation, is defined in (6.9), where $P_{\text{peak}}$ and $P_{\text{avg}}$ are the peak and average power, respectively [114].

$$\xi = 10 \log \frac{P_{\text{peak}}}{P_{\text{avg}}} \quad (6.9)$$

The crest factor increases with higher order QAM modulation. Using a modulation with a high crest factor, the PA cannot always operate at high power where the PAE is maximized, since the power would then be above the compression point for some of the transmitted symbols. In Fig. 6.21, the relation between output power, $P_{\text{out}}$, and input power, $P_{\text{in}}$, for a typical PA is illustrated. Also depicted is the third order output and input referred intercept point, $OIP_3$ and $IIP_3$ together with the PAE. As can be seen, the wanted signal and the third order intermodulation product have a slope of one and three, respectively. The saturated output power, $P_{\text{sat}}$, is at 16 dBm, while the $OCP_{1\text{dB}}$ is equal to 13 dBm, giving an $OIP_3$ of 22.6 dBm according to (6.7). The maximum PAE is equal to 13 %, which is reduced to 6.5 % at 10 dBm output power.
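
The trend that the crest factor grows with the QAM order can be illustrated by evaluating (6.9) directly on the constellation points. This is a simplified sketch: it assumes ideal square M-QAM with equally likely symbols and ignores pulse shaping, which raises the crest factor of a real transmit signal further. The function name `square_qam_crest_factor_db` is hypothetical.

```python
import math


def square_qam_crest_factor_db(order):
    """Peak-to-average power ratio in dB of an ideal square QAM constellation."""
    side = math.isqrt(order)
    if side * side != order:
        raise ValueError("order must be a perfect square, e.g. 16 or 64")
    # Amplitude levels per axis, e.g. -3, -1, 1, 3 for 16-QAM.
    levels = [2 * k - (side - 1) for k in range(side)]
    powers = [i * i + q * q for i in levels for q in levels]
    average = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / average)


for order in (4, 16, 64, 256):
    print(order, round(square_qam_crest_factor_db(order), 2), "dB")
```

Under these assumptions 4-QAM has a constant symbol power, while 16-QAM and 64-QAM have constellation crest factors of about 2.6 dB and 3.7 dB, respectively, which is the amount by which the average power has to stay below the peak power the PA can handle.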

![Fig. 6.21. Power amplifier compression, third order distortion and PAE](image)

The adjacent channel power ratio (ACPR) [114] is a measure of how much transmitted power leaks into the neighboring channels. At signal power peaks, the PA can be driven beyond its compression point, resulting in large intermodulation products. This results in strong spectral regrowth of the transmitted signal. If the spectral regrowth is too large, it will disturb communication in neighboring channels. This is illustrated in Fig. 6.22, where PA nonlinearities cause intermodulation products that fall inside the neighboring channels.

The ACPR is defined in equation (6.10) [114] as the ratio of the integrated unwanted power in the neighboring channel to the integrated power in the main channel.

$$ACPR = \frac{\int_{\text{adj. channel}} P(f) \, df}{\int_{\text{main channel}} P(f) \, df} \quad (6.10)$$

The ACPR is a measure of the effects of transmitter nonlinearity on neighboring channels, while the error vector magnitude (EVM) describes the effects on in-band signals [114].
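
As a minimal numerical sketch of (6.10), the two integrals can be approximated with a midpoint rule over a power spectral density. The spectrum used below is entirely hypothetical (a flat 250 MHz main channel with a regrowth shoulder 30 dB below it), and the function names `band_power` and `acpr_db` are introduced only for this illustration.

```python
import math


def band_power(psd, f_start, f_stop, n_steps=1000):
    """Midpoint-rule integral of psd(f) over [f_start, f_stop]."""
    df = (f_stop - f_start) / n_steps
    return df * sum(psd(f_start + (k + 0.5) * df) for k in range(n_steps))


def acpr_db(psd, main_band, adjacent_band):
    """Adjacent channel power ratio in dB according to (6.10)."""
    return 10.0 * math.log10(band_power(psd, *adjacent_band) / band_power(psd, *main_band))


def example_psd(f_mhz):
    # Hypothetical spectrum: flat main channel, shoulder 30 dB lower outside it.
    return 1.0 if abs(f_mhz) < 125.0 else 1e-3


acpr = acpr_db(example_psd, (-125.0, 125.0), (125.0, 375.0))
print(round(acpr, 2), "dB")
```

For measured data, `example_psd` would be replaced by an interpolation of the spectrum analyzer trace in linear power units.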

As the input power is increased, the intermodulation (IM) product power increases rapidly, until gain compression occurs. The higher level of IM products degrades the ACPR, and the transmitted symbols with the highest amplitude will be amplified less than the symbols with low amplitude. The constellation diagram of the transmitted signal will therefore become distorted and the BER of the receiver will increase. The simplest way to linearize the PA is to use the so-called back-off linearization technique. From Fig. 6.21, a 10 dB decrease in output power results in a 20 dB decrease of the third order intermodulation product relative to the wanted signal, i.e. in dBc. However, the PAE of the power amplifier will be significantly degraded.
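
The back-off arithmetic follows from the slopes of one and three in Fig. 6.21 and can be sketched as below. The sketch uses (6.7) as stated in the text together with the example value $OCP_{1dB}$ = 13 dBm, and it assumes the ideal small-signal extrapolation of the third order product, which is only approximate close to compression. The function names are hypothetical.

```python
def oip3_from_ocp1db(ocp1db_dbm):
    """Estimate of OIP3 from the output compression point, equation (6.7) as stated."""
    return ocp1db_dbm + 9.6


def im3_level_dbm(p_out_dbm, oip3_dbm):
    """Third order product level, extrapolated with a slope of three."""
    return 3.0 * p_out_dbm - 2.0 * oip3_dbm


def imd3_dbc(p_out_dbm, oip3_dbm):
    """Third order product relative to the wanted signal."""
    return im3_level_dbm(p_out_dbm, oip3_dbm) - p_out_dbm


oip3 = oip3_from_ocp1db(13.0)
for p_out in (13.0, 3.0):
    print(p_out, round(im3_level_dbm(p_out, oip3), 1), round(imd3_dbc(p_out, oip3), 1))
```

Backing off by 10 dB lowers the extrapolated third order product by 30 dB in absolute terms and therefore by 20 dB relative to the wanted signal.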

Another way to linearize the PA is to use predistortion in the transmit chain, thereby mitigating unwanted effects of PA nonlinearities. In baseband predistortion, as outlined in Fig. 6.23, the inverse of the power amplifier transfer function is implemented in the baseband before upconversion. If the PA exhibits gain compression, gain expansion is introduced in the predistorter [114].

Predistortion can also be implemented elsewhere in the transmit chain. In RF predistortion, the predistortion is applied just before the PA. A further way to linearize the PA is to use the so-called feedforward linearization technique [114], [115]. A typical feedforward architecture is outlined in Fig. 6.24 [114].

The input signal, in this case consisting of two signals at frequencies $f_1$ and $f_2$, is split into two paths. One path goes to the main PA and the other path is delayed and compared with the attenuated output of the main PA. At the main PA output, the third order nonlinearity of the PA will generate distortion products at $2f_2-f_1$ and $2f_1-f_2$. If delay 1 and $1/A_0$ match the delay and gain of the main PA, only the distortion products are amplified by the error PA. The output signal from the error PA is then subtracted from the output signal of the main PA. If the gain $A_0$ matches the gain of the PA, and delay 2 matches the delay of the error PA, only the wanted signals at $f_1$ and $f_2$ are left at the output. The distortion of the error PA also remains, but as it processes a small signal compared to the main PA, it can be made much more linear. The main drawbacks of the feedforward linearization technique are that the PAE of the PA is reduced due to the power consumption of the error PA, the need for matched delays, and the signal combination needed at the output. In [115] a 24 GHz PA was linearized using this technique. The output power level with at least -40 dBc IMD₃ was increased by 3.7 dB to 7.9 dBm. With the feedforward linearization active, the measured ACPR was improved by 10 dB below 7 dBm output power.
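
The cancellation principle can be sketched with a memoryless model in Python. All values are hypothetical: the main PA is assumed to be a cubic nonlinearity with linear gain `A0` and third order coefficient `C3`, the error PA is assumed to be perfectly linear, and the delays are ignored since the model has no memory.

```python
A0 = 10.0   # hypothetical linear voltage gain of the main PA
C3 = -2.0   # hypothetical third order coefficient (gain compression)


def main_pa(x):
    """Memoryless main PA with a compressive third order term."""
    return A0 * x + C3 * x ** 3


def feedforward_output(x, error_pa_gain=A0):
    """Main PA output minus the amplified error signal."""
    y = main_pa(x)
    # Attenuated PA output minus the input leaves only the distortion.
    error = y / A0 - x
    return y - error_pa_gain * error


x = 0.3
ideal = A0 * x
matched = feedforward_output(x)
mismatched = feedforward_output(x, error_pa_gain=0.9 * A0)
print(main_pa(x), matched, mismatched)
```

With a perfectly matched error path the output equals the ideal linear response, while a 10 % gain error in the error PA leaves a residual distortion that is ten times smaller than without linearization. This indicates why gain and delay matching are critical in practice.
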

### 6.4.3 Power amplifier transformer design

Transformers are key components in designing high performance power amplifiers. Especially at the PA output, where the signal power is high, any excess transformer loss will deteriorate the PAE of the power amplifier. When designing a multistage power amplifier, the design work should therefore start with the output stage plus the output transformer that is usually connected to a 50 Ω load. For minimum power loss, the reactive impedance in the interface between the output of the last active stage and the input of the output transformer should be zero. The output transformer, common for the 2-stage PAs described in paper V and the 3-stage PA in paper VI, is given in Fig. 6.25. The transformer has an inner diameter of 24 μm and a trace width of 5.6 μm. From Fig. 6.25b the loss at 84 GHz equals 1.08 dB.

![Output transformer without power combination layout](a) ![Insertion loss](b)

Fig. 6.25. Output transformer without power combination layout (a) and insertion loss (b)

It is also desired to maximize the PA output power, and the active devices in the output stage must then be made large in order to sustain high DC-currents. Large active devices, however, result in large parasitic capacitances. With too large a capacitance, the required inductance of the transformer will be very small, resulting in increased transformer loss, reducing the PA output power and efficiency. Careful optimization of the size of the active devices in the output stage and the dimensions of the output stage transformer is therefore necessary when designing a mm-wave PA. Due to the design rules of the b7hf200 process, all transformers in this thesis have octagonal shape. Comparing ADS Momentum simulation results for octagonal shaped and circular inductors, the differences at E-band frequencies are negligible.
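
The trade-off between device size and transformer inductance can be made concrete with the resonance condition $\omega^2 L C = 1$. The sketch below assumes a simple parallel resonance between the transformer inductance and the total parasitic capacitance; the capacitance values are hypothetical and are not taken from the designs in this thesis.

```python
import math


def resonating_inductance(c_farad, f_hz):
    """Inductance that cancels a capacitance c_farad at frequency f_hz."""
    omega = 2.0 * math.pi * f_hz
    return 1.0 / (omega ** 2 * c_farad)


for c_ff in (50.0, 100.0, 200.0):
    l_ph = resonating_inductance(c_ff * 1e-15, 84e9) * 1e12
    print(f"{c_ff:.0f} fF -> {l_ph:.1f} pH")
```

At 84 GHz a capacitance of 100 fF already requires an inductance of only about 36 pH, and doubling the capacitance halves the inductance again, which illustrates why large output devices force very small transformer dimensions.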


### 6.4.4 Two and three-stage power amplifier design

To increase the gain of the PA, several stages are commonly cascaded into a multistage PA. Cascading several stages, however, always reduces the PAE. If devices with enough gain were available in a future technology, a PA with the highest PAE would only have one single stage. In paper IV, simulation and measurement results are compared for two SiGe two-stage E-band power amplifiers. The architecture of both designs is outlined in Fig. 6.26. Both stage 1 and stage 2 have the supply voltage connected through a center tap on the primary side of the transformers. As described in paper IV, the two designs use a transformer based interstage matching with up transformation of the second stage input impedance to increase the power gain. The active devices operate close to the process maximum $f_T$ and therefore the device gain is limited. To mitigate the gain reduction due to the base-collector capacitance, the first design uses a conventional cascode topology [40]. The second design instead uses a capacitive-cross coupling technique [50]-[52]. At 84 GHz, the current gain only equals 1.9 times for the driver stage devices in the cascode design.

![Fig. 6.26. Two-stage power amplifier architecture](image)

The amplifier stage schematics of the cascode and cross-coupled designs are shown in Fig. 6.27a and Fig. 6.27b, respectively. To reduce the effect of process spread, the cross-coupling capacitances are implemented with diode connected transistors of the same type as the input devices.

![Fig. 6.27. PA stage schematics for the cascode (a) and cross-coupled design (b)](image)

To reduce the parasitic inductance from the emitters of the $Q_1$ devices to signal ground, the CEBEC device contact configuration was used. This configuration also reduces the parasitic collector resistance, $R_c$.

A Momentum view of the three transformers in the cascode design is shown in Fig. 6.28. A co-simulation of all three transformers is essential, since undesired interaction between the transformers can alter the behavior of the PA. Since the width of the output devices is large compared to the diameter of the inductors, the interconnects in the Cu2 (red) layer to the center of the collector terminal were also included in the Momentum simulations. The isolation versus frequency between the first and second transformer, i.e. between the nodes $In_{1p}/In_{1n}$ and $Out_{1p}/Out_{1n}$, is illustrated in Fig. 6.29. At 84 GHz the isolation equals 20.9 dB.

![Momentum model for the transformers in the cascode design](image1)

**Fig. 6.28.** Momentum model for the transformers in the cascode design

![Coupling between first and second transformer](image2)

**Fig. 6.29.** Coupling between first and second transformer

The chip photo of the cascode design is shown in Fig. 6.30a. The majority of the chip area is used for on-chip decoupling. A photo of the active part of the cross-coupled PA is provided in Fig. 6.30b. This photo also outlines the interconnect wires of the cross-coupled devices.

For paper VI, a three-stage PA was designed using the capacitive cross-coupling technique. A three-stage architecture was necessary to increase the gain, thereby relaxing the output power requirements on the TX mixer. The three-stage PA obtains a simulated gain of 21 dB at 84 GHz. The maximum PAE equals 17 %, while it is reduced to 12 % at the compression point.

### 6.4.5 Transformers and power combination

Power splitting and power combination using transformers are widely used in mm-wave power amplifiers [116], [117]. Using power combination, a PA can achieve high output power even with significantly reduced supply voltages. Since the active devices have limited breakdown voltages, the signal voltages must be reduced within the design to prevent damage to the devices. This is achieved by transforming the input and output impedances to lower values, resulting in lower signal voltages for a given power level. The power level in the amplifier stages can also be reduced by splitting the signal and using several stages operating in parallel, each amplifying a lower power signal. Then the signals are combined to a single output. Key components in power combining PAs are the transformers used for splitting and combining the signal. It is essential to minimize the losses in the power combining output transformer. Ideally, without losses, a power combining PA with two branches should achieve 3 dB higher output power compared to its non-combining counterpart. The losses of the output transformer will, however, reduce this benefit. Compared to losses at the input, losses at the output are more critical since the signal power is higher there. Even with careful design, the loss of a transformer in the SiGe technology used in this thesis will be approximately 1 dB at 84 GHz. Two different ways of performing power combination using transformers have been analyzed. All power combining transformers use the two thick top Cu3 (green) and Cu4 (yellow) metal layers to minimize losses. In paper V, a transformer tree, given in Fig. 6.31, transforms the 50 Ω load into a differential load of 25 Ω for each output. All three transformers were implemented with an octagonal shape with an inner radius of 18 μm and a trace width of 6 μm.
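
The two effects described above, the ideal 3 dB combining gain and the reduced voltage swing at a lower load impedance, can be sketched numerically. The branch power of 13 dBm and the function names are hypothetical; the 1.3 dB combiner loss, the 16 dBm output level and the 50 Ω and 25 Ω loads are used as example values only, and sinusoidal signals are assumed for the voltage calculation.

```python
import math


def dbm_to_watt(p_dbm):
    return 1e-3 * 10 ** (p_dbm / 10)


def watt_to_dbm(p_w):
    return 10 * math.log10(p_w / 1e-3)


def combined_output_dbm(branch_dbm, n_branches, combiner_loss_db):
    """Output power of n equal, in-phase branches after a lossy combiner."""
    return watt_to_dbm(n_branches * dbm_to_watt(branch_dbm)) - combiner_loss_db


def peak_voltage(p_w, r_load):
    """Peak sinusoidal voltage for an average power p_w into r_load."""
    return math.sqrt(2 * p_w * r_load)


print(round(combined_output_dbm(13.0, 2, 0.0), 2), "dBm, lossless combiner")
print(round(combined_output_dbm(13.0, 2, 1.3), 2), "dBm, 1.3 dB combiner loss")

p_total = dbm_to_watt(16.0)
v_single = peak_voltage(p_total, 50.0)        # one stage driving 50 ohm
v_branch = peak_voltage(p_total / 2, 25.0)    # each of two stages driving 25 ohm
print(round(v_single, 2), "V versus", round(v_branch, 2), "V")
```

In this idealized picture, delivering half the power into half the load resistance halves the required peak voltage per output stage, which is what makes a high output power compatible with a low supply voltage.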

The advantage of the tree structure is that it inherently achieves a good phase and amplitude balance at the four terminals connected to the two amplifier output stages. As three individual transformers are required in the structure, however, the losses are high, making this structure less suitable as an output combiner. In paper V, a power combiner using two stacked transformers, given in Fig. 6.32, achieving the same function as the transformer tree in Fig. 6.31, is therefore used.

To minimize the loss at 84 GHz the transformer is tuned using a capacitor, $C_{\text{gnd}}$. A center tap on the primary side is used to connect the supply voltage to the active devices. The benefit of using only two transformers is the lower loss, which in paper IV was simulated to 1.3 dB at 84 GHz. The drawback is that the phase and amplitude at the four interfaces to the active devices are no longer inherently balanced. This can, however, be mitigated by tuning of the two capacitors \( C_{t1} \) and \( C_{t2} \).

Compared to the cross-coupled architecture presented in papers IV and V, a power combining [116], [117] PA architecture, shown in Fig. 6.33, also using the cross-coupling technique, can achieve a higher \( OCP_{1dB} \) and \( P_{sat} \). A power combining PA was designed according to Fig. 6.33, with an optimum PAE for 1.5 V supply voltage. At the input, the RF input signal is split using two transformers in parallel. At the output, a stacked transformer, presented in paper IV, is used to provide a differential load impedance of 25 \( \Omega \) for each of the two output stages. The voltage swing at the output of stage 2 is thereby reduced, allowing for a higher output power without compression in the voltage domain. As can be seen in Fig. 6.32 and Fig. 6.33, the stacked transformer requires three tuning capacitors to optimize the performance. The PA was biased with a total current of 109 mA.

![Fig. 6.33. Architecture of the power combining PA design](image)

A layout of the PA is shown in Fig. 6.34. Excluding GSG pads, the design occupies an area of 0.033 mm\(^2\). The GSG pads at the output are offset to minimize the routing from the stacked transformer. For decoupling, both MIM capacitors and MOM capacitors are used.

The simulated small signal parameters are shown in Fig. 6.35. As can be seen, the PA achieves a maximum gain of 15.9 dB at 82 GHz. The minimum stability factor is equal to 3.3, i.e. the design is unconditionally stable.

The large signal performance, presented in Fig. 6.36, was simulated for a parasitic extracted view with a PSS analysis. As expected, the design achieves a higher \(OCP_{1\text{dB}}\) and \(P_{\text{sat}}\) compared to the design presented in paper IV. While the design in paper IV achieves an \(OCP_{1\text{dB}}\) and \(P_{\text{sat}}\) equal to 9.0 dBm and 14.7 dBm, respectively, the power combining PA achieves 12.6 dBm and 16.0 dBm.

Fig. 6.36. Simulated Pout and PAE for the power combining PA design

However, due to additional losses in the power combiner at the output, the peak PAE is reduced compared to the design presented in paper V. The cross-coupled design in paper V achieves a peak PAE of 16.3 %, while the power combining PA has a peak PAE of 13.6 %. At the compression point, however, both the cross-coupled PA and the power combining PA have a PAE of 9.0 %.

Another popular technique is to make use of a Wilkinson combiner [45], [118]. A Wilkinson power combiner uses quarter-wavelength transformers [45], [118] to merge two signals with equal amplitude and phase from parallel PA chains. The advantage of Wilkinson combiners is that they can be designed with low loss. However, since quarter-wavelength transformers are used throughout the design, these combiners tend to occupy a large die area, even at E-band frequencies, compared to power splitters/combiners using transformers.
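
The area argument can be illustrated by estimating the physical length of a quarter-wavelength line at 84 GHz. The effective permittivity of 3.9 used below is an assumption, roughly representative of an oxide-based on-chip dielectric, and is not a value taken from the process used in this thesis; the function name is hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def quarter_wavelength_m(f_hz, eps_eff):
    """Physical length of a quarter-wavelength line for a given effective permittivity."""
    return C0 / (4.0 * f_hz * math.sqrt(eps_eff))


length_um = quarter_wavelength_m(84e9, eps_eff=3.9) * 1e6
print(f"quarter wavelength at 84 GHz: {length_um:.0f} um")
```

Under this assumption each quarter-wavelength section is several hundred micrometers long, which can be compared with the transformer inner diameters of a few tens of micrometers quoted earlier in this chapter.
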
CHAPTER 7

7 Cellular integrated receivers

## 7.1 Background

The number of users of mobile terminals has increased dramatically, resulting in congestion in the frequency bands. Therefore an increasing number of frequency bands have been allocated for mobile communication. For LTE FDD [119]-[121], receive frequency bands between 700 MHz and 2700 MHz are defined, while LTE TDD [119], [120], [122], [123] operates in bands between 1800 MHz and 3800 MHz. While the receiver and transmitter in the first chipsets for E-GSM [121] only covered a single frequency band, the transceivers of today’s chipsets are multiband. Multiband support, however, results in an increased complexity of the RF front end, both on-chip and on the PCB. By using novel front end architectures, the increased complexity can be mitigated without sacrificing receiver performance. One way of reducing the complexity is to use single ended LNAs and mixers, which is the topic of papers VII and VIII.

A typical multiband direct conversion receiver (DCR) [88], [124] frontend is shown in Fig. 7.1. For simplicity, only two LNAs in the primary and diversity receiver paths are outlined. The RF signal from the antenna is down converted by the mixer to quadrature baseband signals, which are low-pass filtered by the baseband filter. The analog to digital converter (ADC) converts the analog signal into a bit stream for the digital part of the receiver. When support for antenna diversity [125] is implemented, the receiver contains two receiver chains connected to separate antennas, as shown in Fig. 7.1. With two receivers connected to different antennas, the degradation in receiver performance due to fading dips of the received signal strength is strongly reduced.

Fig. 7.1. Multiband DCR with primary and diversity receiver chains [126]

If the radio system is operating in FDD, e.g. for WCDMA or FDD LTE [119]-[121], each RF input, except the diversity inputs, needs a dedicated duplexer [127] for the supported frequency band. The duplexer is used to isolate the receiver from the transmitted signal from the power amplifier (PA). Otherwise this strong signal would saturate the receiver. The duplexer also blocks noise at the receive frequencies, and attenuates out of band interference. It is typically implemented by a pair of acoustic wave filters (SAW or BAW). If the RF input is instead used for a TDD [119], [120], [122], [123] system, where the receiver and transmitter are not active simultaneously, like E-GSM, a SAW filter can be used to attenuate the out of band interferers [128]. A single-ended duplexer or SAW filter usually has 50 \( \Omega \) input and output impedance. Each LNA normally needs off-chip components to match that impedance, which adds both cost and PCB area. A single ended LNA architecture with programmable on-chip matching is therefore presented in paper VII. The single ended input both reduces the number of pins of the chip and makes routing of the RF signals on the PCB easier.

The DCR is the dominating architecture in cellular chipsets. The elimination of the IF-filter, present in the superheterodyne receiver [88], is the main advantage, but there are also performance issues that must be mitigated, especially in FDD radio systems. The DCR is sensitive to second order distortion. In FDD radio systems like WCDMA and LTE, the transmitter (TX) and receiver (RX) are active at the same time. Quadrature Amplitude Modulation (QAM) is used for both the TX and RX signals, i.e. the signals contain both amplitude and phase modulation. Due to finite duplexer isolation, a small part of the TX signal leaks into the receiver, where the amplitude part of the modulation together with second order distortion in the LNA and mixer will generate an in-band interferer. Since the distortion product is in-band, it cannot be removed by filtering. It therefore causes the bit-error-rate (BER) [129] of the receiver to increase, especially at high TX power levels. This is an important problem, since when the terminal is transmitting with full power, it is far away from the base station (BS), resulting in a weak RX signal which is vulnerable to interference. In paper VIII, a single-ended mixer with a feedback loop that suppresses second order distortion is therefore presented. The design was made in a 0.25 μm BiCMOS process with an $f_T/f_{MAX}$ of 40/90 GHz. A second drawback with the DCR is that it is sensitive to low frequency flicker noise, which can be high in CMOS processes. A third drawback is the impact of DC-offsets on the receiver performance [124]. The most important design parameters for front end designs in general are noise figure, linearity, gain, and power consumption. In the thesis, two single ended front end designs are presented. High performance in the LNA and mixer relaxes the requirements on the succeeding analog baseband filters and ADCs.

In the terminals of today, other wireless systems like WLAN [121], [130], Bluetooth [121] and Global Positioning System (GPS) are almost always integrated together with the cellular radio. This can result in coexistence issues [24], due to low isolation, typically between the cellular antenna and the antennas of the other systems. The poor isolation, in the range of 10-15 dB, results in large interferers that leak into the cellular receiver, reducing its performance. In paper VII, a programmable multiband LNA is therefore presented that also addresses a coexistence issue with WLAN. The design was made in a 90 nm CMOS process.

### 7.2 Cellular receiver requirements

Transceivers designed for wireless communication standards like GSM/EDGE [131], 3G, and LTE use different modulation schemes for transmitting and receiving information. For each standard, there are specific requirements on the signal-to-noise ratio, SNR, to receive a radio signal with a certain bit-error-rate (BER) [129]. One way to increase the data rate of the radio link is to increase the number of points in the constellation diagram, e.g. transmitting with 64-QAM instead of 16-QAM. In that case, however, a low BER requires a higher receiver SNR. The requirements on the individual blocks of the receiver, i.e. the LNA, mixer, baseband filter, and ADC, depend on which radio architecture has been selected. In today’s cellular receivers, the direct conversion architecture is the most commonly used. Therefore, all requirements in this chapter are analyzed with regard to this architecture.
7.2.1 The direct conversion architecture

In early chipsets developed for cellular radio communication, the superheterodyne receiver architecture [88] was common. With the rapid growth of the handset industry, research efforts were targeted towards investigating other architectures that could provide sufficient performance, but with a lower current consumption and at a lower cost. With smaller terminals, the PCB area occupied by the radio also became an important parameter. The superheterodyne radio requires an external image reject filter [88] together with a second down conversion mixer. If the local oscillator frequency, $f_{LO}$, is located at a distance $f_{IF}$ above the center of the RF signal, $f_{RF}$, an interfering signal at the image frequency, $f_{IMAGE} = f_{LO} + f_{IF}$, will be down converted to the same frequency as the wanted signal. The suppression of the image frequency, i.e. the image rejection ratio, is important for all architectures with a non-zero IF, like low-IF [88] and superheterodyne receivers. The direct conversion (zero-IF) receiver architecture [124], requiring no IF-filter, was therefore advantageous. The high IF in a superheterodyne receiver also requires an off-chip IF filter, whereas the low-pass channel filters in a direct conversion receiver can be integrated on-chip. Receiver architectures are typically compared using parameters such as current consumption, sensitivity, image rejection [88], blocker tolerance [120], and the required number of external components.
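The image relation above can be illustrated with a short calculation. This is a minimal sketch; the function name and the example frequencies (a 942.5 MHz carrier and a 70 MHz IF) are illustrative assumptions and are not taken from the thesis.

```python
def image_frequency_hz(f_rf_hz: float, f_if_hz: float, high_side_lo: bool = True) -> float:
    """Return the image frequency of a single-conversion receiver.

    With high-side LO injection, f_LO = f_RF + f_IF and the image is at
    f_LO + f_IF. With low-side injection the image is at f_LO - f_IF.
    """
    if high_side_lo:
        f_lo_hz = f_rf_hz + f_if_hz
        return f_lo_hz + f_if_hz
    f_lo_hz = f_rf_hz - f_if_hz
    return f_lo_hz - f_if_hz


# Hypothetical example values, chosen only for illustration.
f_rf = 942.5e6
f_if = 70e6
print(image_frequency_hz(f_rf, f_if) / 1e6, "MHz")

# For a zero-IF receiver the image coincides with the wanted channel itself.
print(image_frequency_hz(f_rf, 0.0) == f_rf)
```

The second call shows why the direct conversion receiver has no separate image band: with $f_{IF} = 0$ the image frequency equals $f_{RF}$.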

The direct conversion receiver (DCR) architecture [124] is depicted in Fig. 7.2. In contrast to the superheterodyne receiver, the RF input signal is down converted to baseband using a single mixer stage. Quadrature mixers are used so that no information about the modulated signal is lost when it is converted to zero IF. The architecture is also known as the homodyne or zero-IF receiver.

The DCR architecture is the dominating choice for today’s cellular chipsets. In the DCR, the center of the channel is down converted to DC. There is thus no image frequency that can interfere with the wanted signal [124]. This is one of the main advantages with the DCR. A second advantage is the simplified baseband filtering, realized with only a low-pass filter [124]. There are four major issues of the DCR that must be mitigated during design to avoid degradation of the receiver performance. These are: LO-leakage to the RF input, mixer output DC offset, second order distortion, and 1/f noise [124].

Fig. 7.2. Architecture of the zero-IF receiver [126]

1. **LO-leakage to the RF input**

Leakage of the LO-signal to the RF input can cause three problems: LO radiation by the antenna [124], DC offset at the mixer output [124] and cross modulation of the LO-leakage with the TX-leakage [132]-[139]. In the DCR, the mixer LO signal frequency is equal to $f_{RF}$, so the leakage lies in the passband of the RF filter and will therefore reach the antenna substantially unattenuated. There are regulations [120] limiting the maximum allowed power of the LO frequency and its harmonics at the antenna, to reduce interference for other nearby terminals [124]. Within the terminal itself, the LO-leakage will be mixed with itself and down converted to a DC voltage at the mixer output [124], which can saturate the baseband filter and ADC. In paper VIII, the feedback loop around the mixer core reduces the LO-leakage due to mismatch in the switching stage of the active mixer. The cross modulation will be elaborated in section 7.2.5.2.

2. **Mixer output DC offset**

Eliminating the mixer output DC-offset by AC-coupling is not possible since modulation schemes used for high speed communication, e.g. 64 QAM [129], contain information at frequencies close to DC. Instead offset cancellation techniques [124] can be utilized. The DC-offset is then continuously measured and averaged over time and subtracted from the output signal. The feedback loop around the mixer core in paper VIII also reduces the DC-offset.

3. **Second order distortion sensitivity**

Second order nonlinearities in the DCR can cause performance issues in TDD as well as in FDD systems. For E-GSM, the 3GPP specification states that the receiver sensitivity is tested with a simultaneously present AM-modulated interferer at 6 MHz offset from the received signal. In FDD systems the AM-modulation of the TX-leakage at the RF input creates second order distortion via self-mixing [124], second order nonlinearities in the active mixer transconductance stage, and second order nonlinearity in the switching mixer core devices [140].

4. **Mixer 1/f noise sensitivity**

In the DCR, the RF signal is down converted to DC, and the receiver is therefore sensitive to 1/f noise [124]. Comparing active mixers in bipolar and CMOS technology, the 1/f noise of a switching core using bipolar devices is much lower, which is why the passive mixer topology is attractive in CMOS technology. A bipolar switching core with low 1/f noise is used in the active mixer core presented in paper VIII.
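The offset cancellation described under item 2 above (measuring the DC level, averaging it over time, and subtracting it from the output) can be sketched as follows. This is a minimal sketch assuming a first-order running average applied to digitized samples; the function name and the smoothing factor `alpha` are illustrative assumptions, not the implementation used in the thesis or in [124].

```python
def remove_dc_offset(samples, alpha=0.01):
    """Subtract a running estimate of the DC level from each sample.

    alpha is a hypothetical smoothing factor (0 < alpha <= 1). A small
    alpha gives a long averaging time, so that signal content close to
    DC is disturbed as little as possible.
    """
    estimate = 0.0
    corrected = []
    for sample in samples:
        # Exponentially weighted running average of the input.
        estimate += alpha * (sample - estimate)
        corrected.append(sample - estimate)
    return corrected


# A constant 0.5 offset is tracked and removed after the settling time.
out = remove_dc_offset([0.5] * 2000)
print(abs(out[-1]) < 1e-6)
```

The averaging time is a trade-off: it must be long compared to the symbol time so that low frequency signal content is preserved, yet short enough to track slow changes of the offset.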

7.2.2 Sensitivity and noise figure

The receiver sensitivity [88] is defined as the minimum input radio signal level that gives a certain bit-error-rate (BER). A commercial GSM receiver today has a sensitivity better than -110 dBm. In a cellular communication system a high sensitivity is desired, since it means that each cell that is served by a BS can be made larger. The number of cells can then be reduced with maintained coverage, resulting in large savings in network deployment costs. The noise figure of the receiver is determined by parameters of both the RFIC and the external discrete components. Most important are the noise figure (NF) of the complete RX chain, the insertion losses of the antenna switch RX path, and the losses of external filters, i.e. the E-GSM SAW filters or the duplexers for FDD systems. The receiver noise figure, the channel bandwidth, and the required carrier to noise and interference ratio, \( C/(N+I) \), for the detector, given a certain carrier modulation and BER, determine the sensitivity in scenarios with low interference. A high gain in the LNA and mixer is advantageous for the total NF, but the linearity can be compromised, resulting in an overall sensitivity reduction in scenarios with strong interference. Typically, receivers of today have an NF in the range of 2.5 to 3 dB.

A limit for the minimum input signal that can be detected by the receiver is set by the thermal noise of the resistance of the signal source (antenna). The available noise power \( P_{NA} \) is given by (7.1) where \( k \) is Boltzmann’s constant (1.38e-23 J/K), \( T \) is the absolute temperature in Kelvin, and \( \Delta f \) is the noise bandwidth in Hz.

\[
P_{NA} = kT\Delta f
\]  
(7.1)

At \( T = T_0 = 290K \), \( P_{NA} \) with \( \Delta f = 1 \text{Hz} \) equals 4.00e-21 W or -174 dBm. The noise factor, \( F \), (7.2) of the receiver is defined by the SNR at the receiver input and output [88].

\[
F = \frac{\text{SNR}_{in}}{\text{SNR}_{out}} = \frac{P_{\text{sig}}}{P_{R_s}} \frac{1}{\text{SNR}_{out}}
\]  
(7.2)

\( P_{\text{sig}} \) is the power of received signal per unit bandwidth, and \( P_{R_s} \) is the noise power of the source per unit bandwidth, i.e. -174 dBm/Hz in logarithmic scale. Integrating over a bandwidth \( \Delta f = B \) gives the signal power \( P_{\text{sig BW}} \) in (7.3)

\[
P_{\text{sig BW}} = P_{R_s} \cdot F \cdot \text{SNR}_{out} \cdot B
\]  
(7.3)

Using equation (7.3) together with the minimum required SNR value for the detector, \( \text{SNR}_{\text{min dB}} \) [88], gives the RX sensitivity, \( P_{\text{sens}} \) in (7.4)

\[
P_{\text{sens}} = P_{R_s \_dBm/Hz} + NF + \text{SNR}_{\text{min dB}} + 10\log(B)
\]  
(7.4)

where \( NF \) is the noise figure.

Using typical values for E-GSM, i.e. NF=3.5 dB including SAW filter and antenna switch, \( \text{SNR}_{\text{min dB}} \) equal to 10 dB, and \( B \) equal to 135 kHz, results in a sensitivity of -109 dBm which is about the performance of a receiver today. Note that the noise figure of the DCR is the double sideband noise figure, \( NF_{DSB} \). Using the single sideband noise figure definition, \( NF_{SSB} \), would give a 3 dB too high value [88].
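The E-GSM example can be reproduced from (7.1) and (7.4). The sketch below is illustrative only; the function names are assumptions, and the numerical inputs are the ones quoted in the text.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann's constant k


def noise_floor_dbm_per_hz(temperature_k: float = 290.0) -> float:
    """Available thermal noise power kT in dBm/Hz, from (7.1) with a 1 Hz bandwidth."""
    return 10.0 * math.log10(BOLTZMANN_J_PER_K * temperature_k / 1e-3)


def sensitivity_dbm(nf_db: float, snr_min_db: float, bandwidth_hz: float,
                    temperature_k: float = 290.0) -> float:
    """Receiver sensitivity according to (7.4)."""
    return (noise_floor_dbm_per_hz(temperature_k)
            + nf_db + snr_min_db + 10.0 * math.log10(bandwidth_hz))


print(f"Noise floor: {noise_floor_dbm_per_hz():.1f} dBm/Hz")
print(f"E-GSM sensitivity: {sensitivity_dbm(3.5, 10.0, 135e3):.1f} dBm")
```

Each dB of added front-end loss or noise figure translates directly into one dB of lost sensitivity, which is why the insertion losses of the antenna switch and the external filters are included in the NF value used here.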

If the mixer is driven by a square wave LO signal, the frequency spectrum contains odd harmonics. In the mixer down conversion, noise and interference at harmonics of \( f_{LO} \) are then also down converted to baseband frequency [4], thus resulting in an increased receiver NF. The mixer in paper VIII therefore utilizes a tunable low-pass filter in its transconductance amplifier to suppress signals at \( 3f_{LO} \) and above.
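To give a feeling for how strongly the LO harmonics down convert, the sketch below evaluates the Fourier series of a square wave. It assumes an ideal 50 % duty cycle square wave, which is an idealization not stated in the text; the function name is illustrative.

```python
import math


def square_wave_harmonic_db(n: int) -> float:
    """Level of harmonic n relative to the fundamental, in dB.

    For an ideal 50 % duty cycle square wave the Fourier coefficients of
    the odd harmonics fall off as 1/n, and the even harmonics vanish.
    """
    if n < 1:
        raise ValueError("harmonic index must be >= 1")
    if n % 2 == 0:
        return float("-inf")
    return 20.0 * math.log10(1.0 / n)


for n in (1, 3, 5, 7):
    print(n, round(square_wave_harmonic_db(n), 2))
```

Under this assumption the conversion gain at $3f_{LO}$ is only about 9.5 dB below that at $f_{LO}$, so filtering ahead of the switching core is needed to suppress noise and interferers at that frequency.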

### 7.2.3 Intermodulation

The receiver nonlinearities can cause in-band intermodulation distortion from interfering signals that are located outside the channel bandwidth. The interferers can be signals transmitted from the BS to other cellular terminals, from the own terminal, or from nearby terminals. Other radio communications systems, e.g. WLAN, active in the same area may also interfere with the own radio system.

#### 7.2.3.1 Second order intermodulation

The AM modulation part of an interfering signal can be represented as two tones with closely spaced frequencies, \( f_{TX1} \) and \( f_{TX2} \). For 3G or LTE, the largest interferer for the RX-part is the own TX leakage signal, due to finite isolation of the duplexer. The maximum allowed RX \( IM_2 \) level from TX-leakage can be derived from the 3GPP specification, stating the minimum RX sensitivity for full TX power, i.e. with 24 dBm at the antenna port [120]. The scenario is illustrated in Fig. 7.3, with second order in-band distortion generated at a frequency \( f_{IM2} = f_{TX1} - f_{TX2} \) caused by a modulated interferer at the duplex distance, \( f_{duplex} \) [132].

Fig. 7.3. Second order distortion with AM modulated TX-leakage [126]

If the front-end has an input signal \( x(t) \) and an output signal \( y(t) \), and a nonlinear transfer function with second and third order nonlinearities, (7.5) applies.

\[
y(t) = a_1 x(t) + a_2 x^2(t) + a_3 x^3(t)
\]

(7.5)

For an input signal with two tones at \( \omega_1 \) and \( \omega_2 \), with amplitude \( A \), \( x(t) \) becomes (7.6)

\[
x(t) = A \cos(\omega_1 t) + A \cos(\omega_2 t)
\]

(7.6)
The nonlinearity of (7.5) will then generate second order distortion products, \( y_2(t) \) given by (7.7) at frequencies \( \omega_1 + \omega_2, \omega_1 - \omega_2, 2\omega_1, 2\omega_2, \) and DC [132], [133].

\[
y_2(t) = a_2A^2\left[ 1 + \frac{1}{2}\cos(2\omega_1 t) + \frac{1}{2}\cos(2\omega_2 t) + \cos((\omega_1 + \omega_2)t) + \cos((\omega_1 - \omega_2)t) \right]
\] (7.7)
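The coefficients in (7.7) can be checked numerically. The sketch below squares a sampled two-tone signal and projects the result onto cosines at the expected product frequencies. It is a minimal sketch; the tone frequencies (5 and 3 cycles per record), the record length, and the function names are arbitrary illustrative choices.

```python
import math


def cosine_amplitude(samples, k):
    """Amplitude of the cos(2*pi*k*i/N) component of a sampled signal."""
    n = len(samples)
    if k == 0:
        return sum(samples) / n
    return 2.0 / n * sum(s * math.cos(2.0 * math.pi * k * i / n)
                         for i, s in enumerate(samples))


def second_order_products(a2, amp, f1=5, f2=3, n=64):
    """Amplitudes of the products of a2*x(t)^2 for the two-tone input (7.6)."""
    x = [amp * math.cos(2.0 * math.pi * f1 * i / n)
         + amp * math.cos(2.0 * math.pi * f2 * i / n) for i in range(n)]
    y2 = [a2 * v * v for v in x]
    bins = {"dc": 0, "f1-f2": f1 - f2, "f1+f2": f1 + f2, "2f1": 2 * f1, "2f2": 2 * f2}
    return {name: cosine_amplitude(y2, k) for name, k in bins.items()}


# With a2 = 0.1 and A = 2, the factor a2*A^2 equals 0.4.
for name, value in second_order_products(0.1, 2.0).items():
    print(name, round(value, 6))
```

The DC, sum and difference terms come out at $a_2A^2$ and the second harmonics at half that value, in agreement with (7.7). For the DCR the DC and difference frequency terms are the critical ones, since they fall at baseband.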

The second order distortion scenario is illustrated in the intercept point diagram in Fig. 7.4, drawn for an amplifier with 0 dB gain. Because of the 0 dB gain, the level of second order distortion at the output, \( P_{o,IM^2} \), is identical to the corresponding level at the input, \( P_{i,IM^2} \).

Fig. 7.4. Second order intercept point diagram

For two interferers, each of power \( P \), the intermodulation product at \( \omega_1 - \omega_2 \) referred to the input, \( P_{i,IM^2} \), is related to the second order input intercept point (in log-scale), \( IIP_2 \), by (7.8) [132], [133].

\[
P_{i,IM^2} = 2P - IIP_2
\] (7.8)
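Relation (7.8) can be used in both directions: to predict the $IM_2$ level for a given $IIP_2$, or to find the $IIP_2$ needed to keep the $IM_2$ product below a target level. The sketch below is illustrative; the function names and the example powers are assumptions and not values from the thesis.

```python
def im2_input_referred_dbm(p_tone_dbm: float, iip2_dbm: float) -> float:
    """Input referred IM2 level for two tones of equal power, from (7.8)."""
    return 2.0 * p_tone_dbm - iip2_dbm


def required_iip2_dbm(p_tone_dbm: float, max_im2_dbm: float) -> float:
    """IIP2 needed to keep the input referred IM2 at or below max_im2_dbm."""
    return 2.0 * p_tone_dbm - max_im2_dbm


# Hypothetical example: two -30 dBm tones and an IIP2 of +50 dBm.
print(im2_input_referred_dbm(-30.0, 50.0), "dBm")

# IIP2 needed to keep the IM2 product below -110 dBm for the same tones.
print(required_iip2_dbm(-30.0, -110.0), "dBm")
```

Since the product scales with $2P$, every dB of additional TX-leakage raises the $IM_2$ level by 2 dB, which makes the duplexer isolation a strong lever on the required $IIP_2$.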

#### 7.2.3.2 Third order intermodulation

Given a two-tone input signal as in (7.9) with amplitudes \( A_1 \) and \( A_2 \), the third order nonlinearity of (7.5) will create \( IM_3 \) products given by (7.10) [133]. This interferer scenario with third order intermodulation distortion is shown in Fig. 7.5.
Fig. 7.5. Third order distortion with interferers of different powers [126]

\[ x(t) = A_1 \cos(\omega_1 t) + A_2 \cos(\omega_2 t) \] (7.9)

Third order intermodulation products are created at \(2f_1-f_2\), \(2f_2-f_1\), \(2f_1+f_2\), and \(2f_2+f_1\). The two lowest frequency products are given by (7.10).

\[ y_3(t) = \frac{3}{4} a_3 A_1^2 A_2 \cos((2\omega_1 - \omega_2)t) + \frac{3}{4} a_3 A_2^2 A_1 \cos((2\omega_2 - \omega_1)t) + ... \] (7.10)

For the IM3-product at \(2\omega_1-\omega_2\) (7.11) applies in log-scale [133].

\[ P_{IM3} = 2P_1 + P_2 - 2IIP_3 \] (7.11)

7.2.4 Compression and desensitization

Compression and desensitization [126] of a receiver occur when the power of the total input signal is high enough that the receiver is close to clipping. High power interferers can be present at the input together with the wanted signal. With increasing input power, the receiver gain is reduced and the receiver nonlinearities increase. If the gain of the front-end is decreased, the high noise of the baseband filter will have a larger impact on the overall NF, so called receiver desensitization. The input compression point, \(ICP_{1dB}\), is defined as the input power where the gain is reduced by 1 dB. There are, however, two different compression scenarios. In the first scenario, the wanted signal is too strong and compresses the receiver, i.e. the terminal is close to the base station (BS). In 3G, the maximum power of the wanted signal is -25 dBm [120]. The gain in both the LNA and the baseband filter is therefore commonly programmable to avoid compression and keep the SNR high enough not to increase the BER. In paper VIII a gain switch is implemented in the mixer transconductance amplifier. In paper VII, the LNA power consumption is reduced by reducing its bias current when the wanted signal power is high. In the second scenario, an out of band signal, or a signal close to the received channel, compresses the receiver, so called cross compression. The signal to receive can then be weak, making this scenario challenging. Receiver desensitization [88] will occur if the effects of the interferer are not mitigated.

In E-GSM the largest out of band interferer specified is at 0 dBm [120]. It is, however, strongly suppressed by the SAW filter. In 3G and LTE it is reduced to -15 dBm [120], [133] and attenuated by the duplexer to around -45 dBm [133]. The instantaneous interferer power, however, can be higher due to the crest factor [114]. In the interferer scenario that defines the GSM receiver cross compression point, there is a blocking signal at $\Delta f=3$ MHz from the carrier. The input compression point, $ICP_{1dB}$, for this interferer should be at least -23 dBm [120], [126]. In FDD systems, the required $ICP_{1dB}$ is determined by the own TX signal that leaks through the duplexer. Low frequency $IM_2$-products will then cause desensitization of the receiver. The topology of the active mixer presented in paper VIII can therefore be altered depending on the TX power. At low TX power, the transconductance stage is DC coupled to the mixer switching core, while AC-coupling is used for high TX powers, thereby suppressing the low frequency $IM_2$ products from the transconductance stage. A commercial receiver typically has 0.5 to 1 dB desensitization at maximum TX power [127]. Receiver desensitization is also caused by down conversion of TX noise at the receiver LO frequency, and down conversion of TX signal by RX LO phase noise, so called reciprocal mixing [88]. Harmonic mixing, i.e. down conversion of interferers at $n\cdot f_{LO}$, can also degrade the receiver sensitivity. One example of such so called coexistence issues [24] is where a WLAN TX signal [130] is down converted by the third harmonic of the cellular LO frequency [141]. One way to mitigate this is to use a combination of 8-phase mixers driven by square-wave LO signals [142], which mimics a sinusoidal LO. Another is to use narrow band LNAs, as shown in paper VII. The problems with receiver nonlinearities become more pronounced when off-chip filters are removed, since this increases the out-of-band interference. Several works have nonetheless been presented on the topic of receivers without SAW filter in GSM/GPRS/EDGE [143]-[149], and without duplexers in 3G/LTE [127], [150].
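The way a combination of phase shifted square-wave LO signals mimics a sinusoidal LO can be illustrated numerically. The sketch below assumes the commonly used harmonic rejection weighting of $1 : \sqrt{2} : 1$ for square waves at -45°, 0° and +45°; this weighting and the function names are assumptions for illustration, since the text only states that 8-phase mixers are combined [142].

```python
import math


def hr_harmonic_gain(n, weights=(1.0, math.sqrt(2.0), 1.0),
                     phases_deg=(-45.0, 0.0, 45.0)):
    """Effective amplitude of LO harmonic n for a weighted sum of square waves.

    Each ideal square wave contributes 1/n at odd harmonic n, rotated by n
    times its phase shift. The phase set must be symmetric around zero so
    that the sine terms cancel and only the cosine terms remain.
    """
    if n % 2 == 0:
        return 0.0
    return sum(w * math.cos(math.radians(n * p))
               for w, p in zip(weights, phases_deg)) / n


def relative_harmonic_gain(n):
    """Harmonic gain normalized to the fundamental."""
    return hr_harmonic_gain(n) / hr_harmonic_gain(1)


for n in (1, 3, 5, 7, 9):
    print(n, round(abs(relative_harmonic_gain(n)), 6))
```

Under these assumptions the third and fifth harmonics cancel, while the seventh and ninth remain at their square wave levels of 1/7 and 1/9. In a real circuit the cancellation is limited by gain and phase mismatch between the paths.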

7.2.5 Second and third order distortion in 3G and GSM/GPRS/EDGE systems

7.2.5.1 Second and third order intermodulation in 3G FDD systems

For 3G FDD systems, there are in-band and out-of-band interferer scenarios that require a certain second order receiver linearity. The sensitivity to second order distortion is an important drawback of the zero-IF receiver [124]. The in-band requirement is set by the in-band blocking test, for which the receiver must handle a modulated blocking signal at either 10 MHz or 15 MHz offset from the wanted signal. The interferer together with receiver second order nonlinearities creates in-band $IM_2$. The minimum receiver $IIP_2$ is, however, set by an out-of-band requirement, where the interferer is the own TX-leakage. The receiver $IM_2$ level due to TX-leakage is tested in a 3GPP test case [120] that specifies the minimum sensitivity with maximum TX signal at the antenna port, i.e. +24 dBm. There are three mechanisms that can create second order distortion products from an AM-modulated TX-leakage in the DCR [124], [132], [133].

1. *RF self-mixing* [88] when modulated RF signal leaks to the LO port.

2. *Second order nonlinearity in the mixer transconductance stage* [104], [140].

3. *Cross modulation of the LO-leakage* [132]-[139]

For third order nonlinearity, the limiting case for the front-end is when an interferer is present at half the duplex distance between the RX and TX frequency [120], [133] as shown in Fig. 7.6 for a TX-leakage into the LNA of power $P_1$ and a half duplex interferer of power $P_2$. The third order nonlinearity then generates an $IM_3$ product at the RX frequency.

Fig. 7.6. Third order intermodulation product from TX-leakage and half duplex interferer [126]

The power of the $IM_3$ product referred to the LNA input, $P_{in,IM_3}$, is given by (7.12) [126].

$$P_{in,IM_3}(dBm) = 2P_2(dBm) + P_1(dBm) - 2IIP_3(dBm) \quad (7.12)$$
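Expression (7.12) can be rearranged to give the $IIP_3$ needed for a target $IM_3$ level. The sketch below is illustrative only; the function names and the example powers (a TX-leakage of -25 dBm, a half duplex interferer of -45 dBm after the duplexer, and a target $IM_3$ level of -110 dBm) are assumptions chosen to show the arithmetic.

```python
def half_duplex_im3_dbm(p_tx_leak_dbm: float, p_half_duplex_dbm: float,
                        iip3_dbm: float) -> float:
    """Input referred IM3 at the RX frequency, from (7.12).

    p_tx_leak_dbm is P1 (TX-leakage) and p_half_duplex_dbm is P2
    (interferer at half the duplex distance).
    """
    return 2.0 * p_half_duplex_dbm + p_tx_leak_dbm - 2.0 * iip3_dbm


def required_iip3_dbm(p_tx_leak_dbm: float, p_half_duplex_dbm: float,
                      max_im3_dbm: float) -> float:
    """IIP3 needed to keep the IM3 product at or below max_im3_dbm."""
    return (2.0 * p_half_duplex_dbm + p_tx_leak_dbm - max_im3_dbm) / 2.0


# Hypothetical example values.
print(required_iip3_dbm(-25.0, -45.0, -110.0), "dBm")
print(half_duplex_im3_dbm(-25.0, -45.0, -2.5), "dBm")
```

The half duplex interferer enters with a factor of two, so additional duplexer attenuation at that frequency relaxes the $IIP_3$ requirement dB for dB.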

The in-band $IIP_3$ requirement is set by the 3GPP adjacent channel selectivity (ACS) test case [120], with two blocking signals at 3.5 MHz and 5.9 MHz offset, respectively, from the wanted signal.

**7.2.5.2 Cross modulation of LO-leakage in 3G FDD systems**

Cross modulation [132]-[139] takes place when the AM-modulation of an interferer is transferred to a simultaneously present unmodulated signal. In relation to the power of the modulated interferer, the cross modulation looks like a second order effect, but it is the system third order nonlinearity that creates the transfer. Consider an RX input signal, $x(t)$, which is the sum of an un-modulated interferer, $x_1(t)$, with amplitude $A_1$, and the AM-modulated TX-leakage with amplitude $A_2$ [133], as given by (7.13).

\[ x(t) = A_1 \cos(\omega_1 t) + A_2 [1 + m(t)] \cos(\omega_{TX} t) \]  \hspace{1cm} (7.13)

In (7.13) \( m(t) \) is the amplitude modulation of the TX-leakage. Inserting (7.13) into (7.5) gives an output cross modulation product [133] given by (7.14).

\[ y_{\text{crossmod}}(t) = \frac{3}{2} a_3 A_1 A_2^2 (1 + m(t))^2 \cos(\omega_1 t) \] \hspace{1cm} (7.14)
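The factor $\frac{3}{2} a_3 A_1 A_2^2$ in (7.14) can be verified numerically by cubing a sampled two-tone signal and measuring the component at $\omega_1$. The sketch below assumes a static envelope, i.e. $m(t)$ is held constant so that the TX-leakage amplitude is a fixed value $A_2(1+m)$; the tone frequencies, record length and function names are arbitrary illustrative choices.

```python
import math


def amplitude_at(samples, k):
    """Amplitude of the cos(2*pi*k*i/N) component of a sampled signal."""
    n = len(samples)
    return 2.0 / n * sum(s * math.cos(2.0 * math.pi * k * i / n)
                         for i, s in enumerate(samples))


def cross_mod_amplitude(a3, cw_amp, tx_envelope, f1=7, f2=3, n=64):
    """Part of the a3*x^3 output at f1 that is caused by the tone at f2.

    cw_amp is A1 and tx_envelope is A2*(1 + m) for a fixed value of m.
    The self-compression term (3/4)*a3*A1^3 is removed by subtracting
    the result obtained without the second tone.
    """
    def third_order_at_f1(envelope):
        y3 = [a3 * (cw_amp * math.cos(2.0 * math.pi * f1 * i / n)
                    + envelope * math.cos(2.0 * math.pi * f2 * i / n)) ** 3
              for i in range(n)]
        return amplitude_at(y3, f1)

    return third_order_at_f1(tx_envelope) - third_order_at_f1(0.0)


a3, a1_amp, envelope = 0.2, 0.1, 0.5
print(round(cross_mod_amplitude(a3, a1_amp, envelope), 6))
print(round(1.5 * a3 * a1_amp * envelope ** 2, 6))
```

Both printed values agree, confirming that the transferred modulation scales with the square of the TX-leakage envelope.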

The interferer at frequency \( \omega_1 \) has become amplitude modulated by the square of the amplitude of the TX-leakage, \( A_2 \), as depicted in Fig. 7.7.

Fig. 7.7. Cross modulation of a CW interferer by AM-modulated TX leakage [126]

As can be seen, the bandwidth of the cross modulated signal is two times that of the AM modulated interferer [134]. If a continuous wave (CW) signal is close to the wanted signal, cross modulation from an AM-modulated interferer can thus cause in-band interference. Using (7.5), the input referred cross modulation product, \( x_{\text{cross mod}}(t) \), can be written as (7.15), where \( \text{IIP}_3 \) is the input referred third order intercept point [133].

\[ x_{\text{cross mod}}(t) = \frac{y_{\text{crossmod}}(t)}{a_1} = \frac{2 A_1 A_2^2 (1 + m(t))^2}{\text{IIP}_3^2} \cos(\omega_1 t) \] \hspace{1cm} (7.15)

To summarize, the cross modulation product is linearly proportional to the interferer power and proportional to the square of the TX-leakage power. It is inversely proportional to the square of the \( \text{IIP}_3 \). Converting (7.15) to log-scale using power units at the input [134], [139] results in (7.16).

\[ P_{i,\text{crossmod}} \approx 6 + P_1 (\text{dBm}) + 2[P_2 (\text{dBm}) - \text{IIP}_3 (\text{dBm})] \] \hspace{1cm} (7.16)

The interferer power that overlaps the desired channel will be down converted together with the desired channel. Distortion generated by second order nonlinearities is common-mode, i.e. device mismatch is required to generate a differential $IM_2$ product. In contrast, distortion generated through cross modulation is added to the desired RF channel before down conversion. It will therefore appear as a differential signal at the mixer output. To be more accurate, the expression (7.16) must be modified to include a correction factor [133], [139] depending on the frequency distance between the interferer and the wanted signal, plus the TX modulation type. The expression is given in (7.17) with a correction factor including the 6 dB in (7.16).

$$P_{i\text{,crossmod}} \approx C_{\text{factor}} + P_1(dBm) + 2[P_2(dBm) - IIP_3(dBm)]$$  \hspace{1cm} (7.17)
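A small helper makes it easy to see how (7.16) and (7.17) scale. This is a sketch under stated assumptions: the function name and the example powers are illustrative, and the default correction factor of 6 dB corresponds to (7.16), since the actual value of $C_{factor}$ depends on the frequency offset and TX modulation as described in [133], [139].

```python
def cross_modulation_dbm(p_cw_dbm: float, p_tx_leak_dbm: float,
                         iip3_dbm: float, c_factor_db: float = 6.0) -> float:
    """Input referred cross modulation power according to (7.17).

    p_cw_dbm is P1 (the unmodulated interferer, e.g. LO-leakage) and
    p_tx_leak_dbm is P2 (the AM-modulated TX-leakage).
    """
    return c_factor_db + p_cw_dbm + 2.0 * (p_tx_leak_dbm - iip3_dbm)


# Hypothetical example: -60 dBm LO-leakage, -25 dBm TX-leakage, 0 dBm IIP3.
print(cross_modulation_dbm(-60.0, -25.0, 0.0), "dBm")
```

A 1 dB improvement in $IIP_3$ lowers the cross modulation product by 2 dB, whereas a 1 dB reduction of the CW interferer lowers it by only 1 dB.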

In a zero-IF receiver, the LO leakage, i.e. the CW signal in Fig. 7.7, is at the same frequency as the center of the received RF channel. There is always a certain level of LO-leakage present at the LNA input, acting as an interferer. A large part of the cross modulation power will then become an in-band interferer. In paper VIII, the feedback loop around the mixer counteracts mixer core mismatch, and therefore also cross modulation due to LO leakage. Since the mixer LO signal is a square wave and contains odd harmonics of $f_{LO}$, the LO-leakage at these frequencies will also be cross modulated.

### 7.2.5.3 Linearity requirements in E-GSM/GPRS/EDGE systems

In the GSM system, the $IIP_2$ requirement is determined by a test case with an AM-modulated interferer 6 MHz from the wanted carrier with a power of -31 dBm [120], while the wanted signal is at -99 dBm. In a multimode receiver supporting both GSM and 3G, however, this is not the requirement that determines the minimum required $IIP_2$, which is instead limited by the maximum allowed $IM_2$ products due to TX-leakage. For E-GSM, the $IIP_3$ requirement is set by a test case with one CW interferer at 800 kHz offset from the wanted signal together with a modulated interferer at 1600 kHz offset [120]. In a multimode terminal for both GSM and 3G, however, the limiting requirement for third order nonlinearity is set by the half duplex interferer scenario [133] in 3G mode.
CHAPTER 8

8 LNA architectures

8.1 Introduction

The purpose of the LNA is to improve the noise figure of the complete receiver and to relax the noise requirements of the succeeding blocks, i.e. the mixer, baseband amplifier, and ADC [126]. If a narrow band topology is used, the LNA can also provide selectivity, i.e. it can attenuate unwanted interfering signals. Today, cellular terminals typically also support other communication systems, such as WLAN, that can interfere with the cellular radio. For instance, a WLAN TX signal at 5.8 GHz, leaking into the cellular LNA, will in the mixer be down converted by the third harmonic of the LO signal [141], thereby causing in-band interference when receiving at 1933 MHz. A narrow band LNA, as described in paper VII, will attenuate this interferer before it reaches the mixer. LNAs with wide band input matching, e.g. resistive feedback [151], [152] and common-gate [153]-[157] LNAs, have low selectivity, but on the other hand ideally do not require any external matching components. For high RX sensitivity, the LNA should have low noise figure in combination with high gain [126]. In an FDD system, the compression point of the receiver must also be high enough in order not to compress on the own TX signal that leaks through the duplexer. In a TDD system like E-GSM, the compression point is instead determined by a blocking signal at 3 MHz distance from the desired signal. The third order linearity should be high enough not to create in-band intermodulation products that fail the test cases specified. The current consumption of the LNA is dependent on the compression point requirement. In paper VII, the current consumption can therefore be reduced when the terminal is close to the base station, i.e. for the case when the own TX signal is weak [126]. In Fig. 7.1, the TX signal can leak through the finite isolation of the antenna switch to the input of an unused LNA for certain frequency bands where the TX frequency overlaps the RX frequency of the turned off LNA. This leakage path could then potentially be larger than the path through the duplexer. This issue is addressed in paper VII, presenting a multiband LNA with high isolation between the LNA inputs [126].
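The WLAN example above can be generalized into a simple check of which LO harmonics would down convert a given interferer into the received channel. The sketch below is illustrative; the function name, the 5 MHz channel bandwidth and the restriction to odd harmonics (as for an ideal square-wave LO) are assumptions.

```python
def downconverting_harmonics(f_interferer_hz: float, f_lo_hz: float,
                             channel_bw_hz: float, max_harmonic: int = 7):
    """Return the odd LO harmonics n that place the interferer in-band.

    In a zero-IF receiver the channel is centered at DC after mixing, so
    the interferer is in-band when |f_interferer - n*f_LO| <= BW/2.
    """
    hits = []
    for n in range(1, max_harmonic + 1, 2):
        if abs(f_interferer_hz - n * f_lo_hz) <= channel_bw_hz / 2.0:
            hits.append(n)
    return hits


# WLAN TX at 5.8 GHz while receiving at 1933.3 MHz (5 MHz channel assumed).
print(downconverting_harmonics(5.8e9, 1933.3e6, 5e6))

# The same interferer is harmless when receiving at 2140 MHz.
print(downconverting_harmonics(5.8e9, 2140e6, 5e6))
```

The check only identifies the frequency coincidence. The resulting interference level additionally depends on the LNA selectivity at the interferer frequency and on the mixer conversion gain at the harmonic.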

Integrated RF transceiver circuits contain an increasing digital part. It is not uncommon even to include a microprocessor, used for circuit calibration. In these system on chip architectures, digital signals, e.g. harmonics of the digital clock frequency, can leak to the LNA input through the supply lines, electromagnetic coupling, or through the substrate [158]-[162], and become in-band interferers. The two single ended architectures, presented in papers VII and VIII, are sensitive to interfering signals at the ground, supply and substrate nodes. A differential LNA is commonly used in cellular receivers since it has built-in rejection of common mode interferers in the supply, ground and substrate, providing increased design robustness. Compared to differential architectures, single ended architectures are more sensitive to substrate noise and interference and require a more careful design that is robust to substrate interference [162]. Substrate noise is caused by different sources [158]-[162]. Depending on how the noise couples to the substrate, the noise sources can be either internal or external [159]. The internal noise is created without interaction with the package, e.g. when digital gates are switched [161], and the noise is coupled through the parasitic capacitances of the active devices, wells and interconnects. The external noise is created when the noise path from the digital to the analog part of the die goes through the power domain network. The current in the switching gates introduces noise on the supply lines. This noise is then coupled to the analog part through substrate contacts connected to the same supply and ground [158], [159]. External noise can be dominating, but can be reduced by having small package inductances together with decoupling capacitors on the supply network [158]. As the number of supported frequency bands increases, and support for e.g. Wi-Fi, Bluetooth and GPS nowadays is mandatory, the single ended architecture becomes attractive due to the reduced number of RF input pins and smaller package size. Interferers in the supply, ground and substrate are only suppressed in the differential LNA, resulting in increased requirements on on-chip and off-chip decoupling as well as package impedance for the single ended architecture.
Low impedance paths through the package to ground are important for isolation, e.g. for the effectiveness of shunting an undesired signal to ground. The performance of the multiband LNA presented in paper VII depends on a low impedance path to ground for increasing the isolation between different RF inputs [126]. Supply connections with low package impedances make it feasible to use single ended architectures that would otherwise have been hard to implement due to poor PSRR.

### 8.2 LNA architectures in CMOS technology

Compared to the common-gate (CG) topology [153]-[157], the common-source (CS) LNA with inductive degeneration [163]-[167] is the topology that can achieve the lowest noise figure (NF) [126], [163]. The CS LNA, however, has narrow band input matching and needs a matching inductor in series with the gate. The multiband LNA described in paper VII uses a CS topology with an on-chip inductor. For low $\omega_0/\omega_T$ ratios of the MOS input device, the NF of the CG LNA is much higher than that of the CS architecture [154]. Compared to the CS architecture, however, the CG LNA offers higher linearity and wider input matching [154], [163], [164]. With matched input, $g_mR_i$ equal to 1, the CG noise factor $F$ is given by (8.1) [154], with $\gamma$ and $\alpha$ being process- and bias-dependent parameters [154], [164]. The minimum value of $\gamma/\alpha$ is equal to 2/3, corresponding to $\text{NF} = 2.2$ dB, but in practice it is difficult to reach an NF below 3 dB.

$$F = 1 + \frac{\gamma}{\alpha} \quad (8.1)$$
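As a quick numerical check of (8.1), the sketch below converts the noise factor to a noise figure in dB. The function name and the short-channel value of $\gamma/\alpha$ are illustrative assumptions and are not taken from the cited references.

```python
import math

def cg_noise_figure_db(gamma_over_alpha: float) -> float:
    """Noise figure in dB of a matched CG LNA, using F = 1 + gamma/alpha from (8.1)."""
    noise_factor = 1.0 + gamma_over_alpha
    return 10.0 * math.log10(noise_factor)

# Long-channel limit quoted in the text: gamma/alpha = 2/3
nf_long_channel = cg_noise_figure_db(2.0 / 3.0)

# Hypothetical short-channel value, chosen only to illustrate the trend
nf_short_channel = cg_noise_figure_db(1.3)

print(f"gamma/alpha = 2/3: NF = {nf_long_channel:.2f} dB")
print(f"gamma/alpha = 1.3: NF = {nf_short_channel:.2f} dB")
```

With $\gamma/\alpha = 2/3$ the result is about 2.2 dB, in line with the value stated above. Any larger ratio pushes the NF above that limit.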

The architectures of the CS LNA, CG LNA and shunt feedback LNA are shown in Fig. 8.1.

![Fig. 8.1. Architecture of CS LNA with inductive degeneration (a), CG LNA (b) and shunt feedback LNA (c) [126]](image)

The resistive shunt feedback CMOS LNA has a wide-band input impedance [4], [151] given by (8.2), where $g_m$ is the transconductance of the input device.

$$Z_{in} = \frac{1}{g_m} \quad (8.2)$$

A wideband input is advantageous in some systems, such as ultra-wide band (UWB) receivers. Cellular multi-band receivers on the other hand need either a duplexer (FDD systems) or a SAW-filter (TDD systems) before each RF input. An LNA with wideband input matching could replace multiple narrow-band LNAs, but due to the band specific external filters, RF switches are then needed to connect the wide-band LNA to each filter or duplexer. Still the wide bandwidth would provide increased flexibility in which bands to support. An advantage of the resistive feedback LNA is that it does not require any matching components. If noise cancellation [151] is combined with this topology, the noise performance can be very attractive.

The source inductor, $L_s$, of the inductively degenerated CS CMOS LNA shown in Fig. 8.1, creates the real part of the input impedance, $Z_{in}$ [167]. From small signal analysis, the current through $L_s$ is given in (8.3), where $i_{in}$ is the input current at the gate.

$$i_L = i_{in} + g_m (v_g - v_s) = i_{in} + g_m i_{in} \frac{1}{s C_{gs}} \quad (8.3)$$

The voltage $v_{L_s}$ across the source inductor is then given by (8.4).

$$v_{L_s} = s L_s i_L = s L_s i_{in} + \frac{g_m L_s}{C_{gs}} i_{in} \quad (8.4)$$

The input impedance \( Z_{in} \) [167] is defined by (8.5) and now includes a real part.

\[
Z_{in} = \frac{v_{in}}{i_{in}} = \frac{v_{L_s} + v_{C_{gs}} + v_{L_g}}{i_{in}} = \frac{g_m L_s}{C_{gs}} + s L_s + s L_g + \frac{1}{s C_{gs}}
\]  

(8.5)

The real part of the impedance is created by generating a current proportional to \( \frac{1}{s} \) (the channel current in the MOS device) and injecting it into an impedance proportional to \( s \), i.e. the source inductor \( L_s \). Substituting \( s \) by \( j \omega \) gives \( Z_{in}(\omega) \) in (8.6).

\[
Z_{in} = \frac{g_m}{C_{gs}} L_s + j \left[ \omega (L_s + L_g) - \frac{1}{\omega C_{gs}} \right]
\]  

(8.6)

In a real design, ESD-diodes, routing and package add both inductive and capacitive parasitics, making the expression for \( Z_{in} \) more complex. At least one more component, besides the series inductor \( L_g \), is therefore often needed to match the LNA to the source impedance.
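The two matching conditions implied by (8.6) can be sketched numerically as follows: the real part $g_m L_s / C_{gs}$ is set equal to the source resistance, and $L_g$ is chosen so that the imaginary part vanishes at the wanted frequency. The device values ($g_m$ = 40 mS, $C_{gs}$ = 0.4 pF) and the function names are hypothetical, and the parasitics mentioned above are ignored.

```python
import math

def size_cs_input_match(f0_hz, r_source, g_m, c_gs):
    """Return (L_s, L_g) that make Z_in in (8.6) real and equal to r_source at f0_hz."""
    omega_0 = 2.0 * math.pi * f0_hz
    l_s = r_source * c_gs / g_m              # real part: g_m * L_s / C_gs = R_s
    l_total = 1.0 / (omega_0 ** 2 * c_gs)    # imaginary part zero: (L_s + L_g) resonates with C_gs
    return l_s, l_total - l_s

def z_in(f_hz, g_m, c_gs, l_s, l_g):
    """Input impedance of the ideal inductively degenerated CS stage, as in (8.6)."""
    omega = 2.0 * math.pi * f_hz
    return complex(g_m * l_s / c_gs, omega * (l_s + l_g) - 1.0 / (omega * c_gs))

# Hypothetical example values, not taken from paper VII
F0 = 2.0e9      # Hz
R_S = 50.0      # ohm
G_M = 40e-3     # S
C_GS = 0.4e-12  # F

L_S, L_G = size_cs_input_match(F0, R_S, G_M, C_GS)
Z_AT_F0 = z_in(F0, G_M, C_GS, L_S, L_G)

print(f"L_s = {L_S * 1e9:.2f} nH, L_g = {L_G * 1e9:.2f} nH")
print(f"Z_in(f0) = {Z_AT_F0.real:.1f} + j{Z_AT_F0.imag:.3f} ohm")
```

In this idealized case $L_g$ comes out much larger than $L_s$, which illustrates why the series gate inductor and its limited Q-value matter for the noise figure.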

### 8.3 The common source LNA

#### 8.3.1 Input matching bandwidth and gain

A series resonance circuit is formed by the input port of the LNA [126], see Fig. 8.2. At resonance, the voltage across \( C_{gs} \) will then be \( Q_{in} \) times larger than the input voltage \( v_{in} \), where \( Q_{in} \) is the quality factor of the series resonance circuit.

![Model for input matching Q-value calculation](image)

**Fig. 8.2.** Model for input matching Q-value calculation [126]

The resonance frequency \( \omega_0 \), becomes

\[
\omega_0 = \frac{1}{\sqrt{C_{gs} (L_s + L_g)}}
\]  

(8.7)

The LNA input voltage \( v_{in} \) at resonance is given by (8.8).

\[
v_{in} = i_{in} R_{in}
\] (8.8)

The gate-source voltage \( v_{gs} \) is then given by (8.9).

\[
v_{gs} = i_{in} \frac{1}{\omega_0 C_{gs}} = \sqrt{C_{gs} \left( L_g + L_s \right)} \frac{i_{in}}{C_{gs}}
\] (8.9)

The ratio \( v_{gs}/v_{in} \) is equal to \( Q_{in} \) of the resonance circuit and is given by (8.10) [126].

\[
\frac{v_{gs}}{v_{in}} = \sqrt{C_{gs} \left( L_g + L_s \right)} \frac{1}{C_{gs} R_{in}} = \sqrt{\frac{L_g + L_s}{C_{gs}}} \frac{1}{R_{in}} = Q_{in}
\] (8.10)

At the resonance frequency \( \omega_0 \), the voltage across \( C_{gs} \) is thus \( Q_{in} \) times the input voltage \( v_{in} \). In a real design it is important that the bandwidth of the LNA covers the dedicated frequency band with sufficient margin. Semiconductor process variations as well as changes in supply voltage and temperature will cause the input matching to drift. The Q-value of the input matching must therefore not be too high. The matching bandwidth is proportional to \( f_0/Q_{in} \). In Fig. 8.3 the input matching is simulated for an input matching Q equal to 3 and \( f_0 \) equal to 2 GHz. A spread in \( C_{gs} \) of \( \pm 20 \% \) shifts the center of the matching by 420 MHz. A nominal matching bandwidth of 430 MHz is achieved for \( S_{11} < -10 \, \text{dB} \).

![Fig. 8.3. LNA matching bandwidth with spread in \( C_{gs} \)](image)
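The order of magnitude of these numbers can be reproduced with an idealized series RLC model of the input. The sketch below is not the simulation behind Fig. 8.3. It assumes that the real part of the input impedance stays at 50 Ω while \( C_{gs} \) varies, that \( Q_{in} \) is defined as in (8.10) with \( R_{in} \) equal to 50 Ω, and that all parasitics are ignored. All names are illustrative.

```python
import math

R_S = 50.0    # source resistance in ohm (also used as the real part of Z_in)
F0 = 2.0e9    # nominal centre frequency in Hz
Q_IN = 3.0    # input matching Q, as in the Fig. 8.3 example

OMEGA_0 = 2.0 * math.pi * F0
L_TOTAL = Q_IN * R_S / OMEGA_0               # L_g + L_s from Q_in = omega_0 * L / R_in
C_GS_NOM = 1.0 / (OMEGA_0 ** 2 * L_TOTAL)    # nominal C_gs from (8.7)

def s11_db(f_hz, c_gs):
    """|S11| in dB of the series RLC input model, referred to R_S."""
    omega = 2.0 * math.pi * f_hz
    z_in = complex(R_S, omega * L_TOTAL - 1.0 / (omega * c_gs))
    gamma = (z_in - R_S) / (z_in + R_S)
    return 20.0 * math.log10(max(abs(gamma), 1e-12))

def centre_frequency(c_gs):
    """Resonance frequency in Hz for a given C_gs, from (8.7)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(L_TOTAL * c_gs))

def matched_band(c_gs, limit_db=-10.0):
    """Lowest and highest frequency on a 1 MHz grid (1 to 3 GHz) with S11 below limit_db."""
    freqs = [m * 1e6 for m in range(1000, 3001) if s11_db(m * 1e6, c_gs) < limit_db]
    return min(freqs), max(freqs)

f_low, f_high = matched_band(C_GS_NOM)
bandwidth = f_high - f_low
shift = centre_frequency(0.8 * C_GS_NOM) - centre_frequency(1.2 * C_GS_NOM)

print(f"-10 dB matching bandwidth: {bandwidth / 1e6:.0f} MHz")
print(f"centre shift for +/-20 % C_gs: {shift / 1e6:.0f} MHz")
```

Under these assumptions both the bandwidth and the shift land in the 400 to 450 MHz range, the same range as the simulated values quoted above. The spread in \( C_{gs} \) therefore moves the match by roughly one full matching bandwidth.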

The drain signal current of the device is then given by (8.11).

\[
i_{d \_LNA} = g_m v_{gs} = g_m v_{in} Q_{in}
\] (8.11)

Including the source resistor \( R_s \) in the Q-value calculation gives the drain current as a function of the source voltage \( v_s \) in (8.12).

\[ i_{d\_LNA} = g_m v_{gs} = g_m v_s Q_s \]  \hspace{1cm} (8.12)

The overall transconductance from the source input voltage to the drain output current, \( G_{m\_tot\_LNA} \), is given by (8.13).

\[ G_{m\_tot\_LNA} = \frac{i_{d\_LNA}}{v_s} = g_m Q_s = g_m \sqrt{\frac{L_g + L_s}{C_{gs}}} \frac{1}{R_s + \frac{g_m L_s}{C_{gs}}} \]  \hspace{1cm} (8.13)

The total transconductance, \( G_{m\_tot\_LNA} \), is \( Q_s \) times larger than the device \( g_m \), since the gate-source voltage is \( Q_s \) times larger than the source voltage \( v_s \). Using (8.7) and (8.14), the expression (8.13) can be expressed as (8.15) [126], [163].

\[ \omega_T = \frac{g_m}{C_{gs}} \]  \hspace{1cm} (8.14)

\[ G_{m\_tot\_LNA} = \frac{\omega_T}{\omega_0 R_s \left( 1 + \frac{g_m L_s}{R_s C_{gs}} \right)} \]  \hspace{1cm} (8.15)

If the real part of the input impedance is made equal to the source impedance, i.e. assuming impedance matching, the expression (8.15) can be further simplified to (8.16) [126], [163].

\[ G_{m\_tot\_LNA} = \frac{\omega_T}{2 \omega_0 R_s} \]  \hspace{1cm} (8.16)
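A small numerical cross-check of this simplification is sketched below. It evaluates the total transconductance once from component values, with \( Q_s \) taken as the Q of the series resonance circuit including \( R_s \), and once from the matched expression (8.16). The device values are hypothetical and the function names are illustrative.

```python
import math

def gm_total_from_components(g_m, c_gs, l_s, l_g, r_s):
    """g_m * Q_s, with Q_s = sqrt((L_g + L_s) / C_gs) / (R_s + g_m * L_s / C_gs)."""
    q_s = math.sqrt((l_g + l_s) / c_gs) / (r_s + g_m * l_s / c_gs)
    return g_m * q_s

def gm_total_matched(omega_t, omega_0, r_s):
    """Matched-input expression (8.16): omega_T / (2 * omega_0 * R_s)."""
    return omega_t / (2.0 * omega_0 * r_s)

# Hypothetical device and source values
G_M = 40e-3      # S
C_GS = 0.4e-12   # F
R_S = 50.0       # ohm
OMEGA_0 = 2.0 * math.pi * 2.0e9

# Matching conditions: real part of Z_in equals R_s, resonance at omega_0
L_S = R_S * C_GS / G_M
L_G = 1.0 / (OMEGA_0 ** 2 * C_GS) - L_S

OMEGA_T = G_M / C_GS   # (8.14)

gm_components = gm_total_from_components(G_M, C_GS, L_S, L_G, R_S)
gm_matched = gm_total_matched(OMEGA_T, OMEGA_0, R_S)

print(f"from components: {gm_components * 1e3:.2f} mS")
print(f"from (8.16):     {gm_matched * 1e3:.2f} mS")
```

Both expressions give the same value when the input is matched. In this example the total transconductance is about twice the device \( g_m \), since \( Q_s \) is close to 2.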

#### 8.3.2 Noise figure

The noise model [4], [168] for the CS-stage LNA with degeneration inductor \( L_s \) and a series matching inductor \( L_g \) is provided in Fig. 8.4 [126].

The noise of the LNA originates from a number of different sources.

- **Gate resistance**

  The gate electrode realized in polysilicon adds resistance effectively appearing in series with the gate [4], [126]. The resistance can be decreased by increasing the number of fingers in the device and by contacting the gate at both ends. The noise of the gate resistance is represented by the \( \overline{v_{rg}^2} \) source in Fig. 8.4.

- **Series resistance of inductors \( L_g \) and \( L_s \)**

  The inductor series resistances are represented by the \( \overline{v_{r-L_g}^2} \) and \( \overline{v_{r-L_s}^2} \) noise voltage sources for the \( L_g \) and \( L_s \) inductors, respectively [126]. The LNA presented in paper VII has a significant noise contribution from \( L_g \), since the Q-value of the on-chip inductor is limited.

- **Channel noise**

  The channel noise [4], [126], \( \overline{i_d^2} \) in Fig. 8.4, is one of the dominant noise sources in a CMOS CS LNA. This noise depends on device size (W/L) as well as biasing, and is given by (8.17).

  \[
  \overline{i_d^2} = 4kT\gamma g_{d0}\Delta f
  \]  

  (8.17)

  The parameter \( g_{d0} \) is the drain-source conductance when the device is biased at \( V_{DS} = 0 \) V [4]. The parameter \( \gamma \) equals 2/3 for a long channel device [4], [126]. For a short channel device, \( \gamma \) can be considerably higher.

- **Induced gate noise**

Electron movement in the channel induces a gate current due to capacitive coupling. When the channel is inverted, the induced gate noise is given by (8.18) [4]. This is one of the dominant noise sources in a CMOS LNA.

\[ \overline{i_g^2} = 4kT\delta g_s \Delta f \]  \hspace{1cm} (8.18)

In (8.18), \( g_s \) is given by (8.19) [4].

\[ g_s = \frac{\omega_0^2 C_{gs}^2}{5g_{d0}} \]  \hspace{1cm} (8.19)

The gate noise coefficient \( \delta \) equals 4/3 for a long channel device [4], [126]. For short channel devices, \( \delta \) can be considerably higher. The parameter \( g_{d0} \) is the same as for channel noise [4], [126]. The induced gate noise is partially correlated with the channel noise current, because both are generated by the same charge carriers [4], [126].

- **Flicker noise (1/f-noise)**

The 1/f noise [4], [103] is a low frequency noise caused by traps at the interface between the channel and the gate oxide. The traps randomly capture and release charge carriers. If the LNA is AC-coupled to the mixer, the 1/f noise of the LNA is effectively suppressed [126]. It is also suppressed if a tuned load is used at the LNA output. Flicker noise is thus seldom a concern in LNAs.

The overall noise factor of the CS architecture, taking into account channel noise and induced gate noise only, is given by (8.20) [163].

\[ F = 1 + \frac{\gamma}{\alpha} \frac{1}{Q} \left( \frac{\omega_0}{\omega_T} \right)^2 \left[ 1 + \frac{\delta \alpha^2}{5\gamma} (1 + Q^2) + 2|c| \sqrt{\frac{\delta \alpha^2}{5\gamma}} \right] \]  \hspace{1cm} (8.20)

where \( c \) is the correlation coefficient, typically about 0.4 [167]. An optimal Q-value for the input matching exists where the noise figure is minimized [165], balancing the contributions from channel noise and induced gate noise.

#### 8.3.3 Third order nonlinearity

In contrast to the collector current of a bipolar transistor, which depends exponentially on the input voltage [39], the MOS drain current is ideally proportional to the square of the input voltage. If the output resistance of the device is high enough, the relation between drain current and gate-source bias voltage \( V_{GS} \) is given by (8.21) [4], [39], [126], [169].

\[ I_{DC} = \frac{k' W}{2L} (V_{GS} - V_t)^2 \]  \hspace{1cm} (8.21)

The large signal drain current $I_D(t)$ when applying a bias $V_{GS}$ plus a signal voltage $v_{gs}(t)$ is given by (8.22) [4], [39], [126].

$$I_D(t) = I_{DC} + I_d(t) = I_{DC} + k' \frac{W}{L} (V_{GS} - V_t) v_{gs}(t) + \frac{k' W}{2 L} v_{gs}^2(t)$$ (8.22)

There is no third order term present in (8.22). However, adding the effect of vertical field mobility degradation [4], [39], gives a modified relation between $I_D$ and $V_{GS}$ according to (8.23)

$$I_{DC} = \frac{k' W}{2 L} \frac{(V_{GS} - V_t)^2}{1 + \theta(V_{GS} - V_t)}$$ (8.23)

where $\theta$ is a process technology dependent parameter [39]. Taylor expanding (8.23) to a third order polynomial, the $IP_3$ at the device input can be found (8.24).

$$v_{gs\_IP3} = \sqrt{\frac{8(V_{GS} - V_t)}{3\theta}}$$ (8.24)

For a more accurate estimate of $IP_3$, also short channel effects like velocity saturation must be included. Furthermore, at low bias voltages the device will be in weak or moderate inversion, resulting in significant third order nonlinearity.
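The Taylor expansion behind (8.24) can be checked numerically. The sketch below compares the closed-form expression with the intercept point obtained from the exact first and third order Taylor coefficients of (8.23), using the standard relation between the input intercept amplitude and the ratio of those coefficients. The overdrive voltage and the value of $\theta$ are hypothetical, and short channel effects are not included.

```python
import math

def ip3_exact(v_ov, theta):
    """Input IP3 amplitude from the exact Taylor coefficients of (8.23).

    v_ov is the overdrive voltage V_GS - V_t. The common factor k'W/(2L)
    cancels in the ratio g1/g3 and is therefore left out.
    """
    d = 1.0 + theta * v_ov
    g1 = (2.0 * v_ov + theta * v_ov ** 2) / d ** 2   # first derivative of v^2 / (1 + theta*v)
    g3 = -theta / d ** 4                             # third derivative divided by 3! = 6
    return math.sqrt(4.0 / 3.0 * abs(g1 / g3))

def ip3_approx(v_ov, theta):
    """Closed-form expression (8.24), valid when theta * v_ov is much smaller than 1."""
    return math.sqrt(8.0 * v_ov / (3.0 * theta))

V_OV = 0.2    # V, hypothetical overdrive voltage
THETA = 0.3   # 1/V, hypothetical mobility degradation parameter

exact = ip3_exact(V_OV, THETA)
approx = ip3_approx(V_OV, THETA)
print(f"exact: {exact:.3f} V, (8.24): {approx:.3f} V")
```

For these values the two results differ by less than 10 %, and the difference shrinks as $\theta (V_{GS} - V_t)$ approaches zero. Both show that the intercept point improves with increased overdrive voltage.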

## 9 Mixer architectures

### 9.1 Introduction

The mixer topology in cellular DCRs can be either active [4], [88], [170]-[173] or passive [174]-[177]. There are also single and double balanced versions of each topology. The double balanced versions are commonly chosen in commercial transceiver designs [126], as they have built in suppression of both common-mode (CM) signals, and LO feed through. In paper VIII of this thesis, however, the performance of an active single balanced mixer has been improved using a feedback loop, thereby mitigating some of the drawbacks of this topology [126]. According to Monte Carlo simulations [178], the $IIP_2$ and DC-offset of this mixer are sufficient for use in a 3G radio system. For multiband transceivers with many RF inputs, the combination of a single ended LNA and a single balanced mixer is advantageous. The number of RF input pins is then reduced, and there is no need for a tunable multiband on-chip balun, occupying large die area. The key metrics of the mixer are: conversion gain, noise figure, port isolation, second and third order intercept points, power consumption, image rejection, and harmonic rejection, i.e. conversion gain for harmonics of $f_{LO}$ [179]. Harmonic down conversion can be mitigated using narrow band LNA input matching in combination with a narrow band balun between the LNA and mixer, as described in paper VII [126].

### 9.2 Active mixers

Using BiCMOS technology, as in the design presented in paper VIII, the mixer switching core is implemented with bipolar devices in order to lower the 1/f noise [126]. Designing an active mixer in a process with only CMOS devices for a zero-IF receiver [180], [181] is not preferred, due to excess 1/f noise [126]. The active mixer can provide conversion gain from the transconductance stage, while the passive mixer topology results in a conversion loss [88]. This relaxes the gain requirement of the LNA in an architecture with an active mixer. In BiCMOS technology, the transconductance stage is preferably implemented with a MOS device, as in the design presented in paper VIII, since for a given tail current, the $IM_3$-product will be lower for a MOS device compared to a bipolar one. The 1/f noise of the MOS transconductance stage is up-converted with $f_{LO}$ and does not impact the receiver NF. Comparing active and passive mixers, the active mixer isolates the I and Q branches of the front-end. In passive mixer designs, the lack of isolation must be mitigated, e.g. by using a 4-phase clock [175].

#### 9.2.1 Single-balanced active mixer

The single balanced active mixer shown in Fig. 9.1 has an NMOS transconductance stage and a bipolar switching core, as in the design presented in paper VIII [126]. Compared to the double-balanced version, the mixer is, however, sensitive to noise from the LO driver [88], [181]. The reason is that at the differential output, $Out_p - Out_n$, the LO signals do not cancel as for the double balanced mixer [88]. Another issue is LO to RF leakage. With mismatch in the mixer switching core, the LO-leakage to the RF input will increase compared to a double-balanced mixer. In an FDD system with a TX leakage into the LNA, cross modulation [132]-[139] of the LO-leakage with the AM-modulated TX-signal, may impact the receiver sensitivity.

![Fig. 9.1. Active single-balanced BiCMOS mixer [126]](image)

The mixer in paper VIII uses a feedback loop around the mixer that suppresses the effects of mismatch. The architecture needs two large off-chip capacitors for low pass filtering in the feedback loop. The functionality of the DC feedback loop around the mixer in paper VIII relies on device up-scaling to improve the matching. Passive components such as capacitors and inductors can be placed inside the package, thereby reducing the number of external components.

The single balanced version does not suppress the LO signal and its noise at the mixer output [88], [181]. From Fig. 9.1, the LO noise source can be modeled as a noise voltage source in series with the base of devices $M2$ and $M3$. When both devices are turned on they will act as a differential amplifier amplifying the noise at the bases [88]. On the other hand, when e.g. $M2$ is turned on and $M3$ is turned off, $M2$ is operating as a cascode and adds very little noise. $M3$ is then off and therefore does not contribute any noise. Short rise and fall times of the LO thus reduce the noise, since they shorten the time during which both devices are turned on. The noise from the transconductance stage $M1$ at the mixer output is maximized when either of the mixer core devices is fully turned on. Except for the noise of the LO and the intrinsic noise of the switch devices, the main noise contributions come from the input device $M1$ and the load resistors $R_L$.

When the mixer is switched with a large square-wave LO-signal, the transconductance stage determines the overall mixer third order nonlinearity [4]. The mixer presented in paper VIII is therefore linearized with a programmable two-stage feedback amplifier as transconductance stage. With a less ideal LO-signal, the third order nonlinearity of the switching core also impacts the overall linearity [4]. Either resistive or inductive degeneration, in combination with increased bias current, is commonly used to linearize the transconductance stage. The input second order intercept point of the mixer, IIP2, is an important parameter for the DCR [88].

Representing the AM-modulation of the TX-signal that leaks into the LNA with two close RF tones, separated by Δf, an in-band interferer at Δf will be generated due to second order nonlinearities of the mixer. The receiver sensitivity can be degraded by the following nonidealities [126]:

- **Self mixing**

The AM modulated TX interferer can leak to the LO side of the mixer and mix with itself [88], [132] to generate an IM2-product. With too low an LO-amplitude, the mixer will work more like a multiplier [173], thereby generating a stronger second order intermodulation product. If the TX leakage to LO_p and LO_n in Fig. 9.1 is exactly equal, i.e. in common mode, mismatch in the mixer core devices is needed to create a differential signal at the mixer output.

- **Transconductance stage second order nonlinearity**

The second order nonlinearity of the transconductance stage will create an IM2 product that appears at the mixer output as a common mode signal [104], [132], [140]. Due to mismatch in the switching mixer core, load resistors, and LO driver, a differential IM2 product will also be present at the mixer output. Driving the mixer with a large swing LO signal mitigates the effects of mismatch in the LO driver and mixer core.

- **DC offset originating from LO-leakage at the LNA input**

LO-leakage at either the LNA or the mixer transconductance stage will generate a DC offset at the mixer output [88]. Effectively, the offset voltage will cause a mismatch of the mixer core devices, since their collector-emitter voltage, V_CE, will not be equal. The offset voltage thus increases the CM to DM conversion of IM2 products.

- **Second order nonlinearity of the switching mixer core**

The devices in the mixer core will generate second order distortion. Since the $IM_2$ product is common mode there will be no differential $IM_2$ product, unless mismatch is present in either the switching devices, the mixer load resistors, or in the LO driver. Mismatch in the load resistors can be mitigated by increasing the device sizes. Increasing the size of the core active devices also reduces mismatch, but this will unfortunately reduce the switching speed resulting in increased $IM_3$ products [4], increased self-mixing [132], [140], and increased LO noise [88].

- **Cross modulation of the LO leakage at the mixer switching core input**

The AM-modulation of the TX interferer that leaks into the receiver will transfer to the LO-leakage through cross modulation [132]-[139]. From (7.15), the cross modulation product is proportional to the LO-leakage, the square of the TX-leakage and the inverse of $IP_3^2$. The feedback mixer shown in paper VIII reduces the cross modulation effect by suppressing the mixer core mismatch that creates LO-leakage. If the LO signal is a square-wave with harmonics at $(2n+1)f_{LO}$, the AM modulation of the TX leakage will transfer to these harmonics as well. Depending on the LO leakage at the LNA input, cross modulation can also occur in the LNA.
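The two-tone picture used above for the IM2 product can be illustrated with a memoryless polynomial nonlinearity. The sketch below is a minimal model, not a mixer simulation: the coefficients, tone amplitudes and the scaled-down frequencies are hypothetical and chosen so that every tone falls on an exact frequency bin. It shows that a second order term $a_2 x^2$ turns two tones of amplitude $A$, separated by Δf, into a component at Δf with amplitude $a_2 A^2$.

```python
import math

def tone_amplitude(samples, freq_hz, sample_rate_hz):
    """Amplitude of the component at freq_hz, found by correlation with cos and sin."""
    n = len(samples)
    re = sum(s * math.cos(2.0 * math.pi * freq_hz * k / sample_rate_hz)
             for k, s in enumerate(samples))
    im = sum(s * math.sin(2.0 * math.pi * freq_hz * k / sample_rate_hz)
             for k, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

# Hypothetical, scaled-down example (frequencies in Hz, one second of samples)
FS = 1024.0          # sample rate
N = 1024             # number of samples
F1, F2 = 200.0, 210.0
A = 0.1              # amplitude of each tone
A1_COEFF = 1.0       # linear gain
A2_COEFF = 0.5       # second order coefficient

x = [A * math.cos(2.0 * math.pi * F1 * k / FS) + A * math.cos(2.0 * math.pi * F2 * k / FS)
     for k in range(N)]
y = [A1_COEFF * v + A2_COEFF * v * v for v in x]

im2 = tone_amplitude(y, F2 - F1, FS)
print(f"IM2 amplitude at delta f: {im2:.6f} (a2 * A^2 = {A2_COEFF * A * A:.6f})")
```

Since the product scales with $A^2$, a 1 dB increase of the TX leakage raises the IM2 product at Δf by 2 dB in this model.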

#### 9.2.2 Double balanced mixers

The topology of the double balanced mixer is shown in Fig. 9.2. It suppresses common-mode signals at the $RF_p$ and $RF_n$ input nodes [126]. Even with a mismatch between the $LO_p$ and $LO_n$ signals, second order common mode $IM_2$ signals at the output of $M1$ and $M2$ will be suppressed at the differential mixer output. Errors in LO duty cycle in the double balanced mixer thus do not increase the level of $IM_2$ products [104]. With the single balanced topology, mismatches in the LO signal will increase the leakage of $IM_2$-products from the transconductance stage. Another advantageous property of the double balanced mixer is that it suppresses noise from the LO-driver, resulting in a lower mixer noise figure compared to the single balanced version [88], [126]. In Fig. 9.2, this noise is represented with the noise voltage source $v_n$.

Due to its structure, the double balanced mixer has built-in suppression of LO signal leakage to both the mixer output and the differential RF input [102]. A duty cycle mismatch of the LO signal will not increase the LO feed through. In contrast, LO signal duty-cycle mismatch in the single balanced mixer will result in an LO signal at the RF input. For the double balanced mixer this leakage will be in common-mode. The leakage is, however, affected by device mismatch in the core, e.g. mismatch between M3 and M6 increases leakage.

### 9.3 Passive mixer

Comparing bipolar and CMOS devices, CMOS has a much higher level of 1/f noise. The 1/f noise originates from traps in the interface between the gate oxide and the silicon that randomly trap and release charges [103]. The passive mixer [4], [88], [103], is therefore the preferred architecture for DCRs in CMOS technology [4], [88], [126]. The 1/f noise is proportional to the DC current of the devices. Using a passive mixer, with zero DC current, therefore reduces the mixer 1/f noise significantly.

The conversion of a single ended active mixer to its passive counterpart is illustrated in Fig. 9.3 [103]. The bias current $I_D$ of the transconductance NMOS in the active mixer is removed and replaced with a capacitor in the passive mixer to illustrate that no DC current is flowing. The supply voltage $V_{DD}$ is replaced with a bias level $V_{CM}$. The resistor load in the active mixer is replaced with biasing resistors $R_B$ and load capacitors $C_L$. The multiband LNA presented in paper VII is intended to be used together with a double balanced passive mixer [126]. The simulated input impedance of the mixer was used as a load on the balun secondary side. The passive mixer is preferably implemented with CMOS devices, since compared to a bipolar device, the MOS device can have a high channel conductance with zero DC current. Ideally, since the DC current is zero, the passive mixer does not generate 1/f noise.
Due to device mismatch, a small DC channel current can however flow in the passive mixer [103].

**Fig. 9.3.** Single ended active (a) and passive mixers (b) [126]

RF signal self-mixing, generating second order distortion, can be a problem in passive mixers [103], [126]. A strong RF signal will modulate the switch gate-source voltage, $V_{gs}$, and therefore also the time varying conductance $g(t)$ [103]. The issue can, however, be mitigated by using a complementary architecture with both NMOS and PMOS switches [103]. The total conductance of the NMOS and PMOS switch $g_C(t)$, equal to $g_N(t) + g_P(t)$, will be independent of the RF signal under the condition that $\mu_P W_P = \mu_N W_N$ [103]. When the devices are switching it is difficult to maintain this condition [103], [126], and device bias optimization is therefore required to reduce the self-mixing. Even if RF self-mixing is minimized, second order distortion products can still be created due to nonlinearities and mismatch of the mixer devices [126].

A double balanced complementary passive mixer architecture, used in the DCR presented in [182], is shown in Fig. 9.4 [126]. The differential RF signals, $RF_p$ and $RF_n$, are created in an on-chip balun with center tap biasing at the secondary side, and the LNA connected to the primary side. The mixer outputs, $Out_p$ and $Out_n$, are loaded with the capacitive baseband filter input impedance. A four phase differential clock, providing clock signals LO_I_N, LO_I_P, LO_Q_N, and LO_Q_P, is used to drive the mixers. If an LO signal with 50% duty cycle were to drive the mixer, the mixer I and Q outputs would be short-circuited whenever the I and Q clock signals overlapped [126].

A four-phase clock [126], [175], [177], [182] with 25% duty cycle can be used to provide isolation between the I and Q paths. The overlaps between the I and Q clock signals can then be eliminated. One way to generate the 25% clock signals is to use a VCO that operates at $2f_{LO}$. A succeeding I/Q frequency divider will generate signals at $f_{LO}$ with 50% duty cycle. By combining the 50% duty cycle signals at $2f_{LO}$ and $f_{LO}$ in NAND gates, 25% duty cycle clock signals can be created [126]. The passive mixer translates the low pass baseband impedance at its output into a high Q band pass filter at the RF input through the frequency translation effect [127], [146]-[149], [174], [175]. Interferers at the RF input will therefore be attenuated. Another way of eliminating the effect of overlap in the I and Q signals is described in [183]. Here, each switch in the passive mixer consists of two transistors in series, controlled by different LO signals, thereby preventing overlap.
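The 25% duty cycle generation described above can be sketched with idealized logic signals. The model below is an illustration under stated assumptions, not the circuit in [126]: time is divided into slots of half a $2f_{LO}$ period, the divider outputs are ideal, and the gating is written with active-high AND logic (a NAND gate would give the complementary, active-low signal). The assignment of slots to the four clock names is also an assumption.

```python
SLOTS_PER_LO_PERIOD = 4   # one slot = half a period of the 2*f_LO clock
N_PERIODS = 3             # number of LO periods to simulate

def clk_2x(slot):
    """50 % duty cycle clock at 2*f_LO."""
    return slot % 2 == 0

def div_i(slot):
    """Divider I output at f_LO, 50 % duty cycle."""
    return slot % 4 in (0, 1)

def div_q(slot):
    """Divider Q output at f_LO, delayed a quarter LO period relative to I."""
    return slot % 4 in (1, 2)

def phases_at(slot):
    """Combine the 2*f_LO clock with the divider outputs into four 25 % phases."""
    c, i, q = clk_2x(slot), div_i(slot), div_q(slot)
    return {
        "LO_I_P": i and c,
        "LO_Q_P": q and not c,
        "LO_I_N": (not i) and c,
        "LO_Q_N": (not q) and not c,
    }

n_slots = SLOTS_PER_LO_PERIOD * N_PERIODS
history = [phases_at(s) for s in range(n_slots)]

# Fraction of time each phase is high, and the largest number of phases high at once
duty = {name: sum(h[name] for h in history) / n_slots for name in history[0]}
max_simultaneous = max(sum(h.values()) for h in history)

print(duty)
print("max phases high at the same time:", max_simultaneous)
```

Each phase is high for one slot out of four and no two phases are high in the same slot, which is the non-overlap property needed to isolate the I and Q paths.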

## 10 Future work

The number of front end components, i.e. duplexers, SAW-filters, switches and matching components for LNAs, increases with each new product generation [126]. A typical terminal is equipped with an FEM (Front End Module) that contains the duplexers and SAW filters for the frequency bands that are used in all regions. Depending on which region the terminal will be used in, other duplexers and filters are then added to the platform. The increasing number of supported frequency bands, together with radio performance enhancement features like RX diversity, MIMO, and carrier aggregation, has made the cost of the external components exceed that of the RF ASIC itself [126]. The RF inputs that are used for RX diversity do not require duplexers, but a SAW filter is still required. Research efforts have lately been targeted towards radio architectures that could potentially eliminate the need for both SAW filters and duplexers.

### 10.1 Duplexer elimination in FDD systems

In LTE, the maximum allowed TX output power is +24 dBm at the antenna [120] resulting in +26 dBm PA output power, assuming a loss of 2 dB in the duplexer [126]. Depending on manufacturer and frequency band, the duplexer attenuates the TX signal 50-55 dB. With 52 dB attenuation in the duplexer, a TX power of -26 dBm will be present at the LNA input. A topology that instead of duplexer filtering of the TX signal uses the concept of electrical balance is outlined in Fig. 10.1 [127], [150]. The PA feeds the center tap of the primary side of an on-chip transformer. The secondary side of the transformer is connected to a differential LNA. Using a tunable balancing network that mimics the antenna impedance, the TX signal will be strongly attenuated at the differential LNA input. The suppression of the TX signal is, however, very sensitive to mismatch between the antenna and balancing network impedances. If the antenna environment changes, e.g. the position of the hand holding the terminal changes, the TX signal attenuation will be strongly reduced.
Adaptive tuning with high resolution is therefore needed. Another drawback is that half of the TX power is lost in the tuned load, thus halving the effective PAE of the transmitter. Further research in isolation and antenna impedance tracking is therefore called for.
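The power levels quoted at the start of this section follow from simple dB arithmetic, sketched below. The function names are illustrative. The last figure expresses the halving of the TX power in the balancing network as an equivalent loss in dB.

```python
import math

def pa_output_dbm(antenna_power_dbm, duplexer_loss_db):
    """PA output power needed to reach the wanted antenna power through the duplexer."""
    return antenna_power_dbm + duplexer_loss_db

def tx_leakage_at_lna_dbm(pa_power_dbm, tx_to_rx_attenuation_db):
    """TX power that remains at the LNA input after the duplexer TX-to-RX attenuation."""
    return pa_power_dbm - tx_to_rx_attenuation_db

# Values quoted in the text
pa_power = pa_output_dbm(antenna_power_dbm=24.0, duplexer_loss_db=2.0)
leakage = tx_leakage_at_lna_dbm(pa_power, tx_to_rx_attenuation_db=52.0)

# Losing half of the TX power in the balancing network, expressed in dB
balance_loss_db = 10.0 * math.log10(2.0)

print(f"PA output power: {pa_power:+.0f} dBm")
print(f"TX leakage at LNA input: {leakage:+.0f} dBm")
print(f"loss in balancing network: {balance_loss_db:.1f} dB")
```

The electrical balance topology thus costs about 3 dB of TX power, slightly more than the 2 dB duplexer loss assumed in the example.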

In the 3GPP specification for E-GSM it is stated that the receiver must be able to handle an out of band interferer with a power of 0 dBm with a limited amount of sensitivity degradation [126]. With a simultaneous 0 dBm blocker at the RF input, the maximum NF should be 15 dB [120]. The blocker can reside close to the band edge. In the PCS 1900 band the interferer is at 80 MHz from the band edge, while in the E-GSM low band the distance is only 20 MHz [126]. A receiver architecture that includes a SAW filter at the RF input is designed to manage a -23 dBm in-band blocker at 3 MHz from the wanted signal. The 0 dBm blocker is then attenuated by at least 23 dB by the filter. One new RX topology with a high compression point is the mixer first receiver [184], [185], which does not have an LNA in front of the mixer [126]. The noise figure of these architectures, however, tends to be too high to be commercially competitive [126]. Since there is no suppression of e.g. $3f_{LO}$ before the mixer, another large drawback is the harmonic down-conversion of RF signals at three times the wanted signal frequency [126]. The LO leakage to the antenna input can also be high and exceed the 3GPP maximum level [126]. In general, removing external filters is also troublesome due to reciprocal mixing. A strong interferer at an offset from the received frequency will be down converted to an in-band interferer by the LO phase noise. The requirements on the phase noise are therefore increased, resulting in a higher current consumption. In [186], a novel architecture using a current mode passive mixer is presented, where the baseband output is up converted with the LO signal and fed back to the RF input, thereby providing a programmable narrow band input match. As also described in paper VII, a narrow band LNA is very advantageous for interferer suppression.

**Fig. 10.1.** Architecture using electrical balance to cancel the TX signal [126]

In the upcoming 5G cellular systems, the user terminals will support mm-wave frequency bands for high speed communication, since the RF channel bandwidth can be increased considerably [187]. Developing wireless communication systems at mm-wave frequencies, however, will result in new technical difficulties that need to be addressed in future exciting research fields. One issue is the high path loss in building materials at mm-wave frequencies, resulting in a high isolation between outdoor and indoor signals [187]. Another is rain and atmospheric absorption. A high path loss can be mitigated, though, by shrinking the cell size. In [187], a study of mm-wave propagation at 28 GHz and 38 GHz showed sufficient coverage in an urban environment with a cell radius of 200 meters. With such a dense deployment, the price of each base station will become increasingly important for an operator to be competitive. Therefore, the integration level of the radio architecture needs to be high. In this thesis, building blocks for a complete fully integrated 1.5 V mm-wave beam steering transmitter have been developed that fit well into such a roadmap. Beam steering, as described in papers II and III, using phased array antennas can be used to reduce the effect of multipath fading in non-line of sight (NLOS) transmission. Since many antennas are required, the current consumption of each transmit or receive path needs to be kept low. Minimizing current consumption has therefore been an important target in papers I to VIII of this thesis. The 5G terminals also need to be multimode, resulting in e.g. large IC packages and complex PCB routing if differential LNAs are used. The work in papers VII and VIII is therefore based on single-ended front-ends, reducing the number of RF input pins.

## 11 Conclusions

The research described in this doctoral dissertation addresses two topics: millimeter wave beam steering transmitters for E-band wireless communication, and single-ended multiband LNAs and mixers for cellular receivers. Although the operating frequencies of the radio systems that the designs target differ, the design objectives are still largely the same. The main difference between designing an RFIC at 2 GHz and at 84 GHz lies in the design process flow. For a design with an operating frequency of 84 GHz, parasitic inductance becomes extremely important to control. At cellular frequencies, it does not need to be considered. Another difference related to parasitic inductance is that it is much more difficult to achieve a good signal ground at 84 GHz. For millimeter wave circuits, differential architectures utilizing a virtual ground are therefore preferred to their single ended counterparts, which rely on a good signal ground. The transit frequencies of the fastest devices in the silicon technologies used for the presented cellular receivers, operating around 900 MHz and 2 GHz, were 40 GHz for the BiCMOS process and 120 GHz for the CMOS process. The millimeter wave designs were made in a SiGe bipolar process with an $f_T$ of 200 GHz, i.e. the ratio between transit frequency and operating frequency is much smaller. With less gain available at millimeter wave frequencies, layout parasitics become more important to control. For the cellular designs, the performance on a schematic level does not differ significantly from that on a parasitic extracted level, while for a millimeter wave circuit, the differences can result in a redesign and a change of floor plan of the whole circuit. To mitigate the effect of capacitive parasitics, on-chip inductors and transformers, which resonate with the capacitive parasitics at the operating frequency, are heavily used in millimeter wave designs to increase the gain.
In the work flow for designing the millimeter wave circuits, the majority of the design effort is on the layout level, while for cellular designs a schematic representation is often sufficient for design optimization. Designing and simulating on-chip inductors and transformers is a time consuming task that will probably be easier in future design kits. For the designs in this thesis, the transformers were manually drawn and then simulated in ADS Momentum. The Momentum simulator is quite accurate but also time consuming. However, transformers could quite easily be designed using automated tools and specified by number of turns, diameter and trace width. For transformer performance it would be beneficial to first use a tool that can quickly calculate the transformer parameters without using an electromagnetic simulator such as ADS Momentum. The more time consuming electromagnetic simulation could then be made on a more mature version of the transformer structure.
The architectures of future millimeter wave transceivers for use in radio communication systems will probably undergo the same change as the cellular RFICs once did. In the early 90s, when terminals for mobile communication were just about to become a mass market, the integration level was quite low. The terminals only supported a single frequency band, and on the terminal PCBs there were quite a few ICs together with modules, external SAW filters and other passive components. The current consumption was high, resulting in short standby and talk time. With rapidly growing volumes and price erosion, research efforts were targeted towards making the chip sets cheaper, i.e. reducing the number of circuits and modules in the platform, eliminating as many off-chip components as possible, and designing RFICs that supported several frequency bands. Power consumption became an issue when consumers required longer battery time. Compared to differential front-end architectures, the single ended topologies presented in this thesis have several advantages, especially in multiband architectures. Using single ended LNAs decreases the PCB and package complexity significantly. Utilizing the design techniques of the thesis, the performance of single ended LNAs and mixers can be made high enough to meet the stringent requirements on receiver performance for FDD systems like WCDMA and LTE. In the past, the integration level of the millimeter wave systems has been quite low, since volumes have been low.
With the deployment of the future heterogeneous 5G networks, including small pico- and femtocell base stations, the price pressure on the base stations will most likely increase, resulting in a drive toward more cost efficient solutions. High performance PAs have in the past been designed in GaAs and InP process technologies, offering many advantages over silicon technology. However, these technologies are expensive and do not offer any integration possibilities with digital control circuits. Upcoming new process technologies in both SiGe BiCMOS and CMOS have device performance that is good enough to start competing with GaAs and InP devices. Even if the mm-wave transceiver circuit might not be battery operated, low power consumption is also highly important for heat dissipation reasons, since thermal gradients on the chip can reduce performance significantly. The SiGe millimeter wave circuits developed in this work fit well in a possible future roadmap for E-band wireless communication. The target of the work has been to develop SiGe circuit blocks that can give adequate performance even with a low supply voltage. With a low supply, power consumption is reduced and the supply can be common for both the RF and digital parts. An interesting future research project would be to complete the entire beam steering transmitter with integrated PA and mount the die on an LTCC substrate with integrated antennas. This would certainly be a highly cost effective solution for E-band mm-wave communication.
References


[120] 3rd Generation Partnership Project (3GPP), www.3gpp.org


Part II

Included papers
Summary of included papers

Paper I

In paper I, “A 28 GHz SiGe QVCO with an I/Q phase error detector for an 81-86 GHz E-band transceiver”, a quadrature voltage controlled oscillator together with an I/Q phase error detector is presented. The design was implemented in a SiGe bipolar process with $f_T$ equal to 200 GHz. The QVCO was designed for an 81-86 GHz E-band transceiver. E-band transceivers use e.g. 64 QAM modulation schemes and the quality of the radio link is then sensitive to I/Q phase errors. In the presented design, the I/Q phase error can therefore be mitigated by tuning four varactors connected to the QVCO outputs. The detector consists of two cross-coupled Gilbert mixers that generate a differential DC voltage proportional to the phase error. At 1 MHz offset, the QVCO phase noise equals $-105$ dBc/Hz, corresponding to a FOM of $-181$ dBc/Hz and a FOM$_T$ of $-186$ dBc/Hz. The performance of the detector was verified using Monte Carlo simulations, giving a 3 sigma detector phase error of one degree. The QVCO consumes 14 mA from a 1.5 V supply. The detector and output measurement buffers use a 2.5 V supply and consume 57 mA.

**Contribution:** I did the schematic design and layout and wrote the paper under supervision of the second author. The third and fourth authors contributed with their expertise in technical discussions.

Paper II

In paper II, “A 28 GHz SiGe PLL for an 81-86 GHz E-band beam steering transmitter and an I/Q phase imbalance detection and compensation circuit”, simulation results for a 1.5 V low supply 28 GHz beam steering PLL are presented together with measurement results for the QVCO and detector presented in paper I. The PLL was designed in a SiGe process with $f_T = 200$ GHz. The PLL is intended for use in an 81-86 GHz E-band transmitter. QVCO phase control was implemented with DC current injection into the load of a Gilbert type phase detector. Four current-mode-logic (CML) dividers were cascaded for a division factor of 16 and a reference frequency of 1.75 GHz. Closed loop simulation in Spectre RF using a Verilog representation of the QVCO resulted in a phase noise of $-115$ dBc/Hz at 1 MHz offset. The PLL consumes 52 mW from the 1.5 V supply, plus a minimum of 7 mW from a dedicated variable supply for an active loop filter of the PLL. The measured QVCO phase noise equals $-100$ dBc/Hz at 1 MHz offset. The detector functionality was verified in measurements; however, the internal QVCO mismatch of the measured samples was low.
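As a quick check of the divider arithmetic above, the following minimal sketch derives the division factor and reference frequency from four cascaded dividers, assuming each stage divides by two, and converts the quoted 52 mW into an equivalent current from the 1.5 V supply. The variable names are illustrative only.

```python
F_QVCO_HZ = 28e9          # nominal QVCO frequency
N_DIVIDER_STAGES = 4      # cascaded CML dividers
P_PLL_W = 52e-3           # PLL power from the main supply
V_SUPPLY = 1.5            # main supply voltage

# Each stage is assumed to divide by two, giving 2^4 = 16 in total.
division_factor = 2 ** N_DIVIDER_STAGES
f_ref_hz = F_QVCO_HZ / division_factor

# Equivalent supply current for the quoted power consumption.
i_pll_ma = P_PLL_W / V_SUPPLY * 1e3

print(f"Division factor: {division_factor}")
print(f"Reference frequency: {f_ref_hz / 1e9:.2f} GHz")
print(f"PLL current from 1.5 V: {i_pll_ma:.1f} mA")
```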
**Contribution:** I did most schematic design and all layout of the PLL chip. The fourth and fifth author contributed with initial schematic design and simulation of the active loop filter and phase detector. I designed the divider and added the phase control functionality to the phase detector. I made most of the top level simulations with some assistance from the third author. The sixth author contributed in the design review with important comments before tapeout. I made all measurements of the QVCO and detector and wrote the paper under supervision of the second author.

**Paper III**

Measurement results for the PLL presented in paper II are provided in paper III, “A 1.5 V 28 GHz beam steering SiGe PLL for an 81-86 GHz E-band transmitter.” The PLL obtains a measured phase noise of -107 dBc/Hz at 1 MHz offset. In this work, a small signal PLL model was also developed, obtaining good correspondence with the measured data. A feedback loop for the I/Q phase tuning presented in papers I and II was designed and simulated. The noise contribution of this feedback loop was added to the small signal model. The measured phase control sensitivity equals 2.5 °/µA with excellent linearity. A theoretical model for the linearity of the presented phase control is also provided.

**Contribution:** I made the measurements and wrote the paper under supervision of the seventh author. The third author contributed with help during the measurement setup. The second, fourth, fifth and sixth author have contributed in the design of the circuit presented in paper II and measured in paper III.

**Paper IV**

Design and simulation results for a two-stage 81-86 GHz E-band PA are presented in paper IV, “A SiGe Power Amplifier for 81-86 GHz E-band.” The PA is based on output power combination in a stacked transformer to enable the use of a low 1 V supply. Capacitive cross-coupling is used in both the driver and output stage to increase the gain. A low supply voltage is advantageous, since it eliminates the need of a separate voltage regulator for the PA in an integrated transceiver. The two-stage PA was designed in a SiGe process with $f_T = 200$ GHz and achieves a power gain of 12 dB. The saturated output power is equal to 16 dBm and the peak PAE is 14%. The robustness of the design with the cross-coupling devices, implemented either as MIM capacitors or diode connected transistors, was verified using Monte Carlo simulations.

**Contribution:** I made the design, performed all simulations, and wrote the paper under supervision of the second author. The third and fourth authors contributed with technical suggestions during the design, and the fourth author also contributed with appreciated proof reading of the paper.

Paper V

Measurement results for two SiGe E-band power amplifiers (PAs), designed for the E-band at 81-86 GHz, are presented in paper V, "Comparison of two SiGe 2-stage E-band Power Amplifier Architectures". The PAs were designed in a process with four Cu metal layers and a maximum $f_T$ of 200 GHz. The first design uses a cascode topology with a 2.7 V supply voltage for both the driver and output stages. The second design has a supply voltage of only 1.5 V and instead uses capacitive cross-coupling to mitigate the effect of the collector-base parasitic capacitance on the power gain. The first design achieves a measured gain of 16 dB at 92 GHz, while the gain of the second design is 10 dB at 93 GHz. For the first design, the measured and simulated maximum gain correspond well, while the measured gain of the second design is 6 dB lower than simulated. By analyzing the measured and simulated stability factors and 3-dB bandwidth, it is concluded that the realized cross-coupling capacitance is too small. The large signal performance was simulated using Spectre RF. Both designs have a peak PAE of 16 %.

Contribution: I made the schematic and layout design of both circuits. I performed the measurements and wrote the paper under supervision of the second author. The third author contributed with help in setting up and calibrating the measurement equipment. The fourth author contributed in technical discussions.

Paper VI

In paper VI, "System simulations of a 1.5 V SiGe 81-86 GHz E-band transmitter circuit based on a 28 GHz QVCO", simulation results for a complete transmitter are presented. The circuit simulations are based on a parasitic extracted view of the individual blocks together with Momentum models for the inductors. To ease the requirements on the compression point of the transmit mixer, a three-stage PA based on capacitive cross-coupling was designed. The EVM of the complete transmitter was simulated for a 1 GHz 16 QAM signal using transient analysis. For these simulations, lumped models of all transformers and inductors were developed. At an average output power of +7.5 dBm the EVM equals 7.2 %, including effects from phase noise corresponding to -100 dBc/Hz at 1 MHz offset and an I/Q phase error of 1.0°. A phase error of 1.0° is what can be achieved using the phase error tuner and detector presented in papers I and II.
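EVM figures are often compared in decibels rather than percent. The conversion below is a minimal sketch using the standard relation EVM(dB) = 20·log10(EVM/100 %); the helper name is illustrative and not taken from the paper's MATLAB code.

```python
import math

def evm_percent_to_db(evm_percent):
    """Convert an RMS EVM in percent to decibels."""
    return 20.0 * math.log10(evm_percent / 100.0)

# Simulated transmitter EVM quoted in the summary of paper VI.
evm_db = evm_percent_to_db(7.2)
print(f"EVM of 7.2 % corresponds to {evm_db:.1f} dB")
```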

Contribution: I made the schematic design and layout and performed all simulations. The MATLAB code required to extract the EVM was written in cooperation with the second author, where the second author wrote the core EVM calculator and I wrote the part that finds the minimum EVM for a certain output power level and phase. The third author contributed with setting up the simulation environment with a behavioral model sampler that saved the output data. I wrote the paper under supervision of the fifth author. The fourth author contributed with solving some simulation setup issues.

Paper VII

In Paper VII, “Single-Ended Low Noise Multiband LNA with Programmable Integrated Matching and High Isolation Switches”, a multiband single ended LNA and balun in CMOS technology is described. The LNA is implemented as a common-source stage with inductive degeneration together with on-chip programmable matching. In simulations, the design achieves 28 dB voltage gain with 1.8 dB noise figure. By reconfiguration, it can cover bands I, II and III, i.e. frequencies from 1805 to 2170 MHz. Programmable switches preceding the LNA are shown that can provide both low insertion loss and high isolation for the TX signal, which is important in a multiband architecture. The combination of a narrow band input matching and a balun is shown to have sufficient selectivity to avoid coexistence issues for a built-in WLAN transceiver transmitting at 5.8 GHz. Since the noise figure is impacted by the integrated matching inductor, the performance of the design would be significantly enhanced in an SOI process with high-Q inductors.
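The frequency span covered by the reconfigurable LNA follows from the receive (downlink) allocations of the three bands. The sketch below uses the 3GPP downlink band edges in MHz; the dictionary and variable names are illustrative.

```python
# 3GPP downlink (terminal receive) band edges in MHz.
DOWNLINK_BANDS_MHZ = {
    "I": (2110, 2170),
    "II": (1930, 1990),
    "III": (1805, 1880),
}

# Overall span the LNA input matching has to cover by reconfiguration.
f_low_mhz = min(low for low, _ in DOWNLINK_BANDS_MHZ.values())
f_high_mhz = max(high for _, high in DOWNLINK_BANDS_MHZ.values())

# Fractional bandwidth relative to the center of the span.
f_center_mhz = (f_low_mhz + f_high_mhz) / 2.0
fractional_bw_percent = 100.0 * (f_high_mhz - f_low_mhz) / f_center_mhz

print(f"Covered span: {f_low_mhz}-{f_high_mhz} MHz")
print(f"Fractional bandwidth: {fractional_bw_percent:.1f} %")
```

The span of roughly 18 % is wider than a single narrow band match would normally cover, which is why the matching is made programmable.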

Contribution: I made the design, performed the simulations and wrote the paper under supervision of the second author.

Paper VIII

Paper VIII, “A BiCMOS single ended multiband RF-amplifier and mixer with DC-offset and second order distortion suppression”, presents simulation results of a low NF multiband single ended LNA and single ended active mixer with a DC feedback loop around the mixer in BiCMOS technology. The design performance is verified for bands I, III and VIII. The mixer transconductance stage has been implemented as a programmable current-to-current feedback amplifier with suppression of the 3rd harmonic of $f_{LO}$ for improved noise figure. The mixer topology consists of one main mixer and one trim mixer connected in parallel. The feedback loop around the mixer suppresses both mixer DC-offset and second order distortion. The low frequency noise from the feedback loop is suppressed by the partition into a main and trim mixer. In Monte Carlo simulations, the presented work achieves at least +47 dBm $IIP_2$ in band I with 32 dB conversion voltage gain, i.e. a performance sufficient for a 3G transceiver. The design utilizes two external large filter capacitors that can be placed inside the IC package.

Contribution: I made the schematic design, performed the simulations and wrote the paper under supervision of the main supervisor of my licentiate thesis, Dr. Pietro Andreani at the Department of Electrical and Information Technology, Lund Institute of Technology.
Paper I

A 28 GHz SiGe QVCO with an I/Q phase error detector for an 81-86 GHz E-band transceiver

Tobias Tired, Henrik Sjöland, Carl Bryant, Markus Törmänen, Lund University

Abstract – This paper presents a 28 GHz QVCO intended to be used in an 81-86 GHz E-band transceiver. E-band transceivers using e.g. 16 QAM modulation schemes are sensitive to I/Q phase error. Already a three degree error significantly degrades the bit-error rate, and careful control of the phase error of the 28 GHz QVCO is therefore required. In the presented design the phase error can be tuned using four varactors, each connected to one of the QVCO outputs. The phase error is detected in two cross-coupled active mixers, creating a DC level proportional to the phase error. The accuracy of the detector has been verified by Monte Carlo simulations showing a 3 sigma phase error of one degree. The QVCO is designed in a SiGe process with \( f_T = 200 \text{ GHz} \). The current consumption is 14 mA from a 1.5 V supply and 57 mA from a 2.5 V supply. The 2.5 V supply is dedicated to the detector and output buffers. At 1 MHz offset the phase noise equals -105 dBc/Hz with a FOM of -181 dBc/Hz and a FOM\(_T\) of -186 dBc/Hz. The die area equals 1.3 mm\(^2\).

Index terms – mm-wave, QVCO, E-band, phase error, SiGe

I. INTRODUCTION

The E-band frequencies located at 71-76 GHz and 81-86 GHz are used for wireless point-to-point communication [1] and offer data rates in the gigabit range. One application for E-band links is wireless backhaul in LTE systems. The first generation of E-band radio links used modulation schemes such as QPSK, tolerating higher I/Q phase error. At present, spectral efficiency has become important even for E-band radio links, and higher order modulation schemes, e.g. 16 QAM and 64 QAM, are now available in commercial radio links. Using higher order QAM modulation, however, puts stringent requirements on the I/Q phase error of the transceiver [1]-[3]. If not minimized, the bit-error rate (BER) will increase rapidly with increasing phase error. The I/Q phase error originates from both static sources and random ones, changing from circuit to circuit. The random error is caused by device mismatch, whereas static errors can be caused by thermal gradients on the chip, which are especially important to control in a design with an integrated power amplifier. A mm-wave integrated design is sensitive to mismatch in capacitive parasitics, and therefore designing a QVCO operating at the TX carrier frequency is not preferred. Another reason is that the Q-value of the varactor decreases with increasing operating frequency. This paper describes the design and layout of a 1.5 V 28 GHz QVCO with tunable phase of the quadrature output signals, together with a detector for I/Q phase error calibration. The QVCO is the core in the transmitter architecture depicted in figure 1. The 84 GHz TX carrier is generated in a two-step up conversion. The 28 GHz QVCO presented in this paper creates four LO signals, separated in phase by 90 degrees, that drive the 28 GHz I/Q mixer. The 56 GHz LO signals for the second mixer are created by the differential second harmonic which is present at the emitters of the cross coupled transistors [4], see figure 3 outlining one half of the QVCO.
The design is made in a 0.18 \( \mu \)m SiGe HBT process with four Cu metal layers and an \( f_T \) of 200 GHz.
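The frequency plan of the two-step up conversion can be summarized with a short calculation. This is a minimal sketch assuming that both mixers use the sum frequency, so that the carrier equals three times the QVCO frequency; the function names are illustrative.

```python
def tx_carrier_hz(f_qvco_hz):
    """Carrier after two-step up conversion with LO and second-harmonic LO."""
    f_first_lo = f_qvco_hz            # 28 GHz I/Q mixer LO
    f_second_lo = 2.0 * f_qvco_hz     # second harmonic used by the second mixer
    return f_first_lo + f_second_lo

def qvco_range_hz(f_rf_low_hz, f_rf_high_hz):
    """QVCO range needed to place the carrier anywhere in the RF band."""
    return f_rf_low_hz / 3.0, f_rf_high_hz / 3.0

carrier_hz = tx_carrier_hz(28e9)
qvco_low_hz, qvco_high_hz = qvco_range_hz(81e9, 86e9)

print(f"Carrier for a 28 GHz QVCO: {carrier_hz / 1e9:.0f} GHz")
print(f"QVCO range for 81-86 GHz: {qvco_low_hz / 1e9:.2f}-{qvco_high_hz / 1e9:.2f} GHz")
```

Under this assumption, covering the full 81-86 GHz band requires the QVCO to tune from 27 GHz to about 28.7 GHz.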

II. QVCO AND DETECTOR DESIGN

A. QVCO and detector architecture

The 28 GHz QVCO and detector architecture is illustrated in figure 2. Two identical VCOs are locked together with coupling transistors to provide quadrature oscillation [5]. The four QVCO outputs, \( I_p, I_n, Q_p, \) and \( Q_n \), are each connected both to a buffer driving the phase error detector and to an open collector output buffer required to measure the QVCO signal. In the presented design only one of the signals, \( Out_{I_p} \), from the output buffer is used for measurement. The other three buffer outputs are terminated on-chip to the supply. The output buffer must deliver high enough power for phase noise measurements.

Relative to the QVCO core, the power consumption of the detector buffer and the phase error detector is high. These parts should therefore preferably only be active during calibration. Alternatively, the detector buffer can be kept active even after calibration so as not to alter the QVCO load.

B. QVCO core
The schematic of one of the QVCO cores is depicted in figure 3. The supply is 1.5 V to reduce the power consumption, and the main stage and injection stage are biased with $I_c = 5.8$ mA and 1 mA, respectively. Generally larger main stage bias current improves the oscillator phase noise. The current sources are degenerated with resistors with a voltage drop of 180 mV to reduce their noise. In conventional VCO design with bipolar devices with high voltage swing at the output nodes, AC coupling between the base and collectors in the main stage core is utilized to prevent forward biasing of the base collector junction. In the presented design AC coupling has been eliminated, reducing core area and parasitics. At 360 mV peak output signal the base collector voltage equals 720 mV, i.e. on the limit of forward biasing. To keep the layout as symmetric as possible the tail current sources are duplicated for both cores of the QVCO.

Each part of the QVCO contains two phase error tuning blocks biased with control voltages $V_{low,j}$ and $V_{low,n}$, i.e. in total there are four tuning blocks biased with control voltages ranging from 0 to 7.7 V, which is the maximum voltage of the varactor. In order to be able to correct for the phase errors of the QVCO, a sufficient range in capacitance of the frequency tuning block for varying control voltage is required. This was accomplished by scaling the size of the AC coupling capacitor and the phase tuning varactor. In the presented design the capacitance simulated on an extracted view of one standalone phase tuning block varies from 44 fF to 36 fF. It is important to keep the ratio of the total capacitance of the four phase tuning blocks to that of the frequency tuning varactors low enough not to alter the QVCO frequency too much when the phase error is optimized.

C. Output and detector buffers
The output buffer given in figure 4 is designed as an open collector cascode. The buffers are required to be able to measure the QVCO output signal. Four buffers are included in the presented circuit, out of which three have their outputs connected to the on-chip 2.5 V supply voltage $V_{CC_{op}}$. The fourth output buffer has its input connected to node $I_p$. The phase error originating from not having all buffer outputs connected to identical loads is less than one degree due to the high isolation provided by the cascode. In a fully integrated transceiver the output buffers could be reduced in size, since they would not have to drive 50 Ω loads. In measurements the open collector output is connected to a bias-T and a 50 Ω load as depicted in figure 4a. Biased with $I_i = 7.8$ mA, the buffer delivers -2 dBm at 28 GHz to the 50 Ω load. In order not to load the QVCO core with too large a capacitive load, thereby lowering the QVCO frequency, the input AC coupling is set to 50 fF and the input device is degenerated with resistor $R_E$. The cascode device improves the high frequency gain and isolation of the buffer. The four common collector detector buffers driving the detector are also connected to the QVCO output. The detector buffers are biased with $I_i = 3.3$ mA. A common collector architecture was selected for its high input impedance. Due to capacitive parasitics the stage will not operate as a voltage follower with unity gain at 28 GHz. Instead, the QVCO output signal is attenuated, although it remains high enough to drive the detector.

The same approach to reduce the capacitive load as for the output buffer is used for the detector buffer. At the detector input the signal equals 215 mVpp.

D. Detector design
The detector is implemented as two cross coupled active mixers with a differential output voltage proportional to the phase difference from 90 degrees. Minimum internal detector phase error is crucial for the possibility to tune out the phase error of the QVCO. Monte Carlo simulations of the detector standalone have therefore been used to secure that the error of the detector is low enough not to impact the accuracy of the QVCO phase error tuning. The detector consists of two double balanced active mixers that are cross coupled. As depicted in figure 5 for mixer 1, the detector buffer output signals to the transconductance devices are denoted $I_p$ and $I_n$ and the signals to the switching pairs are named $Q_p$ and $Q_n$. Using only one active mixer as a detector would result in a detector that would present an output differing from zero volts for input signals that are 90 degrees out of phase due to internal phase shift in the active mixer. With two cross coupled mixers, i.e. with mixer 2 transconductance and switching pair signals swapped, the effect of the internal phase shift of the mixers is cancelled.
There are three different sources of error in the detector. The first is inductive and capacitive leakage coupling the LO and RF signals inside the active mixers. This error can be mitigated by careful layout, i.e. keeping the LO and RF signals well separated and preferably perpendicular. The second error arises from coupling between wires and differences in wire length in the routing of the four signals from the QVCO to the two active mixers as indicated in figure 6. The third error originates from mismatch of the active and passive devices in the active mixers. In the presented design this error has been minimized by device up scaling. Generally the high frequency performance of an active mixer is reduced if too large active devices are used, but in this implementation only the DC voltage at the mixer output is of interest. This makes it possible to reduce the detector error due to mismatch by using large active devices, sized a factor of 7 and 5 times the optimum size for maximum $f_T$ for the transconductance and switching pair, respectively. The gain of the standalone detector is simulated to 22 mV/degree phase error. The detector requires a dedicated supply voltage of 2.5 V, which is shared with the QVCO measurement buffers and the common collector buffers driving the detector. The bias current of the detector in figure 5 equals 12.9 mA. The detector is, however, only intended to be used during calibration and should thereafter be disabled.

E. QVCO inductors

To minimize capacitive losses to the substrate the inductors are implemented in the top Cu layer with a thickness of 2.8 μm. The octagonal inductors depicted in figure 6 are sized with an inner diameter of 50 μm and a trace width of 11 μm. The differential inductance of the inductor equals 0.12 nH with a Q value of 18 at 28 GHz. The supply voltage of each VCO is connected through a center tap on the inductor. It is highly important to place the phase error detector as close as possible to the detector buffers. Too large wire inductance in combination with a large detector input capacitance will otherwise create a low-pass filter that reduces the amplitude of the signal to the detector. Unequal wire length will also result in different amplitude and phase of the four LO signals. The QVCO was simulated and optimized using the ADS Momentum 2.5D EM simulator. In total a 22 port s-parameter model based on the routing layout in figure 6 was used in the simulations, accounting for coupling between the two VCO inductors as well as inductive and capacitive parasitics from wiring to both the measurement output buffers and the detector buffers.
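The inductor data above fix the total tank capacitance and the loss resistance at resonance. The following is a minimal sketch that treats the tank as a simple parallel LC circuit, with the capacitance expressed as an equivalent differential value and the inductor assumed to dominate the loss; both are simplifying assumptions, and the function names are illustrative.

```python
import math

def tank_capacitance(f0_hz, l_henry):
    """Capacitance resonating with l_henry at f0: C = 1 / ((2*pi*f0)^2 * L)."""
    omega = 2.0 * math.pi * f0_hz
    return 1.0 / (omega ** 2 * l_henry)

def parallel_resistance(f0_hz, l_henry, q_factor):
    """Equivalent parallel loss resistance of the inductor: Rp = Q * omega * L."""
    return q_factor * 2.0 * math.pi * f0_hz * l_henry

F0_HZ = 28e9        # oscillation frequency
L_DIFF_H = 0.12e-9  # differential inductance
Q_IND = 18.0        # inductor Q at 28 GHz

c_tank_ff = tank_capacitance(F0_HZ, L_DIFF_H) * 1e15
r_par_ohm = parallel_resistance(F0_HZ, L_DIFF_H, Q_IND)

print(f"Equivalent tank capacitance: {c_tank_ff:.0f} fF")
print(f"Inductor parallel resistance: {r_par_ohm:.0f} ohm")
```

The resulting capacitance budget of a few hundred femtofarads has to accommodate the frequency tuning varactors, the phase tuning blocks, the device capacitances and all wiring parasitics, which illustrates why an electromagnetic model of the routing is needed.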

III. CHIP LAYOUT

The chip layout is given in figure 7. The size of the die equals 928 μm x 1448 μm, of which the QVCO and detector occupy a smaller part, 370 μm x 515 μm. The remaining area is used for decoupling capacitors, both metal-insulator-metal type for low frequency decoupling and metal-oxide-metal capacitors realized by parallel planes for high frequency decoupling. In total there are 30 pads.

IV. SIMULATED PERFORMANCE

The performance of the QVCO and detector was simulated using the Cadence Spectre RF tool. Gummel Poon models were used for the active devices. The QVCO inductor and buffer routing depicted in figure 6 was modeled with the ADS Momentum tool. A second 12 port s-parameter model was used for the inductive and capacitive routing parasitics from the $I_n$, $I_p$, $Q_n$, and $Q_p$ connections given in figure 6 to the respective input of the mixers depicted in figure 5. All simulations are based on an extracted view of the design including capacitive and resistive parasitics. The simulated QVCO frequency vs. varactor control voltage $V_{ctrl}$ is given in figure 8. When the control voltage is too low, the varactors will be forward biased and the phase noise will increase strongly. Maximum control voltage equals the breakdown voltage of the varactor, 7.7 V, plus the 1.5 V supply voltage, i.e. 9.2 V. The QVCO can be tuned between 26 GHz and 31.0 GHz, i.e. 18 % tuning range for a maximum phase noise increase of 3 dB. If the routing from the $I_n$, $I_p$, $Q_n$, and $Q_p$ connections given in figure 6 to the two mixers in the detector is not perfectly symmetric, the detector output voltage will differ from zero volts when there is exactly 90 degrees between the signals $I_n$, $I_p$, $Q_n$, and $Q_p$ in figure 6. This adds a static offset of 2.4 degrees to the phase tuning.

The midpoint of the tuning voltage range equals half of 7.7 V, i.e. 3.85 V. A phase tuning voltage, $V_{\text{tune ctrl}}$, is defined to vary the four phase tuning varactors simultaneously by setting $V_{\text{tune}, I_p} = V_{\text{tune}, I_n} = 3.85 - V_{\text{tune ctrl}}$ and $V_{\text{tune}, Q_p} = V_{\text{tune}, Q_n} = 3.85 + V_{\text{tune ctrl}}$. In figure 9 the phase change and differential detector output voltage are simulated vs. $V_{\text{tune ctrl}}$ without inductive and capacitive coupling in the routing to the mixers. At the detector input, curve M2, the I/Q phase can be changed 14.5 degrees. The detector output voltage, curve M1, has a 0.44 V output voltage range for a 14.5 degree I/Q phase error at the detector input.

The internal error of the phase error detector was verified in a Monte Carlo simulation with the detector driven by four LO signals separated by 90 degrees.

With 200 iterations, $|\mu| + 3|\sigma|$ equals 23 mV in figure 10. Since an I/Q phase error of 1.0 degree corresponds to a detector output voltage of 22 mV, the accuracy of the detector is high enough to detect a QVCO phase error of 1 degree. The simulated performance of the presented QVCO is compared to other published and measured QVCOs in table 1. The performance is in line with these works, but this work has the additional advantage of accurate quadrature generation through calibration.
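The step from the Monte Carlo voltage spread to an equivalent phase accuracy can be sketched as follows. The helper computing $|\mu| + 3|\sigma|$ from a list of simulated offset voltages is illustrative (the paper's actual post-processing is not described), and the sample standard deviation is an assumption.

```python
import statistics

def mu_plus_3sigma(samples_mv):
    """Worst-case offset estimate |mu| + 3*sigma from Monte Carlo samples."""
    mu = statistics.mean(samples_mv)
    sigma = statistics.stdev(samples_mv)  # sample standard deviation (assumption)
    return abs(mu) + 3.0 * sigma

DETECTOR_GAIN_MV_PER_DEG = 22.0  # simulated detector gain
WORST_CASE_OFFSET_MV = 23.0      # |mu| + 3|sigma| reported for 200 iterations

# Detector offset expressed as an equivalent I/Q phase error.
phase_error_deg = WORST_CASE_OFFSET_MV / DETECTOR_GAIN_MV_PER_DEG
print(f"Equivalent detector phase error: {phase_error_deg:.2f} degrees")
```

With the reported numbers the equivalent error is just above one degree, in line with the stated 3 sigma accuracy.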

<table>
<thead>
<tr>
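<th>Ref.</th>
<th>Technology</th>
<th></th>
<th>Frequency (GHz)</th>
<th>Phase noise (dBc/Hz)</th>
<th>Supply (V)</th>
<th>Power (mW)</th>
<th>FOM (dBc/Hz)</th>
<th>FOM<sub>T</sub> (dBc/Hz)</th>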
</tr>
</thead>
<tbody>
<tr>
<td>[8]</td>
<td>SiGe QVO</td>
<td>Y</td>
<td>24.2-25.2</td>
<td>-112</td>
<td>1.35</td>
<td>24</td>
<td>-186</td>
<td>-178</td>
</tr>
<tr>
<td>[6]</td>
<td>SiGe QVO</td>
<td>N</td>
<td>36.4-38.4</td>
<td>-102</td>
<td>5.0</td>
<td>5.8</td>
<td>-146</td>
<td>-182</td>
</tr>
<tr>
<td>This work</td>
<td>SiGe</td>
<td>Y</td>
<td>20.0-31.0</td>
<td>-105</td>
<td>1.5</td>
<td>21</td>
<td>-181</td>
<td>-183</td>
</tr>
</tbody>
</table>

Table 1. Performance comparison to published VCOs and QVCOs
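The figures of merit quoted for this work can be reproduced from the numbers given in the paper. The sketch below assumes the commonly used definitions FOM = PN − 20·log10(f0/Δf) + 10·log10(P/1 mW) and FOM_T = FOM − 20·log10(TR/10 %), which the paper does not state explicitly, and takes f0 as the center of the 26-31 GHz tuning range reported in section IV.

```python
import math

def vco_fom(pn_dbc_hz, f0_hz, offset_hz, p_dc_w):
    """FOM = PN - 20*log10(f0/offset) + 10*log10(P_dc / 1 mW)."""
    return (pn_dbc_hz
            - 20.0 * math.log10(f0_hz / offset_hz)
            + 10.0 * math.log10(p_dc_w / 1e-3))

def vco_fom_t(fom, tuning_range_percent):
    """FOM_T = FOM - 20*log10(tuning range in percent / 10)."""
    return fom - 20.0 * math.log10(tuning_range_percent / 10.0)

F_LOW_HZ, F_HIGH_HZ = 26e9, 31e9        # simulated tuning range
f_center_hz = (F_LOW_HZ + F_HIGH_HZ) / 2.0
tuning_range_percent = 100.0 * (F_HIGH_HZ - F_LOW_HZ) / f_center_hz

p_core_w = 14e-3 * 1.5                  # 14 mA from the 1.5 V supply
fom = vco_fom(-105.0, f_center_hz, 1e6, p_core_w)
fom_t = vco_fom_t(fom, tuning_range_percent)

print(f"Tuning range: {tuning_range_percent:.1f} %")
print(f"FOM: {fom:.1f} dBc/Hz, FOM_T: {fom_t:.1f} dBc/Hz")
```

Under these assumptions the calculation gives values that round to the −181 dBc/Hz and −186 dBc/Hz stated in the abstract.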

V. CONCLUSIONS

The presented QVCO including a detector for I/Q phase error minimization makes it possible to optimize the transceiver for lowest possible EVM. This is of high importance for E-band radio links using higher order modulation schemes like 16 QAM and 64 QAM. The QVCO features dedicated varactors for quadrature phase fine tuning, with a range of 14 degrees. If feedback from the detector output to the control voltage of these varactors is implemented, the 3σ accuracy of one degree of the detector would enable excellent EVM performance.

ACKNOWLEDGEMENT

The authors would like to thank the Swedish government funding agency Vinnova, the System Design on Silicon (SoS) industrial excellence center, and Infineon Technologies for sponsoring this project.

REFERENCES

[8] Q. Zou, K. Ma, W. Ye, K. S. Yeo, “A Low Power Millimetre-wave VCO in 0.18 µm SiGe BiCMOS Technology,” IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), 2012
Paper II

A 28 GHz SiGe PLL for an 81–86 GHz E-band beam steering transmitter and an I/Q phase imbalance detection and compensation circuit

Tobias Tired1 · Henrik Sjöland1,2 · Per Sandrup3 · Johan Wernehag1 · Imad ud Din2 · Markus Törmänen1

Received: 25 January 2015 / Revised: 11 May 2015 / Accepted: 18 June 2015 / Published online: 26 June 2015
© Springer Science+Business Media New York 2015

Abstract This paper presents two circuits, a complete 1.5 V 28 GHz SiGe beam steering PLL and a standalone 28 GHz QVCO with I/Q phase imbalance detection and compensation. The circuits were designed in a SiGe process with $f_T = 200$ GHz. The PLL is intended to be used for beam steering in an 81–86 GHz E-band transmitter. Phase control is implemented by DC current injection at the output of a Gilbert architecture phase detector, showing a simulated phase control sensitivity of $1.2^\circ/\mu A$ over a range close to $180^\circ$. The simulations use layout parasitics for the QVCO, frequency divider, and phase detector, and an electromagnetic model for the QVCO inductors. The divider is implemented with four cascaded divide-by-two current-mode-logic blocks for a reference frequency of 1.75 GHz. For closed loop simulations of PLL noise and stability, the QVCO is represented by a behavioral model with added phase noise. This simulation technique shortened the PLL simulation time. The PLL in-band phase noise at 1 MHz offset equals $-115$ dBc/Hz. Excluding output buffers, the entire PLL consumes 52 mW plus a minimum of 7 mW from a variable high voltage supply required to extend the PLL locking range. The measured phase noise of the standalone QVCO equals $-100$ dBc/Hz at 1 MHz offset. Since E-band radio links utilize higher order QAM modulation, the bit-error rate is sensitive to I/Q phase error. In the measured standalone QVCO with I/Q phase imbalance detection and compensation, the error is detected in two cross coupled active mixers that have an output DC level proportional to the phase error. The error can then be eliminated by adjusting the bias of four varactors connected to the QVCO outputs. The current consumption of the chip equals 14 mA from a 1.5 V supply and 57 mA from a 2.5 V supply dedicated to the detector and 28 GHz output measurement buffers.

Keywords PLL · E-band · mm-Wave · EM-simulation · QVCO · Beam steering

1 Introduction

Data links with gigabit capacity for wireless point-to-point communication [1] can be implemented in the two E-bands located at 71–76 and 81–86 GHz. To achieve sufficient range, highly directional antennas with narrow lobes have to be used at both the receiver and transmitter sides. To ease installation and maintenance, beam steering [1] would thus be beneficial. Beam steering can be implemented in the RF path, in the LO path, or in the baseband [2], where implementation in the digital domain is the most flexible solution but also has the drawback that separate ADCs and DACs are required for each receive and transmit path. The architecture of the LO signal generation is also important to consider. Using a single quadrature LO generation and distributing its four output signals across the chip to each transceiver is a less attractive solution [2, 3]. At mm-wave frequencies the power consumption of the LO buffers required for signal distribution would be very large in comparison with other transceiver parts. Long routing will also result in phase and amplitude mismatch of the LO signals [2–4]. Since spectral efficiency of the radio link is important, next generation E-band links will use e.g. 256 and 512 QAM modulation. To maintain a low bit-error-rate (BER) there will then be stringent requirements on I/Q phase error [5–7].

In the presented transmit path architecture given in Fig. 1, a 28 GHz QVCO is used to generate the 84 GHz TX carrier in two steps [3, 8, 9]. An architecture with a QVCO operating directly at the 84 GHz carrier frequency is not preferred due to stringent requirements on I/Q phase error for higher order QAM modulation [5–7]. The effect of device mismatch is worse in a higher frequency QVCO and I/Q mixer. Instead a 28 GHz I/Q mixer first up-converts the baseband signal. The 84 GHz TX signal is then created by mixing with the 56 GHz differential second harmonic present at the emitters of the cross coupled QVCO transistors [3, 8, 9].

This paper first describes the design and layout of circuit 1, a 28 GHz beam steering PLL, see Fig. 2.

For high speed operation, the frequency divider is designed with current mode logic (CML) [10, 11], and divides by 16 for a reference frequency of 1.75 GHz. The phase detector (PD) [12–16] uses a Gilbert type architecture [17–19]. The loop filter is implemented as an active filter followed by a passive RC link [17]. Phase control is implemented by injecting a DC current into the phase detector load [3, 4]. The phase detector output current is close to proportional to the phase difference of the phase detector inputs. By offsetting the detector output current, its input phase difference must change to keep the output voltage and QVCO frequency constant. The issues of routing mm-wave LO signals across the chip have been avoided by using a separate PLL for each transmit path, see Fig. 3. The PLL is intended to reside as close as possible to the up conversion mixers, and the only signal routed to different transmit paths is the low frequency reference signal of 1.75 GHz. The phase of each LO signal can be controlled by injecting DC current into the loop filter of the PLL.
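
The frequency plan described in the introduction can be checked with a few lines of arithmetic. The sketch below only restates numbers given in the text (1.75 GHz reference, division by 16, second harmonic at the QVCO emitters, 81–86 GHz band); the variable names are illustrative and the 3:1 ratio between carrier and QVCO frequency follows from the two step up-conversion.

```python
# Frequency plan sketch; values are from the text, names are illustrative.
F_REF_GHZ = 1.75      # reference frequency routed to each transmit path
DIVIDE_RATIO = 16     # PLL feedback divider ratio

f_qvco = F_REF_GHZ * DIVIDE_RATIO        # QVCO frequency when the PLL is locked
f_second_harmonic = 2 * f_qvco           # taken from the QVCO emitter nodes
f_tx = f_qvco + f_second_harmonic        # carrier after the two mixing steps

# QVCO range needed to cover the 81-86 GHz band (carrier = 3 x QVCO frequency)
qvco_band = (81.0 / 3, 86.0 / 3)

print(f_qvco, f_second_harmonic, f_tx)
print(f"QVCO range: {qvco_band[0]:.2f}-{qvco_band[1]:.2f} GHz")
```

Under these assumptions the QVCO has to cover roughly 27.0 to 28.7 GHz for the carrier to span the full 81–86 GHz band.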

The important requirement of low I/Q phase error is addressed in circuit 2, outlined in Fig. 4, featuring a 28 GHz QVCO with I/Q phase control together with a QVCO phase error detector. In the detector, the QVCO phase error generates a differential DC output voltage that is proportional to the phase error between the I and Q output, i.e. to the phase deviation from 90° [8]. The four outputs from the QVCO, $I_p$, $I_n$, $Q_p$, and $Q_n$ are connected to the measurement output buffer as well as a buffer driving the phase error detector. Measurement results are presented for both the QVCO and the detector and are compared with previously simulated performance of this circuit [8].

Both circuit 1 and 2, outlined in Figs. 2 and 4 respectively, are designed in a 0.18 μm SiGe HBT process with four Cu metal layers with a top layer thickness of 2.8 μm. The process does not have any MOS devices. In this work, SiGe technology was selected instead of CMOS due to planned integration of the transmitter with a power amplifier (PA). An advantage with SiGe technology is the high \( f_T \) in combination with high collector emitter breakdown voltage, \( BV_{CEO} \), making it possible to design millimeter wave PAs with high output power. There are three different npn devices available, optimized with different tradeoffs between open base collector emitter breakdown voltage, \( BV_{CEO} \), and transit frequency, \( f_T \). For all blocks except the active low pass filter, a device with a nominal breakdown voltage of 1.5 V is used. In the active low pass filter of circuit 1, which uses a higher supply voltage, a high voltage device with a nominal breakdown voltage \( BV_{CEO} \) equal to 4.0 V and a lower \( f_T \) of 35 GHz is used. Where applicable, metal-oxide-metal (MOM) capacitors have been formed by connecting the Cu 1 and Cu 2 layers together to form a bottom plate, as well as connecting the Cu 3 and Cu 4 layers to form a top plate. At mm-wave frequencies, e.g. at 28 GHz, the Q-value of the custom made MOM capacitors exceeds that of the MIM capacitors provided by the SiGe technology [8]. Since the supply is 1.5 V and the process does not have any MOS devices it is difficult to design the logic blocks required for a conventional phase frequency detector using a charge pump [9, 12–16].

2.2 QVCO core design

The schematics of the QVCO core and architecture for I/Q phase error tuning [8, 9], used both standalone, see Fig. 4, and in the PLL, see Fig. 2, are shown in Fig. 5(a) and (b) respectively. The main and injection stages are designed with bias currents of 5.8 and 1.0 mA respectively. The supply voltage equals 1.5 V. For layout symmetry purposes there are two main varactors realized as reverse biased pn-junctions and controlled with bias voltage \( V_{ctl} \). The QVCO inductors, \( L_{QVCO} \), are simulated and modeled using ADS Momentum. The design is identical to the QVCO presented in [8, 9], except that the main varactor is 10 % smaller in the PLL design in order to increase the maximum oscillation frequency. Each part of the QVCO contains two phase error tuning blocks biased with control voltages \( V_{tune,p} \) and \( V_{tune,n} \). With two QVCO cores there are four tuning blocks biased with control voltages ranging from 0 to 7.7 V, which is the maximum allowed voltage of the varactor in the SiGe process. The I/Q phase error can be minimized by altering these control voltages [8]. Changing its control voltage from 0 to 7.7 V reduces the capacitive load of one tune block from 44 to 36 fF. This is a sufficient capacitance change to enable a simulated I/Q phase tuning of 14.5° [8]. With control voltages as high as 7.7 V the relation between control voltage and phase error is not perfectly linear [8]. This nonlinearity can however be compensated for using a look-up table.

2.3 Frequency divider design

The frequency divider [9] in Fig. 2 is implemented with four cascaded CML divide by two circuits [10, 11]. The architectures of the CML latch and divider are given in Fig. 6(a) and (b), respectively. The divide by two function is realized with the two latches in Fig. 6(b) connected in negative feedback with data outputs of the second latch connected to data inputs of the first latch. The presented divider performance is optimized for a supply voltage as low as 1.5 V. Common CML dividers in SiGe technology operate with higher supply voltages allowing for tail current sources, and also for emitter followers at the CML latch output [10, 11]. Low supply voltage operation has been achieved with the topology shown in Fig. 6(a) in combination with a low voltage swing at the latch output nodes \( Q_p \) and \( Q_n \). If the voltage swing is too large, the base–collector junction of the lower pair will be forward biased when the voltage at \( Q_p \) and \( Q_n \) is at its lowest value, thus significantly reducing the maximum divider frequency.
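
The divide by two principle described above, two latches in a loop with inverted feedback, can be illustrated with an idealized logic-level sketch. This is not a model of the CML circuit (no swing, bias or speed limits are represented); the function names are hypothetical and the latches are assumed ideal.

```python
def divide_by_two(clock):
    """Ideal master/slave latch pair with the inverted slave output fed back."""
    master, slave = 0, 0
    output = []
    for clk in clock:
        if clk == 0:
            master = 1 - slave   # first latch transparent: loads inverted output
        else:
            slave = master       # second latch transparent: passes the state on
        output.append(slave)
    return output


def rising_edges(signal):
    """Count 0 -> 1 transitions in a sampled logic signal."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a == 0 and b == 1)


clock = [0, 1] * 64              # 64 input periods, two samples per period
signal = clock
for _ in range(4):               # four cascaded stages give division by 16
    signal = divide_by_two(signal)

print(rising_edges(clock), rising_edges(signal))   # prints: 64 4
```

Each stage halves the number of rising edges, so 64 input periods leave 4 output periods after four stages, matching the ratio between the 28 GHz QVCO signal and the 1.75 GHz reference.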

The first divide-by-two stage is biased with a total current of 6.0 mA and has load resistors of 70 \( \Omega \). The second stage consumes 3.3 mA with load resistors equal to 150 \( \Omega \). The third and fourth stages are equal and consume 2.1 mA each and are loaded with 200 \( \Omega \). The total power consumption of the divider equals 20 mW. For isolation purposes two differential buffers, see Fig. 8(a), are placed between the QVCO and the divider. The buffers are biased with tail currents of 1.1 mA and have load resistors \( R_c \) of 400 \( \Omega \). One buffer is driven by the \( I_p \) and \( I_n \) QVCO outputs and a second buffer is driven by the \( Q_p \) and \( Q_n \) outputs. The second buffer is connected to the divider while the first buffer is unloaded. Using only one buffer, the load of the QVCO would become asymmetric resulting in a large I/Q phase error.
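
The stated divider power can be cross-checked from the per-stage bias currents and the 1.5 V supply given above. The sketch only sums the numbers in the text; the variable names are illustrative.

```python
SUPPLY_V = 1.5
stage_currents_ma = [6.0, 3.3, 2.1, 2.1]   # stages 1 to 4, from the text

total_current_ma = sum(stage_currents_ma)
total_power_mw = total_current_ma * SUPPLY_V

print(f"{total_current_ma:.1f} mA, {total_power_mw:.2f} mW")
```

The sum is 13.5 mA, i.e. about 20 mW, which agrees with the quoted divider power consumption.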

2.4 Phase detector and phase control design

Fig. 5 QVCO core schematic and architecture. a QVCO core schematic, b QVCO architecture

Fig. 6 CML divider block. a CML latch, b CML divider

A conventional PLL designed in a CMOS or BiCMOS technology is usually implemented using a phase frequency
detector (PFD) and a charge pump (CP) [12–15]. The PFD can be implemented using two D-type flip flops and an AND gate [13]. In the presented PLL, see Fig. 2, a Gilbert mixer type of phase detector (PD) [17–19] is instead used. A drawback of using this type of phase detector compared to a PFD is the reduced input phase range, which in the PD is \( \pi \) radians [19] and in the PFD \( 4\pi \) radians [13]. Furthermore, the common architecture comprising a PFD and CP generates current pulses that are dependent on both the frequency and phase difference of the reference signal \( f_{\text{ref}} \) and the divided signal of the VCO, \( f_{\text{div}} \), facilitating PLL lock acquisition. The Gilbert type phase detector, on the other hand, has a DC output that depends on the phase difference alone. If the frequencies of the two input signals \( f_{\text{ref}} \) and \( f_{\text{div}} \) differ, the output signal from the PD will be time varying, resulting in potential PLL locking difficulties. In Fig. 7 outlining the PD, the divided and buffered QVCO signal is connected to the voltage-to-current converting transistors Q1, while the buffered reference signal is supplied to the bases of the current commutating transistors Q2. If the current switching from one branch to the other takes a long time there will be a region of reduced gain in the output voltage versus input phase offset characteristic. Therefore a limiting buffer has been placed between the external sinusoidal reference signal and the PD to make the reference signal more square wave shaped with fast transitions. The resistive load \( R_1 \) at the PD output is set to 50 \( \Omega \) to give enough voltage headroom for the output signal. To attenuate signal feed through of harmonics of the reference frequency, capacitors \( C_1 \) of 2.5 pF are placed in parallel with the resistive load.
The \(-3\) dB frequency equals 1.3 GHz, which is far above the loop bandwidth and will thus not affect the loop dynamics. The gain of the PD depends on the size of the load resistors \( R_1 \) and the collector bias current, \( I_{c,PD} \), of the transconductance transistors. The current was equal to 0.7 mA, which resulted in a gain of 0.55 mV/degree.
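
The quoted corner frequency of the PD load follows from the first-order RC relation $f_{-3\,\mathrm{dB}} = 1/(2\pi R_1 C_1)$. The sketch below evaluates it for the 50 Ω and 2.5 pF values in the text; it assumes an ideal single-pole load and ignores parasitics.

```python
import math


def rc_corner_hz(resistance_ohm, capacitance_f):
    """-3 dB frequency of an ideal first-order RC low pass."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)


f_pd_load = rc_corner_hz(50.0, 2.5e-12)
print(f"{f_pd_load / 1e9:.2f} GHz")
```

The result is close to 1.3 GHz, well above the loop bandwidth reported later in the paper.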

One way of mitigating the drawbacks of the PD is to implement a bias circuit that during the PLL locking process sweeps the VCO control voltage in order to make \( f_{\text{ref}} \) and \( f_{\text{div}} \) equal at a certain time, thus making the PLL lock more quickly and reliably. In this work, however, the sweep circuit has not been implemented. Instead PLL locking is enabled by using a separate and variable supply voltage for the active low pass filter [17], in Fig. 9, which has its output connected to the QVCO varactor. Phase control of the 28 GHz QVCO was implemented by injecting the collector DC current from transistor Q3 into the PD load, thereby offsetting the PD and forcing a phase difference between the phase detector inputs that depends on the amount of injected DC current. A similar method of phase control, but for a CMOS design including a charge pump, has been described in [3] and [4]. Both the reference and divider signals in Fig. 7 are AC-coupled into the phase detector and biased using current mirrors. The variable DC current used for phase control is also created using a current mirror. In Fig. 7 these current mirrors have been omitted for clarity. Using this biasing strategy is beneficial since the phase of the 28 GHz QVCO output signal is then controlled by a current ratio, which can be accurately programmed by matched current sources.

2.5 Reference and divider buffer plus 1.75 GHz divider output buffer design

Buffers, outlined in Fig. 8(a), are placed between the divider and the phase detector in Fig. 2, as well as between the input reference signal and the phase detector. The divider buffer is required to isolate the divider output from the phase detector. Otherwise the divider output load changes as a function of the phase difference between \( f_{\text{ref}} \) and \( f_{\text{div}} \). Such a change in load impedance would result in discontinuities in the PD output voltage versus input phase difference. The buffer between the external reference signal \( f_{\text{ref}} \) and the PD makes the signal more square-wave shaped which is required to achieve a constant gain of the PD versus the phase difference between its input signals. The divider buffer, driving the transconductance part of the PD, was biased with a tail current of 1.1 mA and loaded with resistors \( R_2 \) equal to 100 \( \Omega \). This gives a differential output signal of 97 mV\(_p\). The reference frequency buffer driving the switching pairs of the PD was biased with a tail current of 3.0 mA, which in combination with load resistors \( R_2 \) of 100 \( \Omega \) gives a differential output signal swing of 650 mV\(_p\). A differential open collector cascode measurement buffer, shown in Fig. 8(b), for the 1.75 GHz divider output signal, is placed after the divider buffer. In total the measurement buffer consumes 10.4 mA. The purpose of the buffer is to verify correct operation of the divider. One side of the buffer is connected to an external bias-T, while the other is internally terminated to the supply. This buffer does not need to be active during PLL operation.

Fig. 7 Phase detector with PLL phase control, biasing circuitry not shown.

2.6 Loop filter design

The PLL loop filter, given in Fig. 9, has been implemented as a combination of an active RC filter and a passive RC link [17]. With R1 set to zero ohm, $C_1$ equal to 7 pF in combination with phase detector load resistor $R_1$, see Fig. 7, equal to 50 $\Omega$, a pole is introduced at 8.2 MHz. The active filter also provides a zero at a frequency determined by the feedback resistor $R_1$ equal to 1 k$\Omega$ and capacitor $C_1$ that can be used to reduce the phase shift of the filter, thereby improving the PLL phase margin. The filter is loaded with resistor $R_2$ equal to 1 k$\Omega$. The filter tail bias current, $I_{bias,LF}$, is equal to 2.0 mA. The common collector stage, i.e. transistor $Q_2$ and resistor $R_3$ equal to 3.5 k$\Omega$, provides isolation between the active filter and the passive RC link consisting of resistor $R_4$ and capacitor $C_4$. The intention of the passive filter is to introduce a second pole in order to improve spur suppression. The passive filter has $R_4$ equal to 1 k$\Omega$ and $C_4$ equal to 20 pF, where 4 pF out of $C_4$ reside close to the QVCO core to improve suppression of high frequency signals on the control voltage. From [9], the simulated tuning range of the QVCO is 4.7 GHz for a varactor control voltage, $V_{ctrl}$, ranging from 1.0 to 9.2 V. In order to be able to lock the PLL over such a wide varactor control voltage range, the locking range of the PLL can be moved across the QVCO tuning range by altering its supply voltage VCC_LF [17]. High voltage npn devices with open base breakdown voltage, $BV_{CEO}$, of 2.5 V are used in the low pass filter design. The common collector devices must be sized large enough to handle the increase in collector current that results from an increase in supply voltage. For example, for VCC_LF equal to 9 V, the collector current of the $Q_2$ devices is 2.0 mA, giving a total loop filter power consumption of 54 mW, whereas for a supply of 2.7 V giving $V_{ctrl} = 1$ V the power consumption is only 7 mW.
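
The two power figures above can be related to the bias currents with a short calculation. The sketch assumes, which the text does not state explicitly, that the filter is differential with two $Q_2$ emitter followers carrying 2.0 mA each in addition to the 2.0 mA tail current; under that assumption the 54 mW figure is reproduced.

```python
TAIL_CURRENT_MA = 2.0        # I_bias,LF from the text
FOLLOWER_CURRENT_MA = 2.0    # per Q2 device at VCC_LF = 9 V, from the text
N_FOLLOWERS = 2              # assumption: one follower per differential side

# Total supply current and power at the 9 V setting
i_total_high_ma = TAIL_CURRENT_MA + N_FOLLOWERS * FOLLOWER_CURRENT_MA
p_high_mw = i_total_high_ma * 9.0

# Total supply current implied by the quoted 7 mW at 2.7 V
i_total_low_ma = 7.0 / 2.7

print(f"{p_high_mw:.0f} mW at 9 V, {i_total_low_ma:.1f} mA implied at 2.7 V")
```

The much lower implied current at 2.7 V is in line with the remark that the follower current grows with the supply voltage.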

2.7 Verilog-A representation of the QVCO

In this work a method using a Verilog-A behavioral model [20–22] of the QVCO is proposed to simulate the PLL in locked mode using SpectreRF periodic steady-state analysis (PSS). With increasing division ratio of the PLL divider, the frequency ratio between the reference frequency $f_{ref}$ and the QVCO frequency increases, causing convergence difficulties if a layout parasitic extracted representation of the QVCO is used. To further speed up simulation the behavioral model only models a VCO and not a QVCO, since only two phases of the QVCO are required as input to the divider block. The behavioral model of a fixed frequency oscillator with phase noise provided in the standard Cadence module library rfLib has been modified into a voltage controlled oscillator (VCO) with phase noise. The VCO model is based on the relation between frequency and phase, i.e. phase is equal to the time-integral of frequency [22]. The frequency of the Verilog-A VCO is controlled by four input parameters: the VCO sensitivity, $K_{VCO}$, the nominal frequency, $f_{nom}$, the varactor voltage, $V_{varac}$, and the varactor nominal control voltage, $V_{nom}$. The instantaneous VCO frequency is given by

$$f_{VCO} = f_{nom} + K_{VCO}(V_{varac} - V_{nom}) \quad (1)$$

The integrator operator in Verilog-A, $idt$ [22], is then used to derive the output voltage of the VCO, $V_{out}$, where the VCO output amplitude equals $A$.

$$V_{out} = A \sin\left(2\pi \int f_{VCO}\, dt\right) \quad (2)$$

In Verilog-A (2) is written as

$$V(\text{out}) \;\texttt{<+}\; A * \sin(2 * \texttt{`M\_PI} * \text{idt}(f_{VCO})) \quad (3)$$
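
As an illustration of Eqs. (1) to (3), the following sketch reproduces the same phase integration numerically. It is not the Verilog-A module from the paper and includes no phase noise; the function names are hypothetical, the integral is approximated by a rectangular sum, and the parameter values (28 GHz nominal frequency at 3.0 V, 200 MHz/V sensitivity) are the ones quoted later for the phase steering simulation.

```python
import math


def vco_phase(v_varac_samples, dt, f_nom, k_vco, v_nom):
    """Accumulated phase in radians after each sample, following Eqs. (1)-(2)."""
    phase = 0.0
    phases = []
    for v_varac in v_varac_samples:
        f_vco = f_nom + k_vco * (v_varac - v_nom)   # Eq. (1)
        phase += 2.0 * math.pi * f_vco * dt         # rectangular stand-in for idt()
        phases.append(phase)
    return phases


def vco_output(phases, amplitude=1.0):
    """Output voltage samples, V_out = A * sin(phase)."""
    return [amplitude * math.sin(p) for p in phases]


dt = 1e-13                                   # 0.1 ps time step
v_varac = [3.5] * 10000                      # constant control voltage for 1 ns
phases = vco_phase(v_varac, dt, f_nom=28e9, k_vco=200e6, v_nom=3.0)
v_out = vco_output(phases, amplitude=0.2)    # amplitude value is arbitrary

cycles = phases[-1] / (2.0 * math.pi)
print(f"{cycles:.3f} cycles in 1 ns")
```

With the control voltage 0.5 V above nominal, the model runs at 28.1 GHz and therefore completes 28.1 cycles in 1 ns.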

In the Cadence Verilog-A module, the phase noise is added to the VCO output signal using two noise levels, \( n_1 \) and \( n_2 \), defining the break points in frequency, \( f_1 \) and \( f_2 \), for a phase noise slope of 20 and 30 dB/decade respectively. The flicker noise can be activated using a control parameter \( f_c \). Using the phase noise shaping parameters a good match between behavioral and schematic level QVCO phase noise can be achieved. The \( K_{VCO} \) parameter is set to a value that matches the sensitivity for a schematic level simulation at a certain control voltage, \( V_{nom} \), and the nominal frequency, \( f_{nom} \), is set to the QVCO frequency at that voltage. The behavioral model of the QVCO enables fast closed loop PLL noise performance simulations using Spectre RF PSS. In conventional PLL design [17], the phase detector, VCO and frequency divider are replaced with equivalent linear behavioral models that can be used for ac simulations only. These small signal models are then used to simulate the open and closed loop transfer functions as well as stability. With the presented simulation technique, using a Verilog-A model of the QVCO including phase noise, performance and stability can be simulated directly with parasitic extracted views for the phase detector, low pass filter and divider, without using linear models.

3 28 GHz QVCO and I/Q phase imbalance detection and compensation circuit design

3.1 Introduction

The beam steering PLL is based on a QVCO core that has been designed and measured in a previous circuit outlined in Fig. 4: a 28 GHz QVCO with I/Q phase imbalance detection and compensation [8]. A phase error in the QVCO can be cancelled by changing the bias of the four tuning varactors shown in Fig. 4. The resulting capacitance change is sufficient to enable a simulated I/Q phase tuning range of 14.5°. The phase error is measured using two cross coupled active mixers, outlined in Fig. 11. This detector has a differential DC output voltage proportional to the phase deviation from 90° [8].

Fig. 10 Buffer architectures. a Output buffer, b detector buffer

3.2 Output and detector buffers

As shown in Fig. 4, output and detector buffers are coupled to the collectors of the switching pairs of the QVCO [8]. The output buffer given in Fig. 10(a) is designed as an open collector cascode. A similar buffer is present in the PLL design. The buffers are required to be able to measure the QVCO output signal. Four buffers are included in the presented circuit, out of which three have their outputs connected to the on-chip 2.5 V supply \( V_{CC_{det}} \). The fourth output buffer has its output connected to an RF pad. The phase error originating from not having all buffer outputs connected to identical loads is less than 1° due to the high isolation provided by the cascode. In a fully integrated transceiver, not having to drive 50 \( \Omega \) loads, the output buffers could be reduced in size. In measurements the open collector output is connected to a bias-T and a 50 \( \Omega \) load as depicted in Fig. 10. Biased with \( I_c = 7.8 \) mA, the buffer delivers \(-2 \) dBm at 28 GHz to the 50 \( \Omega \) load. In order not to load the QVCO core with too large a capacitive load, thereby lowering the QVCO frequency, the input AC coupling is set to 50 fF and the input device is degenerated with resistor \( R_{E_1} \). A cascode architecture improves the high frequency gain and isolation of the buffer. The detector buffers use a common collector stage given in Fig. 10(b) supplied with \( V_{CC_{det}} \) and biased with \( I_c = 3.3 \) mA. A common collector architecture was selected for its high input impedance. Due to capacitive parasitics the stage will not operate as a voltage follower providing unity gain at 28 GHz. Instead the QVCO output signal voltage is attenuated while still high enough to drive the detector. At the detector input the signal amplitude is 215 mVpp.
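
To put the quoted output level in perspective, the $-2$ dBm delivered to the 50 Ω load can be converted to power and voltage. The sketch assumes a sinusoidal signal and a purely resistive load; the function name is illustrative.

```python
import math


def dbm_to_peak_voltage(power_dbm, load_ohm=50.0):
    """Peak voltage of a sine wave delivering the given power to a resistive load."""
    power_w = 1e-3 * 10.0 ** (power_dbm / 10.0)
    v_rms = math.sqrt(power_w * load_ohm)
    return math.sqrt(2.0) * v_rms


power_mw = 10.0 ** (-2.0 / 10.0)
v_peak = dbm_to_peak_voltage(-2.0)
print(f"{power_mw:.2f} mW, {v_peak * 1e3:.0f} mV peak")
```

This corresponds to about 0.63 mW, or roughly 250 mV peak across the 50 Ω load.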

3.3 Detector design

The QVCO phase error detector in Fig. 4, outlined in Fig. 11, is implemented as two cross coupled double balanced active mixers with a differential output voltage proportional to the phase difference from 90° [8]. Minimum internal detector phase error is crucial for the possibility to tune out the phase error of the QVCO. Monte Carlo simulations of the standalone detector have therefore been used to ensure that the error of the detector is low enough, i.e. that the internal error of the detector is below 1.0° [8]. As can be seen in Fig. 11 for mixer 1, the detector buffer output signals to the transconductance devices are denoted $I_p$ and $I_n$, and the signals to the switching pairs $Q_p$ and $Q_n$. Using only one active mixer as a detector would result in an output differing from zero volts for input signals that are 90° out of phase, due to internal phase shift in the mixer. With two cross coupled mixers connected in parallel, with mixer transconductance and switching pair signals swapped, the I and Q signals have a symmetrical load and the effect of the internal phase shift of the mixers is cancelled [8].
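
The reason a mixer can serve as a quadrature error detector can be seen from an idealized multiplier: the time average of two sinusoids that are $90^\circ + \varepsilon$ apart equals $-\tfrac{1}{2}\sin\varepsilon$, which is zero at perfect quadrature and close to proportional to $\varepsilon$ for small errors. The sketch below checks this numerically. It assumes unit amplitude sinusoids and an ideal multiplier, so it does not model the internal mixer phase shift that the cross coupling is there to cancel; the function name is hypothetical.

```python
import math


def mixer_dc(phase_error_deg, samples=1000):
    """DC term of the product of two unit sinusoids nominally in quadrature."""
    eps = math.radians(phase_error_deg)
    total = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples          # one full period
        total += math.cos(theta) * math.cos(theta + math.pi / 2.0 + eps)
    return total / samples


for error_deg in (0.0, 1.0, 2.0):
    print(error_deg, mixer_dc(error_deg))
```

Doubling a small phase error doubles the DC output to within a fraction of a percent, which is the property the detector relies on.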

There are three different sources of error in the detector [8]. The first is inductive and capacitive leakage coupling the LO and RF signals inside the active mixers. This error can be mitigated by careful layout, i.e. keeping the LO and RF signals well separated and preferably perpendicular. The second error arises from coupling between wires and differences in wire length in the routing of the four signals from the QVCO to the two active mixers as indicated in Fig. 12. The third error originates from mismatch of the active and passive devices in the active mixers. In the presented design this error has been minimized by device upscaling. Generally the high frequency performance of an active mixer is reduced if too large active devices are used, but in this case only the DC voltage at the mixer output is of interest, making it possible to reduce the detector error due to mismatch by using large active devices. The gain at DC is not affected by the device upscaling, so the upscaling is limited only by the load that the detector buffers can drive. Downconversion of the 2nd harmonic is also reduced due to the reduced mismatch. The devices are sized a factor of 7 and 5 times the size for maximum $f_T$ for the transconductance and switching pair respectively. The simulated gain of the standalone detector is equal to 22 mV/degree phase error [8]. The detector requires a dedicated supply voltage of 2.5 V, which is shared with the QVCO measurement buffers and the common collector buffers driving the detector. The bias current of the detector in Fig. 11 equals 12.9 mA. To save current, the detector is only intended to be used during calibration and should thereafter be disabled.

4 Layout and parasitics

Inductive and capacitive layout parasitics are important to model for a mm-wave design. For a PLL, however, the inductive parasitics can be disregarded for the lower frequency phase detector as well as the loop filter. For the presented PLL, which is the core of an 84 GHz E-band transmitter, the balance between different parasitic elements is also important. Using higher order QAM modulation, even small differences in capacitive parasitics in the range of a few femtofarads may result in phase errors that cause a significant degradation in the radio link bit-error-rate (BER) [5–7]. Careful layout of the QVCO core, and its inductors, is therefore crucial for minimizing the phase error. If there is still a residual phase error present, this can be minimized using the phase tuning presented in Sect. 3. The octagonal inductors of the QVCO together with the routing to the output buffers as well as the routing to the detector/divider buffers are depicted in Fig. 12. The identical structure, used both for the PLL and the QVCO with I/Q phase error detector chip, was represented with a 22 port s-parameter model extracted using the ADS Momentum 2.5D EM simulator. To minimize capacitive losses to the substrate, the inductors are implemented in the top Cu layer. The octagonal inductors are sized with an inner diameter of 50 μm and a trace width of 11 μm. The differential inductance of the inductor equals 120 pH with a Q value of 18 at 28 GHz [8, 9].

Excluding the decoupling capacitors between supply and ground as well as between bias wires and ground, the four stage frequency divider layout given in Fig. 13 has a size of 287 × 85 µm². Parasitic capacitances are mainly an issue for the first stages in the divider chain since these operate at a higher frequency [9].

The layout of the beam steering PLL chip is shown in Fig. 14. The total size of the die equals 1448 × 928 µm², of which more than 70 % is used for pads and on-chip decoupling. In total there are 34 pads, out of which 9 are ground connections. The ground-signal-ground (GSG) pads for the 28 GHz output signal are located on the top left side of the QVCO. These pads are placed with a pitch of 100 µm. The ground plane (blue) is implemented in the bottom metal layer. The three different supply voltage domains all use the top metal layer. The PLL loop filter capacitor has been split into one part residing close to the active low pass filter in Fig. 9 and another part close to the QVCO. Any high frequency signals picked up by the long wire from the active filter to the QVCO will thereby be more effectively decoupled to chip ground.

The chip photo of the QVCO with I/Q phase error detector is shown in Fig. 15. The chip size is identical to that of the PLL. The divider, phase detector and low pass filter of the PLL occupy a larger area compared to the I/Q phase error detector. The additional area required for these parts has been made available by removing large MOM capacitors used for decoupling between supply and ground in the I/Q phase error detector circuit. The QVCO with phase error detector chip has 30 pads, out of which 10 are ground pads. The QVCO part plus the output buffers and GSG measurement pads are placed in the same position as in the PLL chip layout.

5 Beam steering PLL simulated performance

The PLL is designed to use three different supply voltages. A supply of 1.5 V is used for all parts of the PLL except for output measurement buffers and the active low pass filter. The supply voltage of the filter can be changed in order to lock the PLL over a wider frequency range. The output buffers for the 28 GHz QVCO signals and the 1.75 GHz divider signals use a 2.5 V supply. Using the maximum VCC_LF equal to 9.8 V and including current used in current mirrors, the PLL consumes 35 mA from the 1.5 V supply. The output measurement buffers that could be turned off when the PLL is operating consume 42 mA. The active low pass filter consumes 6.6 mA with VCC_LF equal to 9.8 V. All stability and noise performance has been simulated using SpectreRF PSS analysis in closed loop with a Verilog-A representation of the QVCO.
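
The currents quoted above translate into power per supply domain as follows. The sketch simply multiplies each stated current by its supply voltage; the dictionary keys are illustrative labels.

```python
# (current in mA, supply in V) per domain, values from the text
domains = {
    "pll_core": (35.0, 1.5),
    "measurement_buffers": (42.0, 2.5),
    "active_filter_max": (6.6, 9.8),
}

power_mw = {name: current * voltage for name, (current, voltage) in domains.items()}

for name, power in power_mw.items():
    print(f"{name}: {power:.1f} mW")
```

The 1.5 V domain comes out at 52.5 mW, consistent with the 52 mW stated in the abstract for the PLL excluding output buffers.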

All stages in the frequency divider are implemented with current mode logic [9–11]. Such a divider will self-oscillate, and near the self-oscillation frequency (SOF) the divider has the best performance regarding phase noise and sensitivity, i.e. it will be able to operate with a very small input signal [9]. The divider together with its buffer has an SOF of 29 GHz, and a sensitivity of 8.4 mVp at 28 GHz. The divider differential output signal equals 200 mVp. The divider operates correctly between 5 and 35 GHz for a differential input signal level of 50 mVp.
The phase noise versus frequency for the divider and buffer for a 28 GHz input signal of 50 mVp is given in Fig. 16. At 1 MHz offset the phase noise is equal to $-134$ dBc/Hz. This level should be compared to $-129$ dBc/Hz from ideal divide by 16 of the minimum simulated phase noise of the QVCO standalone. Below 1 MHz offset the QVCO has more phase noise than the divider.
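The comparison level of $-129$ dBc/Hz follows from the standard result that ideal frequency division by $N$ lowers phase noise by $20\log_{10} N$ dB. A minimal sketch of that arithmetic, with illustrative function and variable names:

```python
import math


def divided_phase_noise(pn_dbc_hz: float, n: int) -> float:
    """Phase noise after ideal division by n (improves by 20*log10(n) dB)."""
    return pn_dbc_hz - 20.0 * math.log10(n)


QVCO_PN = -105.0      # dBc/Hz at 1 MHz offset, minimum simulated QVCO phase noise
DIVIDER_PN = -134.0   # dBc/Hz at 1 MHz offset, simulated divider phase noise

qvco_pn_divided = divided_phase_noise(QVCO_PN, 16)
margin_db = qvco_pn_divided - DIVIDER_PN  # how far the divider sits below the divided QVCO noise
print(f"QVCO noise referred to divider output: {qvco_pn_divided:.1f} dBc/Hz")
print(f"Divider noise is {margin_db:.1f} dB lower")
```

With the values above, the divider noise sits roughly 5 dB below the divided QVCO noise at 1 MHz offset.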

The simulated phase detector and active filter output voltage versus phase difference between the 1.75 GHz reference and divider signals are shown in Fig. 17. This has been simulated for a total PD bias current of 1.4 mA, and an active filter bias current of 2 mA from a supply voltage of 6.0 V. The simulation was made using Spectre RF PSS analysis with large signals both as external reference signal and input signal to the divider. The QVCO was excluded from the simulation and instead represented as a sinusoidal input to the divider. The gain of the PD alone equals $0.55$ mV/degree, while at the active filter output the simulated gain has increased to $8.6$ mV/degree, i.e. the gain of the active filter is equal to $24$ dB. The filter has a 1.2 V output range.
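The 24 dB figure is the ratio of the two quoted slopes expressed in decibels. A short check, with illustrative variable names:

```python
import math

PD_GAIN_MV_PER_DEG = 0.55       # simulated gain of the PD alone
PD_PLUS_LF_MV_PER_DEG = 8.6     # simulated gain at the active filter output

voltage_ratio = PD_PLUS_LF_MV_PER_DEG / PD_GAIN_MV_PER_DEG
filter_gain_db = 20.0 * math.log10(voltage_ratio)
print(f"Active filter gain: {voltage_ratio:.1f} V/V = {filter_gain_db:.1f} dB")
```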

The phase steering of the PLL was simulated using a Verilog-A model of the QVCO with a nominal frequency of 28 GHz for $V_{ctrl}$ equal to 3.0 V. The sensitivity in the model was set to 200 MHz/V. To improve convergence the supply voltage, $V_{CC,LF}$, of the active filter was set to 4.8 V to achieve a DC value of $V_{ctrl}$ close to 3.0 V. In Fig. 18 the phase of the 28 GHz QVCO signal is plotted versus the injected DC collector current, $I_{phase}$ of device Q3 in Fig. 7. Except for injected currents below $20 \mu$A, there is a linear dependency of the phase of the 28 GHz signal versus the injected current. With an injected current of $125 \mu$A the phase has turned $175^\circ$. For the linear part the slope is equal to $1.2^\circ/\mu$A.

The phase noise performance of the closed loop PLL was simulated with a PSS and PNOISE analysis. The Verilog-A model was tuned for a phase noise level that corresponds to a simulated value of $-105$ dBc/Hz at 1 MHz offset. The 1.75 GHz reference signal was noiseless in this simulation. Identical model settings as for the phase steering simulation were used. The phase noise spectrum is shown in Fig. 19. Depending on the phase noise level of the reference signal the phase noise will in reality increase for very low offset frequencies. The in band phase noise equals $-115$ dBc/Hz at 1 MHz offset frequency and is dominated by the QVCO and frequency divider. Outside the PLL loop bandwidth of 4.5 MHz, the phase noise is set by the Verilog-A QVCO model which has a slope of 20 dB/decade.

The stability of the closed loop was analyzed using the Periodic Stability Analysis (PSTB) in Spectre RF [23]. Using this, stability analysis can be made for a circuit with a periodically time-varying operating point, e.g. a PLL, without breaking the loop. A probe, i.e. a zero valued DC voltage source, was placed in series with the varactor control voltage, $V_{ctrl}$. The stability could then be analyzed using gain margin, phase margin, or by plotting the loop gain and phase versus frequency. To optimize the phase margin of the system, two design parameters were altered: the bias current of the active low pass filter, $I_{bias,LF}$, and the value of the resistor $R_4$ in the passive low pass link. The bias current changes the gain of the active filter, while the resistor value changes the position of the passive filter pole. The phase margin with $I_{bias,LF}$ equal to 2.0 mA and a $K_{VCO}$ equal to 200 MHz/V was simulated to be 44° at 4.1 MHz offset frequency. Using the presented PLL simulation technique with a Verilog-A model of the QVCO, the phase noise and stability of a complete PLL with a divider ratio of 16 can be simulated in less than 10 min.

All optimization of the PLL performance regarding phase noise and stability was performed using the Verilog-A representation of the QVCO. In order to improve convergence in Spectre RF, the pads, ESD protection and bond wires were excluded from simulation. The PLL locking was, however, also verified using a parasitic extracted model of the QVCO core together with the 22-port s-parameter model of the inductors and routing together with the complete package model. In Fig. 20 the varactor voltage, $V_{ctrl}$, is simulated versus time. As can be seen the PLL locks in a time less than 500 ns giving a stable $V_{ctrl}$ value of 7.7 V.

The simulated PLL performance is summarized in Table 1 below.

**Table 1 PLL design parameter summary**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
<th>Unit</th>
<th>Note</th>
</tr>
</thead>
<tbody>
<tr>
<td>Supply voltage</td>
<td>1.5/2.5/var</td>
<td>V</td>
<td>Varying supply voltage for active LF to match QVCO varactor voltage</td>
</tr>
<tr>
<td>$P_{DC}$</td>
<td>52 + 7</td>
<td>mW</td>
<td>Excluding measurement buffers. Active LF consumes minimum 7 mW</td>
</tr>
<tr>
<td>QVCO: PN @ 1 MHz offset</td>
<td>−105/−103</td>
<td>dBc/Hz</td>
<td>Min/max simulated PN</td>
</tr>
<tr>
<td>QVCO frequency range</td>
<td>26.0–31.0</td>
<td>GHz</td>
<td>Simulated values [8]</td>
</tr>
<tr>
<td>QVCO sensitivity</td>
<td>200</td>
<td>MHz/V</td>
<td>Value used for QVCO Verilog-A model</td>
</tr>
<tr>
<td>Divider PN @ 1 MHz offset</td>
<td>−134</td>
<td>dBc/Hz</td>
<td></td>
</tr>
<tr>
<td>Divider input frequency range</td>
<td>5–35</td>
<td>GHz</td>
<td></td>
</tr>
<tr>
<td>Gain of PD and active LF</td>
<td>8.6</td>
<td>mV/degree</td>
<td></td>
</tr>
<tr>
<td>Phase control sensitivity</td>
<td>1.2</td>
<td>Degrees/μA</td>
<td>PD biased with 750 μA</td>
</tr>
<tr>
<td>PLL bandwidth</td>
<td>4.5</td>
<td>MHz</td>
<td></td>
</tr>
<tr>
<td>PLL phase margin</td>
<td>44</td>
<td>Degrees</td>
<td>At 4.1 MHz offset</td>
</tr>
<tr>
<td>PLL inband PN @ 1 MHz offset</td>
<td>−115</td>
<td>dBc/Hz</td>
<td></td>
</tr>
<tr>
<td>PLL lock time</td>
<td>500</td>
<td>ns</td>
<td></td>
</tr>
</tbody>
</table>

### 6 QVCO and I/Q phase error detector circuit measurement results

The measurement system setup is depicted in Fig. 21. The phase noise measurement system is a Eurotest PN9000 requiring an input frequency between 2.0 and 18 GHz. The 28 GHz signal from the QVCO measurement buffer was therefore down converted to 2 GHz using an external passive mixer, Marki M90765, fed by a 26 GHz local oscillator signal from an Agilent E8257D signal generator. A Rohde & Schwarz FSU50 spectrum analyzer was used to measure the frequency tuning characteristic of the QVCO. The open collector output buffer depicted in Fig. 10(a) was connected to an external bias-T and loaded by either the 50 Ω input impedance of the external passive mixer or the spectrum analyzer. Infinity probes from Cascade Microtech configured as ground-signal-ground (GSG) were used to probe the open collector output. To increase the power level of the mixer output IF signal, a low noise amplifier (LNA) was connected between the mixer output and the phase noise measurement system. This was necessary in order for the input signal to the PN9000 to be above the required minimum level of −10 dBm. Since the accuracy of the I/Q phase error detector is limited by mismatch, two samples were measured.

The measured QVCO frequency tuning characteristic together with the sensitivity is plotted in Fig. 22 for a supply of 1.5 V. The tuning range equals 6 GHz for control voltages between 0.8 and 6.6 V with a maximum frequency of 28.2 GHz. The sensitivity is strongly varying with $V_{ctrl}$. At $V_{ctrl}$ equal to 6.0 V giving 28 GHz output frequency, the sensitivity equals 200 MHz/V. At low control voltages the sensitivity is increased, e.g. at $V_{ctrl} = 1.5$ V it is equal to 2 GHz/V. If the wide tuning range of the QVCO should be utilized in the presented PLL, the large sensitivity variation could be compensated for by adjusting the bias of the active low pass filter, thereby keeping the overall PLL loop gain and bandwidth constant. The simulated tuning range in [8] is between 26 and 31 GHz.
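To illustrate the size of the compensation mentioned above, the sketch below assumes that the loop gain is proportional to the product of the QVCO sensitivity and the active filter gain, with PD gain and divider ratio fixed. That proportionality is a simplifying assumption made here, and the function name is illustrative.

```python
import math


def filter_gain_correction_db(kvco_nominal_hz_per_v: float,
                              kvco_actual_hz_per_v: float) -> float:
    """Filter gain change (dB) that keeps K_VCO times filter gain constant."""
    return 20.0 * math.log10(kvco_nominal_hz_per_v / kvco_actual_hz_per_v)


# Measured sensitivities from the text: 200 MHz/V at 6.0 V and 2 GHz/V at 1.5 V.
correction_db = filter_gain_correction_db(200e6, 2e9)
print(f"Filter gain change needed at V_ctrl = 1.5 V: {correction_db:.0f} dB")
```

Under this assumption, the tenfold higher sensitivity at low control voltage would call for a filter gain about 20 dB lower than at 28 GHz.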

In Fig. 23 the phase noise when the QVCO is tuned to 28 GHz is measured versus QVCO main bias current, $I_{bias\_main}$ in Fig. 5, for a supply of 1.2 V. The current $I_{tot}$ is defined as the sum of the bias current of the main and coupling transistors. In the measurement, the bias current of the coupling transistors, two times $I_{bias,coupl}$ in Fig. 4, was constant and equal to 2.3 mA. Increasing the main bias current by 6 mA, i.e. changing the total bias current from 11 to 17 mA, reduced the phase noise at 10 MHz offset by 12 dB, from $-110$ to $-122$ dBc/Hz.

The phase noise versus varactor control voltage, $V_{ctrl}$, at 1 and 10 MHz offset for $V_{CC} = 1.2$ V is given in Fig. 24. The total bias current was equal to 15.5 mA. At 1 MHz offset the minimum phase noise equals $-100$ dBc/Hz, while at 10 MHz offset the level is centered at around $-120$ dBc/Hz, i.e. the slope of the phase noise equals 20 dB/decade in this region. For $V_{ctrl} < 1$ V the phase noise level strongly increases due to the main varactor in Fig. 4 being forward biased. In [8] the simulated minimum phase noise at 1 MHz offset equals $-105$ dBc/Hz.
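The 20 dB/decade slope follows directly from the two quoted offsets. A minimal sketch, with an illustrative helper name:

```python
import math


def slope_db_per_decade(f1_hz: float, pn1_dbc_hz: float,
                        f2_hz: float, pn2_dbc_hz: float) -> float:
    """Phase noise slope between two offset frequencies, in dB per decade."""
    return (pn2_dbc_hz - pn1_dbc_hz) / math.log10(f2_hz / f1_hz)


# Measured values from the text: -100 dBc/Hz at 1 MHz, about -120 dBc/Hz at 10 MHz.
slope = slope_db_per_decade(1e6, -100.0, 10e6, -120.0)
print(f"Slope: {slope:.0f} dB/decade")
```

A slope of $-20$ dB/decade is the behavior expected where white noise sources dominate the oscillator phase noise.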

In Fig. 25 the QVCO frequency sensitivity to the total bias current and supply voltage is depicted. The bias current of the QVCO injection transistors was constant while the bias current of the main devices was varied for a supply voltage of 1.2 and 1.5 V, respectively. The total current of the four injection devices equaled 1.86 mA, i.e. each device was biased with 0.46 mA. The varactor voltage $V_{ctrl}$ was set to 2.1 V. As can be seen in Fig. 25, the QVCO frequency varies approximately linearly with the collector current of the main QVCO core devices for a total current exceeding 13 mA. As seen from Fig. 27, at lower main bias currents, the QVCO outputs deviate from 90° phase difference, resulting in the reduced frequency dependence on the main bias current seen in Fig. 25.

The detector output voltage versus tuning varactor voltage is shown in Fig. 26 for two measured samples. The detector has been verified with the QVCO control voltages [4] $V_{tune,I,P}$ equal to $V_{tune,I,N}$ and $V_{tune,Q,P}$ equal to $V_{tune,Q,N}$. The QVCO supply voltage was set to 1.2 V and the total bias current equaled 19 mA with the coupling devices biased with 2.3 mA. In the first measurement the tuning voltages for the Q-part of the QVCO were set to zero volts, while the I-part tuning voltages were simultaneously varied from 0 to 5 V. In the second measurement the I-part tuning voltages were set to zero volts while the Q-part voltages were swept. Comparing the simulated tuning performance in [8] with the measured data, there is a discrepancy in the detector behavior. In [8], the detector output voltage was simulated versus differential tuning voltage, giving an approximately constant sensitivity versus tuning voltage of 57 mV/V. In the measurements, the detector output was measured versus tuning voltage with either the I- or Q-part held constant at zero volts, resulting in a halved sensitivity. In simulations [8] the detector was biased with 12.9 mA, while in the measurements the bias current was reduced to 8.0 mA. The bias current reduction gives an expected 38 % reduction in sensitivity. The measured detector characteristic also shows a sample dependent decrease in sensitivity with increasing tuning voltage. For sample 1, the I-sweep curve has a maximum sensitivity of 30 mV/V for low tuning voltages, while the sensitivity is reduced to 5.5 mV/V for tuning voltages approaching 5 V. Both measured samples show a higher sensitivity for the I-sweep curves than for the Q-sweep curves. For low tuning voltages, close to zero volts, the average sensitivity for the I-sweep curves equals 46 mV/V while it is -23 mV/V for the Q-sweep curves, indicating a systematic layout issue in either the QVCO or the detector, making it less sensitive to Q-part tuning. With both tuning voltages set to zero volts, the absolute value of the detector output is less than 5 mV. Using the simulated performance in [4] and compensating for the reduced detector gain caused by the different measurement and simulation conditions, a detector output voltage of 5 mV corresponds to a QVCO phase error of 5.9°, i.e. the phase imbalance between the detector inputs that results in a zero voltage detector output.
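The two scaling factors discussed above can be reproduced numerically. The sketch assumes, as the 38 % figure implies, that the detector sensitivity scales linearly with its bias current. The combined estimate is only an illustration of how the factors compound and is not a value reported in the measurements.

```python
SIM_SENSITIVITY_MV_PER_V = 57.0  # simulated versus differential tuning voltage [8]
SIM_BIAS_MA = 12.9               # detector bias current in simulation
MEAS_BIAS_MA = 8.0               # detector bias current in measurement

# Fractional sensitivity reduction from the lower bias current (assumed linear).
bias_reduction = 1.0 - MEAS_BIAS_MA / SIM_BIAS_MA

# Sweeping only the I- or Q-part halves the sensitivity.
single_sided_mv_per_v = SIM_SENSITIVITY_MV_PER_V / 2.0

# Illustrative combined estimate under both effects.
expected_mv_per_v = single_sided_mv_per_v * (MEAS_BIAS_MA / SIM_BIAS_MA)

print(f"Bias-related reduction: {bias_reduction * 100:.0f} %")
print(f"Illustrative expected sensitivity: {expected_mv_per_v:.1f} mV/V")
```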

In Fig. 27 the detector output is measured versus total bias current $I_{tot}$ of the QVCO for supply voltages of 1.2 and 1.5 V, respectively, when either the bias current of the main devices, $I_{main}$, or coupling devices, $I_{coupl}$, is varied. All four varactor tuning voltages were set to zero volts. For the curve labeled $I_{main}$ 1.2 the bias current of the coupling devices was held constant at 2.3 mA while the bias current of the main devices was varied. For the curve labeled $I_{coupl}$ 1.2, the bias current of the coupling devices was varied while the bias current of the main devices was constant at 16.7 mA. For the measurement at $V_{CC} = 1.5$ V the two curves refer to a constant current $I_{main}$ of 16.4 mA and a constant current $I_{coupl}$ of 2.6 mA, respectively. The I/Q imbalance is strongly dependent on bias current, with an increasing sensitivity to bias current $I_{coupl}$ when it exceeds 3 mA. The I/Q balance degrades when the main bias current, $I_{main}$, is decreased, with an optimal value around 16.6 mA. The detector output voltage is reduced when the supply voltage is increased.

7 QVCO performance comparison

The simulated and measured performance of the presented QVCO is compared to published measured VCOs and QVCOs in Table 2. The simulations were made with extracted resistive and capacitive layout parasitics of the QVCO core together with a 22 port ADS Momentum model of the QVCO inductor. The minimum simulated phase noise [8] of the QVCO equals $-105 \text{ dBc/Hz}$ at 1 MHz offset for a supply voltage of 1.5 V and $V_{ctrl}$ equal to 1.5 V. For high varactor voltages the phase noise equals $-103 \text{ dBc/Hz}$. However, the measured phase noise at the same offset frequency is slightly higher, $-100 \text{ dBc/Hz}$, thereby degrading the $FOM$ and $FOM_T$ [25]. As the QVCO is fully functional for a supply voltage as low as 1.2 V, however, part of the performance degradation can be compensated for by lower power.

8 Conclusions

This paper presents the design and layout of a SiGe 28 GHz PLL for an 81–86 GHz beam steering E-band transmitter together with measurement results for a 28 GHz QVCO with I/Q phase error detection and compensation. The PLL is designed for a supply voltage as low as 1.5 V except for the supply of the active loop filter. With the described phase control for beam steering, using DC current injection into the phase detector output, a competitive architecture is achieved. Power consuming routing of high frequency LO or RF signals is avoided due to the modular architecture with a separate PLL for each transmitter. It is also possible to avoid the phase and amplitude mismatch related to long high frequency interconnects, which degrades the BER. Closed loop simulations of PLL behavior in Spectre RF have been performed using a Verilog-A model of the QVCO. The 28 GHz QVCO with I/Q phase error detection and compensation has a measured phase noise of $-100$ dBc/Hz at 1 MHz offset, compared to a simulated value of $-103$ dBc/Hz at high varactor control voltages. The I/Q phase error detector functionality has been verified in measurements. For an E-band radio link application, the I/Q imbalance detection and correction circuitry offers great advantages, since the increase in radio link BER related to I/Q phase imbalance can be mitigated by adjusting the QVCO phases.

Acknowledgments The authors would like to thank the Swedish government funding agency Vinnova, the System Design on Silicon (SoS) excellence center at Lund University and Infineon Technologies for sponsoring this project.

References

Tobias Tired was born in Lund 1967. He received the M.Sc. degree in Engineering Physics in Lund 1992 and the Technology Licentiate degree in 2012. Since 2012 he is a Ph.D. student at the department of Electrical and Information Technology at Lund University. Between 1993 and 1996 he was at Ericsson Microelectronics in Stockholm, Sweden as semiconductor process engineer. In 1996 he joined Ericsson Mobile Communications in Lund, Sweden as a circuit designer designing BiCMOS and CMOS integrated radio circuits for mobile terminals. His Ph.D. studies are targeted towards millimeter wave transmitter circuits in SiGe for wireless base station backhaul.

Henrik Sjöland received the M.Sc. degree in electrical engineering in 1994, and the Ph.D. degree in Applied Electronics in 1997, both from Lund University. He was appointed Docent in electronic circuit design in 2002. His research interests include the design and analysis of analog integrated circuits, feedback amplifiers and RF CMOS. He spent 1 year visiting the Abidi group at UCLA as a Fulbright postdoc in 1999. He is also the author of a book on integrated wideband amplifiers. In 2008 he became a full professor in analog circuit design at Lund University.

Per Sandrup (M’03) received the B.S.E.E. degree from the Royal Institute of Technology, Stockholm, Sweden, in 1999. He has been with the Ericsson organization since 1994, designing RF-hardware. Between 2001 and 2011 he was working as an RFASIC designer of transmitters and PLLs. He has also long experience of chip top level simulation and modeling using Verilog-A and VHDL. He is currently working with small cell base-station development. His main design interests are within the areas of high frequency design and PLLs.

Johan Wernehag was born in Kristianstad, Sweden, 1978. He received the M.Sc. degree in Electrical Engineering and the Ph.D. in circuit design from Lund University, Lund, Sweden, in 2002 and 2008, respectively. In 2009 he joined Nokia where he was working as a Senior RF Design Engineer, integrating and verifying specification and regulatory compliance of wireless chip-set in mobile devices. In 2010 he joined Ericsson Research as a Researcher in the Modem Hardware group in Lund, Sweden. Since 2013 he is an associate professor at Lund University. His research interests are in the area of RF, mm-wave, and mixed-signal circuits for wireless communication. He is the recipient of the 2008 IEEE Asian Pacific Conference on Circuit and System ‘Outstanding Student Paper Award’.

Imad ud Din completed his M.Sc. degree in Electrical Engineering from Lund University, Lund, Sweden in 2007. He is currently working as an Experienced Researcher in the Modem Hardware Group at Ericsson Research in Lund. His research interests include power efficient circuits and re-configurable RF and baseband circuits for cellular radios.

Markus Törmänen received the M.Sc. degree in Electrical Engineering in 2002 and the Ph.D. degree in Circuit Design in 2010 from Lund University, Sweden. He was a Research Engineer at Lund University in 2003–2006; Department of Electroscience (2003–2004) and MAX-lab (2004–2006). Since 2014 he is Associate Professor in the Analog/RF group at the Department of Electrical and Information Technology, Lund University. His research interests include design of CMOS analog, RF, microwave, and mm-wave circuits.
Paper III

A 1.5 V 28 GHz beam steering SiGe PLL for an 81-86 GHz E-band transmitter

Tobias Tired1, Johan Wernehag1, Waqas Ahmad1, Imad ud Din2, Per Sandrup1, Markus Törmänen1, Henrik Sjöland1,2

1Lund University, 2Ericsson Research, Ericsson AB

Abstract—This paper presents measurement results for a low supply voltage 28 GHz beam steering PLL, designed in a SiGe bipolar process with \( f_T = 200 \) GHz. The PLL, designed around a QVCO, is intended for a beam steering 81-86 GHz E-band transmitter. Linear phase control is implemented by variable current injection into a Gilbert type phase detector, with a measured nominal phase control sensitivity of 2.5˚/μA. The demonstrated LO generation method offers great advantages in the implementation of beam steering mm-wave transmitters, since only the low frequency PLL reference signal of 1.75 GHz needs to be routed across the chip to the different transmitters. Except for an active loop filter, used to extend the locking range of the PLL, the design uses a low supply voltage of 1.5 V. The PLL obtains a measured in band phase noise of -107 dBc/Hz at 1 MHz offset. The power consumption equals 54 mW from the 1.5 V supply plus 1.8 mW for the variable supply of the active low pass filter.

Index terms—mm-wave, PLL, beam steering, phase control, phase noise, SiGe, E-band, transmitter

I. INTRODUCTION

Gigabit capacity mm-wave radio links can be implemented in the E-band at 71-76 GHz and 81-86 GHz [1]. Using directional antennas the radiation lobes become narrow, making beam steering attractive to simplify the link installation [1]. Beam steering can be implemented in different parts of the transmitter. Implementation in the digital domain is a flexible solution, however, separate DACs are then required for each transmitter path. In the analog domain on the other hand, it is possible to implement beam steering in the RF path, in the LO path, and in the baseband [2], each associated with different pros and cons. Regardless of beam steering technique there is a need for high frequency LO signals in the transmitters. Routing of high frequency LO signals across the die to the different transmitter paths requires current consuming buffers. Long routing also introduces phase and amplitude mismatch [2]. This paper presents measurement results for a 28 GHz PLL, for which the design was presented in [2], intended for LO generation and beam steering in an E-band transmitter. Routing of mm-wave signals has been avoided with an architecture using local PLLs, one for each transmitter path, see figure 1, [2-4]. The only high frequency signal routed to the transmitters is the 1.75 GHz reference signal. A 28 GHz QVCO is used to generate the 84 GHz TX carrier in two steps. The 28 GHz I/Q mixer first up-converts the baseband signal. The 84 GHz TX signal is then created by mixing with the 56 GHz differential second harmonic present at the emitters of the cross coupled QVCO transistors [2], [3]. The frequency divider is designed using CML logic [2], and the phase detector (PD) is implemented as a Gilbert mixer [2], [4]. Phase control is implemented with a current mirror that injects DC current into one side of the resistive load of the PD [2], [3]. At 90° phase difference between its inputs the Gilbert mixer produces zero output. 
In a range around this point the characteristic is close to linear, so that the PD output voltage is nearly proportional to the phase deviation between its inputs. For the loop to remain locked the QVCO tuning voltage must remain unchanged, and thus the PD must counteract the DC current injected at its output. A phase difference is thus forced between the PD inputs. This results in a phase shift at the output, which due to the properties of the PD is nearly proportional to the injected current.

II. PLL DESIGN

A. PLL design considerations

In order to increase the spectrum efficiency of backhaul mm-wave radio links, higher order modulation schemes, e.g. 64 QAM, are utilized. These schemes put hard requirements on LO in band phase noise. The PLL is intended to be used in a fixed mm-wave radio link and there are therefore no stringent requirements on PLL locking time, i.e. a low loop filter bandwidth can be chosen. In [2] the minimum simulated phase noise of the QVCO is equal to -105 dBc/Hz at 1 MHz offset, corresponding to -129 dBc/Hz after ideal division by 16. The simulated divider noise of -134 dBc/Hz at 1 MHz offset is not suppressed by the closed loop [2], and if only phase noise optimization is targeted the PLL bandwidth should thus be approximately 1 MHz. Also taking the loop filter area into account, the designed bandwidth was changed to 4.5 MHz. With a reference frequency of 1.75 GHz, the low bandwidth is beneficial for reference spur suppression.

B. QVCO and frequency divider design

The QVCO is biased with 15.5 mA [2]. The measured QVCO phase noise at 28 GHz equals -100 dBc/Hz at 1 MHz offset [2]. The four QVCO outputs are connected to cascode output buffers. Three buffers have their output connected to a 2.5 V supply while the fourth has an open collector output for a 50 Ω load. The divider is implemented with four cascaded CML divide by two circuits [2] and consumes 20 mW.

C. Phase detector and active low pass filter design

The PD, given in figure 2, and the active low pass filter [2], [4], are the core of the beam steering PLL. The PD has been implemented as a Gilbert mixer instead of the common tri-state phase frequency detector (PFD). The latter, commonly used in CMOS technologies, can be realized with ECL logic using only bipolar devices [5]. The Gilbert mixer was selected due to its absence of dead zones, resulting in superior linearity in the zero-crossing region [6]. In figure 2 the divided and buffered signals from the QVCO are connected to the transconductance devices \( Q_1 \) while the buffered reference signals drive the bases of the current switching devices \( Q_2 \) [2, 4]. \( R_1 \) and \( C_1 \) of the PD load equal 50 \( \Omega \) and 2.5 pF respectively, creating a pole at 1.3 GHz, attenuating harmonics of \( f_{off} \). The total PD operational current, \( I_{PD} \), equals 1.3 mA. Phase control of the 28 GHz output signal has been implemented by injecting a DC current into one side of the PD load through a current mirror, devices \( Q_3 \) and \( Q_4 \), receiving a control current \( I_{phase-ctrl} \).
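The quoted pole frequency follows from the load values through $f_p = 1/(2\pi R_1 C_1)$. The sketch below checks this and, as an illustration only, evaluates the attenuation of a single-pole response at 3.5 GHz, i.e. twice the 1.75 GHz reference. Treating the load as an ideal first-order low pass and choosing that evaluation frequency are assumptions made here.

```python
import math

R1_OHM = 50.0      # PD load resistance
C1_F = 2.5e-12     # PD load capacitance


def rc_pole_hz(r_ohm: float, c_f: float) -> float:
    """Pole frequency of a first-order RC low pass."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_f)


def single_pole_attenuation_db(f_hz: float, f_pole_hz: float) -> float:
    """Magnitude response of an ideal single-pole low pass, in dB (negative)."""
    return -10.0 * math.log10(1.0 + (f_hz / f_pole_hz) ** 2)


f_pole = rc_pole_hz(R1_OHM, C1_F)
atten_3g5 = single_pole_attenuation_db(3.5e9, f_pole)
print(f"PD load pole: {f_pole / 1e9:.2f} GHz")
print(f"Attenuation at 3.5 GHz: {atten_3g5:.1f} dB")
```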

The advantage of the presented phase control technique is that the phase shift depends nearly linearly on the ratio between the PD bias current, \( I_{PD} \) and the injected current \( I_{phase-ctrl} \). The PD output voltage, \( V_{IPD} \), depends on the cosine of the phase difference, \( \Delta \phi = \pi/2 + \delta \phi \) between its inputs [9].

\[
V_{IPD} = V_{DC} \cos(\Delta \phi) = V_{DC} \cos(\pi/2 + \delta \phi)
\]  
(1)

By approximating the PD response with a straight line that passes through \( \Delta \phi = \pi/2 \) with a slope of 1, the percentage deviation from linearity, \( d \), can be expressed as [9].

\[
d = (\delta \phi - \sin \delta \phi) / \delta \phi \times 100
\]  
(2)

For \( N \) equal to 16 and \( \delta \phi_{QVCO} \) equal to \( 2\pi \) radians, i.e. \( \delta \phi = 2\pi/N \) at the PD inputs, the deviation factor \( d \) equals 2.6\%, corresponding to 9.2 degrees at the QVCO output. The relation between the phase shift of the QVCO, \( \delta \phi_{QVCO} \), the PD bias current, \( I_{PD} \), the control current, \( I_{phase-ctrl} \), and the divider ratio \( N \) is given by [2, 3]. A linear approximation is valid, giving the phase shift at the QVCO frequency, \( \delta \phi_{QVCO} \), as

\[
\delta \phi_{QVCO} = \pi N \frac{I_{phase-ctrl}}{I_{PD}}
\]  
(3)

By fitting the slope of a straight line, the largest deviation from linearity across 360\(^\circ\) reduces to 2.2\(^\circ\) at the QVCO output. The active low pass filter is a combination of an active RC filter [2, 4] and a passive RC link. The passive filter introduces a second pole that increases the spur suppression. The PLL locking range can be extended by shifting the supply voltage of the active low pass filter [2, 4]. High voltage devices with \( BV_{CEO} \) equal to 2.5 V have been used to handle an increased supply voltage of the filter. The output signal of the filter, \( V_{out} \), is connected to the varactor of the QVCO. In an application, locking is preferably accomplished by implementing a bias circuit that sweeps the filter supply voltage, and detects phase lock.
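Equations (2) and (3) can be evaluated numerically for the stated design values. The sketch below takes a full $2\pi$ (360°) phase shift at the QVCO output, which corresponds to $2\pi/N$ at the PD inputs, and uses \( I_{PD} \) = 1.3 mA. Function and variable names are illustrative, and only the slope-of-1 case of equation (2) is evaluated, not the fitted line.

```python
import math

N = 16           # divider ratio
I_PD = 1.3e-3    # A, total PD operational current


def linearity_deviation_percent(delta_phi_pd_rad: float) -> float:
    """Equation (2): percentage deviation of sin(x) from the line x."""
    return (delta_phi_pd_rad - math.sin(delta_phi_pd_rad)) / delta_phi_pd_rad * 100.0


def qvco_phase_shift_rad(i_ctrl_a: float, i_pd_a: float = I_PD, n: int = N) -> float:
    """Equation (3): linearized phase shift at the QVCO frequency."""
    return math.pi * n * i_ctrl_a / i_pd_a


# A 360 degree shift at the QVCO output is 360/N degrees at the PD inputs.
delta_phi_pd = 2.0 * math.pi / N
d_percent = linearity_deviation_percent(delta_phi_pd)
deviation_at_qvco_deg = d_percent / 100.0 * 360.0

# Phase control sensitivity per microampere of injected current.
sensitivity_deg_per_ua = math.degrees(qvco_phase_shift_rad(1e-6))

print(f"d = {d_percent:.2f} %, i.e. {deviation_at_qvco_deg:.1f} degrees at the QVCO")
print(f"Sensitivity from eq. (3): {sensitivity_deg_per_ua:.2f} degrees/uA")
```

The sensitivity obtained from equation (3) with these values, about 2.2˚/μA, can be compared with the measured value reported in Section III.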

III. MEASUREMENT RESULTS

A. Die photo

The die photo of the PLL is shown in figure 3. The die size equals 1448 \( \mu m \times 928 \mu m \). The different blocks of the PLL together with the 28 GHz GSG output pads are outlined. The die was mounted on a PCB to which all pads except those of the 28 GHz GSG probe were wire-bonded.

B. Measurement setup and results

The measurement setup for measuring phase shift and phase noise is given in figure 4.

The PLL reference signal from the HP 8642B signal generator was split into two. One signal was connected to the PLL chip through a discrete balun on the PCB, and the second signal was fed to the first input on the Rohde & Schwarz ZVC vector network analyzer (VNA). The PLL divider output was connected to the second VNA input. The PLL phase shift was then measured as the phase difference between the VNA inputs. Since the QVCO operates at 16 times the divider output frequency, the phase shift at the PLL output is 16 times larger than at the divider output. The QVCO output signal was measured with a Cascade Infinity GSG probe and down converted to a frequency below 2 GHz using a Marki M90765 mixer driven by an Agilent E8257D signal generator as LO source. The phase noise was measured with a Eurotest PN 9000 system. The phase noise versus frequency measured at 27 GHz for the locked PLL is given in figure 5. The phase noise at 1 MHz offset equals -107 dBc/Hz. The phase noise curve peaks at the loop filter bandwidth at 5 MHz. At low frequencies the PLL phase noise is impacted by the noise of the reference generator that also has spurs around 100 kHz offset frequency. At 1 MHz offset the measured phase noise of the QVCO standalone equals -100 dBc/Hz [2]. From figure 5, the measured reference phase noise at 1 MHz offset equals -144 dBc/Hz.

From simulations of a linear model, the output phase noise contributions at 1 MHz offset for the reference, QVCO, divider, PD and active low pass filter equal -119, -114, -110, -114 and -118 dBc/Hz respectively. The PLL could be locked between 24.6 and 27.8 GHz, i.e., a tuning range of 12%. A PLL performance comparison is given in Table 1. According to simulations, a feedback loop for the IQ-error calibration in [2] gives a low contribution from common mode loop amplifier noise. At frequency offsets outside the PLL bandwidth there is a slight increase in phase noise (<1.5 dB in this case) due to differential mode noise. This can be mitigated by adding a differential capacitor at the output of the loop amplifier.

![Fig.5. Phase noise versus frequency for the locked PLL and reference](image)
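The simulated noise contributions listed above can be combined into a total by adding them as powers, assuming the sources are uncorrelated (an assumption made here). The same sketch also checks the quoted relative tuning range. Names are illustrative.

```python
import math

# Simulated output phase noise contributions at 1 MHz offset (dBc/Hz).
CONTRIBUTIONS_DBC_HZ = {
    "reference": -119.0,
    "QVCO": -114.0,
    "divider": -110.0,
    "PD": -114.0,
    "active low pass filter": -118.0,
}


def power_sum_db(levels_db) -> float:
    """Sum uncorrelated noise levels given in dB and return the total in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


def relative_tuning_range_percent(f_low_ghz: float, f_high_ghz: float) -> float:
    """Tuning range relative to the center frequency, in percent."""
    return (f_high_ghz - f_low_ghz) / ((f_high_ghz + f_low_ghz) / 2.0) * 100.0


total_pn = power_sum_db(CONTRIBUTIONS_DBC_HZ.values())
tuning_range = relative_tuning_range_percent(24.6, 27.8)
print(f"Total simulated phase noise: {total_pn:.1f} dBc/Hz")
print(f"Relative locking range: {tuning_range:.1f} %")
```

The summed contributions land close to the measured in band phase noise of -107 dBc/Hz at 1 MHz offset.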

In figure 6 the locked spectrum at 27.5 GHz is shown with a 10 GHz span and a resolution bandwidth of 5 kHz. The reference spur at 1.72 GHz offset frequency is at -59 dBc.

![Fig.6. Locked spectrum at 27.5 GHz with 10 GHz span](image)

The phase shift at the divider output versus the injected DC current $I_{\text{phase-ctrl}}$ is given in figure 7. The phase tuning sensitivity is 0.15˚/μA, corresponding to 2.3˚/μA at the PLL output. A 360˚ phase change at the PLL output corresponds to a divider output change of 22.5˚.

![Fig.7. Phase of divider output signal versus $I_{\text{phase-ctrl}}$](image)
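The factor of 16 between divider and PLL output relates several of the numbers above. The sketch below, with illustrative function names, converts a phase shift between the two nodes and derives the reference frequency for the 27.5 GHz locked spectrum, which is where the reference spur offset of about 1.72 GHz comes from.

```python
N_DIV = 16  # divider ratio


def output_phase_deg(divider_phase_deg: float, n: int = N_DIV) -> float:
    """Phase shift at the PLL output for a given shift at the divider output."""
    return divider_phase_deg * n


def divider_phase_deg(output_phase: float, n: int = N_DIV) -> float:
    """Phase shift at the divider output for a given shift at the PLL output."""
    return output_phase / n


def reference_frequency_ghz(f_out_ghz: float, n: int = N_DIV) -> float:
    """Reference frequency of a locked integer-N PLL."""
    return f_out_ghz / n


full_turn_at_divider = divider_phase_deg(360.0)
f_ref_at_27g5 = reference_frequency_ghz(27.5)
print(f"360 degrees at the output = {full_turn_at_divider} degrees at the divider")
print(f"Reference frequency for 27.5 GHz output: {f_ref_at_27g5} GHz")
```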

<table>
<thead>
<tr>
<th>Parameter</th>
<th>[3]</th>
<th>[4]</th>
<th>[7]</th>
<th>[8]</th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>CMOS</td>
<td>SiGe</td>
<td>SiGe</td>
<td>CMOS</td>
<td>SiGe bipolar</td>
</tr>
<tr>
<td>Supply (V)</td>
<td>1.2</td>
<td>3.5/μA</td>
<td>2.5</td>
<td>1.3</td>
<td>1.5/μA</td>
</tr>
<tr>
<td>Tuning range (GHz)</td>
<td>18.8-20</td>
<td>64-84°</td>
<td>23.8-</td>
<td>24.12-</td>
<td>24.6-27.8</td>
</tr>
<tr>
<td>Division ratio</td>
<td>16</td>
<td>32</td>
<td>768</td>
<td>16</td>
<td>16</td>
</tr>
<tr>
<td>PN @ 1 MHz offset (dBc/Hz)</td>
<td>-105</td>
<td>-106</td>
<td>-114</td>
<td>-120</td>
<td>-107</td>
</tr>
<tr>
<td>Bandwidth (MHz)</td>
<td>-</td>
<td>50</td>
<td>1</td>
<td>&lt;0.1</td>
<td>5</td>
</tr>
<tr>
<td>Ref. spur (dBc)</td>
<td>-</td>
<td>-37</td>
<td>-49.5</td>
<td>-38</td>
<td>&lt;-59</td>
</tr>
<tr>
<td>$P_{\text{dc}}$(mW)</td>
<td>80.8</td>
<td>432-517</td>
<td>50</td>
<td>26.4</td>
<td>54±2.7</td>
</tr>
</tbody>
</table>

* Receiver design including LNA and mixers
* Frequency doubler at VCO output
* Dual band 24/77 GHz PLL, data in table for 24 GHz mode

### IV. CONCLUSIONS

Measurement results for a 28 GHz phase controlled PLL with 1.5 V supply voltage for the QVCO, divider and phase detector have been presented. The phase is controlled with high linearity, with a maximum phase error of 2.2˚ across 360˚, by injection of DC current at the output of a Gilbert type phase detector. The PLL in-band phase noise at 1 MHz offset equals -107 dBc/Hz. As a beam steering transmitter architecture this technique is highly advantageous since on-chip high frequency routing is avoided.

**ACKNOWLEDGEMENT**

The authors would like to thank the Swedish funding agency Vinnova, the System Design on Silicon (SoS) excellence center and Infineon Technologies for sponsoring this project.

### REFERENCES


Paper IV
A 1V power amplifier for 81-86 GHz E-band


Tobias Tired · Henrik Sjöland · Carl Bryant · Markus Törmänen

Abstract The design and layout of a two-stage SiGe E-band power amplifier using a stacked transformer for output power combination is presented. In EM-simulations with ADS Momentum, at E-band frequencies, the power combiner consisting of two individual single turn transformers performs significantly better than a single 2:1 transformer with two turns on the secondary side. Imbalances in the stacked transformer structure are reduced with tuning capacitors for maximum gain and output power. At 84 GHz the simulated loss of the stacked transformer is as low as 1.35 dB, outperforming an alternative power combiner that is also presented. The power combination allows for a low supply voltage of 1 V, which is beneficial since the supply can then be shared between the power amplifier and the transceiver, thereby eliminating the need for a separate voltage regulator. To improve its gain, the two-stage amplifier employs a capacitive cross-coupling technique not yet seen in mm-wave SiGe PAs. Capacitive cross-coupling is an effective technique for gain enhancement but is also sensitive to process variations, as shown by Monte Carlo simulations. To mitigate this, two alternative designs are presented with the cross coupling capacitors implemented either with diode coupled transistors or with varactors. The PA is designed in a SiGe process with $f_T = 200$ GHz and achieves a power gain of 12 dB, a saturated output power of 16 dBm and a 14 % peak PAE. Excluding decoupling capacitors it occupies a die area of 0.034 mm².

Keywords Power amplifier · E-band · mm-wave · Transformers · EM-simulation · Capacitive cross coupling

1 Introduction

In the near future 5G networks will start to be deployed. The difference compared to the present 4G networks is that the 5G networks will be heterogeneous, i.e. they will contain macro, pico and femto cells. There will be a huge increase in the number of base stations, which is necessary for the operator to be able to provide high data rates for the end user. Due to the larger number of base stations a wired backhaul with optical fibers will be very costly, and wireless backhaul using point-to-point communication [1] has therefore become a large research activity. The E-band frequency ranges at 71–76 and 81–86 GHz are highly suited for these wireless links since at these frequencies the attenuation in the atmosphere is low, which allows for communication over a few kilometers. The 5 GHz spectrum in each sub-band enables data rates of several gigabits per second, i.e. high enough to replace the optical fiber in the backhaul [1]. In an E-band link the power amplifier is the block that consumes the most power. Typically high saturated output power ($P_{out}$), high output compression point ($OCP_{1dB}$) and high power added efficiency (PAE) are the key performance parameters for the PA [2]. The performance requirements in combination with the high operating frequency make the PA a key block in mm-wave transceiver design. The high saturated output power is required for long distance links, and a high compression point is needed if the amplitude of the carrier contains modulation, as in 16-QAM, which is commonly used in commercial E-band links. Operation above the compression point is then not attractive since increased distortion of the constellation diagram will result in higher bit-error rates. Beamforming techniques with spatial power combination [2] can be used to increase the output power and compression point without substantial efficiency degradation. The performance of PA designs targeted for the 81–86 GHz band is limited by the $f_T$ of the selected semiconductor technology, and for the design presented in this paper a 200 GHz SiGe technology is used. Techniques such as capacitive cross coupling in differential amplifiers [3–5] are needed to increase the PA gain when the operating frequency is so close to the $f_T$ of the semiconductor technology. The performance of mm-wave PAs also depends strongly on the passive components, i.e. the Q-value of the integrated inductors and the coupling coefficient that can be obtained in a transformer [6]. This paper describes the design and layout of a 1 V PA using a low loss stacked transformer to combine the outputs of two differential amplifiers to a single-ended antenna port, see Fig. 1. Each amplifier terminal is loaded by 12.5 $\Omega$, which enables a significant output power even at low supply voltage. The design is made in a 0.18 $\mu$m SiGe HBT process with four Cu metal layers. The architecture is similar to the one presented in [7]. In this paper two alternative and improved ways of implementing the capacitive cross coupling, addressing process spread, are described. An alternative output transformer structure is described as well. The design in [7] has also been extended with an input transformer in order to be able to measure the PA performance using two single ended probes. Finally a complete chip layout is presented.

2 Power amplifier active part design

2.1 Power amplifier architecture

The active part of the power amplifier depicted in Fig. 2 is a two-stage differential design with interstage matching. In conventional designs a larger number of cascaded stages is used to increase the power amplifier gain. However, the larger number of stages will reduce the efficiency of the power amplifier, since each stage consumes a certain amount of power. Given that sufficient gain can be reached, a design with only one stage will thus have the highest PAE. In this paper a two-stage design was chosen to avoid forcing a single-stage design close to instability in order to achieve sufficient gain. A differential design is advantageous compared to a single ended one since the PA can then be directly connected to the differential transmit mixer output. The load impedance is also higher for the same compression point and supply voltage: a single ended amplifier would need to be loaded by 6.25 $\Omega$ to deliver the same peak power. The output power is further increased by combination of the power of two amplifiers as depicted in Fig. 1. The inherent power supply rejection of a differential design also makes it less sensitive to supply and ground bounce. Each of the two stages uses capacitive cross-coupling neutralization [3–5] to reduce the effective base collector parasitic capacitance. The task of the interstage matching is to match the higher output impedance $Z_{out_1}$ of the driver amplifier to the lower input impedance $Z_{in_2}$ of the output amplifier.
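
The 6.25 $\Omega$ figure can be reproduced with a short calculation. The sketch below assumes sinusoidal signals and the same peak voltage swing per collector terminal in both cases; the 1 V swing is a hypothetical value chosen only for illustration.

```python
def power_differential(v_peak_per_terminal, r_per_terminal):
    """Average power from a differential stage: two terminals, each with the given peak swing and load."""
    return 2 * v_peak_per_terminal ** 2 / (2 * r_per_terminal)


def power_single_ended(v_peak, r_load):
    """Average power from a single-ended stage with the given peak swing and load."""
    return v_peak ** 2 / (2 * r_load)


v_peak = 1.0  # hypothetical peak swing per terminal, in volts

p_diff = power_differential(v_peak, 12.5)  # 12.5 ohm per terminal, as in the paper
p_se = power_single_ended(v_peak, 6.25)    # single-ended load quoted in the text

print(f"differential, 12.5 ohm per terminal: {1e3 * p_diff:.0f} mW")
print(f"single ended, 6.25 ohm:              {1e3 * p_se:.0f} mW")
```

For equal swing the two cases deliver the same power, which is why the differential stage can use twice the load impedance per terminal.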

2.2 Power amplifier core

The power amplifier core schematic depicted in Fig. 3 is, except for device sizes and bias currents, identical for the driver and the output stage. In conventional PA designs a cascode stage is used to provide isolation and increase the high frequency gain, but in the presented architecture the cascode is removed so that the supply voltage can be reduced to 1 V while maintaining high PAE. The 12.5 $\Omega$ single ended load provided by the output transformers reduces the output voltage swing needed to obtain the desired compression point. Capacitive cross-coupling [3–5] is utilized to increase the power gain of the stage by reducing the effective base collector capacitance. If the capacitor $C_c$ is too large, however, the gain will increase but the amplifier will not be unconditionally stable. To maximize the power gain both stages are implemented with the fastest devices available in the technology, with $f_T = 200$ GHz. This, however, also implies a low open base breakdown voltage of 1.5 V, but in a real design the base terminal is not open and the breakdown voltage is therefore slightly higher [2]. All devices are sized so that the emitter current density gives maximum $f_T$. The driver stage is biased with $I_c = 9.5$ mA, i.e. a current high enough to drive the output stage into compression.

Especially for the output stage which is biased with a large current, $I_c = 20$ mA, to drive the low impedance load, the current density requirement results in devices with large base–collector parasitic capacitances, $C_{bc}$. In absence of cascodes the parasitic capacitance lowers the power gain and decreases the stability [3], [5]. The cross coupled capacitors $C_c$ in Fig. 3 reduce the effective base–collector capacitance and increase the performance of the stage. Depending on the device sizes there is a certain value of $C_c$ that maximizes the stability factor $k$ [3], [5]. However, this value of $C_c$ does not result in the highest maximum available gain, $G_{max}$, i.e. the power gain when both the input and output ports are matched. For larger $G_{max}$ a higher value for $C_c$ is required, resulting in positive feedback. A too large capacitor value, however, makes the design unstable and using the cross coupling technique there is a clear tradeoff between stability and gain. For the designed input and output stage $C_c$ equals 46 fF and 50 fF, respectively.

2.3 Alternative power amplifier cores

Capacitive cross coupling can be implemented in different ways. For the design presented in this paper MIM-capacitors, i.e. metal–insulator–metal capacitors, have been used as cross coupling capacitors. This requires substantial design margin, as the cross coupling is sensitive to production spread of both the transistors and the cross coupling capacitors. There is also a sensitivity to temperature and supply voltage variation since the base–collector capacitance of the transistors and the MIM-capacitors depend differently on temperature and supply voltage. One way to mitigate these effects is to instead implement the cross coupling capacitances with diode coupled transistors [5] as depicted in Fig. 4. In this architecture the base–collector depletion capacitance, $C_{bc\_diode}$, of the diode coupled transistor is used to cancel the effect of the base–collector capacitance, $C_{bc}$, of the CE-stage devices.

The advantage with this technique is that only one type of device is used to implement the complete amplifying stage. A process spread in $C_{bc}$ of the common-emitter devices is followed by a similar change in the capacitance of the diode connected devices implementing the cross coupling capacitors. There is also a benefit regarding the large signal properties [5] of the amplifying stage especially important for a power amplifier. Large signal effects of the non-linear base–collector junction of the CE devices will also be counteracted since the large signal also modulates the base collector junction of the diode connected devices. To reduce the effect of $C_{bc}$ as much as possible, the turned off diode connected devices should be chosen slightly smaller than the CE devices, since for equally sized devices $C_{bc\_diode}$ is larger than $C_{bc}$.

Another option is to use an architecture with reverse-biased junction diodes as the cross coupling devices, see Fig. 5. The amount of cross coupling can then be carefully tuned with a control voltage, $V_{ctrl}$. With a tunable gain the design margins can be further reduced, resulting in increased performance, as the effects of process spread, supply voltage and temperature can be compensated for.

2.4 Interstage matching and biasing

The interstage matching depicted in Fig. 6 is crucial for the performance of the two-stage amplifier, as mismatch in the interface between the amplifiers would result in reduced gain. The two amplifiers were optimized separately using the interface shown in Fig. 6. The driver was tuned to a real differential impedance $Z_1$ equal to $400 \Omega$ with inductor $L_1$. The low input impedance of the output stage, equal to $14 \Omega$, is transformed to a higher real valued impedance $Z_2$ using an L-C matching network formed by shunt inductor $L_2$ and series capacitor $C_1$. Inductor $L_3$ is a mm-wave choke for biasing of the output stage. For maximum power gain $Z_1$ should equal $Z_2$, giving a conjugate match. In the complete circuit with the amplifiers connected together, $L_1$ and $L_2$ are in parallel and are merged into a single inductor. The impedance transformation ratio of the interstage matching is important for the bandwidth of the power amplifier. The ratio can be reduced by increasing the cross coupling capacitors in the driver stage, which will lower $Z_1$ as well as increase the driver stage input impedance. This must be applied with care, however, as too much cross coupling will result in a reduced stability factor. The cross coupling of the output stage is limited by the requirement that, for maximum power transfer to the stacked transformer, the transformer/output stage interface should be in resonance. Increased capacitance must thus be compensated by reduced transformer inductance, which results in more transformer loss. $Z_2$ could instead be increased by adding inductive degeneration to the output stage. This will reduce the gain but will be beneficial for the power amplifier matching bandwidth and circuit robustness. It should be noted, however, that gain at E-band frequencies is an expensive parameter to trade off.
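
To give a feeling for the size of the transformation ratio, the following sketch evaluates a textbook lossless L-match between 14 $\Omega$ and 400 $\Omega$ at 84 GHz. It is an idealization, not the designed network: both terminations are assumed purely real, whereas the actual network also has to absorb the device and layout capacitances, so the component values printed here are illustrative only.

```python
import math


def l_match(r_high, r_low, f_hz):
    """Lossless L-match with a series capacitor at the low-resistance side and a shunt inductor."""
    q = math.sqrt(r_high / r_low - 1)  # loaded Q set by the transformation ratio
    x_series = q * r_low               # reactance of the series capacitor
    x_shunt = r_high / q               # reactance of the shunt inductor
    omega = 2 * math.pi * f_hz
    return q, 1 / (omega * x_series), x_shunt / omega


f_hz = 84e9
q, c_series, l_shunt = l_match(400.0, 14.0, f_hz)

# Verify the transformation: impedance seen looking into the shunt inductor node.
omega = 2 * math.pi * f_hz
z_series_branch = complex(14.0, -1 / (omega * c_series))
z_in = 1 / (1 / z_series_branch + 1 / complex(0.0, omega * l_shunt))

print(f"transformation ratio: {400.0 / 14.0:.1f}, loaded Q: {q:.2f}")
print(f"ideal series C: {1e15 * c_series:.1f} fF, ideal shunt L: {1e12 * l_shunt:.0f} pH")
print(f"transformed impedance: {z_in.real:.1f} ohm (imaginary part {z_in.imag:.2e})")
```

A loaded Q above 5 is high for a single matching section, which is consistent with the remark that the transformation ratio matters for the bandwidth of the amplifier.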

An alternative matching and biasing network is depicted in Fig. 7 where the bias chokes $L_3$ have been replaced by small resistors $R_3$ of about $30 \Omega$. The modification has a negative impact on the compression behavior of the output stage and it will also affect the PAE. When the output stage starts to compress the DC collector current as well as the base current increases. With the base biased with $R_3$ instead of $L_3$ the increase in base current results in a reduced base voltage. This partly counteracts the increase in $I_C$, reducing the compression point. The bias circuit with the inductor $L_3$ is therefore preferred.

The resistors used for biasing of the junction diodes in Fig. 5 could be implemented using either of the two available poly resistor types in the SiGe process, i.e. $p^+$ poly with $150\ \Omega/\square$ or $p^-$ poly with $1000\ \Omega/\square$. The small resistor in Fig. 7 is preferably implemented with the tantalum nitride resistor having $20\ \Omega/\square$.

2.5 Temperature compensated biasing

An E-band power amplifier has a quite low power added efficiency compared to a PA designed for lower frequencies. The resulting large DC power dissipation heats the devices, which causes the bias current in the driver and output amplifier to vary with temperature. If a bipolar device is biased with a voltage source at the base, increasing device temperature results in a higher collector current, which can lead to thermal runaway. The compression point and gain will also vary strongly with altered collector current. The temperature dependency can be solved by the architecture depicted in Fig. 8. A bandgap reference block is used to create a current \( I_{\text{ref}} \) that is virtually temperature independent. This current biases the active part of the PA through a current mirror that multiplies \( I_{\text{ref}} \) by \( N \) through the scaling of the device sizes in the mirror. The inductors connected to the bias node are used as RF chokes. To further mitigate the temperature effect the PA together with the bandgap reference block could be mounted on a heat sink.

3 Transformer design

3.1 Introduction

The power combining transformer at the output is very important for the performance of the PA [7]. Loss in the transformer will reduce the overall gain and PAE. Therefore transformer design and optimization is an important part of mm-wave power amplifier research. The main design idea of this paper is to design a low voltage power amplifier with high OCP\(_{1\text{dB}}\) and PAE. To accomplish this, the 50 \( \Omega \) antenna impedance must be transformed down. At lower frequencies this could be implemented by an \( N:1 \) transformer, with \( N \) turns on the secondary side connected to the antenna. However, at E-band frequencies, already with \( N = 2 \) the performance of such a transformer regarding loss and imbalance suffers too much from interwinding capacitances on the secondary side.

3.2 Stacked transformer design

In this paper a stacked transformer consisting of two single turn transformers, shown in Fig. 9, is used instead. The primary sides of the single turn transformers depicted in Fig. 1 provide a 25 \( \Omega \) differential interface for each of the two output stages of the PA, while the secondary sides are series connected between the 50 \( \Omega \) antenna and ground. The power combination of two output stages increases the OCP\(_{1\text{dB}}\) and \( P_{\text{sat}} \) of the PA. As losses in the power combining transformer have a negative impact on efficiency, output power and gain, they must be minimized. The loss is minimized with capacitor \( C_{\text{gnd}} \) equal to 58 fF. Any imbalance in the load of the differential output stages will also deteriorate the performance of the PA. Imbalance is caused by capacitive coupling between the primary and secondary side of the transformers. In order to reduce the imbalance both in amplitude and phase, the combiner is tuned with capacitors \( C_{t1} \) and \( C_{t2} \) as depicted in Fig. 9. For large input signals approaching the PA compression point the DC collector current in the output devices increases. Transformer imbalances make the current increase unequal in the two output devices in each output stage. This is counteracted by the tuning capacitors \( C_{t1} \) and \( C_{t2} \), equal to 50 fF each. The optimal value of these two capacitors also depends on \( C_{\text{gnd}} \). Without this tuning, the current increase can result in reduced life-time if the DC collector current becomes too large.
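
The impedance levels in the stacked combiner follow from simple ideal-transformer arithmetic, sketched below. The sketch assumes ideal, lossless, perfectly coupled 1:1 transformers and an even split of the antenna load between the two series-connected secondaries; the comparison with a single 2:1 transformer is added here only for reference.

```python
antenna_impedance = 50.0  # ohm
n_transformers = 2        # secondaries connected in series
turns_ratio = 1.0         # single-turn primary and secondary (ideal 1:1 assumed)

# Series-connected secondaries share the antenna load equally.
load_per_secondary = antenna_impedance / n_transformers

# An ideal 1:1 transformer presents the same impedance on its primary side.
primary_differential = load_per_secondary / turns_ratio ** 2
per_terminal = primary_differential / 2

# For reference: one ideal transformer with two secondary turns and one primary turn.
single_2_to_1_primary = antenna_impedance / 2 ** 2

print(f"differential load per output stage: {primary_differential:.1f} ohm")
print(f"load per amplifier terminal:        {per_terminal:.2f} ohm")
print(f"single 2:1 transformer primary:     {single_2_to_1_primary:.1f} ohm")
```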

To reduce capacitive losses to the substrate the transformer is implemented in the two top Cu layers with the higher voltage secondary side in Cu 4 with a thickness of 2.8 \( \mu \)m and the primary side in 1.05 \( \mu \)m Cu 3. The transformers are sized with an inner diameter of 18 \( \mu \)m and a trace width of 6 \( \mu \)m. The supply voltage to the output stages is connected through a center tap on the primary inductors. The inductance of the transformer is constrained by the output capacitance of the output stages since for maximum gain and minimum power loss these should be in resonance at 84 GHz. Increasing the width of the transformer metal traces decreases the resistive losses but increases the capacitance. A larger coil diameter also decreases the loss, but increases the inductance thereby
requiring a smaller output device. The transformer was simulated and optimized using the ADS Momentum 2.5D EM simulator. Circular inductors are convenient for performance optimization compared to drawing octagonal shapes, but some foundries do not accept circular designs due to issues with mask generation. The simulated differences in performance are however negligible. The stacked transformer design suffers from imbalances both in amplitude and phase that can be mitigated with the three tuning capacitors $C_{t1}$, $C_{t2}$, and $C_{gnd}$.
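
The resonance constraint between transformer inductance and output capacitance can be illustrated with the standard LC resonance relation. The capacitance values below are hypothetical and serve only to show the trend; the actual output capacitance of the stage is not given in the text.

```python
import math


def resonant_inductance(c_farad, f_hz):
    """Inductance that resonates with c_farad at f_hz, from f = 1 / (2*pi*sqrt(L*C))."""
    return 1 / ((2 * math.pi * f_hz) ** 2 * c_farad)


f_hz = 84e9
for c_ff in (50, 100, 200):  # hypothetical total capacitances in fF
    l_ph = 1e12 * resonant_inductance(c_ff * 1e-15, f_hz)
    print(f"C = {c_ff:3d} fF  ->  L = {l_ph:5.1f} pH")
```

Doubling the capacitance halves the inductance that can be used, which is why a larger coil (more inductance) requires a smaller output device.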

3.3 Transformer tree design

Another way of implementing the power combiner with low impedance load for the output stages is to use the transformer tree architecture depicted in Fig. 10.

The transformer tree consists of three identical single turn transformers. The transformer connected to the antenna converts its 50 Ω load to two 25 Ω load impedances, one for each of the two transformers to the left. The structure is tuned to resonate at 84 GHz using the capacitor $C_{tune}$ connected between the terminals of the primary side of the output transformer. As can be seen in Fig. 11 the output transformer was implemented with Cu 4 for the secondary side connected to the antenna and Cu 3 for the primary side.

As for the stacked transformer, the output transformer has the secondary side in metal Cu 4 in order to minimize the capacitive losses to the substrate from the high voltage signal. The two input transformers have the primary side in Cu 4 to make an efficient layout without vias in the signal path. The supply voltage to the output stages is connected through a center tap on the primary side of the input transformers. The transformers were implemented with an octagonal shape with an inner radius of 18 μm and a trace width of 6 μm. The main advantage of the transformer tree architecture is that, due to the inherent symmetry, it does not require any tuning capacitors to obtain low phase and amplitude imbalances. On the other hand, three transformers are required instead of two as for the stacked transformer architecture. This results in significantly higher losses for the transformer tree, making it a less attractive choice for an output power combiner. The stacked transformer architecture, in contrast, combines ease of design with low loss.

3.4 Input transformer design

In order to verify the performance of the power amplifier standalone, an input transformer is required to convert the single ended RF input to a differential signal. In a fully integrated transmitter this transformer would not be needed since the transmit mixer provides a differential output. The architecture of the input transformer is depicted in Fig. 12. A tuning capacitor, $C_{tune\_input}$, of 45 fF was placed between RF_IN and GND to cancel the imaginary part of the input impedance at 84 GHz. Two parallel single turn transformers convert the single-ended input signal to differential signals for the two amplifiers. The inductance of each input transformer is designed to resonate with the capacitive load of its amplifier input. The design is advantageous compared to a solution with a single transformer connecting to both driver stage inputs, as the routing can be more compact, reducing parasitics, and since the inductance of a single transformer would be far from optimum regarding losses. A center tap on the secondary side is used for input stage biasing.

The layout of the input transformer is depicted in Fig. 13. The primary side is designed in metal Cu 4 in order to reduce the capacitively coupled substrate losses, whereas the Cu 3 layer is used for the secondary side. As for the stacked transformer at the output, the inner diameter equals 18 μm and the trace width is 6 μm. Compared to the stacked transformer at the power amplifier output, however, the loss of the input transformer is less critical. Since the PA has a gain of 12 dB, a power loss at the input transformer has a small impact on the PAE compared to a loss at the output. On the other hand, the amplitude and phase imbalance of the four signals connected from the input transformer to the driver stage of the PA has a large impact on both the overall gain and PAE. Any imbalance of these signals results in a non-optimal power combination at the output.

4 Power amplifier layout and parasitics

4.1 Power amplifier core layout

Inductive and capacitive layout parasitics have a large impact on millimeter wave designs [8], and there is typically a major difference between simulation results of a schematic design and a design with extracted parasitics. The larger the design, the more interconnect parasitics impact the performance. The presented design with a small core based on a differential CE stage is thus advantageous. For the power amplifier core depicted in Fig. 3 the most important parasitic is the inductance between the emitter and signal ground [8], denoted $L_{E1}$ and $L_{E2}$ in Fig. 14, which shows the layout of the output stage.

Significant gain loss (>1 dB) occurs already for parasitic emitter inductances $L_{E1}$ and $L_{E2}$ of 10 pH. Therefore a multi fingered device structure configured as Collector–Emitter–Base–Emitter (CEBE) has been used, with double emitter contacts for each emitter diffusion. To minimize the effective emitter inductance, 12 emitters per device are connected to the ground rail in the center of the structure. The symmetrical layout ensures that the rail is well grounded for differential signals. The cross coupling capacitors denoted $C_c$ in Fig. 3 were placed above and below the collector rails denoted $C_1$ and $C_2$. Each capacitor has one side connected to the corresponding collector terminal and the other side connected to the base of the opposite transistor through an interconnect wire. Since the series inductance of the interconnect of the cross coupling capacitors is large enough to alter the reactance of the cross coupling, the wires were simulated in ADS Momentum. The cross coupling capacitors in the driver stage were decreased from 52 fF in a simulation with the interconnect inductance unaccounted for, to 46 fF to compensate for the series inductance. The inductance of the transformer was tuned to resonate with the total capacitance at the collector nodes $C_1$ and $C_2$, i.e. the device collector capacitance plus the parasitic layout capacitances. The output pad for the antenna connection, modeled in Momentum, has a solid grounded shield underneath implemented in Cu 1 to reduce loss to the substrate.
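
The reduction of the driver stage cross coupling capacitors from 52 fF to 46 fF can be rationalized with a simple series LC model, sketched below. The model, in which the interconnect is a single ideal inductor in series with the capacitor, is an assumption made here for illustration; the interconnect inductance it implies is not a value stated in the text, where the wires were characterized in ADS Momentum.

```python
import math

F_HZ = 84e9
OMEGA = 2 * math.pi * F_HZ


def effective_capacitance(c_farad, l_series_henry, omega=OMEGA):
    """Capacitance seen at omega for c_farad in series with l_series_henry (below self-resonance)."""
    return c_farad / (1 - omega ** 2 * l_series_henry * c_farad)


def series_inductance_for(c_farad, c_effective_farad, omega=OMEGA):
    """Series inductance that makes c_farad appear as c_effective_farad at omega."""
    return (1 - c_farad / c_effective_farad) / (omega ** 2 * c_farad)


# Inductance implied by the simple model: a 46 fF capacitor that acts like 52 fF at 84 GHz.
l_interconnect = series_inductance_for(46e-15, 52e-15)

# Reactance of a 10 pH parasitic emitter inductance at 84 GHz, for scale.
x_emitter = OMEGA * 10e-12

print(f"implied interconnect inductance: {1e12 * l_interconnect:.1f} pH")
print(f"reactance of 10 pH at 84 GHz:    {x_emitter:.2f} ohm")
```

In this model a series inductance of only a few picohenries is enough to shift the effective capacitance by more than 10 %, and 10 pH already corresponds to several ohms of reactance, which is considerable next to a 12.5 $\Omega$ load.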

4.2 Power amplifier layout

The layout of the PA excluding decoupling capacitors and pads is shown in Fig. 15. It has a size of $260 \times 130$ μm² and features two identical amplifier chains in parallel, each consisting of driver amplifier, interstage matching and output amplifier.

The complete chip, occupying an area of approximately 1 mm², is depicted in Fig. 16. In total there are 20 ground pads and four supply pads. Two pads are used for bias of the driver and output stage, respectively. At the amplifier input and output, pads are placed in order to match ground-signal-ground (GSG) probes with a pitch of 100 µm. The blue bottom metal, Cu 1, has been used as a ground plane covering the whole die. There is a similar plane in Cu 4 for the supply voltage. Two types of capacitors have been used for decoupling between the supply and ground planes: MIM capacitors and metal-grid capacitors. The metal-grid capacitors were implemented with the bottom plate consisting of joint Cu 1 and Cu 2 metal grid planes and a top plate of joint Cu 3 and Cu 4 planes. At high frequencies, i.e. 81–86 GHz, the Q-value of the metal-grid capacitors exceeds the Q-value of the MIM-capacitors. The capacitance per area, however, is higher for the MIM capacitor, and the two types of capacitors thus provide an attractive combination in achieving efficient decoupling across a wide frequency range.
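
The core dimensions can be checked against the die area of 0.034 mm² quoted in the abstract with a short calculation; the share of the complete chip is based on the approximate 1 mm² figure and is therefore only indicative.

```python
core_width_um = 260.0
core_height_um = 130.0
chip_area_mm2 = 1.0  # approximate value from the text

# Convert micrometers to millimeters before multiplying.
core_area_mm2 = (core_width_um * 1e-3) * (core_height_um * 1e-3)
core_share = core_area_mm2 / chip_area_mm2

print(f"PA core area: {core_area_mm2:.4f} mm^2")
print(f"share of the complete chip: {100 * core_share:.1f} %")
```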

5 Simulated performance

The large signal performance of the PA was simulated using the Cadence Spectre RF tool. Gummel–Poon models were used for the active devices. The transformers and the millimeter wave interconnects were modeled with s-parameters from ADS Momentum simulations. The loss of the stacked transformer power combiner was simulated using two combiners connected back to back [8] as depicted in Fig. 17. An important benefit of this simulation setup is that effects of phase and amplitude imbalances are cancelled out, so that the loss can be characterized separately.

The total loss of the two combiners, simulated as maximum available gain [8] \( G_{\text{max}} \) versus frequency, with a
tuning capacitor $C_{tune}$ equal to 164 fF connected differentially between the 12.5 Ω terminals, is shown in Fig. 18.

As can be seen, the loss for the two combiners was simulated to 2.7 dB at 84 GHz, i.e. a single combiner has 1.35 dB loss. The alternative combiner design consisting of the transformer tree depicted in Figs. 10 and 11 was also simulated using an identical setup. The maximum available gain versus frequency is shown in Fig. 19.

Using the same calculation as for the stacked transformer, the loss of the transformer tree consisting of three individual transformers becomes 3.2 dB at 84 GHz, i.e. a significantly higher loss compared to the loss of only 1.35 dB for the stacked structure. The simulated $G_{max}$ for the less critical input transformer is shown in Fig. 20. The loss for the two input transformers was simulated to 2.54 dB at 84 GHz, i.e. a single transformer has 1.3 dB loss.
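
The conversion from back-to-back loss to single-combiner loss, and what the losses mean in terms of delivered power, can be sketched as follows. The sketch assumes that the two back-to-back structures contribute equally, so that the loss in dB is simply halved.

```python
def single_loss_db(back_to_back_loss_db):
    """Loss of one structure, assuming two identical structures in series."""
    return back_to_back_loss_db / 2


def transmitted_fraction(loss_db):
    """Fraction of the input power that reaches the output for a given loss in dB."""
    return 10 ** (-loss_db / 10)


stacked_db = single_loss_db(2.7)   # stacked transformer combiner
input_db = single_loss_db(2.54)    # input transformer
tree_db = 3.2                      # transformer tree, single-combiner value from the text

for name, loss_db in (("stacked", stacked_db), ("tree", tree_db), ("input", input_db)):
    print(f"{name:8s} {loss_db:.2f} dB  ->  {100 * transmitted_fraction(loss_db):.0f} % of the power delivered")
```

Under this assumption the stacked combiner passes roughly three quarters of the power, whereas the transformer tree passes slightly less than half.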

In order to reach a high compression point and gain it is important that the stacked transformer at the output is well balanced, i.e. the load that the transformer presents to the output stage should be as equal as possible for the four output transistors. If not, one device might compress at a lower input power than the others. If the load is asymmetric the power combination that takes place in the stacked transformer will also be less effective. To simulate the balance, two equal AC current sources were placed at the two primary inputs of the stacked transformer. The same value, $C_t$, was used for the two tuning capacitors $C_{t1}$ and $C_{t2}$ in Fig. 9. In Fig. 21 the magnitude of the four impedances presented to the output stage is plotted versus $C_t$.

As can be seen, the impedances will not be perfectly balanced, but by using the tuning capacitances, $C_t$, it is possible to improve the situation considerably. With $C_t$ equal to zero the difference between the minimum and maximum impedance magnitude equals 2.65 Ω. With $C_t$ equal to 50 fF the imbalance instead equals 1.55 Ω, i.e. the imbalance has been reduced by 42%. With a further increased value of $C_t$ the balance improves, but the gain suffers since the output stage is then no longer in resonance with the stacked transformer.

The input and output matching, $S_{11}$ and $S_{22}$, power gain, $S_{21}$, and reverse isolation, $S_{12}$, simulated with a post-layout S-parameter analysis of the extracted layout, are shown in Fig. 22.

The bandwidth of the input, $S_{11}$, and output, $S_{22}$, matching equals 11 and 28 GHz, respectively, for a return loss $<-8$ dB. With the selected values for cross coupling capacitors in stage 1 and stage 2, the power gain, $S_{21}$, is 12.0 dB at 84 GHz, and the isolation $S_{12}$ equals $-41$ dB. The input and output of the PA are not matched for exactly the same frequency. The input return loss has a minimum at 84 GHz whereas the output return loss has a minimum at 77 GHz. The matching is, however, still good with a return loss of $-19$ dB at 84 GHz. Modifying the input and output transformer inductances can further optimize the matching. The shape of the input matching curve depends on the Q-value of the combination of the input transformer and the input impedance of the active part of the power amplifier. In the same way the shape of the output matching depends on the output impedance of the active part and the Q-value of the stacked transformer. The reverse isolation, $S_{12}$, depends strongly on the value of the cross coupling capacitor $C_c$. For a certain value of $C_c$ the base–collector parasitic capacitance will cancel, thereby improving the reverse isolation.

The power amplifier is intended to be biased with constant collector currents versus temperature. This is accomplished by using the temperature compensated bias architecture given in Fig. 8. With this approach the output compression point will not be dependent on temperature. On the other hand the gain, $S_{21}$, will show a temperature dependency as depicted in Fig. 23, where $S_{21}$ is simulated versus frequency for four different temperatures, i.e. $T = 0, 27, 50$ and $85$ °C, corresponding to gain curves labeled M0 to M3. The quite large spread is due to the temperature dependence of $g_m$ in combination with temperature dependent parasitic base collector capacitance altering the effect of the cross coupling capacitor. The gain spread could thus be reduced by implementing the architectures depicted in Figs. 4 and 5 with cross coupling capacitors implemented with diode coupled transistors or with controllable junction diodes.

The matching of the input, $S_{11}$, and the output, $S_{22}$, do however not depend significantly on temperature. This is depicted in Fig. 24 showing $S_{11}$ and $S_{22}$ for $T = 0$ and $85$ °C.
The isolation from output to input, $S_{12}$, is plotted versus frequency for $T = 0$ and $85 \, ^\circ C$ in Fig. 25. The temperature dependency of $S_{12}$ is quite small.

In Fig. 26 the stability factor $k$ together with $G_{\text{max}}$ are simulated versus frequency for driver stage $C_c$ equal to 36 fF, 46 fF and 56 fF.

As can be seen the stability of the circuit is sensitive to the values of the cross coupling capacitors. The stability factor is highest when the cross coupling capacitor equals the base collector capacitance $C_{bc}$. For $C_c$ equal to 36 fF and 46 fF the design is unconditionally stable at all frequencies. For $C_c = 56 \, fF$, $G_{\text{max}}$ at 84 GHz is 4.2 dB higher, but the design is on the limit of not being unconditionally stable, with minimum $k = 1.0$. The output compression point, $CP_{1\text{dB,\,out}}$, and saturated output power, $P_{\text{sat}}$, were simulated with a PSS analysis at 84 GHz to 13.0 and 15.9 dBm respectively, see Fig. 27(a). The power added efficiency (PAE) versus input power is given in Fig. 27(b). The PA achieves a peak PAE of 13.9 %, and at the compression point the PAE equals 10.4 %.
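For reference, the stability factor $k$ and $G_{\text{max}}$ discussed above can be computed from a set of two-port S-parameters with the standard Rollett expressions. The sketch below is an illustration only: the S-parameter values in the example are hypothetical (a perfectly matched two-port with 12 dB gain and $-41$ dB reverse isolation), not data extracted from Fig. 22 or Fig. 26, and below $k = 1$ the function returns the maximum stable gain instead of the maximum available gain.

```python
import math

def rollett_k(s11, s12, s21, s22):
    """Return (k, |Delta|) for complex two-port S-parameters."""
    delta = s11 * s22 - s12 * s21
    k = (1.0 - abs(s11) ** 2 - abs(s22) ** 2 + abs(delta) ** 2) / (2.0 * abs(s12 * s21))
    return k, abs(delta)

def gmax_db(s11, s12, s21, s22):
    """Maximum available gain for k >= 1, maximum stable gain otherwise, in dB."""
    k, _ = rollett_k(s11, s12, s21, s22)
    ratio = abs(s21) / abs(s12)
    if k >= 1.0:
        gain = ratio * (k - math.sqrt(k * k - 1.0))
    else:
        gain = ratio
    return 10.0 * math.log10(gain)

# Hypothetical example values, chosen only to exercise the formulas
s11, s22 = 0j, 0j
s21 = complex(10.0 ** (12.0 / 20.0), 0.0)
s12 = complex(10.0 ** (-41.0 / 20.0), 0.0)

k, delta_mag = rollett_k(s11, s12, s21, s22)
print(f"k = {k:.2f}, |Delta| = {delta_mag:.4f}, Gmax = {gmax_db(s11, s12, s21, s22):.2f} dB")
```

Unconditional stability requires $k > 1$ together with $|\Delta| < 1$, which is why both quantities are returned.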

As can be seen in Fig. 26 the circuit is sensitive to variations in the cross-coupling capacitance. Two alternative ways of implementing the cross coupling, shown in Figs. 4 and 5, that are more resistant to process variations than a realization with MIM capacitors, have therefore been investigated. Figure 28 (a and b) shows Monte Carlo simulations with process variations and mismatch of the PA gain at 84 GHz, for the presented design and the alternative design with diode coupled transistors as in Fig. 4. The size of the diode connected devices was 0.95 times the size of the CE devices [5]. Each design has been simulated with 400 iterations.

As expected the design with diode connected devices has significantly smaller spread, $\sigma = 1.1 \, \text{dB}$, compared to the design with MIM capacitors that has $\sigma = 2.0 \, \text{dB}$. The mean gain is also 1 dB higher in that design.

The simulated performance of the presented power amplifier, with extracted resistive and capacitive layout parasitics plus Momentum modeled transformers and millimeter wave interconnects, is compared to published measured E-band SiGe PAs in Table 1. Even with a supply voltage as low as 1.0 V the presented PA has competitive performance. The gain is lower than that of the PAs in [9–14] but can be significantly increased by altering the cross coupling capacitances.

Table 1 Performance comparison of SiGe PAs

<table>
<thead>
<tr>
<th>Ref.</th>
<th>Techn. f_T/f_{max} (GHz)</th>
<th>Freq. (GHz)</th>
<th>Arch.</th>
<th>VCC (V)</th>
<th>Gain (dB)</th>
<th>P_{1dB} (dBm)</th>
<th>P_{sat} (dBm)</th>
<th>Peak PAE (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>[9]</td>
<td>230/280</td>
<td>77</td>
<td>2 CAS</td>
<td>2.5</td>
<td>22.5</td>
<td>12</td>
<td>15</td>
<td>7.5</td>
</tr>
<tr>
<td>[10]</td>
<td>200/-</td>
<td>84</td>
<td>CAS + 2CB</td>
<td>2.5</td>
<td>27</td>
<td>16</td>
<td>18</td>
<td>9</td>
</tr>
<tr>
<td>[12]</td>
<td>230/280</td>
<td>79</td>
<td>4 CE</td>
<td>1.8</td>
<td>18.1</td>
<td>12.5</td>
<td>17</td>
<td>6.4</td>
</tr>
<tr>
<td>[13]</td>
<td>200/200</td>
<td>77</td>
<td>4 CE</td>
<td>1.8</td>
<td>17</td>
<td>14.5</td>
<td>17.5</td>
<td>12.8</td>
</tr>
<tr>
<td>[14]</td>
<td>230/300</td>
<td>77</td>
<td>1 CAS + 2 CE</td>
<td>2.5</td>
<td>19</td>
<td>12.0</td>
<td>14.5</td>
<td>15.7</td>
</tr>
<tr>
<td>This work</td>
<td>200/250</td>
<td>84</td>
<td>2 CE</td>
<td>1.0</td>
<td>12.0</td>
<td>13.0</td>
<td>15.9</td>
<td>13.9</td>
</tr>
<tr>
<td>This work with input transformer</td>
<td>200/250</td>
<td>84</td>
<td>2 CE</td>
<td>1.0</td>
<td>10.7</td>
<td>12.9</td>
<td>15.8</td>
<td>13.1</td>
</tr>
</tbody>
</table>

Fig. 27 Large signal behavior a Output power versus input power b Power added efficiency versus input power

Fig. 28 Monte Carlo simulations of S_{21}; a implementation with MIM-capacitor, b implementation with diode connected devices

6 Conclusions

A low supply voltage two-stage E–band power amplifier with high power added efficiency has been presented. The amplifier design is based on differential CE stages with capacitive cross coupling to increase the gain as well as the stability by reducing the effective differential base–collector capacitance. Different ways of implementing the cross coupling for improved tolerance to process, temperature and supply voltage variations are investigated. A low loss power combining stacked transformer with single turn inductors is presented, whose performance exceeds that of an alternative design with a symmetrical transformer tree. The stacked transformer is significantly easier to design at 84 GHz than a two-turn 2:1 transformer, and has the additional advantage of providing power combination of two output stages. The differential input of the power amplifier is advantageous since in a fully integrated transmitter it can be directly connected to the differential output of the transmit mixer, thereby eliminating the loss of a single ended to differential input transformer. For power amplifier verification an input transformer is however necessary. Simulated performance with layout parasitics is shown for the power amplifier with and without the designed low loss input transformer, and results that are competitive with state-of-the-art published SiGe E-band PAs are achieved using just a 1 V supply compared to at least 1.8 V for competing designs.

Acknowledgments The author would like to thank the Swedish government funding agency Vinnova, the System Design on Silicon (SoS) excellence center and Infineon Technologies for sponsoring this project.

References


Tobias Tired was born in Lund in 1967. He received the M.Sc. degree in Engineering Physics in Lund 1992 and the Technology Licentiate degree in 1992. Since 2012 he has been a Ph.D. student at the Department of Electrical and Information Technology at Lund University. Between 1993 and 1996 he was at Ericsson Microelectronics in Stockholm, Sweden as a semiconductor process engineer. In 1996 he joined Ericsson Mobile Communications in Lund, Sweden as a circuit designer, designing BiCMOS and CMOS integrated radio circuits for mobile terminals. His Ph.D. studies are targeted towards millimeter wave transmitter circuits in SiGe for wireless base station backhaul.

Henrik Sjöland received the M.Sc. degree in electrical engineering in 1994, and the Ph.D. degree in Applied Electronics in 1997, both from Lund University. He was appointed Docent in electronic circuit design in 2002. His research interests include the design and analysis of analog integrated circuits, feedback amplifiers and RF CMOS. He spent one year visiting the Abidi group at UCLA as a Fulbright postdoc in 1999. He is also the author of a book on integrated wideband amplifiers. In 2008 he became a full professor in analog circuit design at Lund University.
Carl Bryant was born in Helsingborg, Sweden, in 1983. He received the M.Sc. degree in Electrical Engineering and the PhD degree from Lund University, Sweden, in 2007 and 2013, respectively. His Master’s thesis was on the subject of RF CMOS power amplifiers and was performed with the IC Design Group, University of Twente, Netherlands. Upon completion he spent a year at Ericsson Research continuing his work on power amplifiers. His PhD studies focused on the subject of ultra-low power radio front-ends. His research interests include CMOS circuit design, primarily at radio frequencies.

Markus Törmänen received the M.Sc. degree in Electrical Engineering in 2002 and the PhD degree in Circuit Design in 2010 from Lund University, Sweden. He was a Research Engineer at Lund University in 2003–2006; Department of Electroscience (2003–2004) and MAX-lab (2004–2006). Since 2010 he has been Assistant Professor in the Analog/RF group at the Department of Electrical and Information Technology, Lund University. His research interests include design of CMOS analog, RF, microwave, and mm-wave circuits.
Paper V

Comparison between two 2-stage SiGe E-band Power Amplifiers

Comparison of two SiGe 2-stage E-band Power Amplifier Architectures

Tobias Tired, Henrik Sjöland, Göran Jönsson, Johan Wernehag
Department of Electrical and Information Technology, Lund University; Sweden
E-mail: tobias.tired@eit.lth.se

Abstract—This paper presents simulation and measurement results for two 2-stage E-band power amplifiers implemented in 0.18 μm SiGe technology with $f_T = 200$ GHz. To increase the power gain by mitigating the effect of the base-collector capacitance, the first design uses a differential cascode topology with a 2.7 V supply voltage. The second design instead uses capacitive cross-coupling of a differential common emitter stage, previously not demonstrated in mm-wave SiGe PAs, and has a supply voltage of only 1.5 V. Low supply voltage is advantageous since a common supply can then be shared between the transceiver and the PA. To maximize the power gain and robustness, both designs use a transformer based interstage matching. The cascode design achieves a measured power gain, $S_{21}$, of 16 dB at 92 GHz with 17 GHz 3-dB bandwidth, and a simulated saturated output power, $P_{sat}$, of 17 dBm with a 16% peak PAE. The cross-coupled design achieves a measured $S_{21}$ of 10 dB at 93 GHz with 16 GHz 3-dB bandwidth, and a simulated $P_{sat}$ of 15 dBm with 16% peak PAE. Comparing the measured and simulated results for the two amplifier architectures, the cascode topology is more robust, while the cross-coupled topology would benefit from a programmable cross-coupling capacitance.

Keywords—mm-wave, E-band, SiGe, power amplifier

I. INTRODUCTION

The future 5G networks with high data rate capacity are a key driver for the research on E-band transceivers for wireless communication. These networks will be heterogeneous, i.e. they will contain macro, pico and femto cells. Since the number of base stations will increase there will be a shift towards a wireless backhaul. Deploying an optical fiber backhaul will no longer be cost effective. The E-band, with three sub-bands located at 71-76, 81-86 and 92-95 GHz, is highly suitable for wireless point-to-point communication [1] at gigabits per second. Due to low volumes and design difficulties at high frequencies, early commercial E-band transceivers had a low integration level. E-band transceivers of today, however, are highly integrated in plastic packages [2].

The power amplifier (PA) of the transmitter is a key block, determining the saturated output power ($P_{sat}$), the output compression point (OCP$_{1\text{dB}}$), and the power efficiency [3]. Since higher order modulation schemes used today, e.g. 64-QAM, require a power amplifier with high linearity, the PA should typically be optimized for a maximum power added efficiency (PAE) at the OCP$_{1\text{dB}}$, whereas the PAE in saturation is less important. Typically SiGe PAs use a cascode topology [4]-[8] with a supply of several volts requiring a dedicated regulator. One way to increase the integration level is therefore to use a single low supply voltage for the entire transceiver. This paper presents simulation and measurement results for two 2-stage PAs with an overall architecture given in figure 1. Both PAs use up-transformation of the input impedance of the second stage to increase the power gain. The first design uses a cascode topology, with a supply voltage of 2.7 V for maximum PAE at the compression point. The second design is based on neutralization by capacitive cross-coupling [9]-[11], having a supply voltage of only 1.5 V, thereby enabling increased transceiver integration. The designs are implemented in a 0.18 μm SiGe HBT process with four Cu metal layers and a maximum $f_T$ of 200 GHz for a device with an open-base breakdown voltage, $BV_{CEO}$, of 1.5 V.

The overall architecture, given in figure 1, uses three single turn transformers. At E-band frequencies single turn transformers are advantageous, as interwinding capacitances make it difficult to design multi-turn transformers that are well balanced. The first transformer is used as a balun to provide a differential input signal for stage 1 from the single ended source. The second transformer is part of the interstage matching and is required to match the output impedance of the driver stage to the input impedance, $Z_{in2}$, of the output stage. The output transformer converts the differential signal to single-ended and provides a 25 Ω load impedance for each output transistor of stage 2. The supply voltage was optimized to maximize the PAE for a 25 Ω single ended load, resulting in a supply of 2.7 V and 1.5 V for the cascode and cross-coupled design, respectively.

B. Power amplifier cores

The PA output stage core topologies for the two designs, shown in figure 2 for the architecture given in figure 1, are identical to the driver stages except for device sizes, bias currents and resistor $R_i$. As shown in figure 1, the first stage is biased through a transformer center tap on the secondary side, and $R_i$ is therefore not needed for that stage. In conventional PA designs a cascode stage is used to provide isolation and increase the gain [4]-[8]. To maximize the gain, both PA cores are designed using the fastest devices provided in the technology, with $f_T = 200$ GHz and a collector-emitter breakdown voltage with open base, $BV_{CEO}$, equal to $1.5$ V. In a real design the voltage handling is higher than $BV_{CEO}$ since the base terminal is not completely open [3]. For the cascode design, the breakdown voltage is limited by the collector-base breakdown voltage with open base, $BV_{CBO}$, for the common base devices [3], which is larger than $BV_{CES}$. For the selected devices, $BV_{CES}$ is equal to $5.8$ V. The device type has its maximum $f_T$ for an emitter current density of $6.5$ mA/μm$^2$. For the simulations of the cascode design, the driver and output stages are biased with a total current of $13$ mA and $51$ mA, respectively.

In the simulations of the cross-coupled design, the driver and output stage were biased with $8$ mA and $40$ mA, respectively. To minimize the inductance from the emitter to ground, which creates a parasitic inductive degeneration of the stage, parallel multi-fingered devices configured as Collector-Emitter-Base-Emitter-Collector (CEBEC) were used. As shown in figure 2, the cross-coupling capacitors were realized with diode connected devices $Q_2$. Compared to using MIM capacitors, this is preferred in order to mitigate the effects of process spread [13]. There is a certain size of $Q_2$ that maximizes the stability factor $k$. An unconditionally stable design has $k>1$ [9], [11], [12]. However, to increase the gain a larger size of $Q_2$ was used, corresponding to positive feedback. If the size of $Q_2$ is increased too much, the design becomes unstable. For the simulated design, the minimum $k$ was equal to $4.8$, i.e. the design is unconditionally stable. Comparing the two architectures, the cascode topology offers a more robust way of increasing the gain, while the cross-coupling technique is sensitive but has the potential to be utilized in low supply voltage designs.

C. Interstage matching
The architecture of the interstage matching network, common for both designs, is shown in figure 3. The up-transformation of $Z_{in,2}$, the differential input impedance of the second stage, increases the load impedance of the first stage, thereby boosting the power gain. For the cascode design the real part of $Z_{in,2}$ is $15$ Ω. Due to the positive feedback, the cross-coupled design has a higher $Z_{in,2}$ of $55$ Ω. For the cascode and cross-coupled designs, $C_1$ equals $100$ fF and $110$ fF, respectively. The transformer inductance is tuned to resonate with the circuit capacitances. The up-transformation with a combination of transformer shunt inductance and a series capacitor, $C_1$, provides a load resistance of the first stage of $180$ Ω and $200$ Ω for the cascode and cross-coupled designs, respectively.

D. Transformer design
The individual transformer design together with minimization of electromagnetic coupling between different transformers is critical for the overall performance of the PA [14]. In this work, all transformers and inductors were therefore co-simulated in ADS Momentum. The model for the cascode design is given in figure 4, showing input, interstage and output transformers. In figure 6, the wiring traces to the cross-coupling devices are indicated. These traces were included in the model for the cross-coupled design. Since the reactance of the wire is of the same order as that of the diode connected device, careful modeling is essential for gain and stability optimization. As shown in figure 4, all three transformers were modelled including the lower level metal interconnects to the center of the active devices. Since the devices are wide in comparison to the diameter of the transformers, the inductance of the lower level interconnect, Cu2 (red) cannot be neglected. The higher resistivity of the interconnects also reduces the effective Q-value of the transformer. For the cascode design, the input, interstage and output transformer have an inner diameter of 26, 28 and $29$ μm, respectively, with trace widths equal to 5.5, 5.1 and 4.6 μm. For the cross-coupled design the inner diameters equal 26, 22 and 24 μm with trace widths of 5.5, 5.6 and 5.5 μm. Since losses in the transformers have a negative impact on PAE, $P_{sat}$ and $S_{21}$, these losses were minimized by optimizing the Q-values of the transformers in the 81-86 GHz frequency range.

To minimize trace resistance and to avoid capacitive losses to the substrate [14], the transformers were implemented in the two top layers (Cu4 and Cu5), having thicknesses of $2.8$ μm and $1.05$ μm, respectively. The supply voltage to the stages was connected through a center tap on the primary inductors. The transformer inductance is constrained by the output capacitance of the circuit, since for maximum gain and minimum loss these should be in resonance at $84$ GHz. Increasing the width of the transformer traces decreases the resistive losses, but increases the capacitance. A larger coil diameter also decreases the loss, but increases the inductance, thereby requiring smaller devices in the amplifier stages.
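The resonance constraint described above ties the allowed transformer inductance to the capacitance at the node through $L = 1/((2\pi f)^2 C)$. The following minimal sketch evaluates this relation; the 100 fF node capacitance in the example is a hypothetical value chosen for illustration and is not a figure reported for either design.

```python
import math

def resonant_inductance(freq_hz, cap_farad):
    """Inductance that resonates with cap_farad at freq_hz (ideal lossless LC tank)."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 / (omega ** 2 * cap_farad)

def resonant_frequency(ind_henry, cap_farad):
    """Resonance frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(ind_henry * cap_farad))

f0 = 84e9          # design frequency from the text
c_node = 100e-15   # hypothetical total node capacitance (assumption)

l_res = resonant_inductance(f0, c_node)
print(f"L = {l_res * 1e12:.1f} pH for C = {c_node * 1e15:.0f} fF at {f0 / 1e9:.0f} GHz")
```

Because the required inductance falls as the capacitance grows, larger devices (more capacitance) force a smaller coil, which is the trade-off between coil diameter and device size noted in the text.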

III. MEASUREMENT AND SIMULATION RESULTS

The small signal performance of the two designs was measured using a Rohde & Schwarz ZVA vector network analyzer together with Z110E 75-100 GHz extenders. Cascade Infinity GSG probes with 100 μm pitch and WR10 waveguide interface were used at both input and output. Short low loss coax cables with WR10 interface were used to connect the probes to the extenders. Due to limited extender output power only small signal measurements were performed. The chip photo of the cascode design is given in figure 5. The larger part of the chip area of 0.86 mm² is used for decoupling, using both Metal-Insulator-Metal (MIM) and Metal-Oxide-Metal (MOM)-capacitors.

Fig.5. Chip photo for the cascode design

The chip photo of the active part of the PA in the second design is given in figure 6. Except for the active part, the layout is identical to figure 5.

Fig.6. The active part of the cross coupled design

Indicated in figure 6 are the interconnect wires for the cross-coupled diode connected devices located on the outside of the common-emitter devices in the center. These four wires were included in the Momentum model for the cross-coupled design. The large signal and small signal performance of the two designs was simulated using the Cadence Spectre RF tool. Gummel-Poon models were used for the active devices. In figure 7, the measured input and output matching, S11 and S22, power gain, S21, and reverse isolation, S12, are shown for the cascode design together with the simulated performance for a parasitic extracted view of the active stages and an s-parameter model of the transformers.

Fig.7. Measured and simulated s-parameters for the cascode design

The measured maximum power gain, S21, equals 15.7 dB at 92 GHz, while the simulated S21 equals 16.4 dB at 81 GHz. Comparing the frequency for the measured and simulated maximum S21, the measured performance has its peak gain 11 GHz higher than the simulated. The measured 3-dB bandwidth is 17 GHz compared to a simulated bandwidth of 15 GHz. The measured and simulated s-parameters for the cross-coupled design are shown in figure 8. The maximum S21 equals 10.1 dB and 14.8 dB respectively. The gain differs by 4.7 dB even when the bias current of the first stage has been increased by 16 mA compared to simulations. The measured and simulated 3-dB bandwidth equals 16 GHz and 10 GHz, respectively. The measured minimum k-factor equals 10 at 87 GHz compared to the simulated value of 4.8. This together with the reduced gain indicates that the realized effective cross-coupling capacitance is too small in comparison with simulations.

Fig.8. Measured and simulated s-parameters for the cross-coupled design

The frequency offset between simulated and measured maximum gain is 11 GHz, i.e. the same offset as for the cascode design. The cross-coupling technique is sensitive to the effective reactance of the cross-coupling path. As shown in figures 2 and 6, the inductance of the interconnects and the capacitance of the cross-coupled devices form a series resonance circuit. The models of the active devices are not optimized for the diode configuration with zero collector current. Therefore, the cross-coupled design would need a tunable reactance to optimize the gain. One way to design the tuning is provided in [13], where the cross-coupling capacitance is implemented as the series combination of a MIM capacitor and a variable junction capacitor. For the cascode design, the output compression point, \( OCP_{1dB,\text{out}} \), and saturated output power, \( P_{\text{sat}} \), were simulated at 84 GHz with a PSS analysis using the harmonic balance engine, to 12.2 and 17.2 dBm, respectively, see figure 9. The maximum PAE is equal to 15.9% in saturation, while it is reduced to 9.0% at the compression point.

The simulated \( P_{\text{sat}} \) and PAE versus input power for the cross-coupled design at 82 GHz is given in figure 10. The design achieves a \( P_{\text{sat}} \) of 14.7 dBm and an \( OCP_{1dB,\text{out}} \) of 9.0 dBm. The PA achieves a peak PAE of 16.3%. At the compression point the PAE is reduced to 9.0%, similar to the cascode design.
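The PAE values quoted above follow the usual definition $\text{PAE} = (P_{\text{out}} - P_{\text{in}})/P_{\text{DC}}$. The sketch below is a minimal illustration of that definition with powers given in dBm; the input power and supply current used in the example call are hypothetical placeholders and are not values from the simulations in this paper.

```python
def dbm_to_watt(p_dbm):
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)

def pae_percent(p_out_dbm, p_in_dbm, v_supply, i_supply):
    """Power added efficiency in percent: (Pout - Pin) / Pdc."""
    p_dc = v_supply * i_supply
    return 100.0 * (dbm_to_watt(p_out_dbm) - dbm_to_watt(p_in_dbm)) / p_dc

# Hypothetical operating point (assumed input power and supply current)
print(f"PAE = {pae_percent(14.7, 5.0, 1.5, 0.1):.1f} %")
```

Since the input power is subtracted, a stage with low gain is penalized more strongly in PAE than in plain collector efficiency, which is one reason the gain of the output stage matters for the efficiency figures.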

The performance of the presented PAs is compared to other published E-band SiGe PAs in table 1.

### TABLE 1. PERFORMANCE COMPARISON OF SiGe PAs

<table>
<thead>
<tr>
<th>Ref</th>
<th>Tech. ( f_T/f_{\text{max}} ) [GHz]</th>
<th>Freq. [GHz]</th>
<th>Arch.</th>
<th>VCC [V]</th>
<th>Gain [dB]</th>
<th>( P_{1\text{dB}} ) [dBm]</th>
<th>( P_{\text{sat}} ) [dBm]</th>
<th>Peak PAE [%]</th>
</tr>
</thead>
<tbody>
<tr>
<td>[4]</td>
<td>230/280</td>
<td>77</td>
<td>2 CAS</td>
<td>2.5</td>
<td>22.5</td>
<td>12</td>
<td>15</td>
<td>7.5</td>
</tr>
<tr>
<td>[5]</td>
<td>200/-</td>
<td>84</td>
<td>CAS + 2CB</td>
<td>2.5</td>
<td>27</td>
<td>16</td>
<td>18</td>
<td>9</td>
</tr>
<tr>
<td>[6]</td>
<td>300/450</td>
<td>84</td>
<td>2 CAS</td>
<td>3.3</td>
<td>28</td>
<td>17</td>
<td>19</td>
<td>8.4</td>
</tr>
<tr>
<td>[7]</td>
<td>200/250</td>
<td>31-39</td>
<td>3 CAS</td>
<td>3.3</td>
<td>23</td>
<td>15.5</td>
<td>17.3</td>
<td>14.8</td>
</tr>
<tr>
<td>[8]</td>
<td>230/300</td>
<td>77</td>
<td>1 CAS + 2 CE</td>
<td>2.5</td>
<td>19</td>
<td>12.0</td>
<td>14.5</td>
<td>15.7</td>
</tr>
<tr>
<td>This work 1</td>
<td>200/250</td>
<td>92 / 84*</td>
<td>2 CAS</td>
<td>2.7</td>
<td>15.7</td>
<td>12.2*</td>
<td>17.2*</td>
<td>15.9*</td>
</tr>
<tr>
<td>This work 2</td>
<td>200/250</td>
<td>93 / 82*</td>
<td>2 CE cross-coupled</td>
<td>1.5</td>
<td>10.1</td>
<td>9.0*</td>
<td>14.7*</td>
<td>16.3*</td>
</tr>
</tbody>
</table>

* Simulated

**IV. CONCLUSIONS**

In the presented work, two different architectures for two-stage E-band PAs have been implemented and compared. Both PAs have a measured maximum gain at a frequency in the 92-95 GHz sub-band of the E-band. The cascode design achieves a measured and simulated maximum \( S_{21} \) of 15.7 and 16.4 dB, respectively. The design using capacitive cross-coupling instead achieves a simulated gain of 14.8 dB compared to a measured gain of 10.1 dB. According to simulations, the capacitive cross-coupling technique enables a high gain and \( P_{\text{sat}} \) even with a supply voltage as low as 1.5 V. However, the technique requires a tunable cross-coupling capacitance to optimize the gain. At E-band frequencies, the cascode topology is less sensitive to process variations and modeling issues, and if the higher supply voltage is acceptable it is a more robust choice.

**ACKNOWLEDGEMENT**

The author would like to thank the Swedish government funding agency Vinnova, the System Design on Silicon (SoS) excellence center, and Infineon Technologies for sponsoring this project.

**REFERENCES**


Paper VI

System simulations of a 1.5 V SiGe 81-86 GHz E-band transmitter

Abstract—This paper presents simulation results for a sliding-IF SiGe E-band transmitter circuit for the 81-86 GHz band. The circuit was designed in a SiGe process with $f_T = 200$ GHz and uses a supply of 1.5 V. The low supply voltage eliminates the need for a dedicated transmitter voltage regulator. The carrier generation is based on a 28 GHz quadrature voltage controlled oscillator (QVCO). Upconversion to 84 GHz is performed by first mixing with the QVCO signals, converting the signal from baseband to 28 GHz, and then mixing it with the 56 GHz QVCO second harmonic, present at the emitter nodes of the QVCO core devices. The second mixer is connected to a three-stage power amplifier utilizing capacitive cross-coupling to increase the gain, providing a saturated output power of +14 dBm with a 1 dB output compression point of +11 dBm. E-band radio links using higher order modulation, e.g. 64 QAM, are sensitive to I/Q phase errors. The presented design is based on a 28 GHz QVCO, the lower frequency reducing the phase error due to mismatch in active and passive devices. The I/Q mismatch can be further reduced by adjusting varactors connected to each QVCO output. The analog performance of the transmitter is based on ADS Momentum models of all inductors and transformers, and layout parasitic extracted views of the active parts. For the simulations with a 16 QAM modulated baseband input signal, however, the Momentum models were replaced with lumped equivalent models to ease simulator convergence. Constellation diagrams and error vector magnitude (EVM) were calculated in MATLAB using data from transient simulations. The EVM dependency on QVCO phase noise, I/Q imbalance and PA compression has been analyzed. For an average output power of 7.5 dBm, the design achieves 7.2% EVM for a 16 QAM signal with 1 GHz bandwidth. The current consumption of the transmitter, including the PA, equals 131 mA from a 1.5 V supply.

Keywords E-band, mm-wave, EVM, transmitter, power amplifier, 16 QAM, SiGe

I. INTRODUCTION

High capacity Gb/s wireless point-to-point communication links can be implemented in the E-band at 71-76 and 81-86 GHz. Optical fiber has previously been preferred for the backhaul networks [1], [2]. However, it is not always possible to deploy an optical fiber due to regulations, installation time and cost [1], [2]. In the upcoming 5G heterogeneous networks the number of base stations will increase, making a wireless backhaul more favorable. In Europe, the 5 GHz spectrum of each sub-band is divided into 250 MHz channels [2], [3] which can be merged if higher data capacity is required. In the United States the bands are instead divided into 1.25 GHz channels [4].

A typical E-band transceiver product consists of several mm-wave ASICs plus external power amplifiers (PAs). In [2], [5] a SiGe E-band transceiver product is presented, demonstrating a 3.18 Gbps radio link using 256-level quadrature amplitude modulation (QAM) [6] in a 500 MHz RF channel bandwidth, with 8 dBm output power at the antenna. The architecture consists of separate receiver and transmitter ASICs, an external phase locked loop (PLL) together with an external PA and low noise amplifier (LNA) in GaAs technology. In industry, there has so far been less focus on the integration level of E-band transceivers. Compared to chipsets for cellular communication, the integration level for E-band transceivers is therefore still low. As the volumes of wireless links will increase with the deployment of the upcoming 5G networks, integration level will be a key driver for product cost reduction. To address this, in this paper, a 1.5 V transmitter for the 81-86 GHz E-band is presented. The transmitter is fully integrated, i.e. it consists of upconversion mixers together with an integrated PA that share a common supply. The upconversion is based on an on-chip 28 GHz QVCO [7]-[9], which creates four LO phases for an I/Q upconversion mixer for the baseband signal. In a second mixing stage, the 56 GHz second harmonic, present at the emitter nodes of the QVCO core devices, upconverts the 28 GHz signal to 84 GHz [7]-[10]. Using a single supply voltage of only 1.5 V for the entire transmitter eliminates the need for a dedicated voltage regulator, since the supply can then be shared between the transmitter and the digital control circuits. The low supply three-stage PA uses capacitive cross-coupling [11]-[15] to increase the power gain and isolation of each stage. Early E-band transmitters used simple modulation schemes such as binary phase-shift keying (BPSK) or on-off keying (OOK) [1]. These modulation schemes do not require a highly linear transmitter but are, on the other hand, less spectrally efficient [16].
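The two-step frequency plan described above can be illustrated with a short calculation. The sketch below is only an illustration: the function names are hypothetical, the baseband offset is ignored, and it assumes that the carrier is simply the sum of the QVCO fundamental and its second harmonic.

```python
# Sketch of the two-step upconversion frequency plan (carrier only).
# Assumption: the first mixer uses the QVCO fundamental and the second mixer
# uses its second harmonic, so the carrier lands at 3 * f_qvco.


def carrier_ghz(f_qvco_ghz: float) -> float:
    """Carrier frequency after both mixing stages."""
    first_lo = f_qvco_ghz          # drives the I/Q (28 GHz) mixer
    second_lo = 2.0 * f_qvco_ghz   # second harmonic, drives the 56 GHz mixer
    return first_lo + second_lo


def qvco_range_ghz(band_lo_ghz: float, band_hi_ghz: float):
    """QVCO tuning range needed to place the carrier anywhere in a band."""
    return band_lo_ghz / 3.0, band_hi_ghz / 3.0


if __name__ == "__main__":
    print(f"carrier for a 28 GHz QVCO: {carrier_ghz(28.0):.1f} GHz")
    lo, hi = qvco_range_ghz(81.0, 86.0)
    print(f"QVCO range for 81-86 GHz: {lo:.2f} GHz to {hi:.2f} GHz")
```

Under these assumptions, covering the 81-86 GHz sub-band corresponds to a QVCO range of roughly 27 GHz to 28.7 GHz.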

In E-band systems of today, M-ary QAM is used to support spectrally efficient transmission with high data rates. For a low bit-error rate (BER), data links using QAM modulation put more stringent requirements on transmitter nonidealities, resulting in tight error-vector-magnitude (EVM) specifications [17]-[22]. In this paper, the effects of local oscillator (LO) phase noise, I/Q imbalance, and PA compression on the simulated EVM, for a 2 GHz 16 QAM signal, are therefore investigated. Transient simulations were performed using parasitic extracted views of the circuit parts and lumped model equivalents of the inductors and transformers. The EVM was calculated by importing the demodulated data into MATLAB. For each modulation scheme, there is a known relationship between EVM and BER [22]. Using the EVM as a metric to evaluate the performance is advantageous, since more time consuming BER calculations can then be avoided at an early stage of the design phase [22]. The transmitter was designed in a 0.18 μm SiGe HBT process, with four Cu metal layers with a top layer thickness of 2.8 μm, and with an f_t of 200 GHz. The process does not have any MOS devices. In this paper the presented transmitter and the EVM simulation setup are first briefly discussed. In section II, the transmitter architecture is described together with a comparison to other transmitter topologies. The design of the different circuit parts is then discussed in section III. In section IV, the design and layout of the inductors and transformers are presented, together with the layout of the complete transmitter, including the power amplifier. The simulation results for non-modulated baseband signals are provided in section V. In section VI, the EVM as a metric to analyze transmitter imperfections is discussed together with the simulation setup. A flow chart for the developed MATLAB program for EVM calculation is described and the simulation results are presented. The conclusions are given in section VII.

II. TRANSMITTER ARCHITECTURE

There are several possible architectures for creating an E-band TX carrier. Direct conversion architectures [23]-[25] are often used, though with different implementations of the LO frequency generation. In [23], digital correction is implemented. In [24], a direct conversion E-band transmitter, using an external LO, was simulated and measured. Polyphase filters were used to create quadrature LO signals. Since a direct conversion architecture is susceptible to process mismatch, the work is focused on tuning methods to suppress LO feed-through and I/Q imbalance, using I/Q phase calibration to improve the EVM. The 84 GHz TX carrier can also be generated using injection locking techniques. In [26], a 90 GHz carrier was generated from a 30 GHz VCO using either injection locked or harmonic-based LO tripler circuits. An 84 GHz quadrature injection locked oscillator (QILO) can be injection locked by a harmonic-based LO tripler circuit.

The conversion gain of the active mixers depends on the amplitude of the signal driving the bases of the current commutating devices. Below a certain signal level, there is a significant reduction of conversion gain. LO buffers are therefore placed before both the 28 and 56 GHz mixers to secure a sufficient LO level. In [8], [9], the 28 GHz QVCO was locked to an external 1.75 GHz reference signal in a PLL. A buffer was then used to isolate the QVCO from the PLL divider. For symmetry reasons, two buffers are therefore connected to the I- and Q-outputs, respectively, of the QVCO. In [8], [9], beam steering was also implemented by DC current injection into the load of a Gilbert type phase detector [31] of the PLL [10].

III. TRANSMITTER CIRCUIT BLOCKS

A. QVCO

Simulation and measurement results for the 28 GHz QVCO, see Fig. 2, both standalone [7], [8] and in a PLL [9] have been previously presented. To minimize the I/Q phase error, phase tuning [7], [8] has been implemented. The QVCO in this paper consists of two cores, Fig. 2a, connected together as in Fig. 2b. The main and injection stages were designed with bias currents of 5.8 mA and 1.0 mA, respectively, i.e. the QVCO total bias current equals 13.6 mA. To improve the layout symmetry there are two main varactors, implemented as reverse-biased pn-junctions using a control voltage $V_{ctrl}$. The QVCO inductors, with a layout shown in Fig. 4, are represented by inductors $L_{VCO}$ in Fig. 2a. The inductors were simulated and modeled using ADS Momentum. Each QVCO core in Fig. 2a contains two phase error tuning blocks biased with control voltages $V_{tune_p}$ and $V_{tune_n}$. With two QVCO cores there are four tuning blocks in total, biased with control voltages from 0 to 7.7V. The I/Q phase error can be minimized by changing these control voltages [7], [8], providing a simulated phase tuning range of 14.5° [7].

Fig. 2. QVCO core schematic and architecture [7]

In this paper, a transformer, as indicated in Fig. 3, is added at the tail of the main stage to extract the 56 GHz second harmonic. The transformer has two center taps. The center tap on the primary side is connected to the collector of the biasing device [10], while the center tap on the secondary side is used for biasing of the 56 GHz LO buffer, shown in Fig. 4b. The QVCO in Fig. 3 is biased with 8.1 mA and 1.0 mA in each main and injection stage, respectively. The main stage current was increased in order to improve the oscillator performance.

Fig. 3. QVCO architecture with 56 GHz output transformer excluding I/Q phase error tuning

In the simulations of large signal linearity and EVM, the QVCO with 56 GHz output schematic was replaced with a Verilog-A model based on the oscillator in the standard Cadence module library rfLib. Using control parameters for the Verilog-A module, the phase noise can be shaped for a slope of either 20 dB/decade or 30 dB/decade. The four phases of the 28 GHz QVCO are created using time delays. The 56 GHz output is generated from a frequency multiplier. By replacing the transistor level QVCO with a Verilog-A model, it is possible to simulate the complete transmitter using the periodic steady-state analysis (PSS) in Cadence SpectreRF. Due to simulator convergence difficulties, this is not possible using PSS oscillator mode for the design in Fig. 3. The EVM calculations are based on a transient analysis, which would have been too time consuming if a device-level representation of the QVCO had been used.

B. LO buffers

Two 28 GHz buffers, with a topology outlined in Fig. 4a, are placed at the I and Q QVCO outputs.

Fig. 4. LO buffers, 28 GHz LO buffer (a) and 56 GHz LO buffer (b)

The 28 GHz buffer was biased with a tail current of 3.4 mA and loaded with resistors $R_2$ of 60 Ω. This results in a differential output signal of 170 mVp at the base terminals of the 28 GHz mixer. Since the buffer is operating at only 28 GHz, the output voltage amplitude is large enough without having to implement an inductor at the output to resonate with the capacitive parasitics. For the 56 GHz LO buffer, see Fig. 4b, the transformer primary side input nodes, $LO_{56_p}$ and $LO_{56_n}$, are connected to the emitters of the main QVCO devices. The main devices of the QVCO are biased through the primary side center tap connection $I_{bias_{main}}$, while the base terminals of the 56 GHz buffer are biased through the secondary side center tap connection $bias_{LO_{56}}$. The 56 GHz buffer output transformer is connected to the bases of the switching devices of the 56 GHz mixer. The buffer is biased with 3.9 mA, giving a differential voltage swing of 150 mVp at the base terminals of the 56 GHz mixer.

C. Mixers

The core topology of the 28 GHz and 56 GHz double balanced active mixers is given in Fig. 5. The mixer topologies are identical, except for the input signal connection to the transconductance stage and the inductor/transformer at the output. The 28 GHz mixer upconverts the baseband signal with the signal from the 28 GHz LO buffers in Fig. 4a. The output nodes, $Out_{28_p}$ and $Out_{28_n}$, of the load inductor are connected to the transconductance stage input of the 56 GHz mixer. Two turns are used to reduce the die area of the load inductor. The inductance is designed to resonate at 28 GHz with the output capacitance of the 28 GHz mixer plus the input capacitance of the 56 GHz mixer. The baseband signals, $BB_p$ and $BB_n$, are AC-coupled to the 28 GHz mixer transconductance stage with capacitors $C_2$. Each transconductance device is biased with 4.7 mA and degenerated with a resistor $R_i$ equal to 5 $\Omega$ to increase the mixer linearity.

The 56 GHz double balanced active mixer has output nodes, $Out_{84_p}$ and $Out_{84_n}$, connected to the input of the PA. The transconductance devices $Q_2$ are biased with 12 mA each. The current commutating devices $Q_1$ are scaled so that the maximum allowed current density, i.e. 6.5 mA/$\mu$m$^2$, is not exceeded when all devices $Q_1$ are turned on simultaneously. For maximum power transfer to the PA, the mixer output transformer, with a layout shown in Fig. 8, is designed to be in resonance with the parasitic output capacitance of mixer devices $Q_1$ plus the input capacitance of the PA. This sets a bias current constraint on the 56 GHz mixer, since a larger bias current, resulting in an increased mixer conversion gain and output compression point (OCP1dB), requires larger devices $Q_1$. Larger devices, however, have a higher output capacitance, which requires the inductance of the mixer output transformer to be reduced to maintain resonance. For the transformer to have a low insertion loss, there is on the other hand a lower limit for its inductance.

**Fig. 5.** 28 and 56 GHz mixer architecture

The selected bias current of 12 mA optimizes the combined performance of the mixer and output transformer. The devices $Q_1$ are switched with a differential LO-signal with an amplitude of 200 mVp. For layout reasons, summation of the signals from the I and Q 28 GHz mixer outputs is made after the AC-coupling capacitors $C_1$, equal to 800 fF each. The OCP1dB of the 56 GHz mixer must be large enough so that it does not limit the compression point of the transmitter. To maintain the compression point across the 81-86 GHz band, the input matching of the PA is designed to be wideband with $S_{11}<-10$ dB between 75 and 95 GHz. A critical design parameter is the connection distance from the secondary side of the 56 GHz LO buffer transformer to the bases of the switching pair in the 56 GHz mixer. In comparison with the inductance of the transformer, the series trace adds significant inductance. The series inductance was therefore minimized by using the design kit maximum allowed trace width of 10 $\mu$m. To increase the mixer linearity, the transconductance devices, $Q_2$, are degenerated with resistors $R_i$ equal to 10 $\Omega$.

**D. Power amplifier**

The power amplifier architecture is shown in Fig. 6. It is a three stage differential design with interstage matching in between, sharing the transmitter supply voltage of 1.5 V. The fewer the stages, the higher the power added efficiency (PAE) [11]-[15], [32], [33] of the PA. For the presented transmitter, the minimum number of stages is limited by the output compression point of the 56 GHz mixer. With a two stage PA [14], [15], the maximum output voltage swing from the 56 GHz mixer is not enough to drive the PA into compression (OCP1dB = -4.0 dBm), and therefore an additional stage is required as a preamplifier.

**Fig. 6.** Three-stage power amplifier architecture

In order to minimize the reduction in PAE, the bias current of the different stages is increased from input to output. In conventional bipolar PA designs, a cascode stage [32], [33] is used to increase the gain and provide isolation between the different stages. In this paper, however, due to the limited supply voltage of 1.5 V, a cascode architecture could not be used. Instead, the high frequency gain, as well as the isolation, was increased using a common emitter stage with capacitive cross-coupling [11]-[15]. The topologies of the first and second stages, depicted in Fig. 7a, are identical except for device sizes and bias currents. The effective emitter area of the devices in each stage is given in Fig. 7b.

**Fig. 7.** Architecture of the PA stage 1 and 2 (a) and device sizes (b)

The third stage differs slightly, as it has capacitors in series with the input terminals, as part of an impedance up-transformation network, see Fig. 8. The transistors have a parasitic base-collector capacitance, $C_{bc}$, that reduces the power gain as well as the isolation between the input and output nodes of the common emitter stages [11]-[15]. The capacitive cross coupling technique reduces the effect of the base-collector capacitance. It has been implemented with diode connected devices $Q_2$, with a capacitance $C_{bc-diode}$ [12], [14], [15]. Capacitive cross coupling could also be implemented in two other ways, either with fixed metal-insulator-metal (MIM) capacitors or with controllable capacitors, implemented as diode junctions in series with MIM capacitors [14]. Due to process spread, the topology with fixed MIM capacitors was not chosen. With diode connected devices there are benefits regarding both process spread and large signal behavior [12], [15]. All devices are of the same type, which reduces the effects of mismatch. At high output power, the voltage swing across the base-collector junctions is high enough to cause significant modulation of the capacitance. Using cross-coupled diode connected devices is then highly beneficial, since the effective capacitance modulation is significantly reduced [12]. The bias currents of the three stages were set to 4.5 mA, 9.8 mA and 16 mA, respectively. Using capacitive cross coupling, there is a clear tradeoff between gain, stability and input matching bandwidth. Using too large a value of the cross coupling capacitance results in a large increase in maximum available gain, $G_{\text{max}}$, but on the other hand, the stability factor, $k$ [11], [14], [15], [34], is then less than unity, i.e. the design is not unconditionally stable [11], [34]. At the same time, the input impedance increases, and the input matching bandwidth decreases, making the design sensitive to process spread. For a robust design, there is thus a limit on how much the gain can be increased in each stage while maintaining a sufficient bandwidth.
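The stability condition mentioned above can be checked numerically from two-port S-parameters. The sketch below assumes the standard Rollett stability factor together with the auxiliary condition on the determinant; the S-parameter values are hypothetical and are not taken from the presented PA.

```python
# Sketch: Rollett stability factor k for a two-port described by S-parameters.
# The numeric S-parameters below are hypothetical illustration values.


def rollett_k(s11: complex, s12: complex, s21: complex, s22: complex):
    """Return (k, |delta|) for the given two-port S-parameters."""
    delta = s11 * s22 - s12 * s21
    k = (1.0 - abs(s11) ** 2 - abs(s22) ** 2 + abs(delta) ** 2) / (
        2.0 * abs(s12 * s21)
    )
    return k, abs(delta)


def unconditionally_stable(s11, s12, s21, s22) -> bool:
    """Unconditional stability requires k > 1 and |delta| < 1."""
    k, delta_mag = rollett_k(s11, s12, s21, s22)
    return k > 1.0 and delta_mag < 1.0


if __name__ == "__main__":
    # Same forward gain, two different amounts of reverse feedback (S12).
    for s12 in (0.01, 0.05):
        k, d = rollett_k(0.3, s12, 10.0, 0.5)
        print(f"S12={s12}: k={k:.3f}, |delta|={d:.3f}, "
              f"stable={unconditionally_stable(0.3, s12, 10.0, 0.5)}")
```

In this hypothetical example, increasing the reverse transmission from 0.01 to 0.05 moves $k$ from above unity to below unity, which mirrors the qualitative tradeoff between feedback and stability discussed in the text.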

Single turn transformers with center tap biasing on both primary and secondary side [35] are used as interstage matching between all stages of the PA. Compared to using inductors, the transformer structures are more convenient for connecting two stages together, since in layout, input and output terminals are opposite to each other. To increase power transfer between two stages, the transformer is designed to be in resonance with the output capacitance of the first stage and the input capacitance of the second stage. Between stage two and three, a combination of transformer resonance and impedance transformation is utilized, as shown in Fig. 8 [15]. The output impedance of stage two is represented by resistor $R_{\text{out}_2}$ and capacitor $C_{\text{out}_2}$.

IV. INDUCTOR/TRANSFORMER DESIGN AND CIRCUIT LAYOUT

A. Electromagnetic simulation

In a mm-wave transmitter design, accurate models of inductors and transformers are needed. In this paper, the ADS Momentum 2.5D electromagnetic simulator has therefore been used to extract S-parameter models for the inductive parts. Since the gain of the active devices in the used technology is limited at 84 GHz, almost all interface nodes between the circuit parts are in resonance to maximize the power transfer. Any modeling error can therefore result in significant loss in signal transfer. For the presented transmitter, intended for higher order QAM modulation, also the matching between different inductive elements is important, since any imbalance can result in impairments of the transmitted signal [17]-[20]. Even small imbalances in capacitive parasitics, in the range of a few femtofarads, can result in I/Q phase errors that cause significant degradation in the BER [7], [8] of the radio link. It is therefore important to design a well-balanced layout of both the IQVCO core and its routing to minimize the error. In case of a residual phase error, this can be minimized using the I/Q phase tuning of the IQVCO in Fig. 2a [7], [8]. The octagonal inductors of the IQVCO together with the routing to the 28 GHz LO buffers, as well as the routing to the PLL divider buffers [8], [9] are shown in Fig. 9. A similar structure has been used in both a PLL and a IQVCO plus I/Q phase error detector circuit [7]-[9]. In the PLL design [8], [9], only the $Div_{Q_p}$ and $Div_{Q_n}$ buffer outputs were used to connect the PLL divider, while outputs $Div_{I_p}$ and $Div_{I_n}$ were left unconnected. Both divider buffers were however active, thereby improving the phase balance of the IQVCO. In the I/Q phase error detector circuit [7], [8], all four outputs were used to connect the detector.

To minimize capacitive losses to the substrate, the inductors are implemented in the top Cu layer. The octagonal inductors are sized with an inner diameter of 50 μm and a trace width of 11 μm. Their differential inductance equals 120 pH with a Q value of 18 at 28 GHz [7], [8]. The transformer used for extraction of the 56 GHz second harmonic has primary side input nodes $In_{56_p}$ and $In_{56_n}$ connected to the emitters of the IQVCO core devices. The secondary side output nodes, $Out_{56_p}$ and $Out_{56_n}$, are connected to the LO buffer in Fig. 4b. The transformer has an inner diameter of 31 μm and a trace width of 7.4 μm. Due to the layout of the active part of the IQVCO and its varactors, series wires with a length of 100 μm are required to connect the 56 GHz transformer.
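As a rough cross-check of the inductor figures above, the capacitance that resonates with a given inductance, and the series resistance implied by a given Q value, can be computed directly. The sketch assumes an ideal parallel LC tank and a simple series R-L inductor model with $Q = \omega L / R_s$; the function names are hypothetical.

```python
import math

# Sketch: ideal LC resonance and series-loss estimate for an on-chip inductor.
# Assumptions: lossless capacitor, series R-L inductor model, Q = w*L/Rs.


def resonating_capacitance(l_henry: float, f_hz: float) -> float:
    """Capacitance that resonates with l_henry at f_hz: C = 1 / (w^2 * L)."""
    w = 2.0 * math.pi * f_hz
    return 1.0 / (w * w * l_henry)


def series_resistance_from_q(l_henry: float, f_hz: float, q: float) -> float:
    """Equivalent series resistance for a given inductor Q: Rs = w * L / Q."""
    return 2.0 * math.pi * f_hz * l_henry / q


if __name__ == "__main__":
    c = resonating_capacitance(120e-12, 28e9)
    rs = series_resistance_from_q(120e-12, 28e9, 18.0)
    print(f"C = {c * 1e15:.1f} fF, Rs = {rs:.2f} ohm")
```

For the 120 pH differential inductance and Q of 18 at 28 GHz, this gives a differential tank capacitance of about 269 fF and an equivalent series resistance of about 1.2 Ω under the stated assumptions.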

In Fig. 10, a Momentum view of the two-turn 28 GHz mixer load inductors, the 56 GHz LO buffer output transformer and the 56 GHz mixer output transformer connected to the PA input is shown. The 28 GHz mixer outputs are connected to the nodes Mix_28_Ip, Mix_28_In, Mix_28_Qp, and Mix_28_Qn, respectively. To save die area and increase the inductance in order to resonate with the output capacitance of the 28 GHz mixer, the load inductors have been implemented with two turns. The inductor has an inner diameter of 53 \( \mu \)m and a trace width of 3.9 \( \mu \)m. The transconductance stage of the 56 GHz mixer, given in Fig. 5, is connected to the output nodes of the 28 GHz mixer load inductors. The interconnect wires from the load inductors to the 56 GHz mixer have been made wide to decrease the series inductance.

The 56 GHz LO buffer, shown in Fig. 4b, has its collectors connected to the nodes In_Buff_56p and In_Buff_56n of its output transformer. The transformer was designed with an inner diameter of 35 \( \mu \)m and a trace width of 6.3 \( \mu \)m. The output nodes on the secondary side, Out_Buff_56p and Out_Buff_56n, are connected to the bases of the 56 GHz mixer current-commutating devices, see Fig. 5. A center tap on the secondary side, node bias_TX_mix_56, is used for mixer base biasing. As can be realized from Fig. 10, the series inductance of the routing from the 56 GHz LO buffer output transformer to the 56 GHz mixer is significant in comparison with the inductance of the transformer itself, resulting in unwanted resonances. The wires have therefore been scaled up to the maximum allowed width of 10 \( \mu \)m. The collectors of the 56 GHz mixer are connected to the nodes TX_mixp and TX_mixn of its output transformer. The primary side is implemented in the top Cu layer, yellow color, due to design kit current density rules. The input to the first stage of the PA is connected to the nodes Out84_p and Out84_n, respectively. A center tap is used for PA input biasing.

For the transformers in a PA, the performance of the output transformer is the most important, since its loss has a strong impact on PA efficiency. To reduce the substrate loss, all transformers were designed in the two top Cu layers with 2.8 \( \mu \)m Cu4 and 1.05 \( \mu \)m Cu3. The output transformer, shown in Fig. 11, has an inner diameter of 24 \( \mu \)m and a trace width of 5.6 \( \mu \)m. The supply voltage of the output stage is connected through a center tap on the primary inductor. For maximum gain, and to minimize the loss, the transformer inductance and the output capacitance of the output stage must be in resonance at the transmit frequency of 84 GHz. For each of the four transformers in the presented PA design, the loss is in the range of 1 dB.

For a certain current handling capability of the output devices, a minimum device size is required, thereby determining the maximum transformer inductance. Further, since the diameter of the transformer in our case is in the same size range as the width of the output device, it is important to model the transformer including the Cu2 connection to the active device. For Momentum modelling, the trace length to the center of the active device has therefore been added to the transformer structure.

B. Transformer/inductor lumped model

S-parameter models of the transformers and inductors were used for simulations using SpectreRF PSS with the harmonic balance option. However, time domain simulations with S-parameter models, such as transient simulations, can give inaccurate results due to convergence difficulties. The reason is that the Momentum simulator creates models in the frequency domain that do not work well in the time domain. Transient simulations were, however, required to evaluate the EVM performance of the transmitter with digitally modulated input signals. Therefore, simplified lumped equivalent models, as shown in Fig. 12, were created for all transformers and inductors of the transmitter.

The model uses five fitting parameters: the primary and secondary side inductance, \( L \), the series resistance of the inductance, \( R_s \), the parasitic capacitance to ground of the primary side, \( C_{in} \), the parasitic capacitance to ground of the secondary side, \( C_{out} \), and the coupling coefficient \( k \). To keep the model simple, it does not include any interwinding capacitance. When applicable, the model has been extended with series inductances due to routing on the primary and secondary side. Even with a limited number of lumped elements as in Fig. 12, it is possible to achieve a good match regarding transformer impedances and insertion loss at the transformer operating frequencies.
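A five-parameter model of this kind can be evaluated with a few lines of code. The sketch below is an assumption about the topology, not a reproduction of Fig. 12: both windings are taken to have the same inductance and series resistance, the parasitic capacitances are placed across the primary and secondary ports, and the element values in the example are hypothetical.

```python
import math

# Sketch of a five-parameter lumped transformer model (L, Rs, C_in, C_out, k).
# Assumed topology: two coupled series R-L windings with a shunt capacitor
# across each port. All numeric values used below are hypothetical.


def parallel(z1: complex, z2: complex) -> complex:
    """Impedance of two elements in parallel."""
    return z1 * z2 / (z1 + z2)


def transformer_input_impedance(f_hz, l, rs, c_in, c_out, k, z_load):
    """Impedance seen at the primary port with z_load on the secondary."""
    w = 2.0 * math.pi * f_hz
    z_self = rs + 1j * w * l            # each winding: series R-L
    z_mutual = 1j * w * k * l           # mutual impedance, M = k * L
    z_secondary = parallel(z_load, 1.0 / (1j * w * c_out))
    # Standard two-port result: Zin = Z11 - Z12*Z21 / (Z22 + ZL)
    z_core = z_self - z_mutual ** 2 / (z_self + z_secondary)
    return parallel(z_core, 1.0 / (1j * w * c_in))


if __name__ == "__main__":
    zin = transformer_input_impedance(
        f_hz=84e9, l=100e-12, rs=1.0, c_in=20e-15, c_out=20e-15,
        k=0.7, z_load=100.0,
    )
    print(f"Zin = {zin.real:.1f} {zin.imag:+.1f}j ohm")
```

Fitting then amounts to adjusting the five parameters until the model impedances and insertion loss agree with the electromagnetic simulation at the operating frequency.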

C. Transmitter layout

The layout of the transmitter is shown in Fig. 13. The total size of the design equals 890 μm x 450 μm, of which more than 30% can be used for additional on-chip decoupling. In total there are 10 inductors and transformers, which dominate the occupied die area. The supply voltage is connected to the top metal layer (yellow color), while the ground is connected to the bottom layer (blue color). High Q metal-oxide-metal (MOM) decoupling capacitors between supply and ground have been created by connecting the two top metal layers to the supply and the two bottom metal layers to ground.

The QVCO is located to the left, followed by the transformer to extract the 56 GHz LO signal. The design has been made symmetrical, i.e. the two 28 GHz mixers are located above and below the 56 GHz LO buffer, while the 56 GHz mixer is placed in the center. A symmetrical layout is highly important, since differences in wire length can otherwise result in impairments of the transmitted signal. Due to the size of the transformers, the distance is quite long between the QVCO core and the 56 GHz mixer, thereby requiring LO signal buffering. The baseband signals are connected to the top and bottom side of the I and Q 28 GHz mixers, respectively. The three-stage PA is located to the right of the 56 GHz mixer.

V. TRANSMITTER AND BUILDING BLOCKS CIRCUIT SIMULATION

A. Power amplifier simulation results

The performance of the three stage PA, including the transformer between the 56 GHz mixer and the PA, was simulated with the output loaded with a pad in the top metal layer and a 50 Ω resistor. This emulates a measurement setup where the PA output is connected to ground-signal-ground (GSG) pads. For optimum input matching, the input terminals, \(TX_o\) and \(TX_s\), were driven by a 95 Ω differential port in parallel with a 35 fF capacitor. The small signal S-parameters are given in Fig. 18. At 84 GHz, the gain \(S_{21}\) equals 20.6 dB with a 3 dB bandwidth of 7.2 GHz. The output matching, given by \(S_{22}\), equals -5.7 dB at 84 GHz. A wide input match is important for the ability of the 56 GHz mixer to drive the PA into compression. The first stage of the PA has thus been designed so that \(S_{11} < -10\) dB for a frequency range of 16 GHz.

The large signal simulation results are given in Fig. 15, showing output power and power added efficiency (PAE) versus input power. The PA achieves a saturated output power, \(P_{sat-PA}\), of 14.4 dBm, while the 1 dB output compression point, \(OCP_{PA-1dB}\), is 11.1 dBm. When transmitting with M-QAM modulated input signals a high compression point is advantageous, since low EVM is required. The peak PAE equals 17.4 %, while it is 11.7 % at \(OCP_{PA-1dB}\).
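For readers who want to relate the dBm figures to linear power, the conversion and the usual PAE definition, $PAE = (P_{out} - P_{in})/P_{DC}$, can be sketched as follows. The input power and DC power in the example are hypothetical placeholders and are not values from the presented PA.

```python
# Sketch: dBm to mW conversion and the standard PAE definition.
# The input power and DC power used in the example are hypothetical.


def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def pae_percent(p_out_dbm: float, p_in_dbm: float, p_dc_mw: float) -> float:
    """Power added efficiency in percent: 100 * (Pout - Pin) / Pdc."""
    return 100.0 * (dbm_to_mw(p_out_dbm) - dbm_to_mw(p_in_dbm)) / p_dc_mw


if __name__ == "__main__":
    print(f"14.4 dBm = {dbm_to_mw(14.4):.2f} mW")
    print(f"11.1 dBm = {dbm_to_mw(11.1):.2f} mW")
    # Hypothetical operating point, for illustration only.
    print(f"PAE = {pae_percent(10.0, 0.0, 90.0):.1f} %")
```

A saturated output power of 14.4 dBm corresponds to about 27.5 mW, and the 11.1 dBm compression point to about 12.9 mW.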

B. Transmitter simulation results with non-modulated signals

The periodic steady-state (PSS) harmonic balance analysis in SpectreRF was used to simulate the transmitter performance with non-modulated signals. Due to simulator convergence difficulties, it was not possible to use a device level representation of the QVCO. In these simulations, the QVCO was therefore replaced with ideal time-delayed sinusoidal voltage sources. With a baseband frequency, \(f_{BB}\), at 1 GHz, the frequency at the PA output, \(f_{TX}\), is at 84 GHz for a QVCO frequency, \(f_{QVCO}\), of 27.7 GHz. In the PSS analysis both the LO signals and the baseband signals are defined as large signals. The I and Q baseband signals were phase shifted by 90°. The inductors and transformers of the upconverter and PA were represented with S-parameter models. The output power at 83 GHz versus baseband differential peak voltage, \(V_{p-BB}\), is shown in Fig. 16. In order to decrease the simulation time and the number of harmonics, and to achieve 83 GHz transmit frequency, the baseband frequency was set to 2 GHz. Using the harmonic balance simulator, the QVCO frequency must also be at a harmonic of the PSS fundamental frequency. The third order distortion was simulated with a PAC-signal at 2.1 GHz.
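The link between the choice of tone frequencies and the number of harmonics can be made concrete with a small calculation. The sketch assumes an upper-sideband plan, $f_{TX} = 3 f_{QVCO} + f_{BB}$, and takes the PSS fundamental as the greatest common divisor of the large-signal tones; the 27 GHz QVCO setting for the 83 GHz case is inferred from that plan rather than stated in the text, and the second tone pair is hypothetical.

```python
from math import gcd

# Sketch: PSS fundamental and harmonic index for a set of large-signal tones.
# Assumption: upper-sideband frequency plan, f_TX = 3 * f_QVCO + f_BB.
# Frequencies are integers in MHz so that gcd() can be applied directly.


def tx_frequency_mhz(f_qvco_mhz: int, f_bb_mhz: int) -> int:
    """Transmit frequency for the assumed upper-sideband plan."""
    return 3 * f_qvco_mhz + f_bb_mhz


def pss_fundamental_mhz(*tones_mhz: int) -> int:
    """Largest frequency of which every tone is an integer harmonic."""
    fundamental = 0
    for tone in tones_mhz:
        fundamental = gcd(fundamental, tone)
    return fundamental


def tx_harmonic_index(f_qvco_mhz: int, f_bb_mhz: int) -> int:
    """Harmonic number of the PSS fundamental at which f_TX appears."""
    return tx_frequency_mhz(f_qvco_mhz, f_bb_mhz) // pss_fundamental_mhz(
        f_qvco_mhz, f_bb_mhz
    )


if __name__ == "__main__":
    # Inferred setting for the 83 GHz case, then a hypothetical alternative.
    for f_qvco, f_bb in ((27000, 2000), (27500, 1500)):
        print(f_qvco, f_bb, tx_frequency_mhz(f_qvco, f_bb),
              pss_fundamental_mhz(f_qvco, f_bb),
              tx_harmonic_index(f_qvco, f_bb))
```

With 27 GHz and 2 GHz tones the fundamental is 1 GHz and the transmit frequency is the 83rd harmonic, whereas the hypothetical 27.5 GHz and 1.5 GHz pair would halve the fundamental and roughly double the number of harmonics to be tracked.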

As seen from Fig. 16, the transmitter can deliver a saturated output power, $P_{\text{sat-TX}}$, of 14.4 dBm with an output compression point $OCPTX_{1\text{dB}}$ equal to 11 dBm. In Fig. 17, $P_{\text{sat-TX}}$ and $OCPTX_{1\text{dB}}$ are simulated versus transmit frequency, with $f_{BB}$ equal to either 1 GHz or 2 GHz depending on the desired transmit frequency. Within the 81-86 GHz band, $P_{\text{sat-TX}}$ varies between 14.2 and 14.5 dBm, while $OCPTX_{1\text{dB}}$ varies between 11.0 and 13.4 dBm.

**VI. SYSTEM SIMULATION**

**A. Error Vector Magnitude (EVM) in transmitters**

For a high data-rate wireless link using QAM modulation, the transmitter linearity, LO phase noise and I/Q phase imbalance are highly important [16], [17]-[20], since they impact the achievable BER. During the design of a mm-wave transmitter it is therefore of great value if the expected BER due to transmitter imperfections can be estimated. However, both measuring and simulating the BER is significantly more complicated than using the EVM, from which the BER can then be estimated. In the presented work, the EVM is calculated based on demodulated I and Q signals from transient simulations. In (1) [22], [36] the bit error rate, $P_b$, is related to the signal to noise ratio, $E_s/N_0$, with $Q$ being the Gaussian co-error function [36]. $L$ is equal to the number of levels in each dimension in the constellation diagram and $M$ is the order of the quadrature amplitude modulation, i.e. for 16 QAM, $M$ is equal to 16 and $L$ is equal to 4.

$$P_b = \frac{2\left(1-L^{-1}\right)}{\log_2 L}\, Q\left[ \sqrt{ \frac{3 \log_2 L}{L^2-1}\, \frac{2E_s}{N_0 \log_2 M} } \right]$$

For large symbol streams, the EVM is related to the signal to noise ratio as given by (2)

$$EVM_{\text{RMS}} = \left[ \frac{1}{ \text{SNR}} \right]^{\frac{1}{2}} = \left[ \frac{P_d}{P_s} \right]^{\frac{1}{2}}$$

Using (1) and (2), the bit error rate, $P_b$, is directly related to the $EVM_{\text{RMS}}$ as in (3) [25],

$$P_b = \frac{2\left(1-L^{-1}\right)}{\log_2 L}\, Q\left[ \sqrt{ \frac{3 \log_2 L}{L^2-1}\, \frac{2}{EVM_{\text{RMS}}^2 \log_2 M} } \right]$$
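The relation between EVM and BER can be evaluated numerically. The sketch below assumes the commonly used approximation for square M-QAM, with $E_s/N_0$ replaced by $1/EVM_{\text{RMS}}^2$ and the Q function expressed through the complementary error function; the function names are hypothetical.

```python
import math

# Sketch: BER estimate from RMS EVM for square M-QAM.
# Assumptions: SNR = 1 / EVM^2 and the standard square M-QAM approximation.


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ber_from_evm(evm_rms: float, m: int) -> float:
    """Approximate bit error rate for square M-QAM at a given linear EVM."""
    levels = math.isqrt(m)              # L: levels per dimension
    if levels * levels != m:
        raise ValueError("m must be a square constellation size, e.g. 16 or 64")
    prefactor = 2.0 * (1.0 - 1.0 / levels) / math.log2(levels)
    argument = (3.0 * math.log2(levels) / (levels ** 2 - 1)) * (
        2.0 / (evm_rms ** 2 * math.log2(m))
    )
    return prefactor * q_function(math.sqrt(argument))


def evm_db(evm_rms: float) -> float:
    """Linear RMS EVM expressed in dB."""
    return 20.0 * math.log10(evm_rms)


if __name__ == "__main__":
    for evm in (0.10, 0.18):
        print(f"EVM {evm:.0%} ({evm_db(evm):.1f} dB): "
              f"BER ~ {ber_from_evm(evm, 16):.1e}")
```

For 16 QAM and 10 % EVM this approximation gives a BER of roughly $3 \times 10^{-6}$, and the BER rises by about three orders of magnitude when the EVM increases to 18 %.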

In the constellation diagram for 16 QAM, 4 bits are mapped to each transmitted symbol as shown in Fig. 18a.

The definition of the Error Vector Magnitude (EVM) is illustrated in Fig. 18b [17]-[20], showing the difference between an ideal transmitted constellation point and the actual transmitted point. The rms error vector magnitude, $EVM_{\text{RMS}}$, is defined in (4) [21], as the normalized root mean square value of the errors of a large number ($N$) of constellation points.

$$EVM_{\text{RMS}} = \left[ \frac{1}{N} \sum_{t=1}^{N} \left| S_{\text{ideal}} - S_{\text{meas}} \right|^2 \right]^{0.5}$$

$$EVM_{\text{dB,RMS}} = 20 \log_{10} EVM_{\text{RMS}}$$

In [22], the bit error rate is plotted versus EVM, using (3), for different modulation schemes. The results correspond well with Monte Carlo simulations. As expected, higher modulation order results in more stringent EVM requirements for a certain bit error rate. For 16 QAM, a BER of $3 \times 10^{-6}$ is achieved for -20 dB $EVM_{\text{RMS}}$. In linear scale, this corresponds to 10 % $EVM_{\text{RMS}}$. With increasing $EVM_{\text{RMS}}$, the BER quickly degrades, e.g. 18 % $EVM_{\text{RMS}}$ gives a BER of $3 \times 10^{-3}$. Early E-band links typically used simple modulation schemes, like BPSK. According to
[...] predistortion are easier to implement if the transmitter itself has a low EVM. Another source of imperfection, though not investigated in this work, is carrier leakage in the 28 GHz I/Q mixer [22], resulting in a DC-shift of the constellation diagram. Carrier leakage minimization typically requires digital tuning to counteract circuit imbalances.

The EVM has been simulated for a 1 GHz 16 QAM signal. Four random data streams, generated in a module from the Cadence library ahdlLib, at a data rate equal to 1 GHz, were supplied to a Verilog-A 16 QAM generator creating I and Q signals, as outlined in Fig. 19. The data is first filtered in an RRC filter [37]. The purpose of the RRC filter is to reduce out-of-channel spectral emissions with a limited impact on the peak-to-average power ratio (PAPR) [37], [38] and intersymbol interference (ISI) [38]. In this work an RRC filter implemented in Verilog-A with a roll-off factor equal to 0.22, from the Cadence library rfLib, has been used. Before upconversion, the digitally modulated baseband signals need to be further low pass filtered to avoid excess leakage of signal images into neighboring channels. This is accomplished by a Verilog-A second order Butterworth low pass filter from the rfLib. After single-ended to differential signal conversion, the modulated signal is supplied to the transmitter baseband input.

At the detector side, the PA output signal is supplied to a Verilog-A I/Q demodulator from rfLib, clocked at a frequency equal to 84 GHz. After filtering in an analog low pass filter, followed by the RRC filter, the I and Q outputs are sampled at a rate equal to fBB. A flow chart for the EVM calculation is given in Fig. 20. The simulated I and Q data, Data_Iout and Data_Qout in Fig. 20, cannot be directly compared with the transmitted data, Data_Isent and Data_Qsent, since it is rotated in phase and has an amplitude that depends on the transmitter gain. Therefore, the I/Q data must first be normalized.

An initial gain calculation is based on the amplitude ratio, Gnorm, between the four transmitted and detected inner points of the constellation diagram. Only the four inner points are used for the initial gain calculation, since compared to the 12 outer points, these are less affected by transmitter compression. The detected data is first normalized with Gnorm. The EVM, calculated using (4), has a minimum for a certain fine tuning gain, Gfine, which is found in a gain sweep. The gain optimization is repeated for each phase position controlled by the phase sweep. The gain and phase settings resulting in minimum EVM are then used in the final calculation.
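The normalization procedure described above can be sketched as follows. This is only an illustration of the nested gain and phase sweep, not the script used for the simulations in this work. The function name, the sweep grids and the test signal (an ideal 16 QAM grid scaled by 2.5 and rotated by 30°) are assumptions made for the example.

```python
import cmath
import math


def evm_rms(ideal, meas):
    """RMS error vector magnitude of two equal-length lists of complex points."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(ideal, meas)) / len(ideal))


def normalize_and_sweep(sent, detected, fine_gains, phases_deg):
    """Coarse gain from the inner points, then a gain sweep nested in a phase sweep."""
    # Inner points: the transmitted symbols with the smallest magnitude.
    min_mag = min(abs(s) for s in sent)
    inner = [k for k, s in enumerate(sent) if abs(abs(s) - min_mag) < 1e-9]
    g_norm = sum(abs(detected[k]) for k in inner) / sum(abs(sent[k]) for k in inner)
    coarse = [d / g_norm for d in detected]

    best = None  # (EVM, fine gain, phase in degrees)
    for phase in phases_deg:
        rot = cmath.exp(-1j * math.radians(phase))
        for g_fine in fine_gains:
            evm = evm_rms(sent, [c * g_fine * rot for c in coarse])
            if best is None or evm < best[0]:
                best = (evm, g_fine, phase)
    return g_norm, best


# Hypothetical test data: ideal grid, gain 2.5 and 30 degrees rotation.
levels = (-3, -1, 1, 3)
sent = [complex(i, q) for i in levels for q in levels]
detected = [2.5 * cmath.exp(1j * math.radians(30)) * s for s in sent]

fine_gains = [0.9 + 0.01 * k for k in range(21)]
g_norm, (evm_min, g_fine, phase) = normalize_and_sweep(sent, detected, fine_gains, range(360))
print(g_norm, g_fine, phase, evm_min)
```

For this undistorted test signal the sweep recovers the applied gain and rotation and the residual EVM is practically zero. With a compressed transmitter output, the fine gain would settle at a value different from one.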

The EVM of the complete transmitter has been simulated with a 16 QAM 1 GHz signal. The EVM dependency on average PA output power, $P_{\text{out-RMS}}$, QVCO phase noise, and QVCO I/Q phase imbalance has been investigated. In Fig. 21, the EVM dependency on $P_{\text{out-RMS}}$ is shown for a setup with no I/Q phase error and with noiseless devices. At low enough output power, the transmitter is linear and thus does not distort the constellation diagram. For average output powers above 0 dBm, the EVM increases rapidly though. With a PAR of 2.55 dB for 16 QAM, the amplitudes of the 4 outer symbols of the constellation diagram will be close to the $OCP_{\text{1dB}}$ for an output power of 7.5 dBm, giving 6.4% EVM. For $P_{\text{out-RMS}}$ backed off to 5 dBm the EVM is improved to 3.5%. The EVM is dominated by the compression of the PA.
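The quoted PAR of 2.55 dB follows directly from the geometry of a square 16 QAM constellation, as the short calculation below shows. It is a sketch that considers the constellation points only; the additional envelope overshoot caused by the RRC pulse shaping is not included.

```python
import math

# Square 16 QAM grid with I and Q levels -3, -1, 1, 3 (arbitrary units).
levels = (-3, -1, 1, 3)
powers = [i * i + q * q for i in levels for q in levels]

avg_power = sum(powers) / len(powers)  # 10
peak_power = max(powers)               # 18, the four corner symbols
par_db = 10.0 * math.log10(peak_power / avg_power)

# Corner symbol power for the average output power used in the text.
p_out_rms_dbm = 7.5
p_peak_dbm = p_out_rms_dbm + par_db

print(f"PAR = {par_db:.2f} dB, corner symbol power = {p_peak_dbm:.2f} dBm")
```

At 7.5 dBm average power the corner symbols thus reach approximately 10 dBm.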

The simulated relationship between EVM and QVCO phase noise level at 1 MHz offset is shown in Fig. 22. The simulation was performed for a small PA output signal and without I/Q phase error. As can be seen, the EVM is equal to 1.8% even for a phase noise level of -120 dBc/Hz. This is because all noise sources in the devices, not only those in the QVCO, are turned on during the transient noise analysis. At a measured phase noise level of -100 dBc/Hz [8], the EVM equals 3.6%, i.e. the QVCO phase noise adds 1.8% to the EVM. Further decreasing the phase noise would increase the QVCO power consumption significantly.

Inside the PLL bandwidth the phase noise will be reduced, thereby improving the EVM performance. The performed simulation thus provides a safe estimate of allowed QVCO phase noise. In [9], the measured PLL phase noise at 1 MHz offset equals -107 dBc/Hz. In Fig. 23, the EVM is simulated versus QVCO I/Q phase error. The simulations were performed for a low PA output level and with the noise sources turned off. As can be seen, even an I/Q phase error as small as 3° results in an EVM of 2.8%, thereby stressing the need for I/Q phase error calibration. One way of implementing the I/Q phase calibration is shown in Fig. 2a [7], [8].

The effects on the 16 QAM constellation diagram of the three separately investigated transmitter impairments, together with the combined effects, are shown in Fig. 24. The effect of compression is shown for $P_{\text{out-RMS}}$ equal to 7.5 dBm, see Fig. 24a. The effect of a QVCO phase noise level of -90 dBc/Hz is shown in Fig. 24b, while the effect of an I/Q phase error equal to 6° is given in Fig. 24c. The combined effect of all three impairments, with $P_{\text{out-RMS}}$ equal to 7.5 dBm, $PN$ at 1 MHz offset equal to -100 dBc/Hz, and an I/Q phase error equal to 1°, is shown in Fig. 24d, giving an EVM of 7.2%. An I/Q phase error of 1° is the accuracy that can be achieved using the phase error detector and tuner presented in [7] and [8]. In this case the EVM is dominated by the compression effect, giving 6.4% EVM, see Fig. 21.
As can be seen in Fig. 24a, too high an output power causes gain compression of the outer points in the constellation diagram. The QVCO phase noise causes a rotational random shift of the constellation points, while the I/Q phase error results in a skewed constellation. The simulated transmitter performance is summarized in Table 1.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
<th>Unit</th>
<th>Note</th>
</tr>
</thead>
<tbody>
<tr>
<td>Supply voltage</td>
<td>1.5</td>
<td>[V]</td>
<td>Common supply for complete TX including PA</td>
</tr>
<tr>
<td>P_{DC} @ complete TX</td>
<td>196</td>
<td>[mW]</td>
<td>PA output power well below OCP_{sat}</td>
</tr>
<tr>
<td>P_{DC} @ PA only</td>
<td>87</td>
<td>[mW]</td>
<td>PA output power well below OCP_{sat}</td>
</tr>
<tr>
<td>QVCO PN @ 1MHz offset</td>
<td>-100</td>
<td>dBc/Hz</td>
<td>Measured value [8]</td>
</tr>
<tr>
<td>QVCO frequency range</td>
<td>26.0-31.0</td>
<td>GHz</td>
<td>Measured values [8]</td>
</tr>
<tr>
<td>PAE @ PA only</td>
<td>11.7</td>
<td>%</td>
<td>PAE at OCP_{sat}</td>
</tr>
<tr>
<td>P_{sat} @ complete TX</td>
<td>14.2</td>
<td>dBm</td>
<td>MIN</td>
</tr>
<tr>
<td>OCP_{1dB} @ complete TX</td>
<td>11</td>
<td>dBm</td>
<td>MIN</td>
</tr>
<tr>
<td>EVM for P_{out-RMS} = 7.5 dBm</td>
<td>7.2</td>
<td>%</td>
<td>1 GHz 16 QAM, QVCO PN = -100 dBc/Hz @ 1 MHz offset, I/Q phase error = 1°, including noise effects</td>
</tr>
</tbody>
</table>

VII. CONCLUSIONS
This paper presents system simulation results for an 81-86 GHz E-band transmitter based on a 28 GHz QVCO. Up conversion to the 84 GHz carrier frequency is performed by mixing with the 56 GHz differential second harmonic present at the emitters of the QVCO core cross coupled transistors. Basing the transmitter on a 28 GHz QVCO instead of an 84 GHz QVCO is advantageous, since the I/Q phase error will be significantly reduced for a lower frequency QVCO due to less impact of mismatch in capacitive and inductive parasitics. System simulations investigating the effects of compression, phase noise and I/Q phase imbalance have demonstrated the transmitter performance for a 1 GHz 16 QAM signal. The impact of phase noise on the EVM has been investigated by using a Verilog-A equivalent model of the QVCO with controllable phase noise. The QVCO has a nominal phase noise of -100 dBc/Hz at 1 MHz offset. For the system simulations, lumped equivalent models were used for all transformers and inductors in the design. For the circuit simulations with continuous baseband signals, S-parameter transformer models were instead utilized. In the E-band at 81-86 GHz, the saturated PA output power exceeds 14.4 dBm with an output compression point of 11 dBm. With 16 QAM modulation the transmitter achieves an $EVM_{\text{RMS}}$ of 7.2% at an average output power of 7.5 dBm. This corresponds to a BER of less than 1e-6 [22]. Transmission with an increased EVM is possible, but reduces the user throughput due to an increased number of bits used for coding. Below the $OCP_{\text{sat}}$, the complete transmitter including the PA consumes 131 mA from a common 1.5 V supply.

ACKNOWLEDGMENT
The author would like to thank the Swedish funding agency Vinnova, the System Design on Silicon (SoS) excellence center and Infineon Technologies for sponsoring this project.

REFERENCES


[38] B. Chatelain and F. Gagnon, “Peak-to-average power ratio and intersymbol interference reduction by Nyquist pulse optimization,” *IEEE 60th Vehicular Technology Conference, VTC2004-Fall*, pp. 954-958, 2004.
Paper VII

Single-Ended Low Noise Multiband LNA with Programmable Integrated Matching and High Isolation Switches

Tobias Tired, ST-Ericsson, Pietro Andreani, Lund University

Abstract—This paper describes a novel 90nm single-ended multiband input LNA preceded by RF input switches connected to an on-chip balun intended to drive a differential mixer. The architecture achieves a low noise figure of 1.8dB. The advantage with the proposed architecture is that it is fully single-ended with on-chip programmable narrow-band matching eliminating the need of off-chip components. Especially in multiband integrated radios a single-ended LNA is highly desirable since the pin-count for the LNAs is reduced by half compared with a differential architecture. The PCB routing of the RF input signal is simplified. Narrow-band matching is advantageous compared to common broadband matching since this adds attenuation of out of band interferers and suppresses conversion of 3rd LO harmonic. This is important for the coexistence of cellular systems with e.g. WLAN 802.11a operating in the 5GHz band.

I. INTRODUCTION

In a common multiband receiver as depicted in figure 1 there is one separate LNA for each frequency band plus one duplexer connected to each RF input required to attenuate the TX signal that leaks into the LNA. The duplexer typically provides some 50-55dB isolation from TX to RX. The TX leakage into the active LNA is also affected by the leakage through the non-active duplexers and LNAs. The receiver is degraded by the TX-leakage through intermodulation generated by second- and third order distortion.

The presented architecture depicted in figure 2 is based on only one single ended LNA with programmable on-chip input matching preceded by on-chip RF input switches. A tunable on-chip balun between the LNA and mixer creates a differential RF signal for the mixer. The differential mixer is advantageous regarding second order distortion. The presented input RF switches provide a combination of low on-resistance together with high isolation for TX-leakage while turned off. The multiband LNA is designed for WCDMA band I, II and III plus DCS and PCS EGSM.

II. TX-LEAKAGE RECEIVER DEGRADATION IN MULTIBAND WCDMA RECEIVERS

A. Second and third order distortion due to TX-leakage

The TX-signal is a digitally modulated interferer containing AM- and FM modulation. The AM-modulation can be represented by a two-tone interferer. Two interferers at \( f_1 \) and \( f_2 \) inserted into an LNA and mixer with a second order nonlinearity will generate an intermodulation product at their difference frequency \( f_1-f_2 \) [1] in the receive band. Second order distortion is generated through RF self-mixing of the AM-modulated TX-leakage in the mixer [2], second order nonlinearity in the mixer transconductance stage [3] and cross modulation [4], [5]. The third order nonlinearity of the LNA and mixer will create an intermodulation product at the RX frequency originating from the TX-leakage into the LNA with power \( P_2 \) and an interferer at half the duplex distance with power \( P_1 \). The following applies [5] for the third order intermodulation product \( P_{\text{IM3}} \) at the LNA input:

\[
P_{\text{IM3}}(dBm) = 2P_1(dBm) + P_2(dBm) - 2\,IIP_3(dBm)  \tag{1}
\]
B. TX-leakage paths in multiband LNAs

For certain WCDMA frequency bands, e.g. band I there is an overlap between the RX and TX frequencies resulting in hard requirements on the isolation between the different LNA input ports.

- Band I RX: 2110-2170MHz, TX: 1920-1980MHz
- Band II RX: 1930-1990MHz, TX: 1850-1910MHz

When the receiver in figure 1 is configured for band I, LNA 1 is active and LNA 2 is turned off. At maximum output power the duplexer 1 input power is +26dBm. There is 0dB attenuation in duplexer 1 from the TX input to the antenna switch (ASW) input. The isolation in the antenna switch between duplexer 1 and 2 is rather poor, i.e. 20dB. Since the band I TX frequency overlaps the band II RX frequency there is no attenuation in duplexer 2 for this TX interferer. The TX power at the turned off band II LNA input is therefore equal to +6dBm. Assuming 52dB isolation in the duplexer, the TX power at the band I LNA input through duplexer 1 equals -26dBm. Additional TX power reaches the band I LNA through the finite isolation between the LNA2 and LNA1 inputs. In order for this leakage to be 10dB less than the duplexer leakage the LNA input isolation must exceed 42dB.
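The leakage budget in the paragraph above can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names are chosen for this example.

```python
# All powers in dBm, all attenuations and isolations in dB (values from the text).
tx_power_dbm = 26.0            # TX power at the duplexer 1 input
asw_isolation_db = 20.0        # antenna switch isolation between duplexer 1 and 2
duplexer_isolation_db = 52.0   # TX to RX isolation of duplexer 1
margin_db = 10.0               # switch leakage should be this far below duplexer leakage

# TX power at the input of the turned-off band II LNA.
p_off_lna_dbm = tx_power_dbm - asw_isolation_db

# TX power reaching the active band I LNA through duplexer 1.
p_duplexer_leak_dbm = tx_power_dbm - duplexer_isolation_db

# Isolation needed between the LNA2 and LNA1 inputs.
required_isolation_db = p_off_lna_dbm - (p_duplexer_leak_dbm - margin_db)

print(p_off_lna_dbm, p_duplexer_leak_dbm, required_isolation_db)
```

The result is +6dBm at the turned-off input, -26dBm through the duplexer, and a required input isolation of 42dB.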

Narrow band input matching together with a balun is advantageous for the issue of coexistence of cellular systems and WLAN. The upper part of the WLAN 802.11a band is at 5.8GHz [6]. The isolation [7] between the WLAN and cellular antenna is rather low, i.e. 15dB. The cellular duplexer does not attenuate the 5.8GHz interferer more than 50dB. At maximum WLAN output power, i.e. +20dBm the cellular receiver sees an interferer of -45dBm at 5.8GHz. If a wideband LNA is used and the RX mixer is driven by a square wave LO the only selectivity available is the -9.5dB from the third harmonic down conversion [8]. Down converted the WLAN interferer corresponds to a -54.5dBm in band II and PCS @1933MHz. Narrow band input matching together with a balun as in the presented multiband LNA is capable of adding additional selectivity to attenuate the 5.8GHz interferer.
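The WLAN interferer levels quoted above can be checked numerically. The sketch assumes an ideal square wave LO, whose third harmonic has one third of the fundamental amplitude; the variable names are chosen for this example.

```python
import math

# Values from the text.
wlan_tx_dbm = 20.0             # maximum WLAN output power
antenna_isolation_db = 15.0    # WLAN to cellular antenna isolation
duplexer_attenuation_db = 50.0 # attenuation of the 5.8 GHz interferer in the duplexer
f_wlan_mhz = 5800.0

# Interferer level at the cellular receiver input.
p_rx_dbm = wlan_tx_dbm - antenna_isolation_db - duplexer_attenuation_db

# Relative conversion gain of the third LO harmonic for an ideal square wave.
third_harmonic_db = 20.0 * math.log10(1.0 / 3.0)

# Equivalent in-band interferer and the LO frequency for which 3*f_LO hits 5.8 GHz.
p_downconverted_dbm = p_rx_dbm + third_harmonic_db
f_lo_mhz = f_wlan_mhz / 3.0

print(f"{p_rx_dbm:.1f} dBm, {third_harmonic_db:.1f} dB, "
      f"{p_downconverted_dbm:.1f} dBm, {f_lo_mhz:.0f} MHz")
```

This reproduces the -45dBm interferer, the -9.5dB harmonic conversion and the resulting -54.5dBm equivalent interferer at approximately 1933MHz.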

III. DETAILED DESCRIPTION OF THE ARCHITECTURE

A. Packaging technology and grounding

The design is intended to be used with a WLP package [9] which is a package type with the die flipped upside down. With this package the smallest inductance from a die ground pad to the PCB ground is approximately 200pH. Each RF input switch depicted in figures 2 and 4 is associated with a dedicated ground connection gnd_switch_n where n=1...5. The LNA has a dedicated separate ground gnd_chip as illustrated in figure 3. The gnd_chip ground is connected to the PCB ground through four parallel inductance traces.

B. Input impedance of the inductively degenerated LNA

The inductively source degenerated MOS LNA has the advantage that it creates a real part of the input impedance without adding a resistor [10].
ground to the left of the on-chip series inductor \( L_g \), the expression for the input matching becomes much more complex. The resulting capacitance is denoted \( C_p \) and is the sum of all capacitance to signal ground at this node, i.e. parasitic capacitance from ESD-diodes, RF input pad, RF-input switches plus the parallel band dependent tuning capacitor to ground \( C_{p,\text{ext}} \). \( C_{\text{switch, on}} \) is the parasitic capacitance to ground of the turned-on switch connecting the LNA to the input port. The parasitic capacitance of the turned-off switches is denoted \( C_{\text{switch, off}} \).

\[
C_p = \sum C_{\text{ESD}} + C_{\text{pad}} + C_{\text{switch, on}} + C_{\text{switch, off}} + C_{p,\text{ext}} \tag{6}
\]

The total input impedance \( Z_{\text{in tot}} \) is now equal to the parallel connection of \( C_p \) and the series connection of \( L_g \) and \( Z_{\text{in,gs,ext}} \), with \( C_i = C_{gs} + C_{\text{gs,ext}} \) and \( L_i = L_g + L_s \).

\[
Z_{\text{in tot}} = \frac{1}{sC_p + \dfrac{1}{\dfrac{g_m L_s}{C_i} + sL_i + \dfrac{1}{sC_i}}} \tag{7}
\]

The approximated real and imaginary part of the input impedance using typical design values for \( g_m \), \( L_s \), \( C_i \) and \( C_p \) are given by

\[
\begin{align*}
\text{Re}(Z_{\text{in tot}}) &\approx \frac{g_m L_s}{C_i} \left( 1 + \frac{C_i + C_p}{sC_p} \right) - \frac{g_m L_s}{sC_p} \\
\text{Im}(Z_{\text{in tot}}) &\approx \frac{\omega L_s (C_i + 2C_p - C_i - C_p)}{sC_p} \tag{9}
\end{align*}
\]

By selecting \( C_i \) and \( C_p \), i.e. by tuning the values of \( C_{\text{gs,ext}} \) and \( C_{p,\text{ext}} \), for different values of \( \omega \), the real part can be made equal to 50\( \Omega \) and the imaginary part can be cancelled. When \( C_p \) approaches zero the real and imaginary parts of the input impedance will be equal to

\[
\begin{aligned}
\text{Re}(Z_{\text{in tot}}) &\approx \frac{g_m L_s}{C_i} \\
\text{Im}(Z_{\text{in tot}}) &\approx \omega L_i - \frac{1}{\omega C_i}
\end{aligned} \tag{10}
\]
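The influence of \( C_p \) on the input impedance can be illustrated numerically. The sketch below evaluates the parallel connection of \( C_p \) with the series branch \( g_m L_s / C_i + sL_i + 1/(sC_i) \) for lossless inductors. The values of \( L_g \) (5.2 nH) and the frequency (2140 MHz) are taken from this paper, while \( L_s \) = 230 pH, the shunt capacitance of 300 fF, and the choice of \( C_i \) and \( g_m \) that gives a 50\( \Omega \) match for \( C_p = 0 \) are assumptions made only for this example.

```python
import math


def z_in_tot(freq_hz, gm, l_s, l_g, c_i, c_p):
    """Input impedance: C_p in parallel with the series branch (lossless inductors)."""
    s = 1j * 2.0 * math.pi * freq_hz
    z_series = gm * l_s / c_i + s * (l_g + l_s) + 1.0 / (s * c_i)
    if c_p == 0:
        return z_series
    return 1.0 / (s * c_p + 1.0 / z_series)


freq = 2.14e9
l_g = 5.2e-9    # series gate inductor
l_s = 230e-12   # assumed source degeneration inductance

# Hypothetical design point: resonance at 2.14 GHz and 50 ohm real part for C_p = 0.
omega = 2.0 * math.pi * freq
c_i = 1.0 / (omega ** 2 * (l_g + l_s))
gm = 50.0 * c_i / l_s

z_no_cp = z_in_tot(freq, gm, l_s, l_g, c_i, 0.0)
z_with_cp = z_in_tot(freq, gm, l_s, l_g, c_i, 300e-15)  # assumed 300 fF shunt capacitance

print(f"C_p = 0:      {z_no_cp.real:.1f} {z_no_cp.imag:+.1f}j ohm")
print(f"C_p = 300 fF: {z_with_cp.real:.1f} {z_with_cp.imag:+.1f}j ohm")
```

Without \( C_p \) the impedance reduces to the expressions in (10). With the shunt capacitance present, the real part drops and a capacitive reactance appears, which is why both \( C_i \) and \( C_p \) have to be retuned for each band.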

The on-chip inductor \( L_g \) of 5.2nH has a Q-value of 15 @ 2140MHz and is represented with an s-parameter model. The diameter equals 300\( \mu \)m. The source degeneration inductor \( L_s \) equals 230pH with \( Q=12 \) @ 2140MHz.

\[
G_v = G_{m,\text{tot}} \cdot \left| Z_{\text{in, balun}} \right| \cdot G_{v,\text{balun}} \tag{12}
\]

\( G_{m,\text{tot}} \) is the overall transconductance from the 50\( \Omega \) port to the output of the cascode, \( G_{v,\text{balun}} \) is the voltage gain of the balun, and \( Z_{\text{in, balun}} \) is the impedance seen looking into the balun and tuning capacitor bank from the LNA cascode output. Between 1800MHz and 2200MHz the value of \( \left| Z_{\text{in, balun}} \right| \) is approximately 50\( \Omega \).

\[
G_{m,\text{tot}} = g_m \cdot Q = g_m \frac{v_{gs}}{v_{in}} \tag{13}
\]

where \( v_{in} \) is the input voltage at the 50\( \Omega \) port, \( v_{gs} \) is the gate-source voltage and \( g_m \) is the transconductance of the LNA input device. At the matching resonance frequency \( v_{gs} \) will be \( Q \) times larger than the input voltage at the port. The duplexer is designed to see a 50\( \Omega \) input match across the received band in order to achieve the specified attenuation for the TX interferer. The input matching requirement must be fulfilled over process, supply voltage and temperature spread; therefore the bandwidth of the input matching is designed with a margin. The input NMOS was scaled with \( W=600\mu m \) and \( L=130nm \).

The maximum power of the WCDMA wanted signal is -25dBm. In order not to compress the baseband filter after the mixer with the wanted signal a gain switch is required. The gain switch is implemented by reducing the drain current of the LNA. In order to keep the 50\( \Omega \) matching the parallel tuning capacitance to ground, \( C_{p,\text{ext}} \), is increased. This implementation of the gain switch reduces the average current consumption of the receiver since the maximum LNA gain is only required for very weak signals. The LNA current while configured for maximum gain equals 14.9mA. When configured for maximum gain -6dB and maximum gain -12dB the current is reduced to 3.27mA and 1.32mA respectively.

D. Balun and frequency tuning function

The cascode NMOS output is connected to the on-chip balun with a resonance tuning capacitance block at the primary side as depicted in figure 3 to maximize the voltage gain. The balun is implemented in layout and occupies an area of 270\( \mu m \times 270\mu m \). A five port plus substrate connection s-parameter model was extracted using the Momentum simulator. The output from the balun is intended to be connected to a differential passive mixer. The switching mixer input impedance was simulated and is represented by a 750\( \Omega \) resistor in parallel with a 120fF capacitor. The centre tap connected to node bias at the secondary side is used to bias the passive mixer connected to nodes Out_balun_p and Out_balun_n. The resonance frequency is tuned by activating NMOS switches in series with capacitors.

E. The RF input switch

The design of the RF input switch depicted in figure 4 is crucial for the performance of the presented multiband LNA. The switch has a very low on-resistance, \( r_{on}=1.3\Omega \), and at the same time provides high isolation while turned off. If not addressed the switch will leak in the off-mode through its parasitic capacitances $C_{gs}$, $C_{gd}$ and $C_{db}$. The switch is DC-coupled to the LNA. The effect of $r_{on}$ on the NF of the LNA is identical to that of a series inductor with low Q-value. The gate-source capacitance, $C_{gs}$, of the RF switch NMOS, T1, is very large, 774fF, due to the device size, W=700μm and L=200nm. The gate should have a high impedance bias to reduce the capacitive loading of $C_{gs}$. In the off-mode, i.e. $V_{GATE\_SWITCH} = 0V$, the TX interferer is shorted to the dedicated ground $GND\_SWITCH\_N$ through the NMOS device T2 with $r_{on} = 5Ω$. Using a dedicated ground for each switch improves the isolation since an interferer otherwise could be coupled to the single ended LNA ground.

IV. SIMULATED PERFORMANCE

The performance of the multiband LNA in band I, II and III is summarized in table 1. The performance in DCS and PCS is identical to the band II and III performance. The intermodulation simulations were made using the Cadence Spectre RF tool. The isolation was simulated as the difference in balun output voltage when the same AC signal is applied to either the turned on LNA (band I) or the turned off LNA (band II). The high voltage gain of approximately 28dB is achieved through the voltage gain of the balun. The noise figure of 1.9dB (band I) is dominated by the rather low Q-value of the integrated inductor. With an ideal inductor the noise figure equals 0.83dB. The matching is maintained when the gain switch is active by increasing the shunt capacitance using $C_{pg\_CTRL}$. The duplexer was assumed to have 52dB isolation from TX to RX resulting in +26-52=-26dBm TX power at the LNA input. The TX cross compression point is at least -20dBm. For the EGSM bands DCS 1800 and PCS 1900 the receiver must have a cross compression point of -23dBm for a blocker at 3MHz from the received signal. This is achieved with margin even in the low gain mode. A high selectivity for the WLAN interferer at 5.8GHz is achieved by the combination of narrow band input matching and integrated balun.

V. CONCLUSIONS

The benefit of the presented architecture is that it provides a multiband single ended LNA with integrated programmable matching thereby reusing the integrated matching inductor. The need for discrete matching components on the PCB is eliminated. A single ended design is advantageous since it reduces the number of package pins. High isolation between the different RF inputs is guaranteed by the presented implementation of the RF switches. When the signal strength of the RX signal is large the current consumption of the LNA can be significantly reduced by activating the gain switch. The noise figure could be improved by increasing the Q-value of the series inductor. This requires a larger area though.

REFERENCES


Tab.1. Performance summary of the multiband LNA
Paper VIII

A BiCMOS single ended multiband RF-amplifier and mixer with DC-offset and second order distortion suppression

Tobias Tired

Abstract Direct conversion receivers are widely used for full duplex mobile radio communication systems. This paper describes a novel SAW-less single-ended RF amplifier connected to a single-ended mixer with a feedback loop that suppresses the second-order distortion from TX cross modulation of the LO-leakage as well as DC-offset at the mixer output. In Monte Carlo simulations the design achieves $+47 \text{ dBm}$ minimum $\text{IIP}_2$ with $32 \text{ dB}$ conversion voltage gain. The advantage with the proposed architecture is that it is fully single-ended. Especially in multiband integrated radios this is highly desirable since the pin-count for the LNAs is reduced by half. The PCB routing of the RF input signal is simplified. The design requires two off-chip filter capacitors of non critical value intended to be placed on the laminate inside the package.

Keywords BiCMOS integrated circuits · Second order distortion · $\text{IIP}_2$ · Single ended mixers · Mismatch · DC-offset · Cross modulation

1 Introduction

Conventional WCDMA LNA and mixer architectures are differential and an external SAW filter is required between the LNA and mixer [1–3] if the receiver linearity is too low. The purpose of the SAW filter is to attenuate the TX-signal that leaks into the LNA through the finite isolation of the duplexer. The duplexer typically provides some $50–55 \text{ dB}$ isolation from TX to RX, but if the linearity is not high enough a SAW filter is needed to prevent the TX-leakage from deteriorating the receiver performance. The receiver is degraded through intermodulation generated by second and third order distortion. There are several possible combinations of integrated single-ended/differential LNA and mixers that could be used in high linearity direct conversion architecture.

- Differential LNA and differential mixer: The drawback is the additional package pin for the LNA. A multiband circuit will need a larger package.
- Single ended LNA and differential mixer: The drawback is the large on-chip balun between the LNA and mixer to create a differential RF signal for the mixer. If several baluns are needed in a multiband solution the area penalty is increased.
- Single ended LNA and single ended mixer.

In a fully single ended solution there is no need for an on-chip balun. The architecture presented in this paper is depicted in Fig. 1 describing a solution with multiple LNAs supporting different frequency bands. The LNAs are preceded by duplexers separating the RX and TX signals. The single ended mixer has a feedback loop that suppresses both $\text{IM}_2$ and DC-offset. Single ended mixers have the drawback compared to differential mixers that they do not suppress noise from the LO-driver. To compensate for this architectural difference the presented RF amplifier has a built-in attenuation of noise at harmonics of the LO-frequency since the gain of the RF amplifier preceding the mixer has a very steep roll-off. All mixers downconvert noise from odd harmonics of $f_{\text{LO}}$. Reducing the contribution to the mixer noise figure from the higher harmonics significantly lowers the noise figure of the mixer. The paper is organized as follows: Section two gives an overview of second and third order distortion mechanisms. Section three gives a brief description of earlier presented solutions, i.e. trimming of the mixer load and L-C filters at the mixer input. Section four gives a detailed description of the presented architecture. Section five presents the simulated performance and section six describes the conclusions.

2 Third and second order distortion in WCDMA receivers

2.1 Third order distortion

A third order intermodulation product will be generated in the LNA and in the switching mixer core. For WCDMA the worst intermodulation case is when an interferer is present at half the duplex distance between the RX and TX frequency. The third order nonlinearity of the LNA and mixer will create an intermodulation product at the RX frequency originating from the TX-leakage into the LNA with power $P_1$ and the interferer at half the duplex distance with power $P_2$. With the third order intercept point denoted as IIP3 the following applies [4] for the third order intermodulation product $P_{IM3}$ calculated back to the LNA input.

$$P_{IM3}(dBm) = 2P_2(dBm) + P_1(dBm) - 2IIP_3(dBm) \quad (1)$$

2.2 Second order distortion

Two interferers at $f_1$ and $f_2$ inserted into an LNA and mixer with a second order nonlinearity will generate an intermodulation product at their difference frequency $f_1-f_2$ [5]. This intermodulation product will fall directly into the wanted downconverted baseband frequency band if the interferers are close to each other. In a WCDMA receiver the worst-case interferer for the mixer second order nonlinearity is the TX signal that leaks into the receive path through the finite TX-RX isolation of the duplexer.

The TX-signal is a WCDMA digitally modulated interferer containing AM and FM modulation. The AM-modulation can be represented by a two-tone interferer with two close frequencies at $f_{TX1}$ and $f_{TX2}$ as depicted in Fig. 2. The second order nonlinearity of the mixer will translate a squared version of the envelope of the TX-signal to the receiver mixer output. The receiver IM2 level due to TX-leakage is tested in a 3GPP standard test case that specifies the minimum required sensitivity while the transmitted signal is at maximum power level (+24 dBm) at the antenna. With the second order intercept point denoted as IIP2 and if each of the two input tones has the power $P_T$ the following applies [5] for the second order intermodulation product $P_{IM2(f_1-f_2)}$ calculated back to the LNA input.

$$P_{IM2(f_1-f_2)} = 2P_T - IIP_2 \quad (2)$$
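Equations (1) and (2) are simple to evaluate once all quantities are expressed in dBm. The sketch below does this for a set of illustrative input levels. Only the IIP2 of +47 dBm is taken from this paper; the tone power, the interferer level and the IIP3 value are hypothetical numbers chosen for the example.

```python
def p_im3_dbm(p_tx_dbm, p_int_dbm, iip3_dbm):
    """Third order product, Eq. (1): P1 is the TX leakage, P2 the half-duplex interferer."""
    return 2.0 * p_int_dbm + p_tx_dbm - 2.0 * iip3_dbm


def p_im2_dbm(p_tone_dbm, iip2_dbm):
    """Second order product at f1 - f2, Eq. (2), for two tones of equal power."""
    return 2.0 * p_tone_dbm - iip2_dbm


# Hypothetical levels at the LNA input.
p_tone = -29.0       # each of the two tones representing the TX leakage
p_tx = -26.0         # total TX leakage
p_interferer = -40.0 # interferer at half the duplex distance
iip3 = 0.0           # assumed input referred third order intercept point
iip2 = 47.0          # minimum IIP2 reported in this paper

im2 = p_im2_dbm(p_tone, iip2)
im3 = p_im3_dbm(p_tx, p_interferer, iip3)
print(f"P_IM2 = {im2:.0f} dBm, P_IM3 = {im3:.0f} dBm")
```

For these assumed levels the input referred products are -105 dBm and -106 dBm respectively. Every dB of added IIP2 lowers the IM2 product by one dB, while every dB of added IIP3 lowers the IM3 product by two dB.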

There are three mechanisms that generate second order distortion in a zero-IF receiver.

2.3 RF self-mixing

The RF signal can leak to the LO signal in the mixer through parasitic coupling in the mixer core switching devices [6]. If the LO-amplitude is not high enough the mixer behaves more like a linear multiplier [7] and consequently the mixer output will contain a signal that is proportional to the square of the input signal i.e. an IM2 product. If the RF signal is the TX-leakage with AM-modulation a low baseband frequency IM2 product will be generated through self-mixing in the switching mixer core transistors. However, if the LO-amplitude is high enough this effect is significantly reduced.

2.4 Second order nonlinearity in the mixer transconductance stage

The transconductance transistors that generate the RF-current that is supplied to the mixer core switching transistors have a second order nonlinearity. An AM-modulated interferer, represented by two frequencies $f_1$ and $f_2$, will generate a low frequency second order intermodulation product at $f_1-f_2$ that is added to the wanted output current from the transconductance transistor. If the mixer is perfectly balanced, i.e. there is no mismatch in the switching core transistors, the mixer load resistor or in the LO driver block, the low frequency intermodulation product at $f_1-f_2$ will cancel at the differential mixer output [8]. In reality mismatch always exists in these components, with the result that this intermodulation product leaks to the mixer output.

2.5 Cross modulation of the LO-leakage

The AM-modulation of the TX-leakage interferer at the mixer core RF input will transfer to the LO-leakage at the mixer RF input through the cross modulation mechanism [2, 4, 9–13]. Downconversion of this AM-modulated LO-leakage with the LO-signal itself will generate a mixer output signal at the IM2-frequency. Compared to the IM2 products generated by self-mixing and second order non-linearity in the mixer transconductance stage, the cross modulation products at the two mixer outputs differ by 180$^\circ$. If the input signal to the LNA and mixer, $x(t)$, is the sum of the LO-leakage at the fundamental frequency, $x_1(t)$, and the AM-modulated TX-leakage $x_2(t)$, the following applies [4]

$$x(t) = A_1 \cos(\omega_1 t) + A_2[1 + m(t)] \cos(\omega_2 t) \quad (3)$$

where $m(t)$ is the amplitude modulation of the TX signal. Considering up to third order nonlinearities in the LNA and mixer the output signal $y(t)$ can be written as

$$y(t) = a_1 x(t) + a_2 x^2(t) + a_3 x^3(t)$$ (4)

Inserting (3) into (4) and expanding the expression will give

$$y_{crossmod}(t) = \frac{3}{2} a_3 A_1 A_2^2 (1 + m(t))^2 \cos(\omega_1 t)$$ (5)

The LO-leakage at frequency $\omega_1$ is modulated by the square of the TX-leakage.
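The cross modulation term can be checked numerically by cubing the two-tone input and projecting the result onto $\cos(\omega_1 t)$. The sketch below assumes a static envelope value `m`, integer tone frequencies 11 and 10 (mimicking an LO at 2200 MHz and an interferer at 2000 MHz on a 200 MHz grid), and arbitrary illustrative values for `a3` and the amplitudes. The projection also contains the self term $\frac{3}{4} a_3 A_1^3$, which carries no TX modulation and is subtracted before the comparison.

```python
import math

def cos_coefficient(samples, k):
    """Projection of a one-period record onto cos(2*pi*k*t)."""
    n = len(samples)
    return 2.0 / n * sum(s * math.cos(2 * math.pi * k * i / n)
                         for i, s in enumerate(samples))

def lo_tone_after_cubic(a3, lo_amp, tx_amp, m, k_lo=11, k_tx=10, n=2048):
    """Amplitude at the LO-leakage frequency in a3 * x(t)^3 for a static envelope m."""
    x = [lo_amp * math.cos(2 * math.pi * k_lo * i / n)
         + tx_amp * (1.0 + m) * math.cos(2 * math.pi * k_tx * i / n)
         for i in range(n)]
    return cos_coefficient([a3 * v ** 3 for v in x], k_lo)

def cross_mod_term(a3, lo_amp, tx_amp, m):
    """Closed form (3/2) * a3 * A1 * A2^2 * (1 + m)^2."""
    return 1.5 * a3 * lo_amp * tx_amp ** 2 * (1.0 + m) ** 2

a3, lo_amp, tx_amp, m = -0.2, 0.01, 0.3, 0.5   # illustrative values only
numeric = lo_tone_after_cubic(a3, lo_amp, tx_amp, m) - 0.75 * a3 * lo_amp ** 3
print(numeric, cross_mod_term(a3, lo_amp, tx_amp, m))
```

The two printed values agree, which confirms that the envelope enters as $(1 + m(t))^2$.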

Referring (5) to the input by dividing by the gain $a_1$ and inserting $IP_3 = \sqrt{\frac{4}{3}\left|\frac{a_1}{a_3}\right|}$ gives the cross modulation term at the input

$$x_{crossmod}(t) = \frac{y_{crossmod}(t)}{a_1} = \frac{2 A_1 A_2^2 (1 + m(t))^2}{IP_3^2} \cos(\omega_1 t)$$ (6)

Converting (6) to log-scale gives at the input [2]

$$P_{I_{crossmod}} = 6 + P_1 (dBm) + 2(P_2 (dBm) - IP_3 (dBm))$$ (7)

The cross modulation term at the LO-frequency $\omega_1$ is linearly proportional to the LO-leakage and to the square of the TX-leakage. The total LO-leakage is the sum of the LO-leakage at the fundamental tone and the leakage at the harmonics. The LO-leakage at the harmonics is also cross modulated, and second order products will be generated from downconversion by the LO square wave harmonics at $(2n + 1)f_{LO}$.
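As a consistency check between the amplitude form and the log-scale form (7), the sketch below evaluates both for the unmodulated carrier ($m = 0$). It assumes a 50 Ω reference impedance and that $IP_3$ enters the amplitude expression squared, as the factor $2(P_2 - IP_3)$ in (7) implies. The power levels are arbitrary illustrative values.

```python
import math

R0 = 50.0  # reference impedance in ohm (assumed)

def dbm_from_peak(v_peak):
    """Power in dBm of a sinusoid with peak amplitude v_peak across R0."""
    return 10.0 * math.log10(v_peak ** 2 / (2.0 * R0) / 1e-3)

def peak_from_dbm(p_dbm):
    """Peak amplitude in volts of a sinusoid with power p_dbm across R0."""
    return math.sqrt(2.0 * R0 * 1e-3 * 10.0 ** (p_dbm / 10.0))

def cross_mod_dbm(p1_dbm, p2_dbm, iip3_dbm):
    """Equation (7): input-referred cross modulation power in dBm."""
    return 6.0 + p1_dbm + 2.0 * (p2_dbm - iip3_dbm)

def cross_mod_dbm_from_amplitudes(p1_dbm, p2_dbm, iip3_dbm):
    """Same quantity from the amplitude form 2 * A1 * A2^2 / IP3^2 with m = 0."""
    a1, a2, ip3 = (peak_from_dbm(p) for p in (p1_dbm, p2_dbm, iip3_dbm))
    return dbm_from_peak(2.0 * a1 * a2 ** 2 / ip3 ** 2)

# Illustrative levels: LO-leakage, TX-leakage and IIP3 in dBm
print(cross_mod_dbm(-70.0, -25.0, -5.0))
print(cross_mod_dbm_from_amplitudes(-70.0, -25.0, -5.0))
```

The two results differ only by the rounding of $20\log_{10} 2 \approx 6.02$ dB to 6 dB.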

3 Previous IP2 enhancement techniques

Numerous publications address second order distortion and means to counteract it. In [14] a solution with trimming of the mixer load resistor is presented. Each of the measured samples needed individual trimming, though, and the mixer DC-offset is not minimized after the trimming. In [15] an improvement of a previously published IP2 calibration method for a Gilbert cell type mixer is introduced. In the previous solution the IP2 degraded as a function of the baseband frequency when a mixer with RC load was used. The improved solution maintains a high IP2 over the entire baseband frequency range of a WCDMA receiver by also trimming the mixer load capacitors. However, to implement the solutions in [14] and [15] an optimum trimming code has to be detected, and for the code detection test tones have to be inserted into the receiver. In [16] a theoretical mismatch analysis is given for second order distortion in both single ended and double balanced bipolar mixers. Mismatches in both the load resistors and the switching core are considered. Tunable RC load effects on IM2 are analyzed in [17]. A high IP2 mixer design in 0.18 $\mu$m CMOS (1.8 V supply) is presented in [18]. The idea in this paper is to filter out the modulated fundamental LO-frequency at the switching pair source terminal with an LC-filter, i.e. the parasitic capacitors at the switching pair common sources are tuned out. The Q of the inductor is 10 at 2.15 GHz, i.e. a large die area is occupied, especially since one inductor is required in each mixer. Good measurement results are presented, but for a differential implementation. The design does not have any DC-cancellation and it is not multiband. In [19] the solution in [18] has been improved in a 90 nm process with a common mode feedback loop from the mixer output. The supply voltage is only 750 mV and the IM2-performance is still very good. A technique for canceling IM2 in the transconductance stage of an active mixer is introduced in [20]. A digital technique for tuning of the mixer core is provided in [21]. A digital adaptive calibration method without test tones is presented in [22]. In [1] a double balanced Gilbert mixer is presented that uses pseudorandom test signals at the mixer inputs to generate the optimum mixer biasing for IM2 suppression. A digital self-calibration engine using a test tone, most suited for passive mixers, is described in [23].

4 Detailed description of the architecture

4.1 Top-level architecture of the design

The paper describes a method to increase the second order linearity of a single ended LNA and a single ended mixer. The architecture of the design is depicted in Fig. 1. Single ended LNAs generate an RF-current to the single ended multiband feedback RF-amplifier. The compression point of the RF-amplifier and mixer is high enough that the TX-leakage signal into the LNA does not drive the receiver into compression. The four outputs from the I and Q mixer are connected to the mixer feedback blocks that operate on the differential offset at the single ended mixer output as depicted in Fig. 3 describing the mixer with feedback.

The mixer consists of a main mixer together with a trim mixer. A mismatch in Vbe of the main mixer will result in a differential DC voltage at the mixer output. Inside the feedback block the mixer signal is low pass filtered and DC feedback currents $I_{trim,n}$ and $I_{trim,p}$ are created that control the base voltages of the trim mixer in order to counteract the DC-offset as well as the LO-leakage at the RF input node. The second order distortion product from the trim mixer due to cross modulation is in opposite phase compared to the main mixer product, resulting in a suppression of IM2 at the mixer output. The mixer with feedback requires off-chip capacitors for low-pass filtering in the feedback loop. The low pass characteristic of the feedback will result in a high pass characteristic of the mixer output. The high pass cut-off frequency of the mixer conversion gain should be as low as possible so as not to increase the bit-error rate. The requirements on IIP2 and IIP3 are most stringent when the mobile is transmitting at full TX-power. When the transmitter output power is low, the IM2 products generated are very much reduced since they are proportional to the square of the TX output power. Since the output power of the PA is known to the mobile, the mixer feedback loop can then be disabled and the high pass characteristic removed.

4.2 Multiband programmable current-to-current feedback RF amplifier

The receiver consists of multiple LNAs supplying the RF signal to a multiband feedback RF amplifier and feedback mixer. The supported WCDMA frequency bands are given in Table 1.

The multiband feedback RF amplifier is depicted in Fig. 4. The design is based on a two-stage current-to-current amplifier with programmable band dependent feedback. Under the condition that the open loop gain is high the current gain of the feedback current-to-current amplifier is given by [24].

\[ A_I = \frac{I_{RF_{OUT}} + I_{RF_{OUTQ}}}{I_{RF_{IN}}} \]  

Table 1 Supported WCDMA frequency bands

<table>
<thead>
<tr>
<th>Band</th>
<th>Receive frequency (MHz)</th>
<th>Transmit frequency (MHz)</th>
<th>Duplex distance (MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>I</td>
<td>2110–2170</td>
<td>1920–1980</td>
<td>190</td>
</tr>
<tr>
<td>II</td>
<td>1930–1990</td>
<td>1850–1910</td>
<td>80</td>
</tr>
<tr>
<td>III</td>
<td>1805–1880</td>
<td>1710–1785</td>
<td>95</td>
</tr>
<tr>
<td>V</td>
<td>869–894</td>
<td>824–849</td>
<td>45</td>
</tr>
<tr>
<td>VI</td>
<td>875–885</td>
<td>830–840</td>
<td>45</td>
</tr>
<tr>
<td>VIII</td>
<td>925–960</td>
<td>880–915</td>
<td>45</td>
</tr>
</tbody>
</table>

The output signal from the LNA (node RF_IN) is connected to the base of the bipolar device $Q_1$. The collector current of the input device equals 3.7 mA. The emitter of $Q_1$ is connected through the resistor $R_1$ to ground for bias stability purposes. Since a low input impedance to the amplifier is desired, a capacitor is connected across the resistor, thereby creating a low impedance from the emitter of $Q_1$ at RF frequencies. The cascode is needed to increase the loop gain of the amplifier. The loop gain of the feedback amplifier is defined as

$$LG = \frac{I_{RF\_IN}}{I_{B\_Q1}}$$ (10)

The “error current” in the feedback system is equal to the base current, $I_B$, of the input transistor $Q_1$.

With the LC-tank it is possible to achieve a higher loop gain than would have been possible if the tank had been replaced with a purely resistive load, since there is no DC voltage drop. The third order linearity is thereby improved [25]. The collector of $Q_2$ is connected to the programmable resonance tank consisting of an on-chip inductor, a fixed capacitor and three switched capacitors. The resonance frequency of the tank is programmable to maximize the loop gain for the selected band. Low or high band operation is controlled with the BSEL signal.
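To give a feeling for the capacitance range the switched capacitors must cover, the sketch below computes the total tank capacitance of an ideal LC tank for the three bands in Table 2. It assumes the lumped inductance of 8.4 nH quoted in Section 5.2, resonance at the receive band centers from Table 1, and no parasitic capacitance or loss. The actual capacitor values of the design are not given in the text.

```python
import math

def tank_capacitance(f0_hz, l_henry):
    """Total capacitance that resonates with l_henry at f0_hz (ideal LC tank)."""
    return 1.0 / ((2.0 * math.pi * f0_hz) ** 2 * l_henry)

def resonance_frequency(l_henry, c_farad):
    """Resonance frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

L_TANK = 8.4e-9  # lumped inductance quoted in Section 5.2
# Receive band centers computed from Table 1 (assumed tuning targets)
BAND_CENTERS_HZ = {"I": 2140e6, "III": 1842.5e6, "VIII": 942.5e6}

for band, f0 in BAND_CENTERS_HZ.items():
    c_total = tank_capacitance(f0, L_TANK)
    print(f"Band {band}: total tank capacitance about {c_total * 1e12:.2f} pF")
```

Under these assumptions the low band needs roughly five times the capacitance of band I, which is consistent with a tank built from a fixed capacitor plus switched capacitors.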

The output from the resonance tank is AC-coupled to the gates of the NMOS-devices $M_1$ and $M_2$ and also AC-coupled to the NMOS devices $M_3$ and $M_4$. When active these devices are biased with 3.13 mA each. Depending on whether the output to the mixer core should be DC- or AC-coupled, either devices $M_3$ and $M_4$ or $M_1$ and $M_2$ are turned on. For a certain TX power level a control signal should be sent to the RF amplifier that disables the NMOS devices that are AC-coupled to the output. The devices that are DC-coupled to the output are then enabled. This is the current save mode of the architecture. The DC-current in the output NMOS is reused as the mixer tail current. The feedback signal is connected from the source terminals to the base of the input bipolar device through an AC-coupling. An interferer containing AM-modulation will generate a low frequency IM$_2$ tone in the transconductance NMOS. In the AC-coupled mode the IM$_2$-tone is prevented from reaching the mixer core by the AC-coupling of the output.

The mixer will downconvert noise from the amplifier not only at the LO-frequency but also at odd harmonics of the LO-frequency. The Fourier series of the square wave LO-signal contains only odd harmonics, i.e. frequencies $f_{LO}, 3f_{LO}, 5f_{LO}, \ldots, (2n + 1)f_{LO}$. The contribution to the total noise figure from the higher harmonics is significant, especially from the $3f_{LO}$ frequency. To reduce the noise figure of the amplifier and mixer it is important to reduce the mixer downconversion of noise at higher harmonics. The resonance tank is tuned to the LO-frequency. Far from the LO-frequency, i.e. for higher harmonics, the resonance circuit will act as a short circuit to signal ground and the output noise will be heavily attenuated. Closer to the resonance frequency, typically at the third harmonic of the LO-frequency, the amplifier still has a high open loop gain and the frequency response is determined by the feedback network. The outlined feedback network with a band dependent pole in the impedance $Z_2$, defined by the capacitor and either the low band (marked LB) or high band (marked HB) resistors, solves this issue. The LC-tank also improves the blocking performance of the feedback amplifier and mixer. For frequencies close to the wanted signal, for which the loop gain $A\beta$ is large, the two-stage amplifier will behave as a regular feedback current amplifier. For frequencies far away from the resonance frequency the open loop gain of the amplifier vanishes and any possible interferer will be short circuited to VCC.
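The weight of each LO harmonic can be illustrated with the Fourier components of an ideal 50 % duty-cycle square wave. The sketch below is a generic numerical check and does not model the actual LO waveform or the tank response. It shows that even harmonics vanish and that the conversion gain at $3f_{LO}$ is only about 9.5 dB below the fundamental, which is why noise at the third harmonic needs additional attenuation.

```python
import math

def square_wave_harmonic(k, n=4096):
    """Magnitude of the k-th Fourier component of a +/-1, 50 % duty-cycle square wave."""
    re = im = 0.0
    for i in range(n):
        s = 1.0 if i < n // 2 else -1.0
        re += s * math.cos(2 * math.pi * k * i / n)
        im += s * math.sin(2 * math.pi * k * i / n)
    return 2.0 * math.hypot(re, im) / n

def relative_gain_db(k):
    """Conversion gain at the k-th (odd) harmonic relative to the fundamental, in dB."""
    return 20.0 * math.log10(square_wave_harmonic(k) / square_wave_harmonic(1))

for k in (1, 3, 5):
    print(f"harmonic {k}: {relative_gain_db(k):.2f} dB relative to the fundamental")
print(f"harmonic 2 magnitude: {square_wave_harmonic(2):.1e}")
```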

The maximum power of the WCDMA wanted signal is $-25$ dBm. In order not to compress the mixer feedback loop with the wanted signal a gain switch is required in the RF amplifier. The switch is implemented with an NMOS device across the feedback net plus an NMOS in series with a capacitor connected to node $RFIN$. The gain of the RF feedback amplifier is reduced to 0 dB and the gain is further reduced by shunting the RF input signal to ground.

4.3 The LNA

The LNAs are standard bipolar cascode designs with inductive degeneration matched to a 50 Ω port. In band I and III the current consumption is 4.3 mA while it is 4.9 mA in band VIII. The degeneration inductors are 650 pH in band I, 850 pH in band III and 3 nH in band VIII.

4.4 The single balanced switching mixer core

The switching mixer core consists of a main mixer (Q1 and Q2) that is DC-coupled to the LO-driver and a trim mixer (Q3 and Q4) that is AC-coupled with capacitors C1 and C2 to the LO-driver. Connected together as depicted in Fig. 5 these two mixers form a mismatch compensated mixer.

The two mixer cores share the same collector load. The load consists of the resistors R1 and R2 together with the capacitors C5, C6, C7 and C8. Capacitors C5 and C6 filter common mode signals while C7 and C8 filter differential signals. The WCDMA TX-signal that leaks into the receiver LNA through the finite isolation of the duplexer will be downconverted to an IF-frequency by the RX LO-signal. When transmitting at high power this TX-leakage signal is a strong interferer that the receiver must be able to handle without compressing. The filter at the mixer output will attenuate the IF-frequency so that the mixer feedback loop does not compress even while transmitting at maximum power. The base bias voltages of the trim mixer are determined by the output currents from the feedback loop, connected at nodes $I_{trim,p}$ and $I_{trim,n}$, in resistors R3 and R4. The feedback loop will regulate these two currents so that the DC-offset at the mixer output becomes 0 V, i.e. $V_{Out\_BB\_p} = V_{Out\_BB\_n}$. The capacitors C3 and C4 attenuate the LO voltage swing at the output of the feedback loop. Without these capacitors unwanted modulation of the output current from the feedback circuit would occur. The resistors R5 and R6 isolate the LO-signal from the signal ground generated by capacitors C3 and C4. The input signal to the feedback loop is the mixer output voltages $V_{Out\_BB\_p}$ and $V_{Out\_BB\_n}$. The RF signal from the current-to-current feedback amplifier is connected in node $I_{EE\_main\_RF}$. In order to have different DC tail currents in the main and trim mixer, the RF signal must be AC-coupled with capacitor C9 from node $I_{EE\_main\_RF}$ to node $I_{EE\_trim}$. For optimal operation the ratio of the tail currents $I_{DC\_main}$ and $I_{DC\_trim}$ is typically around 10. In the presented results the tail currents in the main and trim mixer equal 3.3 and 0.26 mA respectively.

The mixer is not capable of suppressing any low frequency IM2-signals originating from the current-to-current amplifier. The RF input signal to the trim mixer is AC-coupled to the RF input of the main mixer. For the case of DC-coupling of the output from the current-to-current amplifier, a low frequency IM2-signal from this stage reaches the main mixer but not the trim mixer. If mismatch is present in the main mixer, the mixer feedback loop will compensate this by offsetting the trim mixer, but the feed through of the IM2-component through the main mixer will still be the same.

For high IM2 suppression it is important that all component mismatches besides mismatch in the mixer core switching devices are minimized by upscaling the device sizes as well as by careful layout. If the mixer load resistors R1 and R2 are mismatched, the feedback loop will also try to compensate the DC-offset generated by this mismatch. However, the loop will then create a DC-offset between the bases in the trim mixer. This results in poor IM2 performance in the compensated mixer core due to Vbe mismatch, while there will still be an IM2-component originating from the load resistor mismatch. It is not possible to reduce the total IM2-distortion by compensating resistor load mismatch with Vbe mismatch in the mixer core. Due to switching speed requirements, scaling up the active devices in the mixer core is only possible to a limited extent. The two devices in the trim mixer core, Q3 and Q4, can also be mismatched. In the case of mismatch in this part the feedback loop will also compensate the DC-voltage at the mixer output originating from this mismatch.

4.5 The mixer feedback loop

The feedback loop is acting on the DC-offset at the mixer output. There are two identical filters, one for the I-channel mixer and one for the Q-channel mixer. The filter is a two-stage design with the first stage acting as a low pass filter and the second stage operating as a transconductance that generates the feedback current to the trim mixer. Figure 6 outlines the filter architecture. The outputs from the mixer are connected to Q1 and Q2. The low pass filter has both common-mode filtering with capacitors C0 and C2 and differential mode filtering by capacitor C1. The cut-off frequency is set by the external differential capacitor C1. In order to minimize the required size of C1 the resistive load is large.

To handle the DC voltage drop across the load without forward biasing the base–collector junction of Q1 and Q2, the tail current of the stage is low (20 µA). The stage is degenerated with resistors R1 and R2 to increase the compression point. The degeneration resistors $R_4$ and $R_5$ in the second stage have two purposes: they increase the input impedance of the second stage and they increase the compression point. If the input impedance of the second stage is too low, the filter cut-off frequency is set by this impedance instead of the high resistive load of the first stage. The second stage tail current equals 300 $\mu$A.

The feedback loop is dimensioned to be able to handle a certain level of mismatch in the main mixer core without running into compression. Both the low pass filter stage and the transconductance stage are heavily degenerated with resistors in order to increase the compression point. The feedback loop is only active for low frequencies, i.e. DC plus a few kilohertz. The performance of the feedback mixer is sensitive to mismatch in the loop devices since this will create a mixer DC-offset that the feedback loop will counteract by offsetting the trim mixer. Since the bandwidth of the feedback loop should be as low as possible, there is no penalty for increased capacitance due to upscaling of the devices in the loop to improve the matching. The loop is disabled by shorting the second stage input to a bias voltage. The pole frequency defined by capacitor C1 and resistors R9 and R10 must be designed low enough that resistor process spread does not increase the cut-off frequency above a value that can be tolerated. The biasing of the loop should be designed for temperature stability of the loop output currents $I_{\text{trim}_n}$, $I_{\text{trim}_p}$, $I_{EE_{\text{main}}}$ and $I_{EE_{\text{trim}}}$. Supply voltage variations do not impact the loop performance.

4.6 The resulting LO-leakage

LO-leakage at node $I_{EE\_main\_RF}$ together with an AM-modulated interferer, i.e. a WCDMA TX-signal, will generate a second order signal in the mixer due to cross modulation. The downconverted AM-modulated LO-leakage generates a baseband frequency at the AM-modulation frequency.

If the device mismatch in the main mixer (or in the trim mixer) increases, the LO-leakage in node $I_{EE\_main\_RF}$ will also increase if not compensated by the mixer feedback loop.

The capability of the loop to reduce the LO-leakage as well as the IM$_2$ product can be evaluated by inserting a voltage source as $V_{be}$ mismatch between the emitter of the mixer core device $Q_1$ and node $I_{EE\_main\_RF}$. The mismatch will cause a DC-current imbalance as well as an LO-leakage imbalance between $Q_1$ and $Q_2$. The differential LO-signals $LO_{p}$ and $LO_{n}$ no longer cancel each other in node $I_{EE\_main\_RF}$. The DC-current mismatch will create a DC-voltage offset at the mixer output that the feedback loop will counteract by changing the feedback currents $I_{\text{trim}_n}$ and $I_{\text{trim}_p}$ and offsetting the trim mixer. When the LO-leakage signal from the trim mixer is added to the LO-leakage from the main mixer in node $I_{EE\_main\_RF}$, the level of the summed LO-leakage is strongly attenuated. The benefit of the feedback is illustrated in Fig. 7, which shows the LO-leakage level in node $I_{EE\_main\_RF}$ of the I-mixer versus $V_{be}$ mismatch (x-axis variable emi_mismatch) for the mixer with feedback and with the feedback disabled.

For the comparison simulation with the feedback loop disabled, the load of the first stage in the feedback path was shorted. The simulation was made with the SpectreRF PSS tool. The LO-frequency was 2200 MHz and the interferer frequency was 2000 MHz, i.e. the fundamental frequency $f_{PSS-fund}$ was 200 MHz.

The 11th harmonic of the fundamental frequency equals the LO frequency. The attenuation of the LO-leakage is 10 dB with the mixer loop turned on. With the loop off the $f_{LO}$-leakage current at node $I_{EE\_main\_RF}$ is 5 $\mu$A for 4 mV $V_{be}$ mismatch. The loop will attenuate the leakage at frequencies $(2n + 1)f_{LO}$ as well.

4.7 IM$_2$ distortion from cross modulation

The AM-modulation of the TX-interferer will be transferred onto the LO-leakage at the input of the main as well as the trim mixer through cross modulation. In linear scale the IM$_2$ cross modulation component is proportional to $V_{LO\_leak}/IIP_3^2$, where $V_{LO\_leak}$ is the LO-leakage level at the mixer input.

In case of mismatch in the main mixer, e.g. less DC-current in the left main mixer device, the phase of the LO-leakage will be $\theta$ degrees at the input of the main mixer. Since the trim mixer will counteract the DC-offset generated by the main mixer, the right device of the trim mixer will have a higher DC-current. The LO-leakage at the input of the trim mixer will therefore be at the phase $\theta + 180^\circ$. The generated IM$_{2,\text{cross}}$ collector currents from the main and trim mixer will therefore counteract each other. The IM$_2$ level and DC-offset at the I-mixer differential output versus main mixer V$_{be}$ mismatch (x-axis variable emi$_{\text{missm}}$) with and without feedback are shown in Figs. 8 and 9 respectively. The simulation setup is identical to the setup for Fig. 7. The AM-modulated TX-interferer is represented by one interferer at 2000 MHz (PSS-frequency) and another at 2000 MHz $\pm 30$ kHz (PAC-frequency). Harmonic $-10$ equals the IM$_2$ product according to

$$ f_{\text{IM2}} = f_{\text{pac}} - k \cdot f_{\text{PSS-fund}} $$ (11)
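The harmonic bookkeeping of this simulation setup can be reproduced with integer arithmetic. The sketch below assumes that the PSS fundamental is the greatest common divisor of the LO and interferer frequencies and that the PAC tone sits 30 kHz above the interferer (the text allows either sign). The function names are illustrative and are not simulator commands.

```python
from math import gcd

def pss_fundamental_hz(f_lo_hz, f_tx_hz):
    """Largest common fundamental of the two large-signal tones (integer Hz)."""
    return gcd(f_lo_hz, f_tx_hz)

def im2_frequency_hz(f_pac_hz, k, f_fund_hz):
    """Equation (11): PAC tone shifted down by k harmonics of the PSS fundamental."""
    return f_pac_hz - k * f_fund_hz

F_LO = 2_200_000_000      # LO frequency from the text
F_TX = 2_000_000_000      # large-signal interferer (PSS tone)
F_PAC = F_TX + 30_000     # small-signal tone, assumed 30 kHz above the interferer

f_fund = pss_fundamental_hz(F_LO, F_TX)
print("PSS fundamental (Hz):", f_fund)
print("LO harmonic index:", F_LO // f_fund)
print("IM2 frequency for k = 10 (Hz):", im2_frequency_hz(F_PAC, 10, f_fund))
```

With these values the LO is the 11th harmonic of the fundamental and the IM$_2$ product falls at the 30 kHz tone spacing.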

The input power of both signals was $-33$ dBm at the LNA input. The suppression of the IM$_2$ product as well as the DC offset works as intended. The IM$_2$ level improvement varies with the main mixer V$_{be}$ mismatch. Smaller variations are due to simulator accuracy. The improvement is reduced for very high offset voltages. Since the tail current in the trim mixer is ten times smaller than the main mixer tail current, a larger relative change in the trim mixer collector currents is required to compensate for a mismatch in the trim mixer. This will cause the IIP$_3$ of the trim mixer devices to be different, and an IM$_{2,\text{cross}}$ current is generated that is not optimal for cancellation. The trim mixer tail current and device sizes can be modified for tuning of the IM$_2$ suppression.

4.8 High pass characteristic of the mixer baseband output signal

Fig. 7 LO-leakage in dBV$_p$ in node $I_{\text{tail}}$ main RF in versus V$_{be}$ mismatch with and without feedback

Fig. 8 Differential IM$_2$ level in dBVp at mixer I output versus V$_{be}$ mismatch with and without feedback

With the feedback loop on, the mixer output will have a high pass characteristic, i.e. the conversion gain for RF-frequencies very close to the LO-signal is reduced compared to the gain for RF-frequencies further away from the LO-signal. This is undesired since a WCDMA signal includes low frequency modulation. The reason for this is that the cut-off frequency of the low pass filter in the feedback loop is not infinitely low. Due to the finite rise and fall time of the trim mixer there is a certain feed through of low frequency signals from the base to the collector of the trim mixer devices. The collector signal current originating from low-frequency feed through will add in opposite phase to the baseband current created from downconversion of the RF-signal in the main and trim mixer. A smaller tail current in the trim mixer results in less feed through. For a given pole location in the mixer feedback loop the high pass cut-off frequency is lowered for a smaller tail current of the trim mixer. A solution based on only the trim mixer and no main mixer would require a significantly larger low pass filter capacitor. The required filter capacitor is intended to be placed off-chip, preferably on a laminate inside the package so as not to increase the pin-count. The required pole location for maintained bit error rate, BER, depends on the modulation scheme of the received signal. WCDMA with QPSK-modulation can tolerate a pole at 7 kHz while WCDMA with 16-QAM modulation, i.e. HSDPA, requires a pole location lower than 1 kHz. The loop gain of the feedback mixer will vary with the baseband frequency. The loop gain at baseband frequency f will be the difference between the maximum conversion gain and the conversion gain at frequency f (f < f_max). For frequencies higher than the cut-off frequency of the low pass filter the loop gain approaches 0 dB.
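The relation between the low pass pole in the feedback path and the resulting high pass corner can be sketched with a first-order model. The model below is an assumption made for illustration: a frequency independent forward gain with a single-pole low pass feedback path. It ignores the trim mixer feed through and the low pass filter at the mixer output. The parameter `gain_ratio_dc` is the linear ratio between the maximum conversion gain and the gain at DC, which is how the text defines the loop gain; the 31.9 dB and 19.5 dB values are taken from Section 5.3, while the 500 Hz pole is hypothetical.

```python
import math

def closed_loop_gain(f_hz, a_max, gain_ratio_dc, f_pole_hz):
    """|H(f)| for H(s) = a_max / (1 + T / (1 + s / w_p)), with T = gain_ratio_dc - 1."""
    t = gain_ratio_dc - 1.0
    s = 1j * f_hz / f_pole_hz
    return abs(a_max / (1.0 + t / (1.0 + s)))

def highpass_corner_hz(gain_ratio_dc, f_pole_hz):
    """Frequency where the gain is 3 dB below a_max (requires gain_ratio_dc**2 > 2)."""
    return f_pole_hz * math.sqrt(gain_ratio_dc ** 2 - 2.0)

def db(x):
    """Voltage ratio in dB."""
    return 20.0 * math.log10(x)

a_max = 10.0 ** (31.9 / 20.0)    # maximum conversion gain reported in Section 5.3
ratio = 10.0 ** (19.5 / 20.0)    # loop gain at DC reported in Section 5.3
f_pole = 500.0                   # hypothetical feedback pole in Hz

print(f"gain at DC: {db(closed_loop_gain(0.0, a_max, ratio, f_pole)):.1f} dB")
print(f"high pass corner: {highpass_corner_hz(ratio, f_pole):.0f} Hz")
```

In this model the high pass corner is roughly the feedback pole multiplied by the DC loop gain, so a lower corner requires either a lower pole (larger capacitor) or less loop gain.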

4.9 Trim mixer and RF feedback amplifier

When the mobile is transmitting at a low power < +24 dBm the requirement on the mixer IP2 is very much relaxed. The baseband frequency response of the mixer can be made flat from DC to the cut-off frequency defined by the mixer pole by shutting down the trim mixer. Shutdown is implemented by turning off the bias to the first stage in the feedback loop. The input to the second stage is instead biased with switches connected to a voltage source as illustrated in Fig. 6. Since there is no need for low second order distortion the AC-coupling of the feedback RF amplifier transconductance stage should be turned off as well to save current. With the mixer loop on and DC-coupled RF amplifier to save current the mixer DC offset is maintained low.

4.10 Suppression of noise from the feedback loop

The trim mixer has its base terminal AC-coupled to the base terminal of the main mixer. The emitter of the trim mixer is also AC-coupled to the emitter of the main mixer. The output signal from the mixer feedback is connected to the base of the trim mixer. The architecture has the big advantage that no low frequency noise from the feedback loop can reach the bases of the main mixer. The trim mixer though has a certain feed through of low frequency noise from the feedback loop but since the tail current of the trim mixer is only a fraction of the tail current in the main mixer the contribution from the feedback loop noise to the overall noise figure is very much reduced.

4.11 The LO-driver

The LO-signal to the mixer is provided through a standard IQ-divider circuit generating clock signals with 50% duty cycle.

It is important to provide the mixer with an LO-signal with short rise and fall times; otherwise the IM2-performance will degrade.

5 Simulated performance

5.1 Specifications and calculations

The design was made for a system specified to have 32 dB voltage gain from the 50 Ω LNA input to the differential output of the mixer. The half-duplex IIP3 of the LNA and mixer should be at least −9 dBm. The IIP2 for a two-tone TX-interferer should be at least +47 dBm.

The maximum TX-output level at the antenna is +24 dBm. The corresponding output power from the PA is +26 dBm assuming 2 dB loss. The duplexer was assumed to have 52 dB isolation from TX to RX resulting in +26 − 52 = −26 dBm TX power at the LNA input. For the simulations an input power of P_T = −33 dBm was selected because the compression point of the mixer is at least −23 dBm. When simulating the IM2 product with SpectreRF PSS and PAC the small signal pac signal must be at least 10 dB below the compression point for the results to have good accuracy. If the input signal P_T = −33 dBm and IIP2 equals +47 dBm then the IM2 level at the input of the LNA is at −113 dBm using (2). In a 50 Ω-system this corresponds to −126 dBVrms.
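The numbers in this paragraph can be reproduced with a few lines of arithmetic. The sketch below assumes that (2), which is defined earlier in the paper, has the usual two-tone form $P_{IM2} = 2P - IIP_2$; this form reproduces the quoted $-113$ dBm. The function names are illustrative.

```python
import math

def dbm_to_dbvrms(p_dbm, r_ohm=50.0):
    """Convert power in dBm across r_ohm to voltage in dBV (rms)."""
    return p_dbm - 30.0 + 10.0 * math.log10(r_ohm)

def tx_leakage_dbm(p_antenna_dbm, loss_db, isolation_db):
    """TX power at the LNA input: PA output power minus the duplexer TX-to-RX isolation."""
    return p_antenna_dbm + loss_db - isolation_db

def im2_input_dbm(p_tone_dbm, iip2_dbm):
    """Input-referred two-tone IM2 level, 2 * P - IIP2 (assumed form of (2))."""
    return 2.0 * p_tone_dbm - iip2_dbm

p_leak = tx_leakage_dbm(24.0, 2.0, 52.0)      # TX-leakage at the LNA input
p_im2 = im2_input_dbm(-33.0, 47.0)            # IM2 level at the LNA input
print(p_leak, p_im2, round(dbm_to_dbvrms(p_im2), 1))
```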

At the mixer output the following applies with the conversion voltage gain equal to G_v.

\[ V_{\text{mix-out}} = P_{I_{IM2}} - 13 + G_v \] (12)

Using (12), the IM2-level at the differential mixer output is at −113 − 13 + 32 = −94 dBVrms. The cross modulation product is calculated using (7). Calculated back to the LNA input, the 5 $\mu$A $f_{LO}$-leakage current for 4 mV mismatch at node $I_{EE\_main\_RF}$ with the feedback loop off corresponds to 34 $\mu V_{rms}$, i.e. $-76$ dBm. The average TX-leakage power equals the sum of the two TX interferer sideband powers of $-33$ dBm each, i.e. $-30$ dBm. Using IIP3 = $-9$ dBm, (7) gives $P_{I_{crossmod}} = -112$ dBm from cross modulation of the fundamental LO-leakage tone. Taking into account cross modulation of the odd harmonics of $f_{LO}$, the total cross modulation power is even higher. Using (2) with only a second order nonlinearity accounted for, $P_{I_{IM2}} = -113$ dBm for a receiver with IIP2 = $+47$ dBm and two TX interferer sidebands at $-33$ dBm. The maximum LNA input power of the half-duplex interferer is $-46$ dBm. The largest IM3 product is created when the TX-leakage is at its maximum, i.e. $-26$ dBm. Using (1) the IM3 product at the differential mixer output then equals $-78$ dBVp for IIP3 = $-9$ dBm.
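The budget in this paragraph can be retraced step by step. The sketch below uses (7) as given and assumes that (1), which is defined earlier in the paper, has the half-duplex form $P_{IM3} = P_{TX} + 2P_{HD} - 2\,IIP_3$; this form reproduces the quoted $-78$ dBVp. A 50 Ω reference and the specified 32 dB voltage gain are used, and the function names are illustrative.

```python
import math

R0 = 50.0  # reference impedance in ohm

def vrms_to_dbm(v_rms):
    """Power in dBm of an rms voltage across R0."""
    return 10.0 * math.log10(v_rms ** 2 / R0 / 1e-3)

def sum_powers_dbm(*levels_dbm):
    """Power sum of uncorrelated signals given in dBm."""
    return 10.0 * math.log10(sum(10.0 ** (p / 10.0) for p in levels_dbm))

def cross_mod_dbm(p_lo_dbm, p_tx_dbm, iip3_dbm):
    """Equation (7): input-referred cross modulation power."""
    return 6.0 + p_lo_dbm + 2.0 * (p_tx_dbm - iip3_dbm)

def half_duplex_im3_dbm(p_tx_dbm, p_hd_dbm, iip3_dbm):
    """Input-referred IM3, P_TX + 2 * P_HD - 2 * IIP3 (assumed form of (1))."""
    return p_tx_dbm + 2.0 * p_hd_dbm - 2.0 * iip3_dbm

def dbm_to_dbvpeak(p_dbm):
    """Convert dBm across R0 to dBV (peak) of a sinusoid."""
    return p_dbm - 30.0 + 10.0 * math.log10(R0) + 10.0 * math.log10(2.0)

p_lo = round(vrms_to_dbm(34e-6))               # LO-leakage referred to the LNA input
p_tx = round(sum_powers_dbm(-33.0, -33.0))     # two TX sidebands
print(p_lo, p_tx, cross_mod_dbm(p_lo, p_tx, -9.0))
print(round(dbm_to_dbvpeak(half_duplex_im3_dbm(-26.0, -46.0, -9.0)) + 32.0, 1))
```

The cross modulation level of the fundamental LO-leakage tone alone comes out within about 1 dB of the IM2 level allowed by the IIP2 specification, which is why this mechanism cannot be neglected.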

5.2 Simulated performance of the LNA and RF feedback amplifier standalone

For evaluation of the LNA and RF feedback amplifier standalone the mixer load of the NMOS output transistors was replaced with 10-Ω resistors. The inductance was represented with an s-parameter model of a real on-chip inductor. The lumped model has $L = 8.4$ nH and $R_s = 16$ Ω. Looking from the cascode device $Q9$, $Q = 8.5$ in band I and 5.4 in band VIII. The current gain is defined as the sum of the NMOS drain current to the I- and Q-mixer divided by the amplifier input current. The simulation setup and plots are presented for band I. The results for band III and band VIII are presented in Table 2. From Fig. 10 the current gain in AC-coupled mode equals 6.7 dB at 2140 MHz. The attenuation to the third harmonic at 6.42 GHz is 6.3 dB. The noise figure at 2140 MHz is 1.75 dB (Fig. 11). From Fig. 12 the Band I loop gain peaks with 33.2 dB at 2.18 GHz. The 1 dB cross compression point was simulated with a PSS-analysis followed by a PAC-analysis with the RX-signal at fixed input power of $-40$ dBm. The PSS-frequency is the TX-interferer at the duplex distance. The Band I half duplex linearity of the amplifier output current was simulated for the center of the band, i.e. 2140 MHz, using the SpectreRF PSS plus PAC tool. The TX-frequency was the PAC-signal and the half-duplex interferer was at the PSS-frequency.

The IM3-level at the amplifier current output was back calculated to the LNA input using the transconductance of the LNA and feedback amplifier. The IIP3 was calculated using the relation (1). The in-band IIP3 was simulated with a PSS-frequency at 2140 MHz and a PAC signal at 2141 MHz. The performance of the RF feedback amplifier in band I, III and VIII is summarized in Table 2.

5.3 Simulation results for the RF feedback amplifier together with the feedback mixer

To shorten the time for the simulator to reach convergence, the pole in the mixer feedback is set higher than it should be in a real design. The size of the external capacitor is 6 nF. All figures are simulation results for band I. The gain and noise figure depicted in Figs. 13 and 14 were simulated with a PSS-analysis plus PXF and PNOISE analysis. For the PNOISE analysis 20 sidebands were accounted for. In AC-coupled mode with the LO frequency at 2140 MHz, $NF_{DSB} = 2.6$ dB at 1 MHz. The loop gain at a baseband frequency below the cut-off frequency is the difference between the in-band maximum gain and the gain at that baseband frequency. At DC the conversion gain is 12.4 dB, resulting in a loop gain of 19.5 dB. The gain of the first stage in the feedback loop equals 10 dB, giving a gain of 9.5 dB in the second stage of the loop. The mixer feedback loop stability is guaranteed by both the low pass filter in the mixer feedback loop and the low pass filter at the mixer output. The feedback factor is negligible for frequencies above the mixer feedback loop cut-off frequency. For low frequencies where there is a significant loop gain the phase change of the feedback signal relative to DC is very small.

Table 2 Performance summary of the RF feedback amplifier

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Band I</th>
<th>Band III</th>
<th>Band VIII</th>
</tr>
</thead>
<tbody>
<tr>
<td>Current consumption (mA)</td>
<td>10</td>
<td>10</td>
<td>10</td>
</tr>
<tr>
<td>Transconductance of LNA and feedback amp. (I + Q) (mS)</td>
<td>280</td>
<td>306</td>
<td>284</td>
</tr>
<tr>
<td>Feedback amplifier current gain at band center (dB)</td>
<td>6.7</td>
<td>7.0</td>
<td>6.5</td>
</tr>
<tr>
<td>Attenuation of $3f_{LO}$ in DC/AC-coupled mode (dB)</td>
<td>6.9/6.3</td>
<td>8.0/7.5</td>
<td>11.6/11.2</td>
</tr>
<tr>
<td>NF (dB)</td>
<td>1.75</td>
<td>1.69</td>
<td>1.98</td>
</tr>
<tr>
<td>Loop gain (dB)</td>
<td>34.3</td>
<td>32.8</td>
<td>29.5</td>
</tr>
<tr>
<td>Cross compression point for a TX interferer (dBm)</td>
<td>$-22.1$</td>
<td>$-21.9$</td>
<td>$-23.1$</td>
</tr>
<tr>
<td>Small signal in-band IIP3 (dBm)</td>
<td>$-6.9$</td>
<td>$-7.5$</td>
<td>$-8.5$</td>
</tr>
<tr>
<td>Small signal half-duplex IIP3 including ideal LNA (dBm)</td>
<td>$-5.6$</td>
<td>$-6.0$</td>
<td>$-7.6$</td>
</tr>
<tr>
<td>Large signal half-duplex IIP3 including ideal LNA (dBm)</td>
<td>$-5.8$</td>
<td>$-6.0$</td>
<td>$-7.8$</td>
</tr>
</tbody>
</table>
For frequencies above the mixer output cut-off frequency the input signal to the loop is heavily attenuated.
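As a worked check of the mixer feedback loop-gain figures quoted in this section, the arithmetic can be written out as below. The symbols $LG_{DC}$, $G_{max}$, $G_{DC}$, $G_{stage1}$ and $G_{stage2}$ are labels introduced here for illustration, and the in-band maximum gain is assumed to be the 31.9 dB conversion gain reported at 100 kHz.

$$
\begin{aligned}
LG_{DC} &= G_{max} - G_{DC} = 31.9\ \text{dB} - 12.4\ \text{dB} = 19.5\ \text{dB} \\
G_{stage2} &= LG_{DC} - G_{stage1} = 19.5\ \text{dB} - 10\ \text{dB} = 9.5\ \text{dB}
\end{aligned}
$$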

The conversion gain equaled 31.9 dB at 100 kHz with high pass $BW_{3dB} = 4.0$ kHz and low pass $BW_{3dB} = 6.5$ MHz. In the noise summary the largest contributions come from the divider, the RF amplifier output stage and the main mixer tail current device. The $NF_{DSB}$ increases below the cut-off frequency. This is due to feedthrough of unfiltered noise from the feedback loop through the trim mixer. In DC-coupled mode the noise figure at 1 MHz is improved because the main mixer current generator is off. The mixer feedback loop does not affect the mixer noise figure for frequencies above the cut-off frequency of the low pass filter. For lower frequencies the main excess noise originates from the active devices and degeneration resistors in the first stage of the low pass filter, together with the base current of the main mixer transistors.

The second order distortion was simulated with a Monte Carlo analysis [26, 27] using a PSS + PAC analysis. With the Monte Carlo tool a random mismatch is applied to all devices in the design for each simulation run. The standard deviation of the mismatch distribution for each device type is determined by process data. 100 iterations were made with the mixer feedback loop both on and off to verify the effect of the loop. To be able to use the PSS tool the duplex distance was set to 200 MHz instead of 190 MHz. With the PSS frequencies $f_{LO} = 2200$ MHz, $f_{TX1} = 2000$ MHz and the PAC frequency $f_{TX2} = 2000.03$ MHz the IM2 product is at 30 kHz. The input powers of $f_{TX1}$ and $f_{TX2}$ were $-33$ dBm each. With 32 dB conversion gain the IM2 limit at the mixer output is then at $-91$ dBVp. With the mixer feedback loop turned on the average IM2 level in the I- and Q-channel is $-102$ dBVp and $-100$ dBVp respectively, which corresponds to I-channel IIP2 = $+59$ dBm and Q-channel IIP2 = $+56$ dBm. The average DC-offset is 104 µV in the I-mixer and 110 µV in the Q-mixer. Histograms of the I-mixer IM2 levels in dBVp and the DC-offset at the mixer outputs with the loop on are depicted in Fig. 15.
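The conversion from an IM2 level at the mixer output to an input-referred IIP2 can be sketched as follows. This is a minimal illustration, not the author's script: it assumes a 50 Ω reference impedance, that the 32 dB conversion gain is a voltage gain from the input port to the mixer output, and the standard equal-power two-tone relation IIP2 = 2·P_in − P_IM2,in. The function names are hypothetical.

```python
import math

def dbvp_to_dbm(level_dbvp, r_ohm=50.0):
    """Convert a peak-voltage level (dBVp) to power in dBm across r_ohm.

    Uses P = Vp^2 / (2 R); the 50 ohm default is an assumption.
    """
    return level_dbvp - 10.0 * math.log10(2.0 * r_ohm) + 30.0

def iip2_dbm(p_in_dbm, im2_out_dbvp, conv_gain_db, r_ohm=50.0):
    """Two-tone IIP2 from the IM2 level measured at the mixer output."""
    # Refer the output IM2 level back to the input, then express it as power.
    im2_in_dbm = dbvp_to_dbm(im2_out_dbvp - conv_gain_db, r_ohm)
    # Standard equal-power two-tone relation (assumed here).
    return 2.0 * p_in_dbm - im2_in_dbm

# Frequency plan from the text: the IM2 product falls at f_TX2 - f_TX1.
f_tx1_hz = 2000.00e6
f_tx2_hz = 2000.03e6
im2_freq_hz = f_tx2_hz - f_tx1_hz  # approximately 30 kHz

# Levels quoted in the text, with -33 dBm per tone and 32 dB conversion gain.
worst_loop_off = iip2_dbm(-33.0, -84.0, 32.0)   # worst sample, loop off
q_avg_loop_on = iip2_dbm(-33.0, -100.0, 32.0)   # Q-channel average, loop on

print(f"IM2 frequency: {im2_freq_hz / 1e3:.1f} kHz")
print(f"Worst loop-off IIP2: {worst_loop_off:.1f} dBm")
print(f"Q-channel average IIP2: {q_avg_loop_on:.1f} dBm")
```

Under these assumptions the worst loop-off sample (−84 dBVp) and the Q-channel average (−100 dBVp) map to +40 dBm and +56 dBm, in line with the quoted values. The I-channel average of −102 dBVp maps to about +58 dBm against the quoted +59 dBm, a difference that may come from rounding or from how the average was taken.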

With the feedback loop turned off, as depicted in Fig. 16 for the I-mixer, more than 19 iterations fall outside −92 dBVp. The average DC-offset increases by a factor of 8. The worst sample with the loop off is at −84 dBVp, corresponding to IIP2 = +40 dBm. Comparing the histograms, the turned-on loop results in a distribution peak around −100 dBVp, while the turned-off loop results in more evenly distributed values with a large number of iterations close to the specification limit.

The in-band IIP3 for interferers close to $f_{LO}$ with the mixer switching was simulated with a SpectreRF QPSS plus QPAC analysis. The QPSS large frequency was equal to $f_{LO}$ while the first interferer was the QPSS moderate signal. The second interferer was added during the QPAC analysis.

The small signal half duplex IIP3 with the mixer switching was simulated with a QPSS + QPAC analysis with interferer powers at $-40$ dBm. The QPSS moderate frequency was the half duplex interferer. With $f_{TX}$ as the QPAC frequency an in-band IM3 product at the mixer output will be generated. With the IM3 level $-79.3$ dBVp at the mixer output the band I IIP3 equals $-9.4$ dBm using relation (2). The large signal half duplex IIP3 was simulated using the input powers $P_{TX} = -26$ dBm and $P_{half-duplex} = -46$ dBm. The switching feedback mixer targeted for high IIP2 has an impact on the overall IIP3. It is therefore important that the RF amplifier preceding the mixer has high enough third order linearity that the additional mixer nonlinearity can be accepted.
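Relation (2) is not reproduced in this section, so the following is only a sketch of how the quoted small signal figure can be checked. It assumes the standard equal-power two-tone relation IIP3 = P_in + (P_in − P_IM3,in)/2, a 50 Ω reference impedance, and the 31.9 dB conversion gain reported at 100 kHz as a voltage gain. The function names are hypothetical.

```python
import math

def dbvp_to_dbm(level_dbvp, r_ohm=50.0):
    """Convert a peak-voltage level (dBVp) to power in dBm across r_ohm."""
    return level_dbvp - 10.0 * math.log10(2.0 * r_ohm) + 30.0

def iip3_dbm(p_in_dbm, im3_out_dbvp, conv_gain_db, r_ohm=50.0):
    """Equal-power two-tone IIP3 from the IM3 level at the mixer output."""
    # Refer the output IM3 level back to the input, then express it as power.
    im3_in_dbm = dbvp_to_dbm(im3_out_dbvp - conv_gain_db, r_ohm)
    # Standard relation (assumed): the IM3 product grows 3 dB per dB of input.
    return p_in_dbm + (p_in_dbm - im3_in_dbm) / 2.0

# Band I small signal case from the text: -40 dBm interferers,
# IM3 at -79.3 dBVp at the mixer output.
band1_iip3 = iip3_dbm(-40.0, -79.3, 31.9)
print(f"Band I small signal half duplex IIP3: {band1_iip3:.1f} dBm")
```

With these assumptions the result is about −9.4 dBm, consistent with the band I value quoted above. The large signal case uses unequal interferer powers and would need the corresponding unequal-power relation, which is not covered by this sketch.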

The cross compression was simulated with a QPSS + QPXF analysis with the TX signal as the QPSS moderate tone.

The performance summary for band I, III and VIII is provided in Table 3.

6 Conclusions

The benefit of the presented architecture is that it provides a multiband single ended LNA and single ended mixer with high enough second and third order linearity to function in a WCDMA system without a SAW filter. The DC-offset at the mixer output is strongly attenuated, which is beneficial for the following stages, i.e. the baseband filter and ADC. The attenuation of the DC-offset and the increase of the IP2 were achieved by the described method of reducing the effect of the switching mixer core device mismatch using a feedback loop. The third order linearity and the low noise figure were achieved through the programmable RF feedback current-to-current amplifier preceding the mixer. The LNA is single ended, which is especially beneficial for a multiband solution since only one package pin is required for each band. Since the programmable feedback RF amplifier is multiband, only one low-Q on-chip inductor is needed. The required filter capacitor is preferably placed inside the package. The architecture was designed in a BiCMOS...
References

Transactions on Microwave Theory and Techniques, 51(5), 1610–1612.


Tobias Tired received the M.Sc. degree in Engineering Physics from Lund Institute of Technology, Sweden in 1992. From 1993 to 1996 he was with Ericsson Microelectronics in Stockholm working with microelectronic process technology for fixed telephony. In 1996 he joined Ericsson in Lund for a position within the development of radio frequency analog integrated circuits. Currently he is with ST-Ericsson developing a 90 nm CMOS radio. His research interests are within the development of new low power linear frontend architectures. He is currently pursuing the degree of Technology licentiate. He holds two patents within the field of linear integrated LNA and mixers.