Sub-VT Design of a Wake-up Receiver Back-end in 65 nm CMOS

Seyed Mazloum, Nafiseh; Rodrigues, Joachim; Edfors, Ove

Published in:
2012 IEEE Subthreshold Microelectronics Conference (SubVT)

DOI:
10.1109/SubVT.2012.6404304

2012

Abstract—In sensor network applications, the use of duty-cycled ultra-low power wake-up receivers can significantly reduce overall power consumption. An important complement to previous investigations is to show that low-power wake-up receivers with good enough detection performance can be realized in hardware. In this paper we address this very issue by presenting the design, implementation, and sub-$V_T$ characterization of a digital back-end for such an ultra-low power WRx.

I. INTRODUCTION

Power consumption is a major design constraint in wireless sensor networks (WSNs). The nodes in a WSN are very small and their batteries often cannot be replaced, which means that the energy resources are severely limited. To design an ultra-low-power communication system it is crucial to combine a low-power transceiver circuit with an optimized communication protocol. Idle channel listening is the dominant source of energy waste in WSNs. One common approach to reducing this energy cost is to put the node to sleep when the channel is idle, and to activate it periodically for potential communication [1]–[3]. This approach, however, introduces delay, as the transmitting node needs to wait for a periodic wake-up of the receiver before communication can take place.

Another approach is to employ an extra ultra-low-power wake-up receiver (WRx) [4], [5]. Here, the low-power, low-performance WRx is always on and monitors the channel continuously. Whenever the WRx detects a wake-up signal, the more power-hungry main transceiver is powered up to handle the data transmission. With an always-on WRx there is no need for nodes to synchronize the communication to a periodic wake-up, and the delay is reduced accordingly compared to the first approach. The energy consumption per packet, however, becomes high in scenarios where data packets are rare.

In [6], [7] we introduce a Duty-Cycled Wake-up receiver Medium Access Control protocol (DCW-MAC), where the two approaches above are combined, i.e., a scheme where low-power WRxs are combined with periodic channel listening. Similar to the first approach, the WRx is switched on periodically to monitor the channel for a certain time period, and similar to the second approach, the WRx is designed for low-power operation, well below the power consumption of the main transceiver. We have shown previously that with optimized sleep intervals, duty-cycled low-power WRx schemes can outperform the other schemes both with and without delay requirements [6]. It is therefore of interest to further pursue this type of WRx scheme by looking at low-power hardware implementations.

Studies found in the literature on WRx hardware design mainly focus on the RF front-end architecture [8], [9] and often address the always-on WRx scheme [10]. This work presents the design, implementation and sub-$V_T$ realization of a digital back-end for a duty-cycled WRx. Since the presented structure is new, comparisons with the state of the art are conceptually difficult. We investigate different thresholds for the detection of wake-up beacons (WBs) and show that both high detection performance and a low probability of false alarms can be achieved with the chosen WRx structure.

This paper is organized as follows. Section II describes the overall operation of the addressed system, details the structure of the WB, and presents the architecture chosen for the digital back-end of the WRx. In Section III we analyze the WB detection performance by simulations and present the sub-$V_T$ characterization of the matched filters needed to perform preamble and address detection. Finally, in Section IV, we draw conclusions from the study, which shows that sub-$V_T$-designed matched filters are expected to contribute only a minor part of the total energy consumption of the entire WRx chain at realistic performance levels.

II. SYSTEM DESCRIPTION

In our reference system, nodes communicate according to the addressed DCW-MAC scheme. Whenever data is available for transmission, periodic WBs are transmitted ahead of the data packet. The WRxs of non-transmitting nodes are switched on periodically, in an asynchronous way, to monitor the channel for a certain time period. In an ideal case, if the transmission of the WB coincides with the listening time of the destined WRx, the WRx detects the WB and powers up a high performance main receiver. With asynchronous communication the channel-listening interval is selected in a way that guarantees the WRx can hear one full WB. This means that the choice of the WB influences the channel-listening interval and consequently the optimal sleep time and energy consumption of the system. Moreover, in a realistic scenario, there will be a certain probability that the transmitted WB is missed by the WRx or the WRx erroneously detects a non-existing WB, both leading to unnecessary energy consumption. This makes the false alarm (FA) and detection (D) probabilities important.

The reference WRx, as shown in Fig. 1, contains a low-power, low-performance analog front-end, a mixed-mode detector (A/D converter) and a sub-$V_T$ digital back-end. We employ on-off keying (OOK) of the WB to allow a simple low-power analog front-end design. This means that the bits in the WB are detectable using a simple non-coherent signal energy detector. The detected bits are then fed to the digital back-end. The trade-off between detection performance and power consumption of the WRx affects the total power consumption of the entire sensor network in a non-trivial way, since detection errors result in unnecessary and energy-expensive power-up of circuitry. The digital back-end is an important part of this trade-off, and we investigate how its power consumption depends on the required performance and clock rate. In the following, the structure of the WB is detailed and we propose an architecture for the digital back-end design.

A. Wake-up Beacon Packet Structure

A WB packet consists of a preamble, a destination node address and a source node address. Figure 2 shows the general structure of the WB. The preamble \( \{p_0...p_{m-1}\} \) is an m-bit sequence used to determine the starting point of a WB. The n-bit destination node address \( \{d_0...d_{n-1}\} \) is used to avoid the energy cost of overhearing, i.e., only the node to which the packet is addressed should be activated. The n-bit source node address \( \{s_0...s_{n-1}\} \) is needed if the WRx detects a WB: it is used in the destination address field of the acknowledgement packet transmitted back. The source node and destination node address fields carry L-bit source and destination node identities, respectively, so that a network of up to \( 2^{L} \) nodes can be addressed. Each bit in the identity sequence is spread by a k-bit spreading code to compensate for the low performance of the analog front-end. Moreover, the ultimate goal is to design a WRx with a low probability of false alarm and a high probability of detection. We therefore select the preamble and spreading sequences from the family of PN-sequences, which have high auto-correlation and low cross-correlation.
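
The packet layout described above can be illustrated with a minimal sketch. Several details here are assumptions made only for illustration and are not taken from the paper: the generator polynomials \( x^6 + x + 1 \) and \( x^4 + x + 1 \), the rule that an identity bit 1 is sent as the spreading code and a bit 0 as its complement, and the helper names `m_sequence`, `spread` and `build_wb`. The lengths 63 and 15 match the configuration reported for Fig. 5.

```python
def m_sequence(degree, taps=(0, 1)):
    """One period of the LFSR sequence a[n+degree] = XOR of a[n+t] for t in taps.

    With taps=(0, 1) this corresponds to x^degree + x + 1, which is
    primitive for degree 4 and degree 6 (an illustrative choice).
    """
    state = [1] + [0] * (degree - 1)      # any non-zero seed works
    out = []
    for _ in range(2 ** degree - 1):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out


def spread(bits, code):
    """Spread each identity bit over len(code) chips.

    Assumption: bit 1 -> code, bit 0 -> complemented code.
    """
    chips = []
    for b in bits:
        chips.extend(c if b else c ^ 1 for c in code)
    return chips


def build_wb(dest_id, src_id, id_bits=8, preamble_degree=6, code_degree=4):
    """Concatenate preamble, spread destination address and spread source address."""
    preamble = m_sequence(preamble_degree)
    code = m_sequence(code_degree)

    def to_bits(value):
        return [(value >> i) & 1 for i in reversed(range(id_bits))]

    return preamble + spread(to_bits(dest_id), code) + spread(to_bits(src_id), code)


wb = build_wb(dest_id=0xA5, src_id=0x3C)
print(len(wb))  # m + 2n = 63 + 2 * 8 * 15 chips
```

Under these assumptions the address field length is \( n = kL \), so the full WB has \( m + 2kL \) bits.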

B. Digital Back-end Architecture

The overall operation of the proposed digital back-end is illustrated in Fig. 3. The digital back-end consists of two main elements: a matched filter and a threshold unit. The detection of a WB is performed in two stages. First, the detected bits are fed to a filter matched to the preamble sequence. Whenever the output of this first matched filter exceeds a certain threshold, we assume that the preamble is detected. The output of this stage then triggers the second stage of the digital back-end, in which the identity/address of the destination node is decoded. The received bits are matched to the known address sequence; if the matched filter output of the second stage is above a certain level, the WB is declared to be present.
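
The two-stage decision can be sketched in software as follows. This is a behavioral illustration only, not the hardware design: the function names are hypothetical, the thresholds are expressed as raw match counts, and the choice to keep searching after a failed address check is an assumption.

```python
def match_count(window, reference):
    """Number of positions where the received bits agree with the reference."""
    return sum(1 for r, c in zip(window, reference) if r == c)


def detect_wb(rx_bits, preamble, dest_field, thr_preamble, thr_address):
    """Return the start index of a detected WB, or None if nothing is found."""
    m, n = len(preamble), len(dest_field)
    for start in range(len(rx_bits) - (m + n) + 1):
        # Stage 1: preamble matched filter followed by Threshold unit I.
        if match_count(rx_bits[start:start + m], preamble) <= thr_preamble:
            continue
        # Stage 2: address matched filter followed by Threshold unit II.
        address = rx_bits[start + m:start + m + n]
        if match_count(address, dest_field) > thr_address:
            return start
    return None


preamble = [1, 0, 1, 1, 0, 0, 1]
dest_field = [1, 1, 0, 1]
rx = [0, 0, 0] + preamble + dest_field + [0, 0]
print(detect_wb(rx, preamble, dest_field, thr_preamble=6, thr_address=3))
```

In this toy input the beacon starts three bits into the received stream, so the search is expected to stop at that offset; a beacon carrying a different address passes stage 1 but is rejected in stage 2.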

The proposed structure for the digital back-end operates on bit level, i.e., the matched filter uses one-bit coefficients. As a consequence, only four different gates are required for hardware mapping, see Fig. 4. Moreover, summation of the tap branches is realized by a fully balanced adder tree, which keeps the idle time of the gates low and thus reduces energy dissipation due to leakage.

**Matched Filter Implementation:** The matched filter is directly mapped as a finite impulse response (FIR) filter. The WB sequence is loaded into registers during the initialization phase. Correlation of the input signal with the WB sequence is performed by XNOR gates, as depicted in Fig. 4. Each filter tap is connected to a fully balanced adder tree. The number of adder slices is proportional to \( \log_2(N) - 1 \), where \( N \) is the number of filter taps (the length of the WB sequence). The filter is realized in a 65 nm low-power high-threshold (LP-HVT) CMOS technology. According to simulation results, the threshold voltage \( V_T \) of this technology is around 650 mV. The area cost scales almost proportionally to the number of taps and ranges from 4100 \( \mu\text{m}^2 \) for 64 taps to 15100 \( \mu\text{m}^2 \) for 256 taps.
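
A small behavioral model may help to picture the XNOR taps and the balanced adder tree. It is a sketch under simplifying assumptions: the names `xnor_taps` and `adder_tree` are hypothetical, and the level count returned here is a property of this pairwise software reduction, not a statement about the number of adder slices in the actual hardware.

```python
def xnor_taps(register_bits, coefficients):
    """One-bit correlation per tap: 1 where the bits agree, 0 otherwise."""
    return [1 - (r ^ c) for r, c in zip(register_bits, coefficients)]


def adder_tree(values):
    """Sum the tap outputs pairwise, level by level.

    Returns (total, number_of_levels). Expects a non-empty list.
    """
    levels = 0
    while len(values) > 1:
        if len(values) % 2:
            values = values + [0]         # pad odd-sized levels with a zero
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        levels += 1
    return values[0], levels


taps = xnor_taps([1, 0, 1, 1], [1, 1, 1, 0])
print(adder_tree(taps))        # two taps agree, two reduction levels
print(adder_tree([1] * 64))    # all 64 taps agree
```

Because every level halves the number of operands, all tap outputs traverse the same number of adders, which is the balancing property the text refers to.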

III. RESULTS

Both system performance, in terms of detection and false alarm probabilities, and the power/energy consumption of the matched filter implementations are of importance to the overall evaluation of the proposed WRx structure. This section discusses both system performance simulations and the sub-$V_T$ characterization of the matched filter.

A. System Performance Simulation

We evaluate the detection performance of the digital back-end architecture in terms of the probability of detection \( P_D \) and the probability of false alarm \( P_{FA} \). The behavior of the signal chain for a WRx consisting of an analog front-end, an A/D converter and the proposed digital back-end is simulated for an Additive White Gaussian Noise (AWGN) channel. The raw bit error probability of the OOK detection is set to 0.15 in the simulations. This is based on performance simulations of an analog front-end operating at -10 dB SNR. The listening interval is chosen to be twice as long as the WB, as discussed in Sec. II. Maximum-length shift register sequences (m-sequences) are used to generate the preamble and spreading codes. We consider a network of 256 nodes, and thus the node identity \( (L) \) is 8 bits long. The lengths of the preamble and the spreading code determine the peak outputs of the correlators/matched filters.

The decision levels in the threshold units determine the probabilities of false alarm and detection. A high threshold in Threshold unit I results in a lower probability of false alarms but a higher probability of missing the preamble, which in turn results in a higher energy cost for the transmitting node. Conversely, a low threshold results in a high probability of erroneously detecting a preamble, which points to a wrong starting position. Assuming that the starting point of the WB (preamble) is detected, a high threshold increases the probability of address detection, resulting in a higher probability of WB detection. The probability of detecting a WB carrying the address of another node increases when the decision level in Threshold unit II is lowered. In this work, the decision level in Threshold unit II is fixed: it is selected as the midpoint between the expected output for the correct node address and the output for a nearest-neighbor node, i.e., a node whose address differs by only one bit. In the simulations we vary the decision level in Threshold unit I to illustrate the behavior of the WRx for different choices.

Figure 5 presents the receiver operating characteristics (ROC) for a WB with a preamble of length 63 and address spreading of length 15. In this figure, $P_D$ is the probability of detecting a WB carrying the correct node address during the first listening interval after the periodic WB transmission has started. Correspondingly, $P_{FA}$ is the probability of detecting a WB addressed to another node. As shown in the figure, we get the best $P_D$ of 0.9 at the decision threshold $\gamma_1 = 0.47$. The detection probability of 0.9 and the false alarm probability of 0.003 achieved with the selected sequence lengths are adequate for the proper operation of the DCW-MAC scheme. Increasing the threshold level reduces the probability of false alarms, but also has the unwanted effect of lowering the probability of detection. Lowering the threshold increases the probability of erroneously detecting WB preambles, which in turn results in both a low probability of (correct) preamble detection and a low false alarm probability. Both are direct results of an increasingly random position estimate of the WB starting point for low thresholds.
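
As a worked illustration of the midpoint rule for Threshold unit II, the sketch below computes the decision level under assumptions that are not stated in the paper: the matched filter output is taken to be the number of matching chips, an identity bit 0 is assumed to be sent as the complemented spreading code, and chip errors are assumed to be independent with a common probability. The function name `midpoint_threshold` is hypothetical. Whether the "expected output" in the text is meant with or without channel errors is not specified, so the error probability is left as a parameter.

```python
def midpoint_threshold(id_bits, code_length, chip_error_prob):
    """Midpoint between the expected match counts for the correct address
    and for an address that differs in exactly one identity bit.

    Assumes complement spreading and independent chip errors.
    """
    n = id_bits * code_length
    p_ok = 1.0 - chip_error_prob
    # Correct address: every chip matches unless the channel flips it.
    expected_correct = n * p_ok
    # One-bit neighbor: code_length chips are inverted, so they match
    # only when the channel flips them.
    expected_neighbor = (n - code_length) * p_ok + code_length * chip_error_prob
    return 0.5 * (expected_correct + expected_neighbor)


# Error-free reading for L = 8 and k = 15: halfway between 120 and 105.
print(midpoint_threshold(8, 15, 0.0))
# Reading that includes the raw bit error probability used in the simulations.
print(midpoint_threshold(8, 15, 0.15))
```

The two readings give noticeably different decision levels, which is why the assumption about chip errors matters when reproducing the ROC.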

B. Sub-$V_T$ Characterization

The sub-$V_T$ characterization of various matched filters is presented in Fig. 6 in terms of energy vs. clock frequency. The characterization is performed as described in [11]. At the energy minimum voltage (EMV), the clock frequency depends on the number of taps and ranges from 20 kHz for 256 taps to 80 kHz for 64 taps. However, all investigated filters can be operated at up to 1 MHz at 400 mV. At EMV the energy dissipation per clock cycle is estimated as 0.35 pJ and 0.06 pJ for the 256-tap and 64-tap filters, respectively. At EMV the contributions from dynamic and leakage energy are equal.

**Fig. 5.** Receiver operating characteristics, for a WB with a preamble of length 63 and address spreading of length 15.

**Fig. 6.** Characteristics of matched filters with increasing number of taps, energy per clock cycle at critical path speed.

C. Discussion

In the proposed back-end structure only one matched filter is active at a time while the other is idle, e.g., when the first filter monitors the channel for a preamble the second filter is idle, and vice versa. The total energy cost is the sum of the energy of the active filter and the leakage energy of the inactive filter. A better detection performance of the back-end may be desirable, as this has the potential to further reduce the power consumption of the entire system. This can be achieved by increasing the length of the preamble and/or the spreading code. Doing so leads to longer delays in the system, but considering 10 µW as the target WRx power consumption, the higher power consumption of the back-end is negligible.
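
To put the reported EMV figures next to the 10 µW target, the following sketch multiplies energy per clock cycle by clock frequency. It is a rough estimate under the assumption that a single filter is clocked continuously at its EMV frequency; leakage of the idle filter and the effect of duty cycling are ignored, and the function name is hypothetical.

```python
def average_power_watts(energy_per_cycle_joules, clock_hz):
    """Average power of a circuit clocked continuously: P = E_cycle * f_clk."""
    return energy_per_cycle_joules * clock_hz


# (energy per cycle, clock frequency) at EMV, as reported for Fig. 6.
emv_points = {
    256: (0.35e-12, 20e3),
    64: (0.06e-12, 80e3),
}
target_wrx_power = 10e-6  # 10 uW target WRx power from the discussion

for taps, (energy, freq) in emv_points.items():
    power = average_power_watts(energy, freq)
    share = 100.0 * power / target_wrx_power
    print(f"{taps} taps: {power * 1e9:.1f} nW ({share:.3f} % of the target)")
```

Both operating points come out in the single-digit nanowatt range, well below one percent of the target, which is consistent with the statement that the back-end contribution is negligible.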

IV. CONCLUSIONS

This paper proposes an architecture for the digital back-end of a duty-cycled WRx. The detection performance vs. the power consumption of the digital back-end is investigated for the presented WB structure. It is shown that, at a sufficient level of detection performance, i.e., high $P_D$ and low $P_{FA}$, the power consumption of the digital back-end is negligible in the total power budget.

REFERENCES


