Area and Power Reduction in DFT Based Channel Estimators for OFDM Systems

Stala, Michal; Gangarajaiah, Rakesh; Edfors, Ove; Öwall, Viktor

2013


Michal Stala, Rakesh Gangarajaiah, Ove Edfors, Viktor Öwall
Lund University
Email: {Michal.Stala, Rakesh.Gangarajaiah, Ove.Edfors, Viktor.Owall}@eit.lth.se

Abstract—This paper presents a new Hardware (HW) implementation proposal for Discrete Fourier Transform (DFT) based channel estimators. The presented algorithm exploits the high time correlation of the channel estimates to reduce complexity and power consumption by using fewer bits for the FFT in the channel estimator than a traditional approach. The idea is that the channel estimator processes the difference between channel estimates from two Orthogonal Frequency Division Multiplexing (OFDM) symbols. The paper shows that the resulting HW could be reduced by 30 percent for logic and 15 percent for memory without performance loss in a Long Term Evolution (LTE) channel with up to 300Hz Doppler. The algorithm has been tested in realistic environments with 3GPP channel models.

I. INTRODUCTION

For several years, OFDM has been one of the most popular techniques used for high speed wireless communication. Standards such as Wireless Local Area Network (WLAN), LTE, Digital Audio Broadcasting (DAB) and Digital Video Broadcasting (DVB) use OFDM based solutions for the physical layer. Channel estimation has been, and still is, a popular research topic within the OFDM area. A subset of that field is DFT based channel estimators [1]. DFT based estimators filter the channel estimates in the time domain and involve computational blocks such as the Inverse Fast Fourier Transform (IFFT) and the Fast Fourier Transform (FFT). When requirements are demanding, FFT blocks [2] can become very large, due to the number of operations (mainly multiplications and additions), and consume an increased amount of power. This paper shows a method of reducing the area and power consumption of the FFT block in DFT based channel estimators.

A brief introduction to DFT based channel estimators is given in Section II, followed by Section III, which describes the proposed estimator. Section IV compares the performance of the proposed algorithm with a standard implementation, and Section V analyzes the reduction in complexity/area for the proposed idea. Finally, Section VI presents the conclusions.

II. BACKGROUND

A typical OFDM receiver [3], including a DFT based channel estimator, is depicted in Fig. 1. Initial channel information is obtained through the reference signals, often referred to as pilots (not explicitly shown in the figure). The channel estimates, \( \hat{H} \), are extracted from the OFDM symbols and fed to the channel estimator. Interpolation in frequency might be needed if the pilot pattern does not fill a whole OFDM symbol, i.e. pilots are not present on all sub-carriers. In this paper we target LTE [4] [5], where interpolation is required since the pilots are not present on all sub-carriers. The IFFT converts \( \hat{H} \) to the time domain. The function \( F \) could be implemented in different ways [1]; in the current implementation it simply zeroes samples and keeps only those corresponding to the assumed maximum channel length, i.e. the length of the cyclic prefix. The remaining channel estimate data is converted back to the frequency domain and can be used for equalization at a later stage.
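The estimator chain described above (IFFT, function \( F \), FFT) can be sketched as follows. This is a minimal illustration and not the implementation evaluated in the paper: a naive \( O(N^2) \) DFT stands in for the FFT/IFFT blocks, and the names `dft`, `dft_channel_estimate` and `max_channel_len`, as well as the 16-point toy channel, are assumptions introduced for the example.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; stands in for the FFT/IFFT hardware blocks."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def dft_channel_estimate(h_freq, max_channel_len):
    """IFFT -> zero all taps beyond max_channel_len (function F) -> FFT."""
    h_time = dft(h_freq, inverse=True)
    h_trunc = [tap if i < max_channel_len else 0j
               for i, tap in enumerate(h_time)]
    return dft(h_trunc)

# Toy example (hypothetical values): a two-tap channel on 16 sub-carriers,
# disturbed by one spurious time-domain tap outside the assumed channel length.
n_sc = 16
h_true = [1.0 + 0j, 0.5j] + [0j] * (n_sc - 2)
h_disturbed = list(h_true)
h_disturbed[10] += 0.2

H_true = dft(h_true)
H_raw = dft(h_disturbed)                      # unfiltered frequency-domain estimate
H_est = dft_channel_estimate(H_raw, max_channel_len=4)

err_raw = max(abs(a - b) for a, b in zip(H_raw, H_true))
err_est = max(abs(a - b) for a, b in zip(H_est, H_true))
```

In this toy case the disturbance lies entirely outside the retained taps, so the filtered estimate matches the true channel up to floating point rounding, while the raw estimate keeps the full disturbance.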

Fig. 1. OFDM receiver system with DFT based channel estimator

A. Limitations with the standard DFT solution

The DFT based approach in [1] requires one IFFT and one FFT to perform the necessary operations. These operations require substantial area and power budgets especially in a system such as LTE, where there are up to 1200 sub-carriers. The proposed method aims at reducing the area and power consumption of the FFT in the DFT based channel estimator by processing the difference between channel estimates from two OFDM symbols.

B. Fading channels

Channels of wireless systems are modeled through statistical models, most often with Rayleigh fading channel models. LTE has several pre-defined models for performance analysis [6]. One pedestrian model (EPA), two vehicular models (EVA) and two typical urban models (ETU) are defined for the LTE standard. There are several different settings for the Doppler effect, ranging from 5Hz to 300Hz Doppler. The proposed method has most advantages when the Doppler is small, i.e. the channel is not varying too much over time. A detailed description of the algorithm is presented in the following section.

III. BASIC IDEA

The channel estimates of two OFDM symbols that are close in time differ only slightly when the channel varies slowly, i.e. when the Doppler frequency is low. As a result, the channel estimator performs consecutive computations on strongly correlated data. Fig. 2 shows channel estimates at two time instances, corresponding to one LTE sub-frame (1ms), for an EVA 70Hz LTE channel. It can be observed that the difference between the two channel estimates in frequency, $Re\{\hat{H}_{\text{diff}}\}$, is small, even in a channel with 70Hz Doppler. The proposed method uses this property to reduce the area and power consumption.

![Channel estimates (real)](image)

Fig. 2. Channel estimates for two consecutive channel estimations for an EVA 70Hz channel with no noise

In the proposed method, the channel data fed to the FFT is the difference, $\hat{h}_{\text{diff}}$, between the current and the previous channel estimates, see Fig. 3. The magnitude of $\hat{h}_{\text{diff}}$ is expected to be low when the Doppler frequency is low. Thus the number of bits for $\hat{h}_{\text{diff}}$ can be reduced to a large extent if the Doppler is low and, as a consequence, the input to the FFT algorithm will have a reduced word length compared to the standard DFT based channel estimator.
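The difference-based processing relies on the linearity of the DFT: transforming $\hat{h}_{\text{diff}}$ and adding the result to the previous frequency-domain estimate gives the same output as transforming the current estimate directly. The sketch below illustrates this in floating point. It is not taken from the paper; the exact placement of the adders in Fig. 3 is assumed here, and the tap values and the names `diff_update`, `h_prev`, `h_cur` and `H_prev` are hypothetical.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT; stands in for the FFT block."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def diff_update(h_cur, h_prev, H_prev):
    """Transform only the time-domain difference and accumulate the result."""
    h_diff = [c - p for c, p in zip(h_cur, h_prev)]      # subtract before the FFT
    H_diff = dft(h_diff)                                 # FFT sees small values only
    H_cur = [hp + hd for hp, hd in zip(H_prev, H_diff)]  # add after the FFT
    return H_cur, h_diff

# Two strongly correlated channel estimates (hypothetical tap values).
n = 16
h_prev = [1.0 + 0j, 0.5j, 0.25 + 0j] + [0j] * (n - 3)
h_cur = [0.98 + 0.01j, 0.02 + 0.49j, 0.26 - 0.01j] + [0j] * (n - 3)

H_prev = dft(h_prev)
H_cur, h_diff = diff_update(h_cur, h_prev, H_prev)
H_direct = dft(h_cur)

max_err = max(abs(a - b) for a, b in zip(H_cur, H_direct))
peak_cur = max(abs(v) for v in h_cur)
peak_diff = max(abs(v) for v in h_diff)
```

Here `peak_diff` is far smaller than `peak_cur`, which is the property that allows a shorter word length at the FFT input. The sketch does not model quantization, so it says nothing about how rounding errors behave over many consecutive updates.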

![The proposed algorithm](image)

Fig. 3. The proposed algorithm

IV. PERFORMANCE

The channel models described in Section II-B have been used to validate the proposed method. Different quantization settings have been used to show how much word length reduction is allowed for different channel models with different Doppler frequencies. Different placements of the decimal point (fractional lengths) have also been evaluated; Table I illustrates the placement possibilities for a 5 bit word, where S denotes the sign bit and X the remaining valid bits. Plots with perfect channel state information (CSI) are included as a reference and all simulation plots show results using an uncoded channel.

<table>
<thead>
<tr>
<th>Fractional length:</th>
<th>Valid bits (X)</th>
<th>Max value</th>
<th>Min value</th>
</tr>
</thead>
<tbody>
<tr>
<td>4</td>
<td>S.XXXX</td>
<td>0.9375</td>
<td>-1</td>
</tr>
<tr>
<td>5</td>
<td>0.SXXXX</td>
<td>0.4688</td>
<td>-0.5</td>
</tr>
<tr>
<td>6</td>
<td>0.0SXXXX</td>
<td>0.2344</td>
<td>-0.25</td>
</tr>
<tr>
<td>7</td>
<td>0.00SXXXX</td>
<td>0.1172</td>
<td>-0.125</td>
</tr>
<tr>
<td>8</td>
<td>0.000SXXXX</td>
<td>0.0586</td>
<td>-0.0625</td>
</tr>
<tr>
<td>9</td>
<td>0.0000SXXXX</td>
<td>0.0293</td>
<td>-0.0312</td>
</tr>
</tbody>
</table>
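The maximum and minimum values in Table I follow from two's complement arithmetic on a 5 bit word whose least significant bit has weight $2^{-f}$, where $f$ is the fractional length. The short sketch below reproduces them; the function name `fixed_point_range` is introduced for this example only.

```python
def fixed_point_range(word_length, fractional_length):
    """Largest and smallest value of a signed two's complement word."""
    lsb = 2.0 ** -fractional_length
    max_val = (2 ** (word_length - 1) - 1) * lsb
    min_val = -(2 ** (word_length - 1)) * lsb
    return max_val, min_val

# 5 bit word (1 sign bit + 4 valid bits), fractional lengths 4 to 9 as in Table I.
rows = {f: fixed_point_range(5, f) for f in range(4, 10)}
for f, (max_val, min_val) in rows.items():
    print(f, max_val, min_val)
```

The table lists these values rounded to four decimals, e.g. 0.1171875 appears as 0.1172 for fractional length 7.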

Fig. 4. EVA 70Hz floating point

Fig. 4 is used as a reference to show that the concept is valid in floating point simulations using QPSK modulation. Fig. 5 shows that the proposed algorithm can run with 3 bits and still maintain reasonable performance, close to the standard floating point solution, in an EPA 5Hz scenario using QPSK modulation. Fig. 6 shows that a reduction to 4 bits is not enough to maintain reasonable performance in the "DFT diff 9" case (proposed algorithm with fractional length 9, see Table I) and that 5 bits is desirable for an EVA 70Hz scenario using QPSK modulation. The performance in a 5 bit "DFT diff 9" scenario is equivalent to an 8 bit standard DFT scenario and slightly better than a 7 bit one. In conclusion, the proposed algorithm needs 2 or 3 bits fewer than the standard method when simulating EPA 5Hz or EVA 70Hz LTE channel models.

The most challenging LTE channel for the algorithm is the one with the highest Doppler (ETU 300Hz). Fig. 7 shows different simulations done with the ETU 300Hz LTE channel. Two of the plots, "DFT diff 7" and "DFT diff 9", show the current method with 5 bits but with different decimal point settings. "DFT diff 7" has a fractional length of 7 and a word width of 5 bits, where the most significant bit is reserved for the sign, see Table I. "DFT diff 9" has a fractional length of 9 and is the setting that has been used in the EVA and EPA simulations. The plot shows that this setting does not give satisfactory results in the ETU 300Hz case. The "DFT diff 7" setting is proposed for the ETU 300Hz case instead. The resulting HW must therefore be able to sense the Doppler frequency and adjust the decimal point/fractional length dynamically in order to support both high and low Doppler scenarios. This would require some additional HW, both for the Doppler sensing and for the dynamic shift.
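One plausible reason why the fractional length matters is saturation: a larger difference value no longer fits in the range offered by fractional length 9, but does fit with fractional length 7. The sketch below illustrates this for one real component. The saturating quantizer, the rounding mode and the two sample values are assumptions made for the example; the paper does not specify these details.

```python
def quantize(value, word_length, fractional_length):
    """Round to the fixed-point grid and saturate to the word's range."""
    scale = 2 ** fractional_length
    code = round(value * scale)
    hi = 2 ** (word_length - 1) - 1
    lo = -(2 ** (word_length - 1))
    code = max(lo, min(hi, code))   # saturation instead of wrap-around
    return code / scale

small, large = 0.02, 0.08           # hypothetical difference values

small_f9 = quantize(small, 5, 9)    # fits: fine resolution is used
large_f9 = quantize(large, 5, 9)    # clips at the fractional length 9 maximum
large_f7 = quantize(large, 5, 7)    # fits, at a coarser resolution
```

With these values `large_f9` saturates at 0.029296875, while `large_f7` represents the value as 0.078125. The price of the smaller fractional length is a step size four times larger.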

V. COMPLEXITY ANALYSIS

There are different architectures for FFT implementation that require different numbers of adders, multipliers and memory elements. The proposed method reduces the sizes of the internal FFT components since the word length is reduced. The advantages of the current method are most apparent in FFT architectures where the multipliers dominate the area, since they are assumed to scale quadratically with the word length while adders and memory are assumed to scale linearly. There are some penalties that need to be taken into account. The proposed algorithm requires three additional adder steps, as shown in Fig. 3. Further, some extra memory is required to store the data from the previous time instance.

The proposed algorithm is more advantageous in scenarios where the Doppler frequency is low; 3 bit precision is enough for the EPA 5Hz case. One approach is to only consider systems where the Doppler is assumed to be low, for example wireless routers or other stationary devices. Both area and power consumption could be reduced in such a scenario compared to a standard DFT approach.

The paper shows that dynamic scaling of the data could enable support for both high and low Doppler scenarios without losing performance, with just a small penalty in HW complexity.

A. Streaming/pipelined architecture

A suitable architecture for OFDM systems is a streaming/pipelined architecture [7] [2], where data is assumed to arrive in consecutive order, and not in blocks of data. A pipelined FFT architecture is suitable for these types of systems, see Fig. 2. There are $\log_2(N)$ stages in a pipelined FFT, where $N$ is the FFT size. The required number of memory elements is $N - 1$ words. Each radix-2 butterfly unit consists of 1 complex multiplier and 2 complex adders. Each complex multiplier consists of 3 real multipliers and 5 real adders (an alternative would be 4 real multipliers and 2 adders). Each complex adder consists of 2 real adders. The required HW for an FFT of size $N$ is presented in Table II.

The size of the FFT in an LTE system is 2048. Applying $N = 2048$ gives 33 multipliers, 110 adders and 2047 words of memory elements, as shown in the last column of Table II. The penalty for the proposed method would be 3 extra adders and one extra memory (the memory for $H_{DFT}$ in Fig. 1 and Fig. 2 is assumed to be required for interpolation in time, both in the standard and proposed method, and will not contribute to the penalty). The extra memory would correspond to the number of words in a Cyclic Prefix, which in LTE is 144 (or 160, only in the first OFDM symbol). The penalty for memory is very small considering the total number of memory elements (2047) in the FFT.
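The multiplier and memory figures quoted above can be checked with a few lines of arithmetic. The sketch covers only these two quantities and the relative size of the extra memory; the function name `pipelined_fft_resources` and its parameter are assumptions introduced for the example.

```python
def pipelined_fft_resources(n, real_mults_per_complex_mult=3):
    """Stage, real multiplier and memory word counts for a radix-2 pipelined FFT."""
    stages = n.bit_length() - 1          # log2(n) for a power-of-two n
    return {
        "stages": stages,
        "real_multipliers": stages * real_mults_per_complex_mult,
        "memory_words": n - 1,
    }

res = pipelined_fft_resources(2048)

# Extra memory of the proposed method: one cyclic prefix worth of words.
cp_words_long = 160                      # first OFDM symbol
cp_words_short = 144                     # remaining OFDM symbols
penalty_long = cp_words_long / res["memory_words"]
penalty_short = cp_words_short / res["memory_words"]
```

For $N = 2048$ this gives 11 stages, 33 real multipliers and 2047 memory words. The extra memory amounts to roughly 7 to 8 percent of the FFT memory, depending on which cyclic prefix length is assumed.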

Table III and Table IV show the expected area of the two methods for different word lengths. The adders and multipliers are scaled to the total number of 8 bit adders needed in the standard DFT case. The figures for memory are also scaled, using the total memory size of the standard 8 bit case as a reference. Thus, the memory sizes should only be compared to other memory figures and not with the adders and multipliers. The total logic area refers to the sum of the area of the adders and multipliers; control logic and other components are not included. A quick analysis of Table III and Table IV shows that an area gain could be obtained with even a small reduction in the number of bits, assuming linear scaling for adders and memory and quadratic scaling for multipliers. Even a reduction of one bit, for example from 8 to 7 bits, would reduce the total logic area by about 18 percent. The numbers for the adders and the multipliers in these tables have been extracted from a 65nm technology library from STMicroelectronics.

Simulations show that a reduction of two bits gives satisfactory results in the EVA 70Hz case and, when using dynamic shift, in the ETU 300Hz case. The internal calculations of the FFT must be done with higher precision than the input [2], which is true in both the proposed and the standard approach. Assuming that two extra bits are required internally results in 7 bits for the proposed algorithm and 9 bits for the standard algorithm. Using values from Table III (9 bits) and Table IV (7 bits), a logic area reduction of about 30 percent and a memory area reduction of about 15 percent can be gained.
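The scaling assumptions behind these figures can be illustrated with a rough calculation for the step from 9 to 7 internal bits. The sketch uses only the scaling rules stated in this section (linear for adders and memory, quadratic for multipliers) and the memory sizes given above; it does not use the library-based values of Table III and Table IV. The parameter `mult_share`, the fraction of the reference logic area occupied by multipliers, is a hypothetical free parameter, and the three extra adders are ignored.

```python
def scaled_area(bits, ref_bits, exponent):
    """Relative area when the word length changes from ref_bits to bits."""
    return (bits / ref_bits) ** exponent

adder_ratio = scaled_area(7, 9, 1)     # linear scaling
mult_ratio = scaled_area(7, 9, 2)      # quadratic scaling

def logic_ratio(mult_share):
    """Relative logic area for a given (assumed) multiplier share of the area."""
    return mult_share * mult_ratio + (1.0 - mult_share) * adder_ratio

# Memory: FFT memory plus one extra cyclic prefix (160 words) at 7 bits,
# compared with the FFT memory alone at 9 bits.
fft_mem_words = 2047
cp_words = 160
mem_ratio = (fft_mem_words + cp_words) * 7 / (fft_mem_words * 9)

logic_reduction_bounds = (1.0 - logic_ratio(0.0), 1.0 - logic_ratio(1.0))
mem_reduction = 1.0 - mem_ratio
```

Under these assumptions the logic reduction lies between about 22 percent (adders only) and about 40 percent (multipliers only), so the reported 30 percent falls inside that range, and the memory reduction comes out at roughly 16 percent, close to the reported 15 percent.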

A detailed investigation of the area and power reduction still needs to be conducted. The penalty of the extra hardware (the adders, the dynamic shift and other HW such as control logic) and the reduction in the FFT size need to be investigated in a real implementation.

VI. CONCLUSION

This paper has proposed a method that maintains performance in LTE channels with up to 300Hz Doppler frequency with a word width reduced to 5 bits, compared to 7 bits in the standard DFT implementation. A reduction by two bits in the FFT implementation reduces the area of the adders and multipliers by 30 percent and the memory by 15 percent.

REFERENCES