# A Low Power Correlation for IEEE 802.16 OFDM Synchronization On FPGA Using FFT

S.Hari Krishnan, Hemavathi.H, Anjali S Nair

Abstract— This paper develops a radix-based FFT algorithm for the analysis of multiband OFDM and the calculation of its FPGA parameters, in order to reduce complexity for efficient multiband-OFDM operation. This brief compares the use of multiplierless and DSP Slice-based cross-correlation for IEEE 802.16d orthogonal frequency division multiplexing (OFDM) timing synchronization on Xilinx Virtex-6 and Spartan-6 field programmable gate arrays (FPGAs). The natural approach, given the availability of embedded DSP blocks on these FPGAs, would be to implement standard multiplier-based cross-correlation. However, this can consume a significant number of DSP blocks, and the design may then not fit on low-power devices. Hence, we compare a DSP48E1 Slice-based design to four different quantizations of multiplierless correlation in terms of resource utilization and power consumption. OFDM timing synchronization accuracy is evaluated for each system at different signal-to-noise ratios. Results show that even relatively coarse multiplierless coefficient quantization can yield accurate timing synchronization, and does so at high clock speeds. Multiplierless designs enjoy reduced power consumption over the DSP48E1 Slice-based design.

Index Terms—OFDM, FPGA, power consumption, Multiplier, FFT, XILINX tool.

## I. INTRODUCTION

In general, the emphasis in VLSI design has shifted from high speed to low power due to the proliferation of portable electronic systems. Many techniques are already in use in low-power design, and additional techniques continue to emerge at all levels.

Implementations of many digital signal processing and digital image processing algorithms consume considerable power. A multiplier-based correlator requires a large number of FPGA DSP blocks, more than low-power devices provide, so multiplierless correlators are used to reduce power consumption.

This brief compares the use of multiplierless and DSP Slice-based cross-correlation for IEEE 802.16d orthogonal frequency division multiplexing (OFDM) timing synchronization on Xilinx Virtex-6 and Spartan-6 field programmable gate arrays (FPGAs). The natural approach, given the availability of embedded DSP blocks on these FPGAs, would be to implement standard multiplier-based cross-correlation. However, this can consume a significant number of DSP blocks, and the design may then not fit on low-power devices. Hence, we compare a DSP48E1 Slice-based design to four different quantizations of multiplierless correlation in terms of resource utilization and power consumption.

**S.Hari Krishnan**, Assistant Professor & Head-ECE, Sanskrithi School of Engineering, Puttaparthi

**Hemavathi.H**, Assistant Professor, Sanskrithi School of Engineering, Puttaparthi

**Anjali S Nair**, Assistant Professor, Sanskrithi School of Engineering, Puttaparthi

OFDM timing synchronization accuracy is evaluated for each system at different signal-to-noise ratios. Multiplierless designs enjoy reduced power consumption over the DSP48E1 Slice-based design, and can be used where DSP Slice resources are insufficient, such as on low-power FPGA devices.
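As a rough illustration of timing synchronization by cross-correlation, the sketch below slides a known 64-sample reference over the received samples and takes the offset with the largest correlation magnitude as the frame start. The function names, the random preamble, the noise level and the offset are all assumptions made for illustration; the preamble is not the one defined by IEEE 802.16.

```python
import random

def cross_correlate(received, preamble):
    """Magnitude of sum_k r[n + k] * conj(p[k]) for every valid offset n."""
    span = len(preamble)
    return [
        abs(sum(received[n + k] * preamble[k].conjugate() for k in range(span)))
        for n in range(len(received) - span + 1)
    ]

def detect_timing(received, preamble):
    """Estimate the frame start as the offset with the largest magnitude."""
    magnitudes = cross_correlate(received, preamble)
    return max(range(len(magnitudes)), key=magnitudes.__getitem__)

rng = random.Random(0)

def noise():
    """Complex Gaussian noise sample (illustrative noise level)."""
    return complex(rng.gauss(0, 0.3), rng.gauss(0, 0.3))

# Hypothetical 64-sample reference with +/-1 real and imaginary parts.
preamble = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(64)]

true_offset = 37  # illustrative position of the preamble in the stream
received = [noise() for _ in range(true_offset)]
received += [p + noise() for p in preamble]
received += [noise() for _ in range(50)]

estimated_offset = detect_timing(received, preamble)
print("estimated frame start:", estimated_offset)
```

Each of the 64 taps needs one complex multiplication per output sample, which is why the multiplier-based form is costly in DSP blocks.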

## II. INTRODUCTION TO OFDM AND ITS PRINCIPLES

OFDM splits a data-bearing radio signal into multiple smaller signal sets and modulates each onto a different subcarrier. The subcarriers are transmitted simultaneously at different frequencies and are spaced orthogonally, as closely as possible in frequency, without interfering with one another. OFDM is an attractive modulation scheme for broadband wireless systems that encounter large delay spreads. It avoids temporal equalization altogether by using a cyclic prefix, at the cost of a small penalty in channel capacity. Where line of sight (LoS) cannot be achieved, there is likely to be significant multipath dispersion, which could limit the maximum data rate. Technologies like OFDM are probably best placed to overcome this, allowing nearly arbitrary data rates on dispersive channels.



Fig. 1. Bandwidth divided into N subchannels.

Rectangular pulse shaping is applied to each subcarrier. A guard interval, or cyclic extension, of duration Tg is added to the subcarrier signal in order to avoid inter-symbol interference (ISI), which occurs in multipath channels. At the receiver, the cyclic prefix is removed and only the time interval [0, Ts] is evaluated. The total OFDM block duration is T = Ts + Tg. Orthogonal frequency division multiplexing (OFDM) is a multi-carrier transmission technique that divides the available spectrum into many carriers, each one modulated by a low-rate data stream. OFDM is similar to FDMA in that multiple-user access is achieved by subdividing the available bandwidth into multiple channels that are then allocated to users. However, OFDM uses the spectrum much more efficiently by spacing the channels much closer together. This is achieved by making all the carriers orthogonal to one another, which prevents interference between the closely spaced carriers.
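The role of the guard interval can be checked with a small numerical sketch. Two blocks are sent back to back through a hypothetical 3-tap multipath channel. After the receiver discards the prefix, the useful part of the second block depends only on the second symbol, so the first block causes no ISI. The sample values, channel taps and function names are illustrative assumptions.

```python
def add_cyclic_prefix(symbol, guard_len):
    """Prepend the last guard_len samples of the symbol (duration Tg)."""
    return symbol[-guard_len:] + symbol

def remove_cyclic_prefix(block, guard_len):
    """Keep only the useful part of the block (duration Ts)."""
    return block[guard_len:]

def convolve(signal, taps):
    """Linear convolution, modelling a multipath channel."""
    out = [0] * (len(signal) + len(taps) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(taps):
            out[i + j] += sample * tap
    return out

def circular_convolve(symbol, taps):
    """Circular convolution of one symbol with the channel taps."""
    n = len(symbol)
    return [
        sum(taps[j] * symbol[(i - j) % n] for j in range(len(taps)))
        for i in range(n)
    ]

first = [1, 2, 3, 4, 5, 6, 7, 8]        # illustrative sample values
second = [8, -1, 4, 0, 2, 5, -3, 6]
guard_len = 2                            # Tg in samples
channel = [1, 2, 1]                      # delay spread of 2 samples <= guard_len

stream = add_cyclic_prefix(first, guard_len) + add_cyclic_prefix(second, guard_len)
received = convolve(stream, channel)

block_len = len(first) + guard_len       # T = Ts + Tg in samples
rx_second = remove_cyclic_prefix(received[block_len:2 * block_len], guard_len)

# No trace of the first block remains in the useful part of the second one.
print(rx_second == circular_convolve(second, channel))
```

The equality holds only while the channel delay spread does not exceed the guard length, which is the condition the guard interval is chosen to satisfy.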



www.ijntr.org

As mentioned above, OFDM is a special form of multi-carrier modulation (MCM). The OFDM time-domain waveforms are chosen such that mutual orthogonality is ensured even though the subcarrier spectra may overlap.

With respect to OFDM, orthogonality implies a definite and fixed relationship between all carriers in the collection: each carrier is positioned such that its peak occurs at the zero-energy frequency points of all other carriers. The sinc function, illustrated in Fig. 2, exhibits this property; it is the spectral shape of each rectangularly pulsed subcarrier in an OFDM system.



Fig. 2. OFDM subcarrier in the frequency domain.
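The orthogonality property described above can be verified numerically. The sketch below builds N sampled complex subcarriers over one symbol period and computes all pairwise inner products. A carrier correlated with itself gives N, and any two different carriers give zero up to rounding error. N = 8 and the helper names are assumptions made for illustration.

```python
import cmath

def subcarrier(index, n_samples):
    """Sampled complex exponential with `index` cycles per symbol period."""
    return [
        cmath.exp(2j * cmath.pi * index * n / n_samples)
        for n in range(n_samples)
    ]

def inner_product(a, b):
    """Sum of a[n] * conj(b[n]) over one symbol period."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

N = 8  # illustrative number of subcarriers
carriers = [subcarrier(k, N) for k in range(N)]

# Entry [k][m] is the magnitude of the inner product of carriers k and m.
gram = [[abs(inner_product(ck, cm)) for cm in carriers] for ck in carriers]

for row in gram:
    print(" ".join(f"{value:4.1f}" for value in row))
```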

## III. OFDM TRANSCEIVER

The block diagram of an OFDM transceiver is shown in Fig. 3. The input bit stream is grouped into parallel format in order to be mapped onto an M-QAM or M-PSK constellation (for example, 2 bits per symbol for QPSK and 4 bits for 16-QAM). The output of the mapping is a complex number that locates the sample on the I-Q constellation. The inverse fast Fourier transform (IFFT) block takes N samples and performs the inverse Fourier transform; the result is an N-point time-domain signal that sums up the N subcarriers. These N samples are serialised and converted to analogue format to be up-converted and sent through the channel. At the receiver, the inverse process is applied: after down-conversion, the signal is converted to digital format. The samples are framed in the same order as the ones sent, so that an N-point Fourier transform can recover the N subcarriers. The subcarriers are demodulated using the corresponding de-mapping scheme to recover the symbols, which are serialised back into the bit stream initially transmitted. The design is built in the Simulink modelling environment using a Xilinx-specific block set. All of the downstream FPGA implementation steps, including synthesis and place and route, are performed automatically to generate an FPGA programming file.



Fig. 3. Block diagram of OFDM transceiver
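The baseband part of this chain can be sketched in a few lines. The example maps bit pairs to QPSK points, applies an inverse DFT at the transmitter, then applies a forward DFT and nearest-point de-mapping at the receiver. It assumes an ideal noiseless channel and omits the cyclic prefix, the analogue conversion and the up/down-conversion. The bit-to-symbol assignment and all function names are illustrative choices, not taken from the implemented design.

```python
import cmath

# Gray-coded QPSK map; this particular bit assignment is an assumption.
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}

def dft(values, inverse=False):
    """Direct O(N^2) DFT, or inverse DFT with 1/N scaling."""
    n = len(values)
    sign = 1 if inverse else -1
    scale = 1 / n if inverse else 1
    return [
        scale * sum(v * cmath.exp(sign * 2j * cmath.pi * k * i / n)
                    for i, v in enumerate(values))
        for k in range(n)
    ]

def transmit(bits):
    """Map bit pairs to QPSK points, then sum the subcarriers with an IDFT."""
    symbols = [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]
    return dft(symbols, inverse=True)

def receive(samples):
    """Recover the subcarriers with a DFT and de-map to the nearest point."""
    bits = []
    for point in dft(samples):
        nearest = min(QPSK, key=lambda pair: abs(point - QPSK[pair]))
        bits.extend(nearest)
    return bits

tx_bits = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]  # 8 QPSK symbols
rx_bits = receive(transmit(tx_bits))
print(rx_bits == tx_bits)
```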

## IV. SIMULATIONS AND DISCUSSION

#### TABLE II. POWER CONSUMPTION AT 50 MHZ

| Correlator | Quiescent V6 (mW) | Quiescent S6 (mW) | Dynamic V6 (mW) | Dynamic S6 (mW) | Total V6 (mW) | Total S6 (mW) |
|------------|-------------------|-------------------|-----------------|-----------------|---------------|---------------|
| DSPc       | 1312              | -                 | 846             | -               | 2158          | -             |
| DSPp       | 1300              | -                 | 328             | -               | 1628          | -             |
| ML1        | 1296              | 67                | 133             | 149             | 1429          | 216           |
| ML2        | 1296              | 68                | 160             | 197             | 1456          | 265           |
| ML3        | 1297              | 70                | 182             | 239             | 1479          | 309           |
| ML4        | 1297              | 71                | 203             | 294             | 1500          | 365           |

In order to validate our designs at the application level, we simulate them using ModelSim with an IEEE 802.16 OFDM frame. The designs presented were synthesized and fully implemented using Xilinx ISE 13.2, targeting Xilinx Virtex-6 (V6) and Spartan-6 (S6) devices. The results of implementation are reported in terms of the number of occupied slices, the number of DSP48E1 Slices, and the maximum frequency, and are summarized in Table I.

DSPc and DSPp are correlator designs using DSP Slices in non-pipelined and pipelined structures, respectively. ML1, ML2, ML3, and ML4 are multiplierless correlators with coefficient quantizations of 1, 0.5, 0.25, and 0.125, respectively.
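One way to read these quantization levels is as rounding steps: each coefficient is rounded to the nearest multiple of 1, 0.5, 0.25 or 0.125, and the resulting small integer multiple is applied to the sample using only shifts and additions. The sketch below follows that reading. It is an assumption about the scheme, not the exact architecture of the designs, and the coefficient value, sample value and function names are hypothetical.

```python
import math

def quantize(coefficient, step):
    """Integer multiple of `step` nearest to the coefficient (half rounds up)."""
    return math.floor(coefficient / step + 0.5)

def shift_add_multiply(sample, multiple):
    """Multiply an integer sample by an integer using only shifts and adds."""
    negative = multiple < 0
    magnitude = -multiple if negative else multiple
    accumulator = 0
    bit = 0
    while magnitude:
        if magnitude & 1:
            accumulator += sample << bit   # add the sample shifted by this bit weight
        magnitude >>= 1
        bit += 1
    return -accumulator if negative else accumulator

STEPS = (1, 0.5, 0.25, 0.125)  # quantization steps for ML1 to ML4
coefficient = 0.7               # hypothetical correlator coefficient
sample = 13                     # hypothetical integer input sample

for step in STEPS:
    multiple = quantize(coefficient, step)
    # Scaling by a power-of-two step only moves the binary point in hardware.
    product = shift_add_multiply(sample, multiple) * step
    print(f"step {step}: coefficient ~ {multiple * step}, product ~ {product}")
```

Finer steps give larger integer multiples with more set bits, and therefore more adders per coefficient.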

Table I reveals that DSPp uses more logic slices because of its pipelined structure. The slices in the DSP48E1-based designs are used for registers and route-thrus, while the slices in the multiplierless designs are mostly used as logic. The number of slices used in the multiplierless designs increases as the coefficient quantization becomes finer. The DSP48E1-based designs use 256 DSP Slices, 4 for each complex multiply, plus 6%–9% of logic resources. The multiplierless designs use only logic to compute the cross-correlation with 64 complex coefficients. The total logic area is a small fraction of the whole device: around 5%–12% of total resources in the Virtex-6, and around 6%–13% of total resources in the equivalent Spartan-6. While Spartan-6 devices do include DSP Slices, their number is insufficient to implement the full 64-sample complex cross-correlation. This is an ideal scenario in which multiplierless correlation makes sense, and hence the motivation for this brief.
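The figure of 256 DSP Slices follows from the direct form of complex multiplication, which needs four real multiplications per tap. The short sketch below reproduces the count. The one-slice-per-real-multiply mapping is taken from the figures quoted in the text, and the function and constant names are illustrative.

```python
def complex_multiply(a_re, a_im, b_re, b_im):
    """(a_re + j a_im)(b_re + j b_im) using four real multiplications."""
    real = a_re * b_re - a_im * b_im
    imag = a_re * b_im + a_im * b_re
    return real, imag

TAPS = 64                     # complex coefficients in the correlator
REAL_MULTIPLIES_PER_TAP = 4   # one DSP48E1 Slice assumed per real multiply

dsp_slices = TAPS * REAL_MULTIPLIES_PER_TAP
print(complex_multiply(1, 2, 3, 4), dsp_slices)
```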




The maximum frequencies, reported after place and route, decrease for the multiplierless designs as the coefficient quantization becomes finer. The non-pipelined DSP48E1 design is slower than the multiplierless designs, whereas the pipelined DSP48E1 design can achieve a higher frequency.

A post-place-and-route simulation in ModelSim was used to estimate the power consumption of the system using the Xilinx XPower tool. Table II shows the power dissipation of the designs running at 50 MHz. The DSP48E1-based correlators consume more power than the multiplierless correlators, but this is due primarily to increased dynamic power when using the DSP48E1s on the Virtex-6. The dynamic power of the non-pipelined DSP48E1-based correlator DSPc is the greatest at 846 mW, but pipelining reduces this by a factor of more than 2.5, because of reduced switching activity between the multiplier and adder. The dynamic power of the multiplierless designs increases from 133 to 203 mW on the Virtex-6 and from 149 to 294 mW on the Spartan-6 as finer coefficient quantization is used. It is important to note that the quiescent power of the Spartan-6 is much lower by design. Hence, using this multiplierless technique allows us to perform synchronization on a Spartan-6 device, where a multiplier-based design is not possible, saving significant power and sacrificing little in terms of accuracy.
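The arithmetic behind these statements can be checked directly from the Virtex-6 columns of Table II: total power is the sum of quiescent and dynamic power, and the pipelining gain is the ratio of the two DSP48E1 dynamic figures. The dictionary layout and names below are only illustrative.

```python
# (quiescent mW, dynamic mW) on the Virtex-6 at 50 MHz, copied from Table II.
VIRTEX6_POWER = {
    "DSPc": (1312, 846),
    "DSPp": (1300, 328),
    "ML1": (1296, 133),
    "ML2": (1296, 160),
    "ML3": (1297, 182),
    "ML4": (1297, 203),
}

def total_power(quiescent_mw, dynamic_mw):
    """Total power is the sum of the static and switching contributions."""
    return quiescent_mw + dynamic_mw

totals = {name: total_power(*parts) for name, parts in VIRTEX6_POWER.items()}

# Ratio of non-pipelined to pipelined dynamic power for the DSP48E1 designs.
pipelining_factor = VIRTEX6_POWER["DSPc"][1] / VIRTEX6_POWER["DSPp"][1]

for name, total in totals.items():
    print(f"{name}: {total} mW total")
print(f"DSPc / DSPp dynamic power: {pipelining_factor:.2f}")
```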

We also investigated how the total power consumption varies with the frequency, as shown in Fig. 6. As the frequency increases, the finer quantizations and DSP48E1-based designs begin to consume proportionally more power. Overall, multiplierless designs on the Spartan-6 consume 75%–85% less power than the same designs on the Virtex-6, and a 0.25 quantization design on the Spartan-6 consumes 81%–85% less power than the DSP48E1-based design on a Virtex-6.
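For the single 50 MHz operating point of Table II, the relative savings can be recomputed as one minus the ratio of the two total powers. The quoted ranges cover a sweep over frequency, so this sketch only confirms that the 50 MHz figures fall within or close to them. The names used are illustrative.

```python
# Total power in mW at 50 MHz from Table II: (Virtex-6, Spartan-6).
# None marks designs that were not implemented on the Spartan-6.
TOTAL_MW = {
    "DSPc": (2158, None),
    "DSPp": (1628, None),
    "ML1": (1429, 216),
    "ML2": (1456, 265),
    "ML3": (1479, 309),
    "ML4": (1500, 365),
}

def saving(reference_mw, candidate_mw):
    """Fraction of the reference power that the candidate saves."""
    return 1.0 - candidate_mw / reference_mw

# Same multiplierless design, Spartan-6 versus Virtex-6.
same_design = {
    name: saving(v6, s6) for name, (v6, s6) in TOTAL_MW.items() if s6 is not None
}

# 0.25-quantization design (ML3) on Spartan-6 versus DSP48E1 designs on Virtex-6.
ml3_spartan6 = TOTAL_MW["ML3"][1]
ml3_vs_dsp = {
    name: saving(TOTAL_MW[name][0], ml3_spartan6) for name in ("DSPc", "DSPp")
}

for name, value in {**same_design, **ml3_vs_dsp}.items():
    print(f"{name}: {100 * value:.1f}% less power")
```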

The DSPc implementation represents how a "blind" design would be mapped. Our architecture-aware designs show significantly better performance, reduced area, and reduced power consumption.

The simulation results for the FFT block are shown in the following figures.

The simulation waveforms are obtained for the 8-pt FFT and the OFDM output. Different outputs are obtained by varying the input values.



Fig. 4. Simulation waveform of 8-pt FFT



Fig. 5. Simulation waveform of OFDM
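As a software reference against which an 8-pt FFT waveform can be compared, a recursive radix-2 decimation-in-time FFT is sketched below. It is a generic textbook formulation and not the VHDL architecture used in this work. The test vector and function name are assumptions.

```python
import cmath

def fft_radix2(samples):
    """Recursive radix-2 decimation-in-time FFT for power-of-two lengths."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return list(samples)
    even = fft_radix2(samples[0::2])   # FFT of even-indexed samples
    odd = fft_radix2(samples[1::2])    # FFT of odd-indexed samples
    half = n // 2
    out = [0j] * n
    for k in range(half):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + half] = even[k] - twiddled
    return out

inputs = [1, 2, 3, 4, 5, 6, 7, 8]  # illustrative 8-point test vector
spectrum = fft_radix2(inputs)
for k, value in enumerate(spectrum):
    print(f"X[{k}] = {value.real:.3f} {value.imag:+.3f}j")
```

An 8-point transform in this form uses three butterfly stages of four butterflies each.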

## V. CONCLUSIONS

In this paper, the OFDM transmitter is designed using the Xilinx ISE 9.1i simulator. The transmitter includes a serial-to-parallel converter, an IFFT block, cyclic prefix insertion, and modulation. These blocks are described in VHDL. The performance of the OFDM transmitter is enhanced by adding coding blocks that use forward error correction techniques, written in VHDL and synthesized using Xilinx ISE 9.1i. The results suggest that the area and power consumed by the conventional method are high, so multiplierless correlators are used for low power consumption. Carrier frequency offset correlators are used here to reduce power consumption and area further.




### Name: Harikrishnan

S.Hari Krishnan obtained his B.E. degree in Electronics & Communication Engineering from Mepco Schlenk Engineering College, Sivakasi, and his M.E. degree in VLSI Design from Karpagam University, Coimbatore. His fields of interest include Digital Signal Processing and VLSI Design. He has organized and attended numerous seminars, workshops, faculty development programmes, and conferences. He has published various papers in international conferences and has guided various UG projects.



### Name: Hemavathi H

Hemavathi H obtained her B.E. degree in Medical Electronics Engineering from Sri Krishna Institute of Technology, Bangalore, and her M.Tech degree in Digital Electronics from East West Institute of Technology, Bangalore. She has published various papers in international conferences.



### Name: Anjali.S.Nair

Anjali.S.Nair obtained her B.E. degree in Electronics & Communication Engineering from Andhra University and her M.E. degree in Microwave and Radars from Andhra University, Visakhapatnam. Her fields of interest include Signals and Systems, and Antennas and Radars. She has organized and attended various national and international conferences.


