Massively parallel nonstationary EEG data processing on GPGPU platforms with Morlet continuous wavelet transform
 Ze Deng^{1},
 Dan Chen^{1},
 Yangyang Hu^{1},
 Xiaoming Wu^{1},
 Weizhou Peng^{1} and
 Xiaoli Li^{2}
https://doi.org/10.1007/s1317401200711
© The Brazilian Computer Society 2012
Received: 27 June 2012
Accepted: 4 October 2012
Published: 10 November 2012
Abstract
Morlet continuous wavelet transform (MCWT) has been widely used to process nonstationary electroencephalogram (EEG) data. Nowadays, MCWT applications for processing EEG data are time-sensitive and data-intensive due to quickly increasing problem domain sizes and advancing experimental techniques. In this paper, we propose a massively parallel MCWT approach based on GPGPU to address this research challenge. The proposed approach treats MCWT as four main computing subprocedures and parallelizes each of them with CUDA. We focus on optimizing FFT on GPUs to improve the performance of MCWT. Extensive experiments have been carried out on Fermi and Kepler GPUs and a Fermi GPU cluster. The results indicate that (1) the proposed approach (especially on the Kepler GPU) achieves encouraging runtime performance in processing nonstationary EEG data compared with CPU-based MCWT, (2) the performance can be further improved on the GPU cluster, but a performance bottleneck exists when running multiple GPGPUs on one node, and (3) tuning an appropriate FFT radix is important to the performance of our MCWT.
Keywords
Morlet continuous wavelet transform, EEG data, GPGPU
1 Introduction
Most data from either natural phenomena or artificial sources are nonlinear and nonstationary in nature [30]. In the past decade, numerous works have focused on how to efficiently analyze nonstationary data [4, 14–16, 19–21, 30, 32]. Wavelet transform-based approaches are the mainstream for dealing with nonstationary data because their multiresolution analysis across all scales [17] can extract short-lived transient information.
Approaches in the wavelet transform family are of two types, i.e., continuous wavelet transform (CWT) and discrete wavelet transform (DWT). The most notable difference between them lies in the highly redundant nature of CWT's analyzing functions, which are not limited to an orthonormal basis, whereas DWT selects only those scales that do provide an orthonormal basis. That means CWT's resolution in scale is arbitrary. CWT can therefore enable a more comprehensive analysis than DWT does, even though CWT must be discretized for numerical evaluation [15]. CWT may rely on various mother wavelets, e.g., Morlet, Mexican hat and Paul [29], among which CWT upon the Morlet basis (MCWT) is salient for its capacity to mine transient characteristics hidden in nonstationary data [29] and is widely applied in analyzing EEG data [4, 16, 19–21].
As the problem domain sizes and experimental techniques for recording activities of EEG systems have been advancing quickly, i.e., with rapidly increasing numbers of channels and sampling frequencies, the density and the spatial scale of nonstationary data for EEG analysis have been increasing exponentially. For instance, for the analysis of the interaction dynamics of multiple neural oscillations [21], the number of electrodes has increased from tens to more than one thousand [4]. In practice, an MCWT application for EEG often requires real-time or near real-time data analysis. As such, MCWT applications for EEG are nowadays commonly time-sensitive and data-intensive.
Modern graphics processing units (GPUs) have evolved from configurable graphics processors into massively parallel many-core multiprocessors for rapidly solving data- and time-intensive problems [24]. In this paper, we explored the feasibility of using General-Purpose computation on Graphics Processing Units (GPGPU) to address the challenge of performance and scalability in EEG MCWT applications. A GPGPU-based MCWT has been developed for this purpose.
We first adapted an MCWT approach [22] to the many-core architecture of the GPU. The MCWT algorithm can be viewed as an integration of four computing subprocedures: (1) a forward fast Fourier transform (fFFT) of multichannel EEG data, (2) a transform of Morlet wavelets from the time domain to the frequency domain, (3) an inner product between the transformed data and the transformed Morlet wavelets, and (4) an inverse fast Fourier transform (iFFT) of the inner product results.
For the first subprocedure, we employed a coalesced GPU global memory access scheme [12] together with a parallelization scheme that uses a two-dimensional GPU thread grid and one-dimensional thread blocks. Each row of the thread grid owns multiple blocks responsible for one channel of EEG data. For the second and third subprocedures, we dealt with multiple scale factors using a two-dimensional GPU thread block assignment method to exploit time-frequency-scale data parallelism. For the last subprocedure, we used a method similar to that for the first subprocedure (fFFT), except that the thread grid needs more rows since the number of data channels grows by a factor of S when S scale factors are used.
These GPGPU-based algorithms have been designed with NVIDIA CUDA [3] to map the subprocedures onto four different groups of massively parallel executions. In this study, we further investigated how to enhance the proposed MCWT approach by optimizing FFT with a large radix to reduce the time complexity [12].
A case study has been carried out to evaluate the performance of the proposed approach on GPUs of the Fermi [1] architecture and the latest Kepler [2] architecture using a large EEG dataset. We also tested the approach on a GPU cluster. The results indicate that the proposed approach achieves encouraging runtime performance in processing nonstationary EEG data compared with CPU-based MCWT. To the best of our knowledge, the proposed approach is the first massively parallel CWT aided by GPGPU. It should also be noted that the approach applies to other types of nonstationary data rather than being specific to EEG.
The remainder of this paper is organized as follows: Sect. 2 presents some typical work related to processing nonstationary data. Section 3 introduces the conventional MCWT on CPU and proposes our GPGPU-based MCWT and its variants. Experiments and results are given in Sect. 4. We conclude the paper with a summary and present future work in Sect. 5.
2 Related work
In the past decade, numerous methods have emerged for processing nonstationary data [4, 8, 9, 13–16, 19–21, 23, 25–27, 30, 32]. Short-Time Fourier Transform (STFT) and Wigner–Ville distribution (WVD) [14] are two classic methods. STFT uses uniform time and frequency resolutions to analyze nonstationary signals, while WVD can extract cross-terms between various components of nonstationary signals. A common problem with the two methods is that they often fail to discover transient phenomena in nonstationary data.
In addition to the above, Wang et al. proposed an adaptive analysis method called empirical data decomposition (EDD) [30]. The EDD algorithm also implements a multiresolution analysis similar to wavelet transform. Xuan et al. [32] used empirical mode decomposition to decompose precipitation data to probe the nonstationary dynamics.
Compared with these approaches, the wavelet transform-based methods have a wider range of applications. For example, Gurley and Kareem proposed a series of methods based on CWT and DWT to process nonstationary data in earthquake, wind and ocean engineering [13]. Akhtar et al. developed a framework based on independent component analysis (ICA) and a DWT variant to correctly detect artifacts embedded in EEG data [4]. In that framework, ICA is used to extract artifact-only independent components from EEGs, and the DWT is then employed to remove any cerebral activity from the extracted artifact components to obtain clean EEG data. The key difference from our work is that we focus on CWT, which offers a more comprehensive ability to analyze nonstationary data.
MCWT has been used in various research areas and disciplines. Li et al. [20, 21] applied MCWT to analyze and quantify the instantaneous interaction dynamics between neuronal populations in order to understand the mechanism of epileptic seizures in EEG. Klein et al. [16] proposed an MCWT-based coherence analysis approach for monitoring time-dependent changes in the coherences among multichannel EEG. Fligge et al. [9] used MCWT to objectively determine the length of the sunspot cycle and carry out error analysis on long-term solar activity records, e.g., sunspot number and sunspot area. MCWT has also been employed to extract two complementary wavelet skeleton spectra to discriminate the components of periodicities and of hierarchies of discontinuities from several large-size time series representing solar activity records [27]. Pišoft et al. [26] used MCWT to process four long Czech mean monthly temperature series from 1775 to 2001 and then examined the temperature variability in the Czech Republic. None of these works considered using GPU platforms to accelerate MCWT, as we do here.
Recent research has focused on using many-core platforms such as GPUs to accelerate wavelet transforms, and nearly all works along this direction target DWT. For instance, Wong et al. implemented a two-dimensional DWT with Cg and OpenGL on a GeForce GTX 7800 [31]. Similarly, the authors of [28] explored the implementation of a fast 2D-DWT with the Filter Bank Scheme (FBS) and the Lifting Scheme (LS) using Cg on the same GPU. With NVIDIA's CUDA library [6], 2D-DWT variants [10, 18] and a 3D-DWT [11] have been implemented on GPUs. In contrast to these methods, we aimed to significantly advance a CWT approach with the latest GPGPU technologies to better cater for the needs of processing massive nonstationary EEG data.
3 Morlet continuous wavelet transform on GPGPU
In this section, we first present the MCWT algorithm as it operates on a CPU. We then detail the design of the GPGPU-based MCWT for multichannel EEG data.
3.1 Morlet continuous wavelet transform on CPU

Step 1. Transform X from the time domain to the frequency domain to generate a frequency series \(X(\omega)\) using the Fourier transform.

Step 2. Transform \(\psi_{\tau,s}(n)\) from the time domain to the frequency domain to generate a frequency series \(\phi^*(s\omega)\) by computing the angular frequency.

Step 3. Compute the inner product of \(X(\omega)\) and \(\sqrt{s}\,\phi^*(s\omega)\), denoted [\(X(\omega)\), \(\sqrt{s}\,\phi^*(s\omega)\)], where \(\sqrt{s}\) is a factor for energy normalization across the different scales.

Step 4. Transform [\(X(\omega)\), \(\sqrt{s}\,\phi^*(s\omega)\)] from the frequency domain back to the time domain to obtain \(w(s,\tau)\) using the inverse Fourier transform.
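As an illustration, the four steps above can be sketched in plain Python for a single channel. This is a minimal sketch under stated assumptions, not the implementation evaluated in this paper: the frequency-domain Morlet wavelet is assumed to take the form given by Torrence and Compo [29] with a hypothetical default \(\omega_0 = 6\), a naive \(O(N^2)\) DFT stands in for the FFT, and the names `dft`, `morlet_hat` and `mcwt_cpu` are introduced only for this example.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def morlet_hat(s_omega, omega0=6.0):
    """Assumed frequency-domain Morlet wavelet (form of Torrence and Compo)."""
    if s_omega <= 0:
        return 0.0  # the wavelet is treated as zero for non-positive frequencies
    return math.pi ** -0.25 * math.exp(-0.5 * (s_omega - omega0) ** 2)

def mcwt_cpu(x, scales, dt=1.0):
    """Return w[scale_index][time_index] for a one-channel signal x."""
    n = len(x)
    # Step 1: signal from time domain to frequency domain
    x_hat = dft(x)
    # Angular frequency of each DFT bin (negative for the upper half)
    omega = [2 * math.pi * (k if k <= n // 2 else k - n) / (n * dt)
             for k in range(n)]
    result = []
    for s in scales:
        # Step 2: wavelet evaluated in the frequency domain at scale s
        psi_hat = [morlet_hat(s * w) for w in omega]
        # Step 3: pointwise product with the sqrt(s) energy normalization
        prod = [xh * math.sqrt(s) * ph for xh, ph in zip(x_hat, psi_hat)]
        # Step 4: back to the time domain
        result.append(dft(prod, inverse=True))
    return result
```

For a pure sinusoid of angular frequency \(\omega\), the magnitude of the coefficients in this sketch is largest near the scale \(s = \omega_0/\omega\), which gives a quick sanity check of the four steps.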
3.2 GPU-based Morlet continuous wavelet transform (MCWTGPU)

1. Forward 1D-FFT (fFFT) procedure of multiple-channel EEG data (lines 2–4 in Algorithm 1).

2. The transform procedure of Morlet wavelets from the time domain to the frequency domain under multiple channels and different scale factors (lines 5–11).

3. Inner product of \(X(\omega)\) and \(\sqrt{s}\,\phi^*(s\omega)\) (lines 12–18).

4. Inverse FFT (iFFT) procedure of multiple-channel data under different scale factors (lines 19–23).
3.2.1 Parallelizing subprocedures based on FFT
The first scheme (see Fig. 2a) arranges all thread blocks in a two-dimensional grid. Within each block, all threads are indexed along one dimension. Section 4.1 details how the number of threads in one block may affect the performance. For fFFT, a thread block b(x, y) is responsible for processing the xth data segment of the yth channel, with x\(\in\)[0, d) and y\(\in\)[0, m). For iFFT, the thread blocks are also arranged in two dimensions, but a thread block should be referenced as b(x, y\(\times\)z) to deal with the xth data segment of the yth channel under a scale factor z\(\in\)[0, s) (see Fig. 2b).
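The block-to-data mapping described above can be illustrated with a small Python sketch. The text only states that the second grid dimension of the iFFT covers channel and scale jointly; the particular linearization used here (grid row = channel index times s plus scale index) is an assumption made for illustration, as are the function names.

```python
def grid_shape(d, m, s=1):
    """Grid dimensions: d blocks per row, one row per (channel, scale) pair."""
    return (d, m * s)

def ffft_block_task(bx, by, d, m):
    """Map block b(x, y) of the fFFT grid to (segment, channel)."""
    if not (0 <= bx < d and 0 <= by < m):
        raise IndexError("block index outside the d x m grid")
    return {"segment": bx, "channel": by}

def ifft_block_task(bx, row, d, m, s):
    """Map a block of the iFFT grid to (segment, channel, scale).

    Assumed layout: row = channel * s + scale, so the s scale factors
    of one channel occupy consecutive grid rows.
    """
    if not (0 <= bx < d and 0 <= row < m * s):
        raise IndexError("block index outside the d x (m*s) grid")
    channel, scale = divmod(row, s)
    return {"segment": bx, "channel": channel, "scale": scale}
```

With 64 channels and 30 scale factors, for example, the iFFT grid in this sketch has 64 × 30 = 1,920 rows, compared with 64 rows for the fFFT grid.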
3.2.2 Parallelizing other subprocedures
Similar to the design of thread blocks for fFFT, here the scheme first sets up a two-dimensional grid of d\(\times\)m thread blocks. The d blocks in a row cooperate with each other to process the data of one channel. Within a thread block, the scheme indexes threads along two dimensions to deal with data under different scale factors. Taking Thread(x, y) as an example, the first dimension x denotes the xth data segment, while the second dimension y means that the data is processed under the scale factor y.
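To make the two-dimensional thread indexing concrete, the following Python sketch emulates serially what such a launch might compute for the inner product of one channel. It is an interpretation rather than the kernel used in this work: the first thread dimension is assumed to select a point within the block's data segment, the second selects the scale factor, and the name `product_kernel_emulated` is hypothetical.

```python
import math

def product_kernel_emulated(x_hat, psi_hat, scales, threads_x):
    """Serial emulation of a launch with 2D thread blocks for one channel.

    x_hat      : transformed data of the channel (list of complex)
    psi_hat    : psi_hat[y][i] is the transformed wavelet at scale index y
    scales     : list of scale factors
    threads_x  : number of threads along the first block dimension
    """
    n = len(x_hat)
    num_blocks = (n + threads_x - 1) // threads_x  # the d blocks of one channel
    out = [[0j] * n for _ in scales]
    for block in range(num_blocks):
        for ty in range(len(scales)):        # second thread dimension: scale factor
            for tx in range(threads_x):      # first thread dimension: data point
                i = block * threads_x + tx   # global index of the data point
                if i < n:                    # guard the last, partially filled block
                    out[ty][i] = x_hat[i] * math.sqrt(scales[ty]) * psi_hat[ty][i]
    return out
```

On a GPU the three loops would disappear, since every (block, tx, ty) combination is handled by its own thread; the bounds check on `i` is the only part that carries over unchanged.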
3.2.3 Parallelism analysis
3.2.4 Optimizing FFT on GPU
We further investigated how to improve the GPGPU-aided approach by optimizing the FFT algorithm. Algorithm 3 illustrates the global memory FFT for processing one-channel EEG data as suggested in [12]. The value of the FFT's radix is set to 2.
In Algorithm 3, the FFT_GPU() function needs to be called \(\log_R N\) times to process data of length N (line 3).
Since each call incurs high data transfer costs, using a bigger FFT radix R can reduce the number of calls to Call_FFT_GPU(). This is a way to improve FFT, especially for large-size data. However, the algorithm only describes the implementation of a radix-2 FFT (line 18). This motivated us to propose an FFT with radix R (R\(>\)2) on GPU. This algorithm is described in Algorithm 4, where float2 is used to represent a complex number.
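The effect of the radix on the number of passes can be illustrated with a short Python sketch. It is a recursive decimation-in-time formulation chosen for brevity, not the iterative global-memory algorithm of [12] or the GPU algorithm proposed here; `num_passes` and `fft_radix` are names introduced for this example, and the input length is assumed to be a power of the radix.

```python
import cmath
import math

def num_passes(n, radix):
    """Number of radix-R passes (log_R N) needed for a length-n FFT."""
    passes = 0
    while n > 1:
        if n % radix:
            raise ValueError("n must be a power of the radix")
        n //= radix
        passes += 1
    return passes

def fft_radix(x, radix):
    """Recursive decimation-in-time FFT for len(x) a power of `radix`."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % radix:
        raise ValueError("len(x) must be a power of the radix")
    m = n // radix
    # Transform the `radix` interleaved subsequences, each of length n / radix
    subs = [fft_radix(x[r::radix], radix) for r in range(radix)]
    out = []
    for k in range(n):
        acc = 0j
        for r in range(radix):
            # Twiddle factor W_N^(r*k) combines the r-th sub-transform
            acc += cmath.exp(-2j * math.pi * r * k / n) * subs[r][k % m]
        out.append(acc)
    return out
```

For N = 4,096, `num_passes` gives 12 passes with radix 2, 4 passes with radix 8 and 3 passes with radix 16, which is the reduction in calls that motivates a larger radix.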
Table 1 Major hardware and software features of experimental computers
Features  Computer#1  Computer#2  Computer#3

Hardware
CPU  E7500 @ 2.93 GHz  E5620 @ 2.4 GHz  i7-2600 @ 3.4 GHz
GPU  GTX580 (Fermi)  Tesla C2050 (Fermi)  GTX680 (Kepler)
Memory of the host  6 GB  24 GB  16 GB
Software
OS  Windows 7  Red Hat Enterprise Linux Server 5.4  Windows 7
CUDA version  4.1  4.1  4.1
With the above modifications to the FFT algorithm, we further accelerate MCWT on GPGPU for larger-size data.
4 Experiments and results
We have evaluated the performance of the proposed MCWT methods on a large EEG dataset on various platforms powered by cutting-edge NVIDIA GPUs and high-performance networks. The experiments focus on the execution times of these GPGPU-aided MCWT methods.
4.1 Experiment setup
4.1.1 Data set
We chose an EEG dataset for testing, which was obtained from a patient with epilepsy using 64 sampling channels with a frequency of 1,600 Hz for one hour. In total, the EEG has \(64 \times 3{,}600 \times 1{,}600\) data points.
Given that an MCWT uses M scale factors, each point in the EEG signal needs to occupy at least 2\(\times\)M times the GPU memory space of the raw value. This is because the value of a data point must first be transformed to a complex number and then be scaled M times when being processed on a GPU. As a result, a GPGPU-aided MCWT algorithm is likely to deplete the GPU memory.
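The memory demand implied by the 2\(\times\)M factor can be estimated with a few lines of Python. The sketch assumes 4-byte single-precision values (consistent with the use of float2 for complex numbers) and counts only the scaled complex data, ignoring any working buffers; the function names are introduced for this example.

```python
def eeg_points(channels, seconds, rate_hz):
    """Total number of samples in a multichannel recording."""
    return channels * seconds * rate_hz

def mcwt_gpu_bytes(channels, seconds, rate_hz, num_scales, bytes_per_float=4):
    """Lower bound on GPU memory for the scaled complex data.

    Each sample becomes a complex number (2 floats) and is replicated
    once per scale factor, i.e. 2 * M values per original data point.
    """
    points = eeg_points(channels, seconds, rate_hz)
    return points * 2 * num_scales * bytes_per_float

if __name__ == "__main__":
    points = eeg_points(64, 3600, 1600)
    needed = mcwt_gpu_bytes(64, 3600, 1600, num_scales=10)
    print(f"{points:,} points, {needed / 1e9:.2f} GB for 10 scale factors")
```

For the dataset used here (368,640,000 points), ten scale factors already imply about 29.5 GB under these assumptions, which is why the data cannot be held in GPU memory at once.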
4.1.2 Testbed configurations
We chose two Fermi GPUs and a Kepler GPU to evaluate the GPGPU-aided methods on three individual computers. Table 1 gives the major configurations of the three computers.
4.1.3 Specification of the number of threads in a block
4.2 Runtime efficiency
In the following experiments, we first compared the execution times for processing the whole EEG data with MCWTGPU on a single GPU, based on CUFFT (referred to as MCWTSGC) and on the optimized FFT in [12] (referred to as MCWTSGO), against the sequential CPU-based MCWT method (MCWTC) introduced in Sect. 3.1. We also measured the run times for processing this EEG dataset on the GPU cluster (MCWTGC). Finally, we evaluated the performance of the optimized GPGPU-aided MCWT.
4.2.1 Executing times on single GPUs
Table 2 Performance comparison of MCWTC and MCWTSG in terms of execution time
Scale  MCWTC (s)  MCWTSGC (s)  Speedup (MCWTSGC vs. MCWTC)  MCWTSGO (s)  Speedup (MCWTSGO vs. MCWTC)

Computer#1 (GTX580)  
10  4,513  486  9.3  47  96.0 
20  8,719  702  12.4  77  113.2 
30  12,938  1,024  12.6  108  119.8 
40  17,160  1,568  11.0  136  126.2 
50  21,244  1,924  11.0  167  127.2 
60  25,067  2,419  10.4  198  126.6 
70  29,039  2,871  10.1  230  126.3 
Computer#2 (Tesla C2050)  
10  2,342  292  8.0  71  32.9 
20  4,611  570  8.1  144  32.1 
30  6,937  939  7.4  202  34.3 
40  9,357  1,344  7.0  267  35.1 
50  11,654  1,795  6.5  330  35.3 
60  13,991  2,216  6.3  388  36.1 
70  16,328  2,702  6.1  455  35.9 
Computer#3 (GTX680)  
10  2,987  228  13.1  39  76.6 
20  5,876  516  11.4  67  87.7 
30  8,753  951  9.2  94  93.1 
40  11,619  1,345  8.6  121  96.1 
50  14,375  1,774  8.1  149  96.5 
60  17,109  2,253  7.6  178  96.1 
70  20,508  2,719  7.5  206  99.6 
As shown in Table 2, compared with MCWTC, MCWTSGC gains speedups that change from 9.3 to 10.1 on Computer #1, from 8.0 to 6.1 on Computer #2 and from 13.1 to 7.5 on Computer #3 as the scale increases from 10 to 70. This suggests that MCWTSGC faces a performance bottleneck when processing large-size data.
In contrast, MCWTSGO gains higher speedups than MCWTSGC. When scale = 40, MCWTSGO is 11.5, 5.0 and 11.1 times faster than MCWTSGC on Computers #1, #2 and #3, respectively. Clearly, MCWTSGO dramatically outperforms MCWTSGC when dealing with relatively large data. We attribute the performance improvement to the optimized FFT. In other words, FFT is a key factor contributing to the overall performance of MCWT. With the assistance of the coalesced global memory access scheme, MCWTSGO needs to access the global memory much less often than its counterpart does, especially when dealing with large data.
MCWTC performs better on Computer #2 than on Computer #3. This is because Computer #2 has a much larger main memory, which can handle the high memory demand of the MCWT algorithm.
MCWTSGO performs better on Computer #3 than on the other platforms. This is because the GTX 680 adopts Kepler, the latest GPU architecture. The memory subsystem of the Kepler architecture has been completely revamped, resulting in a 6,008 MHz data rate; the GTX 680 offers the highest memory clock speed of any GPU in the industry [2]. The proposed method, with its optimized memory access scheme, can thus properly exploit the new features of the memory subsystem of the Kepler GPU. The results again indicate that it is important to optimize memory access in order to improve the performance of GPGPU-aided MCWT.
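The relative speedups at scale 40 can be recomputed directly from the execution times in Table 2. The short Python sketch below copies the three relevant rows of the table; the dictionary layout and the `speedup` helper are introduced only for this example.

```python
# Execution times in seconds at scale 40, copied from Table 2
TABLE2_SCALE40 = {
    "Computer#1 (GTX580)": {"MCWTC": 17160, "MCWTSGC": 1568, "MCWTSGO": 136},
    "Computer#2 (Tesla C2050)": {"MCWTC": 9357, "MCWTSGC": 1344, "MCWTSGO": 267},
    "Computer#3 (GTX680)": {"MCWTC": 11619, "MCWTSGC": 1345, "MCWTSGO": 121},
}

def speedup(baseline_s, accelerated_s):
    """Speedup of the accelerated run relative to the baseline run."""
    return baseline_s / accelerated_s

def sgo_over_sgc(times):
    """How many times faster MCWTSGO is than MCWTSGC on one computer."""
    return speedup(times["MCWTSGC"], times["MCWTSGO"])

if __name__ == "__main__":
    for name, times in TABLE2_SCALE40.items():
        print(f"{name}: MCWTSGO is {sgo_over_sgc(times):.1f}x faster than MCWTSGC")
```

The ratios come out at roughly 11.5, 5.0 and 11.1 for the three computers. Small deviations from the speedup columns of Table 2 are to be expected, since the tabulated times are rounded to whole seconds.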
4.2.2 Executing times on the GPU cluster
In this set of experiments, we distributed the whole EEG dataset equally across three slave nodes of the GPU cluster and processed the parts in parallel. On each slave node, the data are processed simultaneously using GPU-aided MCWT with various numbers of GPUs. In our setting, the master node is the management node that controls the parallel tasks, while the slave nodes are the computing nodes.
For comparison purposes, the results of MCWTSG on Computer #2 with a Tesla C2050 GPU card (shown in Table 1) serve as the baseline.
However, we observed that MCWTGC3GPUs/Node is only 1.4 times faster than MCWTGC2GPUs/Node, and MCWTGC4GPUs/Node is only 1.1 times faster than MCWTGC3GPUs/Node. This is because multiple GPUs on the same node contend for PCIe bus resources.
4.2.3 Evaluating the improved GPGPU-aided MCWT
To study the effect of the FFT with a larger radix on MCWT, we executed MCWTSGO multiple times to process an EEG data segment (6 s) under scale 30, and each run has a different radix setting for the FFT.
5 Conclusion and future work
In this paper, we proposed a parallel MCWT with GPGPU platforms to address the challenge of analyzing massive nonstationary EEG data in an efficient and scalable manner.
The proposed approach adapts the parallelism embedded in the MCWT algorithm to the many-core architecture of the GPU using the CUDA platform. The MCWT algorithm has been separated into four main subprocedures, namely the forward and inverse FFTs, the transform of Morlet wavelets from the time domain to the frequency domain, and the inner product, each of which has been parallelized with its own scheme. An improved version of the GPGPU-aided approach has been proposed as well.
A case study of EEG data analysis has also been performed to examine the proposed approach and its performance. Different GPUs have been adopted for this purpose, including devices of the Fermi and Kepler architectures. Furthermore, we evaluated our approach on a GPU cluster. Finally, we assessed the impact of the FFT radix on GPGPU-aided MCWT.
Experimental results show that (1) MCWTSGO can significantly outperform MCWTC and MCWTSGC, especially on the Kepler GPU, (2) MCWTGC can further improve the performance, but a performance bottleneck exists when running multiple GPGPUs on one node, and (3) tuning an appropriate FFT radix is important to MCWTSGO.
In the future, we plan to further study how to resolve the bottleneck of MCWTGC when multiple GPGPUs run on one node.
Declarations
Acknowledgments
This work is funded in part by National Science Fund for Distinguished Young Scholars (grant No.61025019), the National Natural Science Foundation of China (grants No. 61272314), the Program for New Century Excellent Talents in University (NCET110722), the Fundamental Research Funds for the Central Universities (No.CUGL100608, No.CUGL100231, No.G1323511175 and G1323521289, China University of Geosciences, Wuhan), the Specialized Research Fund for the Doctoral Program of Higher Education (grant No. 20110145110010), the Programme of HighResolution Earth Observing System (China), and the Hundred University Talent of Creative Research Excellence Programme (Hebei, China).
Authors’ Affiliations
References
1. NVIDIA Corporation (2009) NVIDIA next generation CUDA compute architecture: Fermi
2. NVIDIA Corporation (2012) Whitepaper NVIDIA GeForce GTX 680
3. NVIDIA Corporation (2012) NVIDIA CUDA C programming guide version 4.2
4. Akhtar MT, Mitsuhashi W, James CJ (2012) Employing spatially constrained ICA and wavelet denoising, for automatic removal of artifacts from multichannel EEG data. Signal Process 92(2):401–416
5. Boashash B (2003) Time frequency signal analysis and processing: a comprehensive reference
6. NVIDIA Corporation (2007) NVIDIA compute unified device architecture (CUDA) programming guide version 1.1
7. NVIDIA Corporation (2012) CUDA CUFFT Library
8. Erol S (2011) Time-frequency analyses of tide-gauge sensor data. Sensors 11:3939–3961
9. Fligge M, Solanki SK, Beer J (1999) Determination of solar cycle length variations using the continuous wavelet transform. Astron Astrophys 483:313–321
10. Franco J, Bernabe G, Fernandez J, Ujaldon M (2011) The 2D wavelet transform on emerging architectures: GPUs and multicores. J Real-Time Image Process 99:1–8
11. Franco J, Bernabe G, Fernandez J, Ujaldon M (2010) Parallel 3D fast wavelet transform on manycore GPUs and multicore CPUs. In: International conference on computational science, pp 1101–1110
12. Govindaraju NK, Lloyd B, Dotsenko Y, Smith B, Manferdelli J (2008) High performance discrete Fourier transforms on graphics processors. In: Proceedings of supercomputing, pp 1–12
13. Gurley K, Kareem A (1999) Applications of wavelet transforms in earthquake, wind and ocean engineering. Eng Struct 21(2):149–167
14. Hambaba A (2012) Nonstationary statistical tests in time-scale space. In: IEEE aerospace conference proceedings, pp 373–379
15. Johnson RW (2012) Symmetrization and enhancement of the continuous Morlet transform for spectral density estimation. Int J Wavelets Multiresol Inf Process 10(1):1–12
16. Klein A, Sauer T, Jedynak A, Skrandies W (2006) Conventional and wavelet coherence applied to sensory-evoked electrical brain activity. IEEE Trans Biomed Eng 53:266–272
17. Kumar P, Foufoula-Georgiou E (1997) Wavelet analysis for geophysical applications. Rev Geophys 35(4):385–412
18. van der Laan WJ, Jalba AC, Roerdink JB (2011) Accelerating wavelet lifting on graphics hardware using CUDA. IEEE Trans Parallel Distrib Syst 22(1):132–146
19. Lachaux JP, Lutz A, Rudrauf D, Cosmelli D, Quyen MLV, Martinerie J, Varela F (2002) Estimating the time-course of coherence between single-trial brain signals: an introduction to wavelet coherence. Clin Neurophysiol 32:157–174
20. Li X, Yao X, FIEEE JRGJ, Fox J (2005) Computational neuronal oscillation with Morlet wavelet transform. In: Proceedings of 27th annual international conference of the IEEE engineering in medicine and biology society, pp 1–4
21. Li X, Yao X, Fox J, Jefferys JG (2007) Interaction dynamics of neuronal oscillations analysed using wavelet transforms. J Neurosci Methods 160:178–185
22. Liu CL (2010) A tutorial of the wavelet transform
23. Souza-Echer MP, Echer E, Nordemann DJR, Rigozo NR (2009) Multiresolution analysis of global surface-air temperature and solar activity relationship. J Atmos Solar-Terrestr Phys 71:41–44
24. Nickolls J, Dally WJ (2010) The GPU computing era. IEEE Micro 30(2):56–69
25. Park SG, Sim HJ, Lee HJ, Oh JE (2008) Application of nonstationary signal characteristics using wavelet packet transformation. J Mech Sci Technol 22(11):2122–2133
26. Pišoft P, Kalvova J, Brazdil R (2004) Cycles and trends in the Czech temperature series using wavelet transforms. Int J Climatol 24:1661–1670
27. Polygiannakis J, Preka-Papadema P, Moussas X (2003) On signal-noise decomposition of time-series using the continuous wavelet transform: application to sunspot index. Astr Soc 343:725–734
28. Tenllado C, Setoain J, Pinuel L, Tirado F (2008) Parallel implementation of the 2D discrete wavelet transform on graphics processing units: Filter Bank versus Lifting. IEEE Trans Parallel Distrib Syst 19(3):299–310
29. Torrence C, Compo GP (1998) A practical guide to wavelet analysis. Bull Am Meteorol Soc 79(1):61–78
30. Wang X, Deng J, Wang Z, Li C (2007) An adaptive analysis method for nonstationary Data_Empirical data decomposition. In: Third international conference on natural computation, pp 3–7
31. Wong TT, Leung CS, Heng PA, Wang J (2007) Discrete wavelet transform on consumer-level graphics hardware. IEEE Trans Multimedia 9(3):668–673
32. Xuan Z, Xie S, Sun Q (2010) The empirical mode decomposition process of nonstationary signals. In: International conference on measuring technology and mechatronics automation, pp 866–869