 SI: Data Intensive Computing
 Open Access
Massively parallel nonstationary EEG data processing on GPGPU platforms with Morlet continuous wavelet transform
Journal of Internet Services and Applications, volume 3, pages 347–357 (2012)
Abstract
Morlet continuous wavelet transform (MCWT) has been widely used to process nonstationary electroencephalogram (EEG) data. Processing EEG data with MCWT has become time-sensitive and data-intensive because problem domain sizes are growing quickly and experimental techniques keep advancing. In this paper, we propose a massively parallel MCWT approach based on GPGPU to address this challenge. The proposed approach treats MCWT as four main computing subprocedures and parallelizes each of them with CUDA. We focus on optimizing the FFT on GPUs to improve the performance of MCWT. Extensive experiments have been carried out on Fermi and Kepler GPUs and on a Fermi GPU cluster. The results indicate that (1) the proposed approach (especially on the Kepler GPU) achieves encouraging runtime performance for processing nonstationary EEG data compared with CPU-based MCWT, (2) the performance can be improved further on the GPU cluster, but a performance bottleneck exists when running multiple GPGPUs on one node, and (3) tuning an appropriate FFT radix is important to the performance of our MCWT.
Introduction
Most data from either natural phenomena or artificial sources are nonlinear and nonstationary in nature [30]. In the past decade, numerous works have focused on how to analyze nonstationary data efficiently [4, 14–16, 19–21, 30, 32]. Wavelet-transform-based approaches are the mainstream choice for nonstationary data because their multiresolution analysis across all scales [17] can extract short-lived transient information.
The approaches in the wavelet transform family are of two types: continuous wavelet transform (CWT) and discrete wavelet transform (DWT). The most notable difference lies in the highly redundant nature of CWT's analyzing functions, which are not restricted to an orthonormal basis, whereas DWT selects only those scales that provide an orthonormal basis. CWT's resolution in scale is therefore arbitrary, and CWT enables a more comprehensive analysis than DWT, even though CWT must be discretized for numerical evaluation [15]. CWT may rely on various mother wavelets, e.g., Morlet, Mexican hat and Paul [29]. Among these, CWT on the Morlet basis (MCWT) stands out for its capacity to mine transient characteristics hidden in nonstationary data [29] and is widely applied to EEG data analysis [4, 16, 19–21].
As the problem domain sizes and the experimental techniques for recording EEG activity have advanced quickly, i.e., with rapidly increasing numbers of channels and sampling frequencies, the density and spatial scale of the nonstationary EEG data to be analyzed have been increasing exponentially. For instance, in the analysis of the interaction dynamics of multiple neural oscillations [21], the number of electrodes has increased from tens to more than one thousand [4]. In practice, an MCWT application for EEG often requires real-time or near-real-time data analysis. As such, MCWT applications for EEG are now commonly time-sensitive and data-intensive.
Modern graphics processing units (GPUs) have evolved from configurable graphics processors into massively parallel many-core multiprocessors for rapidly solving data- and time-intensive problems [24]. In this paper, we explore the feasibility of using General-Purpose computation on Graphics Processing Units (GPGPU) to address the challenge of performance and scalability in EEG MCWT applications. A GPGPU-based MCWT has been developed for this purpose.
We first adapted an MCWT approach [22] to the many-core architecture of the GPU. The MCWT algorithm can be viewed as an integration of four computing subprocedures: (1) a forward fast Fourier transform (fFFT) of multichannel EEG data, (2) the transform of Morlet wavelets from the time domain to the frequency domain, (3) the inner product between the transformed data and the transformed Morlet wavelets, and (4) an inverse fast Fourier transform (iFFT) of the inner product results.
For the first subprocedure, we employed a coalesced GPU global memory access scheme [12] and a parallelization scheme with a two-dimensional GPU thread grid and one-dimensional thread blocks. Each row of the thread grid owns multiple blocks responsible for one channel of EEG data. For the second and third subprocedures, we handled multiple scale factors with a two-dimensional GPU thread block assignment method that exploits time-frequency-scale data parallelism. For the last subprocedure, we used a method similar to that of the first subprocedure (fFFT), except that the thread grid needs more rows because the number of data channels grows by a factor of S when S scale factors are used.
These GPGPU-based algorithms have been designed with NVIDIA CUDA [3] to map the subprocedures onto four different groups of massively parallel executions. In this study, we further investigated how to enhance the proposed MCWT approach by optimizing the FFT with a larger radix to reduce the time complexity [12].
A case study has been carried out to evaluate the performance of the proposed approach on GPUs of the Fermi [1] architecture and the latest Kepler [2] architecture using a large EEG data set. We tested the approach on a GPU cluster as well. The results indicate that the proposed approach achieves encouraging runtime performance for processing nonstationary EEG data compared with CPU-based MCWT. To the best of our knowledge, the proposed approach is the first massively parallel CWT aided by GPGPU. It should also be noted that the approach applies to other types of nonstationary data and is not specific to EEG.
The remainder of this paper is organized as follows: Sect. 2 presents some typical work related to processing nonstationary data. Section 3 introduces the conventional MCWT on CPU and proposes our GPGPU-based MCWT and its variants. Experiments and results are given in Sect. 4. We conclude the paper with a summary and present future work in Sect. 5.
Related work
In the past decade, numerous methods have emerged for processing nonstationary data [4, 8, 9, 13–16, 19–21, 23, 25–27, 30, 32]. The Short-Time Fourier Transform (STFT) and the Wigner–Ville distribution (WVD) [14] are two classic methods. STFT uses uniform time and frequency resolutions to analyze nonstationary signals, while WVD can extract cross-terms between various components of nonstationary signals. A common problem with both methods is that they often fail to discover transient phenomena in nonstationary data.
In addition to the above, Wang et al. proposed an adaptive analysis method called empirical data decomposition (EDD) [30]. The EDD algorithm also implements a multiresolution analysis similar to the wavelet transform. Xuan et al. [32] used empirical mode decomposition to decompose precipitation data and probe its nonstationary dynamics.
Compared with these approaches, wavelet-transform-based methods have a wider range of applications. For example, Gurley and Kareem proposed a series of methods based on CWT and DWT to process nonstationary data in earthquake, wind and ocean engineering [13]. Akhtar et al. developed a framework based on independent component analysis (ICA) and a DWT variant to correctly detect artifacts embedded in EEG data [4]. In that framework, ICA extracts artifact-only independent components from the EEG, and DWT then removes any cerebral activity from the extracted artifact components to obtain clean EEG data. The key difference is that our work focuses on CWT, which offers a more comprehensive ability to analyze nonstationary data.
MCWT has been used in various research areas and disciplines. Li et al. [20, 21] applied MCWT to analyze and quantify the instantaneous interaction dynamics between neuronal populations in order to understand the mechanism of epileptic seizures in EEG. Klein et al. [16] proposed an MCWT-based coherence analysis approach for monitoring time-dependent changes in the coherence among multichannel EEG signals. Fligge et al. [9] used MCWT to objectively determine the length of the sunspot cycle and to carry out error analysis on long-term solar activity records, e.g., sunspot number and sunspot area. MCWT has also been employed to extract two complementary wavelet skeleton spectra that discriminate the components of periodicities and of hierarchies of discontinuities in several large time series representing solar activity records [27]. Pišoft et al. [26] used MCWT to process four long Czech mean monthly temperature series from 1775 to 2001 and examined the temperature variability in the Czech Republic. None of these studies considered using GPU platforms to accelerate MCWT, as we do here.
Recent research has focused on using many-core platforms such as GPUs to accelerate wavelet transforms, and nearly all work in this direction targets DWT. For instance, Wong et al. implemented a two-dimensional DWT with Cg and OpenGL on a GeForce 7800 GTX [31]. Similarly, the authors of [28] explored the implementation of a fast 2D DWT with the Filter Bank Scheme (FBS) and the Lifting Scheme (LS) using Cg on the same GPU. With NVIDIA's CUDA [6], 2D DWT variants [10, 18] and a 3D DWT [11] have been implemented on GPUs. In contrast to these methods, we aim to significantly accelerate a CWT approach with the latest GPGPU technologies to better serve the needs of processing massive nonstationary EEG data.
Morlet continuous wavelet transform on GPGPU
In this section, we first present the MCWT algorithm as it operates on a CPU. We then detail the design of the GPGPU-based MCWT for multichannel EEG data.
Morlet continuous wavelet transform on CPU
Let the nonstationary EEG data be a discrete time series \(X = \{x_{n},\ n\in [0, N]\}\) with equal time spacing dt. A CWT algorithm computes the wavelet coefficients \(w(s,\tau )\) of the discrete time series X. Based on [21], \(w(s,\tau )\) is computed by convolving the analyzed series X with the wavelet function \(\psi (n)\), written \(\langle X, \psi _{\tau ,s}(n) \rangle \), that is:
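Writing the coefficients as \(w(s,\tau )\) and assuming the usual scaled and translated wavelet family with \(1/\sqrt{s}\) normalization (an assumption here, chosen to match the \(\sqrt{s}\) factor in the frequency-domain steps), the convolution takes the standard form:

\[
w(s,\tau ) = \langle X, \psi _{\tau ,s}(n) \rangle = \sum _{n=0}^{N} x_{n}\, \psi _{\tau ,s}^{*}(n), \qquad \psi _{\tau ,s}(n) = \frac{1}{\sqrt{s}}\, \psi _0\!\left( \frac{(n-\tau )\,dt}{s} \right)
\]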
where s and \(\tau \) represent the scale and translation, respectively, and “*” denotes complex conjugation. By tuning the value of the scale factor s, we can extract a set of different frequency components. Through a CWT, the information in X is then projected into a two-dimensional space (s and \(\tau \)) for further data analysis. For MCWT, the mother wavelet function \(\psi _0 (n)\) of the CWT is given by:
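In its commonly used normalized form (as in [29]; that the source uses the same normalization constant is an assumption), the Morlet mother wavelet reads:

\[
\psi _0 (n) = \pi ^{-1/4}\, e^{i\omega _0 n}\, e^{-n^{2}/2}
\]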
where \(\omega _0\) is the central angular frequency of the wavelet. Furthermore, based on the time-domain convolution theorem [5], the convolution of two series in the time domain can be computed indirectly as the inner product of their frequency-domain transforms. The convolution \(\langle X, \psi _{\tau ,s}(n) \rangle \) can therefore be computed in the following four steps:

Step 1. Transform X from the time domain to the frequency domain to generate a frequency series \(X(\omega )\) using the Fourier transform.

Step 2. Transform \(\psi _{\tau ,s}(n)\) from the time domain to the frequency domain to generate a frequency series \(\phi ^* (s\omega )\) by computing the angular frequencies.

Step 3. Compute the inner product of \(X(\omega )\) and \(\sqrt{s}\,\phi ^* (s\omega )\), denoted [\(X(\omega )\), \(\sqrt{s}\,\phi ^* (s\omega )\)], where \(\sqrt{s}\) is a factor for energy normalization across the different scales.

Step 4. Transform [\(X(\omega )\), \(\sqrt{s}\,\phi ^* (s\omega )\)] from the frequency domain back to the time domain to obtain \(w(s,\tau )\) using the inverse Fourier transform.
Thus, \(w(s,\tau )\) can be computed with the following formula:
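Reusing the notation of Steps 1 to 4 (a compact restatement of those steps, not necessarily the original typesetting), this can be written as:

\[
w(s,\tau ) = \mathrm{IFT}\left( \left[ X(\omega ),\ \sqrt{s}\,\phi ^{*}(s\omega ) \right] \right)
\]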
where IFT denotes the inverse Fourier transform. More specifically, the MCWT algorithm on CPU (namely “MCWT-CPU”) for multiple-channel EEG data is shown in Algorithm 1.
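The four steps can be sketched for a single channel in plain Python. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: the function names (`fft`, `morlet_hat`, `mcwt_one_channel`) are hypothetical, the recursive radix-2 FFT merely stands in for an optimized library, and the frequency-domain Morlet wavelet is taken in its common analytic form (real-valued, so conjugation leaves it unchanged).

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    sign = 1.0 if inverse else -1.0
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def morlet_hat(scaled_omega, omega0=6.0):
    """Assumed frequency-domain Morlet wavelet; zero for non-positive frequencies."""
    if scaled_omega <= 0.0:
        return 0.0
    return math.pi ** -0.25 * math.exp(-0.5 * (scaled_omega - omega0) ** 2)


def mcwt_one_channel(x, scales, dt=1.0, omega0=6.0):
    """Return w[scale_index][tau] for one channel via Steps 1 to 4."""
    n = len(x)
    # Step 1: forward FFT of the data.
    x_hat = fft([complex(v) for v in x])
    # Angular frequency of each FFT bin (negative in the upper half).
    omegas = [2.0 * math.pi * (k if k <= n // 2 else k - n) / (n * dt)
              for k in range(n)]
    coeffs = []
    for s in scales:
        # Steps 2 and 3: wavelet in the frequency domain, then pointwise product
        # with the sqrt(s) energy normalization.
        product = [x_hat[k] * math.sqrt(s) * morlet_hat(s * omegas[k], omega0)
                   for k in range(n)]
        # Step 4: inverse FFT; dividing by n normalizes the unscaled transform.
        coeffs.append([v / n for v in fft(product, inverse=True)])
    return coeffs
```

For a pure cosine, the coefficient magnitude peaks at the scale where \(s\omega = \omega _0\), which is a quick sanity check on any implementation of the steps.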
As an example, consider an EEG data set with m channels and n data points per channel (n is assumed to be a power of 2). The EEG data can be written as a matrix E[m][n], and we denote the l scale factors as a vector S[l]. Since the data segment in each channel is scaled l times when processed with MCWT, the wavelet coefficients are written as a three-dimensional array W[l][m][n], as illustrated in Fig. 1.
GPU-based Morlet continuous wavelet transform (MCWT-GPU)
Based on Algorithm 1, parallelism exists in the following main subprocedures:

1. Forward 1D FFT (fFFT) of multiple-channel EEG data (lines 2–4 in Algorithm 1).

2. The transform of Morlet wavelets from the time domain to the frequency domain under multiple channels and different scale factors (lines 5–11).

3. The inner product of \(X(\omega )\) and \(\sqrt{s}\,\phi ^* (s\omega )\) (lines 12–18).

4. Inverse FFT (iFFT) of multiple-channel data under different scale factors (lines 19–23).
We propose a parallelized MCWT (namely “MCWT-GPU”), shown in Algorithm 2.
Parallelizing subprocedures based on FFT
One of the key issues in a GPU-based algorithm is the design of thread blocks to maximize the exploitation of data parallelism. Subprocedure (1) processes multichannel data with fFFT, while subprocedure (4) executes iFFT on multichannel data under various scale factors. Two schemes of thread assignment have been proposed for parallelizing fFFT and iFFT, as illustrated in Fig. 2.
The first scheme (see Fig. 2a) arranges all thread blocks in a two-dimensional grid. Within each block, all threads are indexed along one dimension. Section 4.1 details how the number of threads per block may affect the performance. For fFFT, a thread block b(x, y) is responsible for processing the xth data segment of the yth channel, with x\(\in \)[0, d) and y\(\in \)[0, m). For iFFT (see Fig. 2b), the thread blocks are also arranged in two dimensions, but a thread block is referenced as b(x, y\(\times \)z) and deals with the xth data segment of the yth channel under a scale factor z\(\in \)[0, s).
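The block-to-data mapping described above can be illustrated with a small Python sketch. The function names are hypothetical, and the way a grid row encodes the (channel, scale) pair for iFFT is an assumption (scale-major order, matching the W[l][m][n] layout); the text only states that the row dimension covers y\(\times \)z.

```python
def grid_shape(num_segments, num_channels, num_scales=1):
    """(columns, rows) of the block grid; num_scales=1 gives the fFFT grid."""
    return (num_segments, num_channels * num_scales)


def ffft_block_task(block_x, block_y):
    """fFFT: block b(x, y) handles data segment x of channel y."""
    return {"segment": block_x, "channel": block_y}


def ifft_block_task(block_x, block_row, num_channels):
    """iFFT: decode a grid row into (channel, scale).

    Assumed layout (not confirmed by the source):
    block_row = scale * num_channels + channel.
    """
    scale, channel = divmod(block_row, num_channels)
    return {"segment": block_x, "channel": channel, "scale": scale}
```

With 64 channels, 4 segments per channel and 20 scale factors, `grid_shape(4, 64)` gives a 4 by 64 grid for fFFT and `grid_shape(4, 64, 20)` gives 4 by 1280 for iFFT, which reflects the larger row count needed for the inverse transform.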
NVIDIA provides CUFFT, an official parallel FFT library for GPUs [7]. An alternative FFT has also been proposed in [12], which optimizes FFT performance via three efficient memory access schemes: (1) shared memory is used when the data set is small, (2) a coalesced global memory access scheme is employed for large data sets, and (3) a hierarchical memory access method computes the FFT of a large data stream by combining the FFTs of smaller pieces held in shared memory. In this paper, we use the coalesced global memory access scheme to develop the GPGPU-aided MCWT.
Parallelizing other subprocedures
The other subprocedures, i.e., transforming Morlet wavelets from the time domain to the frequency domain and computing the inner product, are similar to each other. We therefore designed one thread assignment scheme for both subprocedures, as shown in Fig. 3.
Similar to the design of thread blocks for fFFT, the scheme first sets up a two-dimensional grid of d\(\times \)m thread blocks. A total of d blocks cooperate to process the data in one channel. Within a thread block, the scheme indexes the intra-block threads along two dimensions to deal with data under different scale factors. Taking Thread(x, y) as an example, the first dimension x denotes the xth data segment, while the second dimension y means that the data is processed under the scale factor y.
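To make the two-dimensional thread indexing concrete, the following Python sketch emulates the grid sequentially for the inner product step. It is an illustration under assumptions rather than the paper's kernel: `inner_product_grid` and `threads_x` are hypothetical names, and the first thread dimension is interpreted here as the position of one data element within the block's part of the channel.

```python
import math


def inner_product_grid(x_hat, wavelet_hat, scales, threads_x):
    """Emulate a d x m grid of (threads_x x len(scales)) thread blocks.

    x_hat[channel][k]       : transformed data
    wavelet_hat[scale_i][k] : transformed wavelet for scales[scale_i]
    Returns out[scale_i][channel][k].
    """
    m, n = len(x_hat), len(x_hat[0])
    d = (n + threads_x - 1) // threads_x  # blocks per channel
    out = [[[0j] * n for _ in range(m)] for _ in scales]
    for block_y in range(m):                       # one grid row per channel
        for block_x in range(d):                   # d blocks share a channel
            for thread_y, s in enumerate(scales):  # second thread dimension: scale
                for thread_x in range(threads_x):  # first thread dimension: data index
                    k = block_x * threads_x + thread_x
                    if k < n:  # guard for a partially filled last block
                        out[thread_y][block_y][k] = (
                            x_hat[block_y][k] * math.sqrt(s) * wavelet_hat[thread_y][k]
                        )
    return out
```

On a GPU the four loops would run concurrently as blocks and threads; the bounds check on `k` corresponds to the usual guard for a last block that is only partially filled.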
Parallelism analysis
Given a fixed-size workload of m-channel EEG data, we analyze the speedup ratio of the proposed MCWT. Let the time costs of processing the fixed-size workload on CPU and GPU be \(T_\mathrm{CPU}\) and \(T_\mathrm{GPU}\), respectively. Thus, the speedup factor is:
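With the two times just defined, the ratio can be written as follows (the label \(\mathrm{Speedup}\) is an assumed name):

\[
\mathrm{Speedup} = \frac{T_\mathrm{CPU}}{T_\mathrm{GPU}}
\]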
Based on Algorithm 1, \(T_\mathrm{CPU}\) is the total time of processing the m-channel EEG data in sequence. So:
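Assuming every channel incurs the same per-subprocedure cost (an assumption of this sketch), the sequential total takes the form:

\[
T_\mathrm{CPU} = m \times \left( \mathrm{FFT} + \mathrm{Transfer} + \mathrm{InnerProduct} + \mathrm{iFFT} \right)
\]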
where FFT, Transfer, InnerProduct and iFFT represent the times to execute the four subprocedures for one channel of data.
Let \(N_{c}\) be the number of cores in one GPU card. \(T_\mathrm{GPU}\) consists of two parts. One part is the parallel execution time on the GPU for processing the data channel with the maximum time cost. The other is the time for GPU thread synchronization. So, \(T_\mathrm{GPU}\) can be represented as:
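One plausible reading of this description, offered as an assumption rather than the source's exact expression, divides the most expensive per-channel work across the exploited cores and adds the synchronization cost:

\[
T_\mathrm{GPU} = \frac{m \times \max \left( \mathrm{FFT} + \mathrm{Transfer} + \mathrm{InnerProduct} + \mathrm{iFFT} \right)}{\alpha } + \mathrm{Sync}
\]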
where \(\alpha \) is the number of available GPU cores and \(\alpha \in [1, N_{c}]\). The value of \(\alpha \) depends on how many GPU cores can be exploited to execute MCWT in parallel. Sync is the time spent synchronizing the GPU threads.
Optimizing FFT on GPU
We further investigated how to improve the GPGPU-aided approach by optimizing the FFT algorithm. Algorithm 3 illustrates the global memory FFT for processing one channel of EEG data, as suggested in [12]. The value of the FFT's radix is set to 2.
In Algorithm 3, the FFT_GPU() function needs to be called \(\log _R N\) times to process data of length N (line 3).
Since each call incurs high data transfer costs, using a bigger FFT radix R reduces the number of calls to Call_FFT_GPU(). This is one way to improve the FFT, especially for large data. However, the algorithm only describes the implementation of a radix-2 FFT (line 18). This motivates us to propose an FFT with radix R (R\(>\)2) on GPU, described in Algorithm 4, where float2 is used to represent a complex number.
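The effect of the radix on the number of passes follows directly from \(\log _R N\). The short Python sketch below (the helper name `fft_passes` is hypothetical, and the 4096-point length is chosen only for illustration) counts the passes for several radices.

```python
def fft_passes(n, radix):
    """Number of radix-`radix` passes for an n-point FFT, i.e. log_radix(n)."""
    passes, size = 0, 1
    while size < n:
        size *= radix
        passes += 1
    if size != n:
        raise ValueError("n must be an exact power of the radix")
    return passes


# Passes for a 4096-point transform: {2: 12, 4: 6, 8: 4, 16: 3}
PASSES_4096 = {radix: fft_passes(4096, radix) for radix in (2, 4, 8, 16)}
```

Moving from radix 2 to radix 4 halves the number of passes, while each pass has to do more work per thread, which is the trade-off examined in the radix experiments.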
With these modifications to the FFT algorithm, we further accelerate MCWT on GPGPU for larger data.
Experiments and results
We have evaluated the performance of the proposed MCWT methods on a large EEG data set using various platforms powered by cutting-edge NVIDIA GPUs and high-performance networks. The experiments focus on the execution times of these GPGPU-aided MCWT methods.
Experiment setup
Data set
We chose an EEG dataset for testing, which was obtained from a patient with epilepsy using 64 sampling channels with a frequency of 1,600 Hz for one hour. In total, the EEG has \(64 \times 3{,}600 \times 1{,}600\) data points.
Given that an MCWT uses M scale factors, each point in the EEG signal occupies at least 2\(\times \)M times its original space in GPU memory. This is because the value of a data point must first be converted to a complex number and is then scaled M times when processed on a GPU. As a result, a GPGPU-aided MCWT algorithm is likely to exhaust the GPU memory.
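A rough footprint estimate shows why memory runs out. The Python sketch below assumes single-precision complex values of 8 bytes each (consistent with a float2 representation, but an assumption for this estimate) and a hypothetical helper name `mcwt_device_bytes`; it counts only the scaled coefficients, not temporary buffers.

```python
def mcwt_device_bytes(channels, seconds, sample_rate_hz, num_scales,
                      bytes_per_complex=8):
    """Bytes needed to hold complex coefficients for every point and scale."""
    points = channels * seconds * sample_rate_hz
    return points * num_scales * bytes_per_complex


# The data set of this study: 64 channels, one hour, 1,600 Hz.
POINTS = 64 * 3600 * 1600                                # 368,640,000 data points
BYTES_20_SCALES = mcwt_device_bytes(64, 3600, 1600, 20)  # 58,982,400,000 bytes
```

With 20 scale factors the coefficients alone would need about 59 GB, far more than the few gigabytes available on a single GPU card of that generation, so the data has to be processed in segments.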
Testbed configurations
We chose two Fermi GPUs and a Kepler GPU, installed in three individual computers, to evaluate the GPGPU-aided methods. Table 1 lists the major configurations of the three computers.
The proposed approach has also been evaluated on a GPU cluster, illustrated in Fig. 4. The GPU cluster consists of one master node for management and three slave nodes for executing computing tasks. These four nodes are connected via a 1 Gbps Ethernet network and a 40 Gbps InfiniBand network, and each node is equipped with four Tesla C2050 GPU cards, eight E5620 CPU cores and 24 GB of host memory.
Specification of the number of threads in a block
We first explored whether and how the number of threads in a thread block affects the runtime performance for a given scale on a GPU. For this purpose, we executed MCWT-GPU to process a data segment on a GPU and varied the number of threads while measuring the execution times. As suggested in [3], the numbers of threads per thread block were specified as powers of two, up to a maximum of \(2^{10}\). Figure 5 highlights the results with the scale set to 20 on GTX580, Tesla C2050, and GTX680. For the three GPUs, the numbers of threads that achieve the optimal performance are \(2^{7}\), \(2^{8}\), and \(2^{9}\), respectively. In the experiments examining runtime efficiency, we always identified the optimal number of threads for each scale on a given GPU and used that setting for performance evaluation.
Runtime efficiency
In the following experiments, we first compared the execution times for processing the whole EEG data set with MCWT-GPU on a single GPU, based on CUFFT (MCWT-SGC) and on the optimized FFT of [12] (MCWT-SGO), against the sequential CPU-based MCWT method (MCWT-C) introduced in Sect. 3.1. We also measured the run times for processing this EEG data set on the GPU cluster (MCWT-GC). Finally, we evaluated the performance of the optimized GPGPU-aided MCWT.
Execution times on single GPUs
For this set of experiments, the execution time includes the time required to copy data from host to device and vice versa. The parameter \(\omega _0 \) of MCWT is set to 6 as an optimal value for adjusting the time-frequency resolution [21].
As shown in Table 2, compared with MCWT-C, MCWT-SGC achieves speedups ranging from 9.3 to 10.1 on computer #1, from 8.0 to 6.1 on computer #2 and from 13.1 to 7.5 on computer #3 as the scale increases. The declining speedups on computers #2 and #3 show that MCWT-SGC faces a performance bottleneck when processing large data.
In contrast, MCWT-SGO achieves higher speedups than MCWT-SGC. When scale = 40, MCWT-SGO is 11.5, 5.3 and 11.1 times as fast as MCWT-SGC. Clearly, MCWT-SGO dramatically outperforms MCWT-SGC when dealing with relatively large data. We attribute this improvement to the optimized FFT. In other words, the FFT is a key factor in the overall performance of MCWT. With the assistance of the coalesced global memory access scheme, MCWT-SGO needs to access the global memory much less often than its counterpart, especially when dealing with large data.
MCWT-C performs better on Computer #2 than on Computer #3. This is because Computer #2 has a much larger main memory and can therefore handle the high memory demand of the MCWT algorithm.
MCWT-SGO performs better on Computer #3 than on the other platforms. This is because the GTX 680 adopts Kepler, the latest GPU architecture. The memory subsystem of the Kepler architecture has been completely revamped, resulting in a 6,008 MHz data rate, and the GTX 680 offers the highest memory clock speed of any GPU in the industry [2]. The proposed method, with its optimized memory access scheme, can thus exploit the new features of the Kepler memory subsystem. The results again indicate that the memory access scheme is important to the performance of GPGPU-aided MCWT.
Execution times on the GPU cluster
In this set of experiments, we distributed the whole EEG data set equally across the three slave nodes of the GPU cluster and processed the parts in parallel. On each slave node, the data are processed simultaneously using GPU-aided MCWT with various numbers of GPUs. In our setting, the master node is the management node that controls the parallel tasks, while the slave nodes are the computing nodes.
For comparison, the results of MCWT-SG on computer #1 with a Tesla C2050 GPU card (shown in Table 1) are used as a baseline.
In Fig. 6, MCWT-GC-2(3, 4)GPUs/Node means that each node uses two (three, four) GPUs to run MCWT-GCO. The results indicate that MCWT-GC-2GPUs/Node achieves a speedup of 2.6 compared with MCWT-SGO. This is reasonable, as more nodes were used to process the data set in parallel.
However, we observed that MCWT-GC-3GPUs/Node is only 1.4 times as fast as MCWT-GC-2GPUs/Node, and MCWT-GC-4GPUs/Node is only 1.1 times as fast as MCWT-GC-3GPUs/Node. This is because multiple GPUs on the same node contend for PCIe bus resources.
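The diminishing returns can be quantified with a short calculation. The Python sketch below assumes that the reported step ratios compose multiplicatively and that the baseline is a single GPU; the helper name `cumulative_speedups` is hypothetical.

```python
def cumulative_speedups(step_ratios):
    """Turn successive step ratios into speedups over the single-GPU baseline."""
    total, result = 1.0, []
    for ratio in step_ratios:
        total *= ratio
        result.append(total)
    return result


# Reported ratios: 2 GPUs/node vs. single GPU, 3 vs. 2 GPUs/node, 4 vs. 3 GPUs/node.
SPEEDUPS = cumulative_speedups([2.6, 1.4, 1.1])
# Three slave nodes, so 6, 9 and 12 GPUs in total.
GPU_COUNTS = [3 * gpus_per_node for gpus_per_node in (2, 3, 4)]
EFFICIENCY = [s / g for s, g in zip(SPEEDUPS, GPU_COUNTS)]
```

Under these assumptions the overall speedup with four GPUs per node is about 4.0 on 12 GPUs, so the parallel efficiency falls from roughly 43% with six GPUs to roughly 33% with twelve.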
Evaluating the improved GPGPU-aided MCWT
To study the effect of a larger FFT radix on MCWT, we executed MCWT-SGO multiple times to process an EEG data segment (6 s) under scale 30, with a different FFT radix setting in each run.
Figure 7 shows that MCWT-SGO performs better with a radix-4 FFT than with a radix-2 FFT. However, its performance becomes worse when the FFT radix increases from 4 to 8 and 16. The results indicate that selecting an appropriate radix value is important to MCWT-SGO's performance. With a large radix, MCWT iterates fewer times, but the workload of a single iteration may become excessively heavy. If that workload exceeds the parallel processing capacity of the GPU, the performance of MCWT-SGO decreases.
Conclusion and future work
In this paper, we proposed a parallel MCWT with GPGPU platforms to address the challenge of analyzing massive nonstationary EEG data in an efficient and scalable manner.
The proposed approach adapts the parallelism embedded in the MCWT algorithm to the many-core architecture of the GPU using the CUDA platform. The MCWT algorithm has been separated into four main subprocedures, covering the FFT, the transform of Morlet wavelets from the time domain to the frequency domain, and the inner product. These subprocedures have been parallelized using various schemes. An improved version of the GPGPU-aided approach has been proposed as well.
A case study of EEG data analysis has also been performed to examine the proposed approach and its performance. Different GPUs have been used for this purpose, including devices of the Fermi and Kepler architectures. Furthermore, we evaluated our approach on a GPU cluster. Finally, we assessed the impact of the FFT radix on GPGPU-aided MCWT.
Experimental results show that (1) MCWT-SGO can significantly outperform MCWT-C and MCWT-SGC, especially on the Kepler GPU, (2) MCWT-GC can further improve the performance, but a performance bottleneck exists when running multiple GPGPUs on one node, and (3) tuning an appropriate FFT radix is important to MCWT-SGO.
In the future, we plan to study how to resolve the bottleneck of MCWT-GC when multiple GPGPUs run on one node.
References
 1.
NVIDIA Corporation. Nvidia next generation cuda compute architecture: Fermi (2009)
 2.
NVIDIA Corporation. Whitepaper NVIDIA GeForce GTX 680 (2012)
 3.
NVIDIA CUDA C programming guide version 4.2 (2012)
 4.
Akhtar MT, Mitsuhashi W, James CJ (2012) Employing spatially constrained ICA and wavelet denoising, for automatic removal of artifacts from multichannel EEG data. Signal Process 92(2):401–416
 5.
Boashash B (2003) Time frequency signal analysis and processing: a comprehensive reference
 6.
Corporation N (2007) Nvidia compute unified device architecture (CUDA) programming guide version 1.1
 7.
Corporation N (2012) CUDA CUFFT Library
 8.
Erol S (2011) Time-frequency analyses of tide-gauge sensor data. Sensors 11:3939–3961
 9.
Fligge M, Solanki SK, Beer J (1999) Determination of solar cycle length variations using the continuous wavelet transform. Astron Astrophys 483:313–321
 10.
Franco J, Bernabe G, Fernandez J, Ujaldon M (2011) The 2D wavelet transform on emerging architectures: GPUs and multicores. J Real-Time Image Process 99:1–8
 11.
Franco J, Bernabe G, Fernandez J, Ujaldon M (2010) Parallel 3D fast wavelet transform on manycore GPUs and multicore CPUs. In: International conference on computational science, pp 1101–1110
 12.
Govindaraju NK, Lloyd B, Dotsenko Y, Smith B, Manferdelli J (2008) High performance discrete Fourier transforms on graphics processors. In: Proceedings of supercomputing, pp 1–12
 13.
Gurley K, Kareem A (1999) Applications of wavelet transforms in earthquake, wind and ocean engineering. Eng Struct 21(2):149–167
 14.
Hambaba A (2012) Nonstationary statistical tests in time-scale space. In: IEEE aerospace conference proceedings, pp 373–379
 15.
Johnson RW (2012) Symmetrization and enhancement of the continuous Morlet transform for spectral density estimation. Int J Wavelets Multiresol Inf Process 10(1):1–12
 16.
Klein A, Sauer T, Jedynak A, Skrandies W (2006) Conventional and wavelet coherence applied to sensory-evoked electrical brain activity. IEEE Trans Biomed Eng 53:266–272
 17.
Kumar P, Foufoula-Georgiou E (1997) Wavelet analysis for geophysical applications. Rev Geophys 35(4):385–412
 18.
van der Laan WJ, Jalba AC, Roerdink JB (2011) Accelerating wavelet lifting on graphics hardware using CUDA. IEEE Trans Parallel Distrib Syst 22(1):132–146
 19.
Lachaux JP, Lutz A, Rudrauf D, Cosmelli D, Quyen MLV, Martinerie J, Varela F (2002) Estimating the time-course of coherence between single-trial brain signals: an introduction to wavelet coherence. Clin Neurophysiol 32:157–174
 20.
Li X, Yao X, Jefferys JGR, Fox J (2005) Computational neuronal oscillation with Morlet wavelet transform. In: Proceedings of 27th annual international conference of the IEEE engineering in medicine and biology society, pp 1–4
 21.
Li X, Yao X, Fox J, Jefferys JG (2007) Interaction dynamics of neuronal oscillations analysed using wavelet transforms. J Neurosci Methods 160:178–185
 22.
Liu CL (2010) A tutorial of the wavelet transform
 23.
Souza-Echer MP, Echer E, Nordemann DJR, Rigozo NR (2009) Multiresolution analysis of global surface-air temperature and solar activity relationship. J Atmos Solar-Terrestr Phys 71:41–44
 24.
Nickolls J, Dally WJ (2010) The GPU computing era. IEEE Micro 30(2):56–69
 25.
Park SG, Sim HJ, Lee HJ, Oh JE (2008) Application of nonstationary signal characteristics using wavelet packet transformation. J Mech Sci Technol 22(11):2122–2133
 26.
Pišoft P, Kalvova J, Brazdil R (2004) Cycles and trends in the Czech temperature series using wavelet transforms. Int J Climatol 24:1661–1670
 27.
Polygiannakis J, Preka-Papadema P, Moussas X (2003) On signal-noise decomposition of time-series using the continuous wavelet transform: application to sunspot index. Mon Not R Astron Soc 343:725–734
 28.
Tenllado C, Setoain J, Pinuel L, Tirado F (2008) Parallel implementation of the 2D discrete wavelet transform on graphics processing units: Filter Bank versus Lifting. IEEE Trans Parallel Distrib Syst 19(3):299–310
 29.
Torrence C, Compo GP (1998) A practical guide to wavelet analysis. Bull Am Meteorol Soc 79(1):61–78
 30.
Wang X, Deng J, Wang Z, Li C (2007) An adaptive analysis method for nonstationary data: empirical data decomposition. In: Third international conference on natural computation, pp 3–7
 31.
Wong TT, Leung CS, Heng PA, Wang J (2007) Discrete wavelet transform on consumer-level graphics hardware. IEEE Trans Multimedia 9(3):668–673
 32.
Xuan Z, Xie S, Sun Q (2010) The empirical mode decomposition process of nonstationary signals. In: International conference on measuring technology and mechatronics automation, pp 866–869
Acknowledgments
This work is funded in part by the National Science Fund for Distinguished Young Scholars (grant No. 61025019), the National Natural Science Foundation of China (grant No. 61272314), the Program for New Century Excellent Talents in University (NCET-11-0722), the Fundamental Research Funds for the Central Universities (No. CUGL100608, No. CUGL100231, No. G1323511175 and G1323521289, China University of Geosciences, Wuhan), the Specialized Research Fund for the Doctoral Program of Higher Education (grant No. 20110145110010), the Programme of High-Resolution Earth Observing System (China), and the Hundred University Talent of Creative Research Excellence Programme (Hebei, China).
Deng, Z., Chen, D., Hu, Y. et al. Massively parallel nonstationary EEG data processing on GPGPU platforms with Morlet continuous wavelet transform. J Internet Serv Appl 3, 347–357 (2012). https://doi.org/10.1007/s1317401200711
Keywords
 Morlet continuous wavelet transform
 EEG data
 GPGPU