Introduction
The purpose of this tutorial is to provide basic commentary on our preprocessing protocol. Each case we handle is treated differently. This is an example protocol that is typical for a grid subject.
Contents
Our current protocol (07/12/2017) can be segmented into three general sections - Standard preprocssing - Signal processing methods that are standard across all subjects - Includes resampling, filtering, and fieldtrip setup - Visual inspection - Artifact rejection, channel rejection, event rejection - Rereferencing - Reducing data dependencies
Standard preprocessing in FieldTrip
% Specify input and outputs for signal file inpath = '0_Data/IR53/sig/IR53_raw.besa'; outpath = '0_Data/IR53/sig/IR53_FT.mat';
Resample signal
Most datasets come in at 5000 Hz sample rate. We downsample to 1000 Hz (saves computation time; standard for the field)
% Load the data into fieldtrip friendly format
cfg = [];
cfg.dataset = inpath;
dat = ft_preprocessing(cfg);
processing channel { 'DC01' 'DC02' 'DC03' 'DC04' 'ROF1' 'ROF2' 'ROF3' 'ROF4' 'ROF5' 'ROF6' 'ROF7' 'ROF8' 'ROF9' 'ROF10' 'ROF11' 'ROF12' 'ROF13' 'ROF14' 'ROF15' 'ROF16' 'RFP1' 'RFP2' 'RFP3' 'RFP4' 'RFP5' 'RFP6' 'RFP7' 'RFP8' 'RFP9' 'RFP10' 'RFP11' 'RFP12' 'RFP13' 'RFP14' 'RFP15' 'RFP16' 'RST1' 'RST2' 'RST3' 'RST4' 'RST5' 'RST6' 'RST7' 'RST8' 'RST9' 'RST10' 'RST11' 'RST12' 'FZ' 'CZ' 'PZ' 'OZ' 'C3' 'C4' 'RUE' 'RLE' 'LUE' 'LLE' 'EKG' 'REF' 'RT1' 'RT2' 'RT3' 'RT4' 'RT5' 'RT6' 'RT7' 'RT8' 'RT9' 'RT10' 'RT11' 'RT12' 'RT13' 'RT14' 'RT15' 'RT16' 'RT17' 'RT18' 'RT19' 'RT20' 'RT21' 'RT22' 'RT23' 'RT24' 'RT25' 'RT26' 'RT27' 'RT28' 'RT29' 'RT30' 'RT31' 'RT32' } reading and preprocessing reading and preprocessing trial 1 from 1 the call to "ft_preprocessing" took 4 seconds and required the additional allocation of an estimated 2610 MB
Resample the data.
detrend:
Detrend at the experiment level (now) to account for large electrode drifts that can be seen in the signal. I also think that it is important to remove drifts before any band-filtering.
demean:
Demean at trial level (later) so ERPs plot cleanly. I don't run many amplutude sensitive analyses so it doesn't make a huge differences for me.
cfg = []; cfg.resamplefs = 1000; cfg.detrend = 'yes'; cfg.demean = 'no'; dat_res = ft_resampledata(cfg,dat); hdr = dat_res.hdr; hdr.Fs = 1000; hdr.nSamples = length(dat_res.trial{1});
the input is raw data with 92 channels and 1 trials the call to "ft_selectdata" took 0 seconds and required the additional allocation of an estimated 0 MB resampling data resampling data in trial 1 from 1 original sampling rate = 5000 Hz new sampling rate = 1000 Hz the call to "ft_resampledata" took 14 seconds and required the additional allocation of an estimated 508 MB
Filter Data
Bandpass Filter:
Why you shouldn't bandpass here
1) The processed dataset will be unusable for ERPs
2) You will obscure visual artifacts
3) You can always do it later
Why should you bandpass here?
1) You want to bandpass low frequency data for small time windows (there is A LOT to consider if you are doing that analysis anyway)
When should you bandpass?
Before any oscillatory analysis
What should you set as your bandpass?
[1,200 Hz] is what I'm generally comfortable with for a general bandpass. Some datasets will have high frequency noise around 100-150 Hz, if that is the case you may want to set your upper bound to a more conservative number
Bandstop Filter (aka Notch Filter):
Almost every dataset has significant electrical line noise at 60, 120, 180 Hz. Applying a notch filter will remove the noise in most cases. If line noise persists, I I throw the channel out if I am looking ERPs or >30 Hz oscillations.
cfg = []; cfg.preproc.bpfilter = 'no'; % Bandpass cfg.preproc.bpfreq = [1 200]; cfg.preproc.bsfilter = 'yes'; % Notch cfg.preproc.bsfreq = [58 62; 118 122; 178 182]; dat_res_filt = ft_preprocessing(cfg,dat_res); %Use resampled data
Warning: the trial definition in the configuration is inconsistent with the actual data Warning: reconstructing sampleinfo by assuming that the trials are consecutive segments of a continuous recording the call to "ft_selectdata" took 0 seconds and required the additional allocation of an estimated 522 MB preprocessing preprocessing trial 1 from 1 the call to "ft_preprocessing" took 0 seconds and required the additional allocation of an estimated 522 MB
Select channels to remove
Our current system is to compile notes on a spreadsheet. For some datasets (SEEG), Knight Lab will add notes to a shared file on their server. I pull that to our server and add additional notes. On that spreadsheets we mark channels to remove as well as electrode regional placements by various labelling schema (automatic, Knight Lab placements, Badre lab placement)
import_loc_notes_IR53_2; % function for loading .xlsx file with labels rem_ind = (OutofBrain)|(Epileptic)|(Discard); % labels for removed electrodes rem = strcat('-',Electrode(find(rem_ind))); keep = Electrode(find(~rem_ind)); elec_str = union(rem,keep); cfg.channel = ft_channelselection(elec_str,hdr.label);
Visual Inspection
View databrowser
This view mode is best for evaluating ictal artifacts and corrupted channels
Things to look for
1) Ictal artifacts
Ictal artifacts are screened by Knight Lab (usually twice). They will provide a list of artifacts that should be removed. Generally they choose to remove any electrode that is the source of the ictal event. If you want to be ultra conservative, you can remove the spread of the events as well (although usually they are considered potential source as well)
What to look for
- Sharp waveform
- Spreads across neighboring electrodes
- May occur several time on the same electrode
What to do
- Remove channel
- Remove event
2) High frequency noise
High frequency noise usually results from a bugged electrode, environmental noise, or unusual electrode placement (out of brain, on machine, etc.).
What to look for
-Consistent 'rippling'
-High frequency power on PSDs
What to do
- Throwout electrode
-Lowpass filter (I don't do this if I plan on rereferencing later).
3) Persistent line noise
Since we've already applied a notch filter, line noise at this point is significantly corruptive. There are more creative ways to remove line noise but at this point I tend to throw the channel out.
What to look for
- Unnatural oscillations that is always present (don't confuse with beta)
- PSD spikes
What to do
- Throwout electrode
- Advanced notch filters
% Use fieldtrip databrowser cfg.viewmode = 'vertical'; cfg.ylim = [-75,75]; new_cfg = ft_databrowser(cfg,dat_res_filt);
the input is raw data with 92 channels and 1 trials Warning: the data has been resampled, not showing the events detected 0 visual artifacts the call to "ft_databrowser" took 2 seconds and required the additional allocation of an estimated 521 MB
Remove eliminated channels
We have already removed channels that are marked in the .xlsx file; this next step removes channels that we specified using the databrowser. Note removed channels from the databrowser are not automatically reflected in the .xlsx sheet, and should be edited appropriately.
vec = zeros(size(dat_res_filt.cfg.channel)); for cid = 1:length(new_cfg.channel); vec(strmatch(new_cfg.channel{cid},dat_res_filt.cfg.channel,'exact')) = 1; end new_mat = dat_res_filt.trial{1}; new_mat = new_mat(find(vec),:); dat_res_filt.trial{1} = new_mat; dat_res_filt.label = new_cfg.channel;
Segment dataset into trials
Trial definition is kind of a pain right now. Check out my tutorial on event encoding to see how we encode event onset times to get a better idea of why.
At this point we have a vector of event onsets time compiled by my encocder.
load('/home/bfrick/Documents/MATLAB/ECoG_Pipeline/0_Data/IR53/sig/IR53_raw.mat', 'events_raw');
Since our trial length we are analyzing is [-250, 500] ms I choose a conservative [-1000 1000] ms. All the time window maneuvering is done during signal processing and data analysis. No need to constrain my window at this point.
pretrig = 1; % -1000 ms posttrig = 1; % +1000 ms Fs = 1000; % sample rate trl = []; si = []; for lid = 1:length(events_raw) new_trl = [events_raw(lid)-pretrig*Fs,events_raw(lid)+posttrig*Fs,pretrig*Fs ]; % [beginning, end, 0] new_si = [events_raw(lid)-pretrig*Fs,events_raw(lid)+posttrig*Fs]; % [beginning, end] trl = [trl; new_trl]; si = [si;new_si]; end cfg = []; cfg.trl = trl; cfg.sampleinfo = si; cfg.trials = 'all'; cfg.chans = 'all'; dat_res_filt_trial = ft_redefinetrial(cfg,dat_res_filt);
the input is raw data with 59 channels and 1 trials the call to "ft_redefinetrial" took 0 seconds and required the additional allocation of an estimated 0 MB
Trialwise Filtering
Now our data is in trial format! detrend. The data here is already detrended. I am 90% sure that detrending here is OK and should maintain transients (I believe it is a low frequency detrender that is not sensitive to window size), but I would check the documentation demean I demean the signnal here so the potentials are similar across electrodes and across trials bandpassfilter
cfg = []; cfg.detrend = 'no'; cfg.demean = 'yes'; cfg.preproc.bpfilter = 'no'; % Bandpass cfg.preproc.bpfreq = [1 200]; dat_trial = ft_preprocessing(cfg,dat_res_filt_trial);
the call to "ft_selectdata" took 0 seconds and required the additional allocation of an estimated 0 MB preprocessing preprocessing trial 320 from 320 the call to "ft_preprocessing" took 1 seconds and required the additional allocation of an estimated 0 MB
Trialwise visual inpection
This is similar to the databrowser we did before but now you'll all electrodes for one trial, and then all trials for one electrode. It's much harder to see the big picture here but it is useful for deteriming rereferencing schemas and finding artifacts you might have missed.
I like to use this pass for:
1) Second and third pass for general artifacts
2) Evaluation of correlations across electrodes
% *View trials* cfg = []; cfg.method = 'trial'; cfg.trials = 'all'; cfg.channel = ft_channelselection(elec_str,dat_res_filt_trial.label); cfg.keepchannel = 'no'; cfg.keeptrial = 'no'; dat_pass_1 = ft_rejectvisual(cfg,dat_trial);
the input is raw data with 59 channels and 320 trials showing the data per trial, all channels at once trial 1 marked as GOOD 320 trials marked as GOOD, 0 trials marked as BAD 59 channels marked as GOOD, 0 channels marked as BAD the call to "ft_selectdata" took 0 seconds and required the additional allocation of an estimated 0 MB the call to "ft_rejectvisual" took 2 seconds and required the additional allocation of an estimated 0 MB
View channels
cfg = []; cfg.method = 'channel'; % browse through channels cfg.trials = 'all'; cfg.channel = ft_channelselection(elec_str,dat_res_filt_trial.label); cfg.keepchannel = 'no'; cfg.keeptrial = 'no'; dat_pass_2 = ft_rejectvisual(cfg,dat_pass_1);
the input is raw data with 59 channels and 320 trials showing the data per channel, all trials at once channel ROF1 marked as GOOD 320 trials marked as GOOD, 0 trials marked as BAD 59 channels marked as GOOD, 0 channels marked as BAD the call to "ft_selectdata" took 0 seconds and required the additional allocation of an estimated 0 MB the call to "ft_rejectvisual" took 2 seconds and required the additional allocation of an estimated 1 MB
Rereferencing
Rereferencing is something we are still playing around with. I will post a tutorial on this when we have fully fleshed out some of our fringe ideas. In that tutorial I'll explain these analyses in much more detail
What we do
-For grid channels
--- Common average reference (CAR) across electrodes
- For SEEG channels
--- Bipolar referencing schema (using adjacent, same electrode, same region)
Why we do it
1) Remove single source artifacts
2) Remove conductive signalling
3) Remove external events (movement)
Concerns
1) CAR may propagate artifacts into healthy channels
-- which is why I rereference last
2) CAR does not really adress single source or conductive signalling
-- A bipolar montage poses a couple of problems for our grid cases (will adress in rereferencing tutorial)
-- We're not confidant in other methods (PCA, etc. addressed in rereferencing tutorial)
3) Bipolar rereferencing greatly reduces # of channels
-- necessary due to large conductance between same electrode channels
4) Bipolar creates a much larger ROI
-- necessary ...
CAR_BOOL = 1; % for most analyses I like to run it once with un-re-referenced data test the effect of rereferencing dat_reref = dat_pass_2; if CAR_BOOL load('0_Data/IR53/reg/simple.mat'); load('0_Data/IR53/reg/names.mat'); car_reref; % Rereferencing script for CAR dat_reref = dat_pass_2; for gn_id = 1:length(grid) cfg = []; cfg.chans = grid{gn_id}; cfg.reref = 'yes'; cfg.refchannel = grid{gn_id}; dat_reref = ft_preprocessing(cfg,dat_reref); end end
the call to "ft_selectdata" took 0 seconds and required the additional allocation of an estimated 0 MB preprocessing preprocessing trial 320 from 320 the call to "ft_preprocessing" took 1 seconds and required the additional allocation of an estimated 0 MB the call to "ft_selectdata" took 0 seconds and required the additional allocation of an estimated 0 MB preprocessing preprocessing trial 320 from 320 the call to "ft_preprocessing" took 1 seconds and required the additional allocation of an estimated 0 MB the call to "ft_selectdata" took 0 seconds and required the additional allocation of an estimated 0 MB preprocessing preprocessing trial 320 from 320 the call to "ft_preprocessing" took 1 seconds and required the additional allocation of an estimated 0 MB
Wrap-up
View trials again
This is your final draft, make sure nothing looks odd. Rereferencing (especially CAR) will highlight bugged channels and missed artifacts. If you're not saving your removed channels and events correctly, you'll notice that here as well.
cfg = []; cfg.method = 'trial'; cfg.trials = 'all'; cfg.channel = 'all'; cfg.keepchannel = 'no'; cfg.keeptrial = 'no'; dat_pass_temp = ft_rejectvisual(cfg,dat_reref);
the input is raw data with 59 channels and 320 trials showing the data per trial, all channels at once trial 1 marked as GOOD 320 trials marked as GOOD, 0 trials marked as BAD 59 channels marked as GOOD, 0 channels marked as BAD the call to "ft_selectdata" took 0 seconds and required the additional allocation of an estimated 0 MB the call to "ft_rejectvisual" took 2 seconds and required the additional allocation of an estimated 2 MB
Save it
Fieldtrip data formats are tricky. Any time you make a new preprocessing scipt, check your dat file here and look for inconsistencies. The dat file here will be your primary signal file for all future analyses.
display(['Saving... ',outpath]) dat = dat_pass_temp; reg = {}; % Add region for cid = 1:length(dat.label) reg{cid} = simple{strmatch(dat.label{cid},names,'exact')}; end dat.region = reg; save(outpath,'dat');
Saving... 0_Data/IR53/sig/IR53_FT.mat
Thanks! email me if you have concerns or questions