pulse2percept.datasets

Utilities to download and import datasets.

Dataset loaders can be used to load small datasets that come pre-packaged with the pulse2percept software.
Dataset fetchers can be used to download larger datasets from a given URL and directly import them into pulse2percept.

`base`	`get_data_dir`, `clear_data_dir`, `fetch_url`
`han2021`	`fetch_han2021`
`beyeler2019`	`fetch_beyeler2019`, `subject_params`
`nanduri2012`	`load_nanduri2012`
`perezfornos2012`	`load_perezfornos2012`
`greenwald2009`	`load_greenwald2009`
`horsager2009`	`load_horsager2009`

See also

Basic Concepts > Datasets

pulse2percept.datasets.clear_data_dir(data_dir=None)

Delete all content in the data directory

By default, this is set to a directory called ‘pulse2percept_data’ in the user home directory. Alternatively, it can be set by a PULSE2PERCEPT_DATA environment variable or set programmatically by specifying a path.

Added in version 0.6.

Parameters:

data_dir (str or None) – The path to the pulse2percept data directory.
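The lookup order described above can be sketched in plain Python. `resolve_data_dir` and `clear_dir_contents` are hypothetical helpers written for illustration, not part of pulse2percept. The precedence they use (explicit argument, then the `PULSE2PERCEPT_DATA` environment variable, then `~/pulse2percept_data`) is an assumption based on the description, and the library's own `clear_data_dir` may, for example, remove the directory itself rather than only its contents.

```python
import os
import shutil
from pathlib import Path


def resolve_data_dir(data_dir=None):
    """Hypothetical helper: return the data directory as a Path.

    Assumed precedence: explicit argument, then the PULSE2PERCEPT_DATA
    environment variable, then ~/pulse2percept_data.
    """
    if data_dir is None:
        data_dir = os.environ.get(
            "PULSE2PERCEPT_DATA", os.path.join("~", "pulse2percept_data")
        )
    # Expand a leading "~" so the default points into the home directory
    return Path(data_dir).expanduser()


def clear_dir_contents(data_dir=None):
    """Hypothetical helper: delete every file and subfolder in the data dir.

    The directory itself is kept; a missing directory is silently ignored.
    """
    path = resolve_data_dir(data_dir)
    if not path.is_dir():
        return
    for child in path.iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)  # recursively remove subfolders
        else:
            child.unlink()  # remove files and symlinks
```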

pulse2percept.datasets.fetch_url(url, file_path, progress_bar=<function _report_hook>, remote_checksum=None)

Download a remote file

Fetch a dataset pointed to by url, check its SHA-256 checksum for integrity, and save it to file_path.

Added in version 0.6.

Parameters:

url (string) – URL of file to download
file_path (string) – Path to the local file that will be created
progress_bar (func callback, optional) – A callback to a function func(count, block_size, total_size) that will display a progress bar.
remote_checksum (str, optional) – The expected SHA-256 checksum of the file.
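The integrity check and the `progress_bar` callback described above can be sketched with the standard library alone. The helpers below (`sha256_checksum`, `verify_checksum`, `print_progress`) are hypothetical and only illustrate the documented behavior; they are not pulse2percept's internals. Treating `remote_checksum=None` as "skip the check" is an assumption. The `(count, block_size, total_size)` signature is the same one `urllib.request.urlretrieve` passes to its `reporthook`, which reports a `total_size` of -1 when the server sends no length.

```python
import hashlib


def sha256_checksum(file_path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        # Read fixed-size chunks so large downloads need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(file_path, remote_checksum=None):
    """Hypothetical helper: raise IOError if the digest does not match.

    Assumption: remote_checksum=None means no check is performed.
    """
    if remote_checksum is None:
        return True
    local = sha256_checksum(file_path)
    if local != remote_checksum:
        raise IOError(f"{file_path} has SHA-256 {local}, expected {remote_checksum}")
    return True


def print_progress(count, block_size, total_size):
    """A progress_bar-style callback with the documented signature."""
    if total_size <= 0:
        return  # total size unknown, nothing sensible to report
    percent = min(100.0, 100.0 * count * block_size / total_size)
    print(f"\rDownloaded {percent:5.1f}%", end="", flush=True)
```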

pulse2percept.datasets.fetch_beyeler2019(subjects=None, electrodes=None, data_path=None, shuffle=False, random_state=0, download_if_missing=True)

Load the phosphene drawing dataset from [Beyeler2019]

Download the phosphene drawing dataset described in [Beyeler2019] from https://osf.io/28uqg (66MB) to data_path. By default, all datasets are stored in ‘~/pulse2percept_data/’, but a different path can be specified.

Retinal implants:	Argus I, Argus II
Subjects:	4
Number of samples:	400
Number of features:	16

The dataset includes the following features:

subject	Subject ID, S1-S4
electrode	Electrode ID, A1-F10
image	Phosphene drawing
img_shape	x,y shape of the phosphene drawing
date	Experiment date (YYYY/mm/dd)
stim_class	Stimulus type used to stimulate the array
amp	Pulse amplitude used (x Threshold)
freq	Pulse frequency used (Hz)
pdur	Pulse duration used (ms)
area	Phosphene area (see [Beyeler2019] for details)
orientation	Phosphene orientation (see [Beyeler2019])
eccentricity	Phosphene elongation (see [Beyeler2019])
compactness	Phosphene compactness (see [Beyeler2019])
x_center, y_center	Phosphene center of mass (see [Beyeler2019])
xrange, yrange	Screen size in deg (see [Beyeler2019])

Added in version 0.6.

Changed in version 0.7: Redirected download to 66MB version of the dataset that includes the fields x_center and y_center.

Parameters:

subjects (str | list of strings | None, optional) – Select data from a subject or list of subjects. By default, all subjects are selected.
electrodes (str | list of strings | None, optional) – Select data from a single electrode or a list of electrodes. By default, all electrodes are selected.
data_path (string, optional) – Specify another download and cache folder for the dataset. By default all pulse2percept data is stored in ‘~/pulse2percept_data’ subfolders.
shuffle (boolean, optional) – If True, the rows of the DataFrame are shuffled.
random_state (int | numpy.random.RandomState | None, optional, default: 0) – Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls.
download_if_missing (boolean, optional, default: True) – If False, raise an IOError if the data is not locally available instead of trying to download it from the source site.

Returns:

data (pd.DataFrame) – The whole dataset is returned in a 400x16 Pandas DataFrame
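With pulse2percept installed and network access, a filtered, shuffled load would look like `fetch_beyeler2019(subjects='S2', electrodes=['A1', 'B2'], shuffle=True, random_state=42)`. The sketch below mimics those documented `subjects`/`electrodes`/`shuffle`/`random_state` semantics on a made-up pandas DataFrame so the behavior can be inspected offline. `select_rows` and `shuffle_rows` are hypothetical helpers, and the real loader may implement filtering differently.

```python
import pandas as pd


def select_rows(df, column, values=None):
    """Keep rows whose `column` is in `values` (str, list of str, or None)."""
    if values is None:
        return df  # None selects everything, as in the loaders
    if isinstance(values, str):
        values = [values]  # a single ID behaves like a one-element list
    return df[df[column].isin(values)]


def shuffle_rows(df, shuffle=False, random_state=0):
    """Shuffle rows reproducibly when shuffle=True; otherwise return df."""
    if not shuffle:
        return df
    return df.sample(frac=1, random_state=random_state).reset_index(drop=True)


# Made-up rows that only mimic the subject/electrode columns
toy = pd.DataFrame({
    "subject": ["S1", "S2", "S2", "S3", "S4"],
    "electrode": ["A1", "A1", "B2", "C3", "F10"],
})

subset = select_rows(select_rows(toy, "subject", "S2"), "electrode", ["A1", "B2"])
shuffled = shuffle_rows(subset, shuffle=True, random_state=42)
```

`DataFrame.sample` accepts an int, a `numpy.random.RandomState`, or None for `random_state`, which matches the types listed for the loaders.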

pulse2percept.datasets.load_horsager2009(subjects=None, electrodes=None, stim_types=None, shuffle=False, random_state=0)

Load data from [Horsager2009]

Load the threshold data described in [Horsager2009]. Average thresholds were extracted from the figures of the paper using WebPlotDigitizer.

Retinal implants:	Argus I
Subjects:	2
Number of samples:	552
Number of features:	21

The dataset includes the following features:

subject	Subject ID, S05-S06
implant	Argus I
electrode	Electrode ID, A1-F10
task	‘threshold’ or ‘matching’
stim_type	‘single_pulse’, ‘fixed_duration’, ‘variable_duration’, ‘fixed_duration_supra’, ‘bursting_triplets’, ‘bursting_triplets_supra’, ‘latent_addition’
stim_dur	Stimulus duration (ms)
stim_freq	Stimulus frequency (Hz)
stim_amp	Stimulus amplitude (uA)
pulse_type	‘cathodic_first’
pulse_dur	Pulse duration (ms)
interphase_dur	Interphase gap (ms)
delay_dur	Stimulus delivered after delay (ms)
ref_stim_type	Reference stimulus type (‘single_pulse’, …)
ref_freq	Reference stimulus frequency (Hz)
ref_amp	Reference stimulus amplitude (uA)
ref_amp_factor	Reference stimulus amplitude factor (xThreshold)
ref_pulse_dur	Reference stimulus pulse duration (ms)
ref_interphase_dur	Reference stimulus interphase gap (ms)
theta	Temporal model output at threshold (a.u.)
source	Figure from which data was extracted

Some stimulus types require a reference stimulus. For example, ‘bursting_triplets_supra’ stimuli were delivered at 2x or 3x the threshold of a reference bursting-triplet pulse. The parameters of the reference stimulus are given in the ref_* fields.

Missing values are denoted by NaN.

Added in version 0.6.

Parameters:

subjects (str | list of strings | None, optional) – Select data from a subject or list of subjects. By default, all subjects are selected.
electrodes (str | list of strings | None, optional) – Select data from a single electrode or a list of electrodes. By default, all electrodes are selected.
stim_types (str | list of strings | None, optional) – Select data from a single stimulus type or a list of stimulus types. By default, all stimulus types are selected.
shuffle (boolean, optional) – If True, the rows of the DataFrame are shuffled.
random_state (int | numpy.random.RandomState | None, optional) – Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls.

Returns:

data (pd.DataFrame) – The whole dataset is returned in a 552x21 Pandas DataFrame
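Because rows without a reference stimulus carry NaN in the ref_* fields, a NaN-aware mask is a convenient way to separate them. The documented call would be something like `load_horsager2009(subjects='S05', stim_types=['single_pulse', 'bursting_triplets_supra'])`; the sketch below instead uses a small made-up DataFrame with only three of the documented columns, so the values are placeholders, not data from [Horsager2009].

```python
import numpy as np
import pandas as pd

# Placeholder rows mimicking three documented columns (not real data)
toy = pd.DataFrame({
    "subject": ["S05", "S05", "S06"],
    "stim_type": ["single_pulse", "bursting_triplets_supra", "fixed_duration"],
    "ref_amp_factor": [np.nan, 2.0, np.nan],
})

# Rows that used a reference stimulus have a non-missing ref_amp_factor
has_reference = toy["ref_amp_factor"].notna()
with_reference = toy[has_reference]
without_reference = toy[~has_reference]
```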

pulse2percept.datasets.load_nanduri2012(electrodes=None, task=None, shuffle=False, random_state=0)

Load data from [Nanduri2012]

Load the brightness and size rating data described in [Nanduri2012]. Data points were extracted from Figure 4 of the paper using WebPlotDigitizer.

Retinal implants:	Argus I
Subjects:	1
Number of samples:	128
Number of features:	17

The dataset includes the following features:

subject	Subject ID, S06
implant	Argus I
electrode	Electrode ID, A2, A4, B1, C1, C4, D2, D3, D4
task	‘rate’ or ‘size’
stim_class	“Nanduri2012”
stim_dur	Stimulus duration (ms)
freq	Stimulus frequency (Hz)
amp_factor	Stimulus amplitude ratio over threshold
brightness	Patient-rated brightness compared to the reference stimulus
size	Patient-rated size compared to the reference stimulus
ref_stim_class	“Nanduri2012”
ref_amp_factor	Amplitude factor (xTh) of reference pulse
ref_freq	Frequency (Hz) of reference pulse
pulse_dur	Pulse duration (ms)
pulse_type	‘cathodicFirst’
interphase_dur	Interphase gap (ms)
varied_param	Whether this trial is part of the amplitude (‘amp’) or frequency (‘freq’) modulation series

Added in version 0.7.

Parameters:

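electrodes (str | list of strings | None, optional) – Select data from a single electrode or a list of electrodes. By default, all electrodes are selected.
task (‘rate’ | ‘size’ | None, optional) – Select data from a single task. By default, all tasks are selected.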
shuffle (boolean, optional) – If True, the rows of the DataFrame are shuffled.
random_state (int | numpy.random.RandomState | None, optional) – Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls.

Returns:

data (pd.DataFrame) – The whole dataset is returned in a 128x17 Pandas DataFrame
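Since each trial belongs to either the amplitude or the frequency modulation series, `task` and `varied_param` together pick out one experimental curve. A minimal sketch on made-up rows (placeholder values, not data from [Nanduri2012]); with pulse2percept installed, `load_nanduri2012(task='rate')` should return the real rating rows, assuming `task` filters on the column of the same name.

```python
import pandas as pd

# Placeholder rows mimicking four documented columns (not real data)
toy = pd.DataFrame({
    "task": ["rate", "rate", "size", "rate"],
    "varied_param": ["amp", "freq", "amp", "amp"],
    "amp_factor": [3.0, 1.5, 2.0, 1.5],
    "freq": [20.0, 40.0, 20.0, 20.0],
})

# Brightness-rating trials from the amplitude-modulation series, by amplitude
amp_curve = (
    toy[(toy["task"] == "rate") & (toy["varied_param"] == "amp")]
    .sort_values("amp_factor")
)

# Number of trials in each (task, varied_param) combination
counts = toy.groupby(["task", "varied_param"]).size()
```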

pulse2percept.datasets.load_perezfornos2012(shuffle=False, subjects=None, figures=None, random_state=0)

Load data from [PerezFornos2012]

Load the brightness ratings, reported via joystick position, described in [PerezFornos2012]. Data points were extracted from Figures 3-7 of the paper.

Retinal implants:	Argus II
Subjects:	9
Number of samples:	45
Number of features:	158

The dataset includes the following features:

Subject	Subject ID
Figure	Reference figure from [PerezFornos2012]
time_series	NumPy array of the time series data associated with the figure. Note: this was generated by linear interpolation over [-2.0, 75.5] in steps of 0.5.

Added in version 0.8.

Parameters:

shuffle (boolean, optional) – If True, the rows of the DataFrame are shuffled.
subjects (str | list of strings | None, optional) – Select data from a single subject or a list of subjects. By default, all subjects are selected.
figures (str | list of strings | None, optional) – Select data from a single figure or a list of figures. By default, all figures are selected.
random_state (int | numpy.random.RandomState | None, optional) – Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls.

Returns:

data (pd.DataFrame) – The whole dataset is returned in a 45x158 Pandas DataFrame
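The interpolation grid mentioned for time_series is easy to reconstruct: [-2.0, 75.5] in steps of 0.5, endpoints included, gives (75.5 - (-2.0)) / 0.5 + 1 = 156 points. If each point is stored in its own column next to Subject and Figure, that would account for the 158 features above, but this reading is an assumption. The sketch resamples placeholder samples (not data from [PerezFornos2012]) onto that grid with `numpy.interp`, which performs piecewise-linear interpolation.

```python
import numpy as np

# Grid described above: -2.0 to 75.5 inclusive, in steps of 0.5
start, stop, step = -2.0, 75.5, 0.5
n_points = int(round((stop - start) / step)) + 1  # 156
time_grid = np.linspace(start, stop, n_points)

# Placeholder raw samples at irregular times (not real data)
t_raw = np.array([-2.0, 10.0, 40.0, 75.5])
y_raw = np.array([0.0, 1.0, 0.6, 0.2])

# Piecewise-linear interpolation onto the regular grid
y_grid = np.interp(time_grid, t_raw, y_raw)
```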

pulse2percept.datasets.load_greenwald2009(subjects=None, electrodes=None, shuffle=False, random_state=0)

Load data from [Greenwald2009]

Load the brightness rating data described in [Greenwald2009]. Data points were extracted from Figure 4 of the paper using WebPlotDigitizer.

Retinal implants:	Argus I
Subjects:	2
Number of samples:	83
Number of features:	12

The dataset includes the following features:

subject	Subject ID, S06
implant	Argus I
electrode	Electrode ID, A2, A4, B1, C1, C4, D2, D3, D4
task	‘rate’
stim_class	“Greenwald2009”
stim_dur	Stimulus duration (ms)
amp	Amplitude of the stimulation
brightness	Patient-reported brightness
pulse_dur	Pulse duration (ms)
interphase_dur	Interphase gap (ms)
pulse_type	‘cathodicFirst’
threshold	Electrical stimulation threshold

Added in version 0.7.

Parameters:

subjects (str | list of strings | None, optional) – Select data from a single subject or a list of subjects. By default, all subjects are selected.
electrodes (str | list of strings | None, optional) – Select data from a single electrode or a list of electrodes. By default, all electrodes are selected.
shuffle (boolean, optional) – If True, the rows of the DataFrame are shuffled.
random_state (int | numpy.random.RandomState | None, optional) – Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls.

Returns:

data (pd.DataFrame) – The whole dataset is returned in an 83x12 Pandas DataFrame
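Because each row carries both amp and threshold, amplitudes can be expressed relative to threshold before relating them to brightness. A minimal sketch on placeholder rows (not data from [Greenwald2009]); the `amp_over_threshold` column is a hypothetical derived quantity, not a field of the dataset, and it assumes amp and threshold are given in the same units.

```python
import pandas as pd

# Placeholder rows mimicking four documented columns (not real data)
toy = pd.DataFrame({
    "electrode": ["C1", "C1", "D2"],
    "amp": [100.0, 200.0, 150.0],
    "threshold": [50.0, 50.0, 75.0],
    "brightness": [5.0, 12.0, 8.0],
})

# Hypothetical derived column: stimulation amplitude as a multiple of threshold
toy["amp_over_threshold"] = toy["amp"] / toy["threshold"]

# Mean brightness per electrode, e.g. for a quick per-electrode summary
mean_brightness = toy.groupby("electrode")["brightness"].mean()
```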