pulse2percept.datasets

Utilities to download and import datasets.

  • Dataset loaders can be used to load small datasets that come pre-packaged with the pulse2percept software.
  • Dataset fetchers can be used to download larger datasets from a given URL and directly import them into pulse2percept.
base             get_data_dir, clear_data_dir, fetch_url
horsager2009     load_horsager2009
beyeler2019      fetch_beyeler2019
nanduri2012      load_nanduri2012
perezfornos2012  load_perezfornos2012
greenwald2009    load_greenwald2009
pulse2percept.datasets.clear_data_dir(data_dir=None)

Delete all content in the data directory

By default, the data directory is a directory called ‘pulse2percept_data’ in the user's home directory. Alternatively, it can be set with the PULSE2PERCEPT_DATA environment variable or programmatically by passing a path.

New in version 0.6.

Parameters: data_dir (str or None) – The path to the pulse2percept data directory.
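As a rough illustration of what "delete all content" can mean on disk, the sketch below empties a directory using only the standard library. The helper name clear_dir_contents is hypothetical and this is not the pulse2percept implementation; in particular, whether the library also removes the directory itself is not stated here, so keeping it is an assumption.

```python
import os
import shutil


def clear_dir_contents(data_dir):
    """Delete everything inside `data_dir`, keeping the directory itself.

    Hypothetical helper for illustration; not part of pulse2percept.
    """
    for name in os.listdir(data_dir):
        path = os.path.join(data_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # remove a sub-directory and its contents
        else:
            os.remove(path)  # remove a regular file or a symlink
```

Because the deletion is recursive and irreversible, it is worth double-checking the path before calling anything of this kind.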
pulse2percept.datasets.fetch_url(url, file_path, progress_bar=<function _report_hook>, remote_checksum=None)

Download a remote file

Fetch a dataset pointed to by url, check its SHA-256 checksum for integrity, and save it to file_path.

New in version 0.6.

Parameters:
  • url (string) – URL of file to download
  • file_path (string) – Path to the local file that will be created
  • progress_bar (func callback, optional) – A callback function func(count, block_size, total_size) that displays a progress bar.
  • remote_checksum (str, optional) – The expected SHA-256 checksum of the file.
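The download-then-verify flow described above can be sketched with the standard library alone. The callback shape func(count, block_size, total_size) matches the reporthook argument of urllib.request.urlretrieve, which is what this sketch uses; whether pulse2percept relies on the same call internally is an assumption. The names sha256_of, print_progress and fetch_and_verify are hypothetical stand-ins, not library functions.

```python
import hashlib
import urllib.request


def sha256_of(path, chunk_size=8192):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def print_progress(count, block_size, total_size):
    """Progress callback with the func(count, block_size, total_size) shape."""
    if total_size > 0:
        percent = min(100, count * block_size * 100 // total_size)
        print(f"\rDownloading: {percent}%", end="")


def fetch_and_verify(url, file_path, progress_bar=print_progress,
                     remote_checksum=None):
    """Hypothetical stand-in for fetch_url: download, then compare digests."""
    urllib.request.urlretrieve(url, file_path, reporthook=progress_bar)
    if remote_checksum is not None:
        local_checksum = sha256_of(file_path)
        if local_checksum != remote_checksum:
            raise IOError(
                f"SHA-256 mismatch for {file_path}: "
                f"expected {remote_checksum}, got {local_checksum}"
            )
    return file_path
```

Comparing the digest after the download completes catches truncated or corrupted files before they are imported.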
pulse2percept.datasets.fetch_beyeler2019(subjects=None, electrodes=None, data_path=None, shuffle=False, random_state=0, download_if_missing=True)

Load the phosphene drawing dataset from [Beyeler2019]

Download the phosphene drawing dataset described in [Beyeler2019] from https://osf.io/28uqg (66MB) to data_path. By default, all datasets are stored in ‘~/pulse2percept_data/’, but a different path can be specified.

Retinal implants: Argus I, Argus II
Subjects: 4
Number of samples: 400
Number of features: 16

The dataset includes the following features:

subject             Subject ID, S1-S4
electrode           Electrode ID, A1-F10
image               Phosphene drawing
img_shape           x,y shape of the phosphene drawing
date                Experiment date (YYYY/mm/dd)
stim_class          Stimulus type used to stimulate the array
amp                 Pulse amplitude used (x Threshold)
freq                Pulse frequency used (Hz)
pdur                Pulse duration used (ms)
area                Phosphene area (see [Beyeler2019] for details)
orientation         Phosphene orientation (see [Beyeler2019])
eccentricity        Phosphene elongation (see [Beyeler2019])
compactness         Phosphene compactness (see [Beyeler2019])
x_center, y_center  Phosphene center of mass (see [Beyeler2019])
xrange, yrange      Screen size in deg (see [Beyeler2019])

New in version 0.6.

Changed in version 0.7: Redirected download to 66MB version of the dataset that includes the fields x_center and y_center.

Parameters:
  • subjects (str | list of strings | None, optional) – Select data from a subject or list of subjects. By default, all subjects are selected.
  • electrodes (str | list of strings | None, optional) – Select data from a single electrode or a list of electrodes. By default, all electrodes are selected.
  • data_path (string, optional) – Specify another download and cache folder for the dataset. By default all pulse2percept data is stored in ‘~/pulse2percept_data’ subfolders.
  • shuffle (boolean, optional) – If True, the rows of the DataFrame are shuffled.
  • random_state (int | numpy.random.RandomState | None, optional, default: 0) – Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls.
  • download_if_missing (boolean, optional) – If False, raise an IOError if the data is not locally available instead of trying to download it from the source site.
Returns:

data (pd.DataFrame) – The whole dataset is returned in a 400x16 Pandas DataFrame
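The download_if_missing behavior described in the parameter list amounts to a cache check before any network access. A minimal sketch follows, assuming a hypothetical helper ensure_local and an arbitrary download callable that takes the target path; neither is part of the pulse2percept API.

```python
import os


def ensure_local(file_path, download, download_if_missing=True):
    """Return `file_path`, fetching it with `download` only when needed.

    `download` is any callable that takes the target path and writes the
    file there (hypothetical stand-in for the real downloader).
    """
    if not os.path.exists(file_path):
        if not download_if_missing:
            # Mirror the documented behavior: fail instead of downloading.
            raise IOError(f"No local copy of the data found at {file_path}")
        os.makedirs(os.path.dirname(file_path) or ".", exist_ok=True)
        download(file_path)
    return file_path
```

Passing download_if_missing=False is useful on machines without network access, where a clear error is preferable to a hanging download.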

pulse2percept.datasets.get_data_dir(data_dir=None)

Return the path of the pulse2percept data directory

This directory is used to store the datasets retrieved by the data fetch utility functions to avoid downloading the data several times.

By default, the data directory is a directory called ‘pulse2percept_data’ in the user's home directory. Alternatively, it can be set with the PULSE2PERCEPT_DATA environment variable or programmatically by passing a path.

If the directory does not already exist, it is automatically created.

New in version 0.6.

Parameters: data_dir (str or None) – The path to the pulse2percept data directory.
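The lookup described above (explicit path, PULSE2PERCEPT_DATA environment variable, home-directory default, create if missing) can be sketched as follows. The function name resolve_data_dir is hypothetical, and the precedence of an explicit path over the environment variable is an assumption rather than something stated in this reference.

```python
import os


def resolve_data_dir(data_dir=None):
    """Hypothetical sketch of the data directory lookup; not the library code."""
    if data_dir is None:
        # Assumed order: environment variable first, then the home default.
        data_dir = os.environ.get(
            "PULSE2PERCEPT_DATA",
            os.path.join("~", "pulse2percept_data"),
        )
    data_dir = os.path.expanduser(data_dir)
    os.makedirs(data_dir, exist_ok=True)  # create the directory if missing
    return data_dir
```

Setting the environment variable once, for example in a shell profile, keeps every script pointed at the same cache without passing a path each time.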
pulse2percept.datasets.load_horsager2009(subjects=None, electrodes=None, stim_types=None, shuffle=False, random_state=0)

Load data from [Horsager2009]

Load the threshold data described in [Horsager2009]. Average thresholds were extracted from the figures of the paper using WebPlotDigitizer.

Retinal implants: Argus I
Subjects: 2
Number of samples: 552
Number of features: 21

The dataset includes the following features:

subject             Subject ID, S05-S06
implant             Argus I
electrode           Electrode ID, A1-F10
task                ‘threshold’ or ‘matching’
stim_type           ‘single_pulse’, ‘fixed_duration’, ‘variable_duration’, ‘fixed_duration_supra’, ‘bursting_triplets’, ‘bursting_triplets_supra’, ‘latent_addition’
stim_dur            Stimulus duration (ms)
stim_freq           Stimulus frequency (Hz)
stim_amp            Stimulus amplitude (uA)
pulse_type          ‘cathodic_first’
pulse_dur           Pulse duration (ms)
interphase_dur      Interphase gap (ms)
delay_dur           Stimulus delivered after delay (ms)
ref_stim_type       Reference stimulus type (‘single_pulse’, …)
ref_freq            Reference stimulus frequency (Hz)
ref_amp             Reference stimulus amplitude (uA)
ref_amp_factor      Reference stimulus amplitude factor (xThreshold)
ref_pulse_dur       Reference stimulus pulse duration (ms)
ref_interphase_dur  Reference stimulus interphase gap (ms)
theta               Temporal model output at threshold (a.u.)
source              Figure from which data was extracted

Some stimulus types require a reference stimulus. For example, ‘bursting_triplets_supra’ were delivered at 2x or 3x threshold of a reference bursting triplet pulse. The parameters of the reference stimulus are given in ref_* fields.

Missing values are denoted by NaN.

New in version 0.6.

Parameters:
  • subjects (str | list of strings | None, optional) – Select data from a subject or list of subjects. By default, all subjects are selected.
  • electrodes (str | list of strings | None, optional) – Select data from a single electrode or a list of electrodes. By default, all electrodes are selected.
  • stim_types (str | list of strings | None, optional) – Select data from a single stimulus type or a list of stimulus types. By default, all stimulus types are selected.
  • shuffle (boolean, optional) – If True, the rows of the DataFrame are shuffled.
  • random_state (int | numpy.random.RandomState | None, optional) – Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls.
Returns:

data (pd.DataFrame) – The whole dataset is returned in a 552x21 Pandas DataFrame
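The selection and shuffling arguments documented above follow a common pattern: each filter accepts a single string, a list of strings, or None for "no restriction", and an integer random_state makes the shuffle repeatable. The sketch below applies that pattern to a handful of made-up records using plain Python; select_rows and the toy rows are hypothetical, and the loader itself returns a Pandas DataFrame rather than a list of dicts.

```python
import random


def _as_list(value):
    """Wrap a single string in a list; pass lists (or None) through."""
    return [value] if isinstance(value, str) else value


def select_rows(rows, subjects=None, electrodes=None, stim_types=None,
                shuffle=False, random_state=0):
    """Filter toy records the way the loader arguments are described."""
    filters = {
        "subject": _as_list(subjects),
        "electrode": _as_list(electrodes),
        "stim_type": _as_list(stim_types),
    }
    selected = [
        row for row in rows
        if all(allowed is None or row[key] in allowed
               for key, allowed in filters.items())
    ]
    if shuffle:
        # A fixed integer seed gives the same order on every call.
        random.Random(random_state).shuffle(selected)
    return selected


# Made-up records for illustration only; not taken from the dataset.
rows = [
    {"subject": "S05", "electrode": "A1", "stim_type": "single_pulse"},
    {"subject": "S05", "electrode": "B3", "stim_type": "fixed_duration"},
    {"subject": "S06", "electrode": "A1", "stim_type": "single_pulse"},
    {"subject": "S06", "electrode": "C2", "stim_type": "latent_addition"},
]

only_s05 = select_rows(rows, subjects="S05")
single_pulses = select_rows(rows, subjects=["S05", "S06"],
                            stim_types="single_pulse")
```

Filters combine with a logical AND, so adding more arguments can only narrow the selection.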

pulse2percept.datasets.load_nanduri2012(electrodes=None, task=None, shuffle=False, random_state=0)

Load data from [Nanduri2012]

Load the brightness and size rating data described in [Nanduri2012]. Datapoints were extracted from Figure 4 of the paper using WebPlotDigitizer.

Retinal implants: Argus I
Subjects: 1
Number of samples: 128
Number of features: 17

The dataset includes the following features:

subject         Subject ID, S06
implant         Argus I
electrode       Electrode ID, A2, A4, B1, C1, C4, D2, D3, D4
task            ‘rate’ or ‘size’
stim_class      “Nanduri2012”
stim_dur        Stimulus duration (ms)
freq            Stimulus frequency (Hz)
amp_factor      Stimulus amplitude ratio over threshold
brightness      Patient rated brightness compared to reference stimulus
size            Patient rated size compared to reference stimulus
ref_stim_class  “Nanduri2012”
ref_amp_factor  Amplitude factor (xTh) of reference pulse
ref_freq        Frequency (Hz) of reference pulse
pulse_dur       Pulse duration (ms)
pulse_type      ‘cathodicFirst’
interphase_dur  Interphase gap (ms)
varied_param    Whether this trial is a part of ‘amp’ or ‘freq’ modulation

New in version 0.7.

Parameters:
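  • electrodes (str | list of strings | None, optional) – Select data from a single electrode or a list of electrodes. By default, all electrodes are selected.
  • task (str | None, optional) – Select data from a single task (‘rate’ or ‘size’). By default, all tasks are selected.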
  • shuffle (boolean, optional) – If True, the rows of the DataFrame are shuffled.
  • random_state (int | numpy.random.RandomState | None, optional) – Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls.
Returns:

data (pd.DataFrame) – The whole dataset is returned in a 128x17 Pandas DataFrame

pulse2percept.datasets.load_perezfornos2012(shuffle=False, subjects=None, figures=None, random_state=0)

Load data from [PerezFornos2012]

Load the brightness associated with joystick position data described in [PerezFornos2012]. Datapoints were extracted from Figures 3-7 of the paper.

Retinal implants: Argus II
Subjects: 9
Number of samples: 45
Number of features: 158

The dataset includes the following features:

Subject      Subject ID
Figure       Reference figure from [PerezFornos2012]
time_series  NumPy array of the time series data associated with the figure. Note: the series was generated by linear interpolation over [-2.0, 75.5] in steps of 0.5.

New in version 0.8.

Parameters:
  • shuffle (boolean, optional) – If True, the rows of the DataFrame are shuffled.
  • random_state (int | numpy.random.RandomState | None, optional) – Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls.
  • subjects (str | list of strings | None, optional) – Select data from a single subject or a list of subjects. By default, all subjects are selected.
  • figures (str | list of strings | None, optional) – Select data from a single figure or a list of figures. By default, all figures are selected.
Returns:

data (pd.DataFrame) – The whole dataset is returned in a 45x158 Pandas DataFrame
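To make the time_series note concrete, the sketch below builds the grid from -2.0 to 75.5 in steps of 0.5 and resamples a digitized curve onto it by linear interpolation. The grid has 156 points; together with Subject and Figure that would give 158 columns, which matches the stated feature count, though that reading is an inference and not something this reference confirms. The helper names, and the choice to clamp values outside the digitized range, are assumptions made for illustration.

```python
from bisect import bisect_right


def time_grid(start=-2.0, stop=75.5, step=0.5):
    """Return evenly spaced sample points from `start` to `stop` inclusive."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


def linear_resample(xs, ys, grid):
    """Linearly interpolate the points (xs, ys) onto `grid`.

    `xs` must be strictly increasing. Grid points outside [xs[0], xs[-1]]
    take the nearest endpoint value (an assumption, not from the source).
    """
    out = []
    for t in grid:
        if t <= xs[0]:
            out.append(ys[0])
        elif t >= xs[-1]:
            out.append(ys[-1])
        else:
            j = bisect_right(xs, t)  # now xs[j - 1] <= t < xs[j]
            x0, x1 = xs[j - 1], xs[j]
            y0, y1 = ys[j - 1], ys[j]
            out.append(y0 + (y1 - y0) * (t - x0) / (x1 - x0))
    return out


grid = time_grid()
# Two made-up digitized points, resampled onto the full grid.
resampled = linear_resample([0.0, 10.0], [0.0, 20.0], grid)
```

Resampling every figure onto one shared grid is what allows curves digitized at irregular positions to be compared point by point.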

pulse2percept.datasets.load_greenwald2009(subjects=None, electrodes=None, shuffle=False, random_state=0)

Load data from [Greenwald2009]

Load the brightness and size rating data described in [Greenwald2009]. Datapoints were extracted from Figure 4 of the paper using WebPlotDigitizer.

Retinal implants: Argus I
Subjects: 2
Number of samples: 83
Number of features: 12

The dataset includes the following features:

subject         Subject ID, S06
implant         Argus I
electrode       Electrode ID, A2, A4, B1, C1, C4, D2, D3, D4
task            ‘rate’
stim_class      “Greenwald2009”
stim_dur        Stimulus duration (ms)
amp             Amplitude of the stimulation
brightness      Patient reported brightness
pulse_dur       Pulse duration (ms)
interphase_dur  Interphase gap (ms)
pulse_type      ‘cathodicFirst’
threshold       Electrical stimulation threshold

New in version 0.7.

Parameters:
  • subjects (str | list of strings | None, optional) – Select data from a single subject or a list of subjects. By default, all subjects are selected.
  • electrodes (str | list of strings | None, optional) – Select data from a single electrode or a list of electrodes. By default, all electrodes are selected.
  • shuffle (boolean, optional) – If True, the rows of the DataFrame are shuffled.
  • random_state (int | numpy.random.RandomState | None, optional) – Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls.
Returns:

data (pd.DataFrame) – The whole dataset is returned in an 83x12 Pandas DataFrame