doubt.datasets.space_shuttle

Space shuttle data set.

This data set is from the UCI data set archive, with the description being the original description verbatim. Some feature names may have been altered, based on the description.

"""Space shuttle data set.

This data set is from the UCI data set archive, with the description being the original
description verbatim. Some feature names may have been altered, based on the
description.
"""

import io
import re

import pandas as pd

from .dataset import BASE_DATASET_DESCRIPTION, BaseDataset


class SpaceShuttle(BaseDataset):
    __doc__ = f"""
    The motivation for collecting this database was the explosion of the USA Space
    Shuttle Challenger on 28 January, 1986. An investigation ensued into the
    reliability of the shuttle's propulsion system. The explosion was eventually traced
    to the failure of one of the three field joints on one of the two solid booster
    rockets. Each of these six field joints includes two O-rings, designated as primary
    and secondary, which fail when phenomena called erosion and blowby both occur.

    The night before the launch a decision had to be made regarding launch safety. The
    discussion among engineers and managers leading to this decision included concern
    that the probability of failure of the O-rings depended on the temperature t at
    launch, which was forecast to be 31 degrees F. There are strong engineering reasons
    based on the composition of O-rings to support the judgment that failure
    probability may rise monotonically as temperature drops. One other variable, the
    pressure s at which safety testing for field joint leaks was performed, was
    available, but its relevance to the failure process was unclear.

    Draper's paper includes a menacing figure graphing the number of field joints
    experiencing stress vs. liftoff temperature for the 23 shuttle flights previous to
    the Challenger disaster. No previous liftoff temperature was under 53 degrees F.
    Although tremendous extrapolation must be done from the given data to assess risk
    at 31 degrees F, it is obvious even to the layman "to foresee the unacceptably high
    risk created by launching at 31 degrees F." For more information, see Draper (1993)
    or the other previous analyses.

    The task is to predict the number of O-rings that will experience thermal distress
    for a given flight when the launch temperature is below freezing.

    {BASE_DATASET_DESCRIPTION}

    Features:
        idx (int):
            Temporal order of flight
        temp (int):
            Launch temperature in Fahrenheit
        pres (int):
            Leak-check pressure in psi
        n_risky_rings (int):
            Number of O-rings at risk on a given flight

    Targets:
        n_distressed_rings (int):
            Number of O-rings experiencing thermal distress

    Source:
        https://archive.ics.uci.edu/ml/datasets/Challenger+USA+Space+Shuttle+O-Ring

    Examples:
        Load in the data set::

            >>> dataset = SpaceShuttle()
            >>> dataset.shape
            (23, 5)

        Split the data set into features and targets, as NumPy arrays::

            >>> X, y = dataset.split()
            >>> X.shape, y.shape
            ((23, 4), (23,))

        Perform a train/test split, also outputting NumPy arrays::

            >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
            >>> X_train, X_test, y_train, y_test = train_test_split
            >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
            ((20, 4), (20,), (3, 4), (3,))

        Output the underlying Pandas DataFrame::

            >>> df = dataset.to_pandas()
            >>> type(df)
            <class 'pandas.core.frame.DataFrame'>
    """

    _url = (
        "https://archive.ics.uci.edu/ml/machine-learning-databases/"
        "space-shuttle/o-ring-erosion-only.data"
    )

    # Column positions (after reordering in _prep_data) of features and targets
    _features = range(4)
    _targets = [4]

    def _prep_data(self, data: bytes) -> pd.DataFrame:
        """Prepare the data set.

        Args:
            data (bytes): The raw data

        Returns:
            Pandas dataframe: The prepared data
        """
        # Collapse runs of spaces so that fields are separated by a single space
        processed_data = re.sub(r" +", " ", data.decode("utf-8"))

        # Wrap the decoded string in a file-like object
        csv_file = io.StringIO(processed_data)

        # Load in dataframe, naming the columns in the order they appear in the file
        cols = ["n_risky_rings", "n_distressed_rings", "temp", "pres", "idx"]
        df = pd.read_csv(csv_file, sep=" ", names=cols)

        # Reorder columns so that the features come first and the target last
        df = df[["idx", "temp", "pres", "n_risky_rings", "n_distressed_rings"]]

        return df
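
The preparation steps in `_prep_data` (collapse repeated spaces, parse space-separated fields, name the columns, reorder them) can be tried out without downloading anything. The following is a minimal standard-library sketch of the same steps, using `csv` in place of pandas. The function name `prep_rows` and the sample bytes are hypothetical and only mimic the layout the column names imply; they are not taken from the library or guaranteed to match the real file.

```python
import csv
import io
import re

# Column order in the raw file, as named in _prep_data
RAW_COLS = ["n_risky_rings", "n_distressed_rings", "temp", "pres", "idx"]
# Column order after reordering: features first, target last
FINAL_COLS = ["idx", "temp", "pres", "n_risky_rings", "n_distressed_rings"]


def prep_rows(data: bytes) -> list[dict[str, int]]:
    """Parse space-separated bytes into reordered integer records (hypothetical helper)."""
    # Collapse runs of spaces so that every field is separated by exactly one space
    text = re.sub(r" +", " ", data.decode("utf-8"))
    reader = csv.reader(io.StringIO(text), delimiter=" ")
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        record = dict(zip(RAW_COLS, map(int, fields)))
        # Rebuild the record in the final column order
        rows.append({col: record[col] for col in FINAL_COLS})
    return rows


# Hypothetical sample with irregular spacing, for illustration only
sample = b"6 0 66  50  1\n6 1 70  50  2\n"
rows = prep_rows(sample)
print(rows[0])
```

Without the whitespace-collapsing step, a single-space delimiter would produce empty fields wherever the input contains two consecutive spaces, which is presumably why the original code normalises the spacing before parsing.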
class SpaceShuttle(doubt.datasets.dataset.BaseDataset):

The motivation for collecting this database was the explosion of the USA Space Shuttle Challenger on 28 January, 1986. An investigation ensued into the reliability of the shuttle's propulsion system. The explosion was eventually traced to the failure of one of the three field joints on one of the two solid booster rockets. Each of these six field joints includes two O-rings, designated as primary and secondary, which fail when phenomena called erosion and blowby both occur.

The night before the launch a decision had to be made regarding launch safety. The discussion among engineers and managers leading to this decision included concern that the probability of failure of the O-rings depended on the temperature t at launch, which was forecast to be 31 degrees F. There are strong engineering reasons based on the composition of O-rings to support the judgment that failure probability may rise monotonically as temperature drops. One other variable, the pressure s at which safety testing for field joint leaks was performed, was available, but its relevance to the failure process was unclear.

Draper's paper includes a menacing figure graphing the number of field joints experiencing stress vs. liftoff temperature for the 23 shuttle flights previous to the Challenger disaster. No previous liftoff temperature was under 53 degrees F. Although tremendous extrapolation must be done from the given data to assess risk at 31 degrees F, it is obvious even to the layman "to foresee the unacceptably high risk created by launching at 31 degrees F." For more information, see Draper (1993) or the other previous analyses.

The task is to predict the number of O-rings that will experience thermal distress for a given flight when the launch temperature is below freezing.

Arguments:
  • cache (str or None, optional): The name of the cache. It will be saved to cache in the current working directory. If None then no cache will be saved. Defaults to '.dataset_cache'.
Attributes:
  • cache (str or None): The name of the cache.
  • shape (tuple of integers): Dimensions of the data set
  • columns (list of strings): List of column names in the data set
Features:
  • idx (int): Temporal order of flight
  • temp (int): Launch temperature in Fahrenheit
  • pres (int): Leak-check pressure in psi
  • n_risky_rings (int): Number of O-rings at risk on a given flight
Targets:
  • n_distressed_rings (int): Number of O-rings experiencing thermal distress
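
The class declares its features and targets by column position (`_features = range(4)` and `_targets = [4]`). As a rough illustration of what that positional convention means for the column order listed above, the sketch below selects features and the target from a few rows using plain Python lists. The rows are hypothetical values in the documented layout, and the selection logic is an assumption about how such indices are typically used, not the library's actual `split` implementation.

```python
# Column layout after preparation: four features followed by one target
columns = ["idx", "temp", "pres", "n_risky_rings", "n_distressed_rings"]
feature_idxs = range(4)  # mirrors _features in the class
target_idxs = [4]  # mirrors _targets in the class

# Hypothetical rows in that layout, for illustration only
rows = [
    [1, 66, 50, 6, 0],
    [2, 70, 50, 6, 1],
    [3, 69, 50, 6, 0],
]

# Features: one list of four values per flight
X = [[row[i] for i in feature_idxs] for row in rows]
# Target: one value per flight, flattened because there is a single target column
y = [row[target_idxs[0]] for row in rows]

feature_names = [columns[i] for i in feature_idxs]
target_names = [columns[i] for i in target_idxs]
print(feature_names, target_names)
print(len(X), len(X[0]), len(y))
```

With 23 flights instead of three rows, the same selection gives 23 feature rows of four values each and 23 target values, which matches the `(23, 4)` and `(23,)` shapes shown in the examples.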

Source:

https://archive.ics.uci.edu/ml/datasets/Challenger+USA+Space+Shuttle+O-Ring

Examples:

Load in the data set::

>>> dataset = SpaceShuttle()
>>> dataset.shape
(23, 5)

Split the data set into features and targets, as NumPy arrays::

>>> X, y = dataset.split()
>>> X.shape, y.shape
((23, 4), (23,))

Perform a train/test split, also outputting NumPy arrays::

>>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
>>> X_train, X_test, y_train, y_test = train_test_split
>>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
((20, 4), (20,), (3, 4), (3,))
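
Note that the 20/3 split reported here is not exactly 20% of 23 rows. One mechanism that produces sizes like this is assigning each row to the test set independently with probability `test_size`, so the test set size varies around the requested fraction. The sketch below illustrates that idea with the standard library. It is an assumption made for illustration, not a description of how `dataset.split` is implemented; the helper `bernoulli_split` is hypothetical, and the sizes it prints will generally differ from the doctest above.

```python
import random


def bernoulli_split(n_rows: int, test_size: float, random_seed: int):
    """Assign each row to the test set independently with probability test_size."""
    rng = random.Random(random_seed)
    # One independent draw per row decides whether it goes to the test set
    test_mask = [rng.random() < test_size for _ in range(n_rows)]
    train_idxs = [i for i, is_test in enumerate(test_mask) if not is_test]
    test_idxs = [i for i, is_test in enumerate(test_mask) if is_test]
    return train_idxs, test_idxs


train_idxs, test_idxs = bernoulli_split(n_rows=23, test_size=0.2, random_seed=42)
print(len(train_idxs), len(test_idxs))
```

Because the generator is seeded, repeated calls with the same `random_seed` return the same partition, which is the property a `random_seed` argument is normally meant to provide.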

Output the underlying Pandas DataFrame::

>>> df = dataset.to_pandas()
>>> type(df)
<class 'pandas.core.frame.DataFrame'>