doubt.datasets.space_shuttle
Space shuttle data set.
This data set is from the UCI data set archive, and its description is the original description, reproduced verbatim. Some feature names may have been altered based on the description.
```python
"""Space shuttle data set.

This data set is from the UCI data set archive, with the description being the original
description verbatim. Some feature names may have been altered, based on the
description.
"""

import io
import re

import pandas as pd

from .dataset import BASE_DATASET_DESCRIPTION, BaseDataset


class SpaceShuttle(BaseDataset):
    __doc__ = f"""
    The motivation for collecting this database was the explosion of the USA Space
    Shuttle Challenger on 28 January, 1986. An investigation ensued into the
    reliability of the shuttle's propulsion system. The explosion was eventually traced
    to the failure of one of the three field joints on one of the two solid booster
    rockets. Each of these six field joints includes two O-rings, designated as primary
    and secondary, which fail when phenomena called erosion and blowby both occur.

    The night before the launch a decision had to be made regarding launch safety. The
    discussion among engineers and managers leading to this decision included concern
    that the probability of failure of the O-rings depended on the temperature t at
    launch, which was forecast to be 31 degrees F. There are strong engineering reasons
    based on the composition of O-rings to support the judgment that failure
    probability may rise monotonically as temperature drops. One other variable, the
    pressure s at which safety testing for field joint leaks was performed, was
    available, but its relevance to the failure process was unclear.

    Draper's paper includes a menacing figure graphing the number of field joints
    experiencing stress vs. liftoff temperature for the 23 shuttle flights previous to
    the Challenger disaster. No previous liftoff temperature was under 53 degrees F.
    Although tremendous extrapolation must be done from the given data to assess risk
    at 31 degrees F, it is obvious even to the layman "to foresee the unacceptably high
    risk created by launching at 31 degrees F." For more information, see Draper (1993)
    or the other previous analyses.

    The task is to predict the number of O-rings that will experience thermal distress
    for a given flight when the launch temperature is below freezing.

    {BASE_DATASET_DESCRIPTION}

    Features:
        idx (int):
            Temporal order of flight
        temp (int):
            Launch temperature in Fahrenheit
        pres (int):
            Leak-check pressure in psi
        n_risky_rings (int):
            Number of O-rings at risk on a given flight

    Targets:
        n_distressed_rings (int):
            Number of O-rings experiencing thermal distress

    Source:
        https://archive.ics.uci.edu/ml/datasets/Challenger+USA+Space+Shuttle+O-Ring

    Examples:
        Load in the data set::

            >>> dataset = SpaceShuttle()
            >>> dataset.shape
            (23, 5)

        Split the data set into features and targets, as NumPy arrays::

            >>> X, y = dataset.split()
            >>> X.shape, y.shape
            ((23, 4), (23,))

        Perform a train/test split, also outputting NumPy arrays::

            >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
            >>> X_train, X_test, y_train, y_test = train_test_split
            >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
            ((20, 4), (20,), (3, 4), (3,))

        Output the underlying Pandas DataFrame::

            >>> df = dataset.to_pandas()
            >>> type(df)
            <class 'pandas.core.frame.DataFrame'>
    """

    _url = (
        "https://archive.ics.uci.edu/ml/machine-learning-databases/"
        "space-shuttle/o-ring-erosion-only.data"
    )

    _features = range(4)
    _targets = [4]

    def _prep_data(self, data: bytes) -> pd.DataFrame:
        """Prepare the data set.

        Args:
            data (bytes): The raw data

        Returns:
            Pandas dataframe: The prepared data
        """
        # Collapse runs of spaces into a single space
        processed_data = re.sub(r" +", " ", data.decode("utf-8"))

        # Wrap the decoded text in a file-like object
        csv_file = io.StringIO(processed_data)

        # Load in dataframe
        cols = ["n_risky_rings", "n_distressed_rings", "temp", "pres", "idx"]
        df = pd.read_csv(csv_file, sep=" ", names=cols)

        # Reorder columns so that the target comes last
        df = df[["idx", "temp", "pres", "n_risky_rings", "n_distressed_rings"]]

        return df
```
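To see what `_prep_data` does to the raw bytes, the same steps can be run standalone on a small sample. The two rows below are made-up values in the file's five-column order (`n_risky_rings`, `n_distressed_rings`, `temp`, `pres`, `idx`), not copied from the UCI file, and the uneven spacing is an assumption meant to mimic the file's padding.

```python
import io
import re

import pandas as pd

# Hypothetical raw bytes: two illustrative rows with uneven spacing,
# columns in the same order as the UCI file
raw = b"6  0 66  50 1\n6 1  70 50  2\n"

# Collapse runs of spaces so that a single space can serve as the separator
text = re.sub(r" +", " ", raw.decode("utf-8"))

# Passing names= makes read_csv treat the first row as data, not a header
cols = ["n_risky_rings", "n_distressed_rings", "temp", "pres", "idx"]
df = pd.read_csv(io.StringIO(text), sep=" ", names=cols)

# Move the flight index first and the target column last
df = df[["idx", "temp", "pres", "n_risky_rings", "n_distressed_rings"]]
print(df)
```

The regex only collapses spaces, so a row starting with a space (or containing tabs) would still yield an extra empty field and misalign the columns; the sketch, like `_prep_data`, assumes every row starts with a value.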
The motivation for collecting this database was the explosion of the USA Space Shuttle Challenger on 28 January, 1986. An investigation ensued into the reliability of the shuttle's propulsion system. The explosion was eventually traced to the failure of one of the three field joints on one of the two solid booster rockets. Each of these six field joints includes two O-rings, designated as primary and secondary, which fail when phenomena called erosion and blowby both occur.
The night before the launch, a decision had to be made regarding launch safety. The discussion among engineers and managers leading to this decision included concern that the probability of failure of the O-rings depended on the temperature t at launch, which was forecast to be 31 degrees F. There are strong engineering reasons based on the composition of O-rings to support the judgment that failure probability may rise monotonically as temperature drops. One other variable, the pressure s at which safety testing for field joint leaks was performed, was available, but its relevance to the failure process was unclear.
Draper's paper includes a menacing figure graphing the number of field joints experiencing stress vs. liftoff temperature for the 23 shuttle flights previous to the Challenger disaster. No previous liftoff temperature was under 53 degrees F. Although tremendous extrapolation must be done from the given data to assess risk at 31 degrees F, it is obvious even to the layman "to foresee the unacceptably high risk created by launching at 31 degrees F." For more information, see Draper (1993) or the other previous analyses.
The task is to predict the number of O-rings that will experience thermal distress for a given flight when the launch temperature is below freezing.
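Since no previous launch was below 53 degrees F, the task amounts to extrapolating a count model well outside the observed temperatures. One standard choice, assumed here rather than taken from this module, is a binomial logistic regression of `n_distressed_rings` out of `n_risky_rings` on `temp`. The sketch fits it by Newton-Raphson in NumPy; `fit_binomial_logit` and `predict_expected_distress` are hypothetical helpers, and the ten observations are made-up illustrative values, not the UCI data.

```python
import numpy as np


def fit_binomial_logit(temp, n_risky, n_distressed, n_iter=50, tol=1e-10):
    """Fit logit(P(ring distressed)) = b0 + b1 * temp by Newton-Raphson."""
    temp = np.asarray(temp, dtype=float)
    m = np.asarray(n_risky, dtype=float)
    y = np.asarray(n_distressed, dtype=float)
    X = np.column_stack([np.ones_like(temp), temp])  # intercept and temperature
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        # Score (gradient) of the binomial log-likelihood
        grad = X.T @ (y - m * p)
        # Fisher information; equals the negative Hessian for the logit link
        info = X.T @ (X * (m * p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_expected_distress(beta, temp, n_risky):
    """Expected number of distressed rings at a given launch temperature."""
    p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * temp)))
    return n_risky * p


# Made-up observations for illustration only (not the UCI data)
temp = np.array([53, 57, 63, 66, 67, 70, 70, 75, 76, 79])
n_risky = np.full(temp.shape, 6)
n_distressed = np.array([2, 1, 1, 0, 0, 1, 0, 0, 1, 0])

beta = fit_binomial_logit(temp, n_risky, n_distressed)
print("intercept, slope:", beta)
print("expected distressed rings at 31 F:", predict_expected_distress(beta, 31.0, 6))
```

With the real data, the three arrays would come from the `temp`, `n_risky_rings` and `n_distressed_rings` columns of `dataset.to_pandas()`. Any prediction at 31 degrees F lies far outside the fitted range, so it reflects the model's assumptions at least as much as the data.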
Arguments:
- cache (str or None, optional): The name of the cache, which is saved under that name in the current working directory. If None, no cache is saved. Defaults to '.dataset_cache'.
Attributes:
- cache (str or None): The name of the cache.
- shape (tuple of integers): Dimensions of the data set
- columns (list of strings): List of column names in the data set
Features:
idx (int): Temporal order of flight
temp (int): Launch temperature in Fahrenheit
pres (int): Leak-check pressure in psi
n_risky_rings (int): Number of O-rings at risk on a given flight
Targets:
n_distressed_rings (int): Number of O-rings experiencing thermal distress
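The source sets `_features = range(4)` and `_targets = [4]`, which line up with the column order produced by `_prep_data`: the four features first, the target last. Assuming `split` selects columns by these positions (an assumption; this is not doubt's own code), separating features and targets into NumPy arrays looks roughly like the following, run on three hypothetical rows.

```python
import pandas as pd

# Hypothetical rows in the prepared column order (illustrative values only)
df = pd.DataFrame(
    {
        "idx": [1, 2, 3],
        "temp": [66, 70, 69],
        "pres": [50, 50, 50],
        "n_risky_rings": [6, 6, 6],
        "n_distressed_rings": [0, 1, 0],
    }
)

features = range(4)  # positions of idx, temp, pres, n_risky_rings
targets = [4]        # position of n_distressed_rings

X = df.iloc[:, list(features)].to_numpy()
y = df.iloc[:, targets].to_numpy()

# A single target is returned as a 1-D array, matching the (23,) shape in the examples
if y.shape[1] == 1:
    y = y[:, 0]

print(X.shape, y.shape)
```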
Source:
https://archive.ics.uci.edu/ml/datasets/Challenger+USA+Space+Shuttle+O-Ring
Examples:
Load in the data set::
    >>> dataset = SpaceShuttle()
    >>> dataset.shape
    (23, 5)
Split the data set into features and targets, as NumPy arrays::
    >>> X, y = dataset.split()
    >>> X.shape, y.shape
    ((23, 4), (23,))
Perform a train/test split, also outputting NumPy arrays::
    >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
    >>> X_train, X_test, y_train, y_test = train_test_split
    >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
    ((20, 4), (20,), (3, 4), (3,))
Output the underlying Pandas DataFrame::
    >>> df = dataset.to_pandas()
    >>> type(df)
    <class 'pandas.core.frame.DataFrame'>