doubt.datasets.bike_sharing_daily
Daily bike sharing data set.
This data set is from the UCI data set archive, with the description being the original description verbatim. Some feature names may have been altered, based on the description.
"""Daily bike sharing data set.

This data set is from the UCI data set archive, with the description being the original
description verbatim. Some feature names may have been altered, based on the
description.
"""

import io
import zipfile

import pandas as pd

from .dataset import BASE_DATASET_DESCRIPTION, BaseDataset


class BikeSharingDaily(BaseDataset):
    __doc__ = f"""
    Bike sharing systems are new generation of traditional bike rentals where whole
    process from membership, rental and return back has become automatic. Through these
    systems, user is able to easily rent a bike from a particular position and return
    back at another position. Currently, there are about over 500 bike-sharing programs
    around the world which is composed of over 500 thousands bicycles. Today, there
    exists great interest in these systems due to their important role in traffic,
    environmental and health issues.

    Apart from interesting real world applications of bike sharing systems, the
    characteristics of data being generated by these systems make them attractive for
    the research. Opposed to other transport services such as bus or subway, the
    duration of travel, departure and arrival position is explicitly recorded in these
    systems. This feature turns bike sharing system into a virtual sensor network that
    can be used for sensing mobility in the city. Hence, it is expected that most of
    important events in the city could be detected via monitoring these data.

    {BASE_DATASET_DESCRIPTION}

    Features:
        instant (int):
            Record index
        season (int):
            The season, with 1 = winter, 2 = spring, 3 = summer and 4 = autumn
        yr (int):
            The year, with 0 = 2011 and 1 = 2012
        mnth (int):
            The month, from 1 to 12 inclusive
        holiday (int):
            Whether day is a holiday or not, binary valued
        weekday (int):
            The day of the week, from 0 to 6 inclusive
        workingday (int):
            Working day, 1 if day is neither weekend nor holiday, otherwise 0
        weathersit (int):
            Weather, encoded as

            1. Clear, few clouds, partly cloudy
            2. Mist and cloudy, mist and broken clouds, mist and few clouds
            3. Light snow, light rain and thunderstorm and scattered clouds, light rain
               and scattered clouds
            4. Heavy rain and ice pellets and thunderstorm and mist, or snow and fog
        temp (float):
            Max-min normalised temperature in Celsius, from -8 to +39
        atemp (float):
            Max-min normalised feeling temperature in Celsius, from -16 to +50
        hum (float):
            Scaled max-min normalised humidity, from 0 to 1
        windspeed (float):
            Scaled max-min normalised wind speed, from 0 to 1

    Targets:
        casual (int):
            Count of casual users
        registered (int):
            Count of registered users
        cnt (int):
            Sum of casual and registered users

    Source:
        https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset

    Examples:
        Load in the data set::

            >>> dataset = BikeSharingDaily()
            >>> dataset.shape
            (731, 15)

        Split the data set into features and targets, as NumPy arrays::

            >>> X, y = dataset.split()
            >>> X.shape, y.shape
            ((731, 12), (731, 3))

        Perform a train/test split, also outputting NumPy arrays::

            >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
            >>> X_train, X_test, y_train, y_test = train_test_split
            >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
            ((574, 12), (574, 3), (157, 12), (157, 3))

        Output the underlying Pandas DataFrame::

            >>> df = dataset.to_pandas()
            >>> type(df)
            <class 'pandas.core.frame.DataFrame'>
    """

    _url = (
        "https://archive.ics.uci.edu/ml/machine-learning-databases/"
        "00275/Bike-Sharing-Dataset.zip"
    )

    _features = range(12)
    _targets = [12, 13, 14]

    def _prep_data(self, data: bytes) -> pd.DataFrame:
        """Prepare the data set.

        Args:
            data (bytes): The raw data

        Returns:
            Pandas dataframe: The prepared data
        """
        # Convert the bytes into a file-like object
        buffer = io.BytesIO(data)

        # Unzip the file and pull out day.csv as a string
        with zipfile.ZipFile(buffer, "r") as zip_file:
            csv = zip_file.read("day.csv").decode("utf-8")

        # Convert the string into a file-like object
        csv_file = io.StringIO(csv)

        # Read the file-like object into a dataframe
        cols = [0] + list(range(2, 16))
        df = pd.read_csv(csv_file, usecols=cols)
        return df
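The _prep_data method in the listing unpacks a CSV file from zip bytes held entirely in memory. The following is a minimal sketch of that pattern using only the standard library instead of pandas. The helper name read_csv_from_zip and the two-row archive are illustrative assumptions: they are not part of doubt, and the values are made up rather than taken from the real data set.

```python
import csv
import io
import zipfile


def read_csv_from_zip(data: bytes, member: str) -> list[dict[str, str]]:
    """Return the rows of one CSV member of an in-memory zip archive."""
    # Wrap the raw bytes so zipfile can treat them like a file on disk
    with zipfile.ZipFile(io.BytesIO(data), "r") as archive:
        text = archive.read(member).decode("utf-8")
    # DictReader accepts any iterable of lines, including a StringIO
    return list(csv.DictReader(io.StringIO(text)))


# Build a tiny synthetic archive in memory (hypothetical values, not real data)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "day.csv",
        "instant,dteday,cnt\n1,2011-01-01,10\n2,2011-01-02,20\n",
    )

rows = read_csv_from_zip(buffer.getvalue(), "day.csv")
print(rows[0]["cnt"])
```

Nothing is written to disk here: both the archive and the extracted text live in io buffers, which is the same approach the listing takes before handing the text to pd.read_csv.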
Bike sharing systems are new generation of traditional bike rentals where whole process from membership, rental and return back has become automatic. Through these systems, user is able to easily rent a bike from a particular position and return back at another position. Currently, there are about over 500 bike-sharing programs around the world which is composed of over 500 thousands bicycles. Today, there exists great interest in these systems due to their important role in traffic, environmental and health issues.
Apart from interesting real world applications of bike sharing systems, the characteristics of data being generated by these systems make them attractive for the research. Opposed to other transport services such as bus or subway, the duration of travel, departure and arrival position is explicitly recorded in these systems. This feature turns bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most of important events in the city could be detected via monitoring these data.
Arguments:
- cache (str or None, optional): The name of the cache. It will be saved to `cache` in the current working directory. If None then no cache will be saved. Defaults to '.dataset_cache'.
Attributes:
- cache (str or None): The name of the cache.
- shape (tuple of integers): Dimensions of the data set
- columns (list of strings): List of column names in the data set
Features:
    instant (int):
        Record index
    season (int):
        The season, with 1 = winter, 2 = spring, 3 = summer and 4 = autumn
    yr (int):
        The year, with 0 = 2011 and 1 = 2012
    mnth (int):
        The month, from 1 to 12 inclusive
    holiday (int):
        Whether day is a holiday or not, binary valued
    weekday (int):
        The day of the week, from 0 to 6 inclusive
    workingday (int):
        Working day, 1 if day is neither weekend nor holiday, otherwise 0
    weathersit (int):
        Weather, encoded as

        1. Clear, few clouds, partly cloudy
        2. Mist and cloudy, mist and broken clouds, mist and few clouds
        3. Light snow, light rain and thunderstorm and scattered clouds, light rain
           and scattered clouds
        4. Heavy rain and ice pellets and thunderstorm and mist, or snow and fog
    temp (float):
        Max-min normalised temperature in Celsius, from -8 to +39
    atemp (float):
        Max-min normalised feeling temperature in Celsius, from -16 to +50
    hum (float):
        Scaled max-min normalised humidity, from 0 to 1
    windspeed (float):
        Scaled max-min normalised wind speed, from 0 to 1
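Because temp and atemp are stored max-min normalised, recovering degrees Celsius means inverting the scaling with the ranges listed above. The following is a minimal sketch, assuming the usual form normalised = (t - t_min) / (t_max - t_min). The helper name denormalise is hypothetical and not part of doubt.

```python
def denormalise(value: float, t_min: float, t_max: float) -> float:
    """Invert max-min normalisation: map a value in [0, 1] back to [t_min, t_max]."""
    return value * (t_max - t_min) + t_min


# Ranges taken from the feature descriptions (degrees Celsius)
TEMP_RANGE = (-8.0, 39.0)
ATEMP_RANGE = (-16.0, 50.0)

# A normalised value of 0.5 sits at the midpoint of each range
temp_celsius = denormalise(0.5, *TEMP_RANGE)
atemp_celsius = denormalise(0.5, *ATEMP_RANGE)
print(temp_celsius, atemp_celsius)  # 15.5 17.0
```

The same function applies to a whole column if it is mapped over the values, provided the assumed normalisation form matches how the data were actually scaled.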
Targets:
    casual (int):
        Count of casual users
    registered (int):
        Count of registered users
    cnt (int):
        Sum of casual and registered users
Source:
https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
Examples:
Load in the data set::
    >>> dataset = BikeSharingDaily()
    >>> dataset.shape
    (731, 15)
Split the data set into features and targets, as NumPy arrays::
    >>> X, y = dataset.split()
    >>> X.shape, y.shape
    ((731, 12), (731, 3))
Perform a train/test split, also outputting NumPy arrays::
    >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
    >>> X_train, X_test, y_train, y_test = train_test_split
    >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
    ((574, 12), (574, 3), (157, 12), (157, 3))
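Note that the test set above holds 157 of the 731 rows, about 21.5%, rather than exactly 20%. One possible explanation, which this page does not confirm, is that each row is assigned to the test set independently with probability test_size, so the realised fraction only approximates it. The following is a minimal sketch of such a split using the standard library rather than NumPy. The helper random_mask_split is hypothetical, is not doubt's API, and will not reproduce the exact counts shown above.

```python
import random


def random_mask_split(
    n_rows: int, test_size: float, seed: int
) -> tuple[list[int], list[int]]:
    """Assign each row index to train or test with an independent random draw."""
    rng = random.Random(seed)
    train: list[int] = []
    test: list[int] = []
    for idx in range(n_rows):
        # Each row lands in the test set with probability test_size
        if rng.random() < test_size:
            test.append(idx)
        else:
            train.append(idx)
    return train, test


train_idx, test_idx = random_mask_split(731, test_size=0.2, seed=42)
# The two parts always cover every row, but the test share only approximates 0.2
print(len(train_idx), len(test_idx), len(test_idx) / 731)
```

If an exact test fraction matters, shuffling the row indices and slicing off the first 20% would give a fixed-size split instead.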
Output the underlying Pandas DataFrame::
    >>> df = dataset.to_pandas()
    >>> type(df)
    <class 'pandas.core.frame.DataFrame'>