doubt.datasets.bike_sharing_hourly
Hourly bike sharing data set.
This data set is from the UCI data set archive, with the description being the original description verbatim. Some feature names may have been altered, based on the description.
"""Hourly bike sharing data set.

This data set is from the UCI data set archive, with the description being the original
description verbatim. Some feature names may have been altered, based on the
description.
"""

import io
import zipfile

import pandas as pd

from .dataset import BASE_DATASET_DESCRIPTION, BaseDataset


class BikeSharingHourly(BaseDataset):
    __doc__ = f"""
    Bike sharing systems are new generation of traditional bike rentals where whole
    process from membership, rental and return back has become automatic. Through these
    systems, user is able to easily rent a bike from a particular position and return
    back at another position. Currently, there are about over 500 bike-sharing programs
    around the world which is composed of over 500 thousands bicycles. Today, there
    exists great interest in these systems due to their important role in traffic,
    environmental and health issues.

    Apart from interesting real world applications of bike sharing systems, the
    characteristics of data being generated by these systems make them attractive for
    the research. Opposed to other transport services such as bus or subway, the
    duration of travel, departure and arrival position is explicitly recorded in these
    systems. This feature turns bike sharing system into a virtual sensor network that
    can be used for sensing mobility in the city. Hence, it is expected that most of
    important events in the city could be detected via monitoring these data.

    {BASE_DATASET_DESCRIPTION}

    Features:
        instant (int):
            Record index
        season (int):
            The season, with 1 = winter, 2 = spring, 3 = summer and 4 = autumn
        yr (int):
            The year, with 0 = 2011 and 1 = 2012
        mnth (int):
            The month, from 1 to 12 inclusive
        hr (int):
            The hour of the day, from 0 to 23 inclusive
        holiday (int):
            Whether day is a holiday or not, binary valued
        weekday (int):
            The day of the week, from 0 to 6 inclusive
        workingday (int):
            Working day, 1 if day is neither weekend nor holiday, otherwise 0
        weathersit (int):
            Weather, encoded as

            1. Clear, few clouds, partly cloudy
            2. Mist and cloudy, mist and broken clouds, mist and few clouds
            3. Light snow, light rain and thunderstorm and scattered clouds, light rain
               and scattered clouds
            4. Heavy rain and ice pallets and thunderstorm and mist, or snow and fog

        temp (float):
            Max-min normalised temperature in Celsius, from -8 to +39
        atemp (float):
            Max-min normalised feeling temperature in Celsius, from -16 to +50
        hum (float):
            Scaled max-min normalised humidity, from 0 to 1
        windspeed (float):
            Scaled max-min normalised wind speed, from 0 to 1

    Targets:
        casual (int):
            Count of casual users
        registered (int):
            Count of registered users
        cnt (int):
            Sum of casual and registered users

    Source:
        https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset

    Examples:
        Load in the data set::

            >>> dataset = BikeSharingHourly()
            >>> dataset.shape
            (17379, 16)

        Split the data set into features and targets, as NumPy arrays::

            >>> X, y = dataset.split()
            >>> X.shape, y.shape
            ((17379, 13), (17379, 3))

        Perform a train/test split, also outputting NumPy arrays::

            >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
            >>> X_train, X_test, y_train, y_test = train_test_split
            >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
            ((13873, 13), (13873, 3), (3506, 13), (3506, 3))

        Output the underlying Pandas DataFrame::

            >>> df = dataset.to_pandas()
            >>> type(df)
            <class 'pandas.core.frame.DataFrame'>
    """

    _url = (
        "https://archive.ics.uci.edu/ml/machine-learning-databases/"
        "00275/Bike-Sharing-Dataset.zip"
    )

    _features = range(13)
    _targets = [13, 14, 15]

    def _prep_data(self, data: bytes) -> pd.DataFrame:
        """Prepare the data set.

        Args:
            data (bytes): The raw data

        Returns:
            Pandas dataframe: The prepared data
        """
        # Convert the bytes into a file-like object
        buffer = io.BytesIO(data)

        # Unzip the file and pull out hour.csv as a string
        with zipfile.ZipFile(buffer, "r") as zip_file:
            csv = zip_file.read("hour.csv").decode("utf-8")

        # Convert the string into a file-like object
        csv_file = io.StringIO(csv)

        # Read the file-like object into a dataframe
        cols = [0] + list(range(2, 17))
        df = pd.read_csv(csv_file, usecols=cols)
        return df
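The steps in `_prep_data` (wrap the bytes in a file-like object, unzip one member, parse the CSV while skipping the second column) can be tried without downloading anything. The following is a minimal sketch, not the library's code: it swaps pandas for the standard-library `csv` module to stay dependency-free, and `read_csv_from_zip`, `make_toy_archive` and the two toy rows are hypothetical names and data made up for illustration.

```python
import csv
import io
import zipfile


def read_csv_from_zip(data, member, skip_cols):
    """Read one CSV member of an in-memory zip archive, dropping some columns."""
    # Wrap the raw bytes so that zipfile can treat them as a file
    with zipfile.ZipFile(io.BytesIO(data), "r") as zip_file:
        text = zip_file.read(member).decode("utf-8")

    # Parse the decoded text as CSV; the first row is the header
    reader = csv.reader(io.StringIO(text))
    header = next(reader)

    # Keep every column position that is not explicitly skipped
    keep = [i for i in range(len(header)) if i not in skip_cols]
    return [{header[i]: row[i] for i in keep} for row in reader]


def make_toy_archive():
    """Build a tiny zip archive in memory (toy rows, not the real data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zip_file:
        zip_file.writestr(
            "hour.csv",
            "instant,dteday,season,cnt\n1,2011-01-01,1,16\n2,2011-01-01,1,40\n",
        )
    return buffer.getvalue()


# Skip column 1 (the date string), mirroring cols = [0] + list(range(2, 17))
rows = read_csv_from_zip(make_toy_archive(), "hour.csv", skip_cols={1})
print(rows)
```

Unlike `pd.read_csv`, the `csv` module leaves every value as a string, so a real replacement would still need to convert types.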
Bike sharing systems are new generation of traditional bike rentals where whole process from membership, rental and return back has become automatic. Through these systems, user is able to easily rent a bike from a particular position and return back at another position. Currently, there are about over 500 bike-sharing programs around the world which is composed of over 500 thousands bicycles. Today, there exists great interest in these systems due to their important role in traffic, environmental and health issues.
Apart from interesting real world applications of bike sharing systems, the characteristics of data being generated by these systems make them attractive for the research. Opposed to other transport services such as bus or subway, the duration of travel, departure and arrival position is explicitly recorded in these systems. This feature turns bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most of important events in the city could be detected via monitoring these data.
Arguments:
- cache (str or None, optional): The name of the cache. The cache is saved under this name in the current working directory. If None, no cache is saved. Defaults to '.dataset_cache'.
Attributes:
- cache (str or None): The name of the cache.
- shape (tuple of integers): Dimensions of the data set
- columns (list of strings): List of column names in the data set
Features:
- instant (int): Record index
- season (int): The season, with 1 = winter, 2 = spring, 3 = summer and 4 = autumn
- yr (int): The year, with 0 = 2011 and 1 = 2012
- mnth (int): The month, from 1 to 12 inclusive
- hr (int): The hour of the day, from 0 to 23 inclusive
- holiday (int): Whether the day is a holiday or not, binary valued
- weekday (int): The day of the week, from 0 to 6 inclusive
- workingday (int): Working day, 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit (int): Weather, encoded as
    1. Clear, few clouds, partly cloudy
    2. Mist and cloudy, mist and broken clouds, mist and few clouds
    3. Light snow, light rain and thunderstorm and scattered clouds, light rain and scattered clouds
    4. Heavy rain and ice pellets and thunderstorm and mist, or snow and fog
- temp (float): Max-min normalised temperature in Celsius, from -8 to +39
- atemp (float): Max-min normalised feeling temperature in Celsius, from -16 to +50
- hum (float): Scaled max-min normalised humidity, from 0 to 1
- windspeed (float): Scaled max-min normalised wind speed, from 0 to 1
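Because `temp` and `atemp` are stored as max-min normalised values, it can help to map them back to degrees Celsius. The sketch below assumes the usual max-min formula, (t - t_min) / (t_max - t_min), with the bounds quoted in the feature list; the helper names `denormalise`, `temp_to_celsius` and `atemp_to_celsius` are hypothetical and not part of the library.

```python
# Bounds taken from the feature descriptions (assumed to be t_min and t_max)
TEMP_RANGE = (-8.0, 39.0)
ATEMP_RANGE = (-16.0, 50.0)


def denormalise(value, lower, upper):
    """Invert max-min normalisation: map a value in [0, 1] back to [lower, upper]."""
    return value * (upper - lower) + lower


def temp_to_celsius(temp):
    """Convert a normalised `temp` value to degrees Celsius."""
    return denormalise(temp, *TEMP_RANGE)


def atemp_to_celsius(atemp):
    """Convert a normalised `atemp` (feeling temperature) value to degrees Celsius."""
    return denormalise(atemp, *ATEMP_RANGE)


print(temp_to_celsius(0.5))   # midpoint of the temperature range
print(atemp_to_celsius(0.5))  # midpoint of the feeling-temperature range
```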
Targets:
- casual (int): Count of casual users
- registered (int): Count of registered users
- cnt (int): Sum of casual and registered users
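The 13 features and 3 targets listed above correspond to the positional indices `_features = range(13)` and `_targets = [13, 14, 15]` in the class source. The sketch below illustrates that positional split on plain Python lists and checks the stated relation `cnt = casual + registered`. It assumes the column order matches the order of the Features and Targets lists; `split_row`, `is_consistent` and the toy row are hypothetical, and this is not how `BaseDataset.split` is implemented.

```python
# Column names in the order of the Features and Targets lists (assumed order)
COLUMNS = [
    "instant", "season", "yr", "mnth", "hr", "holiday", "weekday",
    "workingday", "weathersit", "temp", "atemp", "hum", "windspeed",
    "casual", "registered", "cnt",
]

# Positional indices, as in the class attributes _features and _targets
FEATURE_IDXS = range(13)
TARGET_IDXS = [13, 14, 15]


def split_row(row):
    """Split one 16-element row into its feature part and its target part."""
    features = [row[i] for i in FEATURE_IDXS]
    targets = [row[i] for i in TARGET_IDXS]
    return features, targets


def is_consistent(targets):
    """Check that cnt equals casual + registered."""
    casual, registered, cnt = targets
    return casual + registered == cnt


feature_names, target_names = split_row(COLUMNS)

# A toy row made up for illustration, not taken from the data set
toy_row = [1, 1, 0, 1, 0, 0, 6, 0, 1, 0.24, 0.29, 0.81, 0.0, 3, 13, 16]
_, toy_targets = split_row(toy_row)
print(target_names, is_consistent(toy_targets))
```

Since `cnt` is fully determined by the other two targets, a model that predicts all three is in effect predicting two independent quantities.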
Source:
https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
Examples:
Load in the data set::

    >>> dataset = BikeSharingHourly()
    >>> dataset.shape
    (17379, 16)
Split the data set into features and targets, as NumPy arrays::

    >>> X, y = dataset.split()
    >>> X.shape, y.shape
    ((17379, 13), (17379, 3))
Perform a train/test split, also outputting NumPy arrays::

    >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
    >>> X_train, X_test, y_train, y_test = train_test_split
    >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
    ((13873, 13), (13873, 3), (3506, 13), (3506, 3))
Output the underlying Pandas DataFrame::

    >>> df = dataset.to_pandas()
    >>> type(df)
    <class 'pandas.core.frame.DataFrame'>