Cross-validation and the bootstrap are two commonly used methods for partitioning model estimation

A study of Bicing bike sharing user activity in Barcelona found that the average difference in elevation between origin and destination stations for e-bike sharing trips was +6.21 meters, compared to -3.11 meters for conventional bike sharing trips. Although bike sharing offers the opportunity to expand cycling mode share, the evidence from traditional bike sharing ridership suggests that bike sharing users are not socio-demographically representative of the broader population in the areas where these systems operate. Existing studies of station-based bike sharing in North America have shown that bike sharing use is strongly correlated with user characteristics such as gender, age, and race. Station-based bike sharing users tend to be younger and upper-to-middle income, with higher levels of educational attainment than the general population. Station siting has been found to reflect the socio-demographic imbalances in bike sharing ridership, with one study of 42 U.S. bike sharing systems reporting that the 60 percent of census tracts with the greatest economic hardship contained less than 25 percent of bike sharing stations. Moreover, bike sharing station activity increases in locations with higher percentages of white residents and decreases in locations with older populations. A growing emphasis on transportation equity, particularly with respect to emerging mobility services, has motivated many agencies to incorporate equity-focused provisions in their shared micromobility programs. Common approaches to promote equity across station-based bike sharing systems have included offering discounted annual memberships to low-income riders, siting stations based on equity considerations, providing payment plan options, and offering assistance in obtaining bank accounts, credit cards, and/or debit cards in order to lower access barriers to bike sharing.

Many cities have required that shared micromobility operators provide such options as a condition for obtaining an operating permit. However, additional barriers to shared micromobility use remain unaddressed. Shaheen et al. introduced the STEPS to Transportation Equity framework to evaluate transportation equity by recognizing the opportunities and limitations of Spatial, Temporal, Economic, Physiological, and Social elements. The STEPS framework can be used to evaluate whether a shared mobility system provides equitable transportation services by identifying specific barriers and opportunities within each category. In particular, spatial factors such as steep terrain and low population density may constrain bike sharing use in cities with these characteristics. Temporal factors, which pertain to travel time considerations, may be an issue in cities where shared micromobility demand is unbalanced during peak hours, generating concerns about the reliability of available vehicles. Economic factors include both direct and indirect costs that may create hardship for particular groups of travelers. Physiological factors may pose a serious limitation to bike sharing use that is reflected in the age distribution of riders, though there may be an opportunity to expand shared micromobility use among older and less physically active individuals through electric bike sharing and scooter sharing. Finally, social factors encompass social, cultural, safety, and language barriers that may inhibit an individual’s use of a particular service. Our study consists of three major analytical components: 1) a comparative analysis of bike sharing travel behavior, 2) a discrete choice analysis (DCA) using a destination choice model, and 3) a geospatial suitability analysis based on the STEPS framework using the DCA coefficients.
To inform our analysis, we employed two datasets from February 2018, one from Ford GoBike and one from JUMP, comprising 77,841 docked, conventional pedal bike sharing trips and 24,270 dockless e-bike sharing trips, respectively, that occurred in San Francisco. We note that February 2018 in San Francisco was slightly warmer than average and relatively dry, with 10 mm of precipitation compared to an average of 112 mm.

The warm, dry conditions may have resulted in greater observed ridership than would be expected during this time of year. The trip-level data include trip duration and start and end times. The origin and destination of a trip are docking stations for GoBike and, for JUMP, the census blocks in which the trip started and ended. The age and membership status of GoBike users are also included for each trip. The datasets do not include further information regarding user identification, user characteristics, or the trajectories taken for each trip. Our analysis is thus constrained to the revealed preferences of unidentified, unlinked bike sharing users. Rather than estimate a traditional discrete choice model in which individuals’ preferences for specific alternatives among a finite set of choices are modeled, we implemented a destination choice model. We modeled the decision to travel to a particular destination given that a trip originating in a particular location is made using a particular bike sharing service. We supplemented the trip-level data with tract-level population, job count, employment rate, age, income, and gender distributions from the U.S. Census. From OpenStreetMap, we used the locations of bike lanes and public bike racks to determine the density of these facilities in each census tract in San Francisco. Finally, we queried the Google Directions and Elevations Application Programming Interfaces (APIs) for estimates of travel distance, duration, and elevation gain along suggested bike routes for each bike sharing trip. Queries to the Google Directions API used the latitude and longitude of specified trip origin-destination (OD) pairs to generate a suggested route that provides a path, estimated travel time, and distance for each query. These paths were then used to query the Google Elevations API for elevation samples at 100 meter intervals, which were used to estimate the total elevation gain of each trip.
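As an illustration, the elevation-gain estimate can be sketched as the sum of all positive differences between consecutive elevation samples along a route. This is a minimal sketch rather than the code used in the study; the function name and the sample profile are hypothetical.

```python
def elevation_gain(samples_m):
    """Sum the positive elevation differences between consecutive samples (meters)."""
    return sum(max(b - a, 0.0) for a, b in zip(samples_m, samples_m[1:]))


# Hypothetical elevation samples (meters) taken every 100 m along one route.
profile = [12.0, 15.5, 14.0, 20.0, 20.0, 18.5]
print(elevation_gain(profile))  # climbs of 3.5 m and 6.0 m give 9.5
```

Descents contribute nothing to the total, so a route that climbs and then returns to its starting elevation still registers the full climb.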
All unique OD pairs in the activity dataset were used in this querying process, as well as OD pairs for all alternative trips used in the DCA. Alternative GoBike trips included all possible OD pairs starting and ending at a GoBike station in San Francisco, while alternative JUMP trips were generated as the set of all actual origins of JUMP trips paired with the centroid of every census tract in San Francisco. We applied the results of the destination choice model and the STEPS framework in a suitability analysis, which is a geographic information system (GIS)-based method for determining the ability of a system to meet a user’s needs. In our analysis, we examined the geospatial distribution of bike sharing suitability in San Francisco.

In the following sections, we detail the steps taken to process data, specify a destination choice model, and apply the model and the STEPS framework in a suitability analysis. In this study, observed bike sharing trip destinations are modeled as choices among a discrete set of alternative destinations. Although techniques exist to estimate continuous models, neither the GoBike nor the JUMP dataset contains location data on a continuous scale. The GoBike OD locations are constrained to the discrete locations where GoBike stations exist, while the JUMP OD locations are classified by the census block in which the trip started or ended for the purpose of privacy protection. With such discrete spatial data, we aggregated trip OD pairs to the census tract level for two reasons: 1) to avoid high correlation between very close OD pairs and 2) to simplify the model analysis. Aggregating the data by census tract also allows for the inclusion of additional attributes in the model, such as demographics, employment rate, job density, and population density, all of which can be measured at the census tract level. With aggregation of the data to the census tract level, we note a major limitation in the computability of a model with as many alternatives as there are census tracts in the coverage areas of the two San Francisco bike sharing systems. Forty-six census tracts are serviced by the Ford GoBike system, and 192 census tracts are serviced by JUMP. Discrete choice models generally include Alternative Specific Constants (ASCs) that aim to capture the bias toward each alternative that is not explicitly explained by the other model attributes. To avoid overfitting and aid the interpretability of our model, we reduced the number of ASCs by clustering the census tracts based on their attributes. We included just one ASC in the model for each of the k clusters, making the reasonable assumption that clustered alternatives have similar unexplained bias.
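To make the role of cluster-level ASCs concrete, the following is a minimal sketch of multinomial logit choice probabilities in which every tract in a cluster shares one constant. The logit form is the standard one; the function name, tract identifiers, cluster labels, attribute values, and coefficients are all hypothetical and are not taken from the fitted models.

```python
import math


def choice_probabilities(features, betas, cluster_of, asc):
    """Multinomial logit probabilities over destination tracts.

    features:   {tract: [x1, x2, ...]} attribute values per alternative
    betas:      [b1, b2, ...] coefficients shared by all alternatives
    cluster_of: {tract: cluster label}
    asc:        {cluster label: constant}, with one cluster fixed at 0 as reference
    """
    utility = {
        t: sum(b * x for b, x in zip(betas, xs)) + asc[cluster_of[t]]
        for t, xs in features.items()
    }
    u_max = max(utility.values())  # subtract the maximum for numerical stability
    exp_u = {t: math.exp(u - u_max) for t, u in utility.items()}
    total = sum(exp_u.values())
    return {t: e / total for t, e in exp_u.items()}


# Hypothetical example: two attributes per tract, two clusters.
features = {"tract_a": [2.0, 0.5], "tract_b": [2.0, 0.5], "tract_c": [1.0, 1.5]}
cluster_of = {"tract_a": "downtown", "tract_b": "residential", "tract_c": "residential"}
asc = {"downtown": 0.8, "residential": 0.0}
probs = choice_probabilities(features, [0.6, -0.4], cluster_of, asc)
print(probs)
```

In this example tract_a and tract_b have identical attributes, so any difference in their probabilities comes only from the constant of the cluster each belongs to.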
Several techniques can be applied to solve this unsupervised clustering problem. We considered three commonly used techniques for clustering: 1) DBSCAN, 2) Gaussian Mixture Models, and 3) k-means. We decided to work with k-means as it offers two desirable properties: 1) clusters tend to have similar sizes, and 2) clusters are grouped around a centroid. The latter property suited our objective of having an average ASC for the entire cluster. K-means is a distance-based algorithm that requires preprocessing of the data to avoid biases due to differences in scale. First, we applied standard normal scaling to every census-level attribute available in our datasets. As our final objective is to determine the relative likelihood of trips destined for a location, we then performed a Cross Correlation Analysis between the attributes of a tract and the number of trips that end in the tract. This process produces a projection of the set of attributes so that the clustering analysis favors attributes with a strong correlation to ridership. Figure 2 presents the resulting clusters with an intuitive interpretation of each, based on our a priori understanding of the neighborhoods they represent. For both systems, we computed elevation gain by summing all increases in elevation observed in the 100 meter intervals sampled.
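The scaling and clustering steps can be sketched as follows. This is a simplified illustration using only the Python standard library: it standardizes each attribute and runs Lloyd's k-means algorithm, and it omits the correlation-based projection described above. The function names and tract attribute values are hypothetical.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels


# Hypothetical tract attributes: [log number of jobs, population density].
tracts = [[1.0, 200.0], [1.2, 220.0], [0.9, 210.0],
          [8.0, 900.0], [8.3, 950.0], [7.9, 880.0]]
labels = kmeans(standardize(tracts), k=2)
print(labels)
```

Without the scaling step, the population density column would dominate the Euclidean distances simply because its values are numerically larger.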

A complete list of attributes included in the final model is found in Table 1. This model excludes some parameters that were found to be insignificant to the destination choices of bike sharing users. Among them, unemployment measures such as the unemployment ratio or employment-to-population ratio were not significant when accounting for the log number of jobs. We chose not to include trip cost and membership considerations, as they differed considerably across the GoBike and JUMP systems. GoBike members pay annual membership fees, resulting in variable per-trip costs for each member depending on the frequency with which they use the service. We also did not have information on which of the short-term pass options were used by nonmembers. Though start time has a strong impact on destination choice, this choice-maker attribute can only be incorporated in the model by interacting it with other relevant features. We chose not to add this refinement in order to keep the model simple. Finally, the distribution of race or ethnicity at trip destinations was found to be highly correlated with economic attributes of destinations and thus was not included in the final model. For the JUMP system, we considered every tract in San Francisco County as an alternative, while for GoBike we constrained the choice set to the tracts that contain at least one GoBike station to account for trip feasibility given the service area at the time the data were collected. The sample sizes amounted to 70,779 trips with 45 alternatives for GoBike and 24,034 trips with 192 alternatives for JUMP. With all trips and alternatives included, the datasets exceeded the computational capacity available to us for fitting the models with the PyLogit Python package. We therefore employed an ensemble method that combines several “weak learners” to divide the workload. In this case, a weak learner is a multinomial logit (MNL) model trained on a sample of choice experiments.
For the GoBike model, each weak learner was trained on 500 choice experiments using all 45 alternatives. For JUMP, however, considering all alternatives would result in keeping too few choice experiments. We therefore took an approach similar to that employed in stated preference surveys, restricting the number of alternatives for each choice experiment. To fit the JUMP model, we randomly sampled 110 alternatives to use for each weak learner with 500 choice experiments. We chose to use the bootstrap as it measures the variance in the parameters, indicating which parameters are not relevant in the model and can be removed. Cross-validation, on the other hand, is more focused on assessing predictive power. Since we are more concerned with narrowing attributes to those that are most influential in destination choice than with producing a model that predicts exactly where a bike sharing user will travel, we considered the bootstrap a more appropriate method for this analysis. Estimating identical models separately on the two datasets required that we keep attributes that happened to be significant for one system but not for the other.
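The way a bootstrap ensemble exposes parameter variance can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's implementation. The `estimator` argument is a hypothetical stand-in for fitting one weak learner and returning its coefficients. The 1.96 standard-error screening rule is an assumption added for illustration. Because each resample holds 500 draws rather than the full dataset, the spread reflects variability at that sample size.

```python
import random
from statistics import mean, stdev


def bootstrap_estimates(experiments, estimator, n_learners=200,
                        sample_size=500, seed=0):
    """Fit `estimator` on resamples drawn with replacement and summarize each parameter.

    `estimator` is a placeholder for fitting one weak learner (for example an
    MNL model); it must return a list of coefficient values.
    """
    rng = random.Random(seed)
    fits = [
        estimator(rng.choices(experiments, k=sample_size))  # sample with replacement
        for _ in range(n_learners)
    ]
    summary = []
    for values in zip(*fits):  # iterate parameter by parameter
        est, se = mean(values), stdev(values)
        # Assumed rule: flag a parameter when its rough 95% interval excludes zero.
        summary.append({"mean": est, "se": se, "keep": abs(est) > 1.96 * se})
    return summary


# Toy stand-in for model fitting: the "coefficient" is just the sample mean.
data = [float(i) for i in range(1, 101)]
for row in bootstrap_estimates(data, lambda sample: [mean(sample)]):
    print(row)
```

A parameter whose estimates scatter widely around zero across the weak learners would be flagged with `keep` set to False, which is the kind of signal used to narrow the attribute list.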