Design Computation & Digital Engineering Lab


A Dataset and Machine Learning Benchmarks for Data-Driven Bicycle Design

Lyle Regenwetter¹, Brent Curry², Faez Ahmed¹

¹MIT  ²BikeCAD


BIKED includes several types of data:

BIKED provides parametric design data sourced directly from BikeCAD files. This parametric data consists of numerical dimensions, discrete counts and categories, and boolean flags. A sequential processing pipeline first condenses the raw 23613-dimensional parameter space into a more manageable reduced space of 1314 parameters, then applies further curation steps such as one-hot encoding and imputation to produce an ML-ready 2395-dimensional parameter space covering all 4512 models. The raw, reduced, and processed data are all provided, along with the processing code, to accommodate custom processing methods.
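The two curation steps named above, one-hot encoding and imputation, can be illustrated with a minimal sketch. It assumes each model's parameters have already been loaded as a Python dict in which a missing value is `None` or an absent key. The parameter names in the usage example (`SeatTubeLength`, `FrameMaterial`, `HasFenders`) are hypothetical, and mean imputation is an assumption rather than a statement of the rule BIKED's pipeline uses; the released processing code is the reference for reproducing the dataset.

```python
from statistics import mean


def curate(records, categorical, numeric):
    """Turn raw parameter dicts into fixed-length numeric rows (illustrative only).

    records:     list of dicts; a missing value is None or an absent key
    categorical: parameter names to one-hot encode
    numeric:     parameter names kept as numbers (booleans become 1.0 / 0.0)
    """
    # Vocabulary of observed levels for each categorical parameter.
    levels = {c: sorted({r[c] for r in records if r.get(c) is not None})
              for c in categorical}
    # Mean of the observed values, used here (an assumption) to impute gaps.
    means = {}
    for c in numeric:
        observed = [float(r[c]) for r in records if r.get(c) is not None]
        means[c] = mean(observed) if observed else 0.0

    rows = []
    for r in records:
        row = {}
        for c in numeric:
            v = r.get(c)
            row[c] = float(v) if v is not None else means[c]
        for c in categorical:
            for level in levels[c]:
                # A missing category leaves every indicator column at 0.
                row[f"{c}={level}"] = 1.0 if r.get(c) == level else 0.0
        rows.append(row)
    return rows


records = [
    {"SeatTubeLength": 540.0, "FrameMaterial": "steel", "HasFenders": True},
    {"SeatTubeLength": None, "FrameMaterial": "aluminum", "HasFenders": False},
    {"SeatTubeLength": 560.0, "FrameMaterial": "steel", "HasFenders": None},
]
rows = curate(records, ["FrameMaterial"], ["SeatTubeLength", "HasFenders"])
```

Every output row has the same columns, which is what turns a variable set of raw parameters into a fixed-width design matrix; the second model's missing seat tube length is filled with the mean of the other two (550.0).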

BIKED provides image exports of the CAD files with their backgrounds and dimension labels cleaned up. It also provides a standardized set of these images in which colors are normalized and patterns and decals are removed.

BIKED breaks bikes down into components (saddle, frame, wheels, etc.) and includes segmented images of each component as well as semantic masks of the full bicycle images. In the component images, each component is positioned according to its true location on the bicycle and scaled proportionally to its physical dimensions.
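The idea of keeping components at their true positions and proportional sizes can be sketched as a single shared millimetre-to-pixel scale applied to every component's physical bounding box. Everything here is an illustrative assumption: the `Component` fields, the bottom-left coordinate convention, and the `place_on_canvas` helper are hypothetical and do not describe how BIKED's component images were actually exported.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    x_mm: float       # left edge in the bicycle's frame of reference (hypothetical convention)
    y_mm: float       # bottom edge, measured upward from the ground
    width_mm: float
    height_mm: float


def place_on_canvas(components, canvas_height_px, mm_per_px, origin_mm=(0.0, 0.0)):
    """Return pixel boxes (left, top, right, bottom) for each component.

    One scale is shared by all components, so relative positions and
    relative sizes are preserved exactly as in physical space.
    """
    ox, oy = origin_mm
    boxes = {}
    for c in components:
        left = (c.x_mm - ox) / mm_per_px
        right = (c.x_mm + c.width_mm - ox) / mm_per_px
        # Image rows grow downward, so the vertical axis is flipped.
        top = canvas_height_px - (c.y_mm + c.height_mm - oy) / mm_per_px
        bottom = canvas_height_px - (c.y_mm - oy) / mm_per_px
        boxes[c.name] = (round(left), round(top), round(right), round(bottom))
    return boxes


parts = [
    Component("rear_wheel", 0.0, 0.0, 700.0, 700.0),
    Component("saddle", 500.0, 900.0, 270.0, 60.0),
]
boxes = place_on_canvas(parts, canvas_height_px=600, mm_per_px=2.0)
```

Because no component gets its own scale factor, a 700 mm wheel always renders 2.6 times wider than a 270 mm saddle, which is the property that lets spatial models reason about relative size.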

Design Space Exploration

BIKED's diverse assortment of 4512 manually designed bicycle models spans all common bicycle styles and includes unique models from remote corners of the bicycle design space. BIKED also contains user-specified bicycle class labels, which can be used for classification tasks and are useful for visualizing the data. Shown below is a t-distributed Stochastic Neighbor Embedding (t-SNE) plot of the parametric bicycle design space, accompanied by images of models from different regions of the embedding.
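To show what a t-SNE embedding of parametric data involves, here is a compact exact t-SNE in NumPy. It is a teaching sketch, not the code used to produce the plot: the perplexity, learning rate, iteration counts, and early-exaggeration schedule are illustrative defaults, and the two synthetic clusters stand in for BIKED parameter vectors. Exact t-SNE needs O(n²) memory, so for all 4512 models a Barnes-Hut library implementation such as scikit-learn's `TSNE` is the practical choice.

```python
import numpy as np


def _joint_probabilities(X, perplexity, tol=1e-5, max_iter=100):
    """Symmetric affinities P_ij. Each point gets its own Gaussian precision
    (beta), found by bisection so its row entropy equals log(perplexity)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    P = np.zeros((n, n))
    target = np.log(perplexity)
    for i in range(n):
        d = np.delete(D[i], i)
        d = d - d.min()  # shift so the nearest neighbour gets weight exp(0) = 1
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(max_iter):
            p = np.exp(-beta * d)
            s = p.sum()
            entropy = np.log(s) + beta * np.sum(d * p) / s
            if abs(entropy - target) < tol:
                break
            if entropy > target:  # distribution too flat: raise beta
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:  # too peaked: lower beta
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p / s
    return (P + P.T) / (2.0 * n)


def tsne(X, n_components=2, perplexity=30.0, n_iter=500, learning_rate=100.0, seed=0):
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # put parameters on one scale
    P = _joint_probabilities(X, perplexity)
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(X.shape[0], n_components))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        exaggeration = 4.0 if it < 100 else 1.0  # early exaggeration phase
        sq = np.sum(Y ** 2, axis=1)
        # Student-t kernel on low-dimensional distances.
        num = 1.0 / (1.0 + np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0))
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        W = (exaggeration * P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y  # KL(P || Q) gradient
        momentum = 0.5 if it < 250 else 0.8
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    return Y


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (15, 6)), rng.normal(8.0, 1.0, (15, 6))])
Y = tsne(X, perplexity=5.0, n_iter=300)
```

Standardizing the inputs first matters for parametric bicycle data, where millimetre dimensions and 0/1 flags would otherwise dominate the distances unevenly.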

BIKED's rich design data is packed with undiscovered insights about the bicycle design space waiting to be identified.

Machine Learning Applications

BIKED's variety of data enables a broad range of machine learning applications. The rich parametric data supports the development of data-driven surrogate models, which may be used to quantify bicycle performance and enable high-performance multi-objective design optimization. BIKED's segmented component images and localized parametric data open the door to hierarchical generative synthesis methods that combine spatial and parametric data. BIKED's class labels provide opportunities for bicycle classification methods that draw on multiple data types, and for conditional generative synthesis methods designed to generate bicycles of existing classes or of novel hybrid classes.
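A surrogate model in this sense is any cheap learned map from design parameters to a performance quantity. As a minimal sketch, assume a performance value (for example, a simulated frame stiffness) has been computed for some designs; BIKED itself does not supply such a target, so the synthetic linear target below is purely hypothetical. Ridge regression is used only because it is the simplest surrogate that fits in a few lines; the text does not prescribe a model family.

```python
import numpy as np


def fit_ridge_surrogate(X, y, alpha=1e-3):
    """Fit a ridge-regression surrogate y ~ f(X) and return a predict function.

    Features are standardized; the intercept column is left unpenalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = np.hstack([(X - mu) / sigma, np.ones((X.shape[0], 1))])
    penalty = alpha * np.eye(Z.shape[1])
    penalty[-1, -1] = 0.0  # no shrinkage on the intercept
    w = np.linalg.solve(Z.T @ Z + penalty, Z.T @ y)

    def predict(X_new):
        X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
        Z_new = np.hstack([(X_new - mu) / sigma, np.ones((X_new.shape[0], 1))])
        return Z_new @ w

    return predict


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (50, 3))        # stand-in for processed design parameters
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0   # hypothetical performance target
predict = fit_ridge_surrogate(X, y)

# Screening: score many candidate designs cheaply and keep the best one.
candidates = rng.uniform(0.0, 1.0, (1000, 3))
best = candidates[np.argmax(predict(candidates))]
```

The screening step at the end is where a surrogate pays off in optimization: evaluating a thousand candidates costs one matrix product instead of a thousand simulations.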

Interpretability analyses of trained machine learning models can also yield new insights into bicycle design. For example, the SHAP analysis shown above, performed on a deep neural network classifier trained on BIKED's parametric data and class labels, revealed which design parameters are most significant in predicting bicycle class. BIKED can also support studies of customer preferences for bicycles, helping to better model the current bicycle market and consumer demand.
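SHAP attributes a model's output to its input features using Shapley values. The sketch below computes those values exactly by enumerating feature coalitions, with features outside a coalition set to a single baseline vector. This is an assumption-laden simplification, not the `shap` library and not the procedure behind the analysis shown above: SHAP's explainers approximate the same quantity against a background dataset because exact enumeration costs 2^n model calls. The `f` passed in stands for any scalar model output, such as a classifier's probability for one bicycle class.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley attributions of f(x) relative to f(baseline).

    Features outside a coalition take their baseline value. The cost grows
    as 2**n model evaluations, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def linear(z):
    return 2 * z[0] + 3 * z[1] - z[2]


def interaction(z):
    return z[0] * z[1]


phi_linear = shapley_values(linear, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
phi_interaction = shapley_values(interaction, [1.0, 1.0], [0.0, 0.0])
```

For a linear model each attribution is simply coefficient times displacement from the baseline, and for a pure interaction the credit is split evenly; in both cases the attributions sum to f(x) minus f(baseline), which is the additivity property SHAP relies on.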

Bicycle Design Synthesis

Enabling research and development of design synthesis methods is one of the key goals of BIKED. We explore and contrast several methods for full bicycle design synthesis, including parametric generation using Variational Autoencoders (VAEs).
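What distinguishes a VAE from a plain autoencoder is the reparameterized latent sample and the KL term in its loss; generation then amounts to decoding draws from the prior. The NumPy sketch below shows only those pieces for a linear Gaussian encoder and decoder. It is not the architecture from the paper and has no training loop (in practice the loss would be minimized with an autodiff framework); the weight shapes and the helper names `encode`, `vae_loss`, and `sample_designs` are hypothetical.

```python
import numpy as np


def encode(x, enc_W, enc_b):
    """Linear encoder: returns the mean and log-variance of q(z | x)."""
    h = x @ enc_W + enc_b
    k = h.shape[1] // 2
    return h[:, :k], h[:, k:]


def vae_loss(x, enc_W, enc_b, dec_W, dec_b, rng):
    """Negative ELBO (up to constants) averaged over a batch of design vectors."""
    mu, log_var = encode(x, enc_W, enc_b)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps  # reparameterization trick
    x_hat = z @ dec_W + dec_b             # linear decoder back to parameter space
    # Squared error = Gaussian log-likelihood with fixed variance, up to constants.
    recon = np.sum((x - x_hat) ** 2, axis=1)
    # Closed-form KL(q(z | x) || N(0, I)) for a diagonal Gaussian.
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return float(np.mean(recon + kl))


def sample_designs(dec_W, dec_b, n, rng):
    """Generate n new parameter vectors by decoding draws from the prior."""
    z = rng.standard_normal((n, dec_W.shape[0]))
    return z @ dec_W + dec_b


rng = np.random.default_rng(0)
d, k = 4, 2
x = np.tile(np.array([1.0, 2.0, 3.0, 4.0]), (5, 1))
enc_W, enc_b = np.zeros((d, 2 * k)), np.zeros(2 * k)
dec_W, dec_b = np.zeros((k, d)), np.array([1.0, 2.0, 3.0, 4.0])
loss_at_prior = vae_loss(x, enc_W, enc_b, dec_W, dec_b, rng)
loss_shifted = vae_loss(x, enc_W, np.array([1.0, 1.0, 0.0, 0.0]), dec_W, dec_b, rng)
new_designs = sample_designs(dec_W, dec_b, 3, rng)
```

With these toy weights the reconstruction is exact, so the loss isolates the KL term: zero when the posterior equals the prior, and 1.0 when each of the two latent means is shifted to 1.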

Further developing performance-aware bicycle synthesis methods is an area of active research in the DeCoDE lab.



Regenwetter, Lyle, Brent Curry, and Faez Ahmed. “BIKED: A Dataset and Machine Learning Benchmarks for Data-Driven Bicycle Design.” In Proceedings of the International Design Engineering Technical Conferences and Computers and Information in Engineering Conference (IDETC-21), Virtual, Online, 2021.


@inproceedings{regenwetter2021biked,
    title={{BIKED}: A Dataset and Machine Learning Benchmarks for Data-Driven Bicycle Design},
    author={Regenwetter, Lyle and Curry, Brent and Ahmed, Faez},
    booktitle={International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, {IDETC-21}},
    day = {17-20},
    month = {Aug},
    address = {Virtual, Online},
    year = {2021}
}


We would like to thank Professor Daniel Frey for his input and guidance throughout the project. We would like to thank Kris Vu for assisting with sourcing files from the BikeCAD archive and Amin Heyrani Nobari for assisting with the exporting of component images. Finally, we would like to acknowledge MathWorks for supporting this research.