From Concept to Manufacturing: Evaluating Vision-Language Models for Engineering Design

Cyril Picard1, 3, Kristen M. Edwards1, 3, Anna C. Doris1, Brandon Man, Giorgio Giannone1, 2, Md Ferdous Alam1, Faez Ahmed1

1MIT  2DTU  3Equal contribution 

The Basics

The evaluation dataset will be released in the coming days. Below you can find the most up-to-date information on this project:


Engineering Design is undergoing a transformative shift with the advent of AI, marking a new era in how we approach product, system, and service planning. Large language models have demonstrated impressive capabilities in enabling this shift. Yet, with text as their only input modality, they cannot leverage the large body of visual artifacts that engineers have used for centuries and are accustomed to. This gap is addressed with the release of multimodal vision language models, such as GPT-4V, enabling AI to impact many more types of tasks. In light of these advancements, this paper presents a comprehensive evaluation of GPT-4V, a vision language model, across a wide spectrum of engineering design tasks, categorized into four main areas: Conceptual Design, System-Level and Detailed Design, Manufacturing and Inspection, and Engineering Education Tasks. Our study assesses GPT-4V's capabilities in design tasks such as sketch similarity analysis, concept selection using Pugh Charts, material selection, engineering drawing analysis, CAD generation, topology optimization, design for additive and subtractive manufacturing, spatial reasoning challenges, and textbook problems. Through this structured evaluation, we not only explore GPT-4V's proficiency in handling complex design and manufacturing challenges but also identify its limitations in complex engineering design applications. Our research establishes a foundation for future assessments of vision language models, emphasizing their immense potential for innovating and enhancing the engineering design and manufacturing landscape. It also contributes a set of benchmark testing datasets, with more than 1000 queries, for ongoing advancements and applications in this field.


Below is a list of the available datasets. Please refer to the original paper for details on how to use them.

  • Conceptual Design: The datasets for conceptual design use the milk frother sketch datasets published here by Scarlett Miller and team:
    1. Description matching
    2. Design similarity (matching triplets)
  • Design for Additive Manufacturing
  • Design for Subtractive Manufacturing: We provide generated images from 3D models along with the features from MFCAD.
  • Spatial Reasoning:
    1. MechE Rotation test: images and prompts adapted from the original test.
    2. Packing test: images and prompts adapted from the original test.
  • Concrete Crack Detection: We provide the prompt. Researchers need to download the crack images and annotations from the original dataset by Mundt et al.


The authors acknowledge the Swiss National Science Foundation and the National Science Foundation Graduate Research Fellowship for their financial support.