Photo: Interior of automobile

There is no doubt about it: Schema.org is a big success. It has motivated hundreds of thousands of Web site owners to add structured data markup to their HTML templates and has brought the idea of exchanging structured data over the WWW from labs and prototypes into real business.

Unfortunately, support for information about the sale and rental of vehicles, namely cars, motorbikes, trucks, boats, and bikes, has been insufficient for quite a while. Apart from two simple classes, http://schema.org/Vehicle and http://schema.org/Car, with no additional properties, there was nothing in the vocabulary that would help mark up granular vehicle information on new or used car listing sites or in car rental offers.

Recently, Mirek Sopek, Karol Szczepański and I released a fully-fledged extension proposal for schema.org that fixes this shortcoming and paves the way for much better automotive Web sites when it comes to marketing with structured data.

This proposal builds on the existing vehicle-related extensions for GoodRelations, the e-commerce model of schema.org.

It adds the core classes, properties and enumerated values for describing cars, trucks, buses, bikes, and boats and their features. For describing the commercial aspects of related offers, http://schema.org/Offer already provides the necessary level of detail. Thus, our proposal does not add new elements for commercial features.
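As a rough illustration of this split between the vehicle and the offer, the following sketch builds a JSON-LD description in Python. JSON-LD is only one possible syntax and is an assumption here. The listing values (name, price, currency) are made up, and the vehicle property names are taken from the property list later in this article.

```python
import json

# The vehicle itself: only physical, objective characteristics.
# Property names follow the list in this article; values are hypothetical.
car = {
    "@type": "Car",
    "name": "Example hatchback",  # hypothetical listing title
    "doors": 5,
    "fuelType": "Diesel",
}

# The commercial side lives on the Offer, not on the vehicle.
offer = {
    "@context": "http://schema.org",
    "@type": "Offer",
    "price": "8900.00",        # hypothetical price
    "priceCurrency": "EUR",
    "itemOffered": car,        # link from the offer to the vehicle
}

print(json.dumps(offer, indent=2))
```

The point of the sketch is that price and currency never appear on the Car node, so the same vehicle description could be reused for a sale offer or a rental offer.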

Design Principles

We defined the following design principles for the extension:

  1. Reuse commercial properties from http://schema.org/Offer and the underlying GoodRelations conceptual model. Don’t mix modeling vehicles with modeling offers to buy, rent, or service vehicles.
  2. Standardize objective vehicle characteristics and defer consensus for diverse and vendor-specific vehicle characteristics: Vehicles, and in particular cars, can have a hundred features and more. Some are very objective (e.g. fuel consumption under lab conditions, dimensions); some are best handled as text (e.g. interior colors), and some are very vendor-specific, e.g. engine types, safety features, etc.
  3. Balance between mark-up effort and usefulness of data: The data should be as useful as possible for search engines and other consumers, but the effort for Web developers must also be kept to a minimum.

These design principles are implemented as follows:

  1. Quantitative properties that do not require a unit of measurement can be modeled as either a plain numeric literal (http://schema.org/Number) or a http://schema.org/QuantitativeValue. Typical cases are the number of seats or doors. For a single value, a literal is sufficient. For a range, a http://schema.org/QuantitativeValue is more appropriate.
  2. Qualitative properties, like fuel types or body styles, can be modeled using a Freebase URI, a site-specific URI, or plain text. If a site is able to provide the URI of an authoritative definition for a value or characteristic, this is more useful, but if the site can provide only a string, this is better than nothing.
  3. Limit the number of standardized properties and use the complementing property-values proposal for schema.org for vendor-specific vehicle features.
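The first two rules above can be sketched with two small helper functions. This is a minimal sketch, not part of the proposal: the helper names `quantity` and `qualitative` are hypothetical, JSON-LD is an assumed syntax, and the example.com URI is a placeholder for a site-specific definition.

```python
import json

def quantity(low, high=None):
    """Return a plain number for a single value, or a
    QuantitativeValue with minValue/maxValue for a range."""
    if high is None or high == low:
        return low
    return {"@type": "QuantitativeValue", "minValue": low, "maxValue": high}

def qualitative(text, uri=None):
    """Prefer a URI reference when one is known; otherwise fall back to text."""
    if uri:
        return {"@id": uri}
    return text

car = {
    "@context": "http://schema.org",
    "@type": "Car",
    "doors": quantity(5),               # single value: plain literal
    "seatingCapacity": quantity(5, 7),  # range: QuantitativeValue
    "fuelType": qualitative("Diesel"),  # only a string is available
    "bodyStyle": qualitative(
        "Station wagon",
        uri="http://example.com/body-styles/station-wagon",  # placeholder URI
    ),
}

print(json.dumps(car, indent=2))
```

A site that cannot supply URIs can call `qualitative` with text only and still produce usable markup, which matches the "better than nothing" stance of the second rule.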

Overview of Changes

The proposal is pretty straightforward: It updates the definitions for Vehicle and Car and adds a total of eight classes for types of vehicles:

 

Vehicle

  • MotorizedRoadVehicle
    • Car
    • Truck
    • Van
    • MotorizedBicycle
    • BusOrCoach
    • Motorcycle
  • Bike
  • Watercraft
    • Boat
    • MotorBoat
    • SailingBoat

A few other classes are added for grouping various property values. While the original intention was support for the automotive industry, basic support for bikes and boats has already been included, too. Support for aircraft information could be added with moderate effort but is left out for the moment, because we think this should be a separate extension proposal based on additional domain expertise.

On the property side, the proposal takes a twofold approach: Properties that are largely standardized across sites and brands are defined as dedicated properties in the vocabulary:

  • acceleration
  • ACRISSCode
  • airbags
  • axles
  • bodyStyle
  • cargoVolume
  • colorInterior
  • damages
  • doors
  • driveWheelConfiguration
  • emissionsCO2
  • engineDisplacement
  • engineName
  • enginePower
  • engineType
  • firstRegistration
  • fuelCapacity
  • fuelConsumption
  • fuelEfficiency
  • fuelType
  • gears
  • interiorType
  • meetsEmissionStandard
  • mileageFromOdometer
  • modelDate
  • numberOfOwners
  • payload
  • productionDate
  • roofLoad
  • seatingCapacity
  • specialUsage
  • speed
  • steeringPosition
  • tongueWeight
  • torque
  • trailerWeight
  • transmission
  • VIN
  • weightTotal
  • wheelbase

For all other vehicle features, the proposal recommends using the complementing property-values proposal for schema.org.
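To give an idea of how the two approaches could sit side by side, here is a small sketch. It assumes the property-value mechanism takes the shape that schema.org later adopted, namely a `PropertyValue` node attached via `additionalProperty`; the proposal text at the time of writing may differ in detail. The helper name `vendor_feature` and all feature names and values are hypothetical.

```python
import json

def vendor_feature(name, value):
    """Wrap a vendor-specific feature as a generic name/value pair.
    Assumes the PropertyValue shape later adopted by schema.org."""
    return {"@type": "PropertyValue", "name": name, "value": value}

car = {
    "@context": "http://schema.org",
    "@type": "Car",
    # Dedicated properties from the list above
    "transmission": "Automatic",
    "gears": 7,
    # Everything vendor-specific goes through the generic mechanism
    # (assumed property name: additionalProperty)
    "additionalProperty": [
        vendor_feature("Parking assistant", True),           # hypothetical
        vendor_feature("Sound system", "ExampleSound 500"),  # hypothetical
    ],
}

print(json.dumps(car, indent=2))
```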

Relationship to Configuration Information for Vehicles

Cars and other vehicles are often highly configurable products, for which the number of possible combinations can be as many as 10^20 for a single vendor. Many car manufacturers offer cars in a built-to-order fashion, i.e. they market option spaces of possible cars to customers. The space of actually available cars is a subset of the theoretically possible combinations, because…

  1. technical constraints (a configuration would not work well or not at all),
  2. legal constraints (a configuration would not meet regulatory requirements for a target market),
  3. production constraints (a configuration will be logistically difficult or expensive to build), and
  4. marketing considerations (the manufacturer does not want to offer a certain configuration)

…rule out certain configurations.
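A toy example may make the difference between the theoretical and the available space more concrete. The option names and the two constraints below are entirely hypothetical and far smaller than any real catalog; the sketch only shows that the theoretical space is the product of the option counts and that constraints carve a subset out of it.

```python
from itertools import product
from math import prod

# Hypothetical option space: three features with a few choices each.
options = {
    "engine": ["petrol", "diesel", "electric"],
    "transmission": ["manual", "automatic"],
    "trailer_hitch": [False, True],
}

# Theoretical space: product of the number of choices per feature.
theoretical = prod(len(choices) for choices in options.values())

def allowed(cfg):
    """Apply two made-up constraints to a single configuration."""
    # Hypothetical technical constraint: no manual gearbox with electric engines.
    if cfg["engine"] == "electric" and cfg["transmission"] == "manual":
        return False
    # Hypothetical marketing decision: no trailer hitch on electric models.
    if cfg["engine"] == "electric" and cfg["trailer_hitch"]:
        return False
    return True

# Enumerate every combination and keep only the permitted ones.
all_configs = (dict(zip(options, combo)) for combo in product(*options.values()))
available = [cfg for cfg in all_configs if allowed(cfg)]

print(theoretical, len(available))
```

With real catalogs, enumerating the space like this quickly becomes infeasible, which is one reason why configuration rules are a harder modeling problem than fully specified vehicles.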

This proposal focuses on modeling fully-specified cars, like actual new or used cars, or enumerated sets of car configurations.

The modeling of configuration rules (e.g. which alternative options are available and how they can be combined) is outside the scope of this proposal. We plan a second proposal for the non-trivial problem of vehicle configuration and vehicle range information, which will complement this proposal.

The reasons for this staged approach are as follows:

  1. Configurable products are not yet supported by schema.org and the underlying GoodRelations product model. An extension for configurable vehicles should include a generic extension for configurable products, which requires additional time to develop.
  2. The number of Web sites that publish information about actual cars is orders of magnitude larger than the number of sites that publish or are able to publish configuration rules. Every dealer listing used and new car inventory and every car listing site will benefit from support for the proposed extension. Configuration rules will mainly be relevant for manufacturer sites.
  3. The proposal is based on the existing product model of schema.org. A future extension for configurable products must be designed in a way that is compatible with the existing product model anyway, so there is no risk in starting with actual vehicles.

Business Impact

We are convinced that the proposal will allow tens of thousands of vehicle-related sites to expose more granular data about their products and offers, and will allow search engines and other clients to implement new and better services.

Resources

Git repository

https://github.com/mfhepp/sdo-vehicles

W3C Wiki

https://www.w3.org/wiki/WebSchemas/Vehicles

Development version of schema.org including the proposal

http://sdo-vehicles.appspot.com/

See e.g.

http://sdo-vehicles.appspot.com/Car

http://sdo-vehicles.appspot.com/Van

http://sdo-vehicles.appspot.com/Motorcycle

For a quick overview of the proposed changes, see

https://www.w3.org/wiki/WebSchemas/Vehicles#New_Elements

About the Author

Martin Hepp is a professor of E-business and General Management at the Universität der Bundeswehr Munich and the CEO and Chief Scientist of Hepp Research GmbH. His key research interest is in shared data structures at Web scale. As part of his work, he has authored more than 60 peer-reviewed publications and developed the GoodRelations vocabulary for e-commerce, widely used by companies like Google, Yahoo, BestBuy, Kmart, Volkswagen, Renault, and tens of thousands of smaller businesses.

For more information, see http://www.heppnetz.de. Martin is on Twitter at @mfhepp.

Image: courtesy flickr / artvlive