Storage Almost Full: Driverless Cars Create Data Crunch

Editor's note: This story is part of the WardsAuto digital archive, which may include content that was first published in print, or in different web layouts.

Driverless cars, as well as the advanced driver-assistance systems preceding their rollout, promise new levels of transportation freedom and safety, but experts warn they also could create a crippling data crunch from the massive amounts of information generated during development and deployment.

“The work that is being done by the larger OEMs is unprecedented in this industry, in terms of the amount of data it is generating,” says Varun Chhabra, senior director-product marketing at Dell EMC. “Some of these figures are just staggering.”

For example, Twitter’s 270 million users produce about 100 GB of data per day. A single autonomous test vehicle produces about 30 TB per day, or roughly 300 times Twitter’s daily output.

Extrapolate those figures across the thousands of autonomous and ADAS prototypes that every major OEM and supplier tests around the world each day, and the data generated by everything from fully adaptive cruise control and lane-keeping to full-blown Level 5 automation could fill traditional R&D computer servers to the gills.
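To make the scale concrete, the short Python sketch below reworks the figures quoted above. It assumes decimal units (1 TB = 1,000 GB), and the 1,000-car fleet is a hypothetical round number chosen for illustration, not a figure reported by any automaker.

```python
# Back-of-the-envelope figures from the paragraphs above (decimal units assumed).
GB_PER_TB = 1_000
TB_PER_PB = 1_000

twitter_gb_per_day = 100               # all ~270 million Twitter users combined
test_car_gb_per_day = 30 * GB_PER_TB   # one autonomous test vehicle (30 TB)

# How many "Twitters" a single test car is worth each day.
ratio = test_car_gb_per_day / twitter_gb_per_day

# Hypothetical fleet size for illustration only; the article gives no exact count.
fleet_size = 1_000
fleet_tb_per_day = fleet_size * test_car_gb_per_day / GB_PER_TB
fleet_pb_per_day = fleet_tb_per_day / TB_PER_PB

print(f"One test car: about {ratio:,.0f}x Twitter's daily output")
print(f"{fleet_size:,} test cars: about {fleet_pb_per_day:,.0f} PB per day")
```

Under these assumptions, one test car works out to about 300 times Twitter's daily volume, and a 1,000-car fleet to about 30 PB per day.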

But the industry faces more than the sheer volume of data. There also is the rapidly expanding task of tagging it: creating metadata for use in soon-to-be-ubiquitous machine learning and artificial intelligence, and making the data searchable for future R&D work.

It also is unclear whether the data ever can be thrown away. When a personal computer or device reaches its storage limit, it is easy enough to free up space by deleting files. But in the new frontier of autonomy and ADAS, data may have to be stored perpetually to protect automakers and suppliers from potential legal action if the technology fails in the field.

Automated vehicles also are expected to stay on the road longer than today’s piloted cars and trucks, which could extend maintenance schedules and put additional onus on keeping data indefinitely.

“It’s not a question of being able to store this data in a vault somewhere you will never use. This data is actively used by these companies as they develop these technologies,” says Chhabra, whose Hopkinton, MA-based unit of computer-technology giant Dell provides data storage solutions across a range of industries.

Humans also are expected to interact less with AVs than with piloted vehicles, so manufacturers may have to protect themselves against potential litigation if one of the cars crashes, Chhabra says.

“They cannot throw this data away,” he adds. “It has to be stored cost effectively, it’s got to be scaled and people have to be able to run analytics on it so it has to be immediately accessible, metaphorically, at the snap of a finger.”

As with many technological developments over the years, data-storage requirements vary from company to company, leading to different opinions over how deep an automaker’s or supplier’s data library should be.

Glen De Vos, chief technical officer at Aptiv, the former electronics unit of Delphi that aims to be the go-to supplier of autonomous-vehicle technology, says the supplier stores about 2 TB per day. He says Aptiv is unique in that it has learned over the years what it needs to keep and what it can discard, but he also says its data-storage demands today exceed the amount of data Delphi generated in more than a century.

“What you need to keep is the important data, and sorting that out is different for every company,” he tells WardsAuto.

Broadly speaking, De Vos says the industry as a whole has taken a “brute force” approach by saving everything, just in case.

“If something breaks, you fix it, but then the fix affects something else, so you need all that data to fix it again,” he says.

Tumbling data-storage costs make that possible, he adds. Before Aptiv split off from powertrain-solutions provider Delphi Technologies, the company bought data specialist Control-Tec in 2015 to enhance its software capabilities.

Control-Tec is widely considered the industry leader in capturing, transferring and analyzing vehicle data. It uses edge computing, which analyzes data near its source to reduce communication with the central data center, to extract valuable information that could help resolve anomalies quickly, such as a conflict between the vehicle’s central computing system and its mapping system.
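The edge-computing idea can be illustrated with a minimal, hypothetical Python sketch: the vehicle compares the lane position its central computer perceives with the one its map predicts, and queues for upload only the frames where the two disagree. The `Frame` fields, the 0.5-meter tolerance and the `edge_filter` function are illustrative assumptions, not Control-Tec's actual design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    timestamp_s: float
    perceived_lane_offset_m: float  # hypothetical: lane position from the central computer
    map_lane_offset_m: float        # hypothetical: lane position expected from the map


def edge_filter(frames: List[Frame], tolerance_m: float = 0.5) -> List[Frame]:
    """Keep only frames where perception and map disagree by more than tolerance_m.

    Intended to run on the vehicle, so routine frames never leave it and only
    anomalies are sent back to the central data center.
    """
    return [
        f for f in frames
        if abs(f.perceived_lane_offset_m - f.map_lane_offset_m) > tolerance_m
    ]


frames = [
    Frame(0.0, 0.10, 0.12),  # agreement: stays on the vehicle
    Frame(0.1, 0.11, 0.13),  # agreement: stays on the vehicle
    Frame(0.2, 1.40, 0.15),  # conflict between perception and map: flagged
]
to_upload = edge_filter(frames)
print(f"Uploading {len(to_upload)} of {len(frames)} frames")
```

Here only the third frame crosses the tolerance, so one frame out of three would travel to the data center.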

Woongjung Jang, director-advanced driver-assist systems for Hyundai, says the Korean automaker’s AVs generate about 10 GB of data every second, but the company stores only what he calls “essential” data, some of it kept on the vehicle and off-loaded to a server later and some uploaded to the cloud.
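A quick calculation, assuming decimal units, shows why keeping everything at that rate would be impractical. The eight-hour test day is a hypothetical figure used only for illustration.

```python
GB_PER_TB = 1_000
SECONDS_PER_HOUR = 3_600

generated_gb_per_s = 10   # Hyundai figure quoted above
test_day_hours = 8        # hypothetical length of one test shift

per_hour_tb = generated_gb_per_s * SECONDS_PER_HOUR / GB_PER_TB
per_test_day_tb = per_hour_tb * test_day_hours

print(f"{per_hour_tb:.0f} TB per hour, {per_test_day_tb:.0f} TB per test day")
```

At 10 GB per second, one car would produce about 36 TB every hour, or roughly 288 TB over the assumed eight-hour day.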

Toyota Research Institute CEO Gill Pratt does not see a need to preserve every byte of data, either, because he says much of what gets documented is superfluous.

“For development reasons we’re going to want to record, store and share, but not all, because most driving is quite boring,” he says. “I think the government is putting in a window of time around certain types of events, and for the most part you don’t have to store the other stuff.”
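The event-window approach Pratt describes can be sketched as a rolling buffer: the car holds only the most recent few samples in memory and writes data to storage only when something notable happens. This Python sketch is hypothetical; the window sizes, the `EventWindowRecorder` class and what counts as an event are assumptions, since the article does not specify any regulatory window.

```python
from collections import deque


class EventWindowRecorder:
    """Rolling buffer that persists only a window of samples around flagged events.

    pre_samples:  how many samples before an event to keep
    post_samples: how many samples after the most recent event to keep
    """

    def __init__(self, pre_samples, post_samples):
        self.history = deque(maxlen=pre_samples)  # oldest samples drop off automatically
        self.post_samples = post_samples
        self.remaining = 0     # post-event samples still to capture
        self.clip = None       # clip currently being captured, if any
        self.saved_clips = []  # what would actually be written to storage

    def add(self, sample, is_event=False):
        if self.clip is None and is_event:
            # Start a new clip: the pre-event history plus the triggering sample.
            self.clip = list(self.history) + [sample]
            self.remaining = self.post_samples
        elif self.clip is not None:
            self.clip.append(sample)
            # A new event while capturing restarts the post-event countdown.
            self.remaining = self.post_samples if is_event else self.remaining - 1
        if self.clip is not None and self.remaining == 0:
            self.saved_clips.append(self.clip)
            self.clip = None
        self.history.append(sample)


rec = EventWindowRecorder(pre_samples=2, post_samples=2)
for t in range(10):
    rec.add(t, is_event=(t == 5))  # one notable event at sample 5
print(rec.saved_clips)
```

With a two-sample window on each side and a single event at sample 5, only samples 3 through 7 are saved; the "boring" driving before and after is discarded.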

For the self-driving cars it intends to put into ride-hailing service in the U.S. in 2019, General Motors says it has IT teams in multiple states building computer systems to store and process the data it retrieves from the vehicles. The Detroit automaker says it keeps the data to evaluate design and driving performance during development and deployment, and to drive continuous improvement in future generations of self-driving vehicles.

GM’s two main data warehouses were built in southeast Michigan three years ago at a combined cost of $288 million. In addition to handling product-development computing demands, they serve the automaker’s manufacturing, marketing and sales groups.

Data, however, also is seen as a potentially huge new revenue stream for automakers and suppliers. Data ownership remains an open question (does it belong to the car’s owner, the manufacturer or the software developer?), but the industry has placed a premium on monetizing whatever data it can get its hands on.

According to global industry consultant Boston Consulting Group, automotive data monetization including connectivity will grow to a $28 billion business by 2035 from $1 billion today.

Data from autonomous vehicles in ride-hailing fleets held by manufacturers, for example, presumably would be a prized commodity for mobility-service operators such as Uber to learn about transportation consumption habits. Insurance companies and retail advertisers also would covet the information, experts speculate.

De Vos says data monetization is a key element of Aptiv’s business going forward. For example, AV software it integrates into an OEM customer’s vehicle would need updates down the road. Delphi’s acquisition of Boston-based nuTonomy last year for $450 million bolstered the supplier’s expertise in fleet management, where Aptiv potentially could control data worth millions of dollars gathered from AVs delivering everything from packages to pizza.

“We have to be part of that broader ecosystem, instead of just selling components,” De Vos says.

Tapping the new revenue stream means OEMs and suppliers must build expertise in data management, Chhabra says, whether in-house as GM has done, through a strategic acquisition as Aptiv has demonstrated, or by contracting a third party.

“There are two kinds of auto manufacturers in this situation: One is a company that is perhaps at an early stage of this transformation and believes they can do it themselves, so they’ll go and get open-source storage (and) management platforms and then try to hire developers,” Chhabra says.

“But the customers that we see a lot of traction with are those that have gone through that curve and (now) are focused on the higher level, value-add, which is how do I get the best developers to figure out my best machine-learning algorithms, people who can work with my automotive engineers and really translate good software into a good experience for an end customer versus how do I build a storage platform that is scaled,” he says.

Chhabra also doubts it is in the best interest of any automotive company to push its data entirely into cloud storage, given its R&D value and potential revenue opportunity.

“Data is really the lifeblood of your business going forward,” he says. “There is a tendency to want to keep this data in-house. It is the crown jewels.”

– with Christie Schweinsberg in Los Angeles
