The move to cloud computing is one of the most important technology shifts of our generation. Along with it, the decades-long push to centralize data storage in a single warehouse is coming to an end, as dumping everything into a “data lake” has caused more harm than good.
For some applications, centralizing data via cloud storage solutions such as Amazon S3 and Snowflake works to an extent (read: Snowflake’s IPO). At the same time, several major factors are creating greater data decentralization. Here are three of the biggest.
The major ad platforms have shared less and less detailed customer and performance data over the years. I was on the phone with a Procter & Gamble executive recently, and they complained that even at P&G’s level of spend, Google and Facebook expect the company to “rent” data instead of owning it. So a marketer can’t just build a giant data warehouse like in the good ol’ days, dump everything in it, and run analytics on it. As Google and Facebook have taken over the ad market, marketers are getting only “cohort-level” data, which makes analysis in one central data store impossible (this has interesting effects on marketing spend and prices, but that’s another post). The trend doesn’t look likely to stop anytime soon, since moves like the phase-out of third-party cookies and Apple’s restrictions on IDFA tracking are only consolidating more power in Google and Facebook, and to some extent Apple.
Data scarcity has existed for a long time, particularly in the retail/consumer sector, where retailers sharing as little as possible with the brands they work with has been a persistent source of friction. For example, pharmaceutical companies know almost nothing about where their drugs are prescribed and sold; meanwhile, pharmacies like CVS and Walgreens sell their prescription data to IQVIA, which in turn rents it back to the pharma companies. They literally rent their own data!
It’s gotten worse with e-commerce. Amazon shares almost no customer data with its sellers, so these brands now receive even less data than they used to get from brick-and-mortar stores. That is why traditional retailers and brands have been eager to acquire direct-to-consumer businesses (think Walmart buying Jet.com): those businesses have direct data on end consumers.
As privacy regulations such as Europe’s GDPR and California’s CCPA grow stricter, the rules governing how data moves both within and between companies are coming under more scrutiny. So far, companies have “solved” this by requiring their software vendors to take on the entire monetary risk of violating regulations. My firm has walked away from some investment opportunities in software startups because they were too exposed on this front, and I expect some technology companies will eventually fail because of regulatory exposure.
Eventually, I think this will change how these companies operate. The first change we’ll likely see is a move from SaaS data tools to self-hosted tools running in virtual private clouds (VPCs), something that’s already happening in finance. Every large company is one security breach away from panicking and bringing everything in-house. This regulatory and security pressure has also created demand for a new generation of privacy and data-governance tools, such as OneTrust and BigID.
Let’s say a retailer like Macy’s wants a SaaS vendor to run its application and store its data in a VPC on Azure or Google Cloud. In that case, the SaaS vendor needs to think about data partitioning and about running its software in multiple clouds, and sometimes even in multiple regions of the same cloud. Retailers also frequently don’t want their data in AWS because they all compete with Amazon. So a SaaS company running on AWS needs either to find a way to store that retailer’s data in Microsoft Azure or Google Cloud, or to rewrite its software to run on multiple clouds. Sometimes the data must also reside in a specific region or availability zone of a cloud provider in order to integrate with other data or services located there.
Outside retail, the same pressure is affecting advertising, though to a much lesser degree. At a minimum, when customers are on a given cloud, they expect their vendors to integrate with it. So if a customer runs its databases in Azure, GCP, or AWS, the vendor must support that cloud when it’s time to export data.
Another issue is data location. Say a large company keeps its data in a West Coast region of AWS or Snowflake and uses software from Salesforce or another SaaS provider. When that SaaS provider shares data with another partner or SaaS provider, both companies have to find ways to move or replicate the data from region to region.
These two growing and largely independent movements, toward data decentralization and toward cloud data migration, will continue to spawn useful and profitable data tools, especially with the rise of the data science profession.
Moving forward, companies that provide applications, data transport tools, and data itself will see real demand. Data diversity now matters more than data scale, and no one has a data decoder ring. Beyond the need to build one, the transition to decentralized data will create demand for data management solutions in areas like data survivability, data residency, access control, data masking, encryption, identity management, and much more.
Alex Rosen is a Co-founder and Managing Partner at Ridge Ventures. He has worked in venture capital for more than 20 years, tallying 15 exits exceeding $20 billion in value.
(Source: Venture Beat https://venturebeat.com/2021/03/18/the-great-data-decentralization-is-coming-are-you-ready/ )