Cloudian CEO On The Importance Of S3 Data Lake For AI, ML

S3-compatible Storage Growth To Meet AI/ML Requirements

Cloudian Tuesday closed a $23 million round of growth financing from Morgan Stanley Expansion Capital, bringing total funding in the San Mateo, Calif.-based developer of S3-compatible object storage technology.

For Cloudian, the new funding is not critical as the company has already passed the break-even point and indeed saw its annual recurring revenue grow by 30 percent year-over-year, said CEO and Co-founder Michael Tso.

Instead, the new infusion of funds will target growth for the company’s storage technology, which is seeing strong pull as customers look for ways to store and manage data for artificial intelligence and machine learning uses, Tso told CRN.

“Cloudian started as and still is the most [AWS] S3-compatible object storage that’s out there,” he said. “We are committed to being completely compatible with Amazon and guaranteeing that anything that runs in the cloud with Amazon is going to run for you on-prem with Cloudian. That guarantee is very, very important now, because that’s basically allowing you to future-proof your AI and ML journey. New tools are coming up all the time that are cloud-native. They’re all born in the cloud. So they all speak S3 as their native language.”

Object storage and S3-compatible object storage have transformed from a slow backup and archive technology to a key foundation for building high-performance artificial intelligence infrastructures, Tso said.

“People think about object storage as slow, as secondary storage,” he said. “I can tell you that the fastest growing workloads on our systems are actually modern applications including AI and ML workloads. And for those workloads, people typically deploy Cloudian on all-flash gear. That is growing very strongly for us. We have customers with close to 100 petabytes each on Cloudian and running on all-flash across multiple data centers. And that is their primary data store. We provide tier-zero storage for these guys.”

How do you define Cloudian?

Cloudian is the leading provider of on-prem data lake solutions for AI and ML (machine learning). We provide software that runs on standard hardware that basically creates exabyte-scale data lakes on-prem for people who have latency and data sovereignty needs.

What are customers using those data lakes for?

There’s a huge number of different use cases. One of our clients has 3,000 internal applications running on top of data that they have with Cloudian. It runs the gamut from traditional back-end backup and archive and ransomware protection to things like file sharing and research data, and all the way to AI and ML. What we have seen in the past year or so is that AI is making everybody question their data strategy and their cloud versus on-prem footprint. People are much more sensitive in terms of where they store their data. So essentially, what we’re seeing is that enterprises are realizing that data is very fundamental to their AI journey. And at Cloudian, the data lake always sits at the foundation of everything else.

You seem to say Cloudian already uses AI and machine learning as part of its technology. How does the company use AI?

We have been very active. We of course use AI for our internal operations and for our products. But more importantly, we announced connectivity to [Meta’s] PyTorch earlier this year. We already worked with TensorFlow. The piece which is really important is that Cloudian started as and still is the most [AWS] S3-compatible object storage that’s out there. We are committed to being completely compatible with Amazon and guaranteeing that anything that runs in the cloud with Amazon is going to run for you on-prem with Cloudian. That guarantee is very, very important now, because that’s basically allowing you to future-proof your AI and ML journey. New tools are coming up all the time that are cloud-native. They’re all born in the cloud. So they all speak S3 as their native language. We have a full commitment to stay alongside Amazon (we actually meet with them every quarter, and we align our roadmap), so we are by far the most compatible out there.

We are compatible with everything that’s out there, all the machine learning and AI pipeline tools including [Apache] Kafka, Spark, and Neutrino. All those different tool sets and data analytics tools that prepare data for deep learning support S3. So if your data is in a Cloudian S3 data lake, you never have to move that data. And that is really important because we all know that moving large amounts of data is very costly. And also for AI and ML, data doesn’t cache well because you’re scanning over the database only once, so hierarchical storage doesn’t really work that well. So you want to have a data lake where you have everything that integrates well with your entire pipeline and all the workflow you have going on.
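
For readers wondering what "speaking S3" looks like in practice, the following is a minimal sketch, not Cloudian's own code. It assumes the widely used boto3 client library and shows the general pattern for any S3-compatible store: the application code stays the same, and only the endpoint address changes. The endpoint URL, bucket name, and the `S3_ACCESS_KEY` / `S3_SECRET_KEY` environment variables are hypothetical placeholders.

```python
import os


def s3_client_kwargs(endpoint_url=None):
    """Build keyword arguments for boto3.client("s3", ...).

    With endpoint_url=None the client talks to AWS; with a URL it talks to
    whatever S3-compatible service answers at that address.
    """
    kwargs = {
        # Hypothetical variable names; real deployments supply their own credentials.
        "aws_access_key_id": os.environ.get("S3_ACCESS_KEY", "example-key"),
        "aws_secret_access_key": os.environ.get("S3_SECRET_KEY", "example-secret"),
    }
    if endpoint_url is not None:
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


def list_keys(bucket, prefix="", endpoint_url=None):
    """List object keys under a prefix; the same code serves cloud and on-prem."""
    import boto3  # imported lazily so the helper above works without boto3 installed

    s3 = boto3.client("s3", **s3_client_kwargs(endpoint_url))
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

Under these assumptions, a call such as `list_keys("training-data", endpoint_url="https://s3.onprem.example.internal")` would read from an on-prem store, while omitting `endpoint_url` would read from AWS. The rest of the pipeline would not need to change.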
