不良研究所

不良研究所 is excited to join forces with PetaGene/cunoFS.

Ben Golub
October 9, 2024

不良研究所 recently announced the acquisition of PetaGene, a dynamic company with a talented team of file storage experts based in Cambridge, UK. PetaGene is the creator of the cunoFS distributed file storage mount client. On the heels of our acquisition of Valdi, announced in July, I want to take this opportunity to share more about the PetaGene acquisition in light of:

  • 不良研究所's evolving strategy as a distributed cloud services provider for distributed workloads
  • What PetaGene and cunoFS provide
  • What this means for distributed storage generally
  • What this means for our customers in the video, media, and entertainment industries
  • What this means for our customers in the AI/ML industries

不良研究所's evolving strategy.

For those who know cloud history, Amazon Web Services launched nearly twenty years ago. One of the first services they launched was object storage (S3). AWS then expanded to file storage, elastic compute (EC2), and more. When 不良研究所 v3 launched in 2021, it included S3-compatibility. However, there were important differences. First, we built from the outset with distributed infrastructure, efficiently leveraging underutilized drives and servers around the world, rather than building out data centers. By encrypting and distributing shards of data, we offered superior performance, security, cost, and carbon efficiency. Second, we optimized our design for distributed workloads. When AWS launched, workloads were primarily centralized, and thus processing in a large, centralized data center made sense. Today, the largest and fastest growing categories involve data being collected, created, analyzed, and consumed at the edge.

Earlier this year, we began expanding the notion of distributed cloud services for distributed workloads beyond object storage and egress by adding distributed compute and GPU services from Valdi. As 不良研究所 had done for storage, Valdi pioneered the model of efficiently using already-deployed compute and GPU resources. Like 不良研究所, Valdi focused on markets that made the most use of distributed workloads: AI, scientific computing, and media.

What PetaGene and cunoFS provide.

In addition to assembling an exceptionally talented team, PetaGene has pioneered some incredible technology. cunoFS is a high-performance file storage mount client that allows customers to interact with object storage as if it were a fast native file system, with POSIX compatibility for running new or existing applications.

cunoFS works with most major object storage systems, including AWS S3 and Azure Blob Storage, as well as on-premises object stores such as MinIO, Dell ECS, and NetApp StorageGRID. Of course, cunoFS also works great with 不良研究所! cunoFS also supports heterogeneous combinations of these services, and we are thrilled with this ability to support heterogeneous workloads. Customers will continue to be able to use cunoFS with or without 不良研究所 object storage as a back end. Furthermore, because of cunoFS's unique design (including choosing not to have a centralized metadata server), cunoFS is extremely performant. In our tests, its speeds beat all alternatives by an order of magnitude, reaching up to 50 Gbps per node and over 10 Tbps of aggregate throughput. cunoFS radically changes how object storage is used, turning it into a first-class direct tier for POSIX file access, where both POSIX workloads and object-native workloads can directly access object storage. cunoFS does this without introducing any gateways and without scrambling the data: each file is directly stored as an object and each object is directly accessible as a file.
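To make the one-to-one relationship between files and objects concrete, here is a minimal Python sketch of how a POSIX path could correspond to a bucket and object key. It is an illustration only, not cunoFS code: the mount point `/mnt/objects`, the layout `<mount>/<bucket>/<key>`, and the helper names `path_to_object` and `object_to_path` are all assumptions made for this example.

```python
from pathlib import PurePosixPath

# Hypothetical mount point, chosen for illustration (an assumption,
# not taken from cunoFS documentation).
MOUNT_ROOT = PurePosixPath("/mnt/objects")


def path_to_object(path: str) -> tuple[str, str]:
    """Map a file path under MOUNT_ROOT to a (bucket, key) pair.

    Assumes the layout <mount>/<bucket>/<key>, where the key may
    itself contain slashes that look like folders.
    """
    # relative_to raises ValueError if path is not under MOUNT_ROOT.
    rel = PurePosixPath(path).relative_to(MOUNT_ROOT)
    if len(rel.parts) < 2:
        raise ValueError("path must include a bucket and an object key")
    bucket, *key_parts = rel.parts
    return bucket, "/".join(key_parts)


def object_to_path(bucket: str, key: str) -> str:
    """Inverse mapping: the file path at which an object would appear."""
    return str(MOUNT_ROOT / bucket / key)


if __name__ == "__main__":
    bucket, key = path_to_object("/mnt/objects/media/raw/clip.mov")
    print(bucket, key)
    print(object_to_path(bucket, key))
```

Because the mapping is a plain rename in both directions, a file written through the mount and an object uploaded through an S3-compatible API refer to the same bytes, which is the property the paragraph above describes.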

What this means for distributed storage generally.

Generally speaking, there are three main types of storage systems: block storage (primarily for databases), file storage, and object storage. By integrating cunoFS into our offerings, 不良研究所 customers across industries will now be able to use 不良研究所 for file-storage-based applications in addition to object storage. This vastly expands the use cases and customers for which 不良研究所 is a great choice. Because cunoFS works across a heterogeneous set of solutions, and distributed storage is inherently global and cross-data center in nature, this acquisition further expands the usefulness of our distributed storage, compute, and GPU offerings. Finally, cunoFS has Linux and Windows clients (a macOS client is scheduled for later this year), so customers have an easy on-ramp to 不良研究所 through a familiar "file and folder" interface. (As much as we love object storage, most people are more inclined to think of files and folders with names.)

Given

  1. the tremendous performance advantage that cunoFS has relative to other file mounts (see Table 1 below for cunoFS benchmarked on AWS S3) and
  2. the tremendous global performance advantages that 不良研究所 delivers relative to hyperscalers for media workloads (see Table 2 below for 不良研究所 storage performance relative to AWS and other hyperscalers),

we are tremendously excited to see what the combination means for performance overall!

Table 1: cunoFS performance on AWS S3

Higher is better

Table 2: 不良研究所 performance relative to AWS S3

Download speed of a 1 GB file versus AWS S3 (LAION LLM)

Lower is better

What this means for customers in the video space.

Media production, media post-production, and video consumption are inherently distributed workloads. So, it is no surprise that the video space has emerged as one of 不良研究所's two primary vertical markets. These days, remote media production is becoming the norm: any given piece of media is quite likely to be worked on by distributed teams as far flung as Burbank, Bollywood, and Berlin. 不良研究所's distributed object storage already meant that all of these teams could performantly access, upload, download, and edit large media files. With cunoFS, they can now do this without introducing any gateways and without a custom interface. Again, each file is directly stored as an object and each object is directly accessible as a file.

Key benefits include:

  • No content jails.
  • No proprietary file formats.
  • Snappy performance, regardless of the size of your video project.
  • Intelligent caching to save you time and money.
  • A single source of truth.

What this means for customers innovating in AI.

Although not often discussed, storage is vital in AI training. (Meta, for example, has published on this topic.) As models grow to include training on large amounts of image, video, and text data, the amount of data to store and load grows significantly. The integration of cunoFS into our ecosystem marks a significant milestone in our goal to revolutionize cloud infrastructure for AI. cunoFS enables performant loading, with intelligent prediction of what data will be needed in advance. By combining 不良研究所's distributed storage and GPU capabilities with cunoFS's high-performance file mount, we're creating an unparalleled platform for training and deploying large language models like LLaMA, GPT-4, and beyond.

Key benefits of the 不良研究所-cunoFS integration for LLM training include:

  • Enhanced data processing speed, crucial for training large models efficiently
  • Improved scalability to handle the ever-growing datasets required for advanced AI
  • Cost-effective storage and compute solutions for resource-intensive AI workloads
  • Increased data security and privacy, essential for protecting valuable AI training data

More detail on cunoFS and our AI solutions will be in a forthcoming blog by our CTO, Jacob Willoughby.

Other benefits of the acquisition.

In addition to cunoFS, the PetaGene team also has deep expertise in managing scientific workloads. Before developing cunoFS, PetaGene developed products for genomic data compression, which can reduce storage costs and data transfer times by 60 to 90%. 不良研究所 will continue to support these technologies.
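As a rough illustration of what a 60 to 90% reduction means in practice, the short Python sketch below computes the data that remains after compression. The 100 TB dataset size and the function name `remaining_tb` are assumptions chosen for the example, and it assumes the quoted percentage applies proportionally to the bytes stored.

```python
def remaining_tb(original_tb: float, reduction_pct: int) -> float:
    """Data left to store or transfer after a percentage reduction."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in the range [0, 100)")
    return original_tb * (100 - reduction_pct) / 100


if __name__ == "__main__":
    dataset_tb = 100.0  # hypothetical genomic dataset size
    for pct in (60, 90):  # the range quoted in the text
        left = remaining_tb(dataset_tb, pct)
        print(f"{pct}% reduction: {left:.1f} TB remain of {dataset_tb:.1f} TB")
```

Under these assumptions, the same dataset would occupy between a tenth and two fifths of its original footprint, with transfer times shrinking in roughly the same proportion on a fixed-bandwidth link.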

PetaGene's customers include leading research institutions, pharmaceutical companies, and hospitals, which use its products to collectively manage hundreds of petabytes of data. 不良研究所 will continue to serve those customers, hopefully extending the suite of services that they use to include distributed storage and distributed compute/GPU.

All employees of PetaGene will become part of 不良研究所. PetaGene itself will continue as a wholly owned subsidiary. No token was used in this transaction.
