This is a big week in terms of announcements from Pachyderm. Not only did we raise $10M in Series A funding, but we’re also announcing the general availability of Pachyderm v1.8! With this latest release, Pachyderm is faster (much faster) and includes new and improved support for structured data, as well as new options for auth. Needless to say, there’s a lot of enterprisey goodness packed into 1.8 and you can read all about it below.
Massive performance and scalability improvements (more than 1000x for certain workloads) are the crown jewels of our v1.8 release. By rearchitecting how we store and track our versioning metadata, Pachyderm can now meet and exceed the demands of any enterprise. Users can now scale to billions of files, hundreds of TB of data processed per job, and thousands of nodes in the cluster. If you’re interested in learning more, check out this post.
SQL and CSV data sources dominate the data science space, especially in the financial industry. In past versions, working with headers and footers was difficult because breaking up CSV files for distributed processing often required cumbersome data management. With Pachyderm v1.8, header and footer information can be embedded into a directory, and Pachyderm automatically applies it to all of that directory’s child files. This makes it easy to break up large CSV files or SQL dumps into many smaller files, yet still process them in a distributed fashion with simple file I/O operations, just like any other data in Pachyderm. For a complete walk-through, check out our technical deep dive.
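To make the idea concrete, here is a minimal Python sketch of the concept, not Pachyderm's actual implementation or API. It assumes a CSV with a single header row and no newlines inside quoted fields, and it ignores footers. The names `split_csv` and `read_chunk` are hypothetical helpers invented for this illustration: the header is stored once, the body is split into header-less chunks, and each chunk is read as if the shared header were prepended.

```python
import csv
import io
from typing import Dict, List, Tuple


def split_csv(text: str, rows_per_chunk: int) -> Tuple[str, List[str]]:
    """Split CSV text into one shared header line and header-less chunks.

    Assumes one header row and no embedded newlines in quoted fields.
    """
    lines = text.splitlines(keepends=True)
    header, body = lines[0], lines[1:]
    chunks = [
        "".join(body[i:i + rows_per_chunk])
        for i in range(0, len(body), rows_per_chunk)
    ]
    return header, chunks


def read_chunk(header: str, chunk: str) -> List[Dict[str, str]]:
    """Parse one chunk as if the shared header were prepended to it."""
    return list(csv.DictReader(io.StringIO(header + chunk)))


if __name__ == "__main__":
    data = "id,amount\n1,10\n2,20\n3,30\n"
    header, chunks = split_csv(data, rows_per_chunk=2)
    # Each chunk can be handed to a different worker along with the header.
    for chunk in chunks:
        print(read_chunk(header, chunk))
```

Because every chunk is paired with the same header, each worker sees a small but complete CSV and can use ordinary file I/O, which is the property the directory-level header is meant to provide.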
Controlling who has access to specific data is a critical aspect of IT, which is why we’re thrilled to announce that Pachyderm v1.8 expands our SSO support to include Okta. Enterprise customers can now leverage both GitHub and Okta for auth and for controlling user access within our system.