Hi! Nice to meet you. In this first post I'd like to introduce the Orchest project and the core team behind it.
The ideas for Orchest are rooted in our personal experience working with data. What worked for us in practice:
- Interactive computing environments
- Rapid prototyping and iteration
- Learning from each other's explorations
- Flexibility over rigid structure
- Modularity and reuse
Our vision is simple: create better tools for data scientists to empower them to do their best work. More focus on data and algorithms, less time wasted fighting with complex systems and reinventing the wheel.
What is Orchest?
Orchest is a new kind of Integrated Development Environment (IDE) built specifically for Data Science.
What's different about an IDE for Data Science compared to your 'regular old Eclipse or VS Code'? In short: data and compute.
In our experience, data science projects require long-running compute jobs and moving large data sets close to where the compute is performed. This naturally pulls the development experience toward distributed systems, built on the flexible primitives of cloud computing.
Data scientists should not have to deal with the raw complexity of distributed systems. We think data science workflows should be intuitive, experimental, and interactive. In our view, this is what lets you apply the scientific method: a process grounded in rapid prototyping and validation of ideas.
We put these ideas into practice in Orchest by combining high-code data analysis (scripting in Python, R, and Julia) with low-code infrastructure management (a GUI to manage pipelines, scheduled jobs, containers, data providers, and managed services).
Figure 1: High code analysis
Figure 2: Low code infrastructure
Our vision for Orchest
We believe in the power of abstractions: sometimes, the less you know, the better.
Take for example the power of the SQL abstraction:
SELECT ProductName FROM Products WHERE ProductID = ANY (SELECT ProductID FROM OrderDetails WHERE Quantity > 99);
SQL, a domain-specific language, lets you think about what you want and ignore how it's going to work. The query above says nothing about how the tables are scanned, filtered, or indexed; the database engine decides that. This declarative kind of abstraction is a superpower because it lets you focus on what you actually care about and forget about the rest.
We believe abstractions of this kind are going to enable data scientists to do more and move faster than ever before.
For Orchest, this means we will always try to reduce the complexity you need to deal with as a data scientist, giving you the flexibility and degrees of freedom you need without overwhelming you with the underlying details of the technology stack (Kubernetes, containers, CUDA, Linux, git, compilers, microservices, networking, etc.).
Anyone who has followed the data ecosystem will acknowledge the power of open source communities. The diversity of ideas and use cases leads to more general-purpose software that benefits a large group of people.
With Orchest we want to respect the open source tradition and contribute to the community with powerful tools available under an open source license, free of charge and accessible to the broad and diverse community of data enthusiasts.
As a venture-backed company, we need a clear strategy for sustainability and future development. To that end, we will fund the open source project with a cloud-hosted version of Orchest that offers extended functionality as part of an Enterprise version. It extends the open core and is aimed at solving organizational complexity rather than technical complexity.
We believe that under this model we will be able to give away most of what we create while providing the sustainable income stream necessary to invest in the underlying foundation of Orchest.
Who we are
We'd like to end with a quick introduction of the team. We're a small and young team with a big ambition to create the next generation of tools that will empower data teams across the globe.
Rick Lamers, CEO and Co-Founder, GitHub
Orchest is exciting to me because we can, as Steve Jobs famously put it, create 'bicycles for the mind', tailored to a domain I care deeply about: data science & machine learning. Working with a team of motivated people who set a high bar for developer tools for data scientists is a dream come true.
Yannick Perrenet, CTO and Co-Founder, GitHub
I truly believe in data-driven decision making. Getting to work on Orchest to help individuals and companies make the transition to data-driven processes is very rewarding to me. Oh, and I love learning about innovative technologies!
Jacopo Gobbi, Founding Engineer, GitHub
To me, Orchest is the opportunity, and responsibility, to offer something of value to data science & machine learning people alike. We face the challenge of engineering so that you don't have to.
As a new project, we're always looking to learn from the community. At the same time, we're full of ideas about how we can improve the workflow of data scientists working on some of the most important and exciting projects in the world.