Getting Perspective on Open Source Projects and Forks with Ted Dunning, Apache Software Expert

We spent some time with Apache Software Foundation board member, PMC member, and open source expert Ted Dunning to get his thoughts on the state of open source software. 

Can you start by telling us a bit about yourself and your open source background?

I’ve been involved with machine learning and big data systems for a very long time and have spent even longer working in the open source world. The first open source work I was involved with was in the mid-’70s, which at the time meant guys trading floppy disks in a coffee shop parking lot in Colorado. More recently, I’ve been with the Apache Software Foundation (ASF) since 2007.

What’s your general philosophy on open source and the community behind an OSS project?

In my view, it’s not just one part or a few individuals that create an open source community – it’s the community as a whole. Sure, you might have a few more vocal folks, but an open source community is only as good as its least vocal member. And a community is a great thing because it gives people the chance to belong to something – a wonderfully electrifying experience. It’s a unifying experience, and it is the ultimate marketing.

There are cases where people pretend to be open but really won’t let anyone else play. If a community is run cynically, it can lead to vast bitterness. And it’s very common for companies to say “we employ the creators of this project”, but in fact I think that it is the community in aggregate that wrote it. Yes, a few people may have written the first lines of code and gotten the ball rolling, but we have to remember that the project is not just the source code. There are always a lot of different contributions to the creation of a project.

You’ve also been part of several open source companies, most recently MapR, where you were CTO. How did you strike the balance between open source and proprietary?

MapR was actually very, very interesting because it had open and closed parts. But they were good about finding a balance – it wasn’t just all proprietary software or all open source. I tried to be very clear in public about our decisions to open source some features and keep others proprietary. One of the challenges of a vendor-backed open source technology is that you’re going to have a culture gap inside the company. You have people who come from closed source, and you have people who come from open source, and neither side necessarily understands the other. But despite that, I felt much better being straightforward and honest about our decisions rather than pretending that everything was open.

When it comes to building open source communities around projects, I’m sure you’ve seen many times that things can go wrong. In some cases that might mean a project splits off, or forks, from the original project. In your experience, what can that mean for a community?

It’s so sad when communities split up. When these destructive forks happen, quite frankly it’s a huge tragedy for open source. The common characteristics of these cases are that a) everyone is just confused and b) there typically are major differences in worldviews of the same events on either side of the fork, making it very hard to get people together when there’s no shared understanding of what was happening. I see these two specific things happen when it comes to bad forks:

  1. Both projects keep the same name or too similar a name, so no one knows the differences between the two
  2. Splitting the project splits the community, and this may render both irrelevant

If you split a community, a mailing list, a Slack channel, there’s a reverse-synergy effect. A lot of good people who want to contribute to something but don’t really like conflict end up just leaving. People have plenty of distractions in their lives. And they have plenty of other opportunities to be creative, fun, and happy. Then, like 10 years later, they go, “Oh, I remember that, wonder what happened to it?”

What’s a specific example of a fork that went wrong? 

Sun’s OpenOffice/LibreOffice split was a classic fork that divided the community. As Sun’s contributions to the project declined, there were concerns from the OpenOffice community over whether it would remain under a neutral foundation. When Oracle acquired Sun, discussions of a fork began and, as a result, LibreOffice was created and the community was split between the two projects. Unfortunately, it created a lot of conflict and negativity, which harmed both products and the OpenOffice/LibreOffice community at large. Much of the community stopped contributing to and maintaining either project.

The Hudson/Jenkins fork had a similar potential for problems, but essentially the entire community left the original Hudson branch and moved over to the Jenkins fork. Today, it is hard to see that there ever was a fork. Things that went right included distinctive naming (Hudson versus Jenkins) and a strong consensus about which fork to stay with moving forward (Jenkins). The original Hudson project moved to the Eclipse Foundation, but has been dormant since 2017, leaving room for Jenkins to flourish.

What about projects that keep too similar a name? We’re seeing that play out today with the Presto open source project, with the original PrestoDB and the similarly named fork PrestoSQL. What are your thoughts on this?

My initial reaction to Presto is that I always have to go look up which one is which. There’s PrestoSQL, PrestoDB, Presto Foundation, Presto Software Foundation, and I can’t keep them straight. And so there’s massive confusion unless you’re one of the participants – as an outsider, I have no idea which one is which. From what I understand about the project, my guess is that the Presto creators who worked at Facebook didn’t make it apparent that they were going off to start a new community. They may not have been as open as they could have been about this, as sometimes happens when people leave a company. Later, Facebook decided to put more effort behind the community, and it caused confusion because there are now effectively two communities with nearly identical names.

But I think that the moral of the story is that when you name something, names matter. Anybody who forks should have a completely different name. That alone makes it easier for everyone to understand. Of course, it isn’t always clear which is the fork and which is the original, but at least you’d know if it was Presto in one place and Abracadabra in the other. Everybody would get it. It doesn’t even matter all that much which side gets which name.