Defining Open: FOSSY Day 2
The internet’s most powerful engine of collective intelligence is at an uncertain point. The free and open-source software (FOSS) community emerged and refined its principles over twenty years during the early internet, co-evolving with regulation and technology. Decentralized movements need time to grow and adapt. But to adapt to advanced AI, FOSS needs twenty years’ worth of evolution in just a few months.
What does openness mean in the context of advanced AI, where the backbone of good models is a giant pile of partly copyrighted data? Is openness a means to an end, like human flourishing, or a goal in and of itself? When an AI model can generate 40,000 candidate chemical warfare agents in six hours, is it time for the open-source community to revisit its prohibition on “discriminating against fields of endeavor”? We don’t know, and we had better find out quickly!
Three Key Insights
Data enrichment
When you’re trying to understand how a collective project is going, collecting data is the easy part. The bigger challenge is enriching the data! Some contributors have multiple handles, which can badly skew the results. How to attribute contributions to the right user, and how to group users in useful ways, isn’t always obvious from the raw data.
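As a rough illustration of the multiple-handles problem, here is a minimal Python sketch that folds several handles into one identity before counting contributions. The alias table, the handle names, and the (handle, count) input format are all hypothetical; a real project would have to build its own mapping, and nothing here reflects a specific tool shown at FOSSY.

```python
from collections import defaultdict

# Hypothetical alias table: every known handle or email maps to one
# canonical identity. In practice this table is the hard part to build.
ALIASES = {
    "jdoe": "jane-doe",
    "jane.doe@example.org": "jane-doe",
    "jane-doe": "jane-doe",
}


def canonical(handle: str) -> str:
    """Return the canonical identity for a handle, or the normalized handle itself."""
    key = handle.strip().lower()
    return ALIASES.get(key, key)


def contributions_by_person(records):
    """Sum contribution counts per canonical identity.

    `records` is an iterable of (handle, count) pairs (an assumed format).
    """
    totals = defaultdict(int)
    for handle, count in records:
        totals[canonical(handle)] += count
    return dict(totals)


# Made-up sample data: three of the four rows belong to the same person.
raw = [("jdoe", 12), ("Jane.Doe@example.org", 30), ("sam", 5), ("jane-doe", 8)]
merged = contributions_by_person(raw)
```

Without the merge step, the same person would show up as three mid-sized contributors instead of one large one, which is exactly the kind of skew described above.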
Specification gaming
To avoid specification gaming, where people optimize for the metric rather than the goal it stands for, researchers monitoring collective projects should rotate the metrics they track over time. Candidates include new contributors and contributions, event impacts, and change request durations.
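To make two of those metrics concrete, here is a small Python sketch that computes new contributors per month and the median time to merge a change request. The record format and the sample data are assumptions made for illustration, not figures from any real project.

```python
from datetime import date
from statistics import median

# Hypothetical change requests: (author, date opened, date merged).
CHANGE_REQUESTS = [
    ("ana", date(2023, 6, 1), date(2023, 6, 3)),
    ("ben", date(2023, 6, 10), date(2023, 6, 20)),
    ("ana", date(2023, 7, 2), date(2023, 7, 3)),
    ("cy", date(2023, 7, 5), date(2023, 7, 9)),
]


def new_contributors_by_month(records):
    """Count authors whose first change request was opened in each month."""
    seen = set()
    counts = {}
    # Process in chronological order so "first appearance" is well defined.
    for author, opened, _merged in sorted(records, key=lambda r: r[1]):
        month = (opened.year, opened.month)
        counts.setdefault(month, 0)
        if author not in seen:
            seen.add(author)
            counts[month] += 1
    return counts


def median_days_to_merge(records):
    """Median number of days between opening and merging a change request."""
    return median((merged - opened).days for _author, opened, merged in records)


newcomers = new_contributors_by_month(CHANGE_REQUESTS)
typical_wait = median_days_to_merge(CHANGE_REQUESTS)
```

Keeping each metric as its own small function makes it easy to swap which ones are reported from one period to the next.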
Copyrights
Access to datasets is a make-or-break issue for open-source AI projects, and it raises tricky questions around copyright. EleutherAI’s models were built with a 225GB compressed dataset, which may not sound that big, but consider that all of Wikipedia is about 43GB. If copyrighted data is excluded from what counts as open, it becomes incredibly difficult for open models to learn much about our world.
Three Faces of FOSSY
Shauna Gordon-McKeon is an open-source developer and community leader. Her current project, GoverningOpen.com, is a resource center for tackling open governance problems to help OSS projects scale and sustain themselves.
Monica Ayhens-Madon, a fellow Atlantan, is a developer advocate and organizer for FOSSY. She’s a skilled emcee and communicator who creates inclusive, energized environments. I’ve never felt more welcome as a workshop guest than when she hosts!
Carlos Maltzahn co-founded Ceph (ceph.io), an incredibly successful open storage platform deployed at CERN. He founded and directs the UC Santa Cruz Center for Research in Open Source Software.
Two Sessions I Enjoyed
“From Commit Bits to Bylaws - Governing your Open Source Project,” led by Shauna, dug deep into misconceptions about open-source governance – that it is optional, complicated, or needs to be bureaucratic. We identified common governance challenges and solutions, including encouraging and empowering community members to participate in the governance process.
Stefano Maffulli’s “Defining Open Source AI” workshop, which I had been looking forward to all week, drew a large group. It is part of an ongoing series to develop a set of shared principles that recreate “permissionless, pragmatic, and simplified” collaboration for AI practitioners.
Looking Forward to Tomorrow
I learned a lot today about measuring the success of collective digital projects and avoiding governance pitfalls. Tomorrow I hope to begin understanding what lessons open-source projects can offer more typical organizations trying to generate the same levels of collaboration and growth. What can leaders learn about improving collective intelligence (CI) in their organizations from watching the FOSS community?