Producing Open Source Software

How to Run a Successful Free Software Project

Karl Fogel

(Author)

Étienne Savard

(Translator)

Bertrand Florat

(Translator)

Note: the French translation is being produced as a wiki at http://www.framalang.org/wiki/Producing_Open_Source_Software. Once it is finished, the translation will be moved here. In the meantime, please consult the wiki for the up-to-date version.


Dedication

This book is dedicated to two dear friends without whom it would not have been possible: Karen Underhill and Jim Blandy.

Table of Contents

Preface
Why Write This Book?
Who Should Read This Book?
Sources
Acknowledgments
Note
1. Introduction
History
The Rise of Proprietary Software and Free Software
Conscious Resistance
Accidental Resistance
"Free" Versus "Open Source"
The Situation Today
2. Getting Started
Starting From What You Have
Choose a Good Name
Have a Clear Mission Statement
State That the Project Is Free
Features and Requirements List
Development Status
Downloads
Version Control and Bug Tracker Access
Communications Channels
Developer Guidelines
Documentation
Availability of Documentation
Developer Documentation
Example Output and Screenshots
Canned Hosting
Choosing a License and Applying It
The "Do Anything" Licenses
The GPL
How to Apply a License to Your Software
Setting the Tone
Avoid Private Discussions
Nip Rudeness in the Bud
Practice Conspicuous Code Review
When Opening a Formerly Closed Project, Be Sensitive to the Magnitude of the Change
Announcing
3. Technical Infrastructure
What a Project Needs
Mailing Lists
Spam Prevention
Filtering Posts
Address Hiding in Archives
Identification and Header Management
The Great Reply-to Debate
Two Fantasies
Archiving
Software
Version Control
Version Control Vocabulary
Choosing a Version Control System
Using the Version Control System
Version everything
Browsability
Commit emails
Use branches to avoid bottlenecks
Singularity of information
Authorization
Bug Tracker
Interaction with Mailing Lists
Pre-Filtering the Bug Tracker
IRC / Real-Time Chat Systems
Bots
Archiving IRC
Wikis
Web Site
Canned Hosting
Choosing a canned hosting site
Anonymity and involvement
4. Social and Political Infrastructure
Benevolent Dictators
Who Can Be a Good Benevolent Dictator?
Consensus-based Democracy
Version Control Means You Can Relax
When Consensus Cannot Be Reached, Vote
When To Vote
Who Votes?
Polls Versus Votes
Vetoes
Writing It All Down
5. Money
Types of Involvement
Hire for the Long Term
Appear as Many, Not as One
Be Open About Your Motivations
Money Can't Buy You Love
Contracting
Review and Acceptance of Changes
Case study: the CVS password-authentication protocol
Funding Non-Programming Activities
Quality Assurance (i.e., Professional Testing)
Legal Advice and Protection
Documentation and Usability
Providing Hosting/Bandwidth
Marketing
Remember That You Are Being Watched
Don't Bash Competing Open Source Products
6. Communications
You Are What You Write
Structure and Formatting
Content
Tone
Recognizing Rudeness
Face
Avoiding Common Pitfalls
Don't Post Without a Purpose
Productive vs Unproductive Threads
The Softer the Topic, the Longer the Debate
Avoid Holy Wars
The "Noisy Minority" Effect
Difficult People
Handling Difficult People
Case study
Handling Growth
Conspicuous Use of Archives
Treat all resources like archives
Codifying Tradition
No Conversations in the Bug Tracker
Publicity
Announcing Security Vulnerabilities
Receive the report
Develop the fix quietly
CAN/CVE numbers
Pre-notification
Distribute the fix publicly
7. Packaging, Releasing, and Daily Development
Release Numbering
Release Number Components
The Simple Strategy
The Even/Odd Strategy
Release Branches
Mechanics of Release Branches
Stabilizing a Release
Dictatorship by Release Owner
Change Voting
Managing collaborative release stabilization
Release manager
Packaging
Format
Name and Layout
To capitalize or not to capitalize
Pre-releases
Compilation and Installation
Binary Packages
Testing and Releasing
Candidate Releases
Announcing Releases
Maintaining Multiple Release Lines
Security Releases
Releases and Daily Development
Planning Releases
8. Managing Volunteers
Getting the Most Out of Volunteers
Delegation
Distinguish clearly between inquiry and assignment
Follow up after you delegate
Notice what people are interested in
Praise and Criticism
Prevent Territoriality
The Automation Ratio
Automated testing
Treat Every User as a Potential Volunteer
Share Management Tasks as Well as Technical Tasks
Patch Manager
Translation Manager
Documentation Manager
Issue Manager
FAQ Manager
Transitions
Committers
Choosing Committers
Revoking Commit Access
Partial Commit Access
Dormant Committers
Avoid Mystery
Credit
Forks
Handling a Fork
Initiating a Fork
9. Licenses, Copyrights, and Patents
Terminology
Aspects of Licenses
The GPL and License Compatibility
Choosing a License
The MIT / X Window System License
The GNU General Public License
Is the GPL free or not free?
What About The BSD License?
Copyright Assignment and Ownership
Doing Nothing
Contributor License Agreements
Transfer of Copyright
Dual Licensing Schemes
Patents
Further Resources
A. Free Version Control Systems
B. Free Bug Trackers
C. Why Should I Care What Color the Bikeshed Is?
D. Example Instructions for Reporting Bugs
E. Copyright

Preface

Why Write This Book?

At parties, people no longer give me a blank stare when I tell them I write free software. "Oh, yes, open source, like Linux?" they say. I nod eagerly. "Yes, exactly, that's what I do!" It's nice not to be completely on the fringe anymore. In the past, the next question was usually fairly predictable: "How do you make a living doing that?" My answer summed up the whole economics of open source: some organizations have an interest in a particular piece of software existing, but have no need to sell copies of it; they just want to be sure the software continues to exist and to be maintained, as a tool rather than a commodity.

Lately, however, the next question has not always been about money. The business side of open source[1] is no longer so mysterious, and many non-programmers understand, or at least are not surprised, that some people are employed at it full time. Instead, the question I have been hearing more and more often is: "Oh, how does that work?"

I didn't have a satisfactory answer ready, and the harder I tried to come up with one, the more I realized how complex the topic really is. Running a free software project is not exactly like running a business (imagine having to constantly negotiate the nature of your product with a group of volunteers, most of whom you will never meet!). Nor, for various reasons, is it like running a traditional non-profit organization or a government. There are similarities to all these types of organizations, but I have slowly come to the conclusion that free software is sui generis. It can be compared to many things, but equated with none. Indeed, even the assumption that a free software project can be "run" is questionable. Such a project can be started and influenced by the people involved, often quite strongly. But its assets cannot be considered the property of any single person, and as long as there are people somewhere, anywhere, who want to continue the project, no one can unilaterally decide to shut it down. Everyone has infinite power, and no one has any power at all, which makes for an interesting dynamic.

That is why I wanted to write this book. Free software projects have given rise to a distinct culture, an ethos in which the freedom to make software do whatever one wants is the central tenet. Yet the result of this freedom is not a scattering of individuals, each going their own way with the code, but enthusiastic collaboration. Indeed, the ability to cooperate is one of the most highly valued skills in free software. Running such projects means engaging in a kind of hypertrophied cooperation, in which a person's ability not only to work with others but also to come up with new ways of working together can lead to tangible advances in the software. This book tries to explain how to make that happen. It is by no means exhaustive, but it is at least a beginning.

Producing good free software is a worthy goal in itself, and I hope that readers who come looking for ways to achieve it will be satisfied with what they find here. But beyond that, I hope to convey something of the sheer pleasure of working with a motivated team of open source developers, and of interacting with users in the wonderfully direct way that free software encourages. Taking part in a successful free software project is fun, and that is ultimately what keeps the whole system working.

Who Should Read This Book?

This book is meant for software developers and managers who are considering starting an open source project, or who have already started one and are wondering what to do next. It should also be useful to people who simply want to get involved in an open source project but have never done so before.

The reader need not be a programmer, but should be familiar with basic software engineering concepts such as source code, compilers, and patches.

Prior experience with open source software, as either a user or a developer, is not necessary. Those who have already worked on a free software project will find some parts of the book obvious and can skip those sections. Because readers may bring such a wide range of experience, I have tried to label the sections clearly and to point out which ones those already familiar with the subject can skip.

Sources

Much of the raw material for this book comes from five years of working on the Subversion project (http://subversion.tigris.org/). Subversion is an open source version control system, written from scratch and intended to replace CVS as the reference version control system of the open source community. The project was started in early 2000 by my employer, CollabNet (http://www.collab.net/), which fortunately understood from the very beginning how to run it as a collaborative, distributed effort. Many volunteer developers joined early on; today the project has about fifty developers, only a few of whom are employed by CollabNet.

In many ways, Subversion is a classic example of an open source project, and I ended up drawing on it more heavily than I originally expected. This was partly a matter of convenience: whenever I needed an example to illustrate a particular phenomenon, an experience from Subversion was usually the first to come to mind. But it was also a matter of verification. Although I am involved, to varying degrees, in other free software projects, and talk with friends and acquaintances involved in many more, one quickly realizes when writing for publication that every fact must be checked. I didn't want to make statements about events in other projects based only on what I could read in their public mailing list archives. If someone did that with Subversion, I know they would be wrong half the time. So whenever I have drawn inspiration or examples from projects I had no direct experience with, I have always tried to check my information with someone involved, someone I could trust to explain what really happened.

I have worked on Subversion for the last five years, but I have been involved in free software for twelve years. Other projects that influenced this book include:

  • The Free Software Foundation's GNU Emacs text editor project, for which I maintain a few packages.

  • Concurrent Versions System (CVS), which I worked on intensively in 1994–1995 with Jim Blandy, but have been involved with only intermittently since.

  • The collection of open source projects known as the Apache Software Foundation, especially the Apache Portable Runtime (APR) and the Apache HTTP Server.

  • OpenOffice.org, the Berkeley Database from Sleepycat, and the MySQL database: I have not been personally involved in these projects, but I have observed them and, in some cases, talked with people who work on them.

  • The GNU Debugger (GDB) (likewise).

  • The Debian Project (likewise).

Of course, this list is not exhaustive. Like most open source programmers, I keep an eye on many different projects, just to get a general sense of what is going on. I won't name them all here, but some are mentioned in the book where I felt it was appropriate.

Acknowledgments

This book took four times longer to write than I expected, and no matter what I was doing, I often felt as though a sword of Damocles were hanging over my head. Without the help of many people, I could not have finished it with my sanity intact.

Andy Oram, at O'Reilly, was the ideal editor for me. Besides knowing the subject very well (he suggested many of the topics), he has the rare gift of knowing what someone means to say and helping them say it well. Working with him was an honor. Thanks also to Chuck Toporek for steering this proposal to Andy right away.

Brian Fitzpatrick reviewed all of my material as I wrote it, which not only made the book better but also kept me writing when I would rather have been anywhere but in front of my computer. Ben Collins-Sussman and Mike Pilato also kept an eye on the progress of the work and were always happy to discuss, sometimes at length, whatever topic I was trying to cover that week. They also noticed when I was slacking off and nagged me when necessary. Thanks, guys!

Biella Coleman was writing her dissertation while I was writing this book. She knows what it means to sit down and write every day, and she was both an example and a sympathetic ear for me. She also brings a fascinating anthropologist's perspective to the free software movement, which gave me both ideas and references I could use. Alex Golub (another anthropologist with one foot in the free software world, who was also finishing his dissertation at the same time) gave me exceptional support from the start, which helped me a great deal.

Micah Anderson somehow never seemed too stressed by his own writing contract, which was motivating but made me sickly jealous. He was nonetheless always there with friendship, conversation, and (on at least one occasion) technical help. Thanks, Micah!

Jon Trowbridge and Sander Striker each gave me encouragement and invaluable help. Their broad experience in free software provided material I could not have found anywhere else.

Thanks to Greg Stein, not only for his friendship and well-timed encouragement, but also for showing the Subversion project how important regular code review is to building a programming community. Thanks also to Brian Behlendorf, who tactfully drummed into our heads the importance of holding discussions in public; I hope that principle is reflected in this book.

Thanks to Benjamin "Mako" Hill and Seth Schoen for many conversations about free software and its politics; to Zack Urlocker and Louis Suarez-Potts for finding a few minutes in their busy schedules to be interviewed; to Shane on the Slashcode list for allowing me to quote his post; and to Haggen So for his comparison of hosting sites, which helped me enormously.

Thanks to Alla Dekhtyar, Polina, and Sonya for their patient and unflagging encouragement. I am delighted that I will no longer have to cut short (or rather, try in vain to cut short) our evenings to go work on "The Book."

Thanks to Jack Repenning for his friendship, his conversation, and his stubborn refusal to accept a simple but wrong analysis when a more complex but correct one exists. I hope that some of his long experience with software development and the software industry has rubbed off on this book.

CollabNet was exceptionally generous in letting me adapt my schedule to write this book, and did not complain when it took longer than planned. I don't know all the details of how such decisions are made, but I suspect that Sandhya Klute, and later Mahesh Murthy, had something to do with it. My thanks to them both.

The entire Subversion development team has been an inspiration over the past five years, and most of what this book explains I learned from them. I won't thank them by name, because there are too many of them, but I implore every reader who runs into a Subversion committer to buy that person a drink. I certainly plan to.

I often ranted to Rachel Scollon about the state of this book. She was always ready to listen and somehow managed to make my problems seem less serious than they had been before we talked. That helped a lot; thank you.

Thanks (again) to Noel Taylor, who must surely have wondered why I wanted to write another book, given how much I complained the last time, but whose friendship and leadership of Golosá helped me keep music and good fellowship in my life, even during the busiest periods. Thanks also to Matthew Dean and Dorothea Samtleben, friends and long-suffering musical partners, who were very understanding as my excuses for not practicing piled up. Megan Jennings was always supportive and took a genuine interest in the subject despite being unfamiliar with it, a real tonic for a doubting writer. Thanks, my friend!

I had four knowledgeable and diligent reviewers for this book: Yoav Shapira, Andrew Stellman, Davanum Srinivas, and Ben Hyde. Had I been able to incorporate all of their excellent suggestions, this book would be even better. Time constraints forced me to pick and choose, but the improvements are still visible. Any errors that remain are entirely my own.

My parents, Frances and Henry, were wonderfully supportive as always, and since this book is less technical than the previous one, I hope they will find it somewhat more readable.

Finally, I would like to thank the dedicatees, Karen Underhill and Jim Blandy. Karen's friendship and understanding have meant everything to me, not only while I was writing this book but over the last seven years. I simply could not have finished without her help. Likewise Jim, a true friend and master hacker, who first introduced me to free software, much as a bird might teach an airplane to fly.

Note

The thoughts and opinions expressed in this book are my own. They do not necessarily represent the views of CollabNet or of the Subversion project.



[1] The terms "open source" and "free" are essentially synonymous in this context; I discuss this further in the section titled "'Free' Versus 'Open Source'" in Chapter 1, Introduction.

Chapter 1. Introduction

Most free software projects fail.

We tend not to notice the failures much. Only the successful projects attract attention, and there are so many free software projects in total[2] that even though only a relatively small proportion succeed, the result is still a lot of visible projects. We also don't hear about the failures because failure is a non-event. There is no single moment when a project ceases to be viable: people just drift away and eventually abandon it. There may be a moment when a final change is made to the project, but whoever makes it does not know at the time that it is the last one. How do you even define when a project is dead? Is it when no one has actively worked on it for six months? When its user base stops growing without ever having exceeded its developer base? And what about a project abandoned because its developers realized they were duplicating existing work, and decided to join that other project and improve it, bringing in much of their earlier work? Did the first project die, or did it just change address?

Because of this complexity, it is impossible to determine a precise failure rate. But the picture that emerges from more than a decade in open source, some participation on SourceForge.net, and a little googling is this: the rate is extremely high, probably on the order of 90% to 95%. It is higher still if you include projects that survive but function poorly, that is, those that produce usable software but are not pleasant places to be, or do not make progress as quickly or as reliably as they could.

This book is about avoiding failure. It examines not only how to do things right, but also how to do them wrong, so that you can recognize and correct problems early. My greatest hope is that after reading it, you will have a repertoire of techniques both for avoiding the most common pitfalls of open source development and for managing the growth and maintenance of a successful project. Success is not a zero-sum game, and this book is not about winning or getting the better of the competition. Indeed, an important part of running an open source project is working smoothly with other, related projects. In the long run, every success contributes to the well-being of free software as a whole, worldwide.

It would be tempting to say that free and proprietary software projects fail for the same reasons. Free software certainly has no monopoly on wacky requirements, vague specifications, poor staff management, insufficient design phases, or any of the other hobgoblins well known to the software industry. There is already a considerable body of writing on these topics, so I won't dwell on them here. Instead, I will try to describe the problems specific to free software. When a free software project runs aground, it is often because the developers (or the managers) did not appreciate the problems peculiar to open source development, even though they were well versed in the better-known difficulties of closed-source development.

Temper your expectations of open source; don't fall into the trap of expecting too much. An open license does not guarantee that hordes of developers will immediately put themselves at your project's service, and opening the code of a troubled project does not automatically cure all its ills. In fact, quite the opposite: opening up a project can add a whole new set of complications and cost more in the short term than keeping it closed. Opening up means reworking the code so that complete strangers can understand it, setting up a development web site and mailing lists, and often writing documentation for the first time. All of this is a lot of work. And of course, if interested developers do show up, you will have to bring them up to speed and answer their questions, and it may be a while before you see the benefits of their presence. As Jamie Zawinski said of the troubled early days of the Mozilla project:

Open source does work, but it is most definitely not a panacea. If there's a cautionary tale here, it is that you can't take a dying project, sprinkle it with the magic pixie dust of "open source," and have everything magically work out. Software is hard. The issues aren't that simple.

(from http://www.jwz.org/gruntle/nomo.html)

A related mistake is skimping on presentation and packaging, figuring that these can always be done later, once the project is well under way. Presentation and packaging cover a wide range of tasks, all aimed at lowering the barrier to entry. Making the project welcoming to newcomers means writing user and developer documentation, setting up a project web site that informs the curious, automating the software's compilation and installation as much as possible, and so on. Unfortunately, many programmers treat this work as secondary to the code itself, for a couple of reasons. First, it seems like a waste of time to them, because they do not benefit from it directly; the benefits go to people less familiar with the project. After all, the people who develop the project don't really need it packaged for them. They already know how to install, administer, and use the software, because they wrote it. Second, the skills required for presentation and packaging are often completely different from those required to write code. People tend to focus on what they do best, even when they could serve the project better by spending a little time on things that suit them less. Chapter 2, Getting Started, examines presentation and packaging in detail and explains why it is essential to make them a priority from the very start of the project.

Next comes the misconception that open source requires little or no project management, or conversely, that the management practices used for in-house development will work just as well on an open source project. Management in an open source project is not always very visible, but in successful projects it is always present behind the scenes in one form or another. A little thought is enough to see why. An open source project is a haphazard collection of programmers, a group already notorious for its independent-mindedness, who have most likely never met and who may each take part with different personal goals. It is easy to imagine what would become of such a group without management. Barring a miracle, it would collapse or break apart very quickly. Things will not run themselves, whether we like it or not. But the management, however active, is usually informal, subtle, and low-key. The only thing that holds a group of developers together is their shared belief that they can accomplish more collectively than individually. In this setting, the main goal of management is to make sure they keep believing it, by establishing norms for communication, by ensuring that useful developers are not marginalized because of personal idiosyncrasies, and in general by making the project a place developers want to keep coming back to. Specific techniques for achieving this are discussed throughout the rest of the book.

Finally, there is a general category of problems that might be called "failures of cultural navigation." Ten, or even five, years ago, it would have been premature to speak of a worldwide free software culture; not anymore. A recognizable culture has slowly emerged, and although it is not monolithic (it is as prone to internal dissent and factionalism as any geographically defined culture), it rests on a fundamentally solid core. Most successful open source projects exhibit some, if not all, of the characteristics of this core. They reward certain kinds of behavior and punish others; they create an atmosphere that encourages unplanned participation (sometimes at the expense of central coordination); and they have their own notions of politeness and rudeness, which can differ substantially from those that prevail elsewhere. Most importantly, long-time participants have generally internalized these standards to the point of broadly agreeing on what behavior is expected. Unsuccessful projects usually deviate significantly from this core, even if unintentionally, and lack a shared understanding of what constitutes reasonable default behavior. As a result, when problems arise, the situation can deteriorate quickly, because the participants have no established stock of cultural reflexes to fall back on for resolving disagreements.

This book is a practical guide, not an anthropological study or a history. However, a real understanding of the origins of today's free software culture is an essential foundation for any practical advice. A person who understands the culture can travel far and wide in the open source world, encountering many local variations in custom and dialect, and still participate comfortably and effectively everywhere. By contrast, for someone who does not understand the culture, organizing or participating in a project will be difficult and full of surprises. Because the number of people developing free software keeps growing by leaps and bounds, that second category is never short of members. This is largely a culture of recent immigrants, and it will remain so for some time. If you think you might be one of them, the next section provides background for discussions you will encounter later, both in this book and on the Internet. (On the other hand, if you have been working in free software for a while, you may already know a good deal about its history. In that case, feel free to skip the next section.)

History

Software sharing is as old as software itself. In the early days of computing, manufacturers believed the economic benefits lay mainly in hardware production, so they paid little attention to software as a financial asset. Many of the customers for these early machines were scientists or engineers who were able to modify and improve the software shipped with the machines themselves. Customers sometimes distributed their patches not only back to the manufacturer, but also to the owners of similar machines. The manufacturers tolerated and even encouraged this. In their eyes, improvements to the software, from whatever source, made their machines more attractive to other potential buyers.

Although this early period resembled today's free software culture in many ways, it differed in two crucial respects. First, hardware standardization was not really on the agenda. It was a time of flourishing innovation in computer design, but given the diversity of architectures, compatibility was not a priority. As a result, software written for one machine generally did not work on another. Programmers tended to specialize in one architecture or family of architectures. Today, by contrast, they would be more likely to specialize in a programming language or family of languages, confident that their knowledge will transfer to whatever system they happen to be working with. Because a programmer's expertise tended to be specific to one kind of computer, the expertise they accumulated made that computer more attractive to them and to their colleagues. It was therefore in the manufacturer's interest for code and knowledge specific to its machine to spread as widely as possible.

Second, there was no Internet. Although there were fewer legal restrictions on sharing than there are today, the technical restrictions were greater. Compared with today, the means of moving data from one place to another were inconvenient and cumbersome. There were some small local networks, good for sharing information among employees of the same research lab or company. But barriers remained if one wanted to share with the whole world, regardless of geography. These barriers were overcome in many cases. Sometimes groups made contact with each other independently, mailing disks or tapes to one another. Sometimes the manufacturers themselves acted as central clearing houses for patches. It also helped that many of the early developers worked at universities, where people are expected to publish what they know. But the physical realities of data exchange meant there was always some latency, a delay proportional to the distance (physical or organizational) that the software had to travel. Widespread, frictionless sharing of the kind we know today was not possible then.

The Rise of Proprietary Software and Free Software

As the industry matured, several interrelated changes occurred at the same time. The jungle of hardware designs gradually gave way to a few clear winners, whether through superior technology, superior strategy, or some combination of the two. At the same time, and not entirely by coincidence, the development of so-called "high-level" programming languages meant that a program could be written once, in one language, and automatically translated (compiled) to run on different kinds of computers. The hardware manufacturers did not miss the implications. A customer could now undertake a major software engineering effort without necessarily locking itself into one particular architecture. Combined with the narrowing performance gap between different computers, as the less efficient designs were weeded out, this meant that a manufacturer that bet everything on hardware could expect shrinking margins in the near future. Raw computing power was no longer enough to create a decisive advantage, so competition shifted to software. Selling software, or at least treating it as being as important as the hardware, became the new winning strategy.

This meant that manufacturers had to start enforcing the copyrights on their code more strictly. If users kept sharing and modifying code freely among themselves, they might reproduce some of the improvements now being sold as "added value" by the supplier. Worse, shared code could fall into the hands of competitors. The irony is that all this was happening just as the Internet was beginning to appear. Just when the technical obstacles to software sharing were falling away, the transformation of the computer market made sharing economically undesirable, at least from the point of view of any single company. Suppliers tightened their grip, either by denying users access to the code that ran their machines, or by imposing non-disclosure agreements that made any sharing impossible.

Conscious Resistance

As the world of free code exchange waned, the idea of fighting back took root in the mind of at least one programmer. Richard Stallman worked at the Artificial Intelligence Lab at MIT (the Massachusetts Institute of Technology) in the 1970s and early 1980s, during what turned out to be the golden age of code sharing. The lab had a strong "hacker ethic",[3] and people were not merely encouraged to share whatever improvements they made to the system; that was what was expected of them. As Stallman later wrote:

We did not call our software "free software", because that term did not yet exist; but that is what it was. Whenever people from another university or a company wanted to port and use a program, we gladly let them. If you saw someone using an unfamiliar and interesting program, you could always ask to see the source code, so that you could read it, change it, or cannibalize parts of it to make a new program.

(from http://www.gnu.org/gnu/thegnuproject.html)

This utopian community collapsed around Stallman shortly after 1980, when the changes taking place in the rest of the industry finally caught up with the lab. A startup hired away many of the lab's programmers to develop an operating system similar to the one they had been working on, but now under an exclusive license. At the same time, the lab acquired new equipment that came with a proprietary operating system.

Stallman saw clearly where this trend was leading:

The modern computers of the era, such as the VAX or the 68020, had their own operating systems, but none of them were free software: you had to sign a nondisclosure agreement even to get an executable copy.

This meant that the first step in using a computer was to promise not to help your neighbor. A cooperating community was forbidden. The rule made by the owners of proprietary software was, "If you share with your neighbor, you are a pirate. If you want any changes, beg us to make them."

But Stallman was not an ordinary hacker, and his personality drove him to resist this trend. He did not stay on at the decimated lab, nor did he take a programming job at one of the new companies, where the fruits of his work would be locked away in a box. Instead, he resigned and started the GNU Project and the Free Software Foundation (FSF). The goal of the GNU Project[4] was to develop a completely free and open computer operating system, along with a body of application software, in which users would never be prevented from hacking or from sharing their modifications. In essence, he was setting out to recreate what had been destroyed at the AI Lab, but on a worldwide scale and without the weaknesses that had led to the destruction of the lab's spirit.

In addition to working on the new operating system, Stallman devised a license whose terms guarantee that his code will remain free forever. The GNU General Public License (GPL) is a clever piece of legal judo. It says that the code may be copied and modified without restriction, and that copies and derivative works (that is, modified versions) must be distributed under the same license as the original, with no additional restrictions. In effect, it uses copyright law to achieve the opposite of traditional copyright: instead of limiting the software's distribution, it prevents anyone, even the author, from limiting it. For Stallman, this was better than simply placing his code in the public domain. If the code were in the public domain, any given copy of it could be incorporated into a proprietary program, as has happened with code released under permissive licenses. While such incorporation would do nothing to reduce the continued availability of the original code, it would have meant that Stallman's efforts could benefit the enemy: proprietary software. The GPL can be seen as a form of protectionism for free software, because it prevents non-free software from benefiting from GPL-licensed code. The GPL and its relationship to other free software licenses are discussed in detail in Chapter 9, Licenses, Copyrights, and Patents.

With the help of many developers, some of whom shared Stallman's ideology and some of whom simply wanted to see as much free code available as possible, the GNU Project began producing free replacements for many of the important components of an operating system. Because computer hardware and software had become standardized, it was possible to use the GNU replacements on otherwise non-free systems, and many people did. The GNU text editor (Emacs) and C compiler (GCC) were especially successful, attracting large and loyal user communities not because of the ideology behind them but simply on their technical merits. By about 1990, GNU had produced most of a free operating system, except for the kernel (the part that the machine boots into, and that is responsible for managing memory, disks, and other system resources).

Unfortunately, the GNU project had chosen a kernel design that turned out to be harder to implement than expected. The resulting delay prevented the Free Software Foundation from releasing the first free operating system. The final piece was put in place instead by Linus Torvalds, a Finnish computer science student who, with the help of volunteers around the world, had built a free kernel using a more conventional design. He named it Linux, and when it was combined with the existing GNU programs, the result was a completely free operating system. For the first time, you could boot your computer and do work without using a single piece of proprietary software.[5]

Most of the software in this new operating system was not produced by the GNU project. In fact, GNU was not even the only group working on a free operating system (for example, the code that would become NetBSD and FreeBSD was already under development at the time). The importance of the Free Software Foundation lay not only in the code its members wrote, but also in their political message. By presenting free software as a cause in its own right rather than a convenience, they gave it a political dimension that programmers found hard to ignore. Even people who disagreed with the FSF had to engage with the issue, if only to stake out a different position. The FSF's message was effective because it was tied to the code, through the GPL and other texts. As the code spread everywhere, so did the message.

Accidental Resistance

Although the nascent free software scene was very dynamic, few of its activities were as explicitly ideological as Stallman's GNU project. One of the most important was the Berkeley Software Distribution (BSD), a gradual reimplementation of the Unix operating system (which until the late 1970s had been a loosely proprietary research project at AT&T) by programmers at the University of California at Berkeley. The BSD group made no overt political statement about programmers needing to band together and share, but they practiced the idea with flair and enthusiasm. They coordinated a large cooperative development effort in which the Unix command-line utilities, the code libraries, and eventually the operating system kernel itself were rewritten from scratch, mostly by volunteers. The BSD project became a prime example of non-ideological free software development, and it also served as a training ground for many developers who went on to remain active in the open source world.

Another crucible of cooperative development was the X Window System, a free, network-transparent graphical computing environment, developed at MIT in the mid-1980s in partnership with hardware vendors who shared an interest in being able to offer their customers a windowing system. Far from opposing proprietary software, the X license deliberately allowed proprietary extensions on top of the free core, because each member of the consortium wanted the chance to enhance the default X distribution and thereby gain a competitive advantage over the others. X Windows[6] itself was free software, but mainly as a way to level the playing field between competitors, not out of any desire to end the dominance of proprietary software. Yet another example, predating the GNU project by a few years, was Donald Knuth's TeX, a free typesetting system for producing high-quality documents. It was released under a license that allowed anyone to modify and distribute the code, but not to call the result "TeX" unless it passed a strict set of compatibility tests. This is an example of the "trademark-protecting" class of free licenses, discussed further in Chapter 9, Licenses, Copyrights, and Patents. Knuth took no stand one way or the other on the question of free versus proprietary software. He simply needed a better typesetting system to complete his real goal, a book on computer programming, and saw no reason not to release his system to the world once it was ready.

Without listing every project and every license, it is safe to say that by the late 1980s there was a great deal of free software available under a wide variety of licenses. The diversity of licenses reflected an equal diversity of motivations. Even some of the programmers who chose the GNU GPL were far less ideologically driven than the GNU project itself. And although they enjoyed working on free software, many developers did not consider proprietary software a social evil. A few people felt a moral imperative to rid the world of "software hoarding" (Stallman's term for proprietary software), but others were motivated more by technical excitement, by the pleasure of working with like-minded collaborators, or even by the simple human desire for glory. Yet on the whole, these disparate motivations did not interact destructively. This is partly because software, unlike other creative works such as prose or the visual arts, must pass semi-objective tests in order to be considered successful: it must run, and be reasonably free of bugs. This gives project participants common ground, a reason and a framework for working together without worrying much about qualifications other than technical ones.

Developers had another reason to stick together: the free software world turned out to be producing very high-quality code. In some cases it was demonstrably technically superior to the nearest non-free alternative; in others it was at least comparable; and of course it always cost less. While only a few people might have been motivated to use free software on strictly philosophical grounds, many more were drawn to it by the better results. Moreover, among the users there was always some percentage willing to donate their time and skills to help maintain and improve the software.

This tendency to produce good code was certainly not universal, but it was happening more and more often in free software projects around the world. Businesses that depended on software gradually began to take notice. Many of them discovered that they were already using free software in their day-to-day operations without knowing it (upper management is not always aware of the choices its IT department makes). Companies began to get more involved in free software projects, contributing time and equipment, and sometimes even funding development directly. In the best cases, such investments could pay for themselves many times over. The sponsor pays only a small number of expert programmers to work on the project full time, but reaps the benefits of everyone's contributions, including work from unpaid volunteers and from developers paid by other companies.

"Free" Versus "Open Source"

As the corporate world paid more and more attention to free software, programmers were faced with new problems of presentation. One of them was the word "free" itself. On first hearing the term "free software", many people mistakenly thought it simply meant "zero-cost software." While it is true that all free software costs nothing,[7] not all zero-cost software is free. For example, during the browser wars of the 1990s, as they fought for market share, Netscape and Microsoft both gave away their web browsers at no charge. Neither browser was free in the "free software" sense. You could not get the source code, and even if you could, you had no right to modify or redistribute it.[8] All you could do was download an executable and run it. The browsers were no more free than shrink-wrapped software bought in a store; at most, they were offered at a lower price.

The confusion over the word "free" is due entirely to an unfortunate ambiguity in English. Most other languages distinguish between price and liberty (the difference between gratis and libre is obvious to speakers of Romance languages, for example). But because English is the de facto common language of the Internet, a problem specific to English becomes, to some degree, everyone's problem. The misunderstanding around the word "free" was so widespread that free software programmers eventually came up with a stock answer: "It's free as in freedom; think free speech, not free beer." Still, having to explain it over and over is tiring. Many programmers felt, with some justification, that the ambiguity of the word "free" made it harder for the public to understand this kind of software.

But the problem went deeper than that. The word "free" carried an undeniable moral connotation: if freedom was an end in itself, it did not matter whether free software also happened to be better, or more profitable for certain businesses in certain circumstances. Those were merely welcome side effects of a motive that, at bottom, was neither technical nor commercial, but moral. Furthermore, the "free as in freedom" position imposed a glaring contradiction on companies that wanted to support certain free programs in one part of their business while continuing to sell proprietary software in others.

These dilemmes struck a community that was already headed for an identity crisis. The programmers who write free software have never agreed on the ultimate goal, if any, of the free software movement. Even saying that opinions range from one extreme to the other would be misleading, because it implies a linear spectrum where there is in fact a multidimensional scattering. Still, two broad schools of thought can be distinguished, if you are willing to ignore the subtleties for the moment. One group shares Stallman's view that the freedom to share and modify is what matters most, and that consequently, if you stop talking about freedom, you have abandoned the core issue. For others, the software itself is what counts, and they are uncomfortable declaring proprietary software inherently bad. Some, but not all, free software programmers believe that the author (or the employer, in the case of paid work) should have the right to control the terms of distribution, and that no moral judgment should be attached to the choice of any particular terms.

For a long time, nobody really had to pay attention to these differences or how they interacted, but free software's growing success in the business world made the question unavoidable. In 1998, the term open source was coined as an alternative to "free" by a coalition of programmers who eventually became The Open Source Initiative (OSI).[9] The OSI felt not only that the term "free software" was confusing, but that the word "free" was just one symptom of a more general problem: the movement needed a marketing strategy to pitch it to the corporate world, and talk about the moral and social benefits of sharing would never win over corporate boardrooms. In their own words:

The Open Source Initiative is a marketing program for free software. It's a pitch for "free software" on solid pragmatic grounds rather than ideological tub-thumping. The winning substance has not changed, the losing attitude and symbolism have. ...

The case that needs to be made to most techies isn't about the concept of open source, but the name. Why not call it, as we traditionally have, free software?

One direct reason is that the term "free software" is easily misunderstood in ways that lead to conflict. ...

But the real reason for the re-labeling is a marketing one. We're trying to pitch our concept to the corporate world now. We have a winning product, but our positioning, in the past, has been awful. The term "free software" has been misunderstood by business persons, who mistake the desire to share with anti-commercialism, or worse, theft.

Mainstream corporate decision-makers will never buy "free software." But if we take the very same ideas and the same free-software licenses, and change the label to "open source"? That, they'll buy. Some hackers find this hard to believe, but that's because they're techies who think in concrete, substantial terms and don't understand how important image is when you're selling something.

In marketing, appearance is reality. The appearance that we're willing to climb down off the barricades and work with the corporate world counts for as much as the reality of our behavior, our convictions, and our software.

(From http://www.opensource.org/advocacy/faq.php and http://www.opensource.org/advocacy/case_for_hackers.php#marketing)

The tips of several icebergs of controversy are visible in that text. It refers to "our convictions", but cleverly avoids spelling out exactly what those convictions are. For some, it might be the conviction that code developed through an open process will be better code; for others, it might be the conviction that all information should be shared. The word "theft" refers (presumably) to illegal copying, a usage many people object to on the grounds that it is not theft if nobody is deprived of the item. And there is the hint that free software may be wrongly equated with anti-commercialism, while the question of whether that accusation has any basis in fact is carefully left unexamined.

None of this means that the OSI's web site is incomplete or misleading. It is not. Rather, it showcases exactly what, according to the OSI, the free software movement had been missing: good marketing, where "good" means "viable in the business world." The Open Source Initiative gave many people exactly what they had been looking for: a vocabulary for talking about free software as a development methodology and a business strategy, rather than as an ideological crusade.

The appearance of the Open Source Initiative changed the landscape of free software. It named a dichotomy that had long gone unacknowledged, and in doing so forced the movement to admit that it had internal as well as external politics. Today, both sides have had to find common ground, since most projects include programmers from both camps, as well as participants who fit clearly into neither. This does not mean people never talk about ideological motivations; lapses in the traditional "hacker ethic" are sometimes called out, for example. But it is rare for a developer from either camp to openly question the underlying motivations of their colleagues. The contribution trumps the contributor. If someone writes good code, nobody asks whether they do it for moral reasons, because their employer pays them to, to pad their résumé, or for any other reason. Contributions are evaluated and critiqued on technical grounds. Even explicitly political organizations such as the Debian project, whose goal is to offer a 100% free (that is, "free as in freedom") computing environment, are fairly relaxed about integrating with non-free code and cooperating with programmers who do not share exactly the same goals.

The Situation Today

When you run a free software project, you will not need to discuss weighty philosophical questions every day. Programmers will not insist on winning every participant over to their views on every subject (those who do insist on this quickly find themselves unable to work on any project). But you do need to know that the "free" versus "open source" question exists, partly to avoid saying things that might alienate some participants, and partly because understanding developers' motivations is the best way, and in some sense the only way, to manage a project.

Free software is a culture by choice. To operate successfully in it, you must understand why people chose to participate in the first place. Coercive techniques do not work. If people are unhappy in a project, they will simply wander off and join another one. Even among volunteer communities, free software is remarkable for how light the investment is. Most of the people involved have never met the other participants face to face, and simply donate a bit of their time whenever they feel like it. The usual channels through which humans bond and form lasting groups are reduced to their narrowest form: written words carried over electronic wires. Because of this, a cohesive and dedicated group can take a long time to form. Conversely, a project can easily lose a potential volunteer in the first five minutes of acquaintance. If a project does not make a good first impression, newcomers rarely give it a second chance.

The transience, or rather the potential transience, of relationships is perhaps the most daunting obstacle facing a new project. What will persuade all these people to stick together long enough to produce something useful? The answer to that question is complex enough to occupy the rest of this book, but if it had to be expressed in a single sentence, it would be this:

People should feel that their connection to a project, and their influence over it, is directly proportional to their contributions.

No class of developers, or potential developers, should ever feel sidelined or discriminated against for non-technical reasons. Projects with corporate sponsorship and/or salaried developers need to be especially careful about this; Chapter 5, Money, discusses the subject in more detail. Of course, this does not mean that if you have no corporate funding, you have nothing to worry about. Money is only one of many factors that can affect a project's success. There are also questions of which language to choose, which license, which development process, exactly what kind of infrastructure to set up, how to publicize the project's launch effectively, and much more. Getting a project started on the right foot is the subject of the next chapter.



[2] SourceForge.net, a popular hosting site, had 79,225 registered projects as of mid-April 2004. This is of course nowhere near the total number of free software projects on the Internet; it is just the number that chose to use SourceForge.

[3] Stallman utilise le mot « hacker » pour désigner « une personne qui aime programmer et qui adore faire le malin avec ça » mais pas « quelqu'un qui s'introduit dans les ordinateurs » comme le voudrait le nouveau sens.

[4] GNU est l'acronyme de « GNU is Not Unix » et « GNU » dans cette extension signifie... la même chose.

[5] Techniquement, Linux n'était pas le premier. Un système d'exploitation libre pour les ordinateurs compatibles IBM, appelé 386BSD, est sorti peu avant Linux. Cependant, il était bien plus compliqué de faire marcher 386BSD. Linux a créé des remous non seulement parce qu'il était libre mais aussi parce qu'il avait vraiment une bonne chance de faire marcher l'ordinateur sur lequel vous l'aviez installé.

[6] Ils préfèrent qu'il soit appelé le « Système X Window », mais en pratique on l'appelle « X Window » parce que les trois mots sont trop encombrants.

[7] On peut faire payer quelque chose en échange de copies d'un logiciel libre, mais comme on ne peut pas empêcher l'acheteur de le redistribuer gratuitement, le prix tend immédiatement vers zéro.

[8] Finalement le code source de Netscape Navigator a été publié sous une licence libre, en 1998, et deviendra la base du navigateur Web Mozilla. Voir sur http://www.mozilla.org/.

[9] La page Web de l'OSI est : http://www.opensource.org/.

Chapter 2. Genesis of a project

The way free software projects get started was described by Eric Raymond in his seminal essay The Cathedral and the Bazaar. He writes:

Every good work of software starts by scratching a developer's personal itch.

(see http://www.linux-france.org/article/these/cathedrale-bazar/cathedrale-bazar_monoblock.html )

Note that Raymond never said that a free project only comes into being when a developer has an itch. Rather, he says that good software gets written when its originator has a personal interest in seeing the problem solved. The corollary of this principle, as applied to free software, is that a personal need is the most common motivation for starting a project.

This rule still holds, though less strongly than in 1997, when Raymond formulated it. A growing phenomenon today is the development from scratch, by large organizations (including for-profit ones), of big, centrally managed free software projects. Code written by lone developers to meet a specific need is still a very widespread model, but it is no longer the only one.

Raymond's point nonetheless remains very insightful. The essential condition for a piece of software's success is that its creators have a direct interest in it because they use it themselves. If the software does not meet the original need, the person or organization developing it will feel the frustration in their daily work. For example, the OpenAdapter project (http://www.openadapter.org/), started by the investment bank Dresdner Kleinwort Wasserstein to develop a framework for integrating heterogeneous financial information systems, could hardly have come from an individual's itch. It is an itch at the institutional level. But that itch comes directly from the experience of the institution and its partners, so if the project relieves it, they will be the first to notice. This way of working produces the right software because the exchanges between users and producers feed a virtuous circle. The program is written first and foremost by them and for them, so it can address their problem. It was written to solve a particular problem and then shared with others, as if the problem were a disease and the program its antidote, distributed in order to wipe out the epidemic.

This chapter describes how to introduce a new free software project to the world, but most of its recommendations would apply just as well to distributing a new medicine. The goals are very similar: you want to make clear what the medicine treats and how to take it, and make sure it ends up in the right hands. But with software, you also want to attract some of the patients to join the research effort and improve the active ingredient for everyone's benefit.

Producing free software is a two-sided task. The software needs to win new users, but also new developers. These two goals are not necessarily in conflict, but the difference between them complicates the way the project is first presented. Some information is useful to both audiences, some is specific to one of them. Either way, both kinds of information should follow the principle of graduated presentation: the level of detail offered at each stage should closely match the time and effort the reader has put in so far. Each increase in effort should bring a proportional reward. When the two fall out of step, people quickly lose faith in the project and stop investing in it.

A corollary of this principle is that appearances matter. Developers in particular dislike this idea. Their attachment to substance over form is often worn as a badge of professionalism. It is no accident that so many developers show a real antipathy toward marketing and public relations, nor that professional graphic designers are often horrified by what developers produce when left to their own devices.

This is unfortunate, because there are situations where form is substance, and a product's presentation is one of them. For example, the look of a project's website is the first thing a visitor takes in. That impression registers before the content itself, well before any text is read or any link is clicked. Unfair as it may seem, people cannot help forming an opinion at first glance. A site's appearance tells visitors how much care went into organizing its presentation. Humans have finely tuned antennae for detecting the effort that went into something. Most of us can tell at a glance whether a site was merely thrown together or carefully thought out. The site is the first signal the project sends, and the impression it gives will carry over, by association, to the rest of the project.

So although this chapter focuses on the content needed to launch a project, keep in mind that appearance matters too. Because a website has to serve two audiences, users and developers, special attention must be paid to the clarity and relevance of its message. Web design is not the subject of this book, but one principle is worth remembering, especially when a site serves several distinct audiences: visitors should know where a link leads before they click it. For example, it should be obvious just from looking at the links to the user documentation that they lead to the user documentation, and not, say, to the developers' internal documentation. Part of a project's role is to provide information, but also reassurance. Simply finding standard information in the expected place reassures users, and developers deciding whether to get involved. It signals that the project takes its responsibilities seriously, anticipates questions, and has made an effort to answer them while demanding as little as possible of the reader. By creating this atmosphere of preparedness, the project sends a clear message: "Your time will not be wasted if you get involved", which is exactly what people want to hear.

Look around first

Before starting a free software project, there is one ground rule:

Always check whether an existing project already meets your need. It is possible, even likely, that someone has already tackled your problem. If so, and the code has been released under a free license, it would be unwise to reinvent the wheel. There are of course exceptions: if the project is primarily educational, or if it addresses a niche so narrow that there is no chance anyone has dealt with it before you. But in general, checking costs nothing and is well worth the trouble. If the usual search engines turn up nothing, look at free software news sites (more on these later) or at the Free Software Foundation (FSF) directory: http://directory.fsf.org/.

Even if you don't find exactly what you had in mind, you may find a similar project where collaborating would be more fruitful than starting from scratch on your own.

Packaging review

So you have looked around, found nothing that fits your need, and decided to start a new project.

What now?

The hardest part of launching a free project is turning a private vision into a public one. Even if you or your organization understand the need perfectly, expressing it in a way the rest of the world can understand can be a lot of work. It is nonetheless essential to take the time to do it. The project's founders must set its goals, which means setting its limits (both the features it will provide and those it will not) and putting its mission down on paper. This step usually goes smoothly, although it can sometimes reveal hidden assumptions or even disagreements about the nature of the project, which is a good thing: it is better to resolve differences early. The next step is to package the project for the general public, and that turns out to be a huge amount of work.

What makes this work so laborious is that it consists of formalizing and documenting things that everyone already knows, "everyone" meaning the people currently involved in the project. As a result, they get no immediate benefit from it. They have no need for a README file describing the project, nor for a design document or a user manual. They have no need to organize the source tree according to the (unwritten but universal) conventions of free software: any layout will do, since it suits them, they are used to it, and they know how to run the code anyway. Likewise, it does no harm, for them, that the overall architectural principles remain undocumented: they already know them.

Newcomers, on the other hand, badly need these documents, but fortunately not all at once. You do not have to provide every conceivable resource before launching the project. In a perfect world, perhaps, every new free software project would start out with a flawless design document, a comprehensive user manual (noting which features are still to be implemented and which are already available), beautiful source code packaged portably to run on every platform, and so on. In reality, providing all of this would be unacceptably costly, and in any case these are tasks that volunteers can reasonably take on once the project is under way.

What is unavoidable, however, is investing enough in the project's presentation to get newcomers past their fear of the unknown. Think of this investment as the first stage of a bootstrapping process, supplying the project with the minimum quantum of activation energy. I have heard this concept called hacktivation energy: the amount of energy a newcomer must put in before they in turn produce something useful for the project. The hacktivation energy threshold must be as low as possible. Your first task is to bring it down as far as you can, to encourage people to get involved.

Each of the following subsections describes an important aspect of starting a new project. They are presented roughly in the order in which visitors will encounter them, although the order in which you put them in place may differ. Treat them as a checklist: when starting a new project, check that each item has been handled, or at least that you understand the consequences if it has not.

Choose a good name

Put yourself in the shoes of someone hearing about your project for the first time, perhaps after stumbling on it at the end of a long search for a solution to their problem. Their first contact with the project will be its name.

A good name will not by itself make a project succeed, nor will a bad name make it fail (a truly terrible one might, but we can assume no one deliberately sets out to sink their own project). However, a bad name can slow a project's adoption, either because people don't take it seriously or because it is hard to remember.

A good name:

  • Gives an immediate idea of what the project does, or is at least obviously related to it, so that someone who knows the project and its domain will immediately recall its name.

  • Is easy to remember. English has clearly become the de facto standard language of communication on the Internet, so "easy to remember" really means "easy to remember for someone who reads English". Names based on puns that depend on local pronunciation, for example, will be completely opaque to most English-speaking readers. If the pun is particularly striking or delightful, it may still be worth keeping; just bear in mind that many people reading the name will not hear in their heads what a native speaker would hear.

  • Is distinct from other projects' names and does not infringe on any trademarks. This is just good manners and good legal sense. Creating confusion is a bad idea; it is hard enough to keep track of everything already available on the Internet without giving different things the same name. The resources mentioned earlier in the section titled "Look around first" are useful for checking whether another project already uses the name you have in mind. Trademark searches are also available at http://www.nameprotect.org/ and http://www.uspto.gov/.

  • Is preferably available as a domain name in .com, .net, and .org. It is a good idea to register one of them (probably the .org) as the project's official site, and the other two as well, to keep third parties from cybersquatting on the project's reputation. Even if you plan to host the project on a forge (see the section titled "Hosting on a forge"), you can register the project's domain names and redirect them to the forge site. This helps users remember the URL.

Have a clear mission statement

Once they have found the website, the next thing visitors usually do is look for a short description of the project, or its mission, to decide (within thirty seconds or so) whether they want to learn more. This summary should be prominently displayed on the site's front page, preferably right under the project's name.

The description should be concrete, concise, and above all short. Here is a good example:

To create, as a community, the leading international office suite that will run on all major platforms and provide access to all its functionality and data through a component-based API and an XML-based file format.

In just a few words, the main points come across, relying heavily on what the reader already knows. By saying "as a community", they signal that no private organization will dominate development; "international" means that the software will let users work in different languages and with localized data; "all major platforms" adds that it will be portable to Unix, Macintosh, and Windows. The rest signals that interoperable architectures and open formats are an important part of the project's guiding principles. The statement does not say explicitly that the project is an alternative to Microsoft Office, but most visitors will read between the lines. And although the statement may seem a little wordy at first glance, it is actually quite condensed: the words "office suite" are concrete to anyone familiar with that kind of software, and the statement relies on the reader's presumed knowledge (probably as an MS Office user) to keep the message short.

A mission statement depends partly on who is writing it, not just on the software itself. In the OpenOffice.org example above, it makes sense to use the words "as a community", because the project was initially driven mainly by Sun Microsystems. By describing the project this way, Sun shows that it is aware of fears that it might dominate the development process and the roadmap. By openly acknowledging the problem, it makes much of the problem itself go away. Projects that are not backed by a single organization, on the other hand, probably do not need to stress this point: community development is the norm in this field, so there is no need to list it among the guiding principles.

State that the project is free

Visitors who are still interested after reading the mission statement will next want more detail, perhaps the user or developer documentation, and may eventually want to download something. But first they need to be sure that it is free software.

The front page must state unambiguously that the project is free. This may seem obvious, but you would be surprised how many projects forget to mention it. I have seen projects whose front page not only failed to mention which license the software was distributed under, but did not even say that it was free. Sometimes this crucial information is relegated to the Downloads page or the developer page, or to some other place more than one click away. In extreme cases, the license is not given at all, and the only way to find it is to download the software and look inside.

Don't make this mistake; it can cost you potential developers and users. State clearly, right below the mission statement, that the project is "free software" or "open source", and give the exact license. A quick guide to choosing a license is given in the section titled "Choosing a license and applying it" later in this chapter, and licensing issues are discussed in detail in Chapter 9, Licenses, Copyrights, and Patents.

At this point, our visitor has decided, probably in under a minute, whether they are interested enough to spend, say, five more minutes looking into the project. The following sections describe what they should find during those five extra minutes.

Features and requirements list

This is a short list of the software's main features (if some features are not working yet, you can still list them, as long as you mark them "planned" or "in progress"), along with the technical environment needed to run it. Think of this features/requirements list as what you would tell someone who asked for a quick summary of the software. It is a natural extension of the mission statement. For example, given this mission statement:

To create a full-text indexer and search engine with a rich API, allowing programmers to provide search services over large collections of text files.

the features/requirements list would give the details that clarify that mission:

Features:

  • Searches plain text, HTML, and XML files

  • Word or phrase searching

  • (planned) Fuzzy matching

  • (planned) Incremental updating of indexes

  • (planned) Indexing of remote websites

Requirements:

  • Python 2.2 or higher

  • Enough disk space to hold the indexes (roughly twice the size of the indexed data)

With this information, visitors can quickly decide whether the software suits them. They can also decide whether to offer their services as developers.

Development status

People always like to know how a project is doing. For new projects, they want to know the gap between the project's promises and its current reality. For mature projects, they are interested in maintenance activity: how often new releases come out, how quickly bug reports get a response, and so on.

To answer these questions, it is a good idea to provide a development status page listing the project's near-term goals and needs (for example, it may be looking for a developer with a particular kind of expertise). The page should also give a history of past releases, with feature lists, so that visitors can get an idea of what the project calls "progress" and how quickly it actually makes it.

Don't be afraid of looking unfinished, and don't be tempted to make the status look better than it is. Everyone knows that software evolves in stages; there is no shame in saying "This is alpha software with known bugs. It runs, and works most of the time, but use it at your own risk." Such language will not scare away the kind of developers you need at this stage. On the user side, nothing is worse than attracting users before the software is ready for them. A reputation for instability is very hard to shake once acquired. Modesty pays off in the long run: it is always better for software to turn out more stable than expected rather than less, and pleasant surprises generate word of mouth.

Downloads

The software should be downloadable as source code in a standard format. When a project is just starting, binary (executable) packages are not necessary, unless the project has so many dependencies or build requirements that getting it running takes considerable effort. (But in that case, the project will have trouble recruiting developers anyway!)

The installation procedure should be as simple, standard, and undemanding as possible. If you were trying to eradicate a disease, you would not distribute the medicine in a form that required a non-standard syringe to administer. Likewise, software should follow standard build and installation conventions; the further it deviates from them, the more users and developers will give up in confusion.

This may sound obvious, but many projects don't bother to standardize their installation procedure until very late, telling themselves they can do it at any time: "We'll sort out packaging when the code is closer to finished." What they don't realize is that by putting off this tedious work, they actually lengthen the coding phase, because they discourage developers who would otherwise have contributed. More insidiously, they don't know they are losing these developers, because it happens as an accumulation of non-events: someone visits the project's website, downloads the software, tries to build it, fails, and leaves. Who will know it happened, apart from that person? No one in the project will realize that this motivation and skill went to waste.

Boring work with a good return on investment should always be done early, to lower the barrier to entry to the project as much as possible.

When you publish a downloadable release, it is vital to give it a unique version number, so that people can compare two releases and tell immediately which came first. Version numbering is discussed in the section titled "Release Numbering", and the details of standardizing build and installation procedures are covered in the section titled "Packaging" and in Chapter 7, Packaging, Releasing, and Daily Development.
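As an illustration only, here is a minimal Python sketch of why a version number has to be comparable by machine as well as by eye. It assumes a purely numeric dotted scheme such as "1.10.0" (no suffixes like "-alpha"); the function names `parse_version` and `is_older` are hypothetical and do not come from any project or library. Comparing the numeric components, rather than the raw strings, is what puts "1.10.0" after "1.9.1".

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a purely numeric dotted version such as "1.10.2" into integers.

    Assumption: every component is a non-negative integer; suffixes such as
    "-alpha" or "rc1" are not handled by this sketch.
    """
    return tuple(int(part) for part in version.split("."))


def is_older(a: str, b: str) -> bool:
    """Return True if release a precedes release b."""
    # Tuples compare element by element, so (1, 9, 1) < (1, 10, 0).
    return parse_version(a) < parse_version(b)


releases: List[str] = ["1.10.0", "1.2.3", "1.9.1", "0.9"]

# Plain string sorting is lexicographic and puts "1.10.0" before "1.2.3".
print(sorted(releases))                     # ['0.9', '1.10.0', '1.2.3', '1.9.1']
# Sorting on the parsed numbers gives the actual release order.
print(sorted(releases, key=parse_version))  # ['0.9', '1.2.3', '1.9.1', '1.10.0']
```

A real project would more likely follow an established scheme and its existing tooling (Python projects, for instance, follow PEP 440); the point here is only that each release needs one unambiguous, ordered identifier.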

Software configuration management and ticket tracking systems

Downloading source distributions is fine for simply installing and using the software, but it is not enough for debugging or adding new features. Daily source snapshots help, but they are not frequent enough to foster a developer community. People need real-time access to the sources in their current state, and the way to give them that is to use a software configuration management (SCM) system. Providing anonymous access to the sources in SCM is a sign, to users and developers alike, that the project is making a real effort to give people what they need to participate. Even if you cannot offer SCM access right away, indicate that you plan to do so soon. SCM infrastructure is covered in detail in the section titled "Version control software" in Chapter 3, Technical infrastructure.

The same goes for ticket tracking. Its importance lies not only in its usefulness to developers, but also in what it tells people watching the project. For many, a public bug database is one of the strongest signs that a project is serious. Moreover, the more tickets the database holds, the better the project tends to look. This may seem counterintuitive, but remember that the number of bugs recorded depends on three things: the absolute number of bugs in the software, the number of people using it, and how easily those users can report the problems they run into. Of these three factors, the last two are the most significant. Any software of a certain size and complexity contains an unknown number of bugs waiting to be discovered. The real question is how well the project records and prioritizes those tickets. A project with a large, well-maintained bug database (bugs are acted on promptly, duplicates are identified, etc.) makes a better impression than one with no ticket tracking at all, or with an almost empty database.

Of course, your project will start out with few bugs, and there is nothing to worry about there. If the status page emphasizes how young the project is, and people looking at the bug database see that most of the tickets were filed recently, they can conclude that the project has a healthy rate of ticket filing and will not be unduly alarmed by the low absolute number of tickets.

Note that ticket trackers are used to track not only bugs, but also feature requests, documentation changes, and much more. We won't go into ticket tracking any further here, since it is covered in the section titled "Bug Tracker" in Chapter 3, Technical infrastructure. From the point of view of presenting the project, what matters is having a ticket tracker and making sure that fact is visible from the website's front page.

Communications channels

Visitors like to know how to reach the people responsible for the project. Provide the addresses of mailing lists, chat rooms, IRC channels, and any other forums where the people involved in the project can be reached. Make it clear that you and the project's other authors are subscribed to these mailing lists, so that users can see there is an easy way to reach the developers. Note that being on the lists does not commit you to answering every question or implementing every feature request. In the long run, most users will never use these channels anyway, but they will be reassured to know that they could if they needed to.

In a project's early days, there is no need to separate user and developer forums. It is much better for the two groups to talk to each other in a single "room". Among early adopters, the line between developer and user is blurry. The ratio of developers to users is usually much higher at this stage than later in the project. Although you cannot count every user as someone who wants to work on your project, you can assume that a good share of them are interested in following development discussions as a way to get a feel for the project's direction.

Since this chapter is only concerned with getting a project started, it is enough to say that these communications channels need to exist. Later, in the section titled "Handling Growth" in Chapter 6, Communications, we will look at where and how to set up such forums, how to manage or moderate them, and how to separate user and developer forums when the time comes, without creating an unbridgeable gap.

Developer guidelines

Anyone thinking of contributing to the project will start by looking for the developer guidelines. These guidelines are more social than technical: they explain how developers interact with one another and with users, and ultimately how things get done.

This topic is covered in detail in the section titled "Writing It All Down" in Chapter 4, Social and Political Infrastructure, but the basic contents of developer guidelines are:

  • pointers to forums for interacting with other developers

  • instructions on how to report bugs and submit patches

  • some indication of how the project is run: is it a benevolent dictatorship? A democracy? Something else?

By the way, the word "dictatorship" is not meant pejoratively here. It is perfectly acceptable to run a tyranny in which one particular developer has veto power over every change. Many projects work successfully this way. What matters is that the project says so clearly and openly. A tyranny pretending to be a democracy will drive people away; a dictatorship that presents itself as one will work fine as long as the dictator is competent and trusted by the team.

See http://subversion.apache.org/docs/community-guide/ for an example of particularly thorough developer guidelines, or http://www.openoffice.org/dev_docs/guidelines.html for broader guidelines that focus more on governance and the spirit of the project and less on technical matters.

The separate topic of providing a programmer's introduction to the software is discussed in the section called “Developer Documentation” later in this chapter.

Documentation

Documentation is essential. There must be something for people to read, even if it is rudimentary and incomplete. This is another task in the "drudgery" category mentioned earlier, and it is often the first place where an open source project falls short. Writing a mission statement and a feature list, choosing a license, summarizing development status: these are all fairly small tasks that can usually be done once and for all. Documentation, on the other hand, is never really finished, which may be one reason people are sometimes reluctant to write it at all.

The most insidious problem is that documentation's usefulness to those who write it is the exact reverse of its usefulness to those who read it. The most important documentation for initial users covers the most basic things: how to install the software quickly, an overview of how it works, and perhaps guides to the most common tasks. Yet these are precisely the things the authors know so well that it is very hard for them to see things from the user's point of view, and then to laboriously spell out steps that seem so obvious to them as to be not worth mentioning.

There is no ready-made solution to this problem. Someone simply has to sit down and start writing something, and then have ordinary users read it to test its quality. Use a simple, easy-to-edit format such as HTML, plain text, Texinfo or some variant of XML: something lightweight that allows quick, incremental updates. This matters not only for the productivity of the original author, but also, in the long run, for the people who join the project later and want to maintain the documentation.

A simple way to make sure the initial documentation gets written is to limit its scope in advance. A good way to set that scope is to make sure the documentation meets at least the following criteria:

  • Tell the reader clearly how much technical expertise they are expected to have.

  • Describe clearly and thoroughly how to install the software, and early in the manual provide a diagnostic self-test or a simple command that confirms everything is set up correctly. Getting-started documentation is in some ways more important than the complete user guide. The more effort someone has already invested in installing and starting the software, the more persistent they will be in getting the advanced, less well documented features to work. When people give up, they give up early, so support efforts should be concentrated on the first steps.

  • Give the description of common tasks a tutorial style. Of course, several examples for several tasks would be better, but your time is probably limited. Pick one task and walk through it thoroughly. Once people see that the software can be used for one task, they will start exploring the others on their own, and, if you are lucky, start adding to the documentation themselves. Which brings us to the next point...

  • Point out the places where the documentation is known to be incomplete. By showing readers that you are aware of the gaps, you align yourself with their point of view. Your empathy reassures users that they will not have to struggle to convince the project that this work matters. These notes are not a commitment to fill the gaps by any particular date; think of them instead as open requests for volunteers.

The last point is fundamental, and it applies to the project as a whole, not just to the documentation. A detailed accounting of known problems is the norm in the open source world. There is no need to exaggerate the project's shortcomings; just identify them scrupulously and dispassionately when the context calls for it (for example in the documentation, in the bug tracker or on a mailing list). No one will take this as defeatism, nor as a commitment to fix the problems by a deadline, unless the project explicitly says so. Since every user will discover the problems for themselves anyway, it is much better for them to be psychologically prepared: that way the project comes across as well managed.
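To make the diagnostic self-test from the installation point concrete, here is a minimal sketch of such a command in Python. Everything it checks is a placeholder: the minimum Python version and the list of external tools (`MIN_PYTHON`, `REQUIRED_TOOLS`) are hypothetical and would be replaced with whatever your own software actually depends on.

```python
import shutil
import sys

# Hypothetical requirements: replace with what your software really needs.
MIN_PYTHON = (3, 8)
REQUIRED_TOOLS = ("git",)


def run_checks(min_python=MIN_PYTHON, tools=REQUIRED_TOOLS):
    """Return a list of (description, ok) pairs, one per check."""
    results = []
    version_ok = sys.version_info[:2] >= tuple(min_python)
    results.append((f"Python >= {min_python[0]}.{min_python[1]}", version_ok))
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        results.append((f"'{tool}' found on PATH", shutil.which(tool) is not None))
    return results


def report(results):
    """Print one line per check; return 0 if everything passed, else 1."""
    for description, ok in results:
        print(("ok       " if ok else "MISSING  ") + description)
    return 0 if all(ok for _, ok in results) else 1
```

Wired to a command-line entry point, for example `sys.exit(report(run_checks()))`, this gives users one command to run right after installation, and the non-zero exit status lets scripts detect a broken setup too.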

Availability of Documentation

Documentation should be available from two places: online (directly from the website), and in the downloadable distribution of the software (see the section called “Packaging” in Chapter 7, Packaging, Releasing, and Daily Development). It needs to be online, in browsable form, because people often read the documentation before downloading the software, as a way of deciding whether to use it at all. But it should also accompany the software, on the principle that downloading it should provide, locally, everything needed to use it later.

For online documentation, make sure there is a link that displays the entire documentation on a single HTML page (put a note such as "monolithic", "all-in-one" or "single page" next to the link, so people are not surprised by how long it takes to load). This is especially useful for searching for a specific word or phrase across the whole documentation. People often know what they are looking for but not which section to look in. For such users, nothing is more frustrating than landing on a table-of-contents page, then an introduction page, then an installation page, and so on. When the documentation is split up like that, their browser's search function is useless. The page-by-page style is fine for those who already know which section they want, or who read the documentation from start to finish in order, but that is not the most common way it is used. Far more often, someone already familiar with the software comes back to search for a specific word or phrase, and not giving them a way to do so only makes their lives harder.
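One way to offer the all-in-one page without maintaining it by hand is to generate it from the same per-section files that make up the multi-page version. The sketch below assumes the documentation is available as a list of (title, HTML fragment) pairs in reading order; the function names and the anchor scheme are hypothetical choices, not part of any existing tool.

```python
import html
import re


def slugify(title):
    """Turn a section title into an anchor id such as 'getting-started'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "section"


def build_single_page(doc_title, sections):
    """Concatenate (title, body_html) pairs into one searchable HTML page."""
    anchors, used = [], set()
    for title, _ in sections:
        base = anchor = slugify(title)
        n = 2
        while anchor in used:  # keep ids unique if two titles collide
            anchor = f"{base}-{n}"
            n += 1
        used.add(anchor)
        anchors.append(anchor)

    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(doc_title)} (single page)</title>",
        "</head><body>",
        f"<h1>{html.escape(doc_title)}</h1>",
        "<ul>",  # short table of contents at the top of the page
    ]
    for anchor, (title, _) in zip(anchors, sections):
        parts.append(f'<li><a href="#{anchor}">{html.escape(title)}</a></li>')
    parts.append("</ul>")
    for anchor, (title, body) in zip(anchors, sections):
        parts.append(f'<h2 id="{anchor}">{html.escape(title)}</h2>')
        parts.append(body)  # fragments are trusted HTML from the project's own docs
    parts.append("</body></html>")
    return "\n".join(parts)
```

Because both versions come from the same fragments, the single page never drifts out of date with respect to the multi-page version.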

Developer Documentation

Developer documentation is written to help programmers understand the code so they can extend or repair it. It is distinct from the developer guidelines discussed earlier, which are more social than technical. The developer guidelines tell programmers how to work with each other; the developer documentation tells them how to work with the code itself. The two are often combined into one document for convenience (as in the example given earlier), but they do not have to be.

Although developer documentation can be very useful, there is no reason to delay a release in order to write it. To start with, it is enough that the original authors are available and willing to answer questions about the code. In fact, having to answer the same questions over and over is the most common motivation for writing documentation. But even before it is written, determined contributors can get by with the code alone. What drives people to spend time understanding code is that the code does something useful for them. If they believe in that usefulness, they will take the time to figure out how the code works; conversely, if they do not, no amount of documentation will keep them.

So if you only have time to write documentation for one audience, write it for users. All user documentation is also developer documentation: any programmer who plans to work on a piece of software must first have learned to use it. Later, when you notice programmers repeatedly asking the same questions, take the time to write documents specifically for them.

Some projects use wikis for their initial documentation, or even for their main documentation. In my experience, this only really works if the wiki is actively edited by a small group of people who agree on how the documentation should be organized and on the sort of "voice" it should have. See the section called “Wikis” in Chapter 3, Technical Infrastructure for more information.

Examples and Screenshots

If the project has a graphical user interface or produces distinctive output, feature examples on the project website. For a user interface, that means screenshots; for output, it may mean screenshots or just raw files. Either way, you are offering instant gratification: a single screenshot is more convincing than paragraphs of description or mailing list chatter, because a screenshot is unambiguous proof that the software works. It may be buggy, hard to install or poorly documented, but that screenshot proves it can be made to work, given enough effort.

There are many other things you can put on your project's website if you have the time, or if for one reason or another they are especially appropriate: a news page, a project history page, a page of links to related sites, a site search feature, a donations link, and so on. None of these is essential at the start, but keep them in mind for the future.

Hosting on a Forge

Some sites offer free hosting and infrastructure for open source projects: web space, a version control system, a bug tracker, download areas, discussion forums, automatic backups, and so on. The details vary from site to site, but they all offer the same basic services. By using one of these sites, you get a lot for free, but you give up, of course, fine-grained control over the user experience. The forge decides what software the site runs, and may control, or at least influence, the look of the web pages.

See the section called “Canned Hosting” in Chapter 3, Technical Infrastructure for more detail on the advantages and disadvantages of forges, and for a list of sites that offer these services.

Choosing a License and Applying It

This section is a very short, basic guide to choosing a license. See Chapter 9, Licenses, Copyrights, and Patents to understand the legal implications of the different licenses, and the impact a license can have on combining your software with code from other projects.

There are many licenses to choose from. Most of them, which we will skip here, are probably not suited to your project, since they cover the specific needs of particular companies or individuals. We will restrict ourselves to the most common licenses, since it is very likely that one of them will suit you.

The "Do Anything" Licenses

If you have no objection to your code potentially being used in proprietary software, use the MIT/X license or one of its derivatives. It is the simplest of several minimal licenses that do little more than assert copyright (without actually restricting copying) and state explicitly that the software comes with no warranty of any kind. See the section called “The MIT / X Window System License” for details.

The GPL

If you do not want your code to be usable in proprietary programs, use the GNU General Public License (http://www.gnu.org/licenses/gpl.html). The GPL is probably the most widely recognized free software license in the world today. That is a clear advantage, since users and contributors are already familiar with it and will not have to spend extra time reading and understanding your license. See the section called “The GNU General Public License” in Chapter 9, Licenses, Copyrights, and Patents for more details.

How to Apply the License to the Project

Once you have chosen a license, state it on the project's front page. There is no need to include the license text there; just give the name of the license and link to the full text on another page.

This makes clear which license you intend to use for the software, but it is not sufficient in legal terms. The software itself must contain the license. The standard way to do this is to include the license text in a file named COPYING (or LICENSE), and to add a short notice at the top of each source file giving the copyright date, the copyright holder, the license, and finally where the full license text can be obtained.

There are many possible variations, so we give just one example here. The GNU GPL recommends putting a notice like this at the top of each source file:

Copyright (C) <year>  <name of author>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

This notice does not say explicitly that the copy of the license the user receives with the software is in the file COPYING, but that is usually where it is (you can change the notice to say so directly). It also gives a postal address from which a copy of the license can be requested. Another common approach is to link to a web page containing the license text. Just use your judgment and point to whatever location is most likely to provide a copy permanently, which may be anywhere on your website. In general, the notice you use does not have to match the example above word for word. What matters is that it contains the essential elements: the copyright holder, the date, the name of the license, and a way to obtain the license text.
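The standard approach described above (a COPYING or LICENSE file plus a short notice in every source file) lends itself to a little automation, which helps keep new files from slipping through without a notice. The sketch below is a Python illustration under stated assumptions: `make_notice` builds a three-line notice whose wording is hypothetical and is not legal text, `add_notice` prepends it to a file's contents if it is not already there, and `missing_elements` is a rough heuristic that checks a header for the four essential elements listed above (holder, date, license name, location of the full text). None of these names come from an existing tool.

```python
import re
from datetime import date

# Hypothetical list of license names to look for; extend as needed.
DEFAULT_LICENSE_NAMES = ("GNU General Public License", "GPL", "MIT")


def make_notice(holder, license_name, license_location, year=None, comment="# "):
    """Build a short per-file notice: date, holder, license, where to find it."""
    if year is None:
        year = date.today().year
    lines = [
        f"Copyright (C) {year} {holder}",
        f"Licensed under the {license_name}.",
        f"See {license_location} for the full license text.",
    ]
    return "".join(comment + line + "\n" for line in lines)


def add_notice(source_text, notice):
    """Prepend `notice` unless it is already present (safe to re-run).

    A leading "#!" interpreter line is kept first so scripts still run.
    """
    if notice in source_text:
        return source_text
    if source_text.startswith("#!"):
        shebang, _, rest = source_text.partition("\n")
        return shebang + "\n" + notice + rest
    return notice + source_text


def missing_elements(header, license_names=DEFAULT_LICENSE_NAMES):
    """Return the essential elements a file header seems to lack (heuristic)."""
    missing = []
    # Date and holder are looked for on the first line mentioning "copyright".
    match = re.search(r"^.*copyright.*$", header, re.IGNORECASE | re.MULTILINE)
    line = match.group(0) if match else ""
    if not re.search(r"\b(?:19|20)\d{2}\b", line):
        missing.append("date")
    # Remove the boilerplate word, the (C) mark, years and punctuation;
    # any letters left over are taken to be the holder's name.
    leftover = re.sub(r"copyright|\(c\)|©|\b\d{4}\b|[-,#/*;]", " ", line,
                      flags=re.IGNORECASE)
    if not re.search(r"[A-Za-z]", leftover):
        missing.append("copyright holder")
    if not any(re.search(r"\b" + re.escape(name) + r"\b", header, re.IGNORECASE)
               for name in license_names):
        missing.append("license name")
    # Case-sensitive on purpose: the word "License" inside a license name
    # must not count as a pointer to a LICENSE file.
    if not re.search(r"\bCOPYING\b|\bLICENSE\b|https?://|write to", header):
        missing.append("where to get the license text")
    return missing
```

For instance, `missing_elements(make_notice("Jane Doe", "GNU General Public License", "the file COPYING", year=2024))` returns an empty list, while a bare `# Copyright (C) Jane Doe` line is reported as lacking a date, a license name and a pointer to the license text.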

Setting the Tone

So far we have described tasks done at a specific point while setting up the project: choosing a license, designing the initial website, and so on. But the most important aspects of starting a project are dynamic. Choosing a mailing list address is easy; making sure the conversations on that list stay on topic and productive is a completely different problem. If the project is being opened up after years of closed, proprietary development, its development process is going to change, and you will have to prepare the existing developers for that change.

The first steps are the hardest, because the project does not yet have precedents or established habits. A project's stability comes not only from formal rules, but from a collective atmosphere that is hard to pin down and that develops over time. There are often written rules as well, but they are usually just a distillation of unwritten, ever-evolving agreements that actually guide the project. Written policies do not so much define the project's culture as describe it, and even then only approximately.

There are several reasons for this. Hiring and high turnover are not as damaging to social norms as one might think at first. As long as change does not happen too quickly, there are always overlap periods during which newcomers learn how things are done and then do them that way themselves. Consider how children's songs survive for centuries. Children today sing roughly the same rhymes that children sang centuries ago, even though none of those earlier children is still alive. Younger children hear the older ones sing, and later sing the songs themselves. The children are not engaged in any conscious program of transmission, of course, but the reason the songs survive is precisely that they are passed on regularly and repeatedly. The lifespan of free software projects may not be measured in centuries (we will find out someday), but the dynamics of transmission are much the same. Turnover is higher, however, and has to be compensated for by a more active and deliberate effort at transmission.

This effort is helped by the fact that people naturally expect and look for social norms. That is simply how humans work. In any group united by a common cause, people instinctively look for behaviors that will mark them as members of the group. The point of setting an example early is to establish group behaviors that are useful to the project. Once in place, they will largely perpetuate themselves.

What follows are specific examples of things you can do to set good precedents. The list is not meant to be exhaustive; it simply illustrates that establishing a collaborative atmosphere early helps a project quickly and considerably. Each developer may be working alone, but you can make them feel as though they are working as a team, in the same room. The more developers feel this way, the more time they will spend on the project. I chose these particular examples because they came up in the Subversion project (http://subversion.tigris.org/), which I participated in and observed from the very beginning. They are not unique to Subversion; situations like these are common in open source projects, and should be seen as opportunities to start things off on the right foot.

Avoid Private Discussions

Even after the project has gone public, you and the other founders will often be tempted to settle difficult questions among a small inner circle through private channels. This is especially true early in the project, when there are so many important decisions to make and, usually, so few volunteers qualified to make them. All the obvious drawbacks of public mailing list discussions will then become very tangible: the delay inherent in email conversations, the need to leave enough time for consensus to form, the hassle of responding to naive volunteers who think they understand all the issues but do not (every project has such people; sometimes they become next year's star contributors, sometimes they stay naive forever), the people who cannot understand why you only want to solve problem X when it seems obvious to them that it is part of a larger problem Y, and so on. The temptation to make decisions behind closed doors and present them as faits accomplis, or at least as the firm recommendations of a united and influential voting bloc, will be very great indeed.

Don't do it.

As slow and cumbersome as public discussions may seem, they are almost always preferable in the long run. Making important decisions in private is like spraying contributor repellent on your project. No serious volunteer will stay in an environment where a secret council makes all the important decisions. Furthermore, public discussions have beneficial side effects beyond simply resolving whatever short-lived technical question is at hand:

  • The discussion helps train new developers. You never know how many eyes are following the conversation; even if most people do not take part, many may be following it silently, picking up information about the software.

  • The discussion also trains you in the art of explaining technical issues to people who are not yet as familiar with the software as you are. This is a skill that takes practice, and you cannot get that practice by talking only to people who already know as much as you do.

  • The discussion and its conclusions will remain available in the public archives forever, making it possible to start new discussions without starting from scratch. See the section called “Conspicuous Use of Archives” in Chapter 6, Communications.

Finally, there is the possibility that someone on the list will make a real contribution to the conversation. It is hard to say how likely that is; it depends on the complexity of the code and the degree of specialization required. But, if I may offer an anecdote, I suspect it is more likely than one might intuitively think. In the Subversion project, we (the founders) were facing what we believed was a deep and complex set of problems, which we had been thinking hard about for months, and we frankly doubted that anyone on the new mailing list could contribute anything useful to the discussion. So we took the easy route and started hashing out technical ideas in private emails, until an observer of the project [10] got wind of what was going on and asked for the discussion to be moved to the public list. We did so, somewhat reluctantly, and were stunned by the number of insightful comments and suggestions that resulted. In many cases, people came up with ideas that had never occurred to us. It turned out there were some very sharp people on that list; they had just been waiting for the right bait. It is true that the discussions that followed took longer than they would have if the conversation had stayed private, but they were so much more productive that the extra time was well worth it.

Without falling back on generalizations such as "the group is always smarter than the individual" (we have all met enough groups to know better), we can stress that groups genuinely excel at certain activities. Massive peer review is one; generating large numbers of ideas is another. The quality of the ideas depends on the quality of the contributors' thinking, of course, but you will not know what kind of thinkers you are dealing with until you have challenged them with a tough problem.

Of course, some discussions must remain private, and this book will give examples. But the guiding principle should be: if there is no real reason for it to be private, it should be public.

Making this happen takes deliberate action. It is not enough to make sure your own messages go to a public list; you also have to steer other people's unnecessarily private conversations onto the list. If someone starts a private discussion and there is no reason for it to stay private, it is up to you to open a meta-discussion on the list right away. Do not even comment on the original topic until you have moved the conversation to a public place, unless you have established that privacy really is needed. If you do this consistently over time, people will get the message fairly quickly and use the public list by default.

Nip Rudeness in the Bud

From the very first messages on the project's public list, you must maintain a zero-tolerance policy toward aggressive or insulting behavior. Zero tolerance does not mean automatic sanctions. You do not have to remove someone from the mailing list when they attack another contributor, or revoke their commit access because they made inappropriate remarks (in theory, you may eventually have to go that far, but only after all other means have failed, which, by definition, is not the case when the project is just starting). Zero tolerance simply means that no bad behavior goes without a response. For example, if someone posts a message that mixes a technical comment with an ad hominem attack on another developer in the project, it is imperative that you respond to the ad hominem attack first, as a separate issue in its own right, and only then to the technical question.

Unfortunately, it is very easy, and very common, for constructive discussions to degenerate into full-blown flame wars. People say things in email that they would never say face to face. The subject matter only amplifies this effect: in technical debates, people often feel there is only one right answer to a given question, and that any disagreement with that answer can only be explained by ignorance or stupidity. It is a short step from calling someone's technical proposal stupid to calling the person stupid. In fact, it is often hard to tell where the debate ends and the personal attack begins, which is why drastic responses or punishments are not a good idea. When you think a debate is turning sour, send a message stressing the importance of keeping the discussion friendly instead, without accusing anyone of being deliberately aggressive. Such "good behavior" messages do, however, tend to sound like a kindergarten teacher lecturing the class:

First, let's please avoid potentially ad hominem comments; for example, calling J.'s design for the security layer "naive and ignorant of the basic principles of computer security". That may or may not be true, but it is beside the point. J. made this proposal in good faith. If it has deficiencies, let's analyze them, and we will fix them or come up with a new design. I am sure M. did not mean to hurt J., but the phrasing was unfortunate, and we try to keep things constructive here.

Now, on to the proposal itself: I think M. was right to say that...

Stilted as it is, this kind of response has a real effect. If you consistently call out bad behavior, but without demanding apologies or acknowledgments, you leave people free to cool down and show their better side by behaving more appropriately next time. One of the secrets is never to make the meta-discussion the main topic. It should always be handled separately, as a preface to your reply. Mention in passing that "we don't do things that way here", but then move straight back to the original topic, so people have something relevant to respond to. If someone protests that they did not deserve your rebuke, simply refuse to be drawn into that escalation. You can either not reply at all (if you think it was just a flash of temper), or say that you are sorry if you overreacted and that it is hard to catch every nuance in email, and then return to the main topic. Never insist on an apology, whether in public or in private, from someone who behaved badly. It is a good thing if they choose of their own accord to apologize, but demanding one only breeds resentment.

Good etiquette should come to be seen as one of the group's defining behaviors. This helps the project, because developers can be driven away by flame wars (even from projects they would like to work on). You may never even find out; someone may lurk on the list, see that it takes a thick skin to participate in the project, and decide not to get involved at all. Keeping forums friendly is a matter of long-term survival, and it is much easier to do while the project is still young. Once good etiquette becomes part of the culture, you will no longer be the only one promoting it; everyone will.

Practice Peer Review

One of the best ways to foster a productive development community is to get people looking at each other's code. A little technical infrastructure is needed for this; in particular, commit emails must be turned on (see the section called “Commit emails” for details). The idea is that every commit sends out an email containing the log message and the changes (diff) for that commit (see Diff in the section called “Version Control Vocabulary”). Code review is the practice of reading these emails as they arrive, looking for bugs or possible improvements.[11]
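For Subversion, commit emails are typically produced by a post-commit hook, which the server runs with the repository path and the new revision number as its first two arguments. The sketch below shows the idea in Python under those assumptions: it reads the author, log message and diff with `svnlook` and assembles one plain-text message with the log first and the diff after it. The function names, subject format and addresses are hypothetical, and actually sending the message is left to the caller.

```python
import os
import subprocess
from email.message import EmailMessage


def build_commit_email(repo_name, revision, author, log, diff, sender, recipients):
    """Assemble one review-friendly email: metadata, log message, then the diff."""
    log = log.strip()
    summary = log.splitlines()[0] if log else "(no log message)"
    msg = EmailMessage()
    msg["Subject"] = f"[{repo_name}] r{revision}: {summary}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"Author: {author}\nRevision: {revision}\n\nLog:\n{log}\n\n{diff}"
    )
    return msg


def svnlook(subcommand, repos, revision):
    """Return the output of `svnlook <subcommand> -r <revision> <repos>`."""
    result = subprocess.run(
        ["svnlook", subcommand, "-r", str(revision), repos],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def commit_email_from_repo(repos, revision, sender, recipients):
    """Gather author, log and diff for one revision and build the email."""
    return build_commit_email(
        repo_name=os.path.basename(os.path.normpath(repos)),
        revision=revision,
        author=svnlook("author", repos, revision).strip(),
        log=svnlook("log", repos, revision),
        diff=svnlook("diff", repos, revision),
        sender=sender,
        recipients=recipients,
    )
```

A hook script could pass `sys.argv[1]` and `sys.argv[2]` to `commit_email_from_repo` and hand the result to `smtplib.SMTP(...).send_message(msg)`. Keeping the message builder separate from the `svnlook` calls makes it easy to test without a repository.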

Code review serves several purposes at once. It is the most obvious example of peer review in the open source world, and it helps maintain the quality of the software. Every bug that ends up in a release got there because it was not caught in time; so the more eyes watch the commits, the fewer bugs will get through. But the process also serves a less direct purpose: it confirms to people that what they do matters, since nobody would take the time to review a commit unless they cared about its effect on the program. People do their best work when they know others will evaluate it.

Reviews should be public. Even when I was sitting in the same physical room with other developers and one of us had made a commit, we took care not to do the review verbally, but to send it to the list instead. Everyone benefits from the review. People follow the comments and sometimes find mistakes in them, and even when they do not, it reminds them that review is a regular, expected activity, like washing the dishes or mowing the lawn.

In the Subversion project, code review was not a regular practice at first. There was no guarantee that every commit would be reviewed, although people would sometimes look over a change if it fell within their area of interest. Bugs slipped in that really could and should have been caught. A developer named Greg Stein, who knew the value of code review from past work, decided he would set an example by reviewing every single commit that went into the repository. Each change was soon followed by an email from Greg to the developer list, in which he dissected the commit, analyzed potential problems, and occasionally offered a nicely turned piece of code. Right away, he was catching bugs and suboptimal coding practices that would otherwise have slipped through unnoticed. Notably, he never complained about being the only one doing reviews, even though it took a significant amount of his time. He simply sang the praises of code review whenever he had the chance. Fairly soon, other people, myself included, started following his lead. What was our motivation? It wasn't that Greg had shamed us into it; rather, he had shown that reviewing code was a smart way to spend one's time, and that you could contribute as much to the project by reviewing other people's changes as by writing new code yourself. Once he had demonstrated this, it became the expected behavior, to the point that a commit that drew no reaction would worry its author, who would ask on the list whether anyone had had a chance to review it yet. Later, Greg took a job that left him little time for Subversion and had to stop his regular reviews, but by then the habit was so ingrained in all of us that it is still actively practiced, as if it had been going on since time immemorial.

Start doing reviews from the very first commit. The kinds of problems most easily caught this way are security vulnerabilities, memory leaks, missing comments or API documentation, side effects, design errors in the call hierarchy, and other problems that can be spotted with little surrounding context. However, even larger-scale problems, such as failing to factor repeated code out into a single place, become visible after reviews have been done regularly, because the memory of past reviews informs the current ones.

Don't worry if you find nothing to comment on, or if you don't know every area of the code well enough. There will usually be something to say about a commit; even when you find nothing to criticize, you may find something worth praising. What matters is to make it clear to every developer that what they do is seen and understood. Of course, code review does not absolve programmers of the responsibility to check and test their changes before committing; no one should rely on review to catch problems they should have found themselves.

When you open up a proprietary project, pay attention to change management

If you are opening up an existing project that has active developers used to working in a proprietary environment, make sure everyone understands that a big change is coming, and on your side, make sure you are prepared to handle their reactions.

Try to see the situation as they see it: until now, all code and design decisions were made by a group of developers who all knew the software more or less equally well, who all received the same pressures from the same management, and who knew each other's strengths and weaknesses. Now you are suddenly asking them to expose their code to the scrutiny of strangers who will judge it only by the end result, without knowing about the management pressures that may have forced certain decisions. These strangers will ask lots of questions, questions that unsettle the developers as they realize that the documentation they worked so hard on is still inadequate (this is inevitable). Add to this that the newcomers are unknown and faceless... If one of your developers already felt insecure, imagine how much worse that feeling will get when newcomers point out flaws in their code, and worse, in front of their colleagues. Unless you have a team of perfect developers, this is inevitable; in fact, it will probably happen to all of them. That's not because they are bad programmers; it's simply that any project above a certain size has bugs, and peer review will find some of them (see the section called "Practice peer review" earlier in this chapter). At the same time, the newcomers themselves won't be subject to much peer review, since they can't contribute code until they are more familiar with the project. To your developers, this may feel like one-way criticism. The danger is that a siege mentality sets in among the veterans.

The best way to prevent this is to warn the group about what is coming. Explain that the initial discomfort is perfectly normal, and reassure them that things will get better. Some of these warnings can be given privately, before the project is opened. But you may also find it useful to announce on the public list that the project is moving to a new way of developing and that some time will be needed to adjust. The best thing you can do is lead by example. If you see that your developers are not answering newcomers' questions enough, simply telling them to answer more won't help. They may lack a good sense of what warrants a response and what doesn't, or they may not know how to prioritize development work against the new burden of external communication. The way to get them to participate is to participate yourself. Be on the public mailing list and answer questions there. When you don't have the expertise to answer a question, visibly hand it off to a developer who does, and make sure that developer follows up with a solution, or at least a response. Over time, it will be tempting to drift back toward private discussions, since that was, after all, how things used to work. Make sure you are subscribed to the internal mailing lists where this might happen, so that you can ask for those discussions to be moved to the public list.

There are other, longer-term concerns to keep in mind when opening up a proprietary project. Chapter 4, Social and Political Infrastructure explores management techniques for getting volunteer and paid contributors to work together successfully, and Chapter 9, Licenses, Copyrights, and Patents discusses the need for careful legal work when opening up a private code base that may contain code written or "owned" by third parties.

Announcing the project

Once the project is presentable (not perfect, just presentable), you're ready to announce it to the world. This is actually a very simple process: go to http://freshmeat.net/, click Submit in the top navigation bar, and fill out a form announcing your new project. Freshmeat is the place everyone watches for new project announcements. Afterwards, just keep an eye on the site to watch word of mouth spread. If you know of mailing lists or newsgroups where an announcement of your project would be on-topic and of interest, post there too, but be careful to send exactly one message per forum, and to direct people to your project's own list for follow-up discussion (by setting the Reply-to header to your list's address). The messages should be short and to the point:

To: discuss@lists.example.org
Subject: [ANN] Scanley full-text indexer project
Reply-to: dev@scanley.org

This is a one-time-only announcement of the creation of the Scanley
project, an open source full-text indexer and search engine with a
rich API, for use by programmers providing search services for large
collections of text files. Scanley is running code, is under active
development, and is looking for both developers and testers.

Home page: http://www.scanley.org/

Features:
   - Searches plain text, HTML, and XML
   - Word or phrase searching
   - (planned) Fuzzy matching
   - (planned) Incremental updating of indexes
   - (planned) Indexing of remote web sites

Requirements:
   - Python 2.2 or higher
   - Enough disk space to hold the indexes (approximately 2x
     the size of the original data)

For more information, please visit scanley.org.

Thank you,
-J. Random
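To make the Reply-to and "one message per forum" advice concrete, here is a minimal sketch using Python's standard `email` package. It builds announcements like the one above and skips any forum that has already received one. The function names, the sender address `jrandom@scanley.org` and the `already_announced` record are hypothetical, and actually sending the messages (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_announcement(forum_address, project_list, subject, body,
                       sender="jrandom@scanley.org"):
    """Build one announcement whose replies go to the project's own list."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = forum_address
    msg["Reply-To"] = project_list  # follow-up discussion lands on our list
    msg["Subject"] = f"[ANN] {subject}"
    msg.set_content(body)
    return msg


def build_announcements(forums, project_list, subject, body,
                        already_announced=()):
    """Return at most one announcement per forum, skipping repeats."""
    seen = set(already_announced)
    messages = []
    for forum in forums:
        if forum in seen:
            continue  # never announce twice in the same forum
        seen.add(forum)
        messages.append(build_announcement(forum, project_list, subject, body))
    return messages
```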

(See the section called "Publicity" in Chapter 6, Communications for advice on announcing later releases and other project events.)

There is an ongoing debate in the free software world about whether a project must start with running code, or whether it can benefit from being announced while still in the design stage. I used to think that starting with running code was the single most important factor in a project's success, that it was what separated real projects from toys, and that serious developers would only be attracted to something already concrete.

That turned out not to be the case. The Subversion project started with a design document, a core of interested and well-connected developers, a lot of fanfare, and no running code at all. To my great surprise, the project acquired active participants right from the start, and by the time we had running code, quite a few volunteers were already deeply involved. Subversion is not the only example; the Mozilla project was also launched without running code, and is now a very popular web browser.

Faced with this evidence, I had to back away from my belief that running code was absolutely necessary to launch a project. Running code is still the best foundation for a project's success, and a good rule of thumb is to wait until you have it before announcing. However, there are circumstances where announcing earlier is useful. I do think you need at least a solid design document, or some sort of code framework (which can of course be revised based on feedback from visitors), but there must be something concrete, something more tangible than good intentions, for people to sink their teeth into.

Once you have announced, don't expect a horde of volunteers to join the project immediately. Usually, the result of an announcement is a few polite inquiries, a few new subscriptions to the mailing list, and otherwise everything continues pretty much as before. But over time, you will notice a steady increase in participation from both new code contributors and users. Announcing is merely planting a seed; it can take a long time for the news to grow and spread. If the project consistently rewards those who get involved, word will spread, because people like to share when they find something good. If all goes well, exponential network dynamics will slowly turn the project into a complex community, where you won't necessarily know everyone's name and can no longer follow every conversation. The next chapter is about working in that environment.



[11] This is generally how code review is done in free software projects. In more centralized projects, review can also take the form of several people sitting down together and going over printouts of source code, looking for specific problems or for design patterns to apply.

Chapter 3. Technical infrastructure

Free software projects rely on technologies that support the selective capture and integration of information. The more skilled you are at using these technologies, the better your project will do. This becomes more and more true as the project grows. Good information management is what keeps free software projects from collapsing under the weight of Brooks' Law[12], which states that adding manpower to a late software project only makes it later. Fred Brooks observed that the complexity of a project grows with the square of the number of participants. When only a few people are involved, everyone can easily talk to everyone else, but when hundreds of people are working on the same project, it is no longer possible for each person to stay constantly aware of what everyone else is doing. If good free software project management is about making everyone feel as if they are working together in the same room, what happens when everyone in a crowded room starts talking at once?
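One common way to make Brooks' observation concrete, and the one he uses in The Mythical Man-Month, is to count the pairwise communication channels in a group: among n people there are n(n-1)/2 of them, a number that grows roughly with the square of n. The short sketch below simply evaluates that formula (the function name is ours). Three people share 3 channels, ten share 45, and a hundred share 4,950.

```python
def communication_channels(participants: int) -> int:
    """Number of distinct pairs among n participants: n * (n - 1) / 2."""
    if participants < 0:
        raise ValueError("participants must be non-negative")
    return participants * (participants - 1) // 2


for n in (3, 10, 100, 500):
    print(f"{n:>4} participants -> {communication_channels(n):>7} channels")
```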

This problem is not new. In real-world assemblies, the solution is parliamentary procedure: a formal guide describing how to conduct real-time discussions in large groups, how to make sure that genuine disagreements are not drowned out by floods of approving comments, how to form subcommittees, how to recognize when a decision has been made, and so on. An important part of parliamentary procedure is specifying how the group interacts with its information management system. Some remarks are made "for the record", others are not. The record itself is subject to direct manipulation, and is meant to be not a literal transcript but a representation of what was said, as the group sees it and as agreed by all its members. The record is not monolithic; it takes different forms for different purposes. It includes the minutes of individual interventions and of the meeting as a whole, summaries, agendas and their annotations, committee reports, reports from people who were not present, lists of action items, and so on.

Because the Internet is not really a room, we don't have to worry about replicating the parts of parliamentary procedure that keep some people quiet while others are speaking. But when it comes to information management techniques, free software projects turn out to be at least as sophisticated as parliamentary procedure. Since almost all communication in free software development happens in writing, elaborate systems have evolved for routing and labeling data correctly; for minimizing duplication, and thereby avoiding spurious divergences; for storing and retrieving data; for correcting bad or obsolete information; and for associating pieces of information with each other as connections are discovered. Active participants in free software projects internalize many of these techniques, and will often carry out complex manual tasks to make sure information is routed correctly. But the whole endeavor ultimately depends on sophisticated software. As much as possible, the communications medium itself should do the routing, labeling, and recording, and should make the information as easy as possible for humans to work with. In practice, of course, humans will still need to intervene at many points in the process, and it is important that the software make those interventions easy too. But in general, if the humans take care to label and route information accurately when it first enters the system, the software should be configured to make as much use of that metadata as possible.

The content of this chapter is intensely practical, based on experience with specific software and usage patterns. However, its purpose is not to teach you a particular list of techniques, but to show, through many small examples, the overall attitude that best encourages good information management in a project. This attitude combines technical skills and people skills. The technical skills are essential because information management software requires initial configuration, then maintenance, and finally adjustments as new needs arise (for example, see the discussion of how to handle project growth in the section called "Pre-Filtering the Bug Tracker" later in this chapter). The people skills are needed because human communities also require maintenance: it is not always immediately obvious how to get the most out of these tools, and in some cases projects have conflicting conventions (for example, see the discussion of setting the Reply-to header on mailing list posts, in the section called "Mailing lists"). Everyone involved must be encouraged, at the right time and in the right way, to do their best to keep the project well organized. The more involved contributors become, the more complex and specialized the techniques they will learn.

Information management has no off-the-shelf solution; there are too many variables. Even if you eventually get a system configured exactly the way you want it and the community is working well with it, certain practices may stop scaling as the project grows. Another case is technological disruption: the project stabilizes, developers and users settle into a comfortable relationship with a stable technical infrastructure, and then suddenly someone comes along and invents a whole new kind of information management service, and soon newcomers are asking why your project doesn't have it. This is what happened, for example, to projects started before the wiki appeared (see http://en.wikipedia.org/wiki/Wiki). Many choices are subjective and involve tradeoffs between the convenience of those producing information and that of those consuming it, or between the time needed to configure a tool and the benefit it brings to the project.

Beware of over-automating, that is, automating things that really require human attention. Technical infrastructure is important, but what makes your project live is care, and the intelligent expression of that care, by the humans involved in it. The technical infrastructure is, on the whole, a set of conveniences that helps them do that.

What a project needs

Most open source projects offer at least a minimum set of tools for managing information:

Web site

The project's showcase to the public (centralized and one-way). The web site may also serve as an administrative interface to other project tools.

Mailing lists

Traditionally the main, and most active, means of communication within the project, and its "medium of record".

Version control

Lets developers keep track of changes to the code, revert them when needed, and manage parallel branches of development. Lets everyone watch what is happening to the code.

Bug tracking

Lets developers keep a history of their work, coordinate with each other, and plan fixes. Lets everyone know the exact status of bugs and the related information (for example, how to reproduce them). The same mechanism can also be used to track not only bugs, but also tasks, releases, new features, and so on.

Instant messaging / real-time chat

A place for quick, lightweight discussions and question/answer exchanges. Not always archived completely.

Each tool in this set meets a particular need, but their functions are closely related, and the tools must be set up to work together. Below we will look at how they can do so, and above all at how to get people to use them. The web site will not be discussed yet, since it acts more as glue for the other components than as a tool in its own right.

You can spare yourself the headache of choosing and configuring all these tools by using a forge: a server that offers ready-to-use templates containing all the tools needed to run an open source project. See the section called "Canned Hosting" later in this chapter for an assessment of the advantages and disadvantages of forges.

Mailing lists

Mailing lists are the bread and butter of communication within a project. If a user is exposed to any forum besides the web pages, it is most likely to be one of the project's mailing lists. But before users experience the mailing list itself, they will encounter the mailing list interface, that is, the mechanism by which they join ("subscribe to") the list. This brings us to Rule #1 of mailing lists:

Don't try to manage mailing lists by hand: get list management software.

It will be tempting to put this off. The time spent setting up list management software can seem poorly spent at first. Managing small, low-traffic lists by hand looks attractive: just set up a subscription address that forwards to your mailbox, and when someone writes to it, add (or remove) their address in a text file that holds all the list's addresses. What could be simpler?

The catch is that good mailing list management, which is what people have come to expect, is not simple at all. It is not just a matter of subscribing and unsubscribing users when they ask. It also means moderating to prevent spam, offering digest delivery as well as message-by-message delivery, providing standard list and project information through canned responses, and various other things. A human being running a subscription address can provide only the bare minimum, and even then not as reliably or promptly as software.

Modern list management software usually offers at least the following features:

Subscription by email and by web

When a user subscribes to a list, they should promptly receive an automated welcome message in reply, telling them what they have subscribed to, how to interact further with the mailing list software, and (most importantly) how to unsubscribe. This automatic reply can of course be customized to include project-specific information, such as the project's web address or where to find the FAQ.

Digest or message-by-message subscription

In digest mode, the subscriber receives one email per day containing all of that day's list activity. For people who follow a list loosely, without participating, digest mode is often preferable, because it lets them skim all the topics at once and spares them the distraction of email arriving at random moments.

Moderation features

Moderation is the activity of checking posts to make sure they are neither a) spam nor b) off-topic before they go out to the entire list. Moderation necessarily involves humans, but software can do much of the work. We will come back to moderation later.

Administrative interface

Among other things, the administrative interface lets the administrator remove obsolete addresses easily. This can become urgent when a recipient's address starts sending automatic "I am no longer at this address" replies back to the list in response to every post. (Some mailing list software can even detect this by itself and unsubscribe the person automatically.)

Header manipulation

Many people set up sophisticated filtering and reply rules in their mail readers. Mailing list software can add and manipulate certain standard headers so that these people can take advantage of them (more on this later).

Archiving

All posts to the lists are stored and made available on the web; some mailing list software also provides special interfaces for compatibility with third-party archiving tools such as MHonArc (http://www.mhonarc.org/). As discussed in the section called "Conspicuous Use of Archives" in Chapter 6, Communications, archiving is crucial.
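To give a feel for how much bookkeeping even a toy version of these features involves, here is a minimal sketch using only Python's standard library. It models the automatic welcome message, the insertion of standard list headers (List-Id as defined in RFC 2919, List-Post and List-Unsubscribe as defined in RFC 2369) with an optional subject tag, a daily digest, and the automatic removal of addresses that keep bouncing. The class name, the `name-unsubscribe@domain` address convention, the subject tag and the bounce threshold are all assumptions made for the illustration; real list managers such as Mailman implement much more complete versions of these ideas.

```python
from collections import Counter
from email.message import EmailMessage


class ListManager:
    """Toy model of a few list-manager features (hypothetical names)."""

    def __init__(self, name, domain, max_bounces=5):
        self.name = name
        self.domain = domain
        self.post_address = f"{name}@{domain}"
        # Assumed convention for this sketch: name-unsubscribe@domain.
        self.unsubscribe_address = f"{name}-unsubscribe@{domain}"
        self.max_bounces = max_bounces
        self.subscribers = set()
        self.bounces = Counter()

    def subscribe(self, address):
        """Add a subscriber and return the automatic welcome message."""
        self.subscribers.add(address)
        welcome = EmailMessage()
        welcome["To"] = address
        welcome["Subject"] = f"Welcome to {self.post_address}"
        welcome.set_content(
            f"You are now subscribed to {self.post_address}.\n"
            f"To unsubscribe, send a message to {self.unsubscribe_address}.\n")
        return welcome

    def prepare_post(self, msg, tag=None):
        """Add standard list headers and an optional subject tag."""
        for header, value in (
            ("List-Id", f"<{self.name}.{self.domain}>"),
            ("List-Post", f"<mailto:{self.post_address}>"),
            ("List-Unsubscribe", f"<mailto:{self.unsubscribe_address}>"),
        ):
            del msg[header]  # removes any existing copy, so no duplicates
            msg[header] = value
        subject = str(msg.get("Subject", ""))
        if tag and tag not in subject:  # do not re-tag replies
            del msg["Subject"]
            msg["Subject"] = f"{tag} {subject}".strip()
        return msg

    @staticmethod
    def build_digest(day, posts):
        """Bundle one day's (subject, body) pairs into a single digest text."""
        lines = [f"Digest for {day} ({len(posts)} messages)", ""]
        lines += [f"{i}. {subject}" for i, (subject, _body) in enumerate(posts, 1)]
        for i, (subject, body) in enumerate(posts, 1):
            lines += ["", "-" * 40, f"{i}. {subject}", "", body]
        return "\n".join(lines)

    def record_bounce(self, address):
        """Count a bounce; unsubscribe after max_bounces in a row."""
        if address not in self.subscribers:
            return False
        self.bounces[address] += 1
        if self.bounces[address] >= self.max_bounces:
            self.subscribers.discard(address)
            del self.bounces[address]
            return True
        return False

    def record_delivery(self, address):
        """A successful delivery resets the bounce count."""
        self.bounces.pop(address, None)
```

Requiring several consecutive bounces, rather than unsubscribing on the first one, is a deliberate choice in this sketch: a single bounce is often caused by a temporary outage on the recipient's side.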

The point here is simply that mailing list management is a complex problem that has received a lot of attention and is largely solved. You don't need to become an expert in it, but you should know that there is always more to learn, and that list management will need your attention from time to time over the life of your free software project. Below we will look at some of the main issues that come up when configuring mailing lists.

Spam prevention

Between the time this sentence is written and the time it is published, the Internet-wide spam problem will probably have grown considerably worse, or at least it will feel that way. There was a time, not so long ago, when you could run a mailing list without taking any spam-prevention measures. A stray post would show up now and then, but rarely enough to be only a minor annoyance. That era is gone. Today, a mailing list that takes no spam-prevention measures will quickly be buried in junk email, to the point of becoming unusable. Spam prevention is mandatory.

Spam prevention falls into two categories: preventing spam posts from appearing on your mailing lists, and preventing your mailing lists from becoming a source of new addresses for spammers' harvesters. The first is more important, so we examine it first.

Filtering posts

There are three basic techniques for keeping out unwanted posts, and most mailing list software offers all three. They are best used together:

  1. Only auto-approve posts from list subscribers.

    This works very well and involves very little effort, since it is usually just a matter of changing a setting in the mailing list software's configuration. But note that posts that aren't automatically approved must not simply be discarded. They should be passed on for moderation, for two reasons. First, you want to let non-subscribers post: a person with a question or an idea to submit should not have to subscribe to a mailing list just to send one message there. Second, even subscribers sometimes post from an address other than the one they subscribed with. Email addresses are not a reliable way to identify people, and should not be used as one.

  2. Filter posts through spam-filtering software.

    If the mailing list software makes it possible (most do), you can have posts checked by spam-filtering software. Automatic spam filtering is not perfect, and never will be, since spammers and filter writers are locked in a never-ending arms race. Even so, it can greatly reduce the amount of spam waiting in the moderation queue. Since the length of that queue translates directly into human work, any reduction achieved by automatic filtering is worth having.

    There is no room here for detailed instructions on setting up spam filters; consult your mailing list software's documentation for that (see the section called "Software" later in this chapter). List software often includes anti-spam features of its own, but you can also add a third-party filter. I have had good experiences with these two: SpamAssassin (http://spamassassin.apache.org/) and SpamProbe (http://spamprobe.sourceforge.net/). This is not meant as an exhaustive list; there are many other open source spam filters, some of which also seem to perform very well. I simply happen to have used those two myself and been very satisfied with them.

  3. Moderation.

    For mail that is not automatically approved because it doesn't come from a subscriber, and that gets through the spam-filtering software (if any), the last stage is moderation: the mail is routed to a special address, where a human examines it and either accepts or rejects it.

    Accepting a post can take one of two forms: you can accept the post just this once, or you can tell the list software to let through this and all future posts from the same sender. You almost always want the second option, to reduce the future moderation burden. The details vary from system to system, but it is usually a matter of replying to a special address with the command "accept" (meaning accept just this post) or "allow" (meaning allow this post and all future posts).

    Rejecting is usually done by simply ignoring the moderation mail. If the list software never receives confirmation that a post is valid, it will not pass that post on to the list, so leaving the moderation mail alone has the desired effect. Sometimes you also have the option of replying with a "reject" or "deny" command, to automatically and permanently reject future mail from that sender without it even going through moderation. There is rarely much point in doing this, since moderation is mainly about spam prevention, and spammers rarely use the same address twice.
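The three techniques above can be sketched as a small pipeline. This is only a model under stated assumptions, not how any particular list manager works: the class and method names are hypothetical, the spam score is assumed to come from an external filter (the default threshold of 5.0 mirrors SpamAssassin's default), and the email commands "accept" and "allow" are modeled as methods. Rejection needs no code at all: a held message that nobody accepts simply stays in the queue and never reaches the list.

```python
from dataclasses import dataclass, field


@dataclass
class ModeratedList:
    """Hypothetical model of the subscriber / spam filter / moderation steps."""
    subscribers: set
    spam_threshold: float = 5.0                   # assumed external spam score scale
    allowed: set = field(default_factory=set)     # senders approved with "allow"
    queue: dict = field(default_factory=dict)     # message id -> (sender, text)
    posted: list = field(default_factory=list)    # what actually reached the list
    _next_id: int = 0

    def submit(self, sender, text, spam_score):
        """Return 'posted', 'discarded' or 'held' for an incoming message."""
        # Step 1: subscribers (and previously allowed senders) go straight through.
        if sender in self.subscribers or sender in self.allowed:
            self.posted.append((sender, text))
            return "posted"
        # Step 2: the spam filter keeps obvious junk out of the moderation queue.
        if spam_score >= self.spam_threshold:
            return "discarded"
        # Step 3: everything else waits for a human moderator.
        self._next_id += 1
        self.queue[self._next_id] = (sender, text)
        return "held"

    def accept(self, msg_id):
        """'accept': post this one message only."""
        self.posted.append(self.queue.pop(msg_id))

    def allow(self, msg_id):
        """'allow': post the message and let future mail from this sender through."""
        sender, text = self.queue.pop(msg_id)
        self.allowed.add(sender)
        self.posted.append((sender, text))
```

Note that a newcomer approved with `allow` is never held again, which is why that option keeps the moderation workload down over time.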

La modération doit servir uniquement au filtrage des spams et des messages hors sujet, par exemple lorsque quelqu'un envoie un message sur la mauvaise liste de diffusion. Le système de modération devrait vous fournir un moyen de répondre directement à l'expéditeur, mais n'employez pas cette méthode pour répondre à une question adressée à la liste de diffusion, même si vous connaissez la réponse. Procéder ainsi empêcherait le projet de se faire une idée précise du genre de questions que les gens se posent, et priverait les membres de l'occasion de répondre eux-mêmes aux questions et/ou de voir les réponses des autres. La modération des listes de diffusion doit se borner à l'entretien de la liste, rien d'autre.

Masquer les adresses dans les archives

Pour éviter que vos listes de diffusion ne deviennent une mine d'adresses pour les spammeurs, une technique courante est de masquer les adresses e-mail des personnes dans les archives en remplaçant par exemple

a.nonyme@undomaine.com

par

a.nonyme_AT_undomaine.com

ou par

a.nonymeNOSPAM@undomaine.com

ou d'autres codes similaires évidents pour un humain. Comme les collecteurs d'adresses à spammer fonctionnent souvent en naviguant sur les pages Web, y compris vos archives en ligne des listes de diffusion, à la recherche de séquences contenant « @ », coder les adresses est une manière de rendre les adresses e-mail invisibles ou inutilisables par les spammeurs. Cela ne change rien à la quantité de spam envoyée directement à la liste de diffusion évidemment, mais au moins vous évitez d'augmenter encore la quantité de spams envoyés directement aux utilisateurs des listes.
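Pour illustrer le principe, voici une esquisse minimale en Python d’un tel masquage, tel qu’un outil d’archivage pourrait l’appliquer au texte d’un message avant de le publier. Le nom de la fonction et l’expression régulière sont des hypothèses ; l’expression est volontairement simplifiée et ne reconnaît pas toutes les adresses valides. Les vrais logiciels d’archivage proposent leurs propres réglages pour cela.

```python
import re

# Motif volontairement simplifié (hypothèse) : partie locale, « @ », domaine.
ADRESSE = re.compile(r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")

def masquer_adresses(texte: str, style: str = "_AT_") -> str:
    """Code chaque adresse e-mail du texte avant son archivage."""
    if style == "NOSPAM":
        # a.nonyme@undomaine.com -> a.nonymeNOSPAM@undomaine.com
        return ADRESSE.sub(lambda m: m.group(1) + "NOSPAM@" + m.group(2), texte)
    # Style par défaut : a.nonyme@undomaine.com -> a.nonyme_AT_undomaine.com
    return ADRESSE.sub(lambda m: m.group(1) + style + m.group(2), texte)

print(masquer_adresses("Écrivez à a.nonyme@undomaine.com"))
# → Écrivez à a.nonyme_AT_undomaine.com
```

Avec le style par défaut, le « @ » disparaît du texte archivé, ce qui suffit à tromper un collecteur qui cherche simplement cette séquence.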

Le masquage d’adresse peut être sujet à controverse. Certaines personnes apprécient beaucoup cette technique et seront surprises si vos archives ne l’appliquent pas automatiquement. D’autres pensent plutôt que c’est un désagrément (parce que les utilisateurs doivent « traduire » les adresses avant de s’en servir). Certains doutent de l’efficacité de la méthode, puisqu’un collecteur peut, en théorie, s’adapter aux codes les plus répandus. Notez toutefois que des études indiquent que le masquage d’adresse est efficace (voir http://www.cdt.org/speech/spam/030319spamreport.shtml).

Idéalement, le programme de gestion de listes devrait laisser le choix à chaque abonné, grâce à un en-tête oui/non ou à un paramètre dans les préférences de son compte. Mais je ne connais aucun logiciel permettant ce réglage, ce qui oblige, pour l’instant, le responsable des listes à faire ce choix pour tout le monde (en supposant que le logiciel d’archivage propose cette option, ce qui n’est pas toujours le cas). Je penche légèrement en faveur du masquage d’adresses. Certaines personnes sont très prudentes et n’affichent leur adresse e-mail ni sur les pages Web ni à aucun endroit qu’un collecteur d’adresses pourrait inspecter. Elles seraient déçues de voir tous leurs efforts réduits à néant par une archive de liste de diffusion. De plus, le désagrément imposé aux utilisateurs des archives par le masquage est très faible, puisqu’il est fort simple de « traduire » ces adresses si vous avez besoin de contacter quelqu’un. Mais n’oubliez pas qu’en fin de compte, cela reste une course aux armements : au moment où vous lirez ceci, les collecteurs auront peut-être évolué au point de reconnaître les techniques classiques de masquage d’adresse e-mail, et il faudra alors trouver une autre parade.

Identification et gestion des en-têtes

Les utilisateurs des listes rangent souvent les messages du projet dans des dossiers qui lui sont réservés, séparés de leurs autres courriers. Leur logiciel de messagerie peut le faire automatiquement en examinant les en-têtes du message. Les en-têtes sont les champs situés au début du courrier qui indiquent l’expéditeur, le destinataire, le sujet, la date et d’autres informations sur le message. Certains en-têtes sont bien connus et, dans les faits, obligatoires :

De: ...
À: ...
Sujet: ...
Date: ...

D’autres sont optionnels bien que plutôt courants. Par exemple, vous n’êtes pas obligés de remplir l’en-tête :

Répondre à: expediteur@adresse.courriel.ici

Mais la plupart des gens le font puisque cela permet au destinataire de répondre de manière certaine à l’auteur du message (c’est particulièrement utile si l’auteur a dû recourir à une adresse différente de celle à laquelle les réponses devraient être adressées).

Certains logiciels de courrier fournissent une interface simple d’emploi pour trier les messages en fonction du sujet. Certaines personnes demandent par conséquent que les listes de diffusion ajoutent automatiquement un préfixe aux sujets afin que leur logiciel puisse automatiquement ranger ces messages dans le bon dossier. L’idée est que l’auteur du message écrive :

Sujet: Préparation de la version 2.5

mais que le message au final soit envoyé sous cette forme (par exemple) :

Sujet: [discussion@listes.exemple.org] Préparation de la version 2.5

Bien que la plupart des logiciels de gestion de listes de diffusion proposent cette option, je ne vous recommande pas de l’activer. Le problème qu’elle règle peut l’être par des moyens beaucoup plus discrets, et le prix à payer, l’espace consommé dans le champ Sujet, est bien trop élevé. Les habitués des listes de diffusion parcourent en général les sujets des messages pour décider de ce qu’ils vont lire et de ce à quoi ils vont répondre. Ajouter le nom de la liste au sujet peut repousser la partie importante du sujet hors de l’écran, la rendant ainsi invisible. Cela masque les informations sur lesquelles les gens s’appuient pour parcourir les sujets, et réduit par conséquent l’utilité de la liste de diffusion.

Plutôt que de grignoter une partie du champ Sujet, apprenez aux utilisateurs à se servir des autres champs, en commençant par le champ À :, qui devrait afficher le nom de la liste :

À: <discussion@listes.exemple.org>

N’importe quel logiciel de messagerie pouvant filtrer les sujets devrait également pouvoir filtrer aussi facilement le champ À :.

Les listes de diffusion remplissent aussi en général d’autres champs, optionnels mais standards. Filtrer sur ces champs est encore plus fiable que d’utiliser les champs À : ou Cc : ; comme ils sont ajoutés automatiquement à chaque message par le logiciel de la liste de diffusion, certains utilisateurs comptent sur leur présence :

list-help: <mailto:discuss-help@lists.example.org>
list-unsubscribe: <mailto:discuss-unsubscribe@lists.example.org>
list-post: <mailto:discuss@lists.example.org>
Delivered-To: mailing list discuss@lists.example.org
Mailing-List: contact discuss-help@lists.example.org; run by ezmlm

Pour la plupart, leur fonction est évidente. Voyez http://www.nisto.com/listspec/list-manager-intro.html pour plus d’informations à ce sujet ou, si vous cherchez des spécifications vraiment détaillées et formelles, http://www.faqs.org/rfcs/rfc2369.html.

Vous remarquerez que ces champs impliquent que, si vous avez une liste nommée « list », vous disposez aussi des adresses administratives « list-help » et « list-unsubscribe ». En plus de celles-ci, il est d’usage de proposer « list-subscribe » pour s’inscrire et « list-owner » pour contacter les administrateurs des listes. Selon le logiciel utilisé pour gérer vos listes, ces adresses et/ou d’autres adresses administratives peuvent être mises en place ; vous trouverez les détails dans sa documentation. En général, une description complète de toutes ces adresses spéciales est envoyée à chaque nouvel abonné dans un message de bienvenue. Vous en avez probablement reçu une copie vous-même. Si ce n’est pas le cas, demandez à quelqu’un de vous la transmettre. Gardez ce message à portée de main afin de pouvoir répondre aux questions sur le fonctionnement de la liste de diffusion ou, encore mieux, publiez-le sur une page Web : ainsi, quand quelqu’un perd sa copie des instructions et écrit pour demander comment se désinscrire, vous n’avez qu’à lui envoyer l’URL.

Certains logiciels de liste de diffusion possèdent une option pour ajouter les instructions de désabonnement à chaque message. Si cette option est présente, activez-la. Cela n’ajoute que quelques lignes au message, ne gêne pas sa lecture, et peut vous faire gagner beaucoup de temps en évitant que les gens ne vous écrivent, ou pire encore, n’écrivent à toute la liste, pour demander comment se désinscrire.

Le grand débat du « Répondre à »

Précédemment, dans la section intitulée « Évitez les discussions privées », j’ai insisté sur l’importance de tenir les discussions dans les forums publics, et j’ai dit que des mesures devaient parfois être prises pour éviter que les conversations ne se transforment en échanges d’e-mails privés. Ce chapitre traite de la configuration des logiciels de communication du projet, afin qu’ils servent le projet aussi bien que possible. Par conséquent, si le logiciel de gestion des listes de diffusion vous offre la possibilité de garder automatiquement les discussions sur la liste, vous vous direz qu’il est logique d’activer cette fonctionnalité.

En fait, ce n’est pas si évident. Cette option existe, mais elle présente de sérieux inconvénients. Son utilisation est au cœur de l’un des plus grands débats sur la gestion des listes de diffusion : rien qui fera la une des journaux du soir, certes, mais les discussions à ce propos peuvent parfois devenir tendues au sein des projets de logiciels libres. Ci-dessous, je vais décrire la fonctionnalité, exposer les principaux arguments de chaque camp et vous donner mes meilleures recommandations.

La fonctionnalité en elle-même est très simple : le logiciel peut, si vous le souhaitez, remplir automatiquement le champ Répondre à : afin que les réponses soient redirigées vers la liste de diffusion. C’est-à-dire que, quoi que l’expéditeur mette dans le champ Répondre à : (et même s’il ne le remplit pas), quand les abonnés recevront le message, l’en-tête contiendra l’adresse de la liste :

Répondre à: discuss@lists.example.org

De prime abord, cela semble être une bonne chose parce que quasiment tous les logiciels de messagerie inspectent le champ Répondre à : et donc quand quelqu’un enverra une réponse, elle sera automatiquement envoyée à la liste entière : pas uniquement à l’expéditeur du message auquel on répond. Bien sûr, la personne qui répond peut modifier à la main le destinataire du message, mais l’important est que, par défaut, les réponses sont directement envoyées à la liste. C’est un très bon exemple d’utilisation de la technologie pour encourager la collaboration.

Malheureusement, il existe quelques inconvénients. Le premier est connu sous le nom de problème du « Je ne peux plus retrouver mon chemin » : il se peut que l’expéditeur mette sa « véritable » adresse e-mail dans le champ Répondre à : parce que, pour une raison ou pour une autre, il envoie ses messages depuis une adresse différente de celle où il les reçoit. Les personnes qui expédient et lisent leurs messages à partir de la même adresse n’ont pas ce problème, et sont même parfois surprises d’apprendre son existence. Mais pour ceux qui utilisent leurs comptes de messagerie de manière particulière, ou qui n’ont pas de contrôle sur le champ De : de leurs courriers (parce qu’ils écrivent depuis leur travail et n’ont pas assez d’influence sur le service informatique, par exemple), utiliser le champ Répondre à : peut être la seule manière d’être sûrs que les réponses leur parviennent. Quand une personne dans cette situation écrit à une liste de diffusion à laquelle elle n’est pas abonnée, l’adresse du champ Répondre à : devient une information essentielle. Si le logiciel de la liste remplace cette adresse, cette personne risque de ne jamais voir les réponses.
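L’esquisse Python suivante, qui s’appuie sur le module standard email, montre ce que fait concrètement un logiciel de liste configuré pour imposer le champ Répondre à : (Reply-To dans les en-têtes réels). Les adresses et le nom de la fonction sont hypothétiques ; le but est seulement de rendre visible la perte de l’adresse choisie par l’expéditeur.

```python
from email.message import EmailMessage

def reecrire_reply_to(msg: EmailMessage, adresse_liste: str) -> None:
    """Ce que fait un logiciel de liste réglé pour imposer le champ Reply-To."""
    del msg["Reply-To"]              # le Reply-To choisi par l'expéditeur est supprimé...
    msg["Reply-To"] = adresse_liste  # ... et remplacé par l'adresse de la liste

msg = EmailMessage()
msg["From"] = "jean@travail.example.com"     # adresse d'envoi imposée par l'employeur
msg["Reply-To"] = "jean@maison.example.net"  # adresse où Jean lit réellement ses réponses
msg["To"] = "discuss@lists.example.org"
msg["Subject"] = "Question sur la version 2.5"

reecrire_reply_to(msg, "discuss@lists.example.org")
print(msg["Reply-To"])  # → discuss@lists.example.org
```

Si Jean n’est pas abonné à la liste, les réponses partent vers une adresse qu’il ne lit pas, et l’adresse jean@maison.example.net n’apparaît plus nulle part dans les en-têtes du message distribué.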

Le deuxième inconvénient est lié aux attentes et, d’après moi, c’est l’argument qui a le plus de poids contre l’automatisation du champ Répondre à :. La majorité des gens ayant l’habitude de se servir des e-mails sont accoutumés à deux choix simples pour répondre : Répondre à tous et Répondre. Tous les logiciels modernes de messagerie possèdent deux boutons distincts pour ces deux fonctions. Les utilisateurs savent que pour répondre à tout le monde (ce qui inclut la liste), ils doivent utiliser le bouton « Répondre à tous », et que pour répondre en privé à l’auteur ils doivent utiliser le bouton « Répondre ». Même si votre but est d’encourager les gens à répondre à toute la liste le plus souvent possible, il y aura parfois des circonstances qui font qu’une réponse privée est plus appropriée, par exemple, si une personne veut dire quelque chose de confidentiel à l’auteur du message original, quelque chose qui n’aurait pas sa place sur la liste publique.

Maintenant, penchons-nous sur le cas où la liste a réécrit le champ Répondre à :. L’utilisateur qui répond appuie sur le bouton « Répondre » en s’attendant à envoyer un message privé à l’auteur du courrier. Comme c’est ce qui se passe normalement, il ne prendra pas forcément la peine de vérifier l’adresse du destinataire. Il rédige son message privé et confidentiel, dans lequel il dit peut-être des choses gênantes sur une autre personne de la liste, puis appuie sur « Envoyer ». Quelques minutes plus tard, à sa grande surprise, son message apparaît sur la liste de diffusion ! Certes, en théorie, il aurait dû vérifier attentivement le champ Destinataire et ne pas présumer du contenu du champ Répondre à :. Mais les expéditeurs règlent presque toujours le champ Répondre à : sur leur propre adresse (ou plutôt, leur logiciel de messagerie le fait pour eux), et beaucoup d’utilisateurs, même expérimentés, le tiennent pour acquis. D’ailleurs, lorsqu’une personne met délibérément dans le champ Répondre à : une autre adresse que la sienne, celle de la liste par exemple, elle prend en général la peine de le signaler dans son message, pour que les gens ne soient pas surpris de ce qui se passe lorsqu’ils appuient sur « Répondre ».

À cause des lourdes conséquences que cela peut entraîner, je préfère configurer le logiciel de gestion de liste de manière à ce qu’il ne modifie pas le champ Répondre à :. C’est l’un des cas où l’utilisation de la technologie pour encourager la collaboration peut, à mon sens, avoir des effets pervers. Mais l’autre camp a également d’excellents arguments à faire valoir. Quel que soit votre choix, des gens vous demanderont de temps à autre pourquoi vous n’avez pas fait l’autre. Comme vous ne voulez pas que cette question prenne trop d’ampleur, mieux vaut avoir une réponse toute prête, une réponse qui mettra un terme au débat plutôt que de le relancer. Ne présentez pas votre décision, quelle qu’elle soit, comme la seule valable et la seule bonne (même si vous le pensez). Insistez plutôt sur le fait que c’est un très vieux débat, que les deux camps ont de bons arguments, mais qu’aucun choix ne peut satisfaire tous les utilisateurs : vous avez donc pris la décision qui vous semblait la meilleure. Demandez poliment que le sujet ne soit pas rouvert, à moins que quelqu’un ait quelque chose de vraiment nouveau à apporter au débat, puis ne participez plus à la discussion en espérant qu’elle s’éteigne d’elle-même.

Quelqu’un pourrait suggérer de voter pour choisir une méthode ou l’autre. Vous pouvez le faire si vous le souhaitez, mais je ne pense pas qu’un vote à main levée soit la meilleure solution dans ce cas. Le risque que quelqu’un soit pris au dépourvu par la réécriture du champ Répondre à : est trop grand, et le désagrément pour les autres trop faible (devoir de temps en temps rappeler aux gens de répondre à toute la liste), pour que la majorité, toute majoritaire qu’elle soit, puisse imposer un tel risque à la minorité.

Je n’ai pas abordé, ici, tous les aspects de ce problème, seulement ceux qui semblaient les plus importants. Si le sujet vous intéresse je vous conseille de lire ces deux documents canoniques toujours cités dans ce débat :

Malgré la légère préférence exprimée ci-dessus, je ne pense pas qu’il existe de vérité absolue à ce sujet, et je participe volontiers à de nombreuses listes qui imposent le Répondre à :. Le mieux que vous puissiez faire est d’opter assez tôt pour une solution ou pour l’autre, puis d’éviter de vous laisser entraîner dans ce débat par la suite.

Deux rêves

Un jour, quelqu’un aura l’idée brillante d’ajouter un bouton « Répondre à la liste » dans un logiciel de messagerie. Ce bouton se servirait des en-têtes pour déterminer l’adresse de la liste de diffusion et enverrait la réponse directement à la liste, sans se préoccuper des autres destinataires puisque, de toute façon, la plupart d’entre eux sont probablement déjà inscrits à la liste. Les autres logiciels de messagerie reprendraient l’idée et le débat deviendrait obsolète (en fait, le logiciel de messagerie Mutt propose déjà cette fonctionnalité[13]).
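Voici, en Python, ce qu’un tel bouton pourrait faire : lire l’en-tête List-Post (RFC 2369) et en extraire l’adresse de réponse. Il s’agit d’une esquisse sous hypothèses (nom de fonction inventé, prise en compte des seules URL mailto:), et non de la manière dont Mutt ou un autre logiciel procède réellement.

```python
from email import message_from_string
from typing import Optional

def destinataire_reponse_liste(brut: str) -> Optional[str]:
    """Adresse à utiliser pour « Répondre à la liste », ou None si aucune."""
    msg = message_from_string(brut)
    valeur = msg.get("List-Post") or ""
    # List-Post peut contenir plusieurs URL séparées par des virgules,
    # ou « NO » pour une liste sur laquelle on ne peut pas écrire (RFC 2369).
    for url in valeur.split(","):
        url = url.strip().strip("<>")
        if url.lower().startswith("mailto:"):
            return url[len("mailto:"):].split("?", 1)[0]
    return None
```

Si aucune adresse n’est trouvée, le logiciel de messagerie peut par exemple griser le bouton plutôt que de deviner.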

Mieux encore : laisser le choix à chaque abonné. Ceux qui veulent que la liste remplisse le champ Répondre à : (que ce soit pour leurs propres messages ou pour ceux des autres) pourraient activer cette option, et ceux qui ne le veulent pas pourraient la désactiver. Je ne connais cependant aucun logiciel de gestion de liste qui permette ce choix individuel. Pour le moment, il faut se contenter d’une configuration identique pour tous.

L’archivage

Les détails techniques concernant la mise en place de l’archivage d’une liste de diffusion sont particuliers au logiciel qui fait fonctionner la liste et dépassent le cadre de ce livre. Lors du choix, ou de la configuration, de l’archive, vous devez prendre en compte les facteurs suivants :

Mise à jour rapide

Les gens feront souvent référence à un message archivé envoyé une ou deux heures auparavant. Si possible, le logiciel devrait archiver chaque message instantanément, de sorte qu’il apparaisse dans les archives au moment même où il est diffusé sur la liste. Si cette option n’est pas disponible, essayez au moins de faire en sorte que l’archive soit mise à jour toutes les heures (par défaut, certains logiciels d’archivage ne se mettent à jour qu’une fois par nuit, mais en pratique ce délai est bien trop long pour une liste de diffusion active).

La stabilité du référentiel

Une fois qu’un message est archivé à une URL donnée, il devrait rester accessible à cette même URL pour toujours, ou du moins le plus longtemps possible. Même si les archives sont reconstruites, restaurées à partir d’une sauvegarde ou réparées d’une manière ou d’une autre, aucune URL déjà rendue publique ne devrait changer. Des références stables permettent l’indexation des archives par les moteurs de recherche, qui sont les meilleurs alliés des utilisateurs en quête de réponses. Elles sont aussi importantes parce que les messages et les fils de discussion des listes de diffusion contiennent souvent des liens vers le système de suivi de bogues (voir la section intitulée « Bug Tracker » plus loin dans ce chapitre) ou vers d’autres documents du projet.

Idéalement, les logiciels de listes de diffusion devraient inclure, dans un en-tête de chaque message distribué aux destinataires, l’URL de ce message dans les archives, ou au moins la partie de cette URL propre au message. Ainsi, ceux qui ont une copie du message peuvent savoir où il est rangé dans les archives sans avoir à se rendre sur la page des archives, ce qui est utile, car toute opération nécessitant un navigateur prend du temps. Je ne sais pas s’il existe un logiciel proposant cette fonctionnalité ; ceux que j’ai utilisés ne le faisaient pas. C’est tout de même une fonctionnalité à rechercher (ou, si vous écrivez un logiciel de gestion de listes de diffusion, une fonctionnalité que vous devriez penser à ajouter, s’il vous plaît).
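L’esquisse Python suivante montre une façon possible d’obtenir à la fois des URL stables et cette information dans les en-têtes : dériver l’URL de l’en-tête Message-ID plutôt que d’un numéro d’ordre (qui risque de changer si les archives sont reconstruites), puis l’indiquer dans un en-tête Archived-At, défini par la RFC 5064 précisément pour cet usage. L’adresse de base, les noms de fonctions et le choix d’un condensat SHA-1 tronqué sont des hypothèses ; l’approche suppose aussi que les Message-ID sont uniques, ce qui est l’usage mais n’est pas garanti.

```python
import hashlib
from email.message import EmailMessage

BASE_ARCHIVES = "https://archives.example.org/discuss/"  # adresse hypothétique

def url_stable(message_id: str) -> str:
    """URL dérivée du seul Message-ID : elle survit à une reconstruction des archives."""
    empreinte = hashlib.sha1(message_id.strip().encode("utf-8")).hexdigest()
    return BASE_ARCHIVES + empreinte[:16]  # 64 bits : collisions très improbables

def ajouter_archived_at(msg: EmailMessage) -> None:
    """Ajoute l'en-tête Archived-At (RFC 5064) avant la distribution du message."""
    msg["Archived-At"] = "<" + url_stable(str(msg["Message-ID"])) + ">"

msg = EmailMessage()
msg["Message-ID"] = "<20240101.1234@example.org>"
ajouter_archived_at(msg)
print(msg["Archived-At"])
```

Comme l’URL ne dépend que du Message-ID, régénérer les archives à partir des messages sauvegardés redonne exactement les mêmes adresses.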

Les sauvegardes

Le moyen de sauvegarder les archives devrait être assez évident, et la manière de les restaurer ne devrait pas être trop compliquée non plus. En d’autres termes, ne traitez pas votre logiciel d’archivage comme une boîte noire. Quelqu’un (vous ou un autre membre du projet) doit savoir où les messages sont stockés, et comment régénérer si nécessaire les pages d’archives à partir des messages sauvegardés. Les archives sont des données précieuses ; un projet qui les perd voit disparaître une grande partie de sa mémoire collective.

La gestion des fils de discussion

Il devrait être possible de naviguer depuis n’importe quel message vers le fil de discussion (l’ensemble des messages qui se répondent les uns aux autres) auquel il appartient. Chaque fil devrait avoir sa propre URL, distincte des URL des messages qui le composent.
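Pour regrouper les messages en fils, un logiciel d’archivage peut s’appuyer sur l’en-tête In-Reply-To, qui indique le Message-ID du message auquel on répond. L’esquisse Python ci-dessous remonte cette chaîne jusqu’au premier message connu de l’archive. La structure de données est hypothétique et volontairement simplifiée : les vrais logiciels exploitent aussi l’en-tête References et des algorithmes plus robustes, comme celui de Jamie Zawinski.

```python
from collections import defaultdict

def regrouper_en_fils(messages):
    """messages : liste de dicts {'id': Message-ID, 'parent': In-Reply-To ou absent}.
    Renvoie {id du message racine: [ids du fil, dans l'ordre d'arrivée]}."""
    parent_de = {m["id"]: m.get("parent") for m in messages}

    def racine(mid):
        vus = set()
        # Remonter la chaîne In-Reply-To tant que le parent figure dans l'archive
        # (l'ensemble « vus » protège contre d'éventuelles boucles).
        while parent_de.get(mid) in parent_de and mid not in vus:
            vus.add(mid)
            mid = parent_de[mid]
        return mid

    fils = defaultdict(list)
    for m in messages:
        fils[racine(m["id"])].append(m["id"])
    return dict(fils)
```

L’identifiant du message racine peut ensuite servir à construire l’URL propre à chaque fil.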

Recherche

Un logiciel d’archivage qui ne permet pas de rechercher dans les messages, que ce soit dans leur corps, par auteur ou par sujet, est quasiment inutile. Remarquez que certains logiciels proposent une recherche en sous-traitant simplement le travail à un moteur de recherche externe comme Google. C’est acceptable, mais un outil de recherche intégré est en général plus précis, puisqu’il permet par exemple de préciser que le mot doit figurer dans le sujet et non dans le corps du message.
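L’intérêt d’une recherche par champ se voit sur cette esquisse Python, qui suppose une archive représentée comme une simple liste de dictionnaires (structure hypothétique) : limiter la recherche au sujet écarte les messages qui ne font que mentionner le mot dans leur corps.

```python
def rechercher(messages, mot, champ=None):
    """Recherche insensible à la casse ; champ vaut 'sujet', 'auteur',
    'corps', ou None pour chercher dans tous les champs."""
    mot = mot.lower()
    champs = [champ] if champ else ["sujet", "auteur", "corps"]
    return [m for m in messages if any(mot in m[c].lower() for c in champs)]

archive = [
    {"sujet": "Préparation de la version 2.5", "auteur": "karl", "corps": "Calendrier ?"},
    {"sujet": "Bogue d'indexation", "auteur": "jim", "corps": "Corrigé avant la version 2.5."},
]
print(len(rechercher(archive, "version")))           # → 2
print(len(rechercher(archive, "version", "sujet")))  # → 1
```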

Ceci n’est qu’une liste de critères techniques pour vous aider à évaluer et à mettre en place un outil d’archivage. Amener les gens à vraiment utiliser ces archives pour le bien du projet est une autre question, abordée dans d’autres chapitres de cet ouvrage, en particulier dans la section intitulée « Conspicuous Use of Archives ».

Les logiciels

Voici une liste d’outils Open Source pour la gestion et l’archivage de listes. Si le site qui héberge votre projet propose déjà une configuration par défaut, vous n’aurez peut-être jamais à choisir. Mais si vous devez en installer un vous-même, en voici quelques-uns. Ceux que j’ai utilisés sont Mailman, Ezmlm, MHonArc et Hypermail, mais cela ne veut pas dire que les autres ne sont pas bons (et il en existe bien sûr d’autres sur lesquels je ne suis pas tombé ; cette liste n’est en rien exhaustive).

Les logiciels de gestion de listes de diffusion sont :

Les logiciels d’archivage sont :

Les logiciels de gestion de versions

Un logiciel de gestion de versions (ou logiciel de gestion des révisions) combine une technologie et des pratiques pour suivre et contrôler les modifications apportées aux fichiers d’un projet, en particulier au code source, à la documentation et aux pages Web. Si vous n’avez jamais utilisé de logiciel de gestion de versions, la première chose à faire est de trouver quelqu’un qui en a l’expérience et de le convaincre de rejoindre le projet. De nos jours, tout le monde s’attend au minimum à ce que le code source d’un projet soit placé sous gestion de versions, et votre projet ne sera pas pris au sérieux s’il n’utilise pas efficacement un tel logiciel.

Les logiciels de gestion de versions sont devenus incontournables, car ils apportent une aide précieuse dans presque tous les domaines d’un projet efficace : la communication entre développeurs, la gestion des sorties, la gestion des bogues, la stabilité du code, le développement expérimental, l’attribution des modifications et les autorisations de modification. La gestion de versions vous fournit un contrôle centralisé sur tous ces domaines. Son cœur est la gestion des modifications : l’identification de chaque changement apporté aux fichiers du projet, l’annotation de chaque modification par des métadonnées comme sa date et son auteur, et la possibilité de restituer ces informations à la demande, sous la forme souhaitée. C’est un mécanisme de communication dont l’unité de base d’information est la modification.

Nous n’aborderons pas tous les aspects de l’utilisation d’un logiciel de gestion de versions dans cette partie. La gestion de versions étant un vaste sujet, nous l’étudierons au fur et à mesure, tout au long du livre. Ici, nous allons nous intéresser plus particulièrement au choix et à l’installation d’un logiciel de gestion de versions, avec comme objectif la promotion du développement collaboratif.

Vocabulaire de la gestion de versions

Ce livre ne vous enseignera pas l’emploi de la gestion de versions si vous ne l’avez pas déjà expérimenté auparavant, cependant il serait impossible d’aborder ce sujet sans quelques termes clés. Ces termes sont utiles, indépendamment de tout système de gestion de versions : ce sont les noms et verbes de base de la collaboration en réseau, et ils seront employés de manière générique tout au long de ce livre. Même s’il n’existait aucun système de gestion de versions, le problème de gestion des modifications serait quand même présent, et ces mots nous fournissent un langage pour en parler de manière concise.

Commit

Apporter une modification au projet ou, plus formellement, enregistrer un changement dans la base de données de gestion de versions, pour qu’il puisse être intégré à une future version du projet. « Commit » peut être employé comme verbe ou comme nom. En tant que nom, il est essentiellement synonyme de « modification ». Par exemple : « Je viens juste d’enregistrer un correctif pour le plantage du serveur que les gens ont signalé sur Mac OS X. Jay, pourrais-tu, s’il te plaît, vérifier le commit, et t’assurer que je ne me trompe pas au sujet de l’allocation ? »

Messages enregistrés

Quelques commentaires joints à chaque commit, décrivant la nature et le but du commit. Les messages enregistrés font partie des documents les plus importants d’un projet : ils font le lien entre le langage très technique des modifications du code et le langage plus compréhensible des fonctionnalités, des corrections de bogues et de l’avancement du projet. Plus loin dans cette section, nous verrons comment diffuser les messages enregistrés auprès du bon public ; de plus, la section intitulée « Codifying Tradition » du chapitre 6, Communications, aborde les manières d’encourager les participants à écrire des messages enregistrés concis et utiles.

Mise à jour

Demander que les modifications des autres (leurs commits) soient incorporées dans votre propre copie du projet, c’est-à-dire mettre votre copie à jour. C’est une opération de routine : la plupart des développeurs mettent à jour leur code plusieurs fois par jour, afin de savoir qu’ils travaillent sur sensiblement la même chose que les autres. En conséquence, s’ils détectent un bogue, il y a peu de chances qu’il ait déjà été corrigé. Par exemple : « Salut, j’ai remarqué que le code d’indexation oublie toujours le dernier nombre. Est-ce un nouveau bogue ? » « Oui, mais il a été corrigé la semaine dernière, fais une mise à jour, il devrait disparaître. »

Dépôt

Une base de données au sein de laquelle les modifications sont stockées. Certains logiciels de gestion de versions sont centralisés : il y a un unique dépôt maître qui conserve toutes les modifications du projet. D’autres sont décentralisés : chaque développeur possède son propre dépôt et les modifications peuvent être partagées entre les dépôts de manière arbitraire. Le logiciel de gestion de versions conserve un suivi des dépendances entre les modifications. Au moment de la publication d’une nouvelle version, un ensemble particulier de modifications est approuvé pour la sortie. Quant à savoir quel système est le meilleur, centralisé ou décentralisé... Cette question est l’une des vieilles guerres du développement de logiciel, essayez de ne pas vous laisser entraîner dans ce débat sur l’une des listes du projet.

Retrait

L’obtention d’une copie du projet depuis le dépôt. Un retrait produit en général une arborescence de répertoires appelée « copie de travail » (voir ci-dessous), à partir de laquelle des changements peuvent être intégrés au dépôt originel. Pour certains logiciels décentralisés de gestion de versions, chaque copie de travail est, elle-même, un dépôt, et les modifications peuvent être envoyées (ou aspirées) vers les dépôts les acceptant.

Copie de travail

L’arborescence personnelle d’un développeur, qui contient les fichiers du code source du projet, et éventuellement des pages Web et d’autres documents. Une copie de travail contient également quelques métadonnées gérées par le logiciel de gestion de versions, qui indiquent de quel dépôt elle provient, quelles « révisions » (voir ci-dessous) des fichiers sont présentes, etc. Généralement, chaque développeur possède sa propre copie de travail, dans laquelle il réalise et teste ses modifications, et à partir de laquelle il effectue ses commits.

Révision, Modifications et Ensemble de modifications

Une « révision » est en général une incarnation précise d’un fichier ou d’un dossier particulier. Par exemple, si le projet démarre avec la révision 6 du fichier F et qu’ensuite quelqu’un enregistre une modification de F, on obtient la révision 7 de F. Certains systèmes emploient aussi « révision », « modification » ou « ensemble de modifications » pour désigner un ensemble de modifications enregistrées en même temps, comme une unité conceptuelle.
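Pour fixer les idées sur les termes définis jusqu’ici (commit, message enregistré, dépôt, retrait, révision), voici un modèle jouet en Python. Il s’agit d’une hypothèse pédagogique, pas du fonctionnement d’un outil réel : les numéros de révision y sont globaux à tout le dépôt, à la manière de Subversion, plutôt que propres à chaque fichier, et chaque commit stocke le nouveau contenu complet des fichiers modifiés au lieu de différences.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Commit:
    revision: int
    auteur: str
    date: datetime
    message: str       # le message enregistré qui accompagne le commit
    changements: dict  # chemin du fichier -> nouveau contenu

@dataclass
class Depot:
    historique: list = field(default_factory=list)

    def commit(self, auteur: str, message: str, changements: dict) -> int:
        """Enregistre un ensemble de modifications et renvoie son numéro de révision."""
        revision = len(self.historique) + 1
        self.historique.append(
            Commit(revision, auteur, datetime.now(timezone.utc), message, dict(changements)))
        return revision

    def retrait(self, revision=None) -> dict:
        """Reconstitue les fichiers tels qu'ils étaient à une révision donnée
        (la plus récente par défaut), comme pour produire une copie de travail."""
        if revision is None:
            revision = len(self.historique)
        etat = {}
        for c in self.historique[:revision]:
            etat.update(c.changements)
        return etat

depot = Depot()
depot.commit("karl", "Ajout de foo.c", {"foo.c": "version initiale"})
depot.commit("jim", "Corrige le plantage du serveur", {"foo.c": "version corrigée"})
print(depot.retrait(1)["foo.c"])  # → version initiale
print(depot.retrait()["foo.c"])   # → version corrigée
```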

Ces termes ont parfois une signification technique distincte selon le logiciel de gestion de versions, mais l’idée générale est toujours la même : ils fournissent un moyen de parler sans ambiguïté d’un point précis dans l’histoire d’un fichier ou d’un ensemble de fichiers : par exemple, immédiatement avant ou après la correction d’un bogue, ou encore « Ah oui, elle a corrigé cela dans la révision 10 » ou bien « Elle a corrigé cela dans la révision 10 de foo.c. » Quand quelqu’un parle d’un fichier ou d’un ensemble de fichiers sans préciser de révision particulière, on comprend généralement qu’il s’agit de la révision la plus récente.

Diff

La représentation textuelle d’une modification. Un diff montre quelles lignes ont été modifiées et comment, avec quelques lignes de contexte de part et d’autre. Pour un développeur déjà familier avec le code, la lecture d’un diff et du code suffit en général à comprendre l’impact des modifications, voire à détecter des bogues.
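Le module standard difflib de Python produit des diffs au format unifié, le format qu’emploient par défaut de nombreux outils de gestion de versions. L’exemple suivant compare deux révisions hypothétiques d’un petit fichier foo.c ; le paramètre n fixe le nombre de lignes de contexte autour de chaque modification.

```python
import difflib

avant = ["int f(int x) {\n", "    return x + 1;\n", "}\n"]
apres = ["int f(int x) {\n", "    return x + 2;\n", "}\n"]

# n=1 : une ligne de contexte de part et d'autre de chaque modification.
texte = "".join(difflib.unified_diff(
    avant, apres, fromfile="foo.c (révision 9)", tofile="foo.c (révision 10)", n=1))
print(texte, end="")
# --- foo.c (révision 9)
# +++ foo.c (révision 10)
# @@ -1,3 +1,3 @@
#  int f(int x) {
# -    return x + 1;
# +    return x + 2;
#  }
```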

Mot-clé

Une étiquette attachée à un ensemble de fichiers, chacun à une révision donnée. Les mots-clés servent en général à conserver des états remarquables du projet. Par exemple, on pose généralement un mot-clé pour chaque version publique, afin de pouvoir obtenir directement depuis le logiciel de gestion de versions l’ensemble exact des fichiers et révisions qui la composent. Des mots-clés courants sont Release_1_0, Delivery_00456, etc.
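Un mot-clé se ramène essentiellement à un nom associé à une révision. Le modèle jouet suivant, en Python, l’illustre sur un historique inventé où chaque révision est représentée par l’état complet des fichiers. Refuser de redéfinir un mot-clé existant est un choix de conception de cette esquisse, qui reflète l’usage voulant qu’un mot-clé de version ne bouge plus une fois posé.

```python
# Historique simplifié et hypothétique : révision -> état complet des fichiers.
historique = {
    1: {"foo.c": "v1", "README": "r1"},
    2: {"foo.c": "v2", "README": "r1"},
    3: {"foo.c": "v3", "README": "r2"},
}
mots_cles = {}  # nom du mot-clé -> révision

def poser_mot_cle(nom: str, revision: int) -> None:
    """Associe un nom à une révision ; un mot-clé existant n'est pas déplacé."""
    if nom in mots_cles:
        raise ValueError(f"le mot-clé {nom} existe déjà")
    mots_cles[nom] = revision

def fichiers_du_mot_cle(nom: str) -> dict:
    """Ensemble exact des fichiers/révisions compris dans la version étiquetée."""
    return historique[mots_cles[nom]]

poser_mot_cle("Release_1_0", 2)
print(fichiers_du_mot_cle("Release_1_0"))  # → {'foo.c': 'v2', 'README': 'r1'}
```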

Branche

Une copie du projet, sous gestion de versions mais isolée, afin que les modifications de cette branche n’affectent pas le reste du projet (et vice versa), sauf quand les modifications sont « fusionnées » volontairement dans un sens ou l’autre (voir plus bas). Les branches sont aussi connues sous le nom de « lignes de développement ». Même dans un projet n’ayant pas explicitement de branches, on considère toujours que le développement s’effectue sur la « branche principale », également connue sous le nom de « ligne principale » ou « tronc ».

Branches make it possible to isolate different lines of development from one another. For example, a branch can be used for experimental development that would be too destabilizing for the main trunk. Or, conversely, a branch can be used as a place to stabilize a new release. During the release process, regular development continues uninterrupted on the main branch of the repository, while the release branch accepts no changes except those approved by the release managers. That way, preparing a release doesn't interfere with ongoing development work. See the section called "Use branches to avoid bottlenecks" later in this chapter for a more detailed discussion of branching.

Merge (or port)

To move a change from one branch to another. This includes merging from the main trunk to some other branch, and vice versa. In fact, those are the most common kinds of merges; it is rare to port a change between two non-main branches. See the section called "Singularity of information" for more about this kind of merging.

"Merge" has a second, related meaning: it is what the version control system does when it sees that two people have changed the same file in different places. Since the two changes do not interfere with each other, when one of the people updates their copy of the file (which already contains their own changes), the other person's changes are merged in automatically. This is very common, especially on projects where several people work on the same code. When two different changes do overlap, the result is a "conflict"; see below.

Conflict

What happens when two people try to make different changes to the same place in the code. All version control systems detect conflicts automatically and notify at least one of the people involved that their changes conflict. It is then up to a human to resolve the conflict and commit the resolution to the version control system.
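
A deliberately simplified three-way merge shows both the automatic merge and the conflict case. The sketch assumes that all three versions (the common ancestor "base", your copy, and theirs) have the same number of lines, so each line can be compared position by position. Real tools first align lines with a diff algorithm. The function name merge3 and the marker format are illustrative:

```python
def merge3(base, ours, theirs):
    """Line-by-line three-way merge of equal-length versions.

    Returns (merged_lines, conflict_indices), where the indices are
    positions in `base` that both sides changed differently.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equal-length versions")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            merged.append(o)  # identical change on both sides, or only we changed it
        elif o == b:
            merged.append(t)  # only they changed it: take their change
        else:
            conflicts.append(i)  # both changed the same line differently
            merged.extend(["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"])
    return merged, conflicts


print(merge3(["a", "b", "c"], ["A", "b", "c"], ["a", "b", "C"]))
print(merge3(["a", "b", "c"], ["a", "X", "c"], ["a", "Y", "c"]))
```

The first call merges cleanly because the edits touch different lines. The second call reports a conflict on line index 1, which a human must then resolve.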

Lock

A way to declare exclusive intent to change a particular file or directory. For example: "I can't commit any changes to the web pages right now. It seems Alfred has them locked while he changes their background image." Not all version control systems offer locking, and those that do don't require that the feature be used. This is because parallel, simultaneous development is the norm, and locking people out of files is (usually) contrary to that ideal.

Version control systems that require a lock before changes can be committed are said to use the lock-modify-unlock model. Those that don't are said to use the copy-modify-merge model. An excellent in-depth explanation and comparison of the two models can be found at http://svnbook.red-bean.com/svnbook-1.0/ch02s02.html. In general, the copy-modify-merge model is better suited to open source development, and all the version control systems discussed in this book support it.
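
The following hypothetical lock registry makes the lock-modify-unlock model concrete. The LockTable class is invented for illustration, and real systems keep this state on the server. Under copy-modify-merge, nothing like this table would be consulted before a commit:

```python
class LockTable:
    """Hypothetical lock registry for the lock-modify-unlock model."""

    def __init__(self):
        self._owners = {}  # path -> name of the user holding the lock

    def lock(self, path, user):
        owner = self._owners.get(path)
        if owner is not None and owner != user:
            raise PermissionError(f"{path} is locked by {owner}")
        self._owners[path] = user

    def unlock(self, path, user):
        if self._owners.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        del self._owners[path]

    def can_commit(self, path, user):
        # Only the current lock holder may commit changes to a locked path.
        return self._owners.get(path) == user


locks = LockTable()
locks.lock("www/index.html", "alfred")
print(locks.can_commit("www/index.html", "karl"))  # Karl must wait for Alfred
```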

Choosing a Version Control System

As of this writing, the two most popular version control systems in the free software world are Concurrent Versions System (CVS, http://www.cvshome.org/) and Subversion (SVN, http://subversion.tigris.org/).

CVS has been around for a long time. Most experienced developers are already familiar with it, it does more or less what you need, and since it's been popular for a long time, you probably won't end up in any long debates about whether or not it was the right choice. CVS has some disadvantages, however. It doesn't provide an easy way to refer to multi-file changes; it doesn't allow you to rename or copy files under version control (so if you need to reorganize your code tree after starting the project, it can be a real pain); it has poor merging support; it doesn't handle large files or binary files very well; and some operations are slow when large numbers of files are involved.

None of CVS's flaws is fatal, and it is still quite popular. However, in the last few years the more recent Subversion has been gaining ground, especially in newer projects.[14] If you're starting a new project, I recommend Subversion.

On the other hand, since I'm involved in the Subversion project, my objectivity might reasonably be questioned. And in the last few years a number of new open-source version control systems have appeared. Appendix A, Free Version Control Systems lists all the ones I know of, in rough order of popularity. As the list makes clear, deciding on a version control system could easily become a lifelong research project. Possibly you will be spared the decision because it will be made for you by your hosting site. But if you must choose, consult with your other developers, ask around to see what people have experience with, then pick one and run with it. Any stable, production-ready version control system will do; you don't have to worry too much about making a drastically wrong decision. If you simply can't make up your mind, then go with Subversion. It's fairly easy to learn, and is likely to remain a standard for at least a few years.

Using the Version Control System

The recommendations in this section are not targeted toward a particular version control system, and should be simple to implement in any of them. Consult your specific system's documentation for details.

Version everything

Keep not only your project's source code under version control, but also its web pages, documentation, FAQ, design notes, and anything else that people might want to edit. Keep them right next to the source code, in the same repository tree. Any piece of information worth writing down is worth versioning—that is, any piece of information that could change. Things that don't change should be archived, not versioned. For example, an email, once posted, does not change; therefore, versioning it wouldn't make sense (unless it becomes part of some larger, evolving document).

The reason versioning everything together in one place is important is so people only have to learn one mechanism for submitting changes. Often a contributor will start out making edits to the web pages or documentation, and move to small code contributions later, for example. When the project uses the same system for all kinds of submissions, people only have to learn the ropes once. Versioning everything together also means that new features can be committed together with their documentation updates, that branching the code will branch the documentation too, etc.

Don't keep generated files under version control. They are not truly editable data, since they are produced programmatically from other files. For example, some build systems create configure based on the template configure.in. To change configure, one would edit configure.in and then regenerate; thus, only the template configure.in is an "editable file." Just version the templates: if you version the result files as well, people will inevitably forget to regenerate when they commit a change to a template, and the resulting inconsistencies will cause no end of confusion.[15]
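
One way to catch generated files that have slipped into the repository is to look for pairs where both a template and its output are versioned. The sketch below assumes the autoconf-style convention that a template named X.in produces X (as configure.in produces configure). The function name and that rule are assumptions, so adapt them to your build system:

```python
def versioned_generated_files(tracked_paths, template_suffix=".in"):
    """List tracked files that appear to be generated from a tracked template.

    Assumption: a template named X.in produces X (configure.in -> configure,
    Makefile.in -> Makefile). Adapt this rule to your own build system.
    """
    tracked = set(tracked_paths)
    flagged = []
    for path in sorted(tracked):
        if path.endswith(template_suffix):
            output = path[: -len(template_suffix)]
            if output in tracked:
                flagged.append(output)  # both template and output are versioned
    return flagged


print(versioned_generated_files(
    ["configure.in", "configure", "src/Makefile.in", "README"]))
```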

The rule that all editable data should be kept under version control has one unfortunate exception: the bug tracker. Bug databases hold plenty of editable data, but for technical reasons generally cannot store that data in the main version control system. (Some trackers have primitive versioning features of their own, however, independent of the project's main repository.)

Browsability

The project's repository should be browsable on the Web. This means not only the ability to see the latest revisions of the project's files, but to go back in time and look at earlier revisions, view the differences between revisions, read log messages for selected changes, etc.

Browsability is important because it is a lightweight portal to project data. If the repository cannot be viewed through a web browser, then someone wanting to inspect a particular file (say, to see if a certain bugfix had made it into the code) would first have to install version control client software locally, which could turn their simple query from a two-minute task into a half-hour or longer task.

Browsability also implies canonical URLs for viewing specific revisions of files, and for viewing the latest revision at any given time. This can be very useful in technical discussions or when pointing people to documentation. For example, instead of saying "For tips on debugging the server, see the www/hacking.html file in your working copy," one can say "For tips on debugging the server, see http://subversion.apache.org/docs/community-guide/," giving a URL that always points to the latest revision of the hacking.html file. The URL is better because it is completely unambiguous, and avoids the question of whether the addressee has an up-to-date working copy.

Some version control systems come with built-in repository-browsing mechanisms, while others rely on third-party tools to do it. Three such tools are ViewCVS (http://viewcvs.sourceforge.net/), CVSWeb (http://www.freebsd.org/projects/cvsweb.html), and WebSVN (http://websvn.tigris.org/). The first works with both CVS and Subversion, the second with CVS only, and the third with Subversion only.

Commit emails

Every commit to the repository should generate an email showing who made the change, when they made it, what files and directories changed, and how they changed. The email should go to a special mailing list devoted to commit emails, separate from the mailing lists to which humans post. Developers and other interested parties should be encouraged to subscribe to the commits list, as it is the most effective way to keep up with what's happening in the project at the code level. Aside from the obvious technical benefits of peer review (see the section called "Pratiquez la revue par pairs"), commit emails help create a sense of community, because they establish a shared environment in which people can react to events (commits) that they know are visible to others as well.

The specifics of setting up commit emails will vary depending on your version control system, but usually there's a script or other packaged facility for doing it. If you're having trouble finding it, try looking for documentation on hooks, specifically a post-commit hook, also called the loginfo hook in CVS. Post-commit hooks are a general means of launching automated tasks in response to commits. The hook is triggered by an individual commit, is fed all the information about that commit, and is then free to use that information to do anything—for example, to send out an email.
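
Here is a rough sketch for Subversion. The post-commit hook is invoked with the repository path and the new revision number as its first two arguments, and the svnlook tool can report the author, date, log message, changed paths and diff for that revision. The helper names below are hypothetical, and the code assumes svnlook is on the hook's PATH:

```python
import subprocess

SVNLOOK_QUERIES = ("author", "date", "log", "changed", "diff")


def svnlook_args(subcommand, repos, rev):
    """Build the command line for one svnlook query about revision `rev`."""
    return ["svnlook", subcommand, "-r", str(rev), repos]


def collect_commit_info(repos, rev):
    """Run svnlook for each query and return {query: output}.

    Assumes svnlook is installed and on PATH; raises CalledProcessError
    if any query fails.
    """
    info = {}
    for query in SVNLOOK_QUERIES:
        result = subprocess.run(
            svnlook_args(query, repos, rev),
            capture_output=True, text=True, check=True,
        )
        info[query] = result.stdout.rstrip("\n")
    return info
```

A hook script could then call collect_commit_info(sys.argv[1], sys.argv[2]) and pass the result to whatever sends the mail.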

With pre-packaged commit email systems, you may want to modify some of the default behaviors:

  1. Some commit mailers don't include the actual diffs in the email, but instead provide a URL to view the change on the web using the repository browsing system. While it's good to provide the URL, so the change can be referred to later, it is also very important that the commit email include the diffs themselves. Reading email is already part of people's routine, so if the content of the change is visible right there in the commit email, developers will review the commit on the spot, without leaving their mail reader. If they have to click on a URL to review the change, most won't do it, because that requires a new action instead of a continuation of what they were already doing. Furthermore, if the reviewer wants to ask something about the change, it's vastly easier to hit reply-with-text and simply annotate the quoted diff than it is to visit a web page and laboriously cut-and-paste parts of the diff from web browser to email client.

    (Of course, if the diff is huge, such as when a large body of new code has been added to the repository, then it makes sense to omit the diff and offer only the URL. Most commit mailers can do this kind of limiting automatically. If yours can't, then it's still better to include diffs, and live with the occasional huge email, than to leave the diffs off entirely. Convenient reviewing and commenting is a cornerstone of cooperative development, much too important to do without.)

  2. The commit emails should set their Reply-to header to the regular development list, not the commit email list. That is, when someone reviews a commit and writes a response, their response should be automatically directed toward the human development list, where technical issues are normally discussed. There are a few reasons for this. First, you want to keep all technical discussion on one list, because that's where people expect it to happen, and because that way there's only one archive to search. Second, there might be interested parties not subscribed to the commit email list. Third, the commit email list advertises itself as a service for watching commits, not for watching commits and occasional technical discussions. Those who subscribed to the commit email list did not sign up for anything but commit emails; sending them other material via that list would violate an implicit contract. Fourth, people often write programs that read the commit email list and process the results (for display on a web page, for example). Those programs are prepared to handle consistently-formatted commit emails, but not inconsistent human-written mails.

    Note that this advice to set Reply-to does not contradict the recommendations in the section called "Le grand débat du « Répondre à »" earlier in this chapter. It's always okay for the sender of a message to set Reply-to. In this case, the sender is the version control system itself, and it sets Reply-to in order to indicate that the appropriate place for replies is the development mailing list, not the commit list.
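
Both adjustments above can be sketched with Python's standard email package. The sketch includes the diff inline unless it exceeds a size limit, falls back to the URL in that case, and sets Reply-To to the development list. The addresses, the 100 KB limit and the function signature are assumptions for illustration:

```python
from email.message import EmailMessage

MAX_DIFF_BYTES = 100_000  # assumed cut-off for "huge" diffs; tune per project


def build_commit_email(rev, author, log, changed, diff, view_url,
                       sender, commits_list, dev_list,
                       max_diff_bytes=MAX_DIFF_BYTES):
    """Compose a commit email that carries the diff inline when practical."""
    summary = log.strip().splitlines()[0] if log.strip() else "(no log message)"
    msg = EmailMessage()
    msg["Subject"] = f"r{rev} - {summary}"
    msg["From"] = sender
    msg["To"] = commits_list
    # Replies should land on the human development list, not the commits list.
    msg["Reply-To"] = dev_list
    body = [f"Author: {author}", f"Revision: r{rev}", f"URL: {view_url}", "",
            "Log:", log.rstrip("\n"), "", "Changed paths:", changed, ""]
    diff_size = len(diff.encode("utf-8"))
    if diff_size <= max_diff_bytes:
        body.append(diff)  # reviewers can read and quote it in their mail reader
    else:
        body.append(f"(Diff of {diff_size} bytes omitted; see the URL above.)")
    msg.set_content("\n".join(body))
    return msg
```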

Use branches to avoid bottlenecks

Non-expert version control users are sometimes a bit afraid of branching and merging. This is probably a side effect of CVS's popularity: CVS's interface for branching and merging is somewhat counterintuitive, so many people have learned to avoid those operations entirely.

If you are among those people, resolve right now to conquer any fears you may have and take the time to learn how to do branching and merging. They are not difficult operations, once you get used to them, and they become increasingly important as a project acquires more developers.

Branches are valuable because they turn a scarce resource—working room in the project's code—into an abundant one. Normally, all developers work together in the same sandbox, constructing the same castle. When someone wants to add a new drawbridge, but can't convince everyone else that it would be an improvement, branching makes it possible for her to go to an isolated corner and try it out. If the effort succeeds, she can invite the other developers to examine the result. If everyone agrees that the result is good, they can tell the version control system to move ("merge") the drawbridge from the branch castle over to the main castle.

It's easy to see how this ability helps collaborative development. People need the freedom to try new things without feeling like they're interfering with others' work. Equally importantly, there are times when code needs to be isolated from the usual development churn, in order to get a bug fixed or a release stabilized (see the section called "Stabilizing a Release" and the section called "Maintaining Multiple Release Lines" in Chapter 7, Packaging, Releasing, and Daily Development) without worrying about tracking a moving target.

Use branches liberally, and encourage others to use them. But also make sure that a given branch is only active for exactly as long as needed. Every active branch is a slight drain on the community's attention. Even those who are not working in a branch still maintain a peripheral awareness of what's going on in it. Such awareness is desirable, of course, and commit emails should be sent out for branch commits just as for any other commit. But branches should not become a mechanism for dividing the development community. With rare exceptions, the eventual goal of most branches should be to merge their changes back into the main line and disappear.

Singularity of information

Merging has an important corollary: never commit the same change twice. That is, a given change should enter the version control system exactly once. The revision (or set of revisions) in which the change entered is its unique identifier from then on. If it needs to be applied to branches other than the one on which it entered, then it should be merged from its original entry point to those other destinations—as opposed to committing a textually identical change, which would have the same effect in the code, but would make accurate bookkeeping and release management impossible.

The practical effects of this advice differ from one version control system to another. In some systems, merges are special events, fundamentally distinct from commits, and carry their own metadata with them. In others, the results of merges are committed the same way other changes are committed, so the primary means of distinguishing a "merge commit" from a "new change commit" is in the log message. In a merge's log message, don't repeat the log message of the original change. Instead, just indicate that this is a merge, and give the identifying revision of the original change, with at most a one-sentence summary of its effect. If someone wants to see the full log message, she should consult the original revision.

The reason it's important to avoid repeating the log message is that log messages are sometimes edited after they've been committed. If a change's log message were repeated at each merge destination, then even if someone edited the original message, she'd still leave all the repeats uncorrected—which would only cause confusion down the road.

The same principle applies to reverting a change. If a change is withdrawn from the code, then the log message for the reversion should merely state that some specific revision(s) is being reverted, not describe the actual code change that results from the reversion, since the semantics of the change can be derived by reading the original log message and change. Of course, the reversion's log message should also state the reason why the change is being reverted, but it should not duplicate anything from the original change's log message. If possible, go back and edit the original change's log message to point out that it was reverted.
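
A small helper can enforce this discipline when you write merge and reversion log messages. This sketch assumes Subversion-style "rNNN" revision names. The exact wording of the messages is an illustrative convention, not a standard:

```python
def merge_log_message(rev, source_branch, summary=None):
    """Cite the original revision instead of repeating its log message."""
    if summary is not None and "\n" in summary:
        raise ValueError("keep the summary to one line")
    head = f"Merge r{rev} from {source_branch}"
    return f"{head}: {summary}" if summary else f"{head}."


def revert_log_message(revs, reason):
    """Name the reverted revision(s) and say why; don't re-describe the change."""
    listed = ", ".join(f"r{r}" for r in revs)
    return f"Revert {listed}.\n\nReason: {reason}"


print(merge_log_message(1729, "trunk", "Fix crash on empty input."))
print(revert_log_message([1730, 1731], "breaks the Windows build"))
```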

All of the above implies that you should use a consistent syntax for referring to revisions. This is helpful not only in log messages, but in emails, the bug tracker, and elsewhere. If you're using CVS, I suggest "path/to/file/in/project/tree:REV", where REV is a CVS revision number such as "1.76". If you're using Subversion, the standard syntax for revision 1729 is "r1729" (file paths are not needed because Subversion uses global revision numbers). In other systems, there is usually a standard syntax for expressing the changeset name. Whatever the appropriate syntax is for your system, encourage people to use it when referring to changes. Consistent expression of change names makes project bookkeeping much easier (as we will see in Chapter 6, Communications and Chapter 7, Packaging, Releasing, and Daily Development), and since a lot of the bookkeeping will be done by volunteers, it needs to be as easy as possible.
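
If the project adopts the syntaxes suggested above, tooling can pick revision references out of emails or issue comments with simple regular expressions. The patterns below are a sketch that assumes exactly those two forms ("r1729" for Subversion, "path:1.76" for CVS). They may need tightening for real text:

```python
import re

SVN_REF = re.compile(r"\br(\d+)\b")                  # e.g. r1729
CVS_REF = re.compile(r"([\w./-]+):(\d+(?:\.\d+)+)")  # e.g. src/main.c:1.76


def svn_revisions(text):
    """Return the Subversion revision numbers mentioned in `text`."""
    return [int(n) for n in SVN_REF.findall(text)]


def cvs_revisions(text):
    """Return (path, revision) pairs mentioned in `text`."""
    return CVS_REF.findall(text)


print(svn_revisions("Fixed in r1729; see also r42."))
print(cvs_revisions("See src/main.c:1.76 and doc/README:1.3.2.1."))
```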

See also the section called "Releases and Daily Development" in Chapter 7, Packaging, Releasing, and Daily Development.

Authorization

Most version control systems offer a feature whereby certain people can be allowed or disallowed from committing in specific sub-areas of the repository. Following the principle that when handed a hammer, people start looking around for nails, many projects use this feature with abandon, carefully granting people access to just those areas where they have been approved to commit, and making sure they can't commit anywhere else. (See the section called "Committers" in Chapter 8, Managing Volunteers for how projects decide who can commit where.)

There is probably little harm done by exercising such tight control, but a more relaxed policy is fine too. Some projects simply use an honor system: when a person is granted commit access, even for a sub-area of the repository, what they actually receive is a password that allows them to commit anywhere in the project. They're just asked to keep their commits in their area. Remember that there is no real risk here: in an active project, all commits are reviewed anyway. If someone commits where they're not supposed to, others will notice it and say something. If a change needs to be undone, that's simple enough—everything's under version control anyway, so just revert.

There are several advantages to the relaxed approach. First, as developers expand into other areas (which they usually will if they stay with the project), there is no administrative overhead to granting them wider privileges. Once the decision is made, the person can just start committing in the new area right away.

Second, expansion can be done in a more fine-grained manner. Generally, a committer in area X who wants to expand to area Y will start posting patches against Y and asking for review. If someone who already has commit access to area Y sees such a patch and approves of it, they can just tell the submitter to commit the change directly (mentioning the reviewer/approver's name in the log message, of course). That way, the commit will come from the person who actually wrote the change, which is preferable from both an information management standpoint and from a crediting standpoint.

Last, and perhaps most important, using the honor system encourages an atmosphere of trust and mutual respect. Giving someone commit access to a subdomain is a statement about their technical preparedness—it says: "We see you have expertise to make commits in a certain domain, so go for it." But imposing strict authorization controls says: "Not only are we asserting a limit on your expertise, we're also a bit suspicious about your intentions." That's not the sort of statement you want to make if you can avoid it. Bringing someone into the project as a committer is an opportunity to initiate them into a circle of mutual trust. A good way to do that is to give them more power than they're supposed to use, then inform them that it's up to them to stay within the stated limits.

The Subversion project has operated on the honor system for more than four years, with 33 full and 43 partial committers as of this writing. The only distinction the system actually enforces is between committers and non-committers; further subdivisions are maintained solely by humans. Yet we've never had a problem with someone deliberately committing outside their domain. Once or twice there's been an innocent misunderstanding about the extent of someone's commit privileges, but it's always been resolved quickly and amiably.
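
The arrangement described above, where the system enforces only the committer/non-committer distinction and leaves sub-areas to humans, might look like the following sketch. The Committer record, the area prefixes and the idea of emitting notes for reviewers are all hypothetical. Nothing here describes Subversion's actual configuration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Committer:
    name: str
    areas: tuple = ()  # advisory path prefixes; empty means full committer


def check_commit(committers, user, paths):
    """Honor-system check: only committer vs. non-committer is enforced.

    Returns (allowed, notes). Paths outside a partial committer's areas are
    still allowed; they are just noted so that human reviewers can follow up.
    """
    person = committers.get(user)
    if person is None:
        return False, [f"{user} is not a committer"]
    if not person.areas:
        return True, []

    def in_area(path):
        return any(path == a or path.startswith(a.rstrip("/") + "/")
                   for a in person.areas)

    notes = [f"{user} committed outside their area: {p}"
             for p in paths if not in_area(p)]
    return True, notes
```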

Obviously, in situations where self-policing is impractical, you must rely on hard authorization controls. But such situations are rare. Even when there are millions of lines of code and hundreds or thousands of developers, a commit to any given code module should still be reviewed by those who work on that module, and they can recognize if someone committed there who wasn't supposed to. If regular commit review isn't happening, then the project has bigger problems to deal with than the authorization system anyway.

In summary, don't spend too much time fiddling with the version control authorization system, unless you have a specific reason to. It usually won't bring much tangible benefit, and there are advantages to relying on human controls instead.

None of this should be taken to mean that the restrictions themselves are unimportant, of course. It would be bad for a project to encourage people to commit in areas where they're not qualified. Furthermore, in many projects, full (unrestricted) commit access has a special status: it implies voting rights on project-wide questions. This political aspect of commit access is discussed more in the section called "Who Votes?" in Chapter 4, Social and Political Infrastructure.

Bug Tracker

Bug tracking is a broad topic; various aspects of it are discussed throughout this book. Here I'll try to concentrate mainly on setup and technical considerations, but to get to those, we have to start with a policy question: exactly what kind of information should be kept in a bug tracker?

The term bug tracker is misleading. Bug tracking systems are also frequently used to track new feature requests, one-time tasks and unsolicited patches: really anything that has distinct beginning and end states, with optional transition states in between, and that accrues information over its lifetime. For this reason, bug trackers are also called issue trackers, defect trackers, artifact trackers, request trackers, trouble ticket systems, etc. See Appendix B, Free Bug Trackers for a list of software.

In this book, I'll continue to use "bug tracker" for the software that does the tracking, because that's what most people call it, but will use issue to refer to a single item in the bug tracker's database. This allows us to distinguish between the behavior or misbehavior that the user encountered (that is, the bug itself), and the tracker's record of the bug's discovery, diagnosis, and eventual resolution. Keep in mind that although most issues are about actual bugs, issues can be used to track other kinds of tasks too.

The classic issue life cycle looks like this:

  1. Someone files the issue. They provide a summary, an initial description (including a reproduction recipe, if applicable; see the section called "Treat Every User as a Potential Volunteer" in Chapter 8, Managing Volunteers for how to encourage good bug reports), and whatever other information the tracker asks for. The person who files the issue may be totally unknown to the project: bug reports and feature requests are as likely to come from the user community as from the developers.

    Once filed, the issue is in what's called an open state. Because no action has been taken yet, some trackers also label it as unverified and/or unstarted. It is not assigned to anyone; or, in some systems, it is assigned to a fake user to represent the lack of a real assignment. At this point, it is in a holding area: the issue has been recorded, but not yet integrated into the project's consciousness.

  2. Others read the issue, add comments to it, and perhaps ask the original filer for clarification on some points.

  3. The bug gets reproduced. This may be the most important moment in its life cycle. Although the bug is not actually fixed yet, the fact that someone besides the original filer was able to make it happen proves that it is genuine, and, no less importantly, confirms to the original filer that they've contributed to the project by reporting a real bug.

  4. The bug gets diagnosed: its cause is identified, and if possible, the effort required to fix it is estimated. Make sure these things get recorded in the issue; if the person who diagnosed the bug suddenly has to step away from the project for a while (as can often happen with volunteer developers), someone else should be able to pick up where she left off.

    In this stage, or sometimes the previous one, a developer may "take ownership" of the issue and assign it to herself (the section called "Distinguish clearly between inquiry and assignment" in Chapter 8, Managing Volunteers examines the assignment process in more detail). The issue's priority may also be set at this stage. For example, if it is so severe that it should delay the next release, that fact needs to be identified early, and the tracker should have some way of noting it.

  5. The issue gets scheduled for resolution. Scheduling doesn't necessarily mean naming a date by which it will be fixed. Sometimes it just means deciding which future release (not necessarily the next one) the bug should be fixed by, or deciding that it need not block any particular release. Scheduling may also be dispensed with, if the bug is quick to fix.

  6. The bug gets fixed (or the task completed, or the patch applied, or whatever). The change or set of changes that fixed it should be recorded in a comment in the issue, after which the issue is closed and/or marked as resolved.
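
You can model the life cycle above as a small state machine. The state names and allowed transitions below are a simplified assumption. For instance, scheduling can be skipped for quick fixes, as step 5 notes. Real trackers use their own states and usually add more, such as those needed for the variations described next:

```python
from enum import Enum


class State(Enum):
    OPEN = "open"              # filed; unverified and unassigned
    REPRODUCED = "reproduced"  # someone besides the filer made it happen
    DIAGNOSED = "diagnosed"    # cause identified, effort estimated if possible
    SCHEDULED = "scheduled"    # target release chosen (or "blocks nothing")
    RESOLVED = "resolved"      # fixed; the fixing change is recorded


# Assumed transitions for the basic cycle; scheduling may be skipped.
ALLOWED = {
    State.OPEN: {State.REPRODUCED},
    State.REPRODUCED: {State.DIAGNOSED},
    State.DIAGNOSED: {State.SCHEDULED, State.RESOLVED},
    State.SCHEDULED: {State.RESOLVED},
    State.RESOLVED: set(),
}


class Issue:
    def __init__(self, summary):
        self.summary = summary
        self.state = State.OPEN
        self.history = []  # one note per step, so others can pick up the work

    def advance(self, new_state, note):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append((new_state, note))
```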

There are some common variations on this life cycle. Sometimes an issue is closed very soon after being filed, because it turns out not to be a bug at all, but rather a misunderstanding on the part of the user. As a project acquires more users, more and more such invalid issues will come in, and developers will close them with increasingly short-tempered responses. Try to guard against the latter tendency. It does no one any good, as the individual user in each case is not responsible for all the previous invalid issues; the statistical trend is visible only from the developers' point of view, not the user's. (In the section called "Pre-Filtering the Bug Tracker" later in this chapter, we'll look at techniques for reducing the number of invalid issues.) Also, if different users are experiencing the same misunderstanding over and over, it might mean that that aspect of the software needs to be redesigned. This sort of pattern is easiest to notice when there is an issue manager monitoring the bug database; see the section called "Issue Manager" in Chapter 8, Managing Volunteers.

Another common life cycle variation is for the issue to be closed as a duplicate soon after Step 1. A duplicate is when someone files an issue that's already known to the project. Duplicates are not confined to open issues: it's possible for a bug to come back after having been fixed (this is known as a regression), in which case the preferred course is usually to reopen the original issue and close any new reports as duplicates of the original one. The bug tracking system should keep track of this relationship bidirectionally, so that reproduction information in the duplicates is available to the original issue, and vice versa.
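
Keeping the duplicate relationship bidirectional is mostly a matter of storing both directions of the link. In this hypothetical sketch, marking a report as a duplicate always points it at the root original. That way, reproduction details from every report in the group can be gathered from any of its members. The class and method names are invented:

```python
from collections import defaultdict


class DuplicateIndex:
    """Hypothetical bidirectional duplicate links between issue numbers."""

    def __init__(self):
        self.original_of = {}                  # duplicate -> root original
        self.duplicates_of = defaultdict(set)  # root original -> duplicates

    def mark_duplicate(self, dup, original):
        # Follow the chain so that every duplicate points at the root issue.
        # (Re-pointing an issue that already has duplicates is not handled.)
        root = self.original_of.get(original, original)
        self.original_of[dup] = root
        self.duplicates_of[root].add(dup)

    def related(self, issue):
        """All issues whose reproduction info is relevant to `issue`."""
        root = self.original_of.get(issue, issue)
        return {root} | self.duplicates_of[root]
```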

A third variation is for the developers to close the issue, thinking they have fixed it, only to have the original reporter reject the fix and reopen it. This is usually because the developers simply don't have access to the environment necessary to reproduce the bug, or because they didn't test the fix using the exact same reproduction recipe as the reporter.

Aside from these variations, there may be other small details of the life cycle that vary depending on the tracking software. But the basic shape is the same, and while the life cycle itself is not specific to open source software, it has implications for how open source projects use their bug trackers.

As Step 1 implies, the tracker is as much a public face of the project as the mailing lists or web pages. Anyone may file an issue, anyone may look at an issue, and anyone may browse the list of currently open issues. It follows that you never know how many people are waiting to see progress on a given issue. While the size and skill of the development community constrain the rate at which issues can be resolved, the project should at least try to acknowledge each issue the moment it appears. Even if the issue lingers for a while, a response encourages the reporter to stay involved, because she feels that a human has registered what she has done (remember that filing an issue usually involves more effort than, say, posting an email). Furthermore, once an issue is seen by a developer, it enters the project's consciousness, in the sense that that developer can be on the lookout for other instances of the issue, can talk about it with other developers, etc.

The need for timely reactions implies two things:

  • The tracker must be connected to a mailing list, such that every change to an issue, including its initial filing, causes a mail to go out describing what happened. This mailing list is usually different from the regular development list, since not all developers want to receive automated bug mails, but (just as with commit mails) the Reply-To header should be set to the development mailing list.

  • The form for filing issues should capture the reporter's email address, so she can be contacted for more information. (However, it should not require the reporter's email address, as some people prefer to report issues anonymously. See the section titled "Anonymity and involvement" later in this chapter for more on the importance of anonymity.)
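A minimal sketch of the first point, using Python's standard `email.message.EmailMessage`: the notification is addressed to the bug-mail list, but its `Reply-To` header sends follow-ups to the development list. All addresses and the `issue_notification` helper are hypothetical, and actually sending the message (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage

# Hypothetical addresses; substitute your project's real lists.
TRACKER_ADDRESS = "tracker@example.org"
ISSUES_LIST = "issues@example.org"  # receives every automated issue mail
DEV_LIST = "dev@example.org"        # where human follow-ups should go


def issue_notification(issue_id: int, summary: str, change: str) -> EmailMessage:
    """Build the mail sent out when an issue is filed or changed."""
    msg = EmailMessage()
    msg["From"] = TRACKER_ADDRESS
    msg["To"] = ISSUES_LIST
    # As with commit mails, replies go to the development list.
    msg["Reply-To"] = DEV_LIST
    msg["Subject"] = f"[Issue {issue_id}] {summary}"
    msg.set_content(change)
    return msg


mail = issue_notification(1234, "Crash on empty config file", "New issue filed.")
print(mail["Reply-To"])  # dev@example.org
```

Keeping the bug mail on its own list lets developers opt out of the automated traffic, while the `Reply-To` header keeps any resulting discussion where the whole team will see it.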

Interaction with Mailing Lists

Make sure the bug tracker doesn't turn into a discussion forum. Although it is important to maintain a human presence in the bug tracker, it is not fundamentally suited to real-time discussion. Think of it rather as an archiver, a way to organize facts and references to other discussions, primarily those that take place on mailing lists.

There are two reasons to make this distinction. First, the bug tracker is more cumbersome to use than the mailing lists (or than real-time chat forums, for that matter). This is not because bug trackers have bad user interface design; it's just that their interfaces were designed for capturing and presenting discrete states, not free-flowing discussions. Second, not everyone who should be involved in discussing a given issue is necessarily watching the bug tracker. Part of good issue management (see the section titled "Share Management Tasks as Well as Technical Tasks" in Chapter 8, Managing Volunteers) is to make sure each issue is brought to the right people's attention, rather than requiring every developer to monitor all issues. In the section titled "No Conversations in the Bug Tracker" in Chapter 6, Communications, we'll look at ways to make sure people don't accidentally siphon discussions out of appropriate forums and into the bug tracker.

Some bug trackers can monitor mailing lists and automatically log all emails that are about a known issue. Typically they do this by recognizing the issue's identifying number in the subject line of the mail, as part of a special string; developers learn to include these strings in their mails to attract the tracker's notice. The bug tracker may either save the entire email, or (even better) just record a link to the mail in the regular mailing list archive. Either way, this is a very useful feature; if your tracker has it, make sure both to turn it on and to remind people to take advantage of it.
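To make the mechanism concrete, here is a small Python sketch of how a tracker might spot issue references in a subject line and record a link to the archived mail rather than storing the whole message. The "issue #NNNN" convention, the function names, and the archive URL are assumptions for illustration; each tracker defines its own special string.

```python
import re

# Hypothetical convention: mails mention issues as "issue #1234".
# The leading \b stops words such as "Reissue" from matching.
ISSUE_REF = re.compile(r"\bissue\s*#(\d+)\b", re.IGNORECASE)


def issues_mentioned(subject: str) -> list[int]:
    """Return referenced issue numbers in order of appearance, without duplicates."""
    found: list[int] = []
    for match in ISSUE_REF.finditer(subject):
        number = int(match.group(1))
        if number not in found:
            found.append(number)
    return found


def archive_links(subject: str, archive_url: str) -> dict[int, str]:
    """Map each referenced issue to the archive URL of the mail, for the tracker to store."""
    return {number: archive_url for number in issues_mentioned(subject)}
```

For example, `issues_mentioned("Re: issue #1234 crash; see also Issue #99")` returns `[1234, 99]`. Storing only the archive URL, as the text recommends, keeps the tracker light and sends readers to the full thread in context.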

Pre-Filtering the Bug Tracker

Most issue databases eventually suffer from the same problem: a crushing load of duplicate or invalid issues filed by well-meaning but inexperienced or ill-informed users. The first step in combatting this trend is usually to put a prominent notice on the front page of the bug tracker, explaining how to tell if a bug is really a bug, how to search to see if it's already been filed, and finally, how to effectively report it if one still thinks it's a new bug.

This will reduce the noise level for a while, but as the number of users increases, the problem will eventually come back. No individual user can be blamed for it. Each one is just trying to contribute to the project's well-being, and even if their first bug report isn't helpful, you still want to encourage them to stay involved and file better issues in the future. In the meantime, though, the project needs to keep the issue database as free of junk as possible.

The two things that will do the most to prevent this problem are: making sure there are people watching the bug tracker who have enough knowledge to close issues as invalid or duplicates the moment they come in, and requiring (or strongly encouraging) users to confirm their bugs with other people before filing them in the tracker.

The first technique seems to be used universally. Even projects with huge issue databases (say, the Debian bug tracker at http://bugs.debian.org/, which contained 315,929 issues as of this writing) still arrange things so that someone sees each issue that comes in. It may be a different person depending on the category of the issue. For example, the Debian project is a collection of software packages, so Debian automatically routes each issue to the appropriate package maintainers. Of course, users can sometimes misidentify an issue's category, with the result that the issue is sent to the wrong person initially, who may then have to reroute it. However, the important thing is that the burden is still shared—whether the user guesses right or wrong when filing, issue watching is still distributed more or less evenly among the developers, so each issue is able to receive a timely response.

The second technique is less widespread, probably because it's harder to automate. The essential idea is that every new issue gets "buddied" into the database. When a user thinks he's found a problem, he is asked to describe it on one of the mailing lists, or in an IRC channel, and get confirmation from someone that it is indeed a bug. Bringing in that second pair of eyes early can prevent a lot of spurious reports. Sometimes the second party is able to identify that the behavior is not a bug, or is fixed in recent releases. Or she may be familiar with the symptoms from a previous issue, and can prevent a duplicate filing by pointing the user to the older issue. Often it's enough just to ask the user "Did you search the bug tracker to see if it's already been reported?" Many people simply don't think of that, yet are happy to do the search once they know someone's expecting them to.

The buddy system can really keep the issue database clean, but it has some disadvantages too. Many people will file solo anyway, either through not seeing, or through disregarding, the instructions to find a buddy for new issues. Thus it is still necessary for volunteers to watch the issue database. Furthermore, because most new reporters don't understand how difficult the task of maintaining the issue database is, it's not fair to chide them too harshly for ignoring the guidelines. Thus the volunteers must be vigilant, and yet exercise restraint in how they bounce unbuddied issues back to their reporters. The goal is to train each reporter to use the buddying system in the future, so that there is an ever-growing pool of people who understand the issue-filtering system. On seeing an unbuddied issue, the ideal steps are:

  1. Immediately respond to the issue, politely thanking the user for filing, but pointing them to the buddying guidelines (which should, of course, be prominently posted on the web site).

  2. If the issue is clearly valid and not a duplicate, approve it anyway, and start it down the normal life cycle. After all, the reporter's now been informed about buddying, so there's no point wasting the work done so far by closing a valid issue.

  3. Otherwise, if the issue is not clearly valid, close it, but ask the reporter to reopen it if they get confirmation from a buddy. When they do, they should include a reference to the confirmation thread (e.g., a URL into the mailing list archives).

Remember that although this system will improve the signal/noise ratio in the issue database over time, it will never completely stop the misfilings. The only way to prevent misfilings entirely is to close off the bug tracker to everyone but developers—a cure that is almost always worse than the disease. It's better to accept that cleaning out invalid issues will always be part of the project's routine maintenance, and to try to get as many people as possible to help.

See also the section titled "Issue Manager" in Chapter 8, Managing Volunteers.

IRC / Real-Time Chat Systems

Many projects offer real-time chat rooms using Internet Relay Chat (IRC), forums where users and developers can ask each other questions and get instant responses. While you can run an IRC server from your own web site, it is generally not worth the hassle. Instead, do what everyone else does: run your IRC channels at Freenode (http://freenode.net/). Freenode gives you the control you need to administer your project's IRC channels,[16] while sparing you the not-insignificant trouble of maintaining an IRC server yourself.

The first thing to do is choose a channel name. The most obvious choice is the name of your project; if that's available at Freenode, use it. If not, try to choose something as close to your project's name, and as easy to remember, as possible. Advertise the channel's availability from your project's web site, so a visitor with a quick question will see it right away. For example, this appears in a prominently placed box at the top of Subversion's home page:

If you're using Subversion, we recommend that you join the users@subversion.tigris.org mailing list, and read the Subversion Book and FAQ. You can also ask questions on IRC at irc.freenode.net channel #svn.

Some projects have multiple channels, one per subtopic. For example, one channel for installation problems, another for usage questions, another for development chat, etc. (the section titled "Handling Growth" in Chapter 6, Communications, discusses when and how to divide into multiple channels). When your project is young, there should only be one channel, with everyone talking together. Later, as the user-to-developer ratio increases, separate channels may become necessary.

How will people know all the available channels, let alone which channel to talk in? And when they talk, how will they know what the local conventions are?

The answer is to tell them by setting the channel topic.[17] The channel topic is a brief message each user sees when they first enter the channel. It gives quick guidance to newcomers, and pointers to further information. For example:

You are now talking on #svn

Topic for #svn is Forum for Subversion user questions, see also
http://subversion.tigris.org/. || Development discussion happens in
#svn-dev. || Please don't paste long transcripts here, instead use
a pastebin site like http://pastebin.ca/. || NEWS: Subversion 1.1.0
is released, see http://svn110.notlong.com/ for details.

That's terse, but it tells newcomers what they need to know. It says exactly what the channel is for, gives the project home page (in case someone wanders into the channel without having first been to the project web site), mentions a related channel, and gives some guidance about pasting.
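Behind the scenes, an IRC client turns `/topic` into the protocol's `TOPIC` message. The Python sketch below composes that raw line from separate segments joined with "||", as in the #svn topic above. It follows the message syntax of RFC 1459/2812 (the topic is a trailing parameter introduced by a colon, the line ends in CR LF, and a message may not exceed 512 characters); the helper name is hypothetical, and real servers usually enforce a shorter topic length of their own.

```python
SEPARATOR = " || "  # separator style used in the #svn topic above
MAX_LINE = 512      # RFC 1459/2812 limit, counting the trailing CR LF


def topic_command(channel: str, parts: list[str]) -> str:
    """Return a raw IRC TOPIC line built from the given topic segments."""
    topic = SEPARATOR.join(part.strip() for part in parts)
    if any(ch in topic for ch in "\r\n\0"):
        raise ValueError("a topic cannot contain CR, LF or NUL")
    # The topic is the trailing parameter, so it is prefixed with ':'.
    line = f"TOPIC {channel} :{topic}\r\n"
    # Measured in UTF-8 bytes to stay on the safe side of the limit.
    if len(line.encode("utf-8")) > MAX_LINE:
        raise ValueError("topic too long for a single IRC message")
    return line


line = topic_command("#svn", [
    "Forum for Subversion user questions",
    "Development discussion happens in #svn-dev.",
])
```

Keeping the segments separate makes it easy to swap out a single item, such as the NEWS entry, without retyping the whole topic.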

Bots

Many technically-oriented IRC channels have a non-human member, a so-called bot, that is capable of storing and regurgitating information in response to specific commands. Typically, the bot is addressed just like any other member of the channel, that is, the commands are delivered by "speaking to" the bot. For example:

<kfogel> ayita: learn diff-cmd = http://subversion.tigris.org/faq.html#diff-cmd
<ayita>  Thanks!

That told the bot (who is logged into the channel as ayita) to remember a certain URL as the answer to the query "diff-cmd". Now we can address ayita, asking the bot to tell another user about diff-cmd:

<kfogel> ayita: tell jrandom about diff-cmd
<ayita>  jrandom: http://subversion.tigris.org/faq.html#diff-cmd

The same thing can be accomplished via a convenient shorthand:

<kfogel> !a jrandom diff-cmd
<ayita>  jrandom: http://subversion.tigris.org/faq.html#diff-cmd

The exact command set and behaviors differ from bot to bot. The above example is with ayita (http://hix.nu/svn-public/alexis/trunk/), of which there is usually an instance running in #svn at freenode. Other bots include Dancer (http://dancer.sourceforge.net/) and Supybot (http://supybot.com/). Note that no special server privileges are required to run a bot. A bot is a client program; anyone can set one up and direct it to listen to a particular server/channel.
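The core of such a bot is just a key/value store plus a command parser. The Python sketch below handles the three commands from the transcripts above; it is not ayita's code, its reply for an unknown key is invented, and the network side (connecting to the server and joining the channel, which frameworks such as Supybot take care of) is left out. Each call to `handle` takes one line of channel text and returns the bot's reply, or `None` to stay quiet.

```python
import re
from typing import Optional


class FactoidBot:
    """A toy factoid store answering 'learn', 'tell ... about', and '!a' commands."""

    def __init__(self, nick: str) -> None:
        self.nick = nick
        self.facts: dict[str, str] = {}

    def handle(self, text: str) -> Optional[str]:
        """Return the reply to one line of channel text, or None."""
        # Shorthand form: "!a <nick> <key>"
        shorthand = re.fullmatch(r"!a\s+(\S+)\s+(\S+)", text.strip())
        if shorthand:
            return self._tell(*shorthand.groups())
        # Other commands must be addressed to the bot: "<botnick>: <command>"
        prefix = f"{self.nick}:"
        if not text.startswith(prefix):
            return None
        command = text[len(prefix):].strip()
        learn = re.fullmatch(r"learn\s+(\S+)\s*=\s*(.+)", command)
        if learn:
            key, value = learn.groups()
            self.facts[key] = value.strip()
            return "Thanks!"
        tell = re.fullmatch(r"tell\s+(\S+)\s+about\s+(\S+)", command)
        if tell:
            return self._tell(*tell.groups())
        return None

    def _tell(self, target: str, key: str) -> str:
        """Address the stored answer for `key` to the user `target`."""
        answer = self.facts.get(key)
        if answer is None:
            return f"{target}: sorry, I don't know about {key}"
        return f"{target}: {answer}"
```

Usage mirrors the transcript: after `bot = FactoidBot("ayita")` and `bot.handle("ayita: learn diff-cmd = http://subversion.tigris.org/faq.html#diff-cmd")`, both `bot.handle("ayita: tell jrandom about diff-cmd")` and `bot.handle("!a jrandom diff-cmd")` return `"jrandom: http://subversion.tigris.org/faq.html#diff-cmd"`.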

If your channel tends to get the same questions over and over, I highly recommend setting up a bot. Only a small percentage of channel users will acquire the expertise needed to manipulate the bot, but those users will answer a disproportionately high percentage of questions, because the bot enables them to respond so much more efficiently.

Archiving IRC

Although it is possible to archive everything that happens in an IRC channel, it's not necessarily expected. IRC conversations may be nominally public, but many people think of them as informal, semi-private conversations. Users may be careless with grammar, and often express opinions (for example, about other software or other programmers) that they wouldn't want preserved forever in an online archive.

Of course, there will sometimes be excerpts that should be preserved, and that's fine. Most IRC clients can log a conversation to a file at the user's request, or failing that, one can always just cut and paste the conversation from IRC into a more permanent forum (most often the bug tracker). But indiscriminate logging may make some users uneasy. If you do archive everything, make sure you state so clearly in the channel topic, and give a URL to the archive.

Wikis

A wiki is a web site that allows any visitor to edit or extend its content; the term "wiki" (from a Hawaiian word meaning "quick" or "super-fast") is also used to refer to the software that enables such editing. Wikis were invented in 1995, but their popularity only really took off around 2000 or 2001, boosted partly by the success of Wikipedia (http://www.wikipedia.org/), a wiki-based free-content encyclopedia. Think of a wiki as falling somewhere between IRC and web pages: wikis don't happen in real time, so people get a chance to ponder and polish their contributions, but they are also very easy to add to, involving less interface overhead than editing a regular web page.

Wikis are not yet standard equipment for open source projects, but they probably will be soon. As they are relatively new technology, and people are still experimenting with different ways of using them, I will just offer a few words of caution here—at this stage, it's easier to analyze misuses of wikis than to analyze their successes.

If you decide to run a wiki, put a lot of effort into having a clear page organization and pleasing visual layout, so that visitors (i.e., potential editors) will instinctively know how to fit in their contributions. Equally important, post those standards on the wiki itself, so people have somewhere to go for guidance. Too often, wiki administrators fall victim to the fantasy that because hordes of visitors are individually adding high quality content to the site, the sum of all these contributions must therefore also be of high quality. That's not how web sites work. Each individual page or paragraph may be good when considered by itself, but it will not be good if embedded in a disorganized or confusing whole. Too often, wikis suffer from:

  • Lack of navigational principles. A well-organized web site makes visitors feel like they know where they are at any time. For example, if the pages are well-designed, people can intuitively tell the difference between a "table of contents" region and a "content" region. Contributors to a wiki will respect such differences too, but only if the differences are present to begin with.

  • Duplication of information. Wikis frequently end up with different pages saying similar things, because the individual contributors did not notice the duplications. This can be partly a consequence of the lack of navigational principles noted above, in that people may not find the duplicate content if it is not where they expect it to be.

  • Inconsistent target audience. To some degree this problem is inevitable when there are so many authors, but it can be lessened if there are written guidelines about how to create new content. It also helps to aggressively edit new contributions at the beginning, as an example, so that the standards start to sink in.

The common solution to all these problems is the same: have editorial standards, and demonstrate them not only by posting them, but by editing pages to adhere to them. In general, wikis will amplify any failings in their original material, since contributors imitate whatever patterns they see in front of them. Don't just set up the wiki and hope everything falls into place. You must also prime it with well-written content, so people have a template to follow.

The shining example of a well-run wiki is Wikipedia, though this may be partly because the content (encyclopedia entries) is naturally well-suited to the wiki format. But if you examine Wikipedia closely, you'll see that its administrators laid a very thorough foundation for cooperation. There is extensive documentation on how to write new entries, how to maintain an appropriate point of view, what sorts of edits to make, what edits to avoid, a dispute resolution process for contested edits (involving several stages, including eventual arbitration), and so forth. They also have authorization controls, so that if a page is the target of repeated inappropriate edits, they can lock it down until the problem is resolved. In other words, they didn't just throw some templates onto a web site and hope for the best. Wikipedia works because its founders thought carefully about how to get thousands of strangers to tailor their writing to a common vision. While you may not need the same level of preparedness to run a wiki for a free software project, the spirit is worth emulating.

For more information about wikis, see http://en.wikipedia.org/wiki/Wiki. Also, the first wiki remains alive and well, and contains a lot of discussion about running wikis: see http://www.c2.com/cgi/wiki?WelcomeVisitors, http://www.c2.com/cgi/wiki?WhyWikiWorks, and http://www.c2.com/cgi/wiki?WhyWikiWorksNot for various points of view.

Web Site

There is not much to say about setting up the project web site from a technical point of view: setting up a web server and writing web pages are fairly simple tasks, and most of the important things to say about layout and arrangement were covered in the previous chapter. The web site's main function is to present a clear and welcoming overview of the project, and to bind together the other tools (the version control system, bug tracker, etc.). If you don't have the expertise to set up a web server yourself, it's usually not hard to find someone who does and is willing to help out. Nonetheless, to save time and effort, people often prefer to use one of the canned hosting sites.

Canned Hosting

There are two main advantages to using a canned site. The first is server capacity and bandwidth: their servers are beefy boxes sitting on really fat pipes. No matter how successful your project gets, you're not going to run out of disk space or swamp the network connection. The second advantage is simplicity. They have already chosen a bug tracker, a version control system, a mailing list manager, an archiver, and everything else you need to run a site. They've configured the tools, and are taking care of backups for all the data stored in the tools. You don't need to make many decisions. All you have to do is fill in a form, press a button, and suddenly you've got a project web site.

These are pretty significant benefits. The disadvantage, of course, is that you must accept their choices and configurations, even if something different would be better for your project. Usually canned sites are adjustable within certain narrow parameters, but you will never get the fine-grained control you would have if you set up the site yourself and had full administrative access to the server.

A perfect example of this is the handling of generated files. Certain project web pages may be generated files; for example, there are systems for keeping FAQ data in an easy-to-edit master format, from which HTML, PDF, and other presentation formats can be generated. As explained in the section titled "Version everything" earlier in this chapter, you wouldn't want to version the generated formats, only the master file. But when your web site is hosted on someone else's server, it may be impossible to set up a custom hook to regenerate the online HTML version of the FAQ whenever the master file is changed. The only workaround is to version the generated formats too, so that they show up on the web site.
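On a server you control, such a hook is short. The sketch below is a Subversion post-commit hook written in Python: Subversion runs it with the repository path and the new revision number, it asks `svnlook changed` which paths that revision touched, and it triggers a rebuild only when the FAQ master file is among them. The master-file path is hypothetical and the actual regeneration step is a placeholder. This is exactly the kind of script a canned hosting site may not let you install.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Hypothetical location of the FAQ master file inside the repository.
FAQ_MASTER = "trunk/www/faq.xml"


def changed_paths(svnlook_output: str) -> list[str]:
    """Extract paths from `svnlook changed` output (status columns, then the path)."""
    paths = []
    for line in svnlook_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            paths.append(parts[1])
    return paths


def needs_faq_rebuild(svnlook_output: str, master: str = FAQ_MASTER) -> bool:
    """True if the committed revision touched the FAQ master file."""
    return master in changed_paths(svnlook_output)


def main(repos: str, rev: str) -> None:
    output = subprocess.run(
        ["svnlook", "changed", repos, "-r", rev],
        capture_output=True, text=True, check=True,
    ).stdout
    if needs_faq_rebuild(output):
        # Placeholder: invoke whatever tool turns the master file into HTML/PDF.
        print(f"r{rev} touched {FAQ_MASTER}; regenerate the FAQ here")


if __name__ == "__main__":
    # Subversion passes the repository path and revision to post-commit hooks.
    main(sys.argv[1], sys.argv[2])
```

Only the master file lives in version control; the HTML and PDF are rebuilt on the server, which is the arrangement the text recommends when you have full administrative access.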

There can be larger consequences as well. You may not have as much control over presentation as you would wish. Some of the canned hosting sites allow you to customize your web pages, but the site's default layout usually ends up showing through in various awkward ways. For example, some projects that host themselves at SourceForge have completely customized home pages, but still point developers to their "SourceForge page" for more information. The SourceForge page is what would be the project's home page, had the project not used a custom home page. The SourceForge page has links to the bug tracker, the CVS repository, downloads, etc. Unfortunately, a SourceForge page also contains a great deal of extraneous noise. The top is a banner ad, often an animated image. The left side is a vertical arrangement of links of little relevance to someone interested in the project. The right side is often another advertisement. Only the center of the page is devoted to truly project-specific material, and even that is arranged in a confusing way that often makes visitors unsure of what to click on next.

Behind every individual aspect of SourceForge's design, there is no doubt a good reason—good from SourceForge's point of view, such as the advertisements. But from an individual project's point of view, the result can be a less-than-ideal web page. I don't mean to pick on SourceForge; similar concerns apply to many of the canned hosting sites. The point is that there's a tradeoff. You get relief from the technical burdens of running a project site, but only at the price of accepting someone else's way of running it.

Only you can decide whether canned hosting is best for your project. If you choose a canned site, leave open the option of switching to your own servers later, by using a custom domain name for the project's "home address". You can forward the URL to the canned site, or have a fully customized home page at the public URL and hand users off to the canned site for sophisticated functionality. Just make sure to arrange things such that if you later decide to use a different hosting solution, the project's address doesn't need to change.

Choosing a canned hosting site

The largest and most well-known hosting site is SourceForge. Two other sites providing the same or similar services are savannah.gnu.org and BerliOS.de. A few organizations, such as the Apache Software Foundation and Tigris.org[18], give free hosting to open source projects that fit well with their missions and their community of existing projects.

Haggen So did a thorough evaluation of various canned hosting sites, as part of the research for his Ph.D. thesis, Construction of an Evaluation Model for Free/Open Source Project Hosting (FOSPHost) sites. The results are at http://www.ibiblio.org/fosphost/; see especially the very readable comparison chart at http://www.ibiblio.org/fosphost/exhost.htm.

Anonymity and involvement

A problem that is not strictly limited to the canned sites, but is most often found there, is the abuse of user login functionality. The functionality itself is simple enough: the site allows each visitor to register herself with a username and password. From then on it keeps a profile for that user, and project administrators can assign the user certain permissions, for example, the right to commit to the repository.

This can be extremely useful, and in fact it's one of the prime advantages of canned hosting. The problem is that sometimes user login ends up being required for tasks that ought to be permitted to unregistered visitors, specifically the ability to file issues in the bug tracker, and to comment on existing issues. By requiring a logged-in username for such actions, the project raises the involvement bar for what should be quick, convenient tasks. Of course, one wants to be able to contact someone who's entered data into the issue tracker, but having a field where she can enter her email address (if she wants to) is sufficient. If a new user spots a bug and wants to report it, she'll only be annoyed at having to fill out an account creation form before she can enter the bug into the tracker. She may simply decide not to file the bug at all.

The advantages of user management generally outweigh the disadvantages. But if you can choose which actions can be done anonymously, make sure not only that all read-only actions are permitted to non-logged-in visitors, but also some data entry actions, especially in the bug tracker and, if you have them, wiki pages.
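One way to express that advice is as an explicit policy table. The Python sketch below is purely hypothetical (canned sites expose such settings through their own administration pages, not through code like this): read-only actions and low-barrier data entry are open to anonymous visitors, while commit access and administration still require a login plus an explicit grant.

```python
from enum import Enum


class Action(Enum):
    VIEW_ISSUE = "view issue"
    SEARCH_ISSUES = "search issues"
    FILE_ISSUE = "file issue"
    COMMENT_ON_ISSUE = "comment on issue"
    EDIT_WIKI_PAGE = "edit wiki page"
    COMMIT = "commit"
    ADMINISTER = "administer project"


# Hypothetical policy: all read-only actions plus quick data entry need no account.
ANONYMOUS_OK = {
    Action.VIEW_ISSUE,
    Action.SEARCH_ISSUES,
    Action.FILE_ISSUE,
    Action.COMMENT_ON_ISSUE,
    Action.EDIT_WIKI_PAGE,
}


def allowed(action: Action, logged_in: bool, granted: frozenset = frozenset()) -> bool:
    """Anyone may perform open actions; others need a login and an explicit grant."""
    if action in ANONYMOUS_OK:
        return True
    return logged_in and action in granted
```

The design choice is that the open set is listed explicitly, so adding a new privileged action never accidentally exposes it to anonymous visitors.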



[13] Shortly after this book was published, Michael Bernstein wrote to tell me: "Mutt isn't the only mail client that offers a 'Reply to List' function. Evolution, for example, provides it through a keyboard shortcut (Ctrl+L), though not through a button."

[15] For a different opinion on the question of versioning configure files, see Alexey Makhotkin's post "configure.in and version control" at http://versioncontrolblog.com/2007/01/08/configurein-and-version-control/.

[16] There is no requirement or expectation that you donate to Freenode, but if you or your project can afford it, please consider a contribution. They are a tax-exempt charity in the U.S., and they perform a valuable service.

[17] To set a channel topic, use the /topic command. All commands in IRC start with "/". See http://www.irchelp.org/ if you're not familiar with IRC usage and administration; in particular, http://www.irchelp.org/irchelp/irctutorial.html is an excellent tutorial.

[18] Disclaimer: I am employed by CollabNet, which sponsors Tigris.org, and I use Tigris regularly.

Chapter 4. Social and Political Infrastructure

The first questions people usually ask about free software are "How does it work? What keeps a project running? Who makes the decisions?" I'm always dissatisfied with bland responses about meritocracy, the spirit of cooperation, code speaking for itself, etc. The fact is, the question is not easy to answer. Meritocracy, cooperation, and running code are all part of it, but they do little to explain how projects actually run on a day-to-day basis, and say nothing about how conflicts are resolved.

This chapter tries to show the structural underpinnings successful projects have in common. I mean "successful" not just in terms of technical quality, but also operational health and survivability. Operational health is the project's ongoing ability to incorporate new code contributions and new developers, and to be responsive to incoming bug reports. Survivability is the project's ability to exist independently of any individual participant or sponsor—think of it as the likelihood that the project would continue even if all of its founding members were to move on to other things. Technical success is not hard to achieve, but without a robust developer base and social foundation, a project may be unable to handle the growth that initial success brings, or the departure of charismatic individuals.

There are various ways to achieve this kind of success. Some involve a formal governance structure, by which debates are resolved, new developers are invited in (and sometimes out), new features planned, and so on. Others involve less formal structure, but more conscious self-restraint, to produce an atmosphere of fairness that people can rely on as a de facto form of governance. Both ways lead to the same result: a sense of institutional permanence, supported by habits and procedures that are well understood by everyone who participates. These features are even more important in self-organizing systems than in centrally-controlled ones, because in self-organizing systems, everyone is conscious that a few bad apples can spoil the whole barrel, at least for a while.

Forkability

The indispensable ingredient that binds developers together on a free software project, and makes them willing to compromise when necessary, is the code's forkability: the ability of anyone to take a copy of the source code and use it to start a competing project, known as a fork. The paradoxical thing is that the possibility of forks is usually a much greater force in free software projects than actual forks, which are very rare. Because a fork is bad for everyone (for reasons examined in detail in the section titled "Forks" in Chapter 8, Managing Volunteers), the more serious the threat of a fork becomes, the more willing people are to compromise to avoid it.

Forks, or rather the potential for forks, are the reason there are no true dictators in free software projects. This may seem like a surprising claim, considering how common it is to hear someone called the "dictator" or "tyrant" in a given open source project. But this kind of tyranny is special, quite different from the conventional understanding of the word. Imagine a king whose subjects could copy his entire kingdom at any time and move to the copy to rule as they see fit. Would not such a king govern very differently from one whose subjects were bound to stay under his rule no matter what he did?

This is why even projects that are not formally organized as democracies are, in practice, democracies when it comes to important decisions. Replicability implies forkability; forkability implies consensus. It may well be that everyone is willing to defer to one leader (the most famous example being Linus Torvalds in Linux kernel development), but this is because they choose to do so, in an entirely non-cynical and non-sinister way. The dictator has no magical hold over the project. A key property of all open source licenses is that they do not give one party more power than any other in deciding how the code can be changed or used. If the dictator were to suddenly start making bad decisions, there would be restlessness, followed eventually by revolt and a fork. Except, of course, things rarely get that far, because the dictator compromises first.

But just because forkability puts an upper limit on how much power anyone can exert in a project doesn't mean there aren't important differences in how projects are governed. You don't want every decision to come down to the last-resort question of who is considering a fork. That would get tiresome very quickly, and sap energy away from real work. The next two sections examine different ways to organize projects such that most decisions go smoothly. These two examples are somewhat idealized extremes; many projects fall somewhere along a continuum between them.

Benevolent Dictators

The benevolent dictator model is exactly what it sounds like: final decision-making authority rests with one person, who, by virtue of personality and experience, is expected to use it wisely.

Although "benevolent dictator" (or BD) is the standard term for this role, it would be better to think of it as "community-approved arbitrator" or "judge". Generally, benevolent dictators do not actually make all the decisions, or even most of the decisions. It's unlikely that one person could have enough expertise to make consistently good decisions across all areas of the project, and anyway, quality developers won't stay around unless they have some influence on the project's direction. Therefore, benevolent dictators commonly do not dictate much. Instead, they let things work themselves out through discussion and experimentation whenever possible. They participate in those discussions themselves, but as regular developers, often deferring to an area maintainer who has more expertise. Only when it is clear that no consensus can be reached, and that most of the group wants someone to guide the decision so that development can move on, do they put their foot down and say "This is the way it's going to be." Reluctance to make decisions by fiat is a trait shared by virtually all successful benevolent dictators; it is one of the reasons they manage to keep the role.

Who Can Be a Good Benevolent Dictator?

Being a BD requires a combination of traits. It needs, first of all, a well-honed sensitivity to one's own influence in the project, which in turn brings self-restraint. In the early stages of a discussion, one should not express opinions and conclusions with so much certainty that others feel like it's pointless to dissent. People must be free to air ideas, even stupid ideas. It is inevitable that the BD will post a stupid idea from time to time too, of course, and therefore the role also requires an ability to recognize and acknowledge when one has made a bad decision—though this is simply a trait that any good developer should have, especially if she stays with the project a long time. But the difference is that the BD can afford to slip from time to time without worrying about long-term damage to her credibility. Developers with less seniority may not feel so secure, so the BD should phrase critiques or contrary decisions with some sensitivity for how much weight her words carry, both technically and psychologically.

The BD does not need to have the sharpest technical skills of anyone in the project. She must be skilled enough to work on the code herself, and to understand and comment on any change under consideration, but that's all. The BD position is neither acquired nor held by virtue of intimidating coding skills. What is important is experience and overall design sense—not necessarily the ability to produce good design on demand, but the ability to recognize good design, whatever its source.

It is common for the benevolent dictator to be a founder of the project, but this is more a correlation than a cause. The sorts of qualities that make one able to successfully start a project—technical competence, ability to persuade other people to join, etc.—are exactly the qualities any BD would need. And of course, founders start out with a sort of automatic seniority, which can often be enough to make benevolent dictatorship appear the path of least resistance for all concerned.

Remember that the potential to fork goes both ways. A BD can fork a project just as easily as anyone else, and some have occasionally done so, when they felt that the direction they wanted to take the project was different from where the majority of other developers wanted to go. Because of forkability, it does not matter whether the benevolent dictator has root (system administrator privileges) on the project's main servers or not. People sometimes talk of server control as though it were the ultimate source of power in a project, but in fact it is irrelevant. The ability to add or remove people's commit passwords on one particular server affects only the copy of the project that resides on that server. Prolonged abuse of that power, whether by the BD or someone else, would simply lead to development moving to a different server.

Whether your project should have a benevolent dictator, or would run better with some less centralized system, largely depends on who is available to fill the role. As a general rule, if it's simply obvious to everyone who should be the BD, then that's the way to go. But if no candidate for BD is immediately obvious, then the project should probably use a decentralized decision-making process, as described in the next section.

Consensus-based Democracy

As projects get older, they tend to move away from the benevolent dictatorship model and toward more openly democratic systems. This is not necessarily out of dissatisfaction with a particular BD. It's simply that group-based governance is more "evolutionarily stable", to borrow a biological metaphor. Whenever a benevolent dictator steps down, or attempts to spread decision-making responsibility more evenly, it is an opportunity for the group to settle on a new, non-dictatorial system: establish a constitution, as it were. The group may not take this opportunity the first time, or the second, but eventually they will; once they do, the decision is unlikely ever to be reversed. Common sense explains why: if a group of N people were to vest one person with special power, it would mean that N - 1 people were each agreeing to decrease their individual influence. People usually don't want to do that. Even if they did, the resulting dictatorship would still be conditional: since the group anointed the BD, the group could clearly depose the BD. Therefore, once a project has moved from leadership by a charismatic individual to a more formal, group-based system, it rarely moves back.

The details of how these systems work vary widely, but there are two common elements: one, the group works by consensus most of the time; two, there is a formal voting mechanism to fall back on when consensus cannot be reached.

Consensus merely means an agreement that everyone is willing to live with. It is not an ambiguous state: a group has reached consensus on a given question when someone proposes that consensus has been reached, and no one contradicts the assertion. The person proposing consensus should, of course, state specifically what the consensus is, and what actions would be taken in consequence of it, if they're not obvious.

Most conversation in a project is on technical topics, such as the right way to fix a certain bug, whether or not to add a feature, how strictly to document interfaces, etc. Consensus-based governance works well because it blends seamlessly with the technical discussion itself. By the end of a discussion, there is often general agreement on what course to take. Someone will usually make a concluding post, which is simultaneously a summary of what has been decided and an implicit proposal of consensus. This provides a last chance for someone else to say "Wait, I didn't agree to that. We need to hash this out some more."

For small, uncontroversial decisions, the proposal of consensus is implicit. For example, when a developer spontaneously commits a bugfix, the commit itself is a proposal of consensus: "I assume we all agree that this bug needs to be fixed, and that this is the way to fix it." Of course, the developer does not actually say that; she just commits the fix, and the others in the project do not bother to state their agreement, because silence is consent. If someone commits a change that turns out not to have consensus, the result is simply for the project to discuss the change as though it had not already been committed. The reason this works is the topic of the next section.

Version Control Means You Can Relax

The fact that the project's source code is kept under version control means that most decisions can be easily unmade. The most common way this happens is that someone commits a change mistakenly thinking everyone would be happy with it, only to be met with objections after the fact. It is typical for such objections to start out with an obligatory apology for having missed out on prior discussion, though this may be omitted if the objector finds no record of such a discussion in the mailing list archives. Either way, there is no reason for the tone of the discussion to be different after the change has been committed than before. Any change can be reverted, at least until dependent changes are introduced (i.e., new code that would break if the original change were suddenly removed). The version control system gives the project a way to undo the effects of bad or hasty judgement. This, in turn, frees people to trust their instincts about how much feedback is necessary before doing something.

This also means that the process of establishing consensus need not be very formal. Most projects handle it by feel. Minor changes can go in with no discussion, or with minimal discussion followed by a few nods of agreement. For more significant changes, especially ones with the potential to destabilize a lot of code, people should wait a day or two before assuming there is consensus, the rationale being that no one should be marginalized in an important conversation simply because he didn't check email frequently enough.

Thus, when someone is confident he knows what needs to be done, he should just go ahead and do it. This applies not only to software fixes, but to web site updates, documentation changes, and anything else unlikely to be controversial. Usually there will be only a few instances where an action needs to be undone, and these can be handled on a case-by-case basis. Of course, one shouldn't encourage people to be headstrong. There is still a psychological difference between a decision under discussion and one that has already taken effect, even if it is technically reversible. People always feel that momentum is allied to action, and will be slightly more reluctant to revert a change than to prevent it in the first place. If a developer abuses this fact by committing potentially controversial changes too quickly, however, people can and should complain, and hold that developer to a stricter standard until things improve.
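The rule of thumb described in this section (no objections, plus a waiting period scaled to how disruptive the change could be) is informal by design, but it can help to see it stated precisely. The following Python sketch is purely illustrative: no project is assumed to automate consensus this way, the function name `consensus_reached` is hypothetical, and the two-day window is an assumption standing in for "a day or two".

```python
from datetime import datetime, timedelta

# Hypothetical waiting periods. The text only says that minor changes can
# go in with little or no discussion, and that significant ones deserve
# "a day or two"; exactly two days is an assumption.
MINOR_WAIT = timedelta(0)
SIGNIFICANT_WAIT = timedelta(days=2)


def consensus_reached(proposed_at, now, objections, significant):
    """Return True if a proposal of consensus can be treated as settled.

    proposed_at -- when someone posted the proposal (or the summary) of consensus
    now         -- the current time
    objections  -- list of objections received so far (empty if none)
    significant -- True for changes that could destabilize a lot of code
    """
    wait = SIGNIFICANT_WAIT if significant else MINOR_WAIT
    # Silence is consent, but only once everyone has had time to read mail.
    return not objections and now - proposed_at >= wait


if __name__ == "__main__":
    posted = datetime(2024, 3, 1, 9, 0)
    print(consensus_reached(posted, posted + timedelta(hours=3), [], True))  # False: too soon
    print(consensus_reached(posted, posted + timedelta(days=2), [], True))   # True
    print(consensus_reached(posted, posted + timedelta(days=3),
                            ["Wait, I didn't agree to that."], True))       # False: objection
```

A True result is not final in any strong sense: as argued above, a committed change can still be reverted if someone raises a late objection, at least until other changes come to depend on it.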

When Consensus Cannot Be Reached, Vote

Inevitably, some debates just won't consense. When all other means of breaking a deadlock fail, the solution is to vote. But before a vote can be taken, there must be a clear set of choices on the ballot. Here, again, the normal process of technical discussion blends serendipitously with the project's decision-making procedures. The kinds of questions that come to a vote often involve complex, multifaceted issues. In any such complex discussion, there are usually one or two people playing the role of honest broker: posting periodic summaries of the various arguments and keeping track of where the core points of disagreement (and agreement) lie. These summaries help everyone measure how much progress has been made, and remind everyone of what issues remain to be addressed. Those same summaries can serve as prototypes for a ballot sheet, should a vote become necessary. If the honest brokers have been doing their job well, they will be able to credibly call for a vote when the time comes, and the group will be willing to use a ballot sheet based on their summary of the issues. The brokers themselves may be participants in the debate; it is not necessary for them to remain above the fray, as long as they can understand and fairly represent others' views, and not let their partisan sentiments prevent them from summarizing the state of the debate in a neutral fashion.

The actual content of the ballot is usually not controversial. By the time matters reach a vote, the disagreement has usually boiled down to a few key issues, with recognizable labels and brief descriptions. Occasionally a developer will object to the form of the ballot itself. Sometimes his concern is legitimate, for example, that an important choice was left off or not described accurately. But other times a developer may be merely trying to stave off the inevitable, perhaps knowing that the vote probably won't go his way. See the section titled "Difficult People" in Chapter 6, Communications, for how to deal with this sort of obstructionism.

Remember to specify the voting system, as there are many different kinds, and people might make wrong assumptions about which procedure is being used. A good choice in most cases is approval voting, whereby each voter can vote for as many of the choices on the ballot as he likes. Approval voting is simple to explain and to count, and unlike some other methods, it only involves one round of voting. See http://en.wikipedia.org/wiki/Voting_system#List_of_systems for more details about approval voting and other voting systems, but try to avoid getting into a long debate about which voting system to use (because, of course, you will then find yourself in a debate about which voting system to use to decide the voting system!). One reason approval voting is a good choice is that it's very hard for anyone to object to—it's about as fair as a voting system can be.

Finally, conduct votes in public. There is no need for secrecy or anonymity in a vote on matters that have been debated publicly anyway. Have each participant post her votes to the project mailing list, so that any observer can tally and check the results for herself, and so that everything is recorded in the archives.
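To show why approval voting is simple to count in one round, here is a minimal Python sketch of a tally. It assumes, hypothetically, that the public votes posted to the mailing list have already been collected into a dictionary mapping each voter's name to the set of choices that voter approved; the function names are illustrative and not part of any real tool.

```python
from collections import Counter


def approval_tally(ballots):
    """Count an approval vote in a single round.

    ballots maps each voter's name to the collection of choices that voter
    approves of. A voter may approve as many choices as they like, and each
    approval is worth exactly one point.
    """
    counts = Counter()
    for approved in ballots.values():
        # set() ensures a choice listed twice on one ballot counts once.
        counts.update(set(approved))
    return counts


def winners(counts):
    """Return the choice(s) with the most approvals, sorted; ties are kept."""
    if not counts:
        return []
    top = max(counts.values())
    return sorted(choice for choice, n in counts.items() if n == top)


if __name__ == "__main__":
    ballots = {
        "alice": {"A", "B"},
        "bob": {"B"},
        "carol": {"A", "C"},
        "dave": {"B", "C"},
    }
    counts = approval_tally(ballots)
    print(sorted(counts.items()))  # [('A', 2), ('B', 3), ('C', 2)]
    print(winners(counts))         # ['B']
```

Because every ballot is public and keyed by voter, anyone reading the archives can rerun the same count and check the result. Ties are possible, as with any voting system; the sketch simply reports every tied choice and leaves the tie for the group to resolve.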

When To Vote

The hardest thing about voting is determining when to do it. In general, taking a vote should be very rare: a last resort for when all other options have failed. Don't think of voting as a great way to resolve debates. It isn't. It ends discussion, and thereby ends creative thinking about the problem. As long as discussion continues, there is the possibility that someone will come up with a new solution everyone likes. This happens surprisingly often: a lively debate can produce a new way of thinking about the problem, and lead to a proposal that eventually satisfies everyone. Even when no new proposal arises, it's still usually better to broker a compromise than to hold a vote. After a compromise, everyone is a little bit unhappy, whereas after a vote, some people are unhappy while others are happy. From a political standpoint, the former situation is preferable: at least each person can feel he extracted a price for his unhappiness. He may be dissatisfied, but so is everyone else.

Voting's main advantage is that it finally settles a question so everyone can move on. But it settles it by a head count, instead of by rational dialogue leading everyone to the same conclusion. The more experienced people are with open source projects, the less eager I find them to settle questions by vote. Instead they will try to explore previously unconsidered solutions, or compromise more severely than they'd originally planned. Various techniques are available to prevent a premature vote. The most obvious is simply to say "I don't think we're ready for a vote yet," and explain why not. Another is to ask for an informal (non-binding) show of hands. If the response clearly tends toward one side or another, this will make some people suddenly more willing to compromise, obviating the need for a formal vote. But the most effective way is simply to offer a new solution, or a new viewpoint on an old suggestion, so that people re-engage with the issues instead of merely repeating the same arguments.

In certain rare cases, everyone may agree that all the compromise solutions are worse than any of the non-compromise ones. When that happens, voting is less objectionable, both because it is more likely to lead to a superior solution and because people will not be overly unhappy no matter how it turns out. Even then, the vote should not be rushed. The discussion leading up to a vote is what educates the electorate, so stopping that discussion early can lower the quality of the result.

(Note that this advice to be reluctant to call votes does not apply to the change-inclusion voting described in the section titled "Stabilizing a Release" in Chapter 7, Packaging, Releasing, and Daily Development. There, voting is more of a communications mechanism, a means of registering one's involvement in the change review process so that everyone can tell how much review a given change has received.)

Who Votes?

Having a voting system raises the question of electorate: who gets to vote? This has the potential to be a sensitive issue, because it forces the project to officially recognize some people as being more involved, or as having better judgement, than others.

The best solution is to simply take an existing distinction, commit access, and attach voting privileges to it. In projects that offer both full and partial commit access, the question of whether partial committers can vote largely depends on the process by which partial commit access is granted. If the project hands it out liberally, for example as a way of maintaining many third-party contributed tools in the repository, then it should be made clear that partial commit access is really just about committing, not voting. The reverse implication naturally holds as well: since full committers will have voting privileges, they must be chosen not only as programmers, but as members of the electorate. If someone shows disruptive or obstructionist tendencies on the mailing list, the group should be very cautious about making him a committer, even if the person is technically skilled.

The voting system itself should be used to choose new committers, both full and partial. But here is one of the rare instances where secrecy is appropriate. You can't have votes about potential committers posted to a public mailing list, because the candidate's feelings (and reputation) could be hurt. Instead, the usual way is that an existing committer posts to a private mailing list consisting only of the other committers, proposing that someone be granted commit access. The other committers speak their minds freely, knowing the discussion is private. Often there will be no disagreement, and therefore no vote necessary. After waiting a few days to make sure every committer has had a chance to respond, the proposer mails the candidate and offers him commit access. If there is disagreement, discussion ensues as for any other question, possibly resulting in a vote. For this process to be open and frank, the mere fact that the discussion is taking place at all should be secret. If the person under consideration knew it was going on, and then were never offered commit access, he could conclude that he had lost the vote, and would likely feel hurt. Of course, if someone explicitly asks for commit access, then there is no choice but to consider the proposal and explicitly accept or reject him. If the latter, then it should be done as politely as possible, with a clear explanation: "We liked your patches, but haven't seen enough of them yet," or "We appreciate all your patches, but they required considerable adjustments before they could be applied, so we don't feel comfortable giving you commit access yet. We hope that this will change over time, though." Remember, what you're saying could come as a blow, depending on the person's level of confidence. Try to see it from their point of view as you write the mail.

Because adding a new committer is more consequential than most other one-time decisions, some projects have special requirements for the vote. For example, they may require that the proposal receive at least n positive votes and no negative votes, or that a supermajority vote in favor. The exact parameters are not important; the main idea is to get the group to be careful about adding new committers. Similar, or even stricter, special requirements can apply to votes to remove a committer, though hopefully that will never be necessary. See the section titled "Committers" in Chapter 8, Managing Volunteers, for more on the non-voting aspects of adding and removing committers.
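The two kinds of special requirement mentioned above can each be written as a one-line rule. In this Python sketch, each committer's vote on the private list is recorded as +1, 0 or -1 (borrowing the Apache-style notation discussed under "Vetoes"). The threshold of three positive votes and the two-thirds supermajority are assumptions, since the text deliberately leaves the exact parameters open, and the function names are hypothetical.

```python
from fractions import Fraction


def _tally(votes):
    """Split +1 / 0 / -1 votes into (yes, no) counts; 0 is an abstention."""
    yes = sum(1 for v in votes.values() if v > 0)
    no = sum(1 for v in votes.values() if v < 0)
    return yes, no


def passes_no_negatives(votes, min_yes=3):
    """Hypothetical rule: at least min_yes positive votes and no negative ones."""
    yes, no = _tally(votes)
    return yes >= min_yes and no == 0


def passes_supermajority(votes, fraction=Fraction(2, 3)):
    """Hypothetical rule: yes votes make up at least `fraction` of the
    non-abstaining votes, and at least one such vote was cast."""
    yes, no = _tally(votes)
    cast = yes + no
    return cast > 0 and Fraction(yes, cast) >= fraction


if __name__ == "__main__":
    votes = {"ann": +1, "ben": +1, "cy": +1, "dee": -1}
    print(passes_no_negatives(votes))   # False: dee voted against
    print(passes_supermajority(votes))  # True: 3 of 4 is at least 2/3
```

Using Fraction rather than floating point keeps boundary cases exact: two yes votes against one no vote meets a two-thirds threshold precisely, with no rounding surprises.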

Polls Versus Votes

For certain kinds of votes, it may be useful to expand the electorate. For example, if the developers simply can't figure out whether a given interface choice matches the way people actually use the software, one solution is to ask all the subscribers of the project's mailing lists to vote. These are really polls rather than votes, but the developers may choose to treat the result as binding. As with any poll, be sure to make it clear to the participants that there's a write-in option: if someone thinks of a better option not offered in the poll questions, her response may turn out to be the most important result of the poll.

Vetoes

Some projects allow a special kind of vote known as a veto. A veto is a way for a developer to put a halt to a hasty or ill-considered change, at least long enough for everyone to discuss it more. Think of a veto as somewhere between a very strong objection and a filibuster. Its exact meaning varies from one project to another. Some projects make it very difficult to override a veto; others allow them to be overridden by regular majority vote, perhaps after an enforced delay for more discussion. Any veto should be accompanied by a thorough explanation; a veto without such an explanation should be considered invalid on arrival.

With vetoes comes the problem of veto abuse. Sometimes developers are too eager to raise the stakes by casting a veto, when really all that was called for was more discussion. You can prevent veto abuse by being very reluctant to use vetoes yourself, and by gently calling it out when someone else uses her veto too often. If necessary, you can also remind the group that vetoes are binding for only as long as the group agrees they are—after all, if a clear majority of developers wants X, then X is going to happen one way or another. Either the vetoing developer will back down, or the group will decide to weaken the meaning of a veto.

You may see people write "-1" to express a veto. This usage comes from the Apache Software Foundation, which has a highly structured voting and veto process, described at http://www.apache.org/foundation/voting.html. The Apache standards have spread to other projects, and you will see their conventions used to varying degrees in a lot of places in the open source world. Technically, "-1" does not always indicate a formal veto even according to the Apache standards, but informally it is usually taken to mean a veto, or at least a very strong objection.

Like votes, vetoes can apply retroactively. It's not okay to object to a veto on the grounds that the change in question has already been committed, or the action taken (unless it's something irrevocable, like putting out a press release). On the other hand, a veto that arrives weeks or months late isn't likely to be taken very seriously, nor should it be.

Writing It All Down

At some point, the conventions and agreements floating around in your project may become so numerous that you need to record them somewhere. In order to give such a document legitimacy, make it clear that it is based on mailing list discussions and on agreements already in effect. As you compose it, refer to the relevant threads in the mailing list archives, and whenever there's a point you're not sure about, ask again. The document should not contain any surprises: it is not the source of the agreements, it is merely a description of them. Of course, if it is successful, people will start citing it as a source of authority in itself, but that just means it reflects the overall will of the group accurately.

This is the document alluded to in the section titled "Le guide du développeur" in Chapter 2, Genèse d'un projet. Naturally, when the project is very young, you will have to lay down guidelines without the benefit of a long project history to draw on. But as the development community matures, you can adjust the language to reflect the way things actually turn out.

Don't try to be comprehensive. No document can capture everything people need to know about participating in a project. Many of the conventions a project evolves remain forever unspoken, never mentioned explicitly, yet adhered to by all. Other things are simply too obvious to be mentioned, and would only distract from important but non-obvious material. For example, there's no point writing guidelines like "Be polite and respectful to others on the mailing lists, and don't start flame wars," or "Write clean, readable bug-free code." Of course these things are desirable, but since there's no conceivable universe in which they might not be desirable, they are not worth mentioning. If people are being rude on the mailing list, or writing buggy code, they're not going to stop just because the project guidelines said to. Such situations need to be dealt with as they arise, not by blanket admonitions to be good. On the other hand, if the project has specific guidelines about how to write good code, such as rules about documenting every API in a certain format, then those guidelines should be written down as completely as possible.

A good way to determine what to include is to base the document on the questions that newcomers ask most often, and on the complaints experienced developers make most often. This doesn't necessarily mean it should turn into a FAQ sheet—it probably needs a more coherent narrative structure than FAQs can offer. But it should follow the same reality-based principle of addressing the issues that actually arise, rather than those you anticipate might arise.

If the project is a benevolent dictatorship, or has officers endowed with special powers (president, chair, whatever), then the document is also a good opportunity to codify succession procedures. Sometimes this can be as simple as naming specific people as replacements in case the BD suddenly leaves the project for any reason. Generally, if there is a BD, only the BD can get away with naming a successor. If there are elected officers, then the nomination and election procedure that was used to choose them in the first place should be described in the document. If there was no procedure originally, then get consensus on a procedure on the mailing lists before writing about it. People can sometimes be touchy about hierarchical structures, so the subject needs to be approached with sensitivity.

Perhaps the most important thing is to make it clear that the rules can be reconsidered. If the conventions described in the document start to hamper the project, remind everyone that it is supposed to be a living reflection of the group's intentions, not a source of frustration and blockage. If someone makes a habit of inappropriately asking for rules to be reconsidered every time the rules get in her way, you don't always need to debate it with her—sometimes silence is the best tactic. If other people agree with the complaints, they'll chime in, and it will be obvious that something needs to change. If no one else agrees, then the person won't get much response, and the rules will stay as they are.

Two good examples of project guidelines are the Subversion Community Guide (originally a single hacking.html file), at http://subversion.apache.org/docs/community-guide/, and the Apache Software Foundation governance documents, at http://www.apache.org/foundation/how-it-works.html and http://www.apache.org/foundation/voting.html. The ASF is really a collection of software projects, legally organized as a nonprofit corporation, so its documents tend to describe governance procedures more than development conventions. They're still worth reading, though, because they represent the accumulated experience of a lot of open source projects.

Chapter 5. Money

This chapter examines how to bring funding into a free software environment. It is aimed not only at developers who are paid to work on free software projects, but also at their managers, who need to understand the social dynamics of the development environment. In the sections that follow, the addressee ("you") is presumed to be either a paid developer, or one who manages such developers. The advice will often be the same for both; when it's not, the intended audience will be made clear from context.

Corporate funding of free software development is not a new phenomenon. A lot of development has always been informally subsidized. When a system administrator writes a network analysis tool to help her do her job, then posts it online and gets bug fixes and feature contributions from other system administrators, what's happened is that an unofficial consortium has been formed. The consortium's funding comes from the sysadmins' salaries, and its office space and network bandwidth are donated, albeit unknowingly, by the organizations they work for. Those organizations benefit from the investment, of course, although they may not be institutionally aware of it at first.

The difference today is that many of these efforts are being formalized. Corporations have become conscious of the benefits of open source software, and started involving themselves more directly in its development. Developers too have come to expect that really important projects will attract at least donations, and possibly even long-term sponsors. While the presence of money has not changed the basic dynamics of free software development, it has greatly changed the scale at which things happen, both in terms of the number of developers and time-per-developer. It has also had effects on how projects are organized, and on how the parties involved in them interact. The issues are not merely about how the money is spent, or how return on investment is measured. They are also about management and process: how can the hierarchical command structures of corporations and the semi-decentralized volunteer communities of free software projects work productively with each other? Will they even agree on what "productively" means?

Financial backing is, in general, welcomed by open source development communities. It can reduce a project's vulnerability to the Forces of Chaos, which sweep away so many projects before they really get off the ground, and therefore it can make people more willing to give the software a chance—they feel they're investing their time into something that will still be around six months from now. After all, credibility is contagious, to a point. When, say, IBM backs an open source project, people pretty much assume the project won't be allowed to fail, and their resultant willingness to devote effort to it can make that a self-fulfilling prophecy.

However, funding also brings a perception of control. If not handled carefully, money can divide a project into in-group and out-group developers. If the unpaid volunteers get the feeling that design decisions or feature additions are simply available to the highest bidder, they'll head off to a project that seems more like a meritocracy and less like unpaid labor for someone else's benefit. They may never complain overtly on the mailing lists. Instead, there will simply be less and less noise from external sources, as the volunteers gradually stop trying to be taken seriously. The buzz of small-scale activity will continue, in the form of bug reports and occasional small fixes. But there won't be any large code contributions or outside participation in design discussions. People sense what's expected of them, and live up (or down) to those expectations.

Although money needs to be used carefully, that doesn't mean it can't buy influence. It most certainly can. The trick is that it can't buy influence directly. In a straightforward commercial transaction, you trade money for what you want. If you need a feature added, you sign a contract, pay for it, and it gets done. In an open source project, it's not so simple. You may sign a contract with some developers, but they'd be fooling themselves—and you—if they guaranteed that the work you paid for would be accepted by the development community simply because you paid for it. The work can only be accepted on its own merits and on how it fits into the community's vision for the software. You may have some say in that vision, but you won't be the only voice.

So money can't purchase influence, but it can purchase things that lead to influence. The most obvious example is programmers. If good programmers are hired, and they stick around long enough to get experience with the software and credibility in the community, then they can influence the project by the same means as any other member. They will have a vote, or if there are many of them, they will have a voting bloc. If they are respected in the project, they will have influence beyond just their votes. There is no need for paid developers to disguise their motives, either. After all, everyone who wants a change made to the software wants it for a reason. Your company's reasons are no less legitimate than anyone else's. It's just that the weight given to your company's goals will be determined by its representatives' status in the project, not by the company's size, budget, or business plan.

Types of Involvement

There are many different reasons open source projects get funded. The items in this list aren't mutually exclusive; often a project's financial backing will result from several, or even all, of these motivations:

Sharing the burden

Separate organizations with related software needs often find themselves duplicating effort, either by redundantly writing similar code in-house, or by purchasing similar products from proprietary vendors. When they realize what's going on, the organizations may pool their resources and create (or join) an open source project tailored to their needs. The advantages are obvious: the costs of development are divided, but the benefits accrue to all. Although this scenario seems most intuitive for nonprofits, it can make strategic sense even for for-profit competitors.

Examples: http://www.openadapter.org/, http://www.koha.org/

Augmenting services

When a company sells services which depend on, or are made more attractive by, particular open source programs, it is naturally in that company's interests to ensure those programs are actively maintained.

Example: CollabNet's support of http://subversion.tigris.org/ (disclaimer: that's my day job, but it's also a perfect example of this model).

Supporting hardware sales

The value of computers and computer components is directly related to the amount of software available for them. Hardware vendors—not just whole-machine vendors, but also makers of peripheral devices and microchips—have found that having high-quality free software to run on their hardware is important to customers.

Undermining a competitor

Sometimes companies support a particular open source project as a means of undermining a competitor's product, which may or may not be open source itself. Eating away at a competitor's market share is usually not the sole reason for getting involved with an open source project, but it can be a factor.

Example: http://www.openoffice.org/ (no, this isn't the only reason OpenOffice exists, but the software is at least partly a response to Microsoft Office).

Marketing

Having your company associated with a popular open source application can be simply good brand management.

Dual-licensing

Dual-licensing is the practice of offering software under a traditional proprietary license for customers who want to resell it as part of a proprietary application of their own, and simultaneously under a free license for those willing to use it under open source terms (see the section called "Dual Licensing Schemes" in Chapter 9, Licenses, Copyrights, and Patents). If the open source developer community is active, the software gets the benefits of wide-area debugging and development, yet the company still gets a royalty stream to support some full-time programmers.

Two well-known examples are MySQL, makers of the database software of the same name, and Sleepycat, which offers distributions and support for the Berkeley Database. It's no coincidence that they're both database companies. Database software tends to be integrated into applications rather than marketed directly to users, so it's very well-suited to the dual-licensing model.

Donations

A widely used project can sometimes get significant contributions, from both individuals and organizations, just by having an online donation button, or sometimes by selling branded merchandise such as coffee mugs, T-shirts, mousepads, etc. A word of caution: if your project accepts donations, plan out how the money will be used before it comes in, and state the plans on the project's web site. Discussions about how to allocate money tend to go a lot more smoothly when held before there's actual money to spend; and anyway, if there are significant disagreements, it's better to find that out while it's still academic.

A funder's business model is not the only factor in how it relates to an open source community. The historical relationship between the two also matters: did the company start the project, or is it joining an existing development effort? In both cases, the funder will have to earn credibility, but, not surprisingly, there's a bit more earning to be done in the latter case. The organization needs to have clear goals with respect to the project. Is the company trying to keep a position of leadership, or simply trying to be one voice in the community, to guide but not necessarily govern the project's direction? Or does it just want to have a couple of committers around, able to fix customers' bugs and get the changes into the public distribution without any fuss?

Keep these questions in mind as you read the guidelines that follow. They are meant to apply to any sort of organizational involvement in a free software project, but every project is a human environment, and therefore no two are exactly alike. To some degree, you will always have to play by ear, but following these principles will increase the likelihood of things turning out the way you want.

Hire for the Long Term

If you're managing programmers on an open source project, keep them there long enough that they acquire both technical and political expertise—a couple of years, at a minimum. Of course, no project, whether open or closed-source, benefits from swapping programmers in and out too often. The need for a newcomer to learn the ropes each time would be a deterrent in any environment. But the penalty is even stronger in open source projects, because outgoing developers take with them not only their knowledge of the code, but also their status in the community and the human relationships they have made there.

The credibility a developer has accumulated cannot be transferred. To pick the most obvious example, an incoming developer can't inherit commit access from an outgoing one (see the section called "Money Can't Buy You Love" later in this chapter), so if the new developer doesn't already have commit access, he will have to submit patches until he does. But commit access is only the most measurable manifestation of lost influence. A long-time developer also knows all the old arguments that have been hashed and rehashed on the discussion lists. A new developer, having no memory of those conversations, may try to raise the topics again, leading to a loss of credibility for your organization; the others might wonder "Can't they remember anything?" A new developer will also have no political feel for the project's personalities, and will not be able to influence development directions as quickly or as smoothly as one who's been around a long time.

Train newcomers through a program of supervised engagement. The new developer should be in direct contact with the public development community from the very first day, starting off with bug fixes and cleanup tasks, so he can learn the code base and acquire a reputation in the community, yet not spark any long and involved design discussions. All the while, one or more experienced developers should be available for questioning, and should be reading every post the newcomer makes to the development lists, even if they're in threads that the experienced developers normally wouldn't pay attention to. This will help the group spot potential rocks before the newcomer runs aground. Private, behind-the-scenes encouragement and pointers can also help a lot, especially if the newcomer is not accustomed to massively parallel peer review of his code.

When CollabNet hires a new developer to work on Subversion, we sit down together and pick some open bugs for the new person to cut his teeth on. We'll discuss the technical outlines of the solutions, and then assign at least one experienced developer to (publicly) review the patch that the new developer will (also publicly) post. We typically don't even look at the patch before the main development list sees it, although we could if there were some reason to. The important thing is that the new developer go through the process of public review, learning the code base while simultaneously becoming accustomed to receiving critiques from complete strangers. But we try to coordinate the timing so that our own review comes immediately after the posting of the patch. That way the first review the list sees is ours, which can help set the tone for the others' reviews. It also contributes to the idea that this new person is to be taken seriously: if others see that we're putting in the time to give detailed reviews, with thorough explanations and references into the archives where appropriate, they'll appreciate that a form of training is going on, and that it probably signifies a long-term investment. This can make them more positively disposed toward that developer, at least to the degree of spending a little extra time answering questions and reviewing patches.

Appear as Many, Not as One

Your developers should strive to appear in the project's public forums as individual participants, rather than as a monolithic corporate presence. This is not because there is some negative connotation inherent in monolithic corporate presences (well, perhaps there is, but that's not what this book is about). Rather, it's because individuals are the only sort of entity open source projects are structurally equipped to deal with. An individual contributor can have discussions, submit patches, acquire credibility, vote, and so forth. A company cannot.

Furthermore, by behaving in a decentralized manner, you avoid stimulating centralization of opposition. Let your developers disagree with each other on the mailing lists. Encourage them to review each other's code as often, and as publicly, as they would anyone else's. Discourage them from always voting as a bloc, because if they do, others may start to feel that, just on general principles, there should be an organized effort to keep them in check.

There's a difference between actually being decentralized and simply striving to appear that way. Under certain circumstances, having your developers behave in concert can be quite useful, and they should be prepared to coordinate behind the scenes when necessary. For example, when making a proposal, having several people chime in with agreement early on can help it along, by giving the impression of a growing consensus. Others will feel that the proposal has momentum, and that if they were to object, they'd be stopping that momentum. Thus, people will object only if they have a good reason to do so. There's nothing wrong with orchestrating agreement like this, as long as objections are still taken seriously. The public manifestations of a private agreement are no less sincere for having been coordinated beforehand, and are not harmful as long as they are not used to prejudicially snuff out opposing arguments. Their purpose is merely to inhibit the sort of people who like to object just to stay in shape; see the section called "The Softer the Topic, the Longer the Debate" in Chapter 6, Communications for more about them.

Be Open About Your Motivations

Be as open about your organization's goals as you can without compromising business secrets. If you want the project to acquire a certain feature because, say, your customers have been clamoring for it, just say so outright on the mailing lists. If the customers wish to remain anonymous, as is sometimes the case, then at least ask them if they can be used as unnamed examples. The more the public development community knows about why you want what you want, the more comfortable they'll be with whatever you're proposing.

This runs counter to the instinct—so easy to acquire, so hard to shake off—that knowledge is power, and that the more others know about your goals, the more control they have over you. But that instinct would be wrong here. By publicly advocating the feature (or bugfix, or whatever it is), you have already laid your cards on the table. The only question now is whether you will succeed in guiding the community to share your goal. If you merely state that you want it, but can't provide concrete examples of why, your argument is weak, and people will start to suspect a hidden agenda. But if you give just a few real-world scenarios showing why the proposed feature is important, that can have a dramatic effect on the debate.

To see why this is so, consider the alternative. Too frequently, debates about new features or new directions are long and tiresome. The arguments people advance often reduce to "I personally want X," or the ever-popular "In my years of experience as a software designer, X is extremely important to users / a useless frill that will please no one." Predictably, the absence of real-world usage data neither shortens nor tempers such debates, but instead allows them to drift farther and farther from any mooring in actual user experience. Without some countervailing force, the end result is as likely as not to be determined by whoever was the most articulate, or the most persistent, or the most senior.

As an organization with plentiful customer data available, you have the opportunity to provide just such a countervailing force. You can be a conduit for information that might otherwise have no means of reaching the development community. The fact that that information supports your desires is nothing to be embarrassed about. Most developers don't individually have very broad experience with how the software they write is used. Each developer uses the software in her own idiosyncratic way; as far as other usage patterns go, she's relying on intuition and guesswork, and deep down, she knows this. By providing credible data about a significant number of users, you are giving the public development community something akin to oxygen. As long as you present it right, they will welcome it enthusiastically, and it will propel things in the direction you want to go.

The key, of course, is presenting it right. It will never do to insist that simply because you deal with a large number of users, and because they need (or think they need) a given feature, therefore your solution ought to be implemented. Instead, you should focus your initial posts on the problem, rather than on one particular solution. Describe in great detail the experiences your customers are encountering, offer as much analysis as you have available, and as many reasonable solutions as you can think of. When people start speculating about the effectiveness of various solutions, you can continue to draw on your data to support or refute what they say. You may have one particular solution in mind all along, but don't single it out for special consideration at first. This is not deception, it is simply standard "honest broker" behavior. After all, your true goal is to solve the problem; a solution is merely a means to that end. If the solution you prefer really is superior, other developers will recognize that on their own eventually—and then they will get behind it of their own free will, which is much better than you browbeating them into implementing it. (There is also the possibility that they will think of a better solution.)

This is not to say that you can't ever come out in favor of a specific solution. But you must have the patience to see the analysis you've already done internally repeated on the public development lists. Don't post saying "Yes, we've been over all that here, but it doesn't work for reasons A, B, and C. When you get right down to it, the only way to solve this is..." The problem is not so much that it sounds arrogant as that it gives the impression that you have already devoted some unknown (but, people will presume, large) amount of analytical resources to the problem, behind closed doors. It makes it seem as though efforts have been going on, and perhaps decisions made, that the public is not privy to, and that is a recipe for resentment.

Naturally, you know how much effort you've devoted to the problem internally, and that knowledge is, in a way, a disadvantage. It puts your developers in a slightly different mental space than everyone else on the mailing lists, reducing their ability to see things from the point of view of those who haven't yet thought about the problem as much. The earlier you can get everyone else thinking about things in the same terms as you do, the smaller this distancing effect will be. This logic applies not only to individual technical situations, but to the broader mandate of making your goals as clear as you can. The unknown is always more destabilizing than the known. If people understand why you want what you want, they'll feel comfortable talking to you even when they disagree. If they can't figure out what makes you tick, they'll assume the worst, at least some of the time.

You won't be able to publicize everything, of course, and people won't expect you to. All organizations have secrets; perhaps for-profits have more of them, but nonprofits have them too. If you must advocate a certain course, but can't reveal anything about why, then simply offer the best arguments you can under that handicap, and accept the fact that you may not have as much influence as you want in the discussion. This is one of the compromises you make in order to have a development community not on your payroll.

Money Can't Buy You Love

If you're a paid developer on a project, then set guidelines early on about what the money can and cannot buy. This does not mean you need to post twice a day to the mailing lists reiterating your noble and incorruptible nature. It merely means that you should be on the lookout for opportunities to defuse the tensions that could be created by money. You don't need to start out assuming that the tensions are there; you do need to demonstrate an awareness that they have the potential to arise.

A perfect example of this came up in the Subversion project. Subversion was started in 2000 by CollabNet, which has been the project's primary funder since its inception, paying the salaries of several developers (disclaimer: I'm one of them). Soon after the project began, we hired another developer, Mike Pilato, to join the effort. By then, coding had already started. Although Subversion was still very much in the early stages, it already had a development community with a set of basic ground rules.

Mike's arrival raised an interesting question. Subversion already had a policy about how a new developer gets commit access. First, he submits some patches to the development mailing list. After enough patches have gone by for the other committers to see that the new contributor knows what he's doing, someone proposes that he just commit directly (that proposal is private, as described in the section called "Committers"). Assuming the committers agree, one of them mails the new developer and offers him direct commit access to the project's repository.

CollabNet had hired Mike specifically to work on Subversion. Among those who already knew him, there was no doubt about his coding skills or his readiness to work on the project. Furthermore, the volunteer developers had a very good relationship with the CollabNet employees, and most likely would not have objected if we'd just given Mike commit access the day he was hired. But we knew we'd be setting a precedent. If we granted Mike commit access by fiat, we'd be saying that CollabNet had the right to ignore project guidelines, simply because it was the primary funder. While the damage from this would not necessarily be immediately apparent, it would gradually result in the non-salaried developers feeling disenfranchised. Other people have to earn their commit access—CollabNet just buys it.

So Mike agreed to start out his employment at CollabNet like any other volunteer developer, without commit access. He sent patches to the public mailing list, where they could be, and were, reviewed by everyone. We also said on the list that we were doing things this way deliberately, so there could be no missing the point. After a couple of weeks of solid activity by Mike, someone (I can't remember if it was a CollabNet developer or not) proposed him for commit access, and he was accepted, as we knew he would be.

That kind of consistency gets you a credibility that money could never buy. And credibility is a valuable currency to have in technical discussions: it's immunization against having one's motives questioned later. In the heat of argument, people will sometimes look for non-technical ways to win the battle. The project's primary funder, because of its deep involvement and obvious concern over the directions the project takes, presents a wider target than most. By being scrupulous to observe all project guidelines right from the start, the funder makes itself the same size as everyone else.

(See also Danese Cooper's blog at http://blogs.sun.com/roller/page/DaneseCooper/20040916 for a similar story about commit access. Cooper was then Sun Microsystems' "Open Source Diva" (I believe that was her official title), and in the blog entry, she describes how the Tomcat development community got Sun to hold its own developers to the same commit-access standards as the non-Sun developers.)

The need for the funders to play by the same rules as everyone else means that the Benevolent Dictatorship governance model (see the section called "Benevolent Dictators" in Chapter 4, Social and Political Infrastructure) is slightly harder to pull off in the presence of funding, particularly if the dictator works for the primary funder. Since a dictatorship has few rules, it is hard for the funder to prove that it's abiding by community standards, even when it is. It's certainly not impossible; it just requires a project leader who is able to see things from the point of view of the outside developers, as well as that of the funder, and act accordingly. Even then, it's probably a good idea to have a proposal for non-dictatorial governance sitting in your back pocket, ready to be brought out the moment there are any indications of widespread dissatisfaction in the community.

Contracting

Contracted work needs to be handled carefully in free software projects. Ideally, you want a contractor's work to be accepted by the community and folded into the public distribution. In theory, it wouldn't matter who the contractor is, as long as his work is good and meets the project's guidelines. Theory and practice can sometimes match, too: a complete stranger who shows up with a good patch will generally be able to get it into the software. The trouble is, it's very hard to produce a good patch for a non-trivial enhancement or new feature while truly being a complete stranger; one must first discuss it with the rest of the project. The duration of that discussion cannot be precisely predicted. If the contractor is paid by the hour, you may end up paying more than you expected; if he is paid a flat sum, he may end up doing more work than he can afford.

There are two ways around this. The preferred way is to make an educated guess about the length of the discussion process, based on past experience, add in some padding for error, and base the contract on that. It also helps to divide the problem into as many small, independent chunks as possible, to increase the predictability of each chunk. The other way is to contract solely for delivery of a patch, and treat the patch's acceptance into the public project as a separate matter. Then it becomes much easier to write the contract, but you're stuck with the burden of maintaining a private patch for as long as you depend on the software, or at least for as long as it takes you to get that patch or equivalent functionality into the mainline. Of course, even with the preferred way, the contract itself cannot require that the patch be accepted into the code, because that would involve selling something that's not for sale. (What if the rest of the project unexpectedly decides not to support the feature?) However, the contract can require a bona fide effort to get the change accepted by the community, and that it be committed to the repository if the community agrees with it. For example, if the project has written standards regarding code changes, the contract can reference those standards and specify that the work must meet them. In practice, this usually works out the way everyone hopes.

The best tactic for successful contracting is to hire one of the project's developers—preferably a committer—as the contractor. This may seem like a form of purchasing influence, and, well, it is. But it's not as corrupt as it might seem. A developer's influence in the project is due mainly to the quality of his code and to his interactions with other developers. The fact that he has a contract to get certain things done doesn't raise his status in any way, and doesn't lower it either, though it may make people scrutinize him more carefully. Most developers would not risk their long-term position in the project by backing an inappropriate or widely disliked new feature. In fact, part of what you get, or should get, when you hire such a contractor is advice about what sorts of changes are likely to be accepted by the community. You also get a slight shift in the project's priorities. Because prioritization is just a matter of who has time to work on what, when you pay for someone's time, you cause their work to move up in the priority queue a bit. This is a well-understood fact of life among experienced open source developers, and at least some of them will devote attention to the contractor's work simply because it looks like it's going to get done, so they want to help it get done right. Perhaps they won't write any of the code, but they'll still discuss the design and review the code, both of which can be very useful. For all these reasons, the contractor is best drawn from the ranks of those already involved with the project.

This immediately raises two questions: Should contracts ever be private? And when they're not, should you worry about creating tensions in the community by the fact that you've contracted with some developers and not others?

It's best to be open about contracts, when you can. Otherwise, the contractor's behavior may seem strange to others in the community—perhaps he's suddenly giving inexplicably high priority to features he's never shown interest in in the past. When people ask him why he wants them now, how can he answer convincingly if he can't talk about the fact that he's been contracted to write them?

At the same time, neither you nor the contractor should act as though others should treat your arrangement as a big deal. Too often I've seen contractors waltz onto a development list with the attitude that their posts should be taken more seriously simply because they're being paid. That kind of attitude signals to the rest of the project that the contractor regards the fact of the contract—as opposed to the code resulting from the contract—to be the important thing. But from the other developers' point of view, only the code matters. At all times, the focus of attention should be kept on technical issues, not on the details of who is paying whom. For example, one of the developers in the Subversion community handles contracting in a particularly graceful way. While discussing his code changes in IRC, he'll mention as an aside (often in a private remark, an IRC privmsg, to one of the other committers) that he's being paid for his work on this particular bug or feature. But he also consistently gives the impression that he'd want to be working on that change anyway, and that he's happy the money is making it possible for him to do that. He may or may not reveal his customer's identity, but in any case he doesn't dwell on the contract. His remarks about it are just an ornament to an otherwise technical discussion about how to get something done.

That example shows another reason why it's good to be open about contracts. There may be multiple organizations sponsoring contracts on a given open source project, and if each knows what the others are trying to do, they may be able to pool their resources. In the above case, the project's largest funder (CollabNet) is not involved in any way with these piecework contracts, but knowing that someone else is sponsoring certain bug fixes allows CollabNet to redirect its resources to other bugs, resulting in greater efficiency for the project as a whole.

Will other developers resent that some are paid for working on the project? In general, no, particularly when those who are paid are established, well-respected members of the community anyway. No one expects contract work to be distributed equally among all the committers. People understand the importance of long-term relationships: the uncertainties involved in contracting are such that once you find someone you can work reliably with, you would be reluctant to switch to a different person just for the sake of evenhandedness. Think of it this way: the first time you hire, there will be no complaints, because clearly you had to pick someone—it's not your fault you can't hire everyone. Later, when you hire the same person a second time, that's just common sense: you already know him, the last time was successful, so why take unnecessary risks? Thus, it's perfectly natural to have one or two go-to people in the community, instead of spreading the work around evenly.

Review and Acceptance of Changes

The community is still important to the success of contract work. Their involvement in the design and review process for sizeable changes cannot be an afterthought. It must be considered part of the work, and fully embraced by the contractor. Don't think of community scrutiny as an obstacle to be overcome—think of it as a free design board and QA department. It is a benefit to be aggressively pursued, not merely endured.

Case study: the CVS password-authentication protocol

In 1995, I was one half of a partnership that provided support and enhancements for CVS (the Concurrent Versions System; see http://www.cvshome.org/). My partner Jim and I were, informally, the maintainers of CVS by that point. But we'd never thought carefully about how we ought to relate to the existing, mostly volunteer CVS development community. We just assumed that they'd send in patches, and we'd apply them, and that was pretty much how it worked.

Back then, networked CVS could be done only over a remote login program such as rsh. Using the same password for CVS access as for login access was an obvious security risk, and many organizations were put off by it. A major investment bank hired us to add a new authentication mechanism, so they could safely use networked CVS with their remote offices.

Jim and I took the contract and sat down to design the new authentication system. What we came up with was pretty simple (the United States had export controls on cryptographic code at the time, so the customer understood that we couldn't implement strong authentication), but as we were not experienced in designing such protocols, we still made a few gaffes that would have been obvious to an expert. These mistakes would easily have been caught had we taken the time to write up a proposal and run it by the other developers for review. But we never did so, because it didn't occur to us to think of the development list as a resource to be used. We knew that people were probably going to accept whatever we committed, and—because we didn't know what we didn't know—we didn't bother to do the work in a visible way, e.g., posting patches frequently, making small, easily digestible commits to a special branch, etc. The resulting authentication protocol was not very good, and of course, once it became established, it was difficult to improve, because of compatibility concerns.

The root of the problem was not lack of experience; we could easily have learned what we needed to know. The problem was our attitude toward the volunteer development community. We regarded acceptance of the changes as a hurdle to leap, rather than as a process by which the quality of the changes could be improved. Since we were confident that almost anything we did would be accepted (as it was), we made little effort to get others involved.

Obviously, when you're choosing a contractor, you want someone with the right technical skills and experience for the job. But it's also important to choose someone with a track record of constructive interaction with the other developers in the community. That way you're getting more than just a single person; you're getting an agent who will be able to draw on a network of expertise to make sure the work is done in a robust and maintainable way.

Funding Non-Programming Activities

Programming is only part of the work that goes on in an open source project. From the point of view of the project's volunteers, it's the most visible and glamorous part. This unfortunately means that other activities, such as documentation, formal testing, etc., can sometimes be neglected, at least compared to the amount of attention they often receive in proprietary software. Corporate organizations are sometimes able to make up for this, by devoting some of their internal software development infrastructure to open source projects.

The key to doing this successfully is to translate between the company's internal processes and those of the public development community. Such translation is not effortless: often the two are not a close match, and the differences can only be bridged via human intervention. For example, the company may use a different bug tracker than the public project. Even if they use the same tracking software, the data stored in it will be very different, because the bug-tracking needs of a company are very different from those of a free software community. A piece of information that starts in one tracker may need to be reflected in the other, with confidential portions removed or, in the other direction, added.

The sections that follow are about how to build and maintain such bridges. The end result should be that the open source project runs more smoothly, the community recognizes the company's investment of resources, and yet does not feel that the company is inappropriately steering things toward its own goals.

Quality Assurance (i.e., Professional Testing)

In proprietary software development, it is normal to have teams of people dedicated solely to quality assurance: bug hunting, performance and scalability testing, interface and documentation checking, etc. As a rule, these activities are not pursued as vigorously by the volunteer community on a free software project. This is partly because it's hard to get volunteer labor for unglamorous work like testing, partly because people tend to assume that having a large user community gives the project good testing coverage, and, in the case of performance and scalability testing, partly because volunteers often don't have access to the necessary hardware resources anyway.

The assumption that having many users is equivalent to having many testers is not entirely baseless. Certainly there's little point in assigning testers for basic functionality in common environments: bugs there will quickly be found by users in the natural course of things. But because users are just trying to get work done, they do not consciously set out to explore uncharted edge cases in the program's functionality, and are likely to leave certain classes of bugs unfound. Furthermore, when they discover a bug with an easy workaround, they often silently implement the workaround without bothering to report the bug. Most insidiously, the usage patterns of your customers (the people who drive your interest in the software) may differ in statistically significant ways from the usage patterns of the Average User In The Street.

A professional testing team can uncover these sorts of bugs, and can do so as easily with free software as with proprietary software. The challenge is to convey the testing team's results back to the public in a useful form. In-house testing departments usually have their own way of reporting test results, involving company-specific jargon, or specialized knowledge about particular customers and their data sets. Such reports would be inappropriate for the public bug tracker, both because of their form and because of confidentiality concerns. Even if your company's internal bug tracking software were the same as that used by the public project, management might need to make company-specific comments and metadata changes to the issues (for example, to raise an issue's internal priority, or schedule its resolution for a particular customer). Usually such notes are confidential—sometimes they're not even shown to the customer. But even when they're not confidential, they're of no concern to the public project, and therefore the public should not be distracted with them.

Yet the core bug report itself is important to the public. In fact, a bug report from your testing department is in some ways more valuable than one received from users at large, since the testing department probes for things that other users won't. Given that you're unlikely to get that particular bug report from any other source, you definitely want to preserve it and make it available to the public project.

To do this, either the QA department can file issues directly in the public issue tracker, if they're comfortable with that, or an intermediary (usually one of the developers) can "translate" the testing department's internal reports into new issues in the public tracker. Translation simply means describing the bug in a way that makes no reference to customer-specific information (the reproduction recipe may use customer data, assuming the customer approves it, of course).

It is somewhat preferable to have the QA department filing issues in the public tracker directly. That gives the public a more direct appreciation of your company's involvement with the project: useful bug reports add to your organization's credibility just as any technical contribution would. It also gives developers a direct line of communication to the testing team. For example, if the internal QA team is monitoring the public issue tracker, a developer can commit a fix for a scalability bug (which the developer may not have the resources to test herself), and then add a note to the issue asking the QA team to see if the fix had the desired effect. Expect a bit of resistance from some of the developers; programmers have a tendency to regard QA as, at best, a necessary evil. The QA team can easily overcome this by finding significant bugs and filing comprehensible reports; on the other hand, if their reports are not at least as good as those coming from the regular user community, then there's no point having them interact directly with the development team.

Either way, once a public issue exists, the original internal issue should simply reference the public issue for technical content. Management and paid developers may continue to annotate the internal issue with company-specific comments as necessary, but use the public issue for information that should be available to everyone.
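As a rough illustration of the translation and cross-referencing described above, here is a minimal Python sketch. It assumes a hypothetical in-house issue record: the field names, the `PUBLIC_FIELDS` whitelist, and the helper functions are all invented for this example and do not correspond to any particular tracker's API. The idea is to copy only publishable technical fields into the public issue, withhold a reproduction recipe that relies on customer data unless the customer has approved it, and then leave the internal issue holding company-specific notes plus a pointer to the public one.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical layout of an issue in a company's internal tracker.
@dataclass
class InternalIssue:
    internal_id: str
    summary: str
    description: str        # technical description, written without customer details
    repro_steps: str        # reproduction recipe
    customer: str           # confidential: never published
    internal_priority: int  # company-specific metadata: never published
    repro_uses_customer_data: bool = False
    customer_approved: bool = False
    notes: List[str] = field(default_factory=list)  # internal-only comments
    public_url: Optional[str] = None                # set once a public issue exists

# Whitelist of fields that may appear in the public tracker.
PUBLIC_FIELDS = ("summary", "description", "repro_steps")

def translate_for_public(issue: InternalIssue) -> dict:
    """Return the content for a new public issue, with confidential data left out."""
    public = {name: getattr(issue, name) for name in PUBLIC_FIELDS}
    if issue.repro_uses_customer_data and not issue.customer_approved:
        public["repro_steps"] = "(reproduction recipe withheld: relies on unapproved customer data)"
    return public

def link_to_public(issue: InternalIssue, public_url: str) -> None:
    """Make the internal issue defer to the public one for technical content."""
    issue.public_url = public_url
    issue.notes.append("Technical details tracked publicly at " + public_url)

# Example usage with made-up data.
bug = InternalIssue(
    internal_id="INT-1017",
    summary="Slowdown under heavy concurrent load",
    description="Throughput drops sharply once the connection pool is exhausted.",
    repro_steps="Replay the customer's traffic capture against a test server.",
    customer="Example Corp",
    internal_priority=1,
    repro_uses_customer_data=True,
)
print(translate_for_public(bug))
link_to_public(bug, "https://issues.example.org/1234")
```

Using a whitelist rather than a list of fields to strip is a deliberately conservative choice: any field later added to the internal record stays private by default.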

You should go into this process expecting extra overhead. Maintaining two issues for one bug is, naturally, more work than maintaining one issue. The benefit is that many more coders will see the report and be able to contribute to a solution.

Legal Advice and Protection

Corporations, for-profit or nonprofit, are almost the only entities that ever pay attention to complex legal issues in free software. Individual developers often understand the nuances of various open source licenses, but they generally do not have the time or resources to follow copyright, trademark, and patent law in detail. If your company has a legal department, it can help a project by vetting the copyright status of the code, and helping developers understand possible patent and trademark issues. The exact forms this help could take are discussed in Chapter 9, Licenses, Copyrights, and Patents. The main thing is to make sure that communications between the legal department and the development community, if they happen at all, happen with a mutual appreciation of the very different universes the parties are coming from. On occasion, these two groups talk past each other, each side assuming domain-specific knowledge that the other does not have. A good strategy is to have a liaison (usually a developer, or else a lawyer with technical expertise) stand in the middle and translate for as long as needed.

Documentation and Usability

Documentation and usability are both famous weak spots in open source projects, although I think, at least in the case of documentation, that the difference between free and proprietary software is frequently exaggerated. Nevertheless, it is empirically true that much open source software lacks first-class documentation and usability research.

If your organization wants to help fill these gaps for a project, probably the best thing it can do is hire people who are not regular developers on the project, but who will be able to interact productively with the developers. Not hiring regular developers is good for two reasons: one, that way you don't take development time away from the project; two, those closest to the software are usually the wrong people to write documentation or investigate usability anyway, because they have trouble seeing the software from an outsider's point of view.

However, it will still be necessary for whoever works on these problems to communicate with the developers. Find people who are technical enough to talk to the coding team, but not so expert in the software that they can't empathize with regular users anymore.

A medium-level user is probably the right person to write good documentation. In fact, after the first edition of this book was published, I received the following email from an open source developer named Dirk Reiners:

One comment on Money::Documentation and Usability: when we had some 
money to spend and decided that a beginner's tutorial was the most 
critical piece that we needed we hired a medium-level user to write it. 
He had gone through the induction to the system recently enough to 
remember the problems, but he had gotten past them so he knew how to 
describe them. That allowed him to write something that needed only 
minor fixes by the core developers for the things that he hadn't gotten 
right, but still covering the 'obvious' stuff devs would have missed.

His case was even better, as it had been his job to introduce a bunch of 
other people (students) to the system, so he combined the experience of 
many people, which is something that was just a lucky occurrence and is 
probably hard to get in most cases.

Providing Hosting/Bandwidth

For a project that's not using one of the free canned hosting sites (see the section called "Canned Hosting" in Chapter 3, Technical Infrastructure), providing a server and network connection (and, most importantly, system administration help) can be of significant assistance. Even if this is all your organization does for the project, it can be a moderately effective way to obtain good public relations karma, though it will not bring any influence over the direction of the project.

You can probably expect a banner ad or an acknowledgment on the project's home page, thanking your company for providing hosting. If you set up the hosting so that the project's web address is under your company's domain name, then you will get some additional association just through the URL. This will cause most users to think of the software as having something to do with your company, even if you don't contribute to development at all. The problem is, the developers are aware of this associative tendency too, and may not be very comfortable with having the project in your domain unless you're contributing more resources than just bandwidth. After all, there are a lot of places to host these days. The community may eventually feel that the implied misallocation of credit is not worth the convenience brought by hosting, and take the project elsewhere. So if you want to provide hosting, do so—but either plan to get even more involved soon, or be circumspect about how much involvement you claim.

Marketing

Although most open source developers would probably hate to admit it, marketing works. A good marketing campaign can create buzz around an open source product, even to the point where hardheaded coders find themselves having vaguely positive thoughts about the software for reasons they can't quite put their finger on. It is not my place here to dissect the arms-race dynamics of marketing in general. Any corporation involved in free software will eventually find itself considering how to market itself, the software, or its relationship to the software. The advice below is about how to avoid common pitfalls in such an effort; see also the section called "Publicity" in Chapter 6, Communications.

Remember That You Are Being Watched

For the sake of keeping the volunteer developer community on your side, it is very important not to say anything that isn't demonstrably true. Audit all claims carefully before making them, and give the public the means to check your claims on their own. Independent fact checking is a major part of open source, and it applies to more than just the code.

Naturally no one would advise companies to make unverifiable claims anyway. But with open source activities, there is an unusually large number of people with the expertise to verify claims, people who are also likely to have high-bandwidth Internet access and the right social contacts to publicize their findings in a damaging way, should they choose to. When Global Megacorp Chemical Industries pollutes a stream, that's verifiable, but only by trained scientists, who can then be refuted by Global Megacorp's scientists, leaving the public scratching their heads and wondering what to think. On the other hand, your behavior in the open source world is not only visible and recorded; it is also easy for many people to check it independently, come to their own conclusions, and spread those conclusions by word of mouth. These communications networks are already in place; they are the essence of how open source operates, and they can be used to transmit any sort of information. Refutation is usually difficult, if not impossible, especially when what people are saying is true.

For example, it's okay to refer to your organization as having "founded project X" if you really did. But don't refer to yourself as the "makers of X" if most of the code was written by outsiders. Conversely, don't claim to have a deeply involved volunteer developer community if anyone can look at your repository and see that there are few or no code changes coming from outside your organization.

Not too long ago, I saw an announcement by a very well-known computer company, stating that they were releasing an important software package under an open source license. When the initial announcement came out, I took a look at their now-public version control repository and saw that it contained only three revisions. In other words, they had done an initial import of the source code, but hardly anything had happened since then. That in itself was not worrying—they'd just made the announcement, after all. There was no reason to expect a lot of development activity right away.

Some time later, they made another announcement. Here is what it said, with the name and release number replaced by pseudonyms:

We are pleased to announce that following rigorous testing by the Singer Community, Singer 5 for Linux and Windows are now ready for production use.

Curious to know what the community had uncovered in "rigorous testing," I went back to the repository to look at its recent change history. The project was still on revision 3. Apparently, they hadn't found a single bug worth fixing before the release! Thinking that the results of the community testing must have been recorded elsewhere, I next examined the bug tracker. There were exactly six open issues, four of which had been open for several months already.

This beggars belief, of course. When testers pound on a large and complex piece of software for any length of time, they will find bugs. Even if the fixes for those bugs don't make it into the upcoming release, one would still expect some version control activity as a result of the testing process, or at least some new issues. Yet to all appearances, nothing had happened between the announcement of the open source license and the first open source release.

The point is not that the company was lying about the community testing. I have no idea if they were or not. But they were oblivious to how much it looked like they were lying. Since neither the version control repository nor the issue tracker gave any indication that the alleged rigorous testing had occurred, the company should either not have made the claim in the first place, or provided a clear link to some tangible result of that testing ("We found 278 bugs; click here for details"). The latter would have allowed anyone to get a handle on the level of community activity very quickly. As it was, it only took me a few minutes to determine that whatever this community testing was, it had not left traces in any of the usual places. That's not a lot of effort, and I'm sure I'm not the only one who took the trouble.

Transparency and verifiability are also an important part of accurate crediting, of course. See the section called "Credit" in Chapter 8, Managing Volunteers for more on this.

Don't Bash Competing Open Source Products

Refrain from giving negative opinions about competing open source software. It's perfectly okay to give negative facts—that is, easily confirmable assertions of the sort often seen in good comparison charts. But negative characterizations of a less rigorous nature are best avoided, for two reasons. First, they are liable to start flame wars that detract from productive discussion. Second, and more importantly, some of the volunteer developers in your project may turn out to work on the competing project as well. This is more likely than it at first might seem: the projects are already in the same domain (that's why they're in competition), and developers with expertise in that domain may make contributions wherever their expertise is applicable. Even when there is no direct developer overlap, it is likely that developers on your project are at least acquainted with developers on related projects. Their ability to maintain constructive personal ties could be hampered by overly negative marketing messages.

Bashing competing closed-source products seems to be more widely accepted in the open source world, especially when those products are made by Microsoft. Personally, I deplore this tendency (though again, there's nothing wrong with straightforward factual comparisons), not merely because it's rude, but also because it's dangerous for a project to start believing its own hype and thereby ignore the ways in which the competition may actually be superior. In general, watch out for the effect that marketing statements can have on your own development community. People may be so excited at being backed by marketing dollars that they lose objectivity about their software's true strengths and weaknesses. It is normal, and even expected, for a company's developers to exhibit a certain detachment toward marketing statements, even in public forums. Clearly, they should not come out and contradict the marketing message directly (unless it's actually wrong, though one hopes that sort of thing would have been caught earlier). But they may poke fun at it from time to time, as a way of bringing the rest of the development community back down to earth.

Chapter 6. Communications

The ability to write clearly is perhaps the most important skill one can have in an open source environment. In the long run it matters more than programming talent. A great programmer with lousy communications skills can get only one thing done at a time, and even then may have trouble convincing others to pay attention. But a lousy programmer with good communications skills can coordinate and persuade many people to do many different things, and thereby have a significant effect on a project's direction and momentum.

There does not seem to be much correlation, in either direction, between the ability to write good code and the ability to communicate with one's fellow human beings. There is some correlation between programming well and describing technical issues well, but describing technical issues is only a tiny part of the communications in a project. Much more important is the ability to empathize with one's audience, to see one's own posts and comments as others see them, and to cause others to see their own posts with similar objectivity. Equally important is noticing when a given medium or communications method is no longer working well, perhaps because it doesn't scale as the number of users increases, and taking the time to do something about it.

All of which is obvious in theory—what makes it hard in practice is that free software development environments are bewilderingly diverse both in audiences and in communications mechanisms. Should a given thought be expressed in a post to the mailing list, as an annotation in the bug tracker, or as a comment in the code? When answering a question in a public forum, how much knowledge can you assume on the part of the reader, given that "the reader" is not only the one who asked the question in the first place, but all those who might see your response? How can the developers stay in constructive contact with the users, without getting swamped by feature requests, spurious bug reports, and general chatter? How do you tell when a medium has reached the limits of its capacity, and what do you do about it?

Solutions to these problems are usually partial, because any particular solution is eventually made obsolete by project growth or changes in project structure. They are also often ad hoc, because they're improvised responses to dynamic situations. All participants need to be aware of when and how communications can become bogged down, and be involved in solutions. Helping people do this is a big part of managing an open source project. The sections that follow discuss both how to conduct your own communications, and how to make maintenance of communications mechanisms a priority for everyone in the project.[19]

You Are What You Write

Consider this: the only thing anyone knows about you on the Internet comes from what you write, or what others write about you. You may be brilliant, perceptive, and charismatic in person—but if your emails are rambling and unstructured, people will assume that's the real you. Or perhaps you really are rambling and unstructured in person, but no one need ever know it, if your posts are lucid and informative.

Devoting some care to your writing will pay off hugely. Long-time free software hacker Jim Blandy tells the following story:

Back in 1993, I was working for the Free Software Foundation, and we were beta-testing version 19 of GNU Emacs. We'd make a beta release every week or so, and people would try it out and send us bug reports. There was this one guy whom none of us had met in person but who did great work: his bug reports were always clear and led us straight to the problem, and when he provided a fix himself, it was almost always right. He was top-notch.

Now, before the FSF can use code written by someone else, we have them do some legal paperwork to assign their copyright interest to that code to the FSF. Just taking code from complete strangers and dropping it in is a recipe for legal disaster.

So I emailed the guy the forms, saying, "Here's some paperwork we need, here's what it means, you sign this one, have your employer sign that one, and then we can start putting in your fixes. Thanks very much."

He sent me back a message saying, "I don't have an employer."

So I said, "Okay, that's fine, just have your university sign it and send it back."

After a bit, he wrote me back again, and said, "Well, actually... I'm thirteen years old and I live with my parents."

Because that kid didn't write like a thirteen-year-old, no one knew that's what he was. Following are some ways to make your writing give a good impression too.

Structure and Formatting

Don't fall into the trap of writing everything as though it were a cell phone text message. Write in complete sentences, capitalizing the first word of each sentence, and use paragraph breaks where needed. This is most important in emails and other composed writings. In IRC or similarly ephemeral forums, it's generally okay to leave out capitalization, use compressed forms of common expressions, etc. Just don't carry those habits over into more formal, persistent forums. Emails, documentation, bug reports, and other pieces of writing that are intended to have a permanent life should be written using standard grammar and spelling, and have a coherent narrative structure. This is not because there's anything inherently good about following arbitrary rules, but rather because these rules are not arbitrary: they evolved into their present forms because they make text more readable, and you should adhere to them for that reason. Readability is desirable not only because it means more people will understand what you write, but because it makes you look like the sort of person who takes the time to communicate clearly: that is, someone worth paying attention to.

For email in particular, experienced open source developers have settled on certain conventions:

Send plain text mails only, not HTML, RichText, or other formats that might be opaque to text-only mail readers. Format your lines to be around 72 columns long. Don't exceed 80 columns, which has become the de facto standard terminal width (that is, some people may use wider terminals, but no one uses a narrower one). By making your lines a little less than 80 columns, you leave room for a few levels of quoting characters to be added in others' replies without forcing a rewrapping of your text.
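The arithmetic behind the 72-column advice can be checked with Python's standard `textwrap` module. The sketch below hard-wraps text to 72 columns and then simulates four rounds of replies by prefixing `> ` to each line (one common quoting convention; mailers vary). Four levels of quoting add eight characters, so the lines still fit within 80 columns. The function names are illustrative only.

```python
import textwrap

def wrap_for_email(text: str, width: int = 72) -> str:
    """Hard-wrap each paragraph to `width` columns using real newline characters."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)

def quote_once(text: str) -> str:
    """Add one level of '> ' quoting to every line, as a reply would."""
    return "\n".join("> " + line for line in text.split("\n"))

body = wrap_for_email("This sentence is repeated to make a long paragraph. " * 20)
quoted = body
for _ in range(4):  # four levels of replies
    quoted = quote_once(quoted)

print(max(len(line) for line in body.split("\n")))    # never more than 72
print(max(len(line) for line in quoted.split("\n")))  # never more than 72 + 4 * 2 = 80
```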

Use real line breaks. Some mailers do a kind of fake line wrapping, whereby when you're composing an email, the display shows line breaks that aren't actually there. When the mail goes out, it may not have the line breaks you thought it had, and it will wrap awkwardly on some people's screens. If your mailer might use fake line breaks, look for a setting you can tweak to make it show the true line breaks as you compose.
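A simple pre-send check can catch the fake-line-break problem: a paragraph that was only soft-wrapped on screen leaves the mailer as one very long line. The hypothetical helper below, a sketch rather than a feature of any particular mail client, reports every line longer than a given limit.

```python
from typing import List, Tuple

def overlong_lines(body: str, limit: int = 80) -> List[Tuple[int, int]]:
    """Return (line number, length) for every line longer than `limit` columns."""
    return [(number, len(line))
            for number, line in enumerate(body.splitlines(), start=1)
            if len(line) > limit]

# A soft-wrapped paragraph arrives as a single 150-character line.
draft = "Short greeting,\n" + "x" * 150 + "\nThanks."
print(overlong_lines(draft))  # [(2, 150)]
```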

When including screen output, snippets of code, or other preformatted text, offset it clearly, so that even a lazy eye can easily see the boundaries between your prose and the material you're quoting. (I never expected to write that advice when I started this book, but on a number of open source mailing lists lately, I've seen people mix texts from different sources without making it clear which is which. The effect is very frustrating. It makes their posts significantly harder to understand, and frankly makes those people look a little bit disorganized.)

When quoting someone else's mail, insert your responses where they're most appropriate, at several different places if necessary, and trim off the parts of their mail you didn't use. If you're writing a quick comment that applies to their entire post, it's okay to top-post (that is, to put your response above the quoted text of their mail); otherwise, you should quote the relevant portion of the original text first, followed by your response.

Construct the subject lines of new mails carefully. It's the most important line in your mail, because it allows every other person in the project to decide whether or not to read more. Modern mail reading software organizes groups of related messages into threads, which can be defined not only by a common subject, but by various other headers (which are sometimes not displayed). It follows that if a thread starts to drift to a new topic, you can, and should, adjust the subject line accordingly when replying. The thread's integrity will be preserved, due to those other headers, but the new subject will help people looking at an overview of the thread know that the topic has drifted. Likewise, if you really want to start a new topic, do it by posting a fresh mail, not by replying to an existing mail and changing the subject. Otherwise, your mail would still be grouped into the same thread as what you're replying to, and thus fool people into thinking it's about something it's not. Again, the penalty would be not only the waste of their time, but the slight dent in your credibility as someone fluent in using communications tools.
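In standard Internet mail, the "other headers" that hold a thread together are chiefly `In-Reply-To` and `References`. The sketch below uses Python's standard `email.message.EmailMessage` to build a reply that keeps those headers while changing the subject, which is roughly what a mailer does when you retitle a drifting thread. The `make_reply` helper, the message IDs, and the "(was: ...)" subject style are illustrative assumptions; real mailers also set recipients, dates, and other headers omitted here.

```python
from email.message import EmailMessage

def make_reply(original: EmailMessage, subject: str, body: str) -> EmailMessage:
    """Build a reply that stays in the original thread, whatever its subject."""
    reply = EmailMessage()
    reply["Subject"] = subject
    parent_id = str(original["Message-ID"])
    reply["In-Reply-To"] = parent_id
    # References lists the earlier messages in the thread, then the parent.
    ancestors = str(original.get("References", "")).strip()
    reply["References"] = (ancestors + " " + parent_id).strip()
    reply.set_content(body)
    return reply

original = EmailMessage()
original["Subject"] = "Re: build failures on the release branch"
original["Message-ID"] = "<msg-2@lists.example.org>"
original["References"] = "<msg-1@lists.example.org>"
original.set_content("...")

reply = make_reply(original, "Release schedule (was: build failures)", "...")
print(reply["In-Reply-To"])  # <msg-2@lists.example.org>
print(reply["References"])   # <msg-1@lists.example.org> <msg-2@lists.example.org>
```

Because threading software can follow these headers, the new subject changes what readers see in the thread overview without breaking the thread apart.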

Content

Well-formatted mails attract readers, but content keeps them. No set of fixed rules can guarantee good content, of course, but there are some principles that make it more likely.

Make things easy for your readers. There's a ton of information floating around in any active open source project, and readers cannot be expected to be familiar with most of it; indeed, they cannot always be expected to know how to become familiar. Wherever possible, your posts should provide information in the form most convenient for readers. If you have to spend an extra two minutes to dig up the URL to a particular thread in the mailing list archives, in order to save your readers the trouble of doing so, it's worth it. If you have to spend an extra 5 or 10 minutes summarizing the conclusions so far of a complex thread, in order to give people context in which to understand your post, then do so. Think of it this way: the more successful a project, the higher the reader-to-writer ratio in any given forum. If every post you make is seen by n people, then as n rises, so does the value of spending extra effort to save those people time. And as people see you imposing this standard on yourself, they will work to match it in their own communications. The result is, ideally, an increase in the global efficiency of the project: when there is a choice between n people making an effort and one person doing so, the project prefers the latter.
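To make the reader-to-writer argument concrete: extra effort by the writer pays off whenever the time it saves across all n readers exceeds the time it costs the writer. The function name and the numbers below are purely illustrative.

```python
def net_minutes_saved(readers: int, minutes_saved_per_reader: float,
                      extra_writer_minutes: float) -> float:
    """Person-minutes the project gains when one writer spends extra time
    so that each reader can skip some work (negative means a net loss)."""
    return readers * minutes_saved_per_reader - extra_writer_minutes

# Illustrative: 10 extra minutes summarizing a thread that 50 people read,
# saving each of them 2 minutes of digging through the archives.
print(net_minutes_saved(50, 2, 10))  # 90
# With only 5 readers, the same effort merely breaks even.
print(net_minutes_saved(5, 2, 10))   # 0
```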

Don't engage in hyperbole. Exaggerating in online posts is a classic arms race. For example, a person reporting a bug may worry that the developers will not pay sufficient attention, so he'll describe it as a severe, showstopper problem that is preventing him (and all his friends/coworkers/cousins) from using the software productively, when it's actually only a mild annoyance. But exaggeration is not limited to users—programmers often do the same thing during technical debates, particularly when the disagreement is over a matter of taste rather than correctness:

"Doing it that way would make the code totally unreadable. It'd be a maintenance nightmare, compared to J. Random's proposal..."

The same sentiment actually becomes stronger when phrased less sharply:

"That works, but it's less than ideal in terms of readability and maintainability, I think. J. Random's proposal avoids those problems because it..."

You will not be able to get rid of hyperbole completely, and in general it's not necessary to do so. Compared to other forms of miscommunication, hyperbole is not globally damaging: it hurts mainly the perpetrator. The recipients can compensate; it's just that the sender loses a little more credibility each time. Therefore, for the sake of your own influence in the project, try to err on the side of moderation. That way, when you do need to make a strong point, people will take you seriously.

Edit twice. For any message longer than a medium-sized paragraph, reread it from top to bottom once you think it's done, before sending it. This is familiar advice to anyone who's taken a composition class, but it's especially important in online discussion. Because the process of online composition tends to be highly discontinuous (in the course of writing a message, you may need to go back and check other mails, visit certain web pages, run a command to capture its debugging output, etc.), it's especially easy to lose your sense of narrative place. Messages that were composed discontinuously and not checked before being sent are often recognizable as such, much to the chagrin (or so one would hope) of their authors. Take the time to review what you send. The more your posts hold together structurally, the more they will be read.

Tone

After writing thousands of messages, you will probably find your style tending toward the terse. This seems to be the norm in most technical forums, and there's nothing wrong with it per se. A degree of terseness that would be unacceptable in normal social interactions is simply the default for free software hackers. Here's a response I once drew on a mailing list about some free content management software, quoted in full:

Can you possibly elaborate a bit more on exactly what problems
you ran into, etc?

Also:

What version of Slash are you using? I couldn't tell from your
original message.

Exactly how did you build the apache/mod_perl source?

Did you try the Apache 2.0 patch that was posted about on
slashcode.com?

  Shane

Now that's terse! No greeting, no sign-off other than his name, and the message itself is just a series of questions phrased as compactly as possible. His one declarative sentence was an implicit criticism of my original message. And yet, I was happy to see Shane's mail, and didn't take his terseness as a sign of anything other than him being a busy person. The mere fact that he was asking questions, instead of ignoring my post, meant that he was willing to spend some time on my problem.

Will all readers react positively to this style? Not necessarily; it depends on the person and the context. For example, if someone has just posted acknowledging that she made a mistake (perhaps she wrote a bug), and you know from past experience that this person tends to be a bit insecure, then while you may still write a compact response, you should make sure to leaven it with some sort of acknowledgment of her feelings. The bulk of your response might be a brief, engineer's-eye analysis of the situation, as terse as you want. But at the end, sign off with something indicating that your terseness is not to be taken as coldness. For example, if you've just given reams of advice about exactly how the person should fix the bug, then sign off with "Good luck, <your name here>" to indicate that you wish them well and are not mad. A strategically placed smiley face or other emoticlue can often be enough to reassure an interlocutor, too.

It may seem odd to focus as much on the participants' feelings as on the surface of what they say, but to put it baldly, feelings affect productivity. Feelings are important for other reasons too, but even confining ourselves to purely utilitarian grounds, we may note that unhappy people write worse software, and less of it. Given the restricted nature of most electronic media, though, there will often be no overt clue as to how a person is feeling. You will have to make an educated guess based on a) how most people would feel in that situation, and b) what you know of this particular person from past interactions. Some people prefer a more hands-off attitude, and simply deal with everyone at face value, the idea being that if a participant doesn't say outright that she feels a particular way, then one has no business treating her as though she does. I don't buy this approach, for a couple of reasons. One, people don't behave that way in real life, so why would they online? Two, since most interactions take place in public forums, people tend to be even more restrained in expressing emotions than they might be in private. To be more precise, they are often willing to express emotions directed at others, such as gratitude or outrage, but not emotions directed inwardly, such as insecurity or pride. Yet most humans work better when they know that others are aware of their state of mind. By paying attention to small clues, you can usually guess right most of the time, and motivate people to stay involved to a greater degree than they otherwise might.

I don't mean, of course, that your role is to be a group therapist, constantly helping everyone to get in touch with their feelings. But by paying careful attention to long-term patterns in people's behavior, you will begin to get a sense of them as individuals even if you never meet them face-to-face. And by being sensitive to the tone of your own writing, you can have a surprising amount of influence over how others feel, to the ultimate benefit of the project.

Recognizing Rudeness

One of the defining characteristics of open source culture is its distinctive notions of what does and does not constitute rudeness. While the conventions described below are not unique to free software development, nor even to software in general—they would be familiar to anyone working in mathematics, the hard sciences, or engineering disciplines—free software, with its porous boundaries and constant influx of newcomers, is an environment where these conventions are especially likely to be encountered by people unfamiliar with them.

Let's start with the things that are not rude:

Technical criticism, even when direct and unpadded, is not rude. Indeed, it can be a form of flattery: the critic is saying, by implication, that the target is worth taking seriously, and is worth spending some time on. That is, the more viable it would have been to simply ignore someone's post, the more of a compliment it becomes to take the time to criticize it (unless the critique descends into an ad hominem attack or some other form of obvious rudeness, of course).

Blunt, unadorned questions, such as Shane's questions to me in the previously quoted email, are not rude either. Questions that in other contexts might seem cold, rhetorical, or even mocking, are often intended seriously, and have no hidden agenda other than eliciting information as quickly as possible. The famous technical support question "Is your computer plugged in?" is a classic example of this. The support person really does need to know if your computer is plugged in, and after the first few days on the job, has gotten tired of prefixing her question with polite blandishments ("I beg your pardon, I just want to ask a few simple questions to rule out some possibilities. Some of these might seem pretty basic, but bear with me..."). At this point, she doesn't bother with the padding anymore, she just asks straight out: is it plugged in or not? Equivalent questions are asked all the time on free software mailing lists. The intent is not to insult the recipient, but to quickly rule out the most obvious (and perhaps most common) explanations. Recipients who understand this and react accordingly win points for taking a broad-minded view without prompting. But recipients who react badly should not be reprimanded, either. It's just a collision of cultures, not anyone's fault. Explain amiably that your question (or criticism) had no hidden meanings; it was just meant to get (or transmit) information as efficiently as possible, nothing more.

So what is rude?

By the same principle under which detailed technical criticism is a form of flattery, failure to provide quality criticism can be a kind of insult. I don't mean simply ignoring someone's work, be it proposal, code change, new issue filing, or whatever. Unless you explicitly promised a detailed reaction in advance, it's usually okay to simply not react at all. People will assume you just didn't have time to say anything. But if you do react, don't skimp: take the time to really analyze things, provide concrete examples where appropriate, dig around in the archives to find related posts from the past, etc. Or if you don't have time to put in that kind of effort, but still need to write some sort of brief response, then state the shortcoming openly in your message ("I think there's an issue filed for this, but unfortunately didn't have time to search for it, sorry"). The main thing is to recognize the existence of the cultural norm, either by fulfilling it or by openly acknowledging that one has fallen short this time. Either way, the norm is strengthened. But failing to meet that norm, while at the same time not explaining why you failed to meet it, is like saying the topic (and those participating in it) was not worth much of your time. Better to show that your time is valuable by being terse than by being lazy.

There are many other forms of rudeness, of course, but most of them are not specific to free software development, and common sense is a good enough guide to avoid them. See also the section titled « Tuez l'agressivité dans l'oeuf » in Chapter 2, Genèse d'un projet, if you haven't already.

Face

There is a region in the human brain devoted specifically to recognizing faces. It is known informally as the "fusiform face area," and its capabilities are mostly inborn, not learned. It turns out that recognizing individual people is such a crucial survival skill that we have evolved specialized hardware to do it.

Internet-based collaboration is therefore psychologically odd, because it involves tight cooperation between human beings who almost never get to identify each other by the most natural, intuitive methods: facial recognition first of all, but also sound of voice, posture, etc. To compensate for this, try to use a consistent screen name everywhere. It should be the front part of your email address (the part before the @-sign), your IRC username, your repository committer name, your issue tracker username, and so on. This name is your online "face": a short identifying string that serves some of the same purpose as your real face, although it does not, unfortunately, stimulate the same built-in hardware in the brain.

The screen name should be some intuitive permutation of your real name (mine, for example, is "kfogel"). In some situations it will be accompanied by your full name anyway, for example in mail headers:

From: "Karl Fogel" <kfogel@whateverdomain.com>

Actually, there are two things going on in that example. As mentioned earlier, the screen name matches the real name in an intuitive way. But also, the real name is real. That is, it's not some made-up appellation like:

From: "Wonder Hacker" <wonderhacker@whateverdomain.com>

There's a famous cartoon by Peter Steiner, from the July 5, 1993 issue of The New Yorker, that shows one dog logged into a computer terminal, looking down and telling another conspiratorially: "On the Internet, nobody knows you're a dog." This kind of thought probably lies behind a lot of the self-aggrandizing, meant-to-be-hip online identities people give themselves, as if calling oneself "Wonder Hacker" will actually cause people to believe one is a wonderful hacker. But the fact remains: even if no one knows you're a dog, you're still a dog. A fantastical online identity never impresses readers. Instead, it makes them think you're more into image than substance, or that you're simply insecure. Use your real name for all interactions, or if for some reason you require anonymity, then make up a name that sounds like a perfectly normal real name, and use it consistently.

In addition to keeping your online face consistent, there are some things you can do to make it more attractive. If you have an official title (e.g., "doctor", "professor", "director"), don't flaunt it, nor even mention it except when it's directly relevant to the conversation. Hackerdom in general, and free software culture in particular, tends to view title displays as exclusionary and a sign of insecurity. It's okay if your title appears as part of a standard signature block at the end of every mail you send; just don't ever use it as a tool to bolster your position in a discussion, because the attempt is guaranteed to backfire. You want folks to respect the person, not the title.

Speaking of signature blocks: keep them small and tasteful, or better yet, nonexistent. Avoid large legal disclaimers tacked on to the end of every mail, especially when they express sentiments incompatible with participation in a free software project. For example, the following classic of the genre appears at the end of every post a particular user makes to a public mailing list I'm on:

IMPORTANT NOTICE

If you have received this e-mail in error or wish to read our e-mail
disclaimer statement and monitoring policy, please refer to the
statement below or contact the sender.

This communication is from Deloitte & Touche LLP.  Deloitte &
Touche LLP is a limited liability partnership registered in England
and Wales with registered number OC303675.  A list of members' names
is available for inspection at Stonecutter Court, 1 Stonecutter
Street, London EC4A 4TR, United Kingdom, the firm's principal place of
business and registered office.  Deloitte & Touche LLP is
authorised and regulated by the Financial Services Authority.

This communication and any attachments contain information which is
confidential and may also be privileged.  It is for the exclusive use
of the intended recipient(s).  If you are not the intended
recipient(s) please note that any form of disclosure, distribution,
copying or use of this communication or the information in it or in
any attachments is strictly prohibited and may be unlawful.  If you
have received this communication in error, please return it with the
title "received in error" to IT.SECURITY.UK@deloitte.co.uk then delete
the email and destroy any copies of it.

E-mail communications cannot be guaranteed to be secure or error free,
as information could be intercepted, corrupted, amended, lost,
destroyed, arrive late or incomplete, or contain viruses.  We do not
accept liability for any such matters or their consequences.  Anyone
who communicates with us by e-mail is taken to accept the risks in
doing so.

When addressed to our clients, any opinions or advice contained in
this e-mail and any attachments are subject to the terms and
conditions expressed in the governing Deloitte & Touche LLP client
engagement letter.

Opinions, conclusions and other information in this e-mail and any
attachments which do not relate to the official business of the firm
are neither given nor endorsed by it.

For someone who's just showing up to ask a question now and then, that huge disclaimer looks a bit silly but probably doesn't do any lasting harm. However, if this person wanted to participate actively in the project, that legal boilerplate would start to have a more insidious effect. It would send at least two potentially destructive signals: first, that this person doesn't have full control over his tools—he's trapped inside some corporate mailer that tacks an annoying message to the end of every email, and he hasn't got any way to route around it—and second, that he has little or no organizational support for his free software activities. True, the organization has clearly not banned him outright from posting to public lists, but it has made his posts look distinctly unwelcoming, as though the risk of letting out confidential information must trump all other priorities.

If you work for an organization that insists on adding such signature blocks to all outgoing mail, then consider getting a free email account from, for example, gmail.google.com, www.hotmail.com, or www.yahoo.com, and using that as your address for the project.

Avoiding Common Pitfalls

Don't Post Without a Purpose

A common pitfall in online project participation is to think that you have to respond to everything. You don't. First of all, there will usually be more threads going on than you can keep track of, at least after the project is past its first few months. Second, even in the threads that you have decided to engage in, much of what people say will not require a response. Development forums in particular tend to be dominated by three kinds of messages:

  1. Messages proposing something non-trivial

  2. Messages expressing support or opposition to something someone else has said

  3. Summing-up messages

None of these inherently requires a response, particularly if you can be fairly sure, based on watching the thread so far, that someone else is likely to say what you would have said anyway. (If you're worried that you'll be caught in a wait-wait loop because all the others are using this tactic too, don't be; there's almost always someone out there who'll feel like jumping into the fray.) A response should be motivated by a definite purpose. Ask yourself first: do you know what you want to accomplish? And second: will it not get accomplished unless you say something?

Two good reasons to add your voice to a thread are a) when you see a flaw in a proposal and suspect that you're the only one who sees it, and b) when you see that miscommunication is happening between others, and know that you can fix it with a clarifying post. It's also generally fine to post just to thank someone for doing something, or to say "Me too!", because a reader can tell right away that such posts do not require any response or further action, and therefore the mental effort demanded by the post ends cleanly when the reader reaches the last line of the mail. But even then, think twice before saying something; it's always better to leave people wishing you'd post more than wishing you'd post less. (See the second half of Appendix C, Why Should I Care What Color the Bikeshed Is?, for more thoughts about how to behave on a busy mailing list.)

Productive vs Unproductive Threads

On a busy mailing list, you have two imperatives. One, obviously, is to figure out what you need to pay attention to and what you can ignore. The other is to behave in a way that avoids causing noise: not only do you want your own posts to have a high signal/noise ratio, you also want them to be the sorts of messages that stimulate other people to either post with a similarly high signal/noise ratio, or not post at all.

To see how to do that, let's consider the context in which it is done. What are some of the hallmarks of an unproductive thread?

  • Arguments that have been made already start being repeated, as though the poster thinks no one heard them the first time.

  • Increasing levels of hyperbole and involvement as the stakes get smaller and smaller.

  • A majority of comments coming from people who do little or nothing, while the people who tend to get things done are silent.

  • Many ideas discussed without clear proposals ever being made. (Of course, any interesting idea starts out as an imprecise vision; the important question is what direction it goes from there. Does the thread seem to be turning the vision into something more concrete, or is it spinning off into sub-visions, side-visions, and ontological disputes?)

Just because a thread is not productive at first doesn't mean it's a waste of time. It might be about an important topic, in which case the fact that it's not making any headway is all the more troublesome.

Guiding a thread toward usefulness without being pushy is an art. It won't work to simply admonish people to stop wasting their time, or to ask them not to post unless they have something constructive to say. You may, of course, think these things privately, but if you say them out loud then you will be offensive. Instead, you have to suggest conditions for further progress—give people a route, a path to follow that leads to the results you want, yet without sounding like you're dictating conduct. The distinction is largely one of tone. For example, this is bad:

This discussion is going nowhere. Can we please drop this topic until someone has a patch to implement one of these proposals? There's no reason to keep going around and around saying the same things. Code speaks louder than words, folks.

Whereas this is good:

Several proposals have been floated in this thread, but none have had all the details fleshed out, at least not enough for an up-or-down vote. Yet we're also not saying anything new now; we're just reiterating what has been said before. So the best thing at this point would probably be for further posts to contain either a complete specification for the proposed behavior, or a patch. Then at least we'd have a definite action to take (i.e., get consensus on the specification, or apply and test the patch).

Contrast the second approach with the first. The second way does not draw a line between you and the others, or accuse them of taking the discussion into a spiral. It talks about "we", which is important whether or not you actually participated in the thread before now, because it reminds everyone that even those who have been silent thus far still have a stake in the thread's outcome. It describes why the thread is going nowhere, but does so without pejoratives or judgements—it just dispassionately states some facts. Most importantly, it offers a positive course of action, so that instead of people feeling like discussion is being closed off (a restriction against which they can only be tempted to rebel), they will feel as if they're being offered a way to take the conversation to a more constructive level. This is a standard people will naturally want to meet.

You won't always want a thread to make it to the next level of constructiveness—sometimes you'll want it to just go away. The purpose of your post, then, is to make it do one or the other. If you can tell from the way the thread has gone so far that no one is actually going to take the steps you suggested, then your post effectively shuts down the thread without seeming to do so. Of course, there isn't any foolproof way to shut down a thread, and even if there were, you wouldn't want to use it. But asking participants to either make visible progress or stop posting is perfectly defensible, if done diplomatically. Be wary of quashing threads prematurely, however. Some amount of speculative chatter can be productive, depending on the topic, and asking for it to be resolved too quickly will stifle the creative process, as well as make you look impatient.

Don't expect any thread to stop on a dime. There will probably still be a few posts after yours, either because mails got crossed in the pipe, or because people want to have the last word. This is nothing to worry about, and you don't need to post again. Just let the thread peter out, or not peter out, as the case may be. You can't have complete control; on the other hand, you can expect to have a statistically significant effect across many threads.

The Softer the Topic, the Longer the Debate

Although discussion can meander on any topic, the probability of meandering goes up as the technical difficulty of the topic goes down. After all, the greater the technical difficulty, the fewer participants can really follow what's going on. Those who can are likely to be the most experienced developers, who have already taken part in such discussions thousands of times before, and know what sort of behavior is likely to lead to a consensus everyone can live with.

Thus, consensus is hardest to achieve in technical questions that are simple to understand and easy to have an opinion about, and in "soft" topics such as organization, publicity, funding, etc. People can participate in those arguments forever, because there are no qualifications necessary for doing so, no clear ways to decide (even afterward) if a decision was right or wrong, and because simply outwaiting other discussants is sometimes a successful tactic.

The principle that the amount of discussion is inversely proportional to the complexity of the topic has been around for a long time, and is known informally as the Bikeshed Effect. Here is Poul-Henning Kamp's explanation of it, from a now-famous post made to BSD developers:

It's a long story, or rather it's an old story, but it is quite short actually. C. Northcote Parkinson wrote a book in the early 1960'ies, called "Parkinson's Law", which contains a lot of insight into the dynamics of management.

[...]

In the specific example involving the bike shed, the other vital component is an atomic power-plant, I guess that illustrates the age of the book.

Parkinson shows how you can go in to the board of directors and get approval for building a multi-million or even billion dollar atomic power plant, but if you want to build a bike shed you will be tangled up in endless discussions.

Parkinson explains that this is because an atomic plant is so vast, so expensive and so complicated that people cannot grasp it, and rather than try, they fall back on the assumption that somebody else checked all the details before it got this far. Richard P. Feynmann gives a couple of interesting, and very much to the point, examples relating to Los Alamos in his books.

A bike shed on the other hand. Anyone can build one of those over a weekend, and still have time to watch the game on TV. So no matter how well prepared, no matter how reasonable you are with your proposal, somebody will seize the chance to show that he is doing his job, that he is paying attention, that he is here.

In Denmark we call it "setting your fingerprint". It is about personal pride and prestige, it is about being able to point somewhere and say "There! I did that." It is a strong trait in politicians, but present in most people given the chance. Just think about footsteps in wet cement.

(His complete post is very much worth reading, too. See Appendix C, Why Should I Care What Color the Bikeshed Is?; see also http://bikeshed.com.)

Anyone who's ever taken regular part in group decision-making will recognize what Kamp is talking about. However, it is usually impossible to persuade everyone to avoid painting bikesheds. The best you can do is point out that the phenomenon exists, when you see it happening, and persuade the senior developers—the people whose posts carry the most weight—to drop their paintbrushes early, so at least they're not contributing to the noise. Bikeshed painting parties will never go away entirely, but you can make them shorter and less frequent by spreading an awareness of the phenomenon in the project's culture.

Avoid Holy Wars

A holy war is a dispute, often but not always over a relatively minor issue, which is not resolvable on the merits of the arguments, but where people feel passionate enough to continue arguing anyway in the hope that their side will prevail. Holy wars are not quite the same as bikeshed paintings. People painting bikesheds are usually quick to jump in with an opinion (because they can), but they won't necessarily feel strongly about it, and indeed will sometimes express other, incompatible opinions, to show that they understand all sides of the issue. In a holy war, on the other hand, understanding the other sides is a sign of weakness. In a holy war, everyone knows there is One Right Answer; they just don't agree on what it is.

Once a holy war has started, it generally cannot be resolved to everyone's satisfaction. It does no good to point out, in the midst of a holy war, that a holy war is going on. Everyone knows that already. Unfortunately, a common feature of holy wars is disagreement on the very question of whether the dispute is resolvable by continued discussion. Viewed from outside, it is clear that neither side is changing the other's mind. Viewed from inside, the other side is being obtuse and not thinking clearly, but they might come around if browbeaten enough. Now, I am not saying there's never a right side in a holy war. Sometimes there is—in the holy wars I've participated in, it's always been my side, of course. But it doesn't matter, because there's no algorithm for convincingly demonstrating that one side or the other is right.

A common, but unsatisfactory, way people try to resolve holy wars is to say "We've already spent far more time and energy discussing this than it's worth! Can we please just drop it?" There are two problems with this. First, that time and energy has already been spent and can never be recovered—the only question now is, how much more effort remains? If some people feel that just a little more discussion will bring the issue to a close, then it still makes sense (from their point of view) to continue.

The other problem with asking for the matter to be dropped is that this is often equivalent to allowing one side, the status quo, to declare victory by inaction. And in some cases, the status quo is known to be unacceptable anyway: everyone agrees that some decision must be made, some action taken. Dropping the subject would be worse for everyone than simply giving up the argument would be for anyone. But since that dilemma applies to all equally, it's still possible to end up arguing forever about what to do.

So how should you handle holy wars?

The first answer is, try to set things up so they don't happen. This is not as hopeless as it sounds:

You can anticipate certain standard holy wars: they tend to come up over programming languages, licenses (see the section titled « The GPL and License Compatibility » in Chapter 9, Licenses, Copyrights, and Patents), reply-to munging (see the section titled « Le grand débat du « Répondre à » » in Chapter 3, L'infrastructure technique), and a few other topics. Each project usually has a holy war or two all its own, as well, which longtime developers will quickly become familiar with. The techniques for stopping holy wars, or at least limiting their damage, are pretty much the same everywhere. Even if you are positive your side is right, try to find some way to express sympathy and understanding for the points the other side is making. Often the problem in a holy war is that because each side has built its walls as high as possible, and made it clear that any other opinion is sheer foolishness, the act of surrendering or changing one's mind becomes psychologically unbearable: it would be an admission not just of being wrong, but of having been certain and still being wrong. The way you can make this admission palatable for the other side is to express some uncertainty yourself, precisely by showing that you understand the arguments they are making and find them at least sensible, if not finally persuasive. Make a gesture that provides space for a reciprocal gesture, and usually the situation will improve. You are no more or less likely to get the technical result you wanted, but at least you can avoid unnecessary collateral damage to the project's morale.

When a holy war can't be avoided, decide early how much you care, and then be willing to publicly give up. When you do so, you can say that you're backing out because the holy war isn't worth it, but don't express any bitterness and don't take the opportunity for a last parting shot at the opposing side's arguments. Giving up is effective only when done gracefully.

Programming language holy wars are a bit of a special case, because they are often highly technical, yet many people feel qualified to take part in them, and the stakes are very high, since the result may determine what language a good portion of the project's code is written in. The best solution is to choose the language early, with buy-in from influential initial developers, and then defend it on the grounds that it's what you are all comfortable writing in, not on the grounds that it's better than some other language that could have been used instead. Never let the conversation degenerate into an academic comparison of programming languages (this seems to happen especially often when someone brings up Perl, for some reason); that's a death topic that you must simply refuse to be drawn into.

For more historical background on holy wars, see http://catb.org/~esr/jargon/html/H/holy-wars.html, and the paper by Danny Cohen that popularized the term, http://www.ietf.org/rfc/ien/ien137.txt.

The "Noisy Minority" Effect

In any mailing list discussion, it's easy for a small minority to give the impression that there is a great deal of dissent, by flooding the list with numerous lengthy emails. It's a bit like a filibuster, except that the illusion of widespread dissent is even more powerful, because it's divided across an arbitrary number of discrete posts and most people won't bother to keep track of who said what, when. They'll just have an instinctive impression that the topic is very controversial, and wait for the fuss to die down.

The best way to counteract this effect is to point it out very clearly and provide supporting evidence showing how small the actual number of dissenters is, compared to those in agreement. In order to increase the disparity, you may want to privately poll people who have been mostly silent, but who you suspect would agree with the majority. Don't say anything that suggests the dissenters were deliberately trying to inflate the impression they were making. Chances are they weren't, and even if they were, there would be no strategic advantage to pointing it out. All you need do is show the actual numbers in a side-by-side comparison, and people will realize that their intuition of the situation does not match reality.
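A minimal sketch of that side-by-side comparison, assuming you have already read the thread and labeled each post by hand with its author and the position it takes. The `Post` type, the position labels, and the sample thread are all hypothetical; the point is only that counting posts and counting people can give very different pictures.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str    # sender address as shown in the archive
    position: str  # hand-assigned label such as "for" or "against" (hypothetical)


def compare_sides(posts):
    """Return {position: (number_of_posts, number_of_distinct_authors)}."""
    post_counts = Counter(p.position for p in posts)
    authors = defaultdict(set)
    for p in posts:
        authors[p.position].add(p.author)
    return {pos: (post_counts[pos], len(authors[pos])) for pos in post_counts}


# Hypothetical thread: two people post eleven times between them,
# while five others each post once.
thread = (
    [Post("a@example.org", "against")] * 6
    + [Post("b@example.org", "against")] * 5
    + [Post(f"user{i}@example.org", "for") for i in range(5)]
)

for position, (n_posts, n_people) in sorted(compare_sides(thread).items()):
    print(f"{position}: {n_posts} posts from {n_people} people")
```

On this sample it reports 11 posts from 2 people against and 5 posts from 5 people for: the noisy side wins on volume and loses on headcount, which is exactly the contrast worth showing the list.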

This advice doesn't just apply to issues with clear for-and-against positions. It applies to any discussion where a fuss is being made, but it's not clear that most people consider the issue under discussion to be a real problem. After a while, if you agree that the issue is not worthy of action, and can see that it has failed to get much traction (even if it has generated a lot of email), you can just observe publicly that it's not getting traction. If the "Noisy Minority" effect has been at work, your post will seem like a breath of fresh air. Most people's impression of the discussion up to that point will have been somewhat murky: "Huh, it sure feels like there's some big deal here, because there sure are a lot of posts, but I can't see any clear progress happening." By explaining how the form of the discussion made it appear more turbulent than it really was, you retrospectively give it a new shape, through which people can recast their understanding of what transpired.

Difficult People

Difficult people are no easier to deal with in electronic forums than they are in person. By "difficult" I don't mean "rude". Rude people are annoying, but they're not necessarily difficult. This book has already discussed how to handle them: comment on the rudeness the first time, and from then on, either ignore them or treat them the same as anyone else. If they continue being rude, they will usually make themselves so unpopular as to have no influence on others in the project, so they are a self-containing problem.

The really difficult cases are people who are not overtly rude, but who manipulate or abuse the project's processes in a way that ends up costing other people time and energy, yet do not bring any benefit to the project. Such people often look for wedgepoints in the project's procedures, to give themselves more influence than they might otherwise have. This is much more insidious than mere rudeness, because neither the behavior nor the damage it causes is apparent to casual observers. A classic example is the filibuster, in which someone (always sounding as reasonable as possible, of course) keeps claiming that the matter under discussion is not ready for resolution, and offers more and more possible solutions, or new viewpoints on old solutions, when what is really going on is that he senses that a consensus or a ballot is about to form, and doesn't like where it is probably headed. Another example is when there's a debate that won't converge on consensus, but the group tries to at least clarify the points of disagreement and produce a summary for everyone to refer to from then on. The obstructionist, who knows the summary may lead to a result he doesn't like, will often try to delay even the summary, by relentlessly complicating the question of what should be in it, either by objecting to reasonable suggestions or by introducing unexpected new items.

Handling Difficult People

To counteract such behavior, it helps to understand the mentality of those who engage in it. People generally do not do it consciously. No one wakes up in the morning and says to himself: "Today I'm going to cynically manipulate procedural forms in order to be an irritating obstructionist." Instead, such actions are often preceded by a semi-paranoid feeling of being shut out of group interactions and decisions. The person feels he is not being taken seriously, or (in the more severe cases) that there is almost a conspiracy against him—that the other project members have decided to form an exclusive club, of which he is not a member. This then justifies, in his mind, taking rules literally and engaging in a formal manipulation of the project's procedures, in order to make everyone else take him seriously. In extreme cases, the person can even believe that he is fighting a lonely battle to save the project from itself.

It is the nature of such an attack from within that not everyone will notice it at the same time, and some people may not see it at all unless presented with very strong evidence. This means that neutralizing it can be quite a bit of work. It's not enough to persuade yourself that it's happening; you have to marshal enough evidence to persuade others too, and then you have to distribute that evidence in a thoughtful way.

Given that it's so much work to fight, it's often better just to tolerate it for a while. Think of it like a parasitic but mild disease: if it's not too debilitating, the project can afford to remain infected, and medicine might have harmful side effects. However, if it gets too damaging to tolerate, then it's time for action. Start gathering notes on the patterns you see. Make sure to include references to public archives—this is one of the reasons the project keeps records, so you might as well use them. Once you've got a good case built, start having private conversations with other project participants. Don't tell them what you've observed; instead, first ask them what they've observed. This may be your last chance to get unfiltered feedback about how others see the troublemaker's behavior; once you start openly talking about it, opinion will become polarized and no one will be able to remember what he formerly thought about the matter.

If private discussions indicate that at least some others see the problem too, then it's time to do something. That's when you have to get really cautious, because it's very easy for this sort of person to try to make it appear as though you're picking on them unfairly. Whatever you do, never accuse them of maliciously abusing the project's procedures, of being paranoid, or, in general, of any of the other things that you suspect are probably true. Your strategy should be to look both more reasonable and more concerned with the overall welfare of the project, with the goal of either reforming the person's behavior, or getting them to go away permanently. Depending on the other developers, and your relationship with them, it may be advantageous to gather allies privately first. Or it may not; that might just create ill will behind the scenes, if people think you're engaging in an improper whispering campaign.

Remember that although the other person may be the one behaving destructively, you will be the one who appears destructive if you make a public charge that you can't back up. Be sure to have plenty of examples to demonstrate what you're saying, and say it as gently as possible while still being direct. You may not persuade the person in question, but that's okay as long as you persuade everyone else.

Case study

I remember only one situation, in more than 10 years of working in free software, where things got so bad that we actually had to ask someone to stop posting altogether. As is so often the case, he was not rude, and sincerely wanted only to be helpful. He just didn't know when to post and when not to post. Our lists were open to the public, and he was posting so often, and asking questions on so many different topics, that it was getting to be a noise problem for the community. We'd already tried asking him nicely to do a little more research for answers before posting, but that had no effect.

The strategy that finally worked is a perfect example of how to build a strong case on neutral, quantitative data. One of our developers did some digging in the archives, and then sent the following message privately to a few developers. The offender (the third name on the list below, shown here as "J. Random") had very little history with the project, and had contributed no code or documentation. Yet he was the third most active poster on the mailing lists:

From: "Brian W. Fitzpatrick" <fitz@collab.net>
To: [... recipient list omitted for anonymity ...]
Subject: The Subversion Energy Sink
Date: Wed, 12 Nov 2003 23:37:47 -0600

In the last 25 days, the top 6 posters to the svn [dev|users] list have
been:

    294  kfogel@collab.net
    236  "C. Michael Pilato" <cmpilato@collab.net>
    220  "J. Random" <jrandom@problematic-poster.com>
    176  Branko Čibej <brane@xbc.nu>
    130  Philip Martin <philip@codematters.co.uk>
    126  Ben Collins-Sussman <sussman@collab.net>

I would say that five of these people are contributing to Subversion
hitting 1.0 in the near future.

I would also say that one of these people is consistently drawing time
and energy from the other 5, not to mention the list as a whole, thus
(albeit unintentionally) slowing the development of Subversion.  I did
not do a threaded analysis, but vgrepping my Subversion mail spool tells
me that every mail from this person is responded to at least once by at
least 2 of the other 5 people on the above list.

I think some sort of radical intervention is necessary here, even if we
do scare the aforementioned person away.  Niceties and kindness have
already proven to have no effect.

dev@subversion is a mailing list to facilitate development of a version
control system, not a group therapy session.

-Fitz, attempting to wade through three days of svn mail that he let
 pile up

Though it might not seem so at first, J. Random's behavior was a classic case of abusing project procedures. He wasn't doing something obvious like trying to filibuster a vote, but he was taking advantage of the mailing list's policy of relying on self-moderation by its members. We left it to each individual's judgement when to post and on what topics. Thus, we had no procedural recourse for dealing with someone who either did not have, or would not exercise, such judgement. There was no rule one could point to and say the fellow was violating it, yet everyone knew that his frequent posting was getting to be a serious problem.

Fitz's strategy was, in retrospect, masterful. He gathered damning quantitative evidence, but then distributed it discreetly, sending it first to a few people whose support would be key in any drastic action. They agreed that some sort of action was necessary, and in the end we called J. Random on the phone, described the problem to him directly, and asked him to simply stop posting. He never really did understand the reasons why; if he had been capable of understanding, he probably would have exercised appropriate judgement in the first place. But he agreed to stop posting, and the mailing lists became usable again. Part of the reason this strategy worked was, perhaps, the implicit threat that we could start restricting his posts via the moderation software normally used for preventing spam (see the section titled « Se prémunir du spam » in Chapter 3, L'infrastructure technique). But the reason we were able to have that option in reserve was that Fitz had gathered the necessary support from key people first.
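Fitz's numbers can be reproduced from a local copy of the list archive. The following is a minimal sketch, not the method he actually used (his message mentions only "vgrepping" a mail spool); it assumes the archive is a standard mbox file and counts posts per `From:` address within a 25-day window, matching the period in his message. `top_posters` and its parameters are hypothetical names.

```python
import mailbox
from collections import Counter
from datetime import datetime, timedelta, timezone
from email.utils import parseaddr, parsedate_to_datetime


def top_posters(mbox_path, days=25, limit=6, now=None):
    """Count messages per sender address in the last `days` days of an mbox."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for msg in mailbox.mbox(mbox_path):
        date_header = msg.get("Date")
        if date_header is None:
            continue
        try:
            sent = parsedate_to_datetime(str(date_header))
        except (TypeError, ValueError):
            continue  # skip messages whose Date header cannot be parsed
        if sent.tzinfo is None:
            sent = sent.replace(tzinfo=timezone.utc)  # assume UTC if no zone given
        if sent < cutoff:
            continue
        _, address = parseaddr(str(msg.get("From", "")))
        if address:
            counts[address.lower()] += 1  # normalize case so one person counts once
    return counts.most_common(limit)


# Example (hypothetical file name): print(top_posters("svn-dev.mbox"))
```

It counts raw posts only. Fitz's second observation, that each of this person's messages drew replies from at least two core developers, would need a threaded analysis using the `In-Reply-To` and `References` headers, which this sketch does not attempt.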

Handling Growth

The price of success is heavy in the open source world. As your software gets more popular, the number of people who show up looking for information increases dramatically, while the number of people able to provide information increases much more slowly. Furthermore, even if the ratio were evenly balanced, there is still a fundamental scalability problem with the way most open source projects handle communications. Consider mailing lists, for example. Most projects have a mailing list for general user questions—sometimes the list's name is "users", "discuss", "help", or something else. Whatever its name, the purpose of the list is always the same: to provide a place where people can get their questions answered, while others watch and (presumably) absorb knowledge from observing these exchanges.

These mailing lists work very well up to a few thousand users and/or a couple of hundred posts a day. But somewhere after that, the system starts to break down, because every subscriber sees every post; if the number of posts to the list begins to exceed what any individual reader can process in a day, the list becomes a burden to its members. Imagine, for instance, if Microsoft had such a mailing list for Windows XP. Windows XP has hundreds of millions of users; if even one-tenth of one percent of them had questions in a given twenty-four-hour period, then this hypothetical list would get hundreds of thousands of posts per day! Such a list could never exist, of course, because no one would stay subscribed to it. This problem is not limited to mailing lists; the same logic applies to IRC channels, online discussion forums, indeed to any system in which a group hears questions from individuals. The implications are ominous: the usual open source model of massively parallelized support simply does not scale to the levels needed for world domination.
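Spelled out, the estimate runs as follows, taking 300 million users as an illustrative figure (the text says only "hundreds of millions"):

```latex
\[
  300\,000\,000 \ \text{users} \times \frac{1}{1000} = 300\,000 \ \text{posts per day}
\]
\[
  \frac{300\,000 \ \text{posts}}{86\,400 \ \text{s}} \approx 3.5 \ \text{posts per second}
\]
```

No human subscriber can read three or four new messages every second, around the clock.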

There will be no explosion when forums reach the breaking point. There is just a quiet negative feedback effect: people unsubscribe from the lists, or leave the IRC channel, or at any rate stop bothering to ask questions, because they can see they won't be heard in all the noise. As more and more people make this highly rational choice, the forum's activity will seem to stay at a manageable level. But it is staying manageable precisely because the rational (or at least experienced) people have started looking elsewhere for information—while the inexperienced people stay behind and continue posting. In other words, one side effect of continuing to use unscalable communications models as the project grows is that the average quality of both questions and answers tends to go down, which makes it look like new users are dumber than they used to be, when in fact they're probably not. It's just that the benefit/cost ratio of using those high-population forums goes down, so naturally those with the experience to do so start to look elsewhere for answers first. Adjusting communications mechanisms to cope with project growth therefore involves two related strategies:

  1. Recognizing when particular parts of a forum are not suffering unbounded growth, even if the forum as a whole is, and separating those parts off into new, more specialized forums (i.e., don't let the good be dragged down by the bad).

  2. Making sure there are many automated sources of information available, and that they are kept organized, up-to-date, and easy to find.

Strategy (1) is usually not too hard. Most projects start out with one main forum: a general discussion mailing list, on which feature ideas, design questions, and coding problems can all be hashed out. Everyone involved with the project is on the list. After a while, it usually becomes clear that the list has evolved into several distinct topic-based sublists. For example, some threads are clearly about development and design; others are user questions of the "How do I do X?" variety; maybe there's a third topic family centered around processing bug reports and enhancement requests; and so on. A given individual, of course, might participate in many different thread types, but the important thing is that there is not a lot of overlap between the types themselves. They could be divided into separate lists without causing any harmful balkanization, because the threads rarely cross topic boundaries.

Actually doing this division is a two-step process. You create the new list (or IRC channel, or whatever it is to be), and then you spend whatever time is necessary gently nagging and reminding people to use the new forums appropriately. That latter step can last for weeks, but eventually people will get the idea. You simply have to make a point of always telling the sender when a post is sent to the wrong destination, and do so visibly, so that other people are encouraged to help out with routing. It's also useful to have a web page providing a guide to all the lists available; your responses can simply reference that web page and, as a bonus, the recipient may learn something about looking for guidelines before posting.

Strategy (2) is an ongoing process, lasting the lifetime of the project and involving many participants. Of course it is partly a matter of having up-to-date documentation (see the section titled « La documentation » in Chapter 2, Genèse d'un projet) and making sure to point people there. But it is also much more than that; the sections that follow discuss this strategy in detail.

Conspicuous Use of Archives

Typically, all communications in an open source project (except sometimes IRC conversations) are archived. The archives are public and searchable, and have referential stability: that is, once a given piece of information is recorded at a particular address, it stays at that address forever.

Use those archives as much as possible, and as conspicuously as possible. Even when you know the answer to some question off the top of your head, if you think there's a reference in the archives that contains the answer, spend the time to dig it up and present it. Every time you do that in a publicly visible way, some people learn for the first time that the archives are there, and that searching in them can produce answers. Also, by referring to the archives instead of rewriting the advice, you reinforce the social norm against duplicating information. Why have the same answer in two different places? When the number of places it can be found is kept to a minimum, people who have found it before are more likely to remember what to search for to find it again. Well-placed references also contribute to the quality of search results in general, because they strengthen the targeted resource's ranking in Internet search engines.

There are times when duplicating information makes sense, however. For example, suppose there's a response already in the archives, not from you, saying:

It appears that your Scanley indexes have become frobnicated.  To
unfrobnicate them, run these steps:

1. Shut down the Scanley server.
2. Run the 'defrobnicate' program that ships with Scanley.
3. Start up the server.

Then, months later, you see another post indicating that someone's indexes have become frobnicated. You search the archives and come up with the old response above, but you realize it's missing some steps (perhaps by mistake, or perhaps because the software has changed since that post was written). The classiest way to handle this is to post a new, more complete set of instructions, and explicitly obsolete the old post by mentioning it:

It appears that your Scanley indexes have become frobnicated.  We
saw this problem back in July, and J. Random posted a solution at
http://blahblahblah/blah.  Below is a more complete description of
how to unfrobnicate your indexes, based on J. Random's instructions
but extending them a bit:

1. Shut down the Scanley server.
2. Become the user the Scanley server normally runs as.
3. As that user, run the 'defrobnicate' program on the indexes.
4. Run Scanley by hand to see if the indexes work now.
5. Restart the server.

(In an ideal world, it would be possible to attach a note to the old post, saying that there is newer information available and pointing to the new post. However, I don't know of any archiving software that offers an "obsoleted by" feature, perhaps because it would be mildly tricky to implement in a way that doesn't violate the archives' integrity as a verbatim record. This is another reason why creating dedicated web pages with answers to common questions is a good idea.)

Archives are probably most often searched for answers to technical questions, but their importance to the project goes well beyond that. If a project's formal guidelines are its statutory law, the archives are its common law: a record of all decisions made and how they were arrived at. In any recurring discussion, it's pretty much obligatory nowadays to start with an archive search. This allows you to begin the discussion with a summary of the current state of things, anticipate objections, prepare rebuttals, and possibly discover angles you hadn't thought of. Also, the other participants will expect you to have done an archive search. Even if the previous discussions went nowhere, you should include pointers to them when you re-raise the topic, so people can see for themselves a) that they went nowhere, and b) that you did your homework, and therefore are probably saying something now that has not been said before.

Treat all resources like archives

All of the preceding advice applies to more than just mailing list archives. Having particular pieces of information at stable, conveniently findable addresses should be an organizing principle for all of the project's information. Let's take the project FAQ as a case study.

How do people use a FAQ?

  1. They want to search in it for specific words and phrases.

  2. They want to browse it, soaking up information without necessarily looking for answers to specific questions.

  3. They expect search engines such as Google to know about the FAQ's content, so that searches can result in FAQ entries.

  4. They want to be able to refer other people directly to specific items in the FAQ.

  5. They want to be able to add new material to the FAQ, but note that this happens much less often than answers are looked up—FAQs are far more often read from than written to.

Point 1 implies that the FAQ should be available in some sort of textual format. Points 2 and 3 imply that the FAQ should be available as an HTML page, with point 2 additionally indicating that the HTML should be designed for readability (i.e., you'll want some control over its look and feel), and should have a table of contents. Point 4 means that each individual entry in the FAQ should be assigned an HTML named anchor, a tag that allows people to reach a particular location on the page. Point 5 means the source files for the FAQ should be available in a convenient way (see the section titled « Version everything » in Chapter 3, L'infrastructure technique), in a format that's easy to edit.
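A minimal sketch of points 2 through 4, assuming the FAQ source is kept as a list of (anchor, question, answer) entries that a script turns into one HTML page. The entry format and the `render_faq` name are hypothetical. Each entry gets an `id` attribute, which serves the same purpose as a named anchor, so a URL ending in `#indexes-frobnicated` lands on that entry, and a table of contents at the top supports browsing.

```python
import html
import re

# Anchor ids are restricted to safe characters so they can go into
# attributes and URLs without escaping.
VALID_ID = re.compile(r"[A-Za-z][A-Za-z0-9_-]*\Z")


def render_faq(entries, title="Frequently Asked Questions"):
    """Render (anchor_id, question, answer) triples as one HTML page
    with a table of contents and a stable id on every entry."""
    seen = set()
    for anchor, _, _ in entries:
        if not VALID_ID.match(anchor):
            raise ValueError(f"bad anchor id: {anchor!r}")
        if anchor in seen:
            raise ValueError(f"duplicate anchor id: {anchor!r}")
        seen.add(anchor)

    esc = html.escape
    toc = "\n".join(f'<li><a href="#{a}">{esc(q)}</a></li>' for a, q, _ in entries)
    body = "\n".join(
        f'<h2 id="{a}">{esc(q)}</h2>\n<p>{esc(ans)}</p>' for a, q, ans in entries
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{esc(title)}</title></head>\n'
        f"<body>\n<h1>{esc(title)}</h1>\n<ul>\n{toc}\n</ul>\n{body}\n</body></html>\n"
    )


faq = [
    ("indexes-frobnicated",
     "Why are my Scanley indexes frobnicated?",
     "Shut down the Scanley server and run 'defrobnicate' on the indexes."),
]
print(render_faq(faq))
```

The anchors are assigned by hand rather than derived from the question text, so rewording a question later does not break links that people have already posted in the archives.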

Formatting the FAQ like this is just one example of how to make a resource presentable. The same properties (direct searchability, availability to major Internet search engines, browsability, referential stability, and, where applicable, editability) apply to other web pages, the source code tree, the bug tracker, etc. It just happens that most mailing list archiving software long ago recognized the importance of these properties, which is why mailing lists tend to have this functionality natively, while other formats may require some extra effort on the maintainer's part (Chapter 8, Managing Volunteers discusses how to spread that maintenance burden across many volunteers).

Codifying Tradition

As a project acquires history and complexity, the amount of data each incoming participant must absorb increases. Those who have been with the project a long time were able to learn, and invent, the project's conventions as they went along. They will often not be consciously aware of what a huge body of tradition has accumulated, and may be surprised at how many missteps recent newcomers seem to make. Of course, the issue is not that the newcomers are of any lower quality than before; it's that they face a bigger acculturation burden than newcomers did in the past.

The traditions a project accumulates are as much about how to communicate and preserve information as they are about coding standards and other technical minutiae. We've already looked at both sorts of standards, in the section titled « La documentation développeurs » in Chapter 2, Genèse d'un projet and the section titled « Writing It All Down » in Chapter 4, Social and Political Infrastructure, respectively, and examples are given there. This section is about how to keep such guidelines up-to-date as the project evolves, especially guidelines about how communications are managed, because those are the ones that change the most as the project grows in size and complexity.

First, watch for patterns in how people get confused. If you see the same situations coming up over and over, especially with new participants, chances are there is a guideline that needs to be documented but isn't. Second, don't get tired of saying the same things over and over again, and don't sound like you're tired of saying them. You and other project veterans will have to repeat yourselves often; this is an inevitable side effect of the arrival of newcomers.

Every web page, every mailing list message, and every IRC channel should be considered advertising space—not for commercial advertisements, but for ads about your project's own resources. What you put in that space depends on the demographics of those likely to read it. An IRC channel for user questions, for example, is likely to get people who have never interacted with the project before—often someone who has just installed the software, and has a question he'd like answered immediately (after all, if it could wait, he'd have sent it to a mailing list instead, which would probably use less of his total time, although it would take longer for an answer to come back). People usually don't make a permanent investment in the IRC channel; they'll show up, ask their question, and leave.

Therefore, the channel topic should be aimed at people looking for technical answers about the software right now, rather than at, say, people who might get involved with the project in a long term way and for whom community interaction guidelines might be more appropriate. Here's how a really busy channel handles it (compare this with the earlier example in the section titled « IRC / Real-Time Chat Systems » in Chapter 3, L'infrastructure technique):

You are now talking on #linuxhelp

Topic for #linuxhelp is Please READ
http://www.catb.org/~esr/faqs/smart-questions.html &&
http://www.tldp.org/docs.html#howto BEFORE asking questions | Channel
rules are at http://www.nerdfest.org/lh_rules.html | Please consult
http://kerneltrap.org/node/view/799 before asking about upgrading to a
2.6.x kernel | memory read possible: http://tinyurl.com/4s6mc ->
update to 2.6.8.1 or 2.4.27 | hash algo disaster: http://tinyurl.com/6w8rf
| reiser4 out

With mailing lists, the "ad space" is a tiny footer appended to every message. Most projects put subscription/unsubscription instructions there, and perhaps a pointer to the project's home page or FAQ page as well. You might think that anyone subscribed to the list would know where to find those things, and they probably do—but many more people than just subscribers see those mailing list messages. An archived post may be linked to from many places; indeed, some posts become so widely known that they eventually have more readers off the list than on it.

Formatting can make a big difference. For example, in the Subversion project, we were having limited success using the bug-filtering technique described in the section titled « Pre-Filtering the Bug Tracker » in Chapter 3, L'infrastructure technique. Many bogus bug reports were still being filed by inexperienced people, and each time it happened, the filer had to be educated in exactly the same way as the 500 people before him. One day, after one of our developers had finally gotten to the end of his rope and flamed some poor user who didn't read the issue tracker guidelines carefully enough, another developer decided this pattern had gone on long enough. He suggested that we reformat the issue tracker front page so that the most important part, the injunction to discuss the bug on the mailing lists or IRC channels before filing, would stand out in huge, bold red letters, on a bright yellow background, centered prominently above everything else on the page. We did so (you can see the results at http://subversion.tigris.org/project_issues.html), and the result was a noticeable drop in the rate of bogus issue filings. We still get them, of course (we always will), but the rate has slowed considerably, even as the number of users increases. The outcome is not only that the bug database contains less junk, but that those who respond to issue filings stay in a better mood, and are more likely to remain friendly when responding to one of the now-rare bogus filings. This improves both the project's image and the mental health of its volunteers.

The lesson for us was that merely writing up the guidelines was not enough. We also had to put them where they'd be seen by those who need them most, and format them in such a way that their status as introductory material would be immediately clear to people unfamiliar with the project.

Static web pages are not the only venue for advertising the project's customs. A certain amount of interactive policing (in the friendly-reminder sense, not the handcuffs-and-jail sense) is also required. All peer review, even the commit reviews described in the section titled « Pratiquez la revue par pairs » in Chapter 2, Genèse d'un projet, should include review of people's conformance or non-conformance with project norms, especially with regard to communications conventions.

Another example from the Subversion project: we settled on a convention of "r12908" to mean "revision 12908 in the version control repository." The lower-case "r" prefix is easy to type, and because it's half the height of the digits, it makes an easily-recognizable block of text when combined with the digits. Of course, settling on the convention doesn't mean that everyone will begin using it consistently right away. Thus, when a commit mail comes in with a log message like this:

------------------------------------------------------------------------
r12908 | qsimon | 2005-02-02 14:15:06 -0600 (Wed, 02 Feb 2005) | 4 lines

Patch from J. Random Contributor <jrcontrib@gmail.com>

* trunk/contrib/client-side/psvn/psvn.el:
  Fixed some typos from revision 12828.
------------------------------------------------------------------------

...part of reviewing that commit is to say "By the way, please use 'r12828', not 'revision 12828' when referring to past changes." This isn't just pedantry; it's important as much for automatic parsability as for human readership.
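A reviewer's reminder like the one above could be partly automated, for instance by a script run over incoming commit mail. This is a minimal sketch under that assumption; `suggest_short_revisions` is a hypothetical helper, not part of Subversion. Note that the pattern has to tolerate a line break between "revision" and the number, which is one of the reasons the long form is awkward to parse.

```python
import re

# "revision 12828" or "Revision 12828"; \s+ also matches a line break,
# since the long form is often split across lines in log messages.
LONG_FORM = re.compile(r"\b[Rr]evision\s+(\d+)\b")


def suggest_short_revisions(log_message):
    """Return one review comment per long-form revision reference."""
    comments = []
    for m in LONG_FORM.finditer(log_message):
        as_written = " ".join(m.group(0).split())  # collapse any line break to a space
        comments.append(f"Please use 'r{m.group(1)}', not '{as_written}'.")
    return comments


print(suggest_short_revisions("Fixed some typos from revision 12828."))
```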

By following the general principle that there should be canonical referral methods for common entities, and that these referral methods should be used consistently everywhere, the project in effect exports certain standards. Those standards enable people to write tools that present the project's communications in more usable ways; for example, a revision formatted as "r12828" could be transformed into a live link into the repository browsing system. This would be harder to do if the revision were written as "revision 12828", both because that form could be divided across a line break, and because it's less distinct (the word "revision" will often appear alone, and groups of numbers will often appear alone, whereas the combination "r12828" can only mean a revision number). Similar concerns apply to issue numbers, FAQ items (hint: use a URL with a named anchor, as described in « Named Anchors and ID Attributes »), etc.
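The link transformation described above could be sketched as follows. `VIEW_URL` is a made-up repository-browser address and `linkify_revisions` is a hypothetical helper; substitute whatever URL scheme your project's browser actually uses. Because "r12828" is a single, distinctive token, one short regular expression is enough to find it.

```python
import html
import re

# "r" followed by digits as a whole word; the \b on both sides keeps
# tokens such as "for12" or "r12abc" from matching.
SHORT_FORM = re.compile(r"\br(\d+)\b")

# Hypothetical repository browser URL; replace with your project's own.
VIEW_URL = "https://svn.example.org/viewvc?view=revision&revision={rev}"


def linkify_revisions(text):
    """Escape plain text for HTML and turn each rNNNN into a link."""
    escaped = html.escape(text)

    def to_link(m):
        url = html.escape(VIEW_URL.format(rev=m.group(1)))
        return f'<a href="{url}">{m.group(0)}</a>'

    return SHORT_FORM.sub(to_link, escaped)


print(linkify_revisions("Fixed some typos from r12828."))
```

Text written as "revision 12828" passes through unchanged, which is the practical cost of not following the convention.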

Even for entities where there is not an obvious short, canonical form, people should still be encouraged to provide key pieces of information consistently. For example, when referring to a mailing list message, don't just give the sender and subject; also give the archive URL and the Message-ID header. The last allows people who have their own copy of the mailing list (people sometimes keep offline copies, for example to use on a laptop while traveling) to unambiguously identify the right message even if they don't have access to the archives. The sender and subject wouldn't be enough, because the same person might make several posts in the same thread, even on the same day.
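As a sketch of such a reference, the helper below pulls the sender, subject, and Message-ID out of a raw message and pairs them with an archive URL that the caller supplies (the archive address cannot be derived from the message itself). The function name and output layout are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr


def cite_message(raw_message, archive_url):
    """Build a reference to a list message that still works without the archive.

    The Message-ID lets readers with an offline copy find the exact message,
    even if the same sender posted several times in one thread or one day.
    """
    msg = message_from_string(raw_message)
    name, address = parseaddr(msg.get("From", ""))
    message_id = msg.get("Message-ID")
    if not message_id:
        raise ValueError("message has no Message-ID header")
    sender = f"{name} <{address}>" if name else address
    return (
        f"From: {sender}\n"
        f"Subject: {msg.get('Subject', '(no subject)')}\n"
        f"Archive: {archive_url}\n"
        f"Message-ID: {message_id.strip()}"
    )
```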

The more a project grows, the more important this sort of consistency becomes. Consistency means that everywhere people look, they see the same patterns being followed, so they know to follow those patterns themselves. This, in turn, reduces the number of questions they need to ask. The burden of having a million readers is no greater than that of having one; scalability problems start to arise only when a certain percentage of those readers ask questions. As a project grows, therefore, it must reduce that percentage by increasing the density and accessibility of information, so that any given person is more likely to find what he needs without having to ask.

No Conversations in the Bug Tracker

In any project that's making active use of its bug tracker, there is always a danger of the tracker turning into a discussion forum itself, even though the mailing lists would really be better. Usually it starts off innocently enough: someone annotates an issue with, say, a proposed solution, or a partial patch. Someone else sees this, realizes there are problems with the solution, and attaches another annotation pointing out the problems. The first person responds, again by appending to the issue...and so it goes.

The problem with this is, first, that the bug tracker is a pretty cumbersome place to have a discussion, and second, that other people may not be paying attention—after all, they expect development discussion to happen on the development mailing list, so that's where they look for it. They may not be subscribed to the issue changes list at all, and even if they are, they may not follow it very closely.

But exactly where in the process did something go wrong? Was it when the original person attached her solution to the issue—should she have posted it to the list instead? Or was it when the second person responded in the issue, instead of on the list?

There isn't one right answer, but there is a general principle: if you're just adding data to an issue, then do it in the tracker, but if you're starting a conversation, then do it on the mailing list. You may not always be able to tell which is the case, but just use your best judgement. For example, when attaching a patch that contains a potentially controversial solution, you might be able to anticipate that people are going to have questions about it. So even though you would normally attach the patch to the issue (assuming you don't want to or can't commit the change directly), in this case you might choose to post it to a mailing list instead. At any rate, there eventually will come a point in the exchange where one party or the other can tell that it is about to go from mere appending of data to an actual conversation—in the example that started this section, that would be the second respondent, who on realizing that there were problems with the patch, could predict that a real conversation is about to ensue, and therefore that it should be held in the appropriate medium.

To use a mathematical analogy, if the information looks like it will be quickly convergent, then put it directly in the bug tracker; if it looks like it will be divergent, then a mailing list or IRC channel would be a better place.

This doesn't mean there should never be any exchanges in the bug tracker. Asking for more details of the reproduction recipe from the original reporter tends to be a highly convergent process, for instance. The person's response is unlikely to raise new issues; it's simply going to flesh out information already filed. There's no need to distract the mailing list with that process; by all means, take care of it with a series of comments in the tracker. Likewise, if you're fairly sure that the bug has been misreported (i.e., is not a bug), then you can simply say so right in the issue. Even pointing out a minor problem with a proposed solution is fine, assuming the problem is not a showstopper for the entire solution.

On the other hand, if you're raising philosophical issues about the bug's scope or the software's proper behavior, you can be pretty sure other developers will want to be involved. The discussion is likely to diverge for a while before it converges, so do it on the mailing list.

Always link to the mailing list thread from the issue, when you choose to post to the mailing list. It's still important for someone following the issue to be able to reach the discussion, even if the issue itself isn't the forum of discussion. The person who starts the thread may find this laborious, but open source is fundamentally a writer-responsible culture: it's much more important to make things easy for the tens or hundreds of people who may read the bug than for the three or five people writing about it.

It's fine to take important conclusions or summaries from the list discussion and paste them into the issue, if that will make things convenient for readers. A common idiom is to start a list discussion, put a link to the thread in the issue, and then when the discussion finishes, paste the final summary into the issue (along with a link to the message containing that summary), so someone browsing the issue can easily see what conclusion was reached without having to click to somewhere else. Note that the usual "two masters" data duplication problem does not exist here, because both archives and issue comments are usually static, unchangeable data anyway.

Publicity

In free software, there is a fairly smooth continuum between purely internal discussions and public relations statements. This is partly because the target audience is always ill-defined: given that most or all posts are publicly accessible, the project doesn't have full control over the impression the world gets. Someone—say, a slashdot.org editor—may draw millions of readers' attention to a post that no one ever expected to be seen outside the project. This is a fact of life that all open source projects live with, but in practice, the risk is usually small. In general, the announcements that the project most wants publicized are the ones that will be most publicized, assuming you use the right mechanisms to indicate relative newsworthiness to the outside world.

For major announcements, there tend to be four or five main channels of distribution, on which announcements should be made as nearly simultaneously as possible:

  1. Your project's front page is probably seen by more people than any other part of the project. If you have a really major announcement, put a blurb there. The blurb should be a very brief synopsis that links to the press release (see below) for more information.

  2. At the same time, you should also have a "News" or "Press Releases" area of the web site, where the announcement can be written up in detail. Part of the purpose of a press release is to provide a single, canonical "announcement object" that other sites can link to, so make sure it is structured accordingly: either as one web page per release, as a discrete blog entry, or as some other kind of entity that can be linked to while still being kept distinct from other press releases in the same area.

  3. If your project has an RSS feed, make sure the announcement goes out there too. This may happen automatically when you create the press release, depending on how things are set up at your site. (RSS is a mechanism for distributing meta-data-rich news summaries to "subscribers", that is, people who have indicated an interest in receiving those summaries. See http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html for more information about RSS.)

  4. If the announcement is about a new release of the software, then update your project's entry on http://freshmeat.net/ (see the section « Annoncer le projet » in Chapter 2 about creating the entry in the first place). Every time you update a Freshmeat entry, that entry goes onto the Freshmeat change list for the day. The change list is updated not only on Freshmeat itself, but on various portal sites (including slashdot.org) which are watched eagerly by hordes of people. Freshmeat also offers the same data via RSS feed, so people who are not subscribed to your project's own RSS feed might still see the announcement via Freshmeat's.

  5. Send a mail to your project's announcement mailing list. This list's name should actually be "announce", that is, announce@yourprojectdomain.org, because that's a fairly standard convention now, and the list's charter should make it clear that it is very low-traffic, reserved for major project announcements. Most of those announcements will be about new releases of the software, but occasionally other events, such as a fundraising drive, the discovery of a security vulnerability (see the section called « Announcing Security Vulnerabilities » later in this chapter), or a major shift in project direction, may be posted there as well. Because it is low-traffic and used only for important things, the announce list typically has the highest subscribership of any mailing list in the project (of course, this means you shouldn't abuse it; consider carefully before posting). To keep random people from making announcements, or worse, spam from getting through, the announce list must always be moderated.
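For the RSS step (item 3 above), each announcement becomes one `<item>` in the feed. The following Python sketch builds such an item with the standard library's `xml.etree.ElementTree`, using the press release URL as both the link and the `guid`, so the feed entry points at the same canonical "announcement object" described in item 2. The Scanley URL and the function name are hypothetical, and a complete feed would also need the enclosing `<rss>` and `<channel>` elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_item(title: str, release_url: str, summary: str, published: datetime) -> ET.Element:
    """Build one RSS 2.0 <item> that points at a press release page."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = release_url
    ET.SubElement(item, "description").text = summary
    # RSS 2.0 expects RFC 822-style dates, e.g. "Thu, 19 May 2005 12:00:00 +0000".
    ET.SubElement(item, "pubDate").text = format_datetime(published)
    # The press release URL is stable and unique, so it doubles as the guid.
    ET.SubElement(item, "guid", isPermaLink="true").text = release_url
    return item


if __name__ == "__main__":
    item = rss_item(
        "Scanley 2.0 released",
        "http://www.scanley.org/news/2.0.html",  # hypothetical URL
        "Version 2.0 adds regular-expression searches.",
        datetime(2005, 8, 15, 12, 0, tzinfo=timezone.utc),
    )
    print(ET.tostring(item, encoding="unicode"))
```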

Try to make the announcements in all these places at the same time, as nearly as possible. People might get confused if they see an announcement on the mailing list but then don't see it reflected on the project's home page or in its press releases area. If you get the various changes (emails, web page edits, etc.) queued up and then send them all in a row, you can keep the window of inconsistency very small.

For a less important event, you can eliminate some or all of the above outlets. The event will still be noticed by the outside world in direct proportion to its importance. For example, while a new release of the software is a major event, merely setting the date of the next release, while still somewhat newsworthy, is not nearly as important as the release itself. Setting a date is worth an email to the daily mailing lists (not the announce list), and an update of the project's timeline or status web page, but no more.

However, you might still see that date appearing in discussions elsewhere on the Internet, wherever there are people interested in the project. People who are lurkers on your mailing lists, just listening and never saying anything, are not necessarily silent elsewhere. Word of mouth gives very broad distribution; you should count on it, and construct even minor announcements in such a way as to encourage accurate informal transmission. Specifically, posts that you expect to be quoted should have a clearly meant-to-be-quoted portion, just as though you were writing a formal press release. For example:

Just a progress update: we're planning to release version 2.0 of Scanley in mid-August 2005. You can always check http://www.scanley.org/status.html for updates. The major new feature will be regular-expression searches.

Other new features include: ... There will also be various bugfixes, including: ...

The first paragraph is short, gives the two most important pieces of information (the release date and the major new feature), and a URL to visit for further news. If that paragraph is the only thing that crosses someone's screen, you're still doing pretty well. The rest of the mail could be lost without affecting the gist of the content. Of course, sometimes people will link to the entire mail anyway, but just as often, they'll quote only a small part. Given that the latter is a possibility, you might as well make it easy for them, and in the bargain get some influence over what gets quoted.

Announcing Security Vulnerabilities

Handling a security vulnerability is different from handling any other kind of bug report. In free software, doing things openly and transparently is normally almost a religious credo. Every step of the standard bug-handling process is visible to all who care to watch: the arrival of the initial report, the ensuing discussion, and the eventual fix.

Security bugs are different. They can compromise users' data, and possibly users' entire computers. To discuss such a problem openly would be to advertise its existence to the entire world, including all the parties who might make malicious use of the bug. Even merely committing a fix effectively announces the bug's existence (there are potential attackers who watch the commit logs of public projects, systematically looking for changes that indicate security problems in the pre-change code). Most open source projects have settled on approximately the same set of steps to handle this conflict between openness and secrecy, based on these basic guidelines:

  1. Don't talk about the bug publicly until a fix is available; then supply the fix at exactly the same moment you announce the bug.

  2. Come up with that fix as fast as you can—especially if someone outside the project reported the bug, because then you know there's at least one person outside the project who is able to exploit the vulnerability.

In practice, those principles lead to a fairly standardized series of steps, which are described in the sections below.

Receive the report

Obviously, a project needs the ability to receive security bug reports from anyone. But the regular bug reporting address won't do, because it can be watched by anyone too. Therefore, have a separate mailing list for receiving security bug reports. That mailing list must not have publicly readable archives, and its subscribership must be strictly controlled—only long-time, trusted developers can be on the list. If you need a formal definition of "trusted", you can use "anyone who has had commit access for two years or more" or something like that, to avoid favoritism. This is the group that will handle security bugs.
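If a project wants the "two years of commit access" rule applied mechanically, it reduces to a date comparison. A minimal sketch, assuming a hypothetical record mapping each committer to the date their commit access was granted (most projects would have to assemble this from their own records):

```python
from datetime import date
from typing import Dict, List


def years_later(start: date, years: int) -> date:
    """The same calendar day `years` later; Feb 29 falls back to Feb 28."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def trusted_committers(access_granted: Dict[str, date], today: date, years: int = 2) -> List[str]:
    """Names of committers whose commit access is at least `years` years old."""
    return sorted(
        name for name, granted in access_granted.items()
        if years_later(granted, years) <= today
    )


print(trusted_committers({"alice": date(2001, 3, 1), "bob": date(2003, 6, 1)},
                         today=date(2004, 1, 1)))
```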

Ideally, the security list should not be spam-protected or moderated, since you don't want an important report to get filtered out or delayed just because no moderators happened to be online that weekend. If you do use automated spam-protection software, try to configure it with high-tolerance settings; it's better to let a few spams through than to miss a report. For the list to be effective, you must advertise its address, of course; but given that it will be unmoderated and, at most, lightly spam-protected, try never to post its address without some sort of address-hiding transformation, as described in the section « Masquer les adresses dans les archives » in Chapter 3, « L'infrastructure technique ». Fortunately, address-hiding need not make the address illegible; see http://subversion.tigris.org/security.html, and view that page's HTML source, for an example.
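One common address-hiding technique, shown here only as an illustration and not necessarily the exact method that page uses, is to write every character of the address as an HTML numeric character reference. Browsers render it as ordinary text, but the raw HTML no longer contains a literal `user@domain` string for simple harvesters to match. A minimal Python sketch (the address `security@scanley.org` is hypothetical); note that this defeats only naive scrapers.

```python
import html


def obfuscate(text: str) -> str:
    """Encode every character as a decimal HTML character reference (&#NNN;)."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def hidden_mailto(address: str) -> str:
    """An <a> element whose href and visible text are both entity-encoded."""
    encoded = obfuscate(address)
    return f'<a href="{obfuscate("mailto:")}{encoded}">{encoded}</a>'


if __name__ == "__main__":
    link = hidden_mailto("security@scanley.org")  # hypothetical address
    assert "@" not in link  # no literal address appears in the HTML source
    # A browser decodes the references back to the readable address.
    assert html.unescape(obfuscate("security@scanley.org")) == "security@scanley.org"
    print(link)
```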

Develop the fix quietly

So what does the security list do when it receives a report? The first task is to evaluate the problem's severity and urgency:

  1. How serious is the vulnerability? Does it allow a malicious attacker to take over the computer of someone who uses your software? Or does it, say, merely leak information about the sizes of some of their files?

  2. How easy is it to exploit the vulnerability? Can an attack be scripted, or does it require circumstantial knowledge, educated guessing, and luck?

  3. Who reported the problem to you? The answer to this question doesn't change the nature of the vulnerability, of course, but it does give you an idea of how many other people might know about it. If the report comes from one of the project's own developers, you can breathe a little easier (but only a little), because you can trust them not to have told anyone else about it. On the other hand, if it came in an email from anonymous14@globalhackerz.net, then you'd better act as fast as you can. The person did you a favor by informing you of the problem at all, but you have no idea how many other people she's told, or how long she'll wait before exploiting the vulnerability on live installations.

Note that the difference we're talking about here is often just a narrow range between urgent and extremely urgent. Even when the report comes from a known, friendly source, there could be other people on the Net who discovered the bug long ago and just haven't reported it. The only time things aren't urgent is when the bug inherently does not compromise security very severely.

The "anonymous14@globalhackerz.net" example is not facetious, by the way. You really may get bug reports from identity-cloaked people who, by their words and behavior, never quite clarify whether they're on your side or not. It doesn't matter: if they've reported the security hole to you, they'll feel they've done you a good turn, and you should respond in kind. Thank them for the report, give them a date on or before which you plan to release a public fix, and keep them in the loop. Sometimes they may give you a date—that is, an implicit threat to publicize the bug on a certain date, whether you're ready or not. This may feel like a bullying power play, but it's more likely a preëmptive action resulting from past disappointment with unresponsive software producers who didn't take security reports seriously enough. Either way, you can't afford to tick this person off. After all, if the bug is severe, he has knowledge that could cause your users big problems. Treat such reporters well, and hope that they treat you well.

Another frequent reporter of security bugs is the security professional, someone who audits code for a living and keeps up on the latest news of software vulnerabilities. These people usually have experience on both sides of the fence—they've both received and sent reports, probably more than most developers in your project have. They too will usually give a deadline for fixing a vulnerability before going public. The deadline may be somewhat negotiable, but that's up to the reporter; deadlines have become recognized among security professionals as pretty much the only reliable way to get organizations to address security problems promptly. So don't treat the deadline as rude; it's a time-honored tradition, and there are good reasons for it.

Once you know the severity and urgency, you can start working on a fix. There is sometimes a tradeoff between doing a fix elegantly and doing it speedily; this is why you must agree on the urgency before you start. Keep discussion of the fix restricted to the security list members, of course, plus the original reporter (if she wants to be involved) and any developers who need to be brought in for technical reasons.

Do not commit the fix to the repository. Keep it in patch form until the go-public date. If you were to commit it, even with an innocent-looking log message, someone might notice and understand the change. You never know who is watching your repository and why they might be interested. Turning off commit emails wouldn't help; first of all, the gap in the commit mail sequence would itself look suspicious, and anyway, the data would still be in the repository. Just do all development in a patch and keep the patch in some private place, perhaps a separate, private repository known only to the people already aware of the bug. (If you use a decentralized version control system like Arch or SVK, you can do the work under full version control, and just keep that repository inaccessible to outsiders.)
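In practice the patch would normally come from the version control tool's own diff command (for Subversion, `svn diff` in a working copy). Purely to illustrate the "patch form" idea, this Python sketch uses the standard library's `difflib` to produce a unified diff and saves it in a private location instead of committing it; the file and directory names are hypothetical.

```python
import difflib
import os
from pathlib import Path


def make_patch(old_text: str, new_text: str, filename: str) -> str:
    """Return a unified diff that turns old_text into new_text."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    ))


def store_privately(patch: str, directory: Path, name: str) -> Path:
    """Save the patch outside any public repository instead of committing it."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / name
    # A newly created file gets owner-only read/write (honored on POSIX systems).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(patch)
    return path
```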

CAN/CVE numbers

You may have seen a CAN number or a CVE number associated with security problems. These numbers usually look like "CAN-2004-0397" or "CVE-2002-0092", for example.

Both kinds of numbers represent the same type of entity: an entry in the "Common Vulnerabilities and Exposures" list maintained at http://cve.mitre.org/. The purpose of the list is to provide standardized names for all known security problems, so that everyone has a unique, canonical name to use when discussing one, and a central place to go to find out more information. The only difference between a "CAN" number and a "CVE" number is that the former represents a candidate entry, not yet approved for inclusion in the official list by the CVE Editorial Board, and the latter represents an approved entry. However, both types of entries are visible to the public, and an entry's number does not change when it is approved; the "CAN" prefix is simply replaced with "CVE".
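Because approval only swaps the prefix, a tool can treat "CAN-2004-0397" and "CVE-2004-0397" as the same entry. A minimal Python sketch, assuming identifiers follow the prefix-year-sequence shape of the examples above (the function names are hypothetical):

```python
import re
from typing import Tuple

# A CAN or CVE prefix, a four-digit year, and a sequence number,
# as in "CAN-2004-0397" or "CVE-2002-0092".
ID_RE = re.compile(r"^(CAN|CVE)-(\d{4})-(\d{4,})$")


def parse_vuln_id(vuln_id: str) -> Tuple[str, int, str]:
    """Split an identifier into (prefix, year, sequence); the sequence keeps leading zeros."""
    m = ID_RE.match(vuln_id.strip().upper())
    if m is None:
        raise ValueError(f"not a CAN/CVE identifier: {vuln_id!r}")
    prefix, year, seq = m.groups()
    return prefix, int(year), seq


def canonical_cve(vuln_id: str) -> str:
    """Name the entry with the CVE prefix, whatever its approval status."""
    _, year, seq = parse_vuln_id(vuln_id)
    return f"CVE-{year}-{seq}"


def same_entry(a: str, b: str) -> bool:
    return canonical_cve(a) == canonical_cve(b)
```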

A CAN/CVE entry does not itself contain a full description of the bug and how to protect against it. Instead, it contains a brief summary, and a list of references to external resources (such as mailing list archives) where people can go to get more detailed information. The real purpose of http://cve.mitre.org/ is to provide a well-organized space in which every vulnerability can have a name and a clear route to more data. See http://cve.mitre.org/cgi-bin/cvename.cgi?name=2002-0092 for an example of an entry. Note that the references can be very terse, with sources appearing as cryptic abbreviations. A key to those abbreviations is at http://cve.mitre.org/cve/refs/refkey.html.

If your vulnerability meets the CVE criteria, you may wish to acquire a CAN number for it. The process for doing so is deliberately gated: basically, you have to know someone, or know someone who knows someone. This is not as crazy as it might sound. In order for the CVE Editorial Board to avoid being overwhelmed with spurious or poorly written submissions, they take submissions only from sources they already know and trust. In order to get your vulnerability listed, therefore, you need to find a path of acquaintance from your project to the CVE Editorial Board. Ask around among your developers; one of them will probably know someone else who has either done the CAN process before, or knows someone who has, etc. Another advantage of doing it this way is that somewhere along the chain, someone may know enough to tell you that a) it wouldn't count as a vulnerability or exposure according to MITRE's criteria, so there is no point submitting it, or b) the vulnerability already has a CAN or CVE number. The latter can happen if the bug has already been published on another security advisory list, for example at http://www.cert.org/ or on the BugTraq mailing list at http://www.securityfocus.com/. (If that happened without your project hearing about it, then you should worry what else might be going on that you don't know about.)

If you get a CAN/CVE number at all, you usually want to get it in the early stages of your bug investigation, so that all further communications can refer to that number. CAN entries are embargoed until the go-public date; the entry will exist as an empty placeholder (so you don't lose the name), but it won't reveal any information about the vulnerability until the date on which you will be announcing the bug and the fix.

More information about the CAN/CVE process may be found at http://cve.mitre.org/about/candidates.html, and a particularly clear exposition of one open source project's use of CAN/CVE numbers is at http://www.debian.org/security/cve-compatibility.

Pre-notification

Once your security response team (that is, those developers who are on the security mailing list, or who have been brought in to deal with a particular report) has a fix ready, you need to decide how to distribute it.

If you simply commit the fix to your repository, or otherwise announce it to the world, you effectively force everyone using your software to upgrade immediately or risk being hacked. It is sometimes appropriate, therefore, to do pre-notification for certain important users. This is particularly true with client/server software, where there may be well-known servers that are tempting targets for attackers. Those servers' administrators would appreciate having an extra day or two to do the upgrade, so that they are already protected by the time the exploit becomes public knowledge.

Pre-notification simply means sending mails to those administrators before the go-public date, telling them of the vulnerability and how to fix it. You should send pre-notification only to people you trust to be discreet with the information. That is, the qualification for receiving pre-notification is twofold: the recipient must run a large, important server where a compromise would be a serious matter, and the recipient must be known to be someone who won't blab about the security problem before the go-public date.

Send each pre-notification mail individually (one at a time) to each recipient. Do not send to the entire list of recipients at once, because then they would see each other's names, meaning that you would essentially be alerting each recipient to the fact that each other recipient may have a security hole in her server. Sending it to them all via blind CC (BCC) isn't a good solution either, because some admins protect their inboxes with spam filters that either block or reduce the priority of BCC'd mail, since so much spam is sent via BCC these days.
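A minimal sketch of the one-message-per-recipient approach, using Python's standard `email` and `smtplib` modules. The SMTP host, addresses, and function names are hypothetical; the point is that each message carries exactly one `To:` address and no `Cc:` or `Bcc:` header.

```python
import smtplib
from email.message import EmailMessage
from typing import List


def build_prenotifications(sender: str, recipients: List[str],
                           subject: str, body: str) -> List[EmailMessage]:
    """One separate message per recipient: a single To: address, no Cc:/Bcc:."""
    messages = []
    for rcpt in recipients:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = rcpt
        msg["Reply-To"] = sender  # a person, not the security list's address
        msg["Subject"] = subject
        msg.set_content(body)
        messages.append(msg)
    return messages


def send_individually(messages: List[EmailMessage], smtp_host: str) -> None:
    """Send each message on its own; send_message takes recipients from the headers."""
    with smtplib.SMTP(smtp_host) as server:
        for msg in messages:
            server.send_message(msg)
```

For example, `send_individually(build_prenotifications("you@scanley.org", admins, subject, text), "smtp.example.org")` would deliver one private copy per administrator.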

Here's a sample pre-notification mail:

From: Your Name Here
To: admin@large-famous-server.com
Reply-to: Your Name Here (not the security list's address)
Subject: Confidential Scanley vulnerability notification.


This email is a confidential pre-notification of a security alert
in the Scanley server.

Please *do not forward* any part of this mail to anyone.  The public
announcement is not until May 19th, and we'd like to keep the
information embargoed until then.

You are receiving this mail because (we think) you run a Scanley
server, and would want to have it patched before this security hole is
made public on May 19th.

References:
===========

   CAN-2004-1771: Scanley stack overflow in queries

Vulnerability:
==============

   The server can be made to run arbitrary commands if the server's
   locale is misconfigured and the client sends a malformed query.

Severity:
=========

   Very severe, can involve arbitrary code execution on the server.

Workarounds:
============

   Setting the 'natural-language-processing' option to 'off' in
   scanley.conf closes this vulnerability.

Patch:
======

   The patch below applies to Scanley 3.0, 3.1, and 3.2.

   A new public release (Scanley 3.2.1) will be made on or just before
   May 19th, so that it is available at the same time as this
   vulnerability is made public.  You can patch now, or just wait for
   the public release.  The only difference between 3.2 and 3.2.1 will
   be this patch.

[...patch goes here...]

If you have a CAN number, include it in the pre-notification (as shown above), even though the information is still embargoed and therefore the MITRE page will show nothing. Including the CAN number allows the recipient to know with certainty that the bug they were pre-notified about is the same one they later hear about through public channels, so they don't have to worry whether further action is necessary or not, which is precisely the point of CAN/CVE numbers.

Distribute the fix publicly

The last step in handling a security bug is to distribute the fix publicly. In a single, comprehensive announcement, you should describe the problem, give the CAN/CVE number if any, describe how to work around it, and how to permanently fix it. Usually "fix" means upgrading to a new version of the software, though sometimes it can mean applying a patch, particularly if the software is normally run in source form anyway. If you do make a new release, it should differ from some existing release by exactly the security patch. That way, conservative admins can upgrade without worrying about what else they might be affecting; they also don't have to worry about future upgrades, because the security fix will be in all future releases as a matter of course. (Details of release procedures are discussed in the section called « Security Releases » in Chapter 7, Packaging, Releasing, and Daily Development.)

Whether or not the public fix involves a new release, do the announcement with roughly the same priority as you would a new release: send a mail to the project's announce list, make a new press release, update the Freshmeat entry, etc. While you should never try to play down the existence of a security bug out of concern for the project's reputation, you may certainly set the tone and prominence of a security announcement to match the actual severity of the problem. If the security hole is just a minor information exposure, not an exploit that allows the user's entire computer to be taken over, then it may not warrant a lot of fuss. You may even decide not to distract the announce list with it. After all, if the project cries wolf every time, users might end up thinking the software is less secure than it actually is, and also might not believe you when you have a really big problem to announce. See http://cve.mitre.org/about/terminology.html for a good introduction to the problem of judging severity.

In general, if you're unsure how to treat a security problem, find someone with experience and talk to them about it. Assessing and handling vulnerabilities is very much an acquired skill, and it's easy to make missteps the first few times.



[19] There has been some interesting academic research on this topic; for example, see Group Awareness in Distributed Software Development by Gutwin, Penner, and Schneider (this used to be available online, but seems to have disappeared, at least temporarily; use a search engine to find it).

Chapter 7. Packaging, Releasing, and Daily Development

This chapter is about how free software projects package and release their software, and how overall development patterns organize around those goals.

A major difference between open source projects and proprietary ones is the lack of centralized control over the development team. When a new release is being prepared, this difference is especially stark: a corporation can ask its entire development team to focus on an upcoming release, putting aside new feature development and non-critical bug fixing until the release is done. Volunteer groups are not so monolithic. People work on the project for all sorts of reasons, and those not interested in helping with a given release still want to continue regular development work while the release is going on. Because development doesn't stop, open source release processes tend to take longer, but be less disruptive, than commercial release processes. It's a bit like highway repair. There are two ways to fix a road: you can shut it down completely, so that a repair crew can swarm all over it at full capacity until the problem is solved, or you can work on a couple of lanes at a time, while leaving the others open to traffic. The first way is very efficient for the repair crew, but not for anyone else: the road is entirely shut down until the job is done. The second way involves much more time and trouble for the repair crew (now they have to work with fewer people and less equipment, in cramped conditions, with flaggers to slow and direct traffic, etc.), but at least the road remains usable, albeit not at full capacity.

Open source projects tend to work the second way. In fact, for a mature piece of software with several different release lines being maintained simultaneously, the project is sort of in a permanent state of minor road repair. There are always a couple of lanes closed; a consistent but low level of background inconvenience is always being tolerated by the development group as a whole, so that releases get made on a regular schedule.

The model that makes this possible generalizes to more than just releases. It's the principle of parallelizing tasks that are not mutually interdependent—a principle that is by no means unique to open source development, of course, but one which open source projects implement in their own particular way. They cannot afford to annoy either the roadwork crew or the regular traffic too much, but they also cannot afford to have people dedicated to standing by the orange cones and flagging traffic along. Thus they gravitate toward processes that have flat, constant levels of administrative overhead, rather than peaks and valleys. Volunteers are generally willing to work with small but consistent amounts of inconvenience; the predictability allows them to come and go without worrying about whether their schedule will clash with what's happening in the project. But if the project were subject to a master schedule in which some activities excluded other activities, the result would be a lot of developers sitting idle a lot of the time—which would be not only inefficient but boring, and therefore dangerous, in that a bored developer is likely to soon be an ex-developer.

Release work is usually the most noticeable non-development task that happens in parallel with development, so the methods described in the following sections are geared mostly toward enabling releases. However, note that they also apply to other parallelizable tasks, such as translations and internationalization, broad API changes made gradually across the entire code base, etc.

Release Numbering

Before we talk about how to make a release, let's look at how to name releases, which requires knowing what releases actually mean to users. A release means that:

  • Old bugs have been fixed. This is probably the one thing users can count on being true of every release.

  • New bugs have been added. This too can usually be counted on, except sometimes in the case of security releases or other one-offs (see the section titled "Security Releases" later in this chapter).

  • New features may have been added.

  • New configuration options may have been added, or the meanings of old options may have changed subtly. The installation procedures may have changed slightly since the last release too, though one always hopes not.

  • Incompatible changes may have been introduced, such that the data formats used by older versions of the software are no longer usable without undergoing some sort of (possibly manual) one-way conversion step.

As you can see, not all of these are good things. This is why experienced users approach new releases with some trepidation, especially when the software is mature and was already mostly doing what they wanted (or thought they wanted). Even the arrival of new features is a mixed blessing, in that it may mean the software will now behave in unexpected ways.

The purpose of release numbering, therefore, is twofold: obviously the numbers should unambiguously communicate the ordering of releases (i.e., by looking at any two releases' numbers, one can know which came later), but also they should indicate as compactly as possible the degree and nature of the changes in the release.

All that in a number? Well, more or less, yes. Release numbering strategies are one of the oldest bikeshed discussions around (see the section titled "The Softer the Topic, the Longer the Debate" in Chapter 6, Communications), and the world is unlikely to settle on a single, complete standard anytime soon. However, a few good strategies have emerged, along with one universally agreed-on principle: be consistent. Pick a numbering scheme, document it, and stick with it. Your users will thank you.

Release Number Components

This section describes the formal conventions of release numbering in detail, and assumes very little prior knowledge. It is intended mainly as a reference. If you're already familiar with these conventions, you can skip this section.

Release numbers are groups of digits separated by dots:

Scanley 2.3
Singer 5.11.4

...and so on. The dots are not decimal points but merely separators; "5.3.9" would be followed by "5.3.10". A few projects have occasionally hinted otherwise, most famously the Linux kernel with its "0.95", "0.96"... "0.99" sequence leading up to Linux 1.0, but the convention that the dots are not decimals is now firmly established and should be considered a standard. There is no limit to the number of components (digit portions containing no dots), but most projects do not go beyond three or four. The reasons why will become clear later.
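As a minimal sketch (Python is used here only for illustration), the "dots are separators" rule amounts to comparing release numbers component by component as integers. The function name `release_key` and the sample list are hypothetical; the point is that ordinary text sorting puts "5.3.10" before "5.3.2", while a numeric, per-component comparison does not.

```python
def release_key(number: str) -> tuple:
    """Turn a dotted release number like "5.3.10" into a tuple of integers.

    The dots are treated as separators, not decimal points, so each
    component is compared numerically and independently of the others.
    """
    return tuple(int(part) for part in number.split("."))


releases = ["5.3.10", "5.3.9", "5.11.4", "5.3.2"]

# Plain string sorting compares character by character, which is wrong here.
print(sorted(releases))                   # ['5.11.4', '5.3.10', '5.3.2', '5.3.9']

# Component-wise numeric sorting gives the real release order.
print(sorted(releases, key=release_key))  # ['5.3.2', '5.3.9', '5.3.10', '5.11.4']
```

Python compares tuples element by element, so the same key works for any number of components.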

In addition to the numeric components, projects sometimes tack on a descriptive label such as "Alpha" or "Beta" (see Alpha and Beta), for example:

Scanley 2.3.0 (Alpha)
Singer 5.11.4 (Beta)

An Alpha or Beta qualifier means that this release precedes a future release that will have the same number without the qualifier. Thus, "2.3.0 (Alpha)" leads eventually to "2.3.0". In order to allow several such candidate releases in a row, the qualifiers themselves can have meta-qualifiers. For example, here is a series of releases in the order that they would be made available to the public:

Scanley 2.3.0 (Alpha 1)
Scanley 2.3.0 (Alpha 2)
Scanley 2.3.0 (Beta 1)
Scanley 2.3.0 (Beta 2)
Scanley 2.3.0 (Beta 3)
Scanley 2.3.0

Notice that when it has the "Alpha" qualifier, Scanley "2.3" is written as "2.3.0". The two numbers are equivalent—trailing all-zero components can always be dropped for brevity—but when a qualifier is present, brevity is out the window anyway, so one might as well go for completeness instead.

Other qualifiers in semi-regular use include "Stable", "Unstable", "Development", and "RC" (for "Release Candidate"). The most widely used ones are still "Alpha" and "Beta", with "RC" running a close third, but note that "RC" always includes a numeric meta-qualifier. That is, you don't release "Scanley 2.3.0 (RC)", you release "Scanley 2.3.0 (RC 1)", followed by "RC 2", etc.

Those three labels, "Alpha", "Beta", and "RC", are pretty widely known now, and I don't recommend using any of the others, even though the others might at first glance seem like better choices because they are normal words, not jargon. But people who install software from releases are already familiar with the big three, and there's no reason to do things gratuitously differently from the way everyone else does them.
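The ordering rules for these qualifiers can be captured in a sort key. This is a sketch under stated assumptions: it assumes labels are written like the Scanley examples above without the product name (e.g. "2.3.0 (Beta 2)"), ranks Alpha before Beta before RC before the final release, drops trailing zero components so that "2.3" and "2.3.0" compare equal, and rejects an RC without a number. The names `release_sort_key`, `QUALIFIER_RANK` and `LABEL_RE` are hypothetical.

```python
import re

# Assumed precedence within one release number: Alpha < Beta < RC < final release.
QUALIFIER_RANK = {"alpha": 0, "beta": 1, "rc": 2}
FINAL_RANK = 3

# Matches labels such as "2.3.0", "2.3.0 (Alpha 1)" or "2.3.0 (RC 2)".
LABEL_RE = re.compile(
    r"^(\d+(?:\.\d+)*)(?:\s*\((alpha|beta|rc)(?:\s*(\d+))?\))?$", re.IGNORECASE
)


def release_sort_key(label: str) -> tuple:
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    number, qualifier, meta = match.groups()
    components = [int(part) for part in number.split(".")]
    # Trailing all-zero components can be dropped: "2.3" and "2.3.0" are the same release.
    while len(components) > 1 and components[-1] == 0:
        components.pop()
    if qualifier is None:
        return (tuple(components), FINAL_RANK, 0)
    qualifier = qualifier.lower()
    if qualifier == "rc" and meta is None:
        raise ValueError("an RC always carries a number, e.g. 'RC 1'")
    return (tuple(components), QUALIFIER_RANK[qualifier], int(meta or 0))


candidates = ["2.3.0", "2.3.0 (Beta 2)", "2.3.0 (Alpha 1)", "2.3.0 (RC 1)",
              "2.3.0 (Beta 1)", "2.3.0 (Alpha 2)", "2.2.7"]
for label in sorted(candidates, key=release_sort_key):
    print(label)
```

Putting the numeric tuple first means every qualified release of 2.3.0 still sorts after every 2.2.x release, and the qualifier only breaks ties between labels that share the same number.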

Although the dots in release numbers are not decimal points, they do indicate place-value significance. All "0.X.Y" releases precede "1.0" (which is equivalent to "1.0.0", of course). "3.14.158" immediately precedes "3.14.159", and non-immediately precedes "3.14.160" as well as "3.15.anything", and so on.

A consistent release numbering policy enables a user to look at two release numbers for the same piece of software and tell, just from the numbers, the important differences between those two releases. In a typical three-component system, the first component is the major number, the second is the minor number, and the third is the micro number. For example, release "2.10.17" is the seventeenth micro release in the tenth minor release line within the second major release series. The words "line" and "series" are used informally here, but they mean what one would expect. A major series is simply all the releases that share the same major number, and a minor series (or minor line) consists of all the releases that share the same minor and major number. That is, "2.4.0" and "3.4.1" are not in the same minor series, even though they both have "4" for their minor number; on the other hand, "2.4.0" and "2.4.2" are in the same minor line, though they are not adjacent if "2.4.1" was released between them.
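The definitions of major series and minor line translate directly into code. In this minimal sketch (the class `Release` and both helper functions are hypothetical names), a minor line is identified by the pair (major, minor), which is why "2.4.0" and "3.4.1" fall into different lines despite sharing the minor digit "4".

```python
from typing import NamedTuple


class Release(NamedTuple):
    major: int
    minor: int
    micro: int

    @classmethod
    def parse(cls, number: str) -> "Release":
        parts = [int(part) for part in number.split(".")]
        parts += [0] * (3 - len(parts))  # "1.0" is shorthand for "1.0.0"
        if len(parts) != 3:
            raise ValueError(f"expected at most three components: {number!r}")
        return cls(*parts)


def same_major_series(a: Release, b: Release) -> bool:
    return a.major == b.major


def same_minor_line(a: Release, b: Release) -> bool:
    # The major AND minor numbers must both match; a shared minor digit is not enough.
    return (a.major, a.minor) == (b.major, b.minor)


r = Release.parse("2.10.17")
print(r.major, r.minor, r.micro)  # 2 10 17
```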

The meanings of these numbers are exactly what you'd expect: an increment of the major number indicates that major changes happened; an increment of the minor number indicates minor changes; and an increment of the micro number indicates really trivial changes. Some projects add a fourth component, usually called the patch number, for especially fine-grained control over the differences between their releases (confusingly, other projects use "patch" as a synonym for "micro" in a three-component system). There are also projects that use the last component as a build number, incremented every time the software is built and representing no change other than that build. This helps the project link every bug report with a specific build, and is probably most useful when binary packages are the default method of distribution.

Although there are many different conventions for how many components to use, and what the components mean, the differences tend to be minor—you get a little leeway, but not a lot. The next two sections discuss some of the most widely used conventions.

The Simple Strategy

Most projects have rules about what kinds of changes are allowed into a release if one is only incrementing the micro number, different rules for the minor number, and still different ones for the major number. There is no set standard for these rules yet, but here I will describe a policy that has been used successfully by multiple projects. You may want to just adopt this policy in your own project, but even if you don't, it's still a good example of the kind of information release numbers should convey. This policy is adapted from the numbering system used by the APR project (see http://apr.apache.org/versioning.html).

  1. Changes to the micro number only (that is, changes within the same minor line) must be both forward- and backward-compatible. That is, the changes should be bug fixes only, or very small enhancements to existing features. New features should not be introduced in a micro release.

  2. Changes to the minor number (that is, within the same major line) must be backward-compatible, but not necessarily forward-compatible. It's normal to introduce new features in a minor release, but usually not too many new features at once.

  3. Changes to the major number mark compatibility boundaries. A new major release can be forward- and backward-incompatible. A major release is expected to have new features, and may even have entire new feature sets.
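The three rules can be read as a lookup from "which component changed" to "what is promised". The sketch below is one possible encoding, not part of the APR policy text: the names `Promise`, `parse` and `promise_for` are hypothetical, it assumes three-component numbers, and it returns `None` for pre-1.0 releases, which are conventionally exempt from these rules.

```python
from typing import NamedTuple, Optional


class Promise(NamedTuple):
    kind: str                          # "micro", "minor" or "major"
    must_be_backward_compatible: bool
    must_be_forward_compatible: bool
    may_add_features: bool


def parse(number: str) -> tuple:
    parts = [int(part) for part in number.split(".")]
    parts += [0] * (3 - len(parts))    # treat "2.6" as "2.6.0"
    return tuple(parts)


def promise_for(old: str, new: str) -> Optional[Promise]:
    """What the simple strategy promises when moving from release `old` to `new`."""
    o, n = parse(old), parse(new)
    if n <= o:
        raise ValueError(f"{new} does not come after {old}")
    if o[0] == 0:
        return None                    # pre-1.0: no compatibility promises
    if n[0] != o[0]:
        return Promise("major", False, False, True)
    if n[1] != o[1]:
        return Promise("minor", True, False, True)
    return Promise("micro", True, True, False)


print(promise_for("2.5.3", "2.5.4"))
print(promise_for("2.5.4", "2.6.0"))
```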

What backward-compatible and forward-compatible mean, exactly, depends on what your software does, but in context they are usually not open to much interpretation. For example, if your project is a client/server application, then "backward-compatible" means that upgrading the server to 2.6.0 should not cause any existing 2.5.4 clients to lose functionality or behave differently than they did before (except for bugs that were fixed, of course). On the other hand, upgrading one of those clients to 2.6.0, along with the server, might make new functionality available for that client, functionality that 2.5.4 clients don't know how to take advantage of. If that happens, then the upgrade is not "forward-compatible": clearly you can't now downgrade that client back to 2.5.4 and keep all the functionality it had at 2.6.0, since some of that functionality was new in 2.6.0.

This is why micro releases are essentially for bug fixes only. They must remain compatible in both directions: if you upgrade from 2.5.3 to 2.5.4, then change your mind and downgrade back to 2.5.3, no functionality should be lost. Of course, the bugs fixed in 2.5.4 would reappear after the downgrade, but you wouldn't lose any features, except insofar as the restored bugs prevent the use of some existing features.

Client/server protocols are just one of many possible compatibility domains. Another is data formats: does the software write data to permanent storage? If so, the formats it reads and writes need to follow the compatibility guidelines promised by the release number policy. Version 2.6.0 needs to be able to read the files written by 2.5.4, but may silently upgrade the format to something that 2.5.4 cannot read, because the ability to downgrade is not required across a minor number boundary. If your project distributes code libraries for other programs to use, then APIs are a compatibility domain too: you must make sure that source and binary compatibility rules are spelled out in such a way that the informed user need never wonder whether or not it's safe to upgrade in place. She will be able to look at the numbers and know instantly.
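For a data-format compatibility domain, the same policy answers one practical question: is release A guaranteed to read what release B wrote? This hypothetical helper encodes that answer for releases at or above 1.0. Within a minor line the answer is yes in both directions; across minor numbers only the newer release must read the older data; across a major boundary nothing is promised.

```python
def parse(number: str) -> tuple:
    major, minor, micro = (int(part) for part in number.split("."))
    return major, minor, micro


def guaranteed_to_read(reader: str, writer: str) -> bool:
    """Under the simple strategy, must release `reader` accept data written by `writer`?"""
    r, w = parse(reader), parse(writer)
    if r[0] != w[0]:
        return False          # a major boundary: no promise in either direction
    if r[1] == w[1]:
        return True           # same minor line: compatible for upgrade and downgrade
    return r[1] > w[1]        # a newer minor reads older data, but not the reverse


print(guaranteed_to_read("2.6.0", "2.5.4"))  # True
print(guaranteed_to_read("2.5.4", "2.6.0"))  # False
```

The asymmetry in the last line is exactly why 2.6.0 may silently upgrade a file format: nothing in the policy obliges 2.5.4 to read it afterwards.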

In this system, you don't get a chance for a fresh start until you increment the major number. This can often be a real inconvenience: there may be features you wish to add, or protocols that you wish to redesign, that simply cannot be done while maintaining compatibility. There's no magic solution to this, except to try to design things in an extensible way in the first place (a topic easily worth its own book, and certainly outside the scope of this one). But publishing a release compatibility policy, and adhering to it, is an inescapable part of distributing software. One nasty surprise can alienate a lot of users. The policy just described is good partly because it's already quite widespread, but also because it's easy to explain and to remember, even for those not already familiar with it.

It is generally understood that these rules do not apply to pre-1.0 releases (although your release policy should probably state so explicitly, just to be clear). A project that is still in initial development can release 0.1, 0.2, 0.3, and so on in sequence, until it's ready for 1.0, and the differences between those releases can be arbitrarily large. Micro numbers in pre-1.0 releases are optional. Depending on the nature of your project and the differences between the releases, you might find it useful to have 0.1.0, 0.1.1, etc., or you might not. Conventions for pre-1.0 release numbers are fairly loose, mainly because people understand that strong compatibility constraints would hamper early development too much, and because early adopters tend to be forgiving anyway.

Remember that all these injunctions only apply to this particular three-component system. Your project could easily come up with a different three-component system, or even decide it doesn't need such fine granularity and use a two-component system instead. The important thing is to decide early, publish exactly what the components mean, and stick to it.

The Even/Odd Strategy

Some projects use the parity of the minor number component to indicate the stability of the software: even means stable, odd means unstable. This applies only to the minor number, not the major and micro numbers. Increments in the micro number still indicate bug fixes (no new features), and increments in the major number still indicate big changes, new feature sets, etc.

The advantage of the even/odd system, which has been used by the Linux kernel project among others, is that it offers a way to release new functionality for testing without subjecting production users to potentially unstable code. People can see from the numbers that "2.4.21" is okay to install on their live web server, but that "2.5.1" should probably stay confined to home workstation experiments. The development team handles the bug reports that come in from the unstable (odd-minor-numbered) series, and when things start to settle down after some number of micro releases in that series, they increment the minor number (thus making it even), reset the micro number back to "0", and release a presumably stable package.
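The even/odd convention is simple enough to express in two small functions. Both names are hypothetical; the second one performs the promotion step just described, incrementing an odd minor number to the next even one and resetting the micro number to 0.

```python
def is_stable(number: str) -> bool:
    """Even/odd convention: an even minor number marks a stable series."""
    minor = int(number.split(".")[1])
    return minor % 2 == 0


def next_stable_release(unstable: str) -> str:
    """Promote a settled unstable series: bump the minor to even, reset micro to 0."""
    major, minor, _micro = (int(part) for part in unstable.split("."))
    if minor % 2 == 0:
        raise ValueError(f"{unstable} is already in a stable (even) series")
    return f"{major}.{minor + 1}.0"


print(is_stable("2.4.21"), is_stable("2.5.1"))  # True False
print(next_stable_release("2.5.7"))             # 2.6.0
```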

This system preserves, or at least does not conflict with, the compatibility guidelines given earlier. It simply overloads the minor number with some extra information. This forces the minor number to be incremented about twice as often as would otherwise be necessary, but there's no great harm in that. The even/odd system is probably best for projects that have very long release cycles, and which by their nature have a high proportion of conservative users who value stability above new features. It is not the only way to get new functionality tested in the wild, however. The section titled "Stabilizing a Release" later in this chapter describes another, perhaps more common, method of releasing potentially unstable code to the public, marked so that people have an idea of the risk/benefit trade-offs immediately on seeing the release's name.

Release Branches

From a developer's point of view, a free software project is in a state of continuous release. Developers usually run the latest available code at all times, because they want to spot bugs, and because they follow the project closely enough to be able to stay away from currently unstable areas of the feature space. They often update their copy of the software every day, sometimes more than once a day, and when they check in a change, they can reasonably expect that every other developer will have it within 24 hours.

How, then, should the project make a formal release? Should it simply take a snapshot of the tree at a moment in time, package it up, and hand it to the world as, say, version "3.5.0"? Common sense says no. First, there may be no moment in time when the entire development tree is clean and ready for release. Newly started features could be lying around in various states of completion. Someone might have checked in a major change to fix a bug, but the change could be controversial and under debate at the moment the snapshot is taken. If so, it wouldn't work to simply delay the snapshot until the debate ends, because another, unrelated debate could start in the meantime, and then you'd have to wait for that one to end too. This process is not guaranteed to halt.

In any case, using full-tree snapshots for releases would interfere with ongoing development work, even if the tree could be put into a releasable state. Say this snapshot is going to be "3.5.0"; presumably, the next snapshot would be "3.5.1", and would contain mostly fixes for bugs found in the 3.5.0 release. But if both are snapshots from the same tree, what are the developers supposed to do in the time between the two releases? They can't be adding new features; the compatibility guidelines prevent that. But not everyone will be enthusiastic about fixing bugs in the 3.5.0 code. Some people may have new features they're trying to complete, and will become irate if they are forced to choose between sitting idle and working on things they're not interested in, just because the project's release processes demand that the development tree remain unnaturally quiescent.

The solution to these problems is to always use a release branch. A release branch is just a branch in the version control system (see Branch), on which the code destined for this release can be isolated from mainline development. The concept of release branches is certainly not original to free software; many commercial development organizations use them too. However, in commercial environments, release branches are sometimes considered a luxury: a kind of formal "best practice" that can, in the heat of a major deadline, be dispensed with while everyone on the team scrambles to stabilize the main tree.

Release branches are pretty much required in open source projects, however. I have seen projects do releases without them, but it has always resulted in some developers sitting idle while others—usually a minority—work on getting the release out the door. The result is usually bad in several ways. First, overall development momentum is slowed. Second, the release is of poorer quality than it needed to be, because there were only a few people working on it, and they were hurrying to finish so everyone else could get back to work. Third, it divides the development team psychologically, by setting up a situation in which different types of work interfere with each other unnecessarily. The developers sitting idle would probably be happy to contribute some of their attention to a release branch, as long as that were a choice they could make according to their own schedules and interests. But without the branch, their choice becomes "Do I participate in the project today or not?" instead of "Do I work on the release today, or work on that new feature I've been developing in the mainline code?"

Mechanics of Release Branches

The exact mechanics of creating a release branch depend on your version control system, of course, but the general concepts are the same in most systems. A branch usually sprouts from another branch or from the trunk. Traditionally, the trunk is where mainline development goes on, unfettered by release constraints. The first release branch, the one leading to the "1.0" release, sprouts off the trunk. In CVS, the branch command would be something like this:

$ cd trunk-working-copy
$ cvs tag -b RELEASE_1_0_X

or in Subversion, like this:

$ svn copy http://.../repos/trunk http://.../repos/branches/1.0.x

(All these examples assume a three-component release numbering system. While I can't show the exact commands for every version control system, I'll give examples in CVS and Subversion and hope that the corresponding commands in other systems can be deduced from those two.)

Notice that we created branch "1.0.x" (with a literal "x") instead of "1.0.0". This is because the same minor line (i.e., the same branch) will be used for all the micro releases in that line. The actual process of stabilizing the branch for release is covered in the section titled "Stabilizing a Release" later in this chapter. Here we are concerned just with the interaction between the version control system and the release process. When the release branch is stabilized and ready, it is time to tag a snapshot from the branch:

$ cd RELEASE_1_0_X-working-copy
$ cvs tag RELEASE_1_0_0

or

$ svn copy http://.../repos/branches/1.0.x http://.../repos/tags/1.0.0

That tag now represents the exact state of the project's source tree in the 1.0.0 release (this is useful in case anyone ever needs to get an old version after the packaged distributions and binaries have been taken down). The next micro release in the same line is likewise prepared on the 1.0.x branch, and when it is ready, a tag is made for 1.0.1. Lather, rinse, repeat for 1.0.2, and so on. When it's time to start thinking about a 1.1.x release, make a new branch from trunk:

$ cd trunk-working-copy
$ cvs tag -b RELEASE_1_1_X

or

$ svn copy http://.../repos/trunk http://.../repos/branches/1.1.x
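The naming pattern in these commands (one branch per minor line, ending in a literal "x", and one tag per actual release) can be generated mechanically. In the sketch below the function names are hypothetical, the repository URL is a placeholder you would replace with your own, and the commands are only built as argument lists, not executed. It adds `-m` with a log message because a repository-to-repository `svn copy` commits immediately and would otherwise open an editor to ask for one.

```python
def branch_names(major: int, minor: int) -> dict:
    """One branch per minor line; the literal "x" stands for every micro release."""
    return {"svn": f"branches/{major}.{minor}.x", "cvs": f"RELEASE_{major}_{minor}_X"}


def tag_names(major: int, minor: int, micro: int) -> dict:
    """One tag per actual release, snapshotted from the line's branch."""
    return {"svn": f"tags/{major}.{minor}.{micro}", "cvs": f"RELEASE_{major}_{minor}_{micro}"}


def svn_copy_argv(repo_root: str, src: str, dst: str, message: str) -> list:
    """Arguments for a server-side `svn copy` between two paths in one repository."""
    return ["svn", "copy", f"{repo_root}/{src}", f"{repo_root}/{dst}", "-m", message]


# Hypothetical repository URL; substitute your project's own.
REPO = "https://svn.example.org/repos"
branch = branch_names(1, 0)["svn"]
print(svn_copy_argv(REPO, "trunk", branch, "Create the 1.0.x release branch."))
print(svn_copy_argv(REPO, branch, tag_names(1, 0, 0)["svn"], "Tag the 1.0.0 release."))
```

Passing one of these lists to `subprocess.run` would execute it; keeping construction separate from execution makes the naming convention easy to check before anything touches the repository.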

Maintenance can continue in parallel along both 1.0.x and 1.1.x, and releases can be made independently from both lines. In fact, it is not unusual to publish near-simultaneous releases from two different lines. The older series is recommended for more conservative site administrators, who may not want to make the big jump to (say) 1.1 without careful preparation. Meanwhile, more adventurous people usually take the most recent release on the highest line, to make sure they're getting the latest features, even at the risk of greater instability.

This is not the only release branch strategy, of course. In some circumstances it may not even be the best, though it's worked out pretty well for projects I've been involved in. Use any strategy that seems to work, but remember the main points: the purpose of a release branch is to isolate release work from the fluctuations of daily development, and to give the project a physical entity around which to organize its release process. That process is described in detail in the next section.

Stabilizing a Release

Stabilization is the process of getting a release branch into a releasable state; that is, of deciding which changes will be in the release, which will not, and shaping the branch content accordingly.

There's a lot of potential grief contained in that word, "deciding". The last-minute feature rush is a familiar phenomenon in collaborative software projects: as soon as developers see that a release is about to happen, they scramble to finish their current changes, in order not to miss the boat. This, of course, is the exact opposite of what you want at release time. It would be much better for people to work on features at a comfortable pace, and not worry too much about whether their changes make it into this release or the next one. The more changes one tries to cram into a release at the last minute, the more the code is destabilized, and (usually) the more new bugs are created.

Most software engineers agree in theory on rough criteria for what changes should be allowed into a release line during its stabilization period. Obviously, fixes for severe bugs can go in, especially for bugs without workarounds. Documentation updates are fine, as are fixes to error messages (except when they are considered part of the interface and must remain stable). Many projects also allow certain kinds of low-risk or non-core changes to go in during stabilization, and may have formal guidelines for measuring risk. But no amount of formalization can obviate the need for human judgement. There will always be cases where the project simply has to make a decision about whether a given change can go into a release. The danger is that, since each person wants to see their own favorite changes admitted into the release, there will be plenty of people motivated to allow changes, and not enough people motivated to bar them.

Thus, the process of stabilizing a release is mostly about creating mechanisms for saying "no". The trick for open source projects, in particular, is to come up with ways of saying "no" that won't result in too many hurt feelings or disappointed developers, and also won't prevent deserving changes from getting into the release. There are many different ways to do this. It's pretty easy to design systems that satisfy these criteria, once the team has focused on them as the important criteria. Here I'll briefly describe two of the most popular systems, at the extreme ends of the spectrum, but don't let that discourage your project from being creative. Plenty of other arrangements are possible; these are just two that I've seen work in practice.

Dictatorship by Release Owner

The group agrees to let one person be the release owner. This person has final say over what changes make it into the release. Of course, it is normal and expected for there to be discussions and arguments, but in the end the group must grant the release owner sufficient authority to make final decisions. For this system to work, it is necessary to choose someone with the technical competence to understand all the changes, and the social standing and people skills to navigate the discussions leading up to the release without causing too many hurt feelings.

A common pattern is for the release owner to say "I don't think there's anything wrong with this change, but we haven't had enough time to test it yet, so it shouldn't go into this release." It helps a lot if the release owner has broad technical knowledge of the project, and can give reasons why the change could be potentially destabilizing (for example, its interactions with other parts of the software, or portability concerns). People will sometimes ask for such decisions to be justified, or will argue that a change is not as risky as it looks. These conversations need not be confrontational, as long as the release owner is able to consider all the arguments objectively and not reflexively dig in his heels.

Note that the release owner need not be the same person as the project leader (in cases where there is a project leader at all; see the section titled "Benevolent Dictators" in Chapter 4, Social and Political Infrastructure). In fact, sometimes it's good to make sure they're not the same person. The skills that make a good development leader are not necessarily the same as those that make a good release owner. In something as important as the release process, it may be wise to have someone provide a counterbalance to the project leader's judgement.

Contrast the release owner role with the less dictatorial role described in the section titled "Release manager" later in this chapter.

Change Voting

At the opposite extreme from dictatorship by release owner, developers can simply vote on which changes to include in the release. However, since the most important function of release stabilization is to exclude changes, it's important to design the voting system in such a way that getting a change into the release involves positive action by multiple developers. Including a change should need more than just a simple majority (see the section titled "Who Votes?" in Chapter 4, Social and Political Infrastructure). Otherwise, one vote for and none against a given change would suffice to get it into the release, and an unfortunate dynamic would be set up whereby each developer would vote for her own changes, yet would be reluctant to vote against others' changes, for fear of possible retaliation. To avoid this, the system should be arranged such that subgroups of developers must act in cooperation to get any change into the release. This not only means that more people review each change, it also makes any individual developer less hesitant to vote against a change, because she knows that no particular one among those who voted for it would take her vote against as a personal affront. The greater the number of people involved, the more the discussion becomes about the change and less about the individuals.

The system we use in the Subversion project seems to have struck a good balance, so I'll recommend it here. In order for a change to be applied to the release branch, at least three developers must vote in favor of it, and none against. A single "no" vote is enough to stop the change from being included; that is, a "no" vote in a release context is equivalent to a veto (see the section titled "Vetoes"). Naturally, any such vote must be accompanied by a justification, and in theory the veto could be overridden if enough people feel it is unreasonable and force a special vote over it. In practice, this has never happened, and I don't expect that it ever will. People are conservative around releases anyway, and when someone feels strongly enough to veto the inclusion of a change, there's usually a good reason for it.
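The "three in favor, none against" rule is easy to state precisely. This is a hypothetical model, not a tool used by the Subversion project: each developer holds at most one position on a proposal, a veto must carry a justification, and acceptance requires the threshold of approvals with zero vetoes.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """One candidate change for the release branch, e.g. a trunk revision."""
    change: str
    approvals: set = field(default_factory=set)
    vetoes: dict = field(default_factory=dict)   # voter -> justification

    def vote_for(self, voter: str) -> None:
        self.vetoes.pop(voter, None)             # a voter holds one position at a time
        self.approvals.add(voter)

    def veto(self, voter: str, justification: str) -> None:
        if not justification.strip():
            raise ValueError("a veto must be accompanied by a justification")
        self.approvals.discard(voter)
        self.vetoes[voter] = justification

    def accepted(self, required: int = 3) -> bool:
        """At least `required` votes in favor and none against."""
        return len(self.approvals) >= required and not self.vetoes


p = Proposal("r2401")
p.vote_for("jsmith")
p.vote_for("kimf")
print(p.accepted())   # False: only two votes in favor
p.vote_for("dev3")    # hypothetical third developer
print(p.accepted())   # True
p.veto("tmartin", "breaks compatibility with some pre-1.0 servers")
print(p.accepted())   # False: a single veto blocks the change
```

Storing approvals in a set means a developer who votes twice still counts once, which keeps the threshold honest.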

Because the release procedure is deliberately biased toward conservatism, the justifications offered for vetoes are sometimes procedural rather than technical. For example, a person may feel that a change is well-written and unlikely to cause any new bugs, but vote against its inclusion in a micro release simply because it's too big—perhaps it adds a new feature, or in some subtle way fails to fully follow the compatibility guidelines. I've occasionally even seen developers veto something because they simply had a gut feeling that the change needed more testing, even though they couldn't spot any bugs in it by inspection. People grumbled a little bit, but the vetoes stood and the change was not included in the release (I don't remember if any bugs were found in later testing or not, though).

Managing collaborative release stabilization

If your project chooses a change voting system, it is imperative that the physical mechanics of setting up ballots and casting votes be as convenient as possible. Although there is plenty of open source electronic voting software available, in practice the easiest thing to do is just to set up a text file in the release branch, called STATUS or VOTES or something like that. This file lists each proposed change—any developer can propose a change for inclusion—along with all the votes for and against it, plus any notes or comments. (Proposing a change doesn't necessarily mean voting for it, by the way, although the two often go together.) An entry in such a file might look like this:

* r2401 (issue #49)
  Prevent client/server handshake from happening twice.
  Justification:
    Avoids extra network turnaround; small change and easy to review.
  Notes:
    This was discussed in http://.../mailing-lists/message-7777.html
    and other messages in that thread.
  Votes:
    +1: jsmith, kimf
    -1: tmartin (breaks compatibility with some pre-1.0 servers;
                 admittedly, those servers are buggy, but why be
                 incompatible if we don't have to?)

In this case, the change acquired two positive votes, but was vetoed by tmartin, who gave the reason for the veto in a parenthetical note. The exact format of the entry doesn't matter; whatever your project settles on is fine—perhaps tmartin's explanation for the veto should go up in the "Notes:" section, or perhaps the change description should get a "Description:" header to match the other sections. The important thing is that all the information needed to evaluate the change be reachable, and that the mechanism for casting votes be as lightweight as possible. The proposed change is referred to by its revision number in the repository (in this case a single revision, r2401, although a proposed change could just as easily consist of multiple revisions). The revision is assumed to refer to a change made on the trunk; if the change were already on the release branch, there would be no need to vote on it. If your version control system doesn't have an obvious syntax for referring to individual changes, then the project should make one up. For voting to be practical, each change under consideration must be unambiguously identifiable.
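Because the STATUS file is plain text, votes can be extracted with a few lines of code. The parser below is a sketch that assumes the exact layout of the example entry: a "Votes:" header followed by "+1:" and "-1:" lines of comma-separated names, where a parenthetical justification may wrap onto further indented lines. The names `parse_votes` and `accepted` are hypothetical, and the sample entry is shortened from the one shown (its "Justification:" and "Notes:" sections are omitted).

```python
import re

VOTE_LINE = re.compile(r"^\s*([+-]1):\s*(.*)$")


def parse_votes(entry: str) -> dict:
    """Extract voter names from the "Votes:" section of one STATUS entry."""
    votes = {"+1": [], "-1": []}
    in_votes = False
    for line in entry.splitlines():
        if line.strip() == "Votes:":
            in_votes = True
            continue
        match = VOTE_LINE.match(line) if in_votes else None
        if match is None:
            continue                      # other sections, or a wrapped justification
        sign, rest = match.groups()
        names = rest.split("(", 1)[0]     # drop a parenthetical justification
        votes[sign].extend(name.strip() for name in names.split(",") if name.strip())
    return votes


def accepted(votes: dict, required: int = 3) -> bool:
    return len(votes["+1"]) >= required and not votes["-1"]


ENTRY = """\
* r2401 (issue #49)
  Prevent client/server handshake from happening twice.
  Votes:
    +1: jsmith, kimf
    -1: tmartin (breaks compatibility with some pre-1.0 servers;
                 admittedly, those servers are buggy, but why be
                 incompatible if we don't have to?)
"""

votes = parse_votes(ENTRY)
print(votes)            # {'+1': ['jsmith', 'kimf'], '-1': ['tmartin']}
print(accepted(votes))  # False
```

If your project's entries differ in layout, only `VOTE_LINE` and the header check would need to change.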

Those proposing or voting for a change are responsible for making sure it applies cleanly to the release branch, that is, applies without conflicts (see Conflit). If there are conflicts, then the entry should point either to an adjusted patch that does apply cleanly or to a temporary branch that holds an adjusted version of the change, for example:

* r13222, r13223, r13232
  Rewrite libsvn_fs_fs's auto-merge algorithm
  Justification:
    unacceptable performance (>50 minutes for a small commit) in
    a repository with 300,000 revisions
  Branch:
    1.1.x-r13222@13517
  Votes:
    +1: epg, ghudson

That example is taken from real life; it comes from the STATUS file for the Subversion 1.1.4 release process. Notice how it uses the original revisions as canonical handles on the change, even though there is also a branch with a conflict-adjusted version of the change (the branch also combines the three trunk revisions into one, r13517, to make it easier to merge the change into the release, should it get approval). The original revisions are provided because they're still the easiest entity to review, since they have the original log messages. The temporary branch wouldn't have those log messages; in order to avoid duplication of information (see the section titled "Singularity of information" in Chapter 3, L'infrastructure technique), the branch's log message for r13517 should simply say "Adjust r13222, r13223, and r13232 for backport to 1.1.x branch." All other information about the changes can be chased down at their original revisions.

Release manager

The actual process of merging (see Fusion (ou port)) approved changes into the release branch can be performed by any developer. There does not need to be one person whose job it is to merge changes; if there are a lot of changes, it can be better to spread the burden around.

However, although both voting and merging happen in a decentralized fashion, in practice there are usually one or two people driving the release process. This role is sometimes formally blessed as release manager, but it is quite different from a release owner (see the section titled "Dictatorship by Release Owner" earlier in this chapter) who has final say over the changes. Release managers keep track of how many changes are currently under consideration, how many have been approved, how many seem likely to be approved, etc. If they sense that important changes are not getting enough attention, and might be left out of the release for lack of votes, they will gently nag other developers to review and vote. When a batch of changes is approved, these people will often take it upon themselves to merge them into the release branch; it's fine if others leave that task to them, as long as everyone understands that they are not obligated to do all the work unless they have explicitly committed to it. When the time comes to put the release out the door (see the section titled "Testing and Releasing" later in this chapter), the release managers also take care of the logistics of creating the final release packages, collecting digital signatures, uploading the packages, and making the public announcement.

Packaging

The canonical form for distribution of free software is as source code. This is true regardless of whether the software normally runs in source form (i.e., can be interpreted, like Perl, Python, PHP, etc.) or needs to be compiled first (like C, C++, Java, etc.). With compiled software, most users will probably not compile the sources themselves, but will instead install from pre-built binary packages (see the section titled "Binary Packages" later in this chapter). However, those binary packages are still derived from a master source distribution. The point of the source package is to unambiguously define the release. When the project distributes "Scanley 2.5.0", what it means, specifically, is "The tree of source code files that, when compiled (if necessary) and installed, produces Scanley 2.5.0."

There is a fairly strict standard for how source releases should look. One will occasionally see deviations from this standard, but they are the exception, not the rule. Unless there is a compelling reason to do otherwise, your project should follow this standard too.

Format

The source code should be shipped in the standard formats for transporting directory trees. For Unix and Unix-like operating systems, the convention is to use TAR format, compressed by compress, gzip, bzip or bzip2. For MS Windows, the standard method for distributing directory trees is zip format, which happens to do compression as well, so there is no need to compress the archive after creating it.

Name and Layout

The name of the package should consist of the software's name plus the release number, plus the format suffixes appropriate for the archive type. For example, Scanley 2.5.0, packaged for Unix using GNU Zip (gzip) compression, would look like this:

scanley-2.5.0.tar.gz

or for Windows using zip compression:

scanley-2.5.0.zip

Either of these archives, when unpacked, should create a single new directory tree named scanley-2.5.0 in the current directory. Underneath the new directory, the source code should be arranged in a layout ready for compilation (if compilation is needed) and installation. In the top level of the new directory tree, there should be a plain text README file explaining what the software does and what release this is, and giving pointers to other resources, such as the project's web site, other files of interest, etc. Among those other files should be an INSTALL file, sibling to the README file, giving instructions on how to build and install the software for all the operating systems it supports. As mentioned in the section titled "Comment mettre en oeuvre cette licence au projet" in Chapter 2, Genèse d'un projet, there should also be a COPYING or LICENSE file, giving the software's terms of distribution.
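The layout rules above can be enforced mechanically when the tarball is rolled. What follows is a minimal sketch using Python's standard tarfile module, assuming the source tree has already been prepared in a local directory; the function names and the list of required top-level files are illustrative choices, not an established tool. The key detail is the `arcname` argument, which places every member under a single scanley-2.5.0 directory regardless of what the directory is called on disk.

```python
import tarfile
from pathlib import Path

# Each tuple lists acceptable alternative names for one required top-level file.
REQUIRED_TOP_LEVEL = [("README",), ("INSTALL",), ("COPYING", "LICENSE"), ("CHANGES", "NEWS")]


def make_source_tarball(src_dir: Path, name: str, version: str, out_dir: Path) -> Path:
    """Pack src_dir as <name>-<version>.tar.gz with one top-level directory."""
    top = f"{name}-{version}"  # e.g. "scanley-2.5.0"
    for alternatives in REQUIRED_TOP_LEVEL:
        if not any((src_dir / f).is_file() for f in alternatives):
            raise FileNotFoundError(f"{' or '.join(alternatives)} missing from {src_dir}")
    out_path = out_dir / f"{top}.tar.gz"
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname renames the root of the added tree, so every member
        # is stored as "scanley-2.5.0/...".
        tar.add(src_dir, arcname=top)
    return out_path


def top_level_entries(archive: Path) -> set[str]:
    """First path component of every member; a well-formed package has exactly one."""
    with tarfile.open(archive) as tar:
        return {member.split("/", 1)[0] for member in tar.getnames()}
```

For example, `make_source_tarball(Path("checkout"), "scanley", "2.5.0", Path("dist"))` would produce dist/scanley-2.5.0.tar.gz, and `top_level_entries` on the result should return only `{"scanley-2.5.0"}`.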

There should also be a CHANGES file (sometimes called NEWS), explaining what's new in this release. The CHANGES file accumulates changelists for all releases, in reverse chronological order, so that the list for this release appears at the top of the file. Completing that list is usually the last thing done on a stabilizing release branch; some projects write the list piecemeal as they're developing, others prefer to save it all up for the end and have one person write it, getting information by combing the version control logs. The list looks something like this:

Version 2.5.0
(20 December 2004, from /branches/2.5.x)
http://svn.scanley.org/repos/svn/tags/2.5.0/

 New features, enhancements:
    * Added regular expression queries (issue #53)
    * Added support for UTF-8 and UTF-16 documents
    * Documentation translated into Polish, Russian, Malagasy
    * ...

 Bugfixes:
    * fixed reindexing bug (issue #945)
    * fixed some query bugs (issues #815, #1007, #1008)
    * ...

The list can be as long as necessary, but don't bother to include every little bugfix and feature enhancement. Its purpose is simply to give users an overview of what they would gain by upgrading to the new release. In fact, the changelist is customarily included in the announcement email (see the section titled "Testing and Releasing" later in this chapter), so write it with that audience in mind.

The actual layout of the source code inside the tree should be the same as, or as similar as possible to, the source code layout one would get by checking out the project directly from its version control repository. Usually there are a few differences, for example because the package contains some generated files needed for configuration and compilation (see the section titled "Compilation and Installation" later in this chapter), or because it includes third-party software that is not maintained by the project, but that is required and that users are not likely to already have. But even if the distributed tree corresponds exactly to some development tree in the version control repository, the distribution itself should not be a working copy (see Copie de travail). The release is supposed to represent a static reference point: a particular, unchangeable configuration of source files. If it were a working copy, the danger would be that the user might update it, and afterward think that he still has the release when in fact he has something different.
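One practical way to guarantee that a package built from a checkout is not a working copy is to drop version control metadata while archiving. Here is a minimal sketch using the `filter` parameter of Python's `tarfile.TarFile.add`, which skips any member for which the filter returns `None` (and does not descend into a skipped directory). The set of metadata directory names is an assumption covering a few common systems, not an exhaustive list.

```python
import tarfile
from pathlib import PurePosixPath
from typing import Optional

# Working-copy metadata directories of some common version control systems
# (illustrative, not exhaustive).
VCS_METADATA_DIRS = {".svn", ".git", ".hg", ".bzr", "CVS"}


def exclude_vcs_metadata(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
    """tarfile filter: drop any member that is, or lies inside, a VCS metadata directory."""
    if VCS_METADATA_DIRS.intersection(PurePosixPath(info.name).parts):
        return None  # tells TarFile.add() to skip this member entirely
    return info
```

Used as `tar.add(src_dir, arcname="scanley-2.5.0", filter=exclude_vcs_metadata)`, the resulting tarball contains the source files but none of the .svn (or similar) directories, so unpacking it cannot yield something that a user could accidentally "update".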

Remember that the package is the same regardless of the packaging. The release—that is, the precise entity referred to when someone says "Scanley 2.5.0"—is the tree created by unpacking a zip file or tarball. So the project might offer all of these for download:

scanley-2.5.0.tar.bz2
scanley-2.5.0.tar.gz
scanley-2.5.0.zip

...but the source tree created by unpacking them must be the same. That source tree is the distribution; the form in which it is downloaded is merely a matter of convenience. Certain trivial differences between source packages are allowable: for example, in the Windows package, text files should have lines ending with CRLF (Carriage Return and Line Feed), while Unix packages should use just LF. The trees may be arranged slightly differently between source packages destined for different operating systems, too, if those operating systems require different sorts of layouts for compilation. However, these are all basically trivial transformations. The basic source files should be the same across all the packagings of a given release.
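Before publishing several packagings of the same release, it is worth checking that they really contain the same tree. The sketch below, using only Python's standard library, compares the regular files in a tarball and a zip file by path and content hash. As an assumption matching the allowance above, it treats CRLF and LF line endings as equivalent; note that this normalization would also apply to binary files containing CRLF byte pairs, which is tolerable for a sanity check but not for a byte-exact comparison. Layout differences between platforms, also allowed above, would still be reported as mismatches.

```python
import hashlib
import tarfile
import zipfile


def _digest(data: bytes) -> str:
    # Normalize CRLF to LF so Windows and Unix packagings compare equal.
    return hashlib.sha256(data.replace(b"\r\n", b"\n")).hexdigest()


def tar_contents(path) -> dict[str, str]:
    """Map each regular file in a tarball to a line-ending-insensitive hash."""
    result = {}
    with tarfile.open(path) as tar:
        for member in tar.getmembers():
            if member.isfile():
                with tar.extractfile(member) as f:
                    result[member.name] = _digest(f.read())
    return result


def zip_contents(path) -> dict[str, str]:
    """Map each file in a zip archive to a line-ending-insensitive hash."""
    result = {}
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                result[info.filename] = _digest(zf.read(info))
    return result


def same_release_tree(tar_path, zip_path) -> bool:
    """True if both archives unpack to the same files with the same contents."""
    return tar_contents(tar_path) == zip_contents(zip_path)
```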

To capitalize or not to capitalize

When referring to a project by name, people generally capitalize it as a proper noun, and capitalize acronyms if there are any: "MySQL 5.0", "Scanley 2.5.0", etc. Whether this capitalization is reproduced in the package name is up to the project. Either Scanley-2.5.0.tar.gz or scanley-2.5.0.tar.gz would be fine, for example (I personally prefer the latter, because I don't like to make people hit the shift key, but plenty of projects ship capitalized packages). The important thing is that the directory created by unpacking the tarball use the same capitalization. There should be no surprises: the user must be able to predict with perfect accuracy the name of the directory that will be created when she unpacks a distribution.

Pre-releases

When shipping a pre-release or candidate release, the qualifier is truly a part of the release number, so include it in the package's name. For example, the ordered sequence of alpha and beta releases given earlier in the section titled "Release Number Components" would result in package names like this:

scanley-2.3.0-alpha1.tar.gz
scanley-2.3.0-alpha2.tar.gz
scanley-2.3.0-beta1.tar.gz
scanley-2.3.0-beta2.tar.gz
scanley-2.3.0-beta3.tar.gz
scanley-2.3.0.tar.gz

The first would unpack into a directory named scanley-2.3.0-alpha1, the second into scanley-2.3.0-alpha2, and so on.
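Tools that list or compare releases need to know that every pre-release sorts before the final release with the same number. Plain string sorting gets this wrong as soon as a qualifier reaches two digits ("beta10" sorts before "beta2"). Here is a minimal sort-key sketch, assuming the MAJOR.MINOR.MICRO numbering with an optional "-alphaN", "-betaN", or "-rcN" suffix used in this chapter; the regular expression and rank table are illustrative.

```python
import re

# Assumed scheme: MAJOR.MINOR.MICRO, optionally "-alphaN", "-betaN" or "-rcN".
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(alpha|beta|rc)(\d+))?$")
_STAGE_RANK = {"alpha": 0, "beta": 1, "rc": 2}
_FINAL_RANK = 3  # a final release sorts after all its pre-releases


def version_key(version: str) -> tuple[int, ...]:
    """Sort key such that 2.3.0-alpha1 < ... < 2.3.0-beta3 < 2.3.0."""
    m = _VERSION.match(version)
    if m is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, micro, stage, n = m.groups()
    if stage is None:
        stage_key = (_FINAL_RANK, 0)
    else:
        stage_key = (_STAGE_RANK[stage], int(n))
    return (int(major), int(minor), int(micro)) + stage_key
```

With this key, `sorted(versions, key=version_key)` reproduces the order of the package list above, and numeric comparison of the qualifier keeps beta10 after beta2.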

Compilation and Installation

For software requiring compilation or installation from source, there are usually standard procedures that experienced users expect to be able to follow. For example, for programs written in C, C++, or certain other compiled languages, the standard under Unix-like systems is for the user to type:

   $ ./configure
   $ make
   # make install

The first command autodetects as much about the environment as it can and prepares for the build process, the second command builds the software in place (but does not install it), and the last command installs it on the system. The first two commands are done as a regular user, the third as root. For more details about setting up this system, see the excellent GNU Autoconf, Automake, and Libtool book by Vaughan, Elliston, Tromey, and Taylor. It is published as treeware by New Riders, and its content is also freely available online at http://sources.redhat.com/autobook/.

This is not the only standard, though it is one of the most widespread. The Ant (http://ant.apache.org/) build system is gaining popularity, especially with projects written in Java, and it has its own standard procedures for building and installing. Also, certain programming languages, such as Perl and Python, recommend that the same method be used for most programs written in that language (for example, Perl modules use the command perl Makefile.PL). If it's not obvious to you what the applicable standards are for your project, ask an experienced developer; you can safely assume that some standard applies, even if you don't know what it is at first.

Whatever the appropriate standards for your project are, don't deviate from them unless you absolutely must. Standard installation procedures are practically spinal reflexes for a lot of system administrators now. If they see familiar invocations documented in your project's INSTALL file, that instantly raises their faith that your project is generally aware of conventions, and that it is likely to have gotten other things right as well. Also, as discussed in the section titled "Téléchargements" in Chapter 2, Genèse d'un projet, having a standard build procedure pleases potential developers.

On Windows, the standards for building and installing are a bit less settled. For projects requiring compilation, the general convention seems to be to ship a tree that can fit into the workspace/project model of the standard Microsoft development environments (Developer Studio, Visual Studio, VS.NET, MSVC++, etc.). Depending on the nature of your software, it may be possible to offer a Unix-like build option on Windows via the Cygwin (http://www.cygwin.com/) environment. And of course, if you're using a language or programming framework that comes with its own build and install conventions—e.g., Perl or Python—you should simply use whatever the standard method is for that framework, whether on Windows, Unix, Mac OS X, or any other operating system.

Be willing to put in a lot of extra effort in order to make your project conform to the relevant build or installation standards. Building and installing is an entry point: it's okay for things to get harder after that, if they absolutely must, but it would be a shame for the user's or developer's very first interaction with the software to require unexpected steps.

Binary Packages

Although the formal release is a source code package, most users will install from binary packages, either provided by their operating system's software distribution mechanism, or obtained manually from the project web site or from some third party. Here "binary" doesn't necessarily mean "compiled"; it just means any pre-configured form of the package that allows a user to install it on his computer without going through the usual source-based build and install procedures. On RedHat GNU/Linux, it is the RPM system; on Debian GNU/Linux, it is the APT (.deb) system; on MS Windows, it's usually .MSI files or self-installing .exe files.

Whether these binary packages are assembled by people closely associated with the project, or by distant third parties, users are going to treat them as equivalent to the project's official releases, and will file issues in the project's bug tracker based on the behavior of the binary packages. Therefore, it is in the project's interest to provide packagers with clear guidelines, and work closely with them to see to it that what they produce represents the software fairly and accurately.

The main thing packagers need to know is that they should always base their binary packages on an official source release. Sometimes packagers are tempted to pull a later incarnation of the code from the repository, or include selected changes that were committed after the release was made, in order to provide users with certain bug fixes or other improvements. The packager thinks he is doing his users a favor by giving them the more recent code, but actually this practice can cause a great deal of confusion. Projects are prepared to receive reports of bugs found in released versions, and bugs found in recent trunk and major branch code (that is, found by people who deliberately run bleeding edge code). When a bug report comes in from these sources, the responder will often be able to confirm that the bug is known to be present in that snapshot, and perhaps that it has since been fixed and that the user should upgrade or wait for the next release. If it is a previously unknown bug, having the precise release makes it easier to reproduce and easier to categorize in the tracker.

Projects are not prepared, however, to receive bug reports based on unspecified intermediate or hybrid versions. Such bugs can be hard to reproduce; also, they may be due to unexpected interactions in isolated changes pulled in from later development, and thereby cause misbehaviors that the project's developers should not have to take the blame for. I have even seen dismayingly large amounts of time wasted because a bug was absent when it should have been present: someone was running a slightly patched up version, based on (but not identical to) an official release, and when the predicted bug did not happen, everyone had to dig around a lot to figure out why.

Still, there will sometimes be circumstances when a packager insists that modifications to the source release are necessary. Packagers should be encouraged to bring this up with the project's developers and describe their plans. They may get approval, but failing that, they will at least have notified the project of their intentions, so the project can watch out for unusual bug reports. The developers may respond by putting a disclaimer on the project's web site, and may ask that the packager do the same thing in the appropriate place, so that users of that binary package know what they are getting is not exactly the same as what the project officially released. There need be no animosity in such a situation, though unfortunately there often is. It's just that packagers have a slightly different set of goals from developers. The packagers mainly want the best out-of-the-box experience for their users. The developers want that too, of course, but they also need to ensure that they know what versions of the software are out there, so they can receive coherent bug reports and make compatibility guarantees. Sometimes these goals conflict. When they do, it's good to keep in mind that the project has no control over the packagers, and that the bonds of obligation run both ways. It's true that the project is doing the packagers a favor simply by producing the software. But the packagers are also doing the project a favor, by taking on a mostly unglamorous job in order to make the software more widely available, often by orders of magnitude. It's fine to disagree with packagers, but don't flame them; just try to work things out as best you can.

Testing and Releasing

Once the source tarball is produced from the stabilized release branch, the public part of the release process begins. But before the tarball is made available to the world at large, it should be tested and approved by some minimum number of developers, usually three or more. Approval is not simply a matter of inspecting the release for obvious flaws; ideally, the developers download the tarball, build and install it onto a clean system, run the regression test suite (see the section titled "Automated testing" in Chapter 8, Managing Volunteers), and do some manual testing. Assuming it passes these checks, as well as any other release checklist criteria the project may have, the developers then digitally sign the tarball using GnuPG (http://www.gnupg.org/), PGP (http://www.pgpi.org/), or some other program capable of producing PGP-compatible signatures.

In most projects, the developers just use their personal digital signatures, instead of a shared project key, and as many developers as want to may sign (i.e., there is a minimum number, but not a maximum). The more developers sign, the more testing the release undergoes, and also the greater the likelihood that a security-conscious user can find a digital trust path from herself to the tarball.

Once approved, the release (that is, all tarballs, zip files, and whatever other formats are being distributed) should be placed into the project's download area, accompanied by the digital signatures, and by MD5/SHA1 checksums (see http://en.wikipedia.org/wiki/Cryptographic_hash_function). There are various standards for doing this. One way is to accompany each released package with a file giving the corresponding digital signatures, and another file giving the checksum. For example, if one of the released packages is scanley-2.5.0.tar.gz, place in the same directory a file scanley-2.5.0.tar.gz.asc containing the digital signature for that tarball, another file scanley-2.5.0.tar.gz.md5 containing its MD5 checksum, and optionally another, scanley-2.5.0.tar.gz.sha1, containing the SHA1 checksum. A different way to provide checking is to collect all the signatures for all the released packages into a single file, scanley-2.5.0.sigs; the same may be done with the checksums.
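The one-file-per-checksum scheme described above is easy to automate. The following minimal sketch uses Python's hashlib to write scanley-2.5.0.tar.gz.md5 and scanley-2.5.0.tar.gz.sha1 next to the package, in the "HEXDIGEST  FILENAME" layout that the GNU md5sum and sha1sum tools accept with their -c option, and to verify such a file. The function names are hypothetical; the `algorithms` parameter accepts any name hashlib supports (for instance "sha256"). Signing is left to GnuPG or a similar program and is not shown.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str) -> str:
    """Hex digest of a file, read in chunks so large tarballs don't fill memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_checksum_files(package: Path, algorithms=("md5", "sha1")) -> list[Path]:
    """Write <package>.<algorithm> files next to the package."""
    written = []
    for alg in algorithms:
        out = package.with_name(f"{package.name}.{alg}")
        out.write_text(f"{file_digest(package, alg)}  {package.name}\n")
        written.append(out)
    return written


def verify_checksum_file(checksum_file: Path) -> bool:
    """Recompute the digest named by the file's suffix and compare."""
    alg = checksum_file.suffix.lstrip(".")
    expected, name = checksum_file.read_text().split(maxsplit=1)
    target = checksum_file.with_name(name.strip().lstrip("*"))  # "*" marks binary mode
    return file_digest(target, alg) == expected
```

A user who downloads both the tarball and its .md5 file can then check them with `verify_checksum_file`, or equivalently with `md5sum -c scanley-2.5.0.tar.gz.md5`.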

It doesn't really matter which way you do it. Just keep to a simple scheme, describe it clearly, and be consistent from release to release. The purpose of all this signing and checksumming is to give users a way to verify that the copy they receive has not been maliciously tampered with. Users are about to run this code on their computers; if the code has been tampered with, an attacker could suddenly have a back door to all their data. See the section titled "Security Releases" later in this chapter for more about paranoia.

Candidate Releases

For important releases containing many changes, many projects prefer to put out release candidates first, e.g., scanley-2.5.0-beta1 before scanley-2.5.0. The purpose of a candidate is to subject the code to wide testing before blessing it as an official release. If problems are found, they are fixed on the release branch and a new candidate release is rolled out (scanley-2.5.0-beta2). The cycle continues until no unacceptable bugs are left, at which point the last candidate release becomes the official release—that is, the only difference between the last candidate release and the real release is the removal of the qualifier from the version number.

In most other respects, a candidate release should be treated the same as a real release. The alpha, beta, or rc qualifier is enough to warn conservative users to wait until the real release, and of course the announcement emails for the candidate releases should point out that their purpose is to solicit feedback. Other than that, give candidate releases the same amount of care as regular releases. After all, you want people to use the candidates, because exposure is the best way to uncover bugs, and also because you never know which candidate release will end up becoming the official release.

Announcing Releases

Announcing a release is like announcing any other event, and should use the procedures described in the section titled "Publicity" in Chapter 6, Communications. There are a few specific things to do for releases, though.

Whenever you give the URL to the downloadable release tarball, make sure to also give the MD5/SHA1 checksums and pointers to the digital signatures file. Since the announcement happens in multiple forums (mailing list, news page, etc.), this means users can get the checksums from multiple sources, which gives the most security-conscious among them extra assurance that the checksums themselves have not been tampered with. Giving the link to the digital signature files multiple times doesn't make those signatures more secure, but it does reassure people (especially those who don't follow the project closely) that the project takes security seriously.

In the announcement email, and on news pages that contain more than just a blurb about the release, make sure to include the relevant portion of the CHANGES file, so people can see why it might be in their interests to upgrade. This is as important with candidate releases as with final releases; the presence of bugfixes and new features is important in tempting people to try out a candidate release.

Finally, don't forget to thank the development team, the testers, and all the people who took the time to file good bug reports. Don't single out anyone by name, though, unless there's someone who is individually responsible for a huge piece of work, the value of which is widely recognized by everyone in the project. Just be wary of sliding down the slippery slope of credit inflation (see the section titled "Credit" in Chapter 8, Managing Volunteers).

Maintaining Multiple Release Lines

Most mature projects maintain multiple release lines in parallel. For example, after 1.0.0 comes out, that line should continue with micro (bugfix) releases 1.0.1, 1.0.2, etc., until the project explicitly decides to end the line. Note that merely releasing 1.1.0 is not sufficient reason to end the 1.0.x line. For example, some users make it a policy never to upgrade to the first release in a new minor or major series: they let others shake the bugs out of, say, 1.1.0, and wait until 1.1.1. This isn't necessarily selfish (remember, they're forgoing the bugfixes and new features too); it's just that, for whatever reason, they've decided to be very careful with upgrades. Accordingly, if the project learns of a major bug in 1.0.3 right before it's about to release 1.1.0, it would be a bit severe to just put the bugfix in 1.1.0 and tell all the old 1.0.x users they should upgrade. Why not release both 1.1.0 and 1.0.4, so everyone can be happy?

After the 1.1.x line is well under way, you can declare 1.0.x to be at end of life. This should be announced officially. The announcement could stand alone, or it could be mentioned as part of a 1.1.x release announcement; however you do it, users need to know that the old line is being phased out, so they can make upgrade decisions accordingly.

Some projects set a window of time during which they pledge to support the previous release line. In an open source context, "support" means accepting bug reports against that line, and making maintenance releases when significant bugs are found. Other projects don't give a definite amount of time, but watch incoming bug reports to gauge how many people are still using the older line. When the percentage drops below a certain point, they declare end of life for the line and stop supporting it.

For each release, make sure to have a target version or target milestone available in the bug tracker, so people filing bugs will be able to do so against the proper release. Don't forget to also have a target called "development" or "latest" for the most recent development sources, since some people—not only active developers—will often stay ahead of the official releases.

Security Releases

Most of the details of handling security bugs were covered in the section titled "Announcing Security Vulnerabilities" in Chapter 6, Communications, but there are some special details to discuss for doing security releases.

A security release is a release made solely to close a security vulnerability. The code that fixes the bug cannot be made public until the release is available, which means not only that the fixes cannot be committed to the repository until the day of the release, but also that the release cannot be publicly tested before it goes out the door. Obviously, the developers can examine the fix among themselves, and test the release privately, but widespread real-world testing is not possible.

Because of this lack of testing, a security release should always consist of some existing release plus the fixes for the security bug, with no other changes. This is because the more changes you ship without testing, the more likely that one of them will cause a new bug, perhaps even a new security bug! This conservatism is also friendly to administrators who may need to deploy the security fix, but whose upgrade policy prefers that they not deploy any other changes at the same time.

Making a security release sometimes involves some minor deception. For example, the project may have been working on a 1.1.3 release, with certain bug fixes to 1.1.2 already publicly declared, when a security report comes in. Naturally, the developers cannot talk about the security problem until they make the fix available; until then, they must continue to talk publicly as though 1.1.3 will be what it's always been planned to be. But when 1.1.3 actually comes out, it will differ from 1.1.2 only in the security fixes, and all those other fixes will have been deferred to 1.1.4 (which, of course, will now also contain the security fix, as will all other future releases).

You could add an extra component to an existing release to indicate that it contains security changes only. For example, people would be able to tell just from the numbers that 1.1.2.1 is a security release against 1.1.2, and they would know that any release "higher" than that (e.g., 1.1.3, 1.2.0, etc.) contains the same security fixes. For those in the know, this system conveys a lot of information. On the other hand, for those not following the project closely, it can be a bit confusing to see a three-component release number most of the time with an occasional four-component one thrown in seemingly at random. Most projects I've looked at choose consistency and simply use the next regularly scheduled number for security releases, even when it means shifting other planned releases by one.
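The extra-component scheme works because release numbers compare component by component, with a longer number sorting after its own prefix. A minimal sketch of that comparison in Python, using tuple ordering; the helper names are hypothetical, and the rule encoded ("any higher release contains the fix") is the one stated above, which only holds when higher-numbered releases really do inherit the fix. A later maintenance release on an older line (say, a 1.0.x release made after 1.1.2.1) would compare lower even if it carried the same fix.

```python
def parse_release(number: str) -> tuple[int, ...]:
    """"1.1.2.1" -> (1, 1, 2, 1); works for three- or four-component numbers."""
    return tuple(int(part) for part in number.split("."))


def includes_fix(release: str, first_fixed_in: str) -> bool:
    """True if `release` is the security release or numbered higher than it.

    Tuple comparison orders (1, 1, 2) < (1, 1, 2, 1) < (1, 1, 3) < (1, 2, 0).
    """
    return parse_release(release) >= parse_release(first_fixed_in)
```

For example, `includes_fix("1.1.3", "1.1.2.1")` is true, while `includes_fix("1.1.2", "1.1.2.1")` is false.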

Releases and Daily Development

Maintaining parallel releases simultaneously has implications for how daily development is done. In particular, it makes practically mandatory a discipline that would be recommended anyway: have each commit be a single logical change, and never mix unrelated changes in the same commit. If a change is too big or too disruptive to do in one commit, break it across N commits, where each commit is a well-partitioned subset of the overall change, and includes nothing unrelated to the overall change.

Here's an example of an ill-thought-out commit:

------------------------------------------------------------------------
r6228 | jrandom | 2004-06-30 22:13:07 -0500 (Wed, 30 Jun 2004) | 8 lines

Fix Issue #1729: Make indexing gracefully warn the user when a file
is changing as it is being indexed.

* ui/repl.py
  (ChangingFile): New exception class.
  (DoIndex): Handle new exception.

* indexer/index.py
  (FollowStream): Raise new exception if file changes during indexing.
  (BuildDir): Unrelatedly, remove some obsolete comments, reformat
  some code, and fix the error check when creating a directory.

Other unrelated cleanups:

* www/index.html: Fix some typos, set next release date.
------------------------------------------------------------------------

The problem with it becomes apparent as soon as someone needs to port the BuildDir error check fix over to a branch for an upcoming maintenance release. The porter doesn't want any of the other changes—for example, perhaps the fix to issue #1729 wasn't approved for the maintenance branch at all, and the index.html tweaks would simply be irrelevant there. But she cannot easily grab just the BuildDir change via the version control tool's merge functionality, because the version control system was told that that change is logically grouped with all these other unrelated things. In fact, the problem would become apparent even before the merge. Merely listing the change for voting would become problematic: instead of just giving the revision number, the proposer would have to make a special patch or change branch just to isolate the portion of the commit being proposed. That would be a lot of work for others to suffer through, and all because the original committer couldn't be bothered to break things into logical groups.

In fact, that commit really should have been four separate commits: one to fix issue #1729, another to remove obsolete comments and reformat code in BuildDir, another to fix the error check in BuildDir, and finally, one to tweak index.html. The third of those commits would be the one proposed for the maintenance release branch.

Of course, release stabilization is not the only reason why having each commit be one logical change is desirable. Psychologically, a semantically unified commit is easier to review, and easier to revert if necessary (in some version control systems, reversion is really a special kind of merge anyway). A little up-front discipline on everyone's part can save the project a lot of headache later.

Planning Releases

One area where open source projects have historically differed from proprietary projects is in release planning. Proprietary projects usually have firmer deadlines. Sometimes it's because customers were promised that an upgrade would be available by a certain date, because the new release needs to be coordinated with some other effort for marketing purposes, or because the venture capitalists who invested in the whole thing need to see some results before they put in any more funding. Free software projects, on the other hand, were until recently mostly motivated by amateurism in the most literal sense: they were written for the love of it. No one felt the need to ship before all the features were ready, and why should they? It wasn't as if anyone's job was on the line.

Nowadays, many open source projects are funded by corporations, and are correspondingly more and more influenced by deadline-conscious corporate culture. This is in many ways a good thing, but it can cause conflicts between the priorities of those developers who are being paid and those who are volunteering their time. These conflicts often happen around the issue of when and how to schedule releases. The salaried developers who are under pressure will naturally want to just pick a date when the releases will occur, and have everyone's activities fall into line. But the volunteers may have other agendas—perhaps features they want to complete, or some testing they want to have done—that they feel the release should wait on.

There is no general solution to this problem except discussion and compromise, of course. But you can minimize the frequency and degree of friction by decoupling the proposed existence of a given release from the date when it would go out the door. That is, try to steer discussion toward the subject of which releases the project will be making in the near- to medium-term future, and what features will be in them, without at first mentioning anything about dates, except for rough guesses with wide margins of error. By nailing down feature sets early, you reduce the complexity of the discussion centered on any individual release, and therefore improve predictability. This also creates a kind of inertial bias against anyone who proposes to expand the definition of a release by adding new features or other complications. If the release's contents are fairly well defined, the onus is on the proposer to justify the expansion, even though the date of the release may not have been set yet.

In his multi-volume biography of Thomas Jefferson, Jefferson and His Time, Dumas Malone tells the story of how Jefferson handled the first meeting held to decide the organization of the future University of Virginia. The University had been Jefferson's idea in the first place, but (as is the case everywhere, not just in open source projects) many other parties had climbed on board quickly, each with their own interests and agendas. When they gathered at that first meeting to hash things out, Jefferson made sure to show up with meticulously prepared architectural drawings, detailed budgets for construction and operation, a proposed curriculum, and the names of specific faculty he wanted to import from Europe. No one else in the room was even remotely as prepared; the group essentially had to capitulate to Jefferson's vision, and the University was eventually founded more or less in accordance with his plans. The facts that construction went far over budget, and that many of his ideas did not, for various reasons, work out in the end, were all things Jefferson probably knew perfectly well would happen. His purpose was strategic: to show up at the meeting with something so substantive that everyone else would have to fall into the role of simply proposing modifications to it, so that the overall shape, and therefore schedule, of the project would be roughly as he wanted.

In the case of a free software project, there is no single "meeting", but instead a series of small proposals made mostly by means of the issue tracker. But if you have some credibility in the project to start with, and you start assigning various features, enhancements, and bugs to target releases in the issue tracker, according to some announced overall plan, people will mostly go along with you. Once you've got things laid out more or less as you want them, the conversations about actual release dates will go much more smoothly.

It is crucial, of course, never to present any individual decision as written in stone. In the comments associated with each assignment of an issue to a specific future release, invite discussion and dissent, and be genuinely willing to be persuaded whenever possible. Never exercise control merely for the sake of exercising control: the more deeply others participate in the release planning process (see the section called “Share Management Tasks as Well as Technical Tasks” in Chapter 8, Managing Volunteers), the easier it will be to persuade them to share your priorities on the issues that really count for you.

The other way the project can lower tensions around release planning is to make releases fairly often. When there's a long time between releases, the importance of any individual release is magnified in everyone's minds; people are that much more crushed when their code doesn't make it in, because they know how long it might be until the next chance. Depending on the complexity of the release process and the nature of your project, somewhere between every three and six months is usually about the right gap between releases, though maintenance lines may put out micro releases a bit faster, if there is demand for them.

Chapter 8. Managing Volunteers

Getting people to agree on what a project needs, and to work together to achieve it, requires more than just a genial atmosphere and a lack of obvious dysfunction. It requires someone, or several someones, consciously managing all the people involved. Managing volunteers may not be a technical craft in the same sense as computer programming, but it is a craft in the sense that it can be improved through study and practice.

This chapter is a grab-bag of specific techniques for managing volunteers. It draws, perhaps more heavily than previous chapters, on the Subversion project as a case study, partly because I was working on that project as I wrote this and had all the primary sources close at hand, and partly because it's more acceptable to cast critical stones into one's own glass house than into others'. But I have also seen in various other projects the benefits of applying—and the consequences of not applying—the recommendations that follow; when it is politically feasible to give examples from some of those other projects, I will do so.

Speaking of politics, this is as good a time as any to drag that much-maligned word out for a closer look. Many engineers like to think of politics as something other people engage in. "I'm just advocating the best course for the project, but she's raising objections for political reasons." I believe this distaste for politics (or for what is imagined to be politics) is especially strong in engineers because engineers have bought into the idea that some solutions are objectively superior to others. Thus, when someone acts in a way that seems motivated by outside considerations (say, the maintenance of his own position of influence, the lessening of someone else's influence, outright horse-trading, or avoiding hurting someone's feelings), other participants in the project may get annoyed. Of course, this rarely prevents them from behaving in the same way when their own vital interests are at stake.

If you consider "politics" a dirty word, and hope to keep your project free of it, give up right now. Politics is inevitable whenever people have to cooperatively manage a shared resource. It is absolutely rational that one of the considerations going into each person's decision-making process is the question of how a given action might affect his own future influence in the project. After all, if you trust your own judgement and skills, as most programmers do, then the potential loss of future influence has to be considered a technical result, in a sense. Similar reasoning applies to other behaviors that might seem, on their face, like "pure" politics. In fact, there is no such thing as pure politics: it is precisely because actions have multiple real-world consequences that people become politically conscious in the first place. Politics is, in the end, simply an acknowledgment that all consequences of decisions must be taken into account. If a particular decision leads to a result that most participants find technically satisfying, but involves a change in power relationships that leaves key people feeling isolated, the latter is just as important a result as the former. To ignore it would not be high-minded, but shortsighted.

So as you read the advice that follows, and as you work with your own project, remember that there is no one who is above politics. Appearing to be above politics is merely one particular political strategy, and sometimes a very useful one, but it is never the reality. Politics is simply what happens when people disagree, and successful projects are those that evolve political mechanisms for managing disagreement constructively.

Getting the Most Out of Volunteers

Why do volunteers work on free software projects?[20]

When asked, many claim they do it because they want to produce good software, or want to be personally involved in fixing the bugs that matter to them. But these reasons are usually not the whole story. After all, could you imagine a volunteer staying with a project even if no one ever said a word in appreciation of his work, or listened to him in discussions? Of course not. Clearly, people spend time on free software for reasons beyond just an abstract desire to produce good code. Understanding volunteers' true motivations will help you arrange things so as to attract and keep them. The desire to produce good software may be among those motivations, along with the challenge and educational value of working on hard problems. But humans also have a built-in desire to work with other humans, and to give and earn respect through cooperative activities. Groups engaged in cooperative activities must evolve norms of behavior such that status is acquired and kept through actions that help the group's goals.

Those norms won't always arise by themselves. For example, on some projects—experienced open source developers can probably name several off the tops of their heads—people apparently feel that status is acquired by posting frequently and verbosely. They don't come to this conclusion accidentally; they come to it because they are rewarded with respect for making long, intricate arguments, whether or not that actually helps the project. Following are some techniques for creating an atmosphere in which status-acquiring actions are also constructive actions.

Delegation

Delegation is not merely a way to spread the workload around; it is also a political and social tool. Consider all the effects when you ask someone to do something. The most obvious effect is that, if he accepts, he does the task and you don't. But another effect is that he is made aware that you trusted him to handle the task. Furthermore, if you made the request in a public forum, then he knows that others in the group have been made aware of that trust too. He may also feel some pressure to accept, which means you must ask in a way that allows him to decline gracefully if he doesn't really want the job. If the task requires coordination with others in the project, you are effectively proposing that he become more involved, form bonds that might not otherwise have been formed, and perhaps become a source of authority in some subdomain of the project. The added involvement may be daunting, or it may lead him to become engaged in other ways as well, from an increased feeling of overall commitment.

Because of all these effects, it often makes sense to ask someone else to do something even when you know you could do it faster or better yourself. Of course, there is sometimes a strict economic efficiency argument for this anyway: perhaps the opportunity cost of doing it yourself would be too high, because there might be something even more important you could do with that time. But even when the opportunity cost argument doesn't apply, you may still want to ask someone else to take on the task, because in the long run you want to draw that person deeper into the project, even if it means spending extra time watching over them at first. The converse technique also applies: if you occasionally volunteer for work that someone else doesn't want or have time to do, you will gain his good will and respect. Delegation and substitution are not just about getting individual tasks done; they're also about drawing people into a closer commitment to the project.

Distinguish clearly between inquiry and assignment

Sometimes it is fair to expect that a person will accept a particular task. For example, if someone writes a bug into the code, or commits code that fails to comply with project guidelines in some obvious way, then it is enough to point out the problem and thereafter behave as though you assume the person will take care of it. But there are other situations where it is by no means clear that you have a right to expect action. The person may do as you ask, or may not. Since no one likes to be taken for granted, you need to be sensitive to the difference between these two types of situations, and tailor your requests accordingly.

One thing that almost always causes people instant annoyance is being asked to do something in a way that implies that you think it is clearly their responsibility to do it, when they feel otherwise. For example, assignment of incoming issues is particularly fertile ground for this kind of annoyance. The participants in a project usually know who is expert in what areas, so when a bug report comes in, there will often be one or two people whom everyone knows could probably fix it quickly. However, if you assign the issue over to one of those people without her prior permission, she may feel she has been put into an uncomfortable position. She senses the pressure of expectation, but also may feel that she is, in effect, being punished for her expertise. After all, the way one acquires expertise is by fixing bugs, so perhaps someone else should take this one! (Note that issue trackers that automatically assign issues to particular people based on information in the bug report are less likely to offend, because everyone knows that the assignment was made by an automated process, and is not an indication of human expectations.)

While it would be nice to spread the load as evenly as possible, there are certain times when you just want to encourage the person who can fix a bug the fastest to do so. Given that you can't afford a communications turnaround for every such assignment ("Would you be willing to look at this bug?" "Yes." "Okay, I'm assigning the issue over to you then." "Okay."), you should simply make the assignment in the form of an inquiry, conveying no pressure. Virtually all issue trackers allow a comment to be associated with the assignment of an issue. In that comment, you can say something like this:

Assigning this over to you, jrandom, because you're most familiar with this code. Feel free to bounce this back if you don't have time to look at it, though. (And let me know if you'd prefer not to receive such requests in the future.)

This distinguishes clearly between the request for assignment and the recipient's acceptance of that assignment. The audience here is not only the assignee but everyone: the entire group sees a public confirmation of the assignee's expertise, but the message also makes it clear that the assignee is free to accept or decline the responsibility.

Follow up after you delegate

When you ask someone to do something, remember that you have done so, and follow up with him no matter what. Most requests are made in public forums, and are roughly of the form "Can you take care of X? Let us know either way; no problem if you can't, just need to know." You may or may not get a response. If you do, and the response is negative, the loop is closed—you'll need to try some other strategy for dealing with X. If there is a positive response, then keep an eye out for progress on the issue, and comment on the progress you do or don't see (everyone works better when they know someone else is appreciating their work). If there is no response after a few days, ask again, or post saying that you got no response and are looking for someone else to do it. Or just do it yourself, but still make sure to say that you got no response to the initial inquiry.

The purpose of publicly noting the lack of response is not to humiliate the person, and your remarks should be phrased so as not to have that effect. The purpose is simply to show that you keep track of what you have asked for, and that you notice the reactions you get. This makes people more likely to say yes next time, because they will observe (even if only unconsciously) that you are likely to notice any work they do, given that you noticed the much less visible event of someone failing to respond.

Notice what people are interested in

Another thing that makes people happy is to have their interests noticed—in general, the more aspects of someone's personality you notice and remember, the more comfortable he will be, and the more he will want to work with groups of which you are a part.

For example, there was a sharp distinction in the Subversion project between people who wanted to reach a definitive 1.0 release (which we eventually did), and people who mainly wanted to add new features and work on interesting problems but who didn't much care when 1.0 came out. Neither of these positions is better or worse than the other; they're just two different kinds of developers, and both kinds do lots of work on the project. But we swiftly learned that it was important to not assume that the excitement of the 1.0 drive was shared by everyone. Electronic media can be very deceptive: you may sense an atmosphere of shared purpose when, in fact, it's shared only by the people you happen to have been talking to, while others have completely different priorities.

The more aware you are of what people want out of the project, the more effectively you can make requests of them. Even just demonstrating an understanding of what they want, without making any associated request, is useful, in that it confirms to each person that she's not just another particle in an undifferentiated mass.

Praise and Criticism

Praise and criticism are not opposites; in many ways, they are very similar. Both are primarily forms of attention, and are most effective when specific rather than generic. Both should be deployed with concrete goals in mind. Both can be diluted by inflation: praise too much or too often and you will devalue your praise; the same is true for criticism, though in practice, criticism is usually reactive and therefore a bit more resistant to devaluation.

An important feature of technical culture is that detailed, dispassionate criticism is often taken as a kind of praise (as discussed in the section called “Recognizing Rudeness” in Chapter 6, Communications), because of the implication that the recipient's work is worth the time required to analyze it. However, both of those conditions (detailed and dispassionate) must be met for this to be true. For example, if someone makes a sloppy change to the code, it is useless (and actually harmful) to follow up saying simply "That was sloppy." Sloppiness is ultimately a characteristic of a person, not of their work, and it's important to keep your reactions focused on the work. It's much more effective to describe all the things wrong with the change, tactfully and without malice. If this is the third or fourth careless change in a row by the same person, it's appropriate to say that, again without anger, at the end of your critique, to make it clear that the pattern has been noticed.

If someone does not improve in response to criticism, the solution is not more or stronger criticism. The solution is for the group to remove that person from the position of incompetence, in a way that minimizes hurt feelings as much as possible; see the section called “Transitions” later in this chapter for examples. That is a rare occurrence, however. Most people respond pretty well to criticism that is specific, detailed, and contains a clear (even if unspoken) expectation of improvement.

Praise won't hurt anyone's feelings, of course, but that doesn't mean it should be used any less carefully than criticism. Praise is a tool: before you use it, ask yourself why you want to use it. As a rule, it's not a good idea to praise people for doing what they usually do, or for actions that are a normal and expected part of participating in the group. If you were to do that, it would be hard to know when to stop: should you praise everyone for doing the usual things? After all, if you leave some people out, they'll wonder why. It's much better to express praise and gratitude sparingly, in response to unusual or unexpected efforts, with the intention of encouraging more such efforts. When a participant seems to have moved permanently into a state of higher productivity, adjust your praise threshold for that person accordingly. Repeated praise for normal behavior gradually becomes meaningless anyway. Instead, that person should sense that her high level of productivity is now considered normal and natural, and only work that goes beyond that should be specially noticed.

This is not to say that the person's contributions shouldn't be acknowledged, of course. But remember that if the project is set up right, everything that person does is already visible anyway, and so the group will know (and the person will know that the rest of the group knows) everything she does. There are also ways to acknowledge someone's work by means other than direct praise. You could mention in passing, while discussing a related topic, that she has done a lot of work in the given area and is the resident expert there; you could publicly consult her on some question about the code; or perhaps most effectively, you could conspicuously make further use of the work she has done, so she sees that others are now comfortable relying on the results of her work. It's probably not necessary to do these things in any calculated way. Someone who regularly makes large contributions in a project will know it, and will occupy a position of influence by default. There's usually no need to take explicit steps to ensure this, unless you sense that, for whatever reason, a contributor is underappreciated.

Prevent Territoriality

Watch out for participants who try to stake out exclusive ownership of certain areas of the project, and who seem to want to do all the work in those areas, to the extent of aggressively taking over work that others start. Such behavior may even seem healthy at first. After all, on the surface it looks like the person is taking on more responsibility, and showing increased activity within a given area. But in the long run, it is destructive. When people sense a "no trespassing" sign, they stay away. This results in reduced review in that area, and greater fragility, because the lone developer becomes a single point of failure. Worse, it fractures the cooperative, egalitarian spirit of the project. The theory should always be that any developer is welcome to help out on any task at any time. Of course, in practice things work a bit differently: people do have areas where they are more and less influential, and non-experts frequently defer to experts in certain domains of the project. But the key is that this is all voluntary: informal authority is granted based on competence and proven judgement, but it should never be actively taken. Even if the person desiring the authority really is competent, it is still crucial that she hold that authority informally, through the consensus of the group, and that the authority never cause her to exclude others from working in that area.

Rejecting or editing someone's work for technical reasons is an entirely different matter, of course. There, the decisive factor is the content of the work, not who happened to act as gatekeeper. It may be that the same person happens to do most of the reviewing for a given area, but as long as he never tries to prevent someone else from doing that work too, things are probably okay.

In order to combat incipient territorialism, or even the appearance of it, many projects have taken the step of banning the inclusion of author names or designated maintainer names in source files. I wholeheartedly agree with this practice: we follow it in the Subversion project, and it is more or less official policy at the Apache Software Foundation. ASF member Sander Striker puts it this way:

At the Apache Software Foundation we discourage the use of author tags in source code. There are various reasons for this, apart from the legal ramifications. Collaborative development is about working on projects as a group and caring for the project as a group. Giving credit is good, and should be done, but in a way that does not allow for false attribution, even by implication. There is no clear line for when to add or remove an author tag. Do you add your name when you change a comment? When you put in a one-line fix? Do you remove other author tags when you refactor the code and it looks 95% different? What do you do about people who go about touching every file, changing just enough to make the virtual author tag quota, so that their name will be everywhere?

There are better ways to give credit, and our preference is to use those. From a technical standpoint author tags are unnecessary; if you wish to find out who wrote a particular piece of code, the version control system can be consulted to figure that out. Author tags also tend to get out of date. Do you really wish to be contacted in private about a piece of code you wrote five years ago and were glad to have forgotten?

A software project's source code files are the core of its identity. They should reflect the fact that the developer community as a whole is responsible for them, and not be divided up into little fiefdoms.

People sometimes argue in favor of author or maintainer tags in source files on the grounds that this gives visible credit to those who have done the most work there. There are two problems with this argument. First, the tags inevitably raise the awkward question of how much work one must do to get one's own name listed there too. Second, they conflate the issue of credit with that of authority: having done work in the past does not imply ownership of the area where the work was done, but it's difficult if not impossible to avoid such an implication when individual names are listed at the tops of source files. In any case, credit information can already be obtained from the version control logs and other out-of-band mechanisms like mailing list archives, so no information is lost by banning it from the source files themselves.

If your project decides to ban individual names from source files, make sure not to go overboard. For instance, many projects have a contrib/ area where small tools and helper scripts are kept, often written by people who are otherwise not associated with the project. It's fine for those files to contain author names, because they are not really maintained by the project as a whole. On the other hand, if a contributed tool starts getting hacked on by other people in the project, eventually you may want to move it to a less isolated location and, assuming the original author approves, remove the author's name, so that the code looks like any other community-maintained resource. If the author is sensitive about this, compromise solutions are acceptable, for example:

# indexclean.py: Remove old data from a Scanley index.
#
# Original Author: K. Maru <kobayashi@yetanotheremailservice.com>
# Now Maintained By: The Scanley Project <http://www.scanley.org/>
#                    and K. Maru.
# 
# ...

But it's better to avoid such compromises, if possible, and most authors are willing to be persuaded, because they're happy that their contribution is being made a more integral part of the project.

The important thing is to remember that there is a continuum between the core and the periphery of any project. The main source code files for the software are clearly part of the core, and should be considered as maintained by the community. On the other hand, companion tools or pieces of documentation may be the work of single individuals, who maintain them essentially alone, even though the works may be associated with, and even distributed with, the project. There is no need to apply a one-size-fits-all rule to every file, as long as the principle that community-maintained resources are not allowed to become individual territories is upheld.
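A project that adopts such a ban can enforce it mechanically instead of relying on reviewers to notice. The following is a minimal sketch, not Subversion's or the ASF's actual tooling: the function names `find_author_tags` and `check_files`, the tag patterns, and the rule that only a top-level `contrib/` directory is exempt are all assumptions, chosen to mirror the core/periphery distinction above.

```python
import re
from pathlib import PurePosixPath

# Hypothetical patterns for author/maintainer tags; adjust to your project.
AUTHOR_TAG = re.compile(
    r"@author\b|\bauthor\s*:|\bmaintainer\s*:|\bmaintained\s+by\s*:",
    re.IGNORECASE,
)

# Assumption: peripheral areas where individual names are tolerated.
EXEMPT_DIRS = ("contrib",)


def is_exempt(path):
    """Return True if the file lives under an exempt top-level directory."""
    parts = PurePosixPath(path).parts
    return bool(parts) and parts[0] in EXEMPT_DIRS


def find_author_tags(path, text):
    """Return (line_number, line) pairs in `text` that look like author tags."""
    if is_exempt(path):
        return []
    return [
        (lineno, line.rstrip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if AUTHOR_TAG.search(line)
    ]


def check_files(paths):
    """Read each file and return 'path:line: text' messages for any author tags."""
    problems = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in find_author_tags(path, f.read()):
                problems.append(f"{path}:{lineno}: {line}")
    return problems
```

A pre-commit script or CI job could call `check_files()` on the changed paths (given as POSIX-style relative paths) and fail if it returns anything. Keeping the exemption list explicit makes the project's policy on peripheral files visible rather than implicit.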

The Automation Ratio

Try not to let humans do what machines could do instead. As a rule of thumb, automating a common task is worth at least ten times the effort a developer would spend doing that task manually one time. For very frequent or very complex tasks, that ratio could easily go up to twenty or even higher.

Thinking of yourself as a "project manager", rather than just another developer, may be a useful attitude here. Sometimes individual developers are too wrapped up in low-level work to see the big picture and realize that everyone is wasting a lot of effort performing automatable tasks manually. Even those who do realize it may not take the time to solve the problem: because each individual performance of the task does not feel like a huge burden, no one ever gets annoyed enough to do anything about it. What makes automation compelling is that that small burden is multiplied by the number of times each developer incurs it, and then that number is multiplied by the number of developers.
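The multiplication described above is easy to underestimate, so here is a worked version of it. The figures are purely hypothetical: a five-minute manual task, performed three times a week by each of eight developers, over one year. The function names are illustrative only.

```python
def manual_cost_hours(minutes_per_run, runs_per_dev_per_week, developers, weeks):
    """Total hours the project spends doing a task by hand.

    The per-run burden is multiplied by how often each developer incurs it,
    then by the number of developers, then by the length of the period.
    """
    return minutes_per_run * runs_per_dev_per_week * developers * weeks / 60


def worth_automating(automation_hours, minutes_per_run, ratio=10):
    """Rule of thumb: automation is worth `ratio` times one manual run."""
    return automation_hours <= ratio * minutes_per_run / 60


# Hypothetical figures: a 5-minute task, 3 times a week, 8 developers, 52 weeks.
yearly = manual_cost_hours(5, 3, 8, 52)
print(f"manual cost per year: {yearly:.0f} hours")              # 104 hours
print(f"ten-times budget: {10 * 5} minutes of automation work")  # 50 minutes
```

With these numbers the manual cost is 104 hours a year, while the ten-times rule budgets only 50 minutes for automating the task. The rule of thumb is therefore a conservative floor, consistent with the point that frequent tasks justify a much higher ratio.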

Here, I am using the term "automation" broadly, to mean not only repeated actions where one or two variables change each time, but any sort of technical infrastructure that assists humans. The minimum standard automation required to run a project these days was described in Chapter 3, Technical Infrastructure, but each project may have its own special problems too. For example, a group working on documentation might want to have a web site displaying the most up-to-date versions of the documents at all times. Since documentation is often written in a markup language like XML, there may be a compilation step, often quite intricate, involved in creating displayable or downloadable documents. Arranging a web site where such compilation happens automatically on every commit can be complicated and time-consuming, but it is worth it, even if it costs you a day or more to set up. The overall benefits of having up-to-date pages available at all times are huge, even though the cost of not having them might seem like only a small annoyance at any single moment, to any single developer.
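For the documentation example, the core of such a setup is a hook that runs after each commit and rebuilds the published documents only when a documentation source has changed. This sketch assumes the sources live in a `doc/` directory, that `.xml`, `.xsl`, and `.css` files affect the output, and that a hypothetical `make -C doc html` target does the actual compilation. How the list of changed paths is obtained (and parsed) depends on your version control system, for example `git diff --name-only` or `svnlook changed`.

```python
import subprocess
from pathlib import PurePosixPath

DOC_DIR = "doc"                            # assumption: docs live under a doc/ directory
DOC_SUFFIXES = {".xml", ".xsl", ".css"}    # assumption: files that affect the output
BUILD_CMD = ["make", "-C", "doc", "html"]  # hypothetical build target


def docs_affected(changed_paths):
    """Return True if any changed path is a documentation source file."""
    for p in changed_paths:
        path = PurePosixPath(p)
        # Match doc/ anywhere in the directory part, e.g. "trunk/doc/manual.xml".
        if DOC_DIR in path.parts[:-1] and path.suffix in DOC_SUFFIXES:
            return True
    return False


def rebuild_docs(changed_paths, run=subprocess.run):
    """Rebuild the documents after a commit, but only if a doc source changed.

    `run` is injectable so the decision logic can be tested without `make`.
    """
    if not docs_affected(changed_paths):
        return False
    run(BUILD_CMD, check=True)  # subprocess.run raises CalledProcessError on failure
    return True
```

Passing the runner in as a parameter keeps the decision logic testable without invoking `make`. Copying the built files to the web site is left out of the sketch.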

Taking such steps eliminates not merely wasted time, but the griping and frustration that ensue when humans make missteps (as they inevitably will) in trying to perform complicated procedures manually. Multi-step, deterministic operations are exactly what computers were invented for; save your humans for more interesting things.

Automated testing

Automated test runs are helpful for any software project, but especially so for open source projects, because automated testing (especially regression testing) allows developers to feel comfortable changing code in areas they are unfamiliar with, and thus encourages exploratory development. Since detecting breakage is so hard to do by hand—one essentially has to guess where one might have broken something, and try various experiments to prove that one didn't—having automated ways to detect such breakage saves the project a lot of time. It also makes people much more relaxed about refactoring large swaths of code, and therefore contributes to the software's long-term maintainability.

Regression testing is not a panacea. For one thing, it works best for programs with batch-style interfaces. Software that is operated primarily through graphical user interfaces is much harder to drive programmatically. Another problem is that the regression test suite framework itself can often be quite complex, with a learning curve and maintenance burden all its own. Reducing this complexity is one of the most useful things you can do, even though it may take a considerable amount of time. The easier it is to add new tests to the suite, the more developers will do so, and the fewer bugs will survive to release. Any effort spent making tests easier to write will be paid back manyfold over the lifetime of the project.
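One common way to keep new tests cheap, sketched here with Python's standard `unittest` module, is a table-driven test: each regression case is a single row of input and expected output, so covering a newly fixed bug can mean adding one line. The function under test, `parse_version`, and its cases are hypothetical stand-ins for whatever your project actually tests.

```python
import unittest


def parse_version(text):
    """Hypothetical function under test: "1.4.2" -> (1, 4, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


# Each regression case is one row: (input, expected output).
# Covering a newly fixed bug means appending a single row here.
CASES = [
    ("1.4.2", (1, 4, 2)),
    ("  2.0.0\n", (2, 0, 0)),  # hypothetical regression: surrounding whitespace
    ("10", (10,)),
]


class ParseVersionRegressionTests(unittest.TestCase):
    def test_cases(self):
        for text, expected in CASES:
            # subTest reports each failing row separately
            # instead of stopping at the first failure.
            with self.subTest(text=text):
                self.assertEqual(parse_version(text), expected)
```

If the file is named to match the default `test*.py` pattern, running `python -m unittest` in its directory would pick up the test class. The design choice is deliberate: contributors never touch the test machinery, only the data table.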

Many projects have a "Don't break the build!" rule, meaning: don't commit a change that makes the software unable to compile or run. Being the person who broke the build is usually cause for mild embarrassment and ribbing. Projects with regression test suites often have a corollary rule: don't commit any change that causes tests to fail. Such failures are easiest to spot if there are automatic nightly runs of the entire test suite, with the results mailed out to the development list or to a dedicated test-results mailing list; that's another example of a worthwhile automation.
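A nightly run like this need not be elaborate. The sketch below, again in Python using only the standard library, discovers and runs every test under an assumed `tests/` directory, turns the result into an email summary, and sends it through an SMTP server. The addresses and host are hypothetical placeholders, and scheduling (for example with cron) is left to the system.

```python
import smtplib
import unittest
from email.message import EmailMessage

RESULTS_LIST = "test-results@example.org"  # hypothetical list address


def run_suite(start_dir="tests"):
    """Discover and run every test under start_dir; return the TestResult."""
    suite = unittest.defaultTestLoader.discover(start_dir)
    result = unittest.TestResult()
    suite.run(result)
    return result


def build_report(result, to_addr=RESULTS_LIST):
    """Summarize a TestResult as an email message."""
    problems = len(result.failures) + len(result.errors)
    status = "PASS" if result.wasSuccessful() else f"FAIL ({problems} failures/errors)"
    msg = EmailMessage()
    msg["Subject"] = f"Nightly test run: {status}"
    msg["From"] = "nightly-build@example.org"  # hypothetical sender
    msg["To"] = to_addr
    lines = [f"Tests run: {result.testsRun}"]
    # Include the traceback of every failing or erroring test.
    for test, traceback_text in result.failures + result.errors:
        lines.append(f"\n== {test.id()} ==\n{traceback_text}")
    msg.set_content("\n".join(lines))
    return msg


def send_report(msg, host="localhost"):
    """Send the report through an (assumed) SMTP server on `host`."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

A nightly job would then call `send_report(build_report(run_suite()))`. Putting the pass/fail status in the subject line lets list readers spot a broken night without opening the message.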

Most volunteer developers are willing to take the extra time to write regression tests, when the test system is comprehensible and easy to work with. Accompanying changes with tests is understood to be the responsible thing to do, and it's also an easy opportunity for collaboration: often two developers will divide up the work for a bugfix, with one writing the fix itself, and the other writing the test. The latter developer may often end up with more work, and since writing a test is already less satisfying than actually fixing the bug, it is imperative that the test suite not make the experience more painful than it has to be.

Some projects go even further, requiring that a new test accompany every bugfix or new feature. Whether this is a good idea or not depends on many factors: the nature of the software, the makeup of the development team, and the difficulty of writing new tests. The CVS (http://www.cvshome.org/) project has long had such a rule. It is a good policy in theory, since CVS is version control software and therefore very risk-averse about the possibility of munging or mishandling the user's data. The problem in practice is that CVS's regression test suite is a single huge shell script (amusingly named sanity.sh), hard to read and hard to modify or extend. The difficulty of adding new tests, combined with the requirement that patches be accompanied by new tests, means that CVS effectively discourages patches. When I used to work on CVS, I sometimes saw people start on and even complete a patch to CVS's own code, but give up when told of the requirement to add a new test to sanity.sh.

It is normal to spend more time writing a new regression test than fixing the original bug. But CVS carried this phenomenon to an extreme: one might spend hours trying to design one's test properly, and still get it wrong, because there are just too many unpredictable complexities involved in changing a 35,000-line Bourne shell script. Even longtime CVS developers often grumbled when they had to add a new test.

This situation was due to a failure on all our parts to consider the automation ratio. It is true that switching to a real test framework—whether custom-built or off-the-shelf—would have been a major effort.[21] But neglecting to do so has cost the project much more, over the years. How many bugfixes and new features are not in CVS today, because of the impediment of an awkward test suite? We cannot know the exact number, but it is surely many times greater than the number of bugfixes or new features the developers might forgo in order to develop a new test system (or integrate an off-the-shelf system). That task would only take a finite amount of time, while the penalty of using the current test suite will continue forever if nothing is done.

The point is not that having strict requirements to write tests is bad, nor that writing your test system as a Bourne shell script is necessarily bad. It might work fine, depending on how you design it and what it needs to test. The point is simply that when the test system becomes a significant impediment to development, something must be done. The same is true for any routine process that turns into a barrier or a bottleneck.

Treat Every User as a Potential Volunteer

Each interaction with a user is an opportunity to get a new volunteer. When a user takes the time to post to one of the project's mailing lists, or to file a bug report, he has already tagged himself as having more potential for involvement than most users (from whom the project will never hear at all). Follow up on that potential: if he described a bug, thank him for the report and ask him if he wants to try fixing it. If he wrote to say that an important question was missing from the FAQ, or that the program's documentation was deficient in some way, then freely acknowledge the problem (assuming it really exists) and ask if he's interested in writing the missing material himself. Naturally, much of the time the user will demur. But it doesn't cost much to ask, and every time you do, it reminds the other listeners in that forum that getting involved in the project is something anyone can do.

Don't limit your goals to acquiring new developers and documentation writers. For example, even training people to write good bug reports pays off in the long run, if you don't spend too much time per person, and if they go on to submit more bug reports in the future—which they are more likely to do if they got a constructive reaction to their first report. A constructive reaction need not be a fix for the bug, although that's always the ideal; it can also be a solicitation for more information, or even just a confirmation that the behavior is a bug. People want to be listened to. Secondarily, they want their bugs fixed. You may not always be able to give them the latter in a timely fashion, but you (or rather, the project as a whole) can give them the former.

A corollary of this is that developers should not express anger at people who file well-intended but vague bug reports. This is one of my personal pet peeves; I see developers do it all the time on various open source mailing lists, and the harm it does is palpable. Some hapless newbie will post a useless report:

Hi, I can't get Scanley to run. Every time I start it up, it just errors. Is anyone else seeing this problem?

Some developer—who has seen this kind of report a thousand times, and hasn't stopped to think that the newbie has not—will respond like this:

What are we supposed to do with so little information? Sheesh. Give us at least some details, like the version of Scanley, your operating system, and the error.

This developer has failed to see things from the user's point of view, and also failed to consider the effect such a reaction might have on all the other people watching the exchange. Naturally a user who has no programming experience, and no prior experience reporting bugs, will not know how to write a bug report. What is the right way to handle such a person? Educate them! And do it in such a way that they come back for more:

Sorry you're having trouble. We'll need more information in order to figure out what's happening here. Please tell us the version of Scanley, your operating system, and the exact text of the error. The very best thing you can do is send a transcript showing the exact commands you ran, and the output they produced. See http://www.scanley.org/how_to_report_a_bug.html for more.

This way of responding is far more effective at extracting the needed information from the user, because it is written from the user's point of view. First, it expresses sympathy: You had a problem; we feel your pain. (This is not necessary in every bug report response; it depends on the severity of the problem and how upset the user seemed.) Second, instead of belittling her for not knowing how to report a bug, it tells her how, and in enough detail to be actually useful. For example, many users don't realize that "show us the error" means "show us the exact text of the error, with no omissions or abridgements." The first time you work with such a user, you need to be specific about that. Finally, it offers a pointer to much more detailed and complete instructions for reporting bugs. If you have successfully engaged with the user, she will often take the time to read that document and do what it says. This means, of course, that you have to have the document prepared in advance. It should give clear instructions about what kind of information your development team wants to see in every bug report. Ideally, it should also evolve over time in response to the particular sorts of omissions and misreports users tend to make for your project.

The Subversion project's bug reporting instructions are a fairly standard example of the form (see Appendix D, Example Instructions for Reporting Bugs). Notice how they close with an invitation to provide a patch to fix the bug. This is not because such an invitation will lead to a greater patch/report ratio: most users who are capable of fixing bugs already know that a patch would be welcome, and don't need to be told. The invitation's real purpose is to emphasize to all readers, especially those new to the project or new to free software in general, that the project runs on volunteer contributions. In a sense, the project's current developers are no more responsible for fixing the bug than is the person who reported it. This is an important point that many new users will not be familiar with. Once they realize it, they're more likely to help make the fix happen, if not by contributing code then by providing a more thorough reproduction recipe, or by offering to test fixes that other people post. The goal is to make every user realize that there is no innate difference between herself and the people who work on the project: it's a question of how much time and effort one puts in, not a question of who one is.

The admonition against responding angrily does not apply to rude users. Occasionally people post bug reports or complaints that, regardless of their informational content, show sneering contempt for the project over some failing. Often such people are alternately insulting and flattering, such as the person who posted this to a Subversion mailing list:

Why is it that after almost 6 days there still aren't any binaries posted for the windows platform?!? It's the same story every time and it's pretty frustrating. Why aren't these things automated so that they could be available immediately?? When you post an "RC" build, I think the idea is that you want users to test the build, but yet you don't provide any way of doing so. Why even have a soak period if you provide no means of testing??

Initial response to this rather inflammatory post was surprisingly restrained: people pointed out that the project had a published policy of not providing official binaries, and said, with varying degrees of annoyance, that he ought to volunteer to produce them himself if they were so important to him. Believe it or not, his next post started with these lines:

First of all, let me say that I think Subversion is awesome and I really appreciate the efforts of everyone involved. [...]

...and then he went on to berate the project again for not providing binaries, while still not volunteering to do anything about it. After that, about 50 people just jumped all over him, and I can't say I really minded. The "zero-tolerance" policy toward rudeness advocated in the section « Tuez l'agressivité dans l'oeuf » in Chapter 2, Genèse d'un projet applies to people with whom the project has (or would like to have) a sustained interaction. But when someone makes it clear from the start that he is going to be a fountain of bile, there is no point making him feel welcome.

Such situations are fortunately quite rare, and they are noticeably rarer in projects that make an effort to engage users constructively and courteously from their very first interaction.

Share Management Tasks as Well as Technical Tasks

Share the management burden as well as the technical burden of running the project. As a project becomes more complex, more and more of the work is about managing people and information flow. There is no reason not to share that burden, and sharing it does not necessarily require a top-down hierarchy either—what happens in practice tends to be more of a peer-to-peer network topology than a military-style command structure.

Sometimes management roles are formalized, and sometimes they happen spontaneously. In the Subversion project, we have a patch manager, a translation manager, documentation managers, issue managers (albeit unofficial), and a release manager. Some of these roles we made a conscious decision to initiate, others just happened by themselves; as the project grows, I expect more roles to be added. Below we'll examine these roles, and a couple of others, in detail (except for release manager, which was already covered in the sections « Release manager » and « Dictatorship by Release Owner » earlier in this chapter).

As you read the role descriptions, notice that none of them requires exclusive control over the domain in question. The issue manager does not prevent other people from making changes in the issues database, the FAQ manager does not insist on being the only person to edit the FAQ, and so on. These roles are all about responsibility without monopoly. An important part of each domain manager's job is to notice when other people are working in that domain, and train them to do the things the way the manager does, so that the multiple efforts reinforce rather than conflict. Domain managers should also document the processes by which they do their work, so that when one leaves, someone else can pick up the slack right away.

Sometimes there is a conflict: two or more people want the same role. There is no one right way to handle this. You could suggest that each volunteer post a proposal (an "application") and have all the committers vote on which is best. But this is cumbersome and potentially awkward. I find that a better technique is just to ask the multiple candidates to settle it among themselves. They usually will, and will be more satisfied with the result than if a decision had been imposed on them from the outside.

Patch Manager

In a free software project that receives a lot of patches, keeping track of which patches have arrived and what has been decided about them can be a nightmare, especially if done in a decentralized way. Most patches arrive as posts to the project's development mailing list (though some may appear first in the issue tracker, or on external web sites), and there are a number of different routes a patch can take after arrival.

Sometimes someone reviews the patch, finds problems, and bounces it back to the original author for cleanup. This usually leads to an iterative process—all visible on the mailing list—in which the original author posts revised versions of the patch until the reviewer has nothing more to criticize. It is not always easy to tell when this process is done: if the reviewer commits the patch, then clearly the cycle is complete. But if she does not, it might be because she simply didn't have time, or doesn't have commit access herself and couldn't rope any of the other developers into doing it.

Another frequent response to a patch is a freewheeling discussion, not necessarily about the patch itself, but about whether the concept behind the patch is good. For example, the patch may fix a bug, but the project prefers to fix that bug in another way, as part of solving a more general class of problems. Often this is not known in advance, and it is the patch that stimulates the discovery.

Occasionally, a posted patch is met with utter silence. Usually this is due to no developer having time at that moment to review the patch, so each hopes that someone else will do it. Since there's no particular limit to how long each person waits for someone else to pick up the ball, and meanwhile other priorities are always coming up, it's very easy for a patch to slip through the cracks without any single person intending for that to happen. The project might miss out on a useful patch this way, and there are other harmful side effects as well: it is discouraging to the author, who invested work in the patch, and it makes the project as a whole look a bit out of touch, especially to others considering writing patches.

The patch manager's job is to make sure that patches don't "slip through the cracks." This is done by following every patch through to some sort of stable state. The patch manager watches every mailing list thread that results from a patch posting. If it ends in a commit of the patch, he does nothing. If it goes into a review/revise iteration, ending with a final version of the patch but no commit, he files an issue pointing to the final version, and to the mailing list thread around it, so that there is a permanent record for developers to follow up on later. If the patch addresses an existing issue, he annotates that issue with the relevant information, instead of opening a new issue.

When a patch gets no reaction at all, the patch manager waits a few days, then follows up asking if anyone is going to review it. This usually gets a reaction: a developer may explain that she doesn't think the patch should be applied, and give the reasons why, or she may review it, in which case one of the previously described paths is taken. If there is still no response, the patch manager may or may not file an issue for the patch, at his discretion, but at least the original submitter got some reaction.
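The patch manager's routine is essentially a small decision procedure, and writing it down can make the role easier to hand over. The following Python sketch models one patch thread and returns the next action according to the rules above. The field names and the three-day waiting period are assumptions (the latter standing in for "a few days"); a real patch manager applies judgment, not a script.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed stand-in for "waits a few days" before nudging the list.
FOLLOW_UP_AFTER = timedelta(days=3)


@dataclass
class PatchThread:
    """Hypothetical record of one patch posting and its mailing-list thread."""
    posted: date
    committed: bool = False
    final_version_url: Optional[str] = None  # set once review/revise settles
    existing_issue: Optional[int] = None     # issue the patch addresses, if any
    has_replies: bool = False
    followed_up_on: Optional[date] = None    # when the manager last nudged


def next_action(thread: PatchThread, today: date) -> str:
    """Return the patch manager's next step for this thread."""
    if thread.committed:
        return "nothing"  # already in a stable state
    if thread.final_version_url is not None:
        # Review finished without a commit: leave a permanent record.
        if thread.existing_issue is not None:
            return f"annotate issue {thread.existing_issue}"
        return "file a new issue pointing to the final version and thread"
    if not thread.has_replies:
        if thread.followed_up_on is None:
            if today - thread.posted >= FOLLOW_UP_AFTER:
                return "post a follow-up asking for a reviewer"
        elif today - thread.followed_up_on >= FOLLOW_UP_AFTER:
            return "file an issue or let it go, at the manager's discretion"
    # Either review is under way or it is too early to act.
    return "keep watching"
```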

Having a patch manager has saved the Subversion development team a lot of time and mental energy. Without a designated person to take responsibility, every developer would constantly have to worry "If I don't have time to respond to this patch right now, can I count on someone else doing it? Should I try to keep an eye on it? But if other people are also keeping an eye on it, for the same reasons, then we'd have needlessly duplicated effort." The patch manager removes the second-guessing from the situation. Each developer can make the decision that is right for her at the moment she first sees the patch. If she wants to follow up with a review, she can do that—the patch manager will adjust his behavior accordingly. If she wants to ignore the patch completely, that's fine too; the patch manager will make sure it isn't forgotten.

Because this system works only if people can depend on the patch manager being there without fail, the role should be held formally. In Subversion, we advertised for it on the development and users mailing lists, got several volunteers, and took the first one who replied. When that person had to step down (see the section « Transitions » later in this chapter), we did the same thing again. We've never tried having multiple people share the role, because of the communications overhead that would be required between them; but perhaps at very high volumes of patch submission, a multiheaded patch manager might make sense.

Translation Manager

In software projects, "translation" can refer to two very different things. It can mean translating the software's documentation into other languages, or it can mean translating the software itself—that is, having the program display errors and help messages in the user's preferred language. Both are complex tasks, but once the right infrastructure is in place, they are largely separable from other development. Because the tasks are similar in some ways, it may make sense (depending on your project) to have a single translation manager handle both, or it may be better to have two different managers.

In the Subversion project, we have one translation manager handle both. He does not actually write the translations himself, of course—he may help out on one or two, but as of this writing, he would need to speak ten languages (twelve counting dialects) in order to work on all of them! Instead, he manages teams of volunteer translators: he helps them coordinate among each other, and he coordinates between the teams and the rest of the project.

Part of the reason the translation manager is necessary is that translators are a different demographic from developers. They sometimes have little or no experience working in a version control repository, or indeed working as part of a distributed volunteer team at all. But in other respects they are often the best kind of volunteer: people with specific domain knowledge who saw a need and chose to get involved. They are usually willing to learn, and eager to get to work. All they need is someone to tell them how. The translation manager makes sure that the translations happen in a way that does not interfere unnecessarily with regular development. He also serves as a sort of representative of the translators as a unified body, whenever the developers must be informed of technical changes required to support the translation effort.

Thus, the position's most important skills are diplomatic, not technical. For example, in Subversion we have a policy that all translations should have at least two people working on them, because otherwise there is no way for the text to be reviewed. When a new volunteer shows up offering to translate Subversion to, say, Malagasy, the translation manager has to either hook him up with someone who posted six months ago expressing interest in doing a Malagasy translation, or else politely ask the volunteer to go find another Malagasy translator to work with as a partner. Once enough people are available, the manager sets them up with the proper kind of commit access, informs them of the project's conventions (such as how to write log messages), and then keeps an eye out to make sure they adhere to those conventions.

Conversations between the translation manager and the developers, or between the translation manager and translation teams, are usually held in the project's original language—that is, the language from which all the translations are being made. For most free software projects, this is English, but it doesn't matter what it is as long as the project agrees on it. (English is probably best for projects that want to attract a broad international development community, though.)

Conversations within a particular translation team usually happen in their shared language, however, and one of the other tasks of the translation manager is to set up a dedicated mailing list for each team. That way the translators can discuss their work freely, without distracting people on the project's main lists, most of whom would not be able to understand the translation language anyway.

Documentation Manager

Keeping software documentation up-to-date is a never-ending task. Every new feature or enhancement that goes into the code has the potential to cause a change in the documentation. Also, once the project's documentation reaches a certain level of completeness, you will find that a lot of the patches people send in are for the documentation, not for the code. This is because there are many more people competent to fix bugs in prose than in code: all users are readers, but only a few are programmers.

Documentation patches are usually much easier to review and apply than code patches. There is little or no testing to be done, and the quality of the change can be evaluated quickly just by review. Since the quantity is high, but the review burden fairly low, the ratio of administrative overhead to productive work is greater for documentation patches than for code patches. Furthermore, most of the patches will probably need some sort of adjustment, in order to maintain a consistent authorial voice in the documentation. In many cases, patches will overlap with or affect other patches, and need to be adjusted with respect to each other before being committed.

Given the exigencies of handling documentation patches, and the fact that the code base needs to be constantly monitored so the documentation can be kept up-to-date, it makes sense to have one person, or a small team, dedicated to the task. They can keep a record of exactly where and how the documentation lags behind the software, and they can have practiced procedures for handling large quantities of patches in an integrated way.

Of course, this does not preclude other people in the project from applying documentation patches on the fly, especially small ones, as time permits. And the same patch manager (see the section « Patch Manager » earlier in this chapter) can track both code and documentation patches, filing them wherever the development and documentation teams want them, respectively. (If the total quantity of patches ever exceeds one human's capacity to track, though, switching to separate patch managers for code and documentation is probably a good first step.) The point of a documentation team is to have people who think of themselves as responsible for keeping the documentation organized, up-to-date, and consistent with itself. In practice, this means knowing the documentation intimately, watching the code base, watching the changes others commit to the documentation, watching for incoming documentation patches, and using all these information sources to do whatever is necessary to keep the documentation healthy.

Issue Manager

The number of issues in a project's bug tracker grows in proportion to the number of people using the software. Therefore, even as you fix bugs and ship an increasingly robust program, you should still expect the number of open issues to grow essentially without bound. The frequency of duplicate issues will also increase, as will the frequency of incomplete or poorly described issues.

Issue managers help alleviate these problems by watching what goes into the database, and periodically sweeping through it looking for specific problems. Their most common action is probably to fix up incoming issues, either because the reporter didn't set some of the form fields correctly, or because the issue is a duplicate of one already in the database. Obviously, the more familiar an issue manager is with the project's bug database, the more efficiently she will be able to detect duplicate issues—this is one of the main advantages of having a few people specialize in the bug database, instead of everyone trying to do it ad hoc. When the group tries to do it in a decentralized manner, no single individual acquires a deep expertise in the content of the database.

Issue managers can also help map between issues and individual developers. When there are a lot of bug reports coming in, not every developer may read the issue notification mailing list with equal attention. However, if someone who knows the development team is keeping an eye on all incoming issues, then she can discreetly direct certain developers' attention to specific bugs when appropriate. Of course, this has to be done with a sensitivity to everything else going on in development, and to the recipient's desires and temperament. Therefore, it is often best for issue managers to be developers themselves.

Depending on how your project uses the issue tracker, issue managers can also shape the database to reflect the project's priorities. For example, in Subversion we schedule issues into specific future releases, so that when someone asks "When will bug X be fixed?" we can say "Two releases from now," even if we can't give an exact date. The releases are represented in the issue tracker as target milestones, a field available in IssueZilla.[22] As a rule, every Subversion release has one major new feature and a list of specific bug fixes. We assign the appropriate target milestone to all the issues planned for that release (including the new feature—it gets an issue too), so that people can view the bug database through the lens of release scheduling. These targets rarely remain static, however. As new bugs come in, priorities sometimes get shifted around, and issues must be moved from one milestone to another so that each release remains manageable. This, again, is best done by people who have an overall sense of what's in the database, and how various issues relate to each other.
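To make the milestone idea concrete, here is a minimal Python sketch that treats the issue database as a list of records with a target milestone, groups issues by release, flags releases carrying more issues than a chosen limit, and moves an issue to another milestone. The record layout and the numeric limit are assumptions for illustration; they are not IssueZilla's actual data model.

```python
from collections import defaultdict

# Hypothetical issue records: {"id": int, "milestone": str}. The "milestone"
# field stands in for a tracker's target-milestone field.


def by_milestone(issues):
    """Group issue ids under their target milestone (release)."""
    groups = defaultdict(list)
    for issue in issues:
        groups[issue["milestone"]].append(issue["id"])
    return dict(groups)


def overloaded(issues, limit):
    """List milestones (sorted as plain strings) with more than `limit` issues."""
    return sorted(m for m, ids in by_milestone(issues).items() if len(ids) > limit)


def retarget(issues, issue_id, new_milestone):
    """Move one issue to a different release; raise KeyError if absent."""
    for issue in issues:
        if issue["id"] == issue_id:
            issue["milestone"] = new_milestone
            return issue
    raise KeyError(issue_id)
```

Answering "When will bug X be fixed?" then amounts to looking up that issue's milestone, and `overloaded` points to the releases from which issues may need to be moved.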

Another thing issue managers do is notice when issues become obsolete. Sometimes a bug is fixed accidentally as part of an unrelated change to the software, or sometimes the project changes its mind about whether a certain behavior is buggy. Finding obsoleted issues is not easy: the only way to do it systematically is by making a sweep over all the issues in the database. Full sweeps become less and less feasible over time, however, as the number of issues grows. After a certain point, the only way to keep the database sane is to use a divide-and-conquer approach: categorize issues immediately on arrival and direct them to the appropriate developer's or team's attention. The recipient then takes charge of the issue for the rest of its lifetime, shepherding it to resolution or oblivion as necessary. When the database is that large, the issue manager becomes more of an overall coordinator, spending less time looking at each issue herself and more time getting it into the right person's hands.

FAQ Manager

FAQ maintenance is a surprisingly difficult problem. Unlike most other documents in a project, whose content is planned out in advance by the authors, a FAQ is a wholly reactive document (see Maintenir une FAQ). No matter how big it gets, you still never know what the next addition will be. And because it is always added to piecemeal, it is very easy for the document as a whole to become incoherent and disorganized, and even to contain duplicate or semi-duplicate entries. Even when it does not have any obvious problems like that, there are often unnoticed interdependencies between items—links that should be made but aren't—because the related items were added a year apart.

The role of a FAQ manager is twofold. First, she maintains the overall quality of the FAQ by staying familiar with at least the topics of all the questions in it, so that when people add new items that are duplicates of, or related to, existing items, the appropriate adjustments can be made. Second, she watches the project mailing lists and other forums for recurring problems or questions, and writes new FAQ entries based on this input. This latter task can be quite complex: one must be able to follow a thread, recognize the core questions raised in it, post a proposed FAQ entry, incorporate comments from others (since it's impossible for the FAQ manager to be an expert in every topic covered by the FAQ), and sense when the process is finished so the item can at last be added.

The FAQ manager usually also becomes the default expert in FAQ formatting. There are a lot of little details involved in keeping a FAQ in shape (see the section titled « Treat all resources like archives » in Chapter 6, Communications); when random people edit the FAQ, they will sometimes forget some of these details. That's okay, as long as the FAQ manager is there to clean up after them.

Various free software is available to help with the process of FAQ maintenance. It's fine to use it, as long as it doesn't compromise the quality of the FAQ, but beware of over-automation. Some projects try to fully automate the process of FAQ maintenance, allowing everyone to contribute and edit FAQ items in a manner similar to a wiki (see the section titled « Wikis » in Chapter 3, L'infrastructure technique). I've seen this happen particularly with Faq-O-Matic (http://faqomatic.sourceforge.net/), though it may be that the cases I saw were simply abuses that went beyond what Faq-O-Matic was originally intended for. In any case, while complete decentralization of FAQ maintenance does reduce the workload for the project, it also results in a poorer FAQ. There's no one person with a broad view of the entire FAQ, no one to notice when certain items need updating or become obsolete entirely, and no one keeping watch for interdependencies between items. The result is a FAQ that often fails to provide users what they were looking for, and in the worst cases misleads them. Use whatever tools you need to maintain your project's FAQ, but never let the convenience of the tools seduce you into compromising the quality of the FAQ.

See Sean Michael Kerner's article, The FAQs on FAQs, at http://osdir.com/Article1722.phtml, for descriptions and evaluations of open source FAQ maintenance tools.

Transitions

From time to time, a volunteer in a position of ongoing responsibility (e.g., patch manager, translation manager, etc.) will become unable to perform the duties of the position. It may be because the job turned out to be more work than he anticipated, or it may be due to completely external factors: marriage, a new baby, a new employer, or whatever.

When a volunteer gets swamped like this, he usually doesn't notice it right away. It happens by slow degrees, and there's no point at which he consciously realizes that he can no longer fulfill the duties of the role. Instead, the rest of the project just doesn't hear much from him for a while. Then there will suddenly be a flurry of activity, as he feels guilty for neglecting the project for so long and sets aside a night to catch up. Then you won't hear from him for a while longer, and then there might or might not be another flurry. But there's rarely an unsolicited formal resignation. The volunteer was doing the job in his spare time, so resigning would mean openly acknowledging to himself that his spare time is permanently reduced. People are often reluctant to do that.

Therefore, it's up to you and the others in the project to notice what's happening—or rather, not happening—and to ask the volunteer what's going on. The inquiry should be friendly and 100% guilt-free. Your purpose is to find out a piece of information, not to make the person feel bad. Generally, the inquiry should be visible to the rest of the project, but if you know of some special reason why a private inquiry would be better, that's fine too. The main reason to do it publicly is so that if the volunteer responds by saying that he won't be able to do the job anymore, there's a context established for your next public post: a request for a new volunteer to fill that role.

Sometimes, a volunteer is unable to do the job he's taken on, but is either unaware of or unwilling to admit that fact. Of course, anyone may have trouble at first, especially if the responsibility is complex. However, if someone just isn't working out in the task he's taken on, even after everyone else has given all the help and suggestions they can, then the only solution is for him to step aside and let someone new have a try. And if the person doesn't see this himself, he'll need to be told. There's basically only one way to handle this, I think, but it's a multistep process and each step is important.

First, make sure you're not crazy. Privately talk to others in the project to see if they agree that the problem is as serious as you think it is. Even if you're already positive, this serves the purpose of letting others know that you're considering asking the person to step aside. Usually no one will object to that—they'll just be happy you're taking on the awkward task, so they don't have to!

Next, privately contact the volunteer in question and tell him, kindly but directly, about the problems you see. Be specific, giving as many examples as possible. Make sure to point out how people had tried to help, but that the problems persisted without improving. You should expect this email to take a long time to write, but with this sort of message, if you don't back up what you're saying, you shouldn't say it at all. Say that you would like to find a new volunteer to fill the role, but also point out that there are many other ways to contribute to the project. At this stage, don't say that you've talked to others about it; nobody likes to be told that people were conspiring behind his back.

There are a few different ways things can go after that. The most likely reaction is that he'll agree with you, or at any rate not want to argue, and be willing to step down. In that case, suggest that he make the announcement himself, and then you can follow up with a post seeking a replacement.

Or, he may agree that there have been problems, but ask for a little more time (or for one more chance, in the case of discrete-task roles like release manager). How you react to that is a judgement call, but whatever you do, don't agree to it just because you feel like you can't refuse such a reasonable request. That would prolong the agony, not lessen it. There is often a very good reason to refuse the request, namely, that there have already been plenty of chances, and that's how things got to where they are now. Here's how I put it in a mail to someone who was filling the release manager role but was not really suited for it:

> If you wish to replace me with some one else, I will gracefully
> pass on the role to who comes next.  I have one request, which
> I hope is not unreasonable.  I would like to attempt one more
>  release in an effort to prove myself.

I totally understand the desire (been there myself!), but in
this case, we shouldn't do the "one more try" thing.

This isn't the first or second release, it's the sixth or
seventh... And for all of those, I know you've been dissatisfied
with the results too (because we've talked about it before).  So
we've effectively already been down the one-more-try route.
Eventually, one of the tries has to be the last one... I think
[this past release] should be it.

In the worst case, the volunteer may disagree outright. Then you have to accept that things are going to be awkward and plow ahead anyway. Now is the time to say that you talked to other people about it (but still don't say who until you have their permission, since those conversations were confidential), and that you don't think it's good for the project to continue as things are. Be insistent, but never threatening. Keep in mind that with most roles, the transition really happens the moment someone new starts doing the job, not the moment the old person stops doing it. For example, if the contention is over the role of, say, issue manager, at any point you and other influential people in the project can solicit for a new issue manager. It's not actually necessary that the person who was previously doing it stop doing it, as long as he does not sabotage (deliberately or otherwise) the efforts of the new volunteer.

Which leads to a tempting thought: instead of asking the person to resign, why not just frame it as a matter of getting him some help? Why not just have two issue managers, or patch managers, or whatever the role is?

Although that may sound nice in theory, it is generally not a good idea. What makes the manager roles work—what makes them useful, in fact—is their centralization. Those things that can be done in a decentralized fashion are usually already being done that way. Having two people fill one managerial role introduces communications overhead between those two people, as well as the potential for slippery displacement of responsibility ("I thought you brought the first aid kit!" "Me? No, I thought you brought the first aid kit!"). Of course, there are exceptions. Sometimes two people work extremely well together, or the nature of the role is such that it can easily be spread across multiple people. But these are not likely to be of much use when you see someone flailing in a role he is not suited for. If he'd appreciated the problem in the first place, he would have sought such help before now. In any case, it would be disrespectful to let someone waste time continuing to do a job no one will pay attention to.

The most important factor in asking someone to step down is privacy: giving him the space to make a decision without feeling like others are watching and waiting. I once made the mistake—an obvious mistake, in retrospect—of mailing all three parties at once in order to ask Subversion's release manager to step aside in favor of two other volunteers. I'd already talked to the two new people privately, and knew that they were willing to take on the responsibility. So I thought, naïvely and somewhat insensitively, that I'd save some time and hassle by sending one mail to all of them to initiate the transition. I assumed that the current release manager was already fully aware of the problems and would see the reasonableness of my point immediately.

I was wrong. The current release manager was very offended, and rightly so. It's one thing to be asked to hand off the job; it's another thing to be asked that in front of the people you'll hand it off to. Once I got it through my head why he was offended, I apologized. He eventually did step aside gracefully, and continues to be involved with the project today. But his feelings were hurt, and needless to say, this was not the most auspicious of beginnings for the new volunteers either.

Committers

As the only formally distinct class of people found in all open source projects, committers deserve special attention here. Committers are an unavoidable concession to discrimination in a system which is otherwise as non-discriminatory as possible. But "discrimination" is not meant as a pejorative here. The function committers perform is utterly necessary, and I do not think a project could succeed without it. Quality control requires, well, control. There are always many people who feel competent to make changes to a program, and some smaller number who actually are. The project cannot rely on people's own judgement; it must impose standards and grant commit access only to those who meet them[23]. On the other hand, having people who can commit changes directly working side-by-side with people who cannot sets up an obvious power dynamic. That dynamic must be managed so that it does not harm the project.

In the section titled « Who Votes? » in Chapter 4, Social and Political Infrastructure, we already discussed the mechanics of considering new committers. Here we will look at the standards by which potential new committers should be judged, and how this process should be presented to the larger community.

Choosing Committers

In the Subversion project, we choose committers primarily on the Hippocratic Principle: first, do no harm. Our main criterion is not technical skill or even knowledge of the code, but merely that the committer show good judgement. Judgement can mean simply knowing what not to take on. A person might post only small patches, fixing fairly simple problems in the code; but if the patches apply cleanly, do not contain bugs, and are mostly in accord with the project's log message and coding conventions, and there are enough patches to show a clear pattern, then an existing committer will usually propose that person for commit access. If at least three people say yes, and no one objects, then the offer is made. True, we might have no evidence that the person is able to solve complex problems in all areas of the code base, but that does not matter: the person has made it clear that he is capable of at least judging his own abilities. Technical skills can be learned (and taught), but judgement, for the most part, cannot. Therefore, it is the one thing you want to make sure a person has before you give him commit access.

When a new committer proposal does provoke a discussion, it is usually not about technical ability, but rather about the person's behavior on the mailing lists or in IRC. Sometimes someone shows technical skill and an ability to work within the project's formal guidelines, yet is also consistently belligerent or uncooperative in public forums. That's a serious concern; if the person doesn't seem to shape up over time, even in response to hints, then we won't add him as a committer no matter how skilled he is. In a volunteer group, social skills, or the ability to "play well in the sandbox", are as important as raw technical ability. Because everything is under version control, the penalty for adding a committer you shouldn't have is not so much the problems it could cause in the code (review would spot those quickly anyway), but that it might eventually force the project to revoke the person's commit access—an action that is never pleasant and can sometimes be confrontational.

Many projects insist that the potential committer demonstrate a certain level of technical expertise and persistence, by submitting some number of nontrivial patches—that is, not only do these projects want to know that the person will do no harm, they want to know that she is likely to do good across the code base. This is fine, but be careful that it doesn't start to turn committership into a matter of membership in an exclusive club. The question to keep in everyone's mind should be "What will bring the best results for the code?" not "Will we devalue the social status associated with committership by admitting this person?" The point of commit access is not to reinforce people's self-worth, it's to allow good changes to enter the code with a minimum of fuss. If you have 100 committers, 10 of whom make large changes on a regular basis, and the other 90 of whom just fix typos and small bugs a few times a year, that's still better than having only the 10.

Revoking Commit Access

The first thing to be said about revoking commit access is: try not to be in that situation in the first place. Depending on whose access is being revoked, and why, the discussions around such an action can be very divisive. Even when not divisive, they will be a time-consuming distraction from productive work.

However, if you must do it, the discussion should be had privately among the same people who would be in a position to vote for granting that person whatever flavor of commit access they currently have. The person herself should not be included. This contradicts the usual injunction against secrecy, but in this case it's necessary. First, no one would be able to speak freely otherwise. Second, if the motion fails, you don't necessarily want the person to know it was ever considered, because that could open up questions ("Who was on my side? Who was against me?") that lead to the worst sort of factionalism. In certain rare circumstances, the group may want someone to know that revocation of commit access is or was being considered, as a warning, but this openness should be a decision the group makes. No one should ever, on her own initiative, reveal information from a discussion and ballot that others assumed were secret.

Once someone's access is revoked, that fact is unavoidably public (see the section titled « Avoid Mystery » later in this chapter), so try to be as tactful as you can in how it is presented to the outside world.

Partial Commit Access

Some projects offer gradations of commit access. For example, there might be contributors whose commit access gives them free rein in the documentation, but who do not commit to the code itself. Common areas for partial commit access include documentation, translations, binding code to other programming languages, specification files for packaging (e.g., Red Hat RPM spec files), and other places where a mistake will not result in a problem for the core project.

Since commit access is not only about committing, but about being part of an electorate (see the section titled « Who Votes? » in Chapter 4, Social and Political Infrastructure), the question naturally arises: what can the partial committers vote on? There is no one right answer; it depends on what sorts of partial commit domains your project has. In Subversion we've kept things fairly simple: a partial committer can vote on matters confined exclusively to that committer's domain, and not on anything else. Importantly, we do have a mechanism for casting advisory votes (essentially, the committer writes "+0" or "+1 (non-binding)" instead of just "+1" on the ballot). There's no reason to silence people entirely just because their vote isn't formally binding.

Full committers can vote on anything, just as they can commit anywhere, and only full committers vote on adding new committers of any kind. In practice, though, the ability to add new partial committers is usually delegated: any full committer can "sponsor" a new partial committer, and partial committers in a domain can often essentially choose new committers for that same domain (this is especially helpful in making translation work run smoothly).

Your project may need a slightly different arrangement, depending on the nature of the work, but the same general principles apply to all projects. Each committer should be able to vote on matters that fall within the scope of her commit access, and not on matters outside that, and votes on procedural questions should default to the full committers, unless there's some reason (as decided by the full committers) to widen the electorate.
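The voting arrangement described above can be made concrete with a small sketch. The Python below is a hypothetical illustration, not Subversion's actual tooling: the `Vote` class, the `tally` function, and the exact vote strings are assumptions. It encodes only rules stated in this chapter: only full committers cast binding votes on a new-committer proposal, at least three binding "+1" votes are needed, a binding "-1" is treated as an objection that blocks the offer, and votes from partial committers are kept as advisory rather than discarded.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str
    value: str            # e.g. "+1", "+0", "-1", "+1 (non-binding)"
    full_committer: bool  # assumption: only full committers vote on new committers


def tally(votes: list[Vote], required_yes: int = 3) -> tuple[bool, list[Vote]]:
    """Return (accepted, advisory_votes) for a hypothetical new-committer proposal."""
    binding = [v.value.strip() for v in votes if v.full_committer]
    advisory = [v for v in votes if not v.full_committer]
    yes = binding.count("+1")
    # Assumed rule: any binding "-1" is an objection, and one objection blocks the offer.
    objected = any(value.startswith("-1") for value in binding)
    return yes >= required_yes and not objected, advisory


# Usage: three binding +1 votes plus one advisory vote from a partial committer.
votes = [
    Vote("alice", "+1", True),
    Vote("bob", "+1", True),
    Vote("carol", "+1", True),
    Vote("dave", "+1 (non-binding)", False),
]
accepted, advisory = tally(votes)
print(accepted, [v.voter for v in advisory])  # True ['dave']
```

Returning the advisory votes alongside the result, instead of dropping them, reflects the point above that there is no reason to silence people just because their vote is not formally binding.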

Regarding enforcement of partial commit access: it's often best not to have the version control system enforce partial commit domains, even if it can. See the section titled « Authorization » in Chapter 3, L'infrastructure technique for the reasons why.

Dormant Committers

Some projects automatically remove people's commit access if they go a certain amount of time (say, a year) without committing anything. I think this is usually unhelpful and even counterproductive, for two reasons.

First, it may tempt some people into committing acceptable but unnecessary changes, just to prevent their commit access from expiring. Second, it doesn't really serve any purpose. If the main criterion for granting commit access is good judgement, then why assume someone's judgement would deteriorate just because he's away from the project for a while? Even if he completely vanishes for years, not looking at the code or following development discussions, when he reappears he'll know how out of touch he is, and act accordingly. You trusted his judgement before, so why not trust it always? If high school diplomas do not expire, then commit access certainly shouldn't.

Sometimes a committer may ask to be removed, or to be explicitly marked as dormant in the list of committers (see the section titled « Avoid Mystery » below for more about that list). In these cases, the project should accede to the person's wishes, of course.

Avoid Mystery

Although the discussions around adding any particular new committer must be confidential, the rules and procedures themselves need not be secret. In fact, it's best to publish them, so people realize that the committers are not some mysterious Star Chamber, closed off to mere mortals, but that anyone can join simply by posting good patches and knowing how to handle herself in the community. In the Subversion project, we put this information right in the developer guidelines document, since the people most likely to be interested in how commit access is granted are those thinking of contributing code to the project.

In addition to publishing the procedures, publish the actual list of committers. The traditional place for this is a file called MAINTAINERS or COMMITTERS in the top level of the project's source code tree. It should list all the full committers first, followed by the various partial commit domains and the members of each domain. Each person should be listed by name and email address, though the address can be encoded to prevent spam (see the section titled « Masquer les adresses dans les archives » in Chapter 3, L'infrastructure technique) if the person prefers that.

Since the distinction between full commit and partial commit access is obvious and well defined, it is proper for the list to make that distinction too. Beyond that, the list should not try to indicate the informal distinctions that inevitably arise in a project, such as who is particularly influential and how. It is a public record, not an acknowledgments file. List committers either in alphabetical order, or in the order in which they arrived.

Credit

Credit is the primary currency of the free software world. Whatever people may say about their motivations for participating in a project, I don't know any developers who would be happy doing all their work anonymously, or under someone else's name. There are tangible reasons for this: one's reputation in a project roughly governs how much influence one has, and participation in an open source project can also indirectly have monetary value, because some employers now look for it on resumés. There are also intangible reasons, perhaps even more powerful: people simply want to be appreciated, and instinctively look for signs that their work was recognized by others. The promise of credit is therefore one of the best motivators the project has. When small contributions are acknowledged, people come back to do more.

One of the most important features of collaborative development software (see Chapter 3, L'infrastructure technique) is that it keeps accurate records of who did what, when. Wherever possible, use these existing mechanisms to make sure that credit is distributed accurately, and be specific about the nature of the contribution. Don't just write "Thanks to J. Random <jrandom@example.com>" if instead you can write "Thanks to J. Random <jrandom@example.com> for the bug report and reproduction recipe" in a log message.

In Subversion, we have an informal but consistent policy of crediting the reporter of a bug in either the issue filed, if there is one, or the log message of the commit that fixes the bug, if not. A quick survey of Subversion commit logs up to commit number 14525 shows that about 10% of commits give credit to someone by name and email address, usually the person who reported or analyzed the bug fixed by that commit. Note that this person is different from the developer who actually made the commit, whose name is already recorded automatically by the version control system. Of the 80-odd full and partial committers Subversion has today, 55 were credited in the commit logs (usually multiple times) before they became committers themselves. This does not, of course, prove that being credited was a factor in their continued involvement, but it at least sets up an atmosphere in which people know they can count on their contributions being acknowledged.
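A survey like the one above can be approximated with a short script over exported log messages. The sketch below is a hypothetical heuristic, not the method actually used to obtain the Subversion figures: it assumes that a log message credits someone when it contains a "Name <address>" pair, as in the "Thanks to J. Random <jrandom@example.com>" example earlier, and the sample messages are invented for illustration.

```python
import re

# Assumed heuristic: a capitalized name followed by an email address in angle brackets.
CREDIT = re.compile(r"[A-Z][\w.\- ]*<[^<>@\s]+@[^<>\s]+>")


def credited_share(log_messages: list[str]) -> float:
    """Return the fraction of log messages that credit someone by name and email."""
    if not log_messages:
        return 0.0
    credited = sum(1 for msg in log_messages if CREDIT.search(msg))
    return credited / len(log_messages)


# Invented sample messages, for illustration only.
logs = [
    "Fix crash in the diff code.\n\n"
    "Thanks to J. Random <jrandom@example.com> for the bug report and reproduction recipe.",
    "Tweak a comment.",
    "Update the FAQ.",
    "Refactor the delta code.",
]
print(f"{credited_share(logs):.0%}")  # 25%
```

Because the committer's own name is recorded separately by the version control system rather than in the message text, a scan like this counts only the additional people credited, which is the distinction the survey draws.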

It is important to distinguish between routine acknowledgment and special thanks. When discussing a particular piece of code, or some other contribution someone made, it is fine to acknowledge their work. For example, saying "Daniel's recent changes to the delta code mean we can now implement feature X" simultaneously helps people identify which changes you're talking about and acknowledges Daniel's work. On the other hand, posting solely to thank Daniel for the delta code changes serves no immediate practical purpose. It doesn't add any information, since the version control system and other mechanisms have already recorded the fact that he made the changes. Thanking everyone for everything would be distracting and ultimately information-free, since thanks are effective largely by how much they stand out from the default, background level of favorable comment going on all the time. This does not mean, of course, that you should never thank people. Just make sure to do it in ways that tend not to lead to credit inflation. Following these guidelines will help:

  • The more ephemeral the forum, the more free you should feel to express thanks there. For example, thanking someone for their bugfix in passing during an IRC conversation is fine, as is an aside in an email devoted mainly to other topics. But don't post an email solely to thank someone, unless it's for a truly unusual feat. Likewise, don't clutter the project's web pages with expressions of gratitude. Once you start that, it'll never be clear when or where to stop. And never put thanks into comments in the code; that would only be a distraction from the primary purpose of comments, which is to help the reader understand the code.

  • The less involved someone is in the project, the more appropriate it is to thank her for something she did. This may sound counterintuitive, but it fits with the attitude that expressing thanks is something you do when someone contributes even more than you thought she would. Thus, to constantly thank regular contributors for doing what they normally do would be to express a lower expectation of them than they have of themselves. If anything, you want to aim for the opposite effect!

    There are occasional exceptions to this rule. It's acceptable to thank someone for fulfilling his expected role when that role involves temporary, intense efforts from time to time. The canonical example is the release manager, who goes into high gear around the time of each release, but otherwise lies dormant (dormant as a release manager, in any case—he may also be an active developer, but that's a different matter).

  • As with criticism and crediting, gratitude should be specific. Don't thank people just for being great, even if they are. Thank them for something they did that was out of the ordinary, and for bonus points, say exactly why what they did was so great.

In general, there is always a tension between making sure that people's individual contributions are recognized, and making sure the project is a group effort rather than a collection of individual glories. Just remain aware of this tension and try to err on the side of the group, and things won't get out of hand.

Forks

In the section titled « Forkability » in Chapter 4, Social and Political Infrastructure, we saw how the potential to fork has important effects on how projects are governed. But what happens when a fork actually occurs? How should you handle it, and what effects can you expect it to have? Conversely, when should you initiate a fork?

The answers depend on what kind of fork it is. Some forks are due to amicable but irreconcilable disagreements about the direction of the project; perhaps more are due to both technical disagreements and interpersonal conflicts. Of course, it's not always possible to tell the difference between the two, as technical arguments may involve personal elements as well. What all forks have in common is that one group of developers (or sometimes even just one developer) has decided that the costs of working with some or all of the others now outweigh the benefits.

Once a project forks, there is no definitive answer to the question of which fork is the "true" or "original" project. People will colloquially talk of fork F coming out of project P, as though P is continuing unchanged down some natural path while F diverges into new territory, but this is, in effect, a declaration of how that particular observer feels about it. It is fundamentally a matter of perception: when a large enough percentage of observers agree, the assertion starts to become true. It is not the case that there is an objective truth from the outset, one that we are only imperfectly able to perceive at first. Rather, the perceptions are the objective truth, since ultimately a project—or a fork—is an entity that exists only in people's minds anyway.

If those initiating the fork feel that they are sprouting a new branch off the main project, the perception question is resolved immediately and easily. Everyone, both developers and users, will treat the fork as a new project, with a new name (perhaps based on the old name, but easily distinguishable from it), a separate web site, and a separate philosophy or goal. Things get messier, however, when both sides feel they are the legitimate guardians of the original project and therefore have the right to continue using the original name. If there is some organization with trademark rights to the name, or legal control over the domain or web pages, that usually resolves the issue by fiat: that organization will decide who is the project and who is the fork, because it holds all the cards in a public relations war. Naturally, things rarely get that far: since everyone already knows what the power dynamics are, they will avoid fighting a battle whose outcome is known in advance, and just jump straight to the end.

Fortunately, in most cases there is little doubt as to which is the project and which is the fork, because a fork is, in essence, a vote of confidence. If more than half of the developers are in favor of whatever course the fork proposes to take, usually there is no need to fork: the project can simply go that way itself, unless it is run as a dictatorship with a particularly stubborn dictator. On the other hand, if fewer than half of the developers are in favor, the fork is clearly a minority rebellion, and both courtesy and common sense indicate that it should think of itself as the divergent branch rather than the main line.

Handling a Fork

If someone threatens a fork in your project, keep calm and remember your long-term goals. The mere existence of a fork isn't what hurts a project; rather, it's the loss of developers and users. Your real aim, therefore, is not to squelch the fork, but to minimize these harmful effects. You may be mad, you may feel that the fork was unjust and uncalled for, but expressing that publicly can only alienate undecided developers. Instead, don't force people to make exclusive choices, and be as cooperative as is practicable with the fork. To start with, don't remove someone's commit access in your project just because he decided to work on the fork. Work on the fork doesn't mean that person has suddenly lost his competence to work on the original project; committers before should remain committers afterward. Beyond that, you should express your desire to remain as compatible as possible with the fork, and say that you hope developers will port changes between the two whenever appropriate. If you have administrative access to the project's servers, publicly offer the forkers infrastructure help at startup time. For example, offer them a complete, deep-history copy of the version control repository, if there's no other way for them to get it, so that they don't have to start off without historical data (this may not be necessary depending on the version control system). Ask them if there's anything else they need, and provide it if you can. Bend over backward to show that you are not standing in the way, and that you want the fork to succeed or fail on its own merits and nothing else.

The reason to do all this—and do it publicly—is not to actually help the fork, but to persuade developers that your side is a safe bet, by appearing as non-vindictive as possible. In war it sometimes makes sense (strategic sense, if not human sense) to force people to choose sides, but in free software it almost never does. In fact, after a fork some developers often openly work on both projects, and do their best to keep the two compatible. These developers help keep the lines of communication open after the fork. They allow your project to benefit from interesting new features in the fork (yes, the fork may have things you want), and also increase the chances of a merger down the road.

Sometimes a fork becomes so successful that, even though its own instigators regarded it as a fork at the outset, it becomes the version everybody prefers, and eventually supplants the original by popular demand. A famous instance of this was the GCC/EGCS fork. The GNU Compiler Collection (GCC, formerly the GNU C Compiler) is the most popular open source native-code compiler, and also one of the most portable compilers in the world. Due to disagreements between the GCC's official maintainers and Cygnus Solutions,[24] one of GCC's most active developer groups, Cygnus created a fork of GCC called EGCS. The fork was deliberately non-adversarial: the EGCS developers did not, at any point, try to portray their version of GCC as a new official version. Instead, they concentrated on making EGCS as good as possible, incorporating patches at a faster rate than the official GCC maintainers. EGCS gained in popularity, and eventually some major operating system distributors decided to package EGCS as their default compiler instead of GCC. At this point, it became clear to the GCC maintainers that holding on to the "GCC" name while everyone switched to the EGCS fork would burden everyone with a needless name change, yet do nothing to prevent the switchover. So GCC adopted the EGCS codebase, and there is once again a single GCC, but greatly improved because of the fork.

This example shows why you cannot always regard a fork as a purely bad thing. A fork may be painful and unwelcome at the time, but you cannot necessarily know whether it will succeed. Therefore, you and the rest of the project should keep an eye on it, and be prepared not only to absorb features and code where possible, but in the most extreme case to even join the fork if it gains the bulk of the project's mindshare. Of course, you will often be able to predict a fork's likelihood of success by seeing who joins it. If the fork is started by the project's biggest complainer and joined by a handful of disgruntled developers who weren't behaving constructively anyway, they've essentially solved a problem for you by forking, and you probably don't need to worry about the fork taking momentum away from the original project. But if you see influential and respected developers supporting the fork, you should ask yourself why. Perhaps the project was being overly restrictive, and the best solution is to adopt into the mainline project some or all of the actions contemplated by the fork: in essence, to avoid the fork by becoming it.

Initiating a Fork

All the advice here assumes that you are forking as a last resort. Exhaust all other possibilities before starting a fork. Forking almost always means losing developers, with only an uncertain promise of gaining new ones later. It also means starting out with competition for users' attention: everyone who's about to download the software has to ask themselves, "Hmm, do I want that one or the other one?" Whichever side you are on, the situation is messy, because a question has been introduced that wasn't there before. Some people maintain that forks are healthy for the software ecosystem as a whole, by a standard natural selection argument: the fittest will survive, which means that, in the end, everyone gets better software. This may be true from the ecosystem's point of view, but it's not true from the point of view of any individual project. Most forks do not succeed, and most projects are not happy to be forked.

A corollary is that you should not use the threat of a fork as an extremist debating technique—"Do things my way or I'll fork the project!"—because everyone is aware that a fork that fails to attract developers away from the original project is unlikely to survive long. All observers—not just developers, but users and operating system packagers too—will make their own judgement about which side to choose. You should therefore appear extremely reluctant to fork, so that if you finally do it, you can credibly claim it was the only route left.

Do not neglect to take all factors into account in evaluating the potential success of your fork. For example, if many of the developers on a project have the same employer, then even if they are disgruntled and privately in favor of a fork, they are unlikely to say so out loud if they know that their employer is against it. Many free software programmers like to think that having a free license on the code means no one company can dominate development. It is true that the license is, in an ultimate sense, a guarantor of freedom—if others want badly enough to fork the project, and have the resources to do so, they can. But in practice, some projects' development teams are mostly funded by one entity, and there is no point pretending that that entity's support doesn't matter. If it is opposed to the fork, its developers are unlikely to take part, even if they secretly want to.

If you still conclude that you must fork, line up support privately first, then announce the fork in a non-hostile tone. Even if you are angry at, or disappointed with, the current maintainers, don't say that in the message. Just dispassionately state what led you to the decision to fork, and that you mean no ill will toward the project from which you're forking. Assuming that you do consider it a fork (as opposed to an emergency preservation of the original project), emphasize that you're forking the code and not the name, and choose a name that does not conflict with the project's name. You can use a name that contains or refers to the original name, as long as it does not open the door to identity confusion. Of course it's fine to explain prominently on the fork's home page that it descends from the original program, and even that it hopes to supplant it. Just don't make users' lives harder by forcing them to untangle an identity dispute.

Finally, you can get things started on the right foot by automatically granting all committers of the original project commit access to the fork, including even those who openly disagreed with the need for a fork. Even if they never use the access, your message is clear: there are disagreements here, but no enemies, and you welcome code contributions from any competent source.



[20] This question was studied in detail, with interesting results, in a paper by Karim Lakhani and Robert G. Wolf, entitled Why Hackers Do What They Do: Understanding Motivation and Effort in Free/Open Source Software Projects. See http://freesoftware.mit.edu/papers/lakhaniwolf.pdf.

[21] Note that there would be no need to convert all the existing tests to the new framework; the two could happily exist side by side, with old tests converted over only as they needed to be changed.

[22] IssueZilla is the issue tracker we use; it is a descendant of Bugzilla.

[23] Note that commit access means something a bit different in decentralized version control systems, where anyone can set up a repository that is linked into the project, and give themselves commit access to that repository. Nevertheless, the concept of commit access still applies: "commit access" is shorthand for "the right to make changes to the code that will ship in the group's next release of the software." In centralized version control systems, this means having direct commit access; in decentralized ones, it means having one's changes pulled into the main distribution by default. It is the same idea either way; the mechanics by which it is realized are not terribly important.

[24] Now part of Red Hat (http://www.redhat.com/).

Chapter 9. Licenses, Copyrights, and Patents

The license you select probably won't have a major impact on the adoption of your project, as long as the license is open source. Users generally choose software based on quality and features, not on the details of the license. Nevertheless, you still need a basic understanding of free software licensing issues, both to ensure that the project's license is compatible with its goals, and to be able to discuss licensing decisions with other people. Please note, however, that I am not a lawyer, and that nothing in this chapter should be construed as formal legal advice. For that, you'll need to hire a lawyer or be one.

Terminology

In any discussion of open source licensing, the first thing that becomes apparent is that there seem to be many different words for the same thing: free software, open source, FOSS, F/OSS, and FLOSS. Let's start by sorting those out, along with a few other terms.

free software

Software that can be freely shared and modified, including in source code form. The term was coined by Richard Stallman, who codified it in the GNU General Public License (GPL), and who founded the Free Software Foundation (http://www.fsf.org/) to promote the concept.

Although "free software" covers almost exactly the same range of software as "open source", the FSF, among others, prefers the former term because it emphasizes the idea of freedom, and the concept of freely redistributable software as primarily a social movement rather than a technical one. The FSF acknowledges that the term is ambiguous—it could mean "free" as in "zero-cost", instead of "free" as in "freedom"—but feels that it's still the best term, all things considered, and that the other possibilities in English have their own ambiguities. (Throughout this book, "free" is used in the "freedom" sense, not the "zero-cost" sense.)

open source software

Free software under another name. But the different name reflects an important philosophical difference: "open source" was coined by the Open Source Initiative (http://www.opensource.org/) as a deliberate alternative to "free software," in order to make such software a more palatable choice for corporations, by presenting it as a development methodology rather than a political movement. They may also have wanted to overcome another stigma: that anything "free" must be low quality.

While any license that is free is also open source, and vice versa (with a few minor exceptions), people tend to pick one term and stick with it. In general, those who prefer "free software" are more likely to have a philosophical or moral stance on the issue, while those who prefer "open source" either don't view it as a matter of freedom, or are not interested in advertising the fact that they do. See the section called "« Libre » contre « Open Source »" in Chapter 1, Introduction, for a more detailed history of this schism.

The Free Software Foundation has an excellent—utterly unobjective, but nuanced and quite fair—exegesis of the two terms, at http://www.fsf.org/licensing/essays/free-software-for-freedom.html. The Open Source Initiative's take on it is spread across two pages: http://www.opensource.org/advocacy/case_for_hackers.php#marketing and http://www.opensource.org/advocacy/free-notfree.php.

FOSS, F/OSS, FLOSS

Where there are two of anything, there will soon be three, and that is exactly what is happening with terms for free software. The academic world, perhaps wanting precision and inclusiveness over elegance, seems to have settled on FOSS, or sometimes F/OSS, standing for "Free / Open Source Software". Another variant gaining momentum is FLOSS, which stands for "Free / Libre Open Source Software" (libre is familiar in many languages and does not suffer from the ambiguities of "free"; see http://en.wikipedia.org/wiki/FLOSS for more).

All these terms mean essentially the same thing: software that can be modified and redistributed by everyone, sometimes—but not always—with the requirement that derivative works be freely redistributable under the same terms.

DFSG-compliant

Compliant with the Debian Free Software Guidelines (http://www.debian.org/social_contract#guidelines). This is a widely used test for whether a given license is truly open source (free, libre, etc.). The Debian Project's mission is to maintain an entirely free operating system, such that someone installing it need never doubt that she has the right to modify and redistribute any or all of the system. The Debian Free Software Guidelines are the requirements that a software package's license must meet in order to be included in Debian. Because the Debian Project spent a good deal of time thinking about how to construct such a test, the guidelines they came up with have proven very robust (see http://en.wikipedia.org/wiki/DFSG), and as far as I'm aware, no serious objection to them has been raised either by the Free Software Foundation or the Open Source Initiative. If you know that a given license is DFSG-compliant, you know that it guarantees all the important freedoms (such as forkability even against the original author's wishes) required to sustain the dynamics of an open source project. All of the licenses discussed in this chapter are DFSG-compliant.

OSI-approved

Approved by the Open Source Initiative. This is another widely used test of whether a license permits all the necessary freedoms. The OSI's definition of open source software is based on the Debian Free Software Guidelines, and any license that meets one definition almost always meets the other. There have been a few exceptions over the years, but only involving niche licenses and none of any relevance here. Unlike the Debian Project, the OSI maintains a list of all licenses it has ever approved, at http://www.opensource.org/licenses/, so that being "OSI-approved" is an unambiguous state: a license either is or isn't on the list.

The Free Software Foundation also maintains a list of licenses at http://www.fsf.org/licensing/licenses/license-list.html. The FSF categorizes licenses not only by whether they are free, but whether they are compatible with the GNU General Public License. GPL compatibility is an important topic, covered in the section called "The GPL and License Compatibility" later in this chapter.

proprietary, closed-source

The opposite of "free" or "open source." It means software distributed under traditional, royalty-based licensing terms, where users pay per copy, or under any other terms sufficiently restrictive to prevent open source dynamics from operating. Even software distributed at no charge can still be proprietary, if its license does not permit free redistribution and modification.

Generally "proprietary" and "closed-source" are synonyms. However, "closed-source" additionally implies that the source code cannot even be seen. Since the source code cannot be seen with most proprietary software, this is normally a distinction without a difference. However, occasionally someone releases proprietary software under a license that allows others to view the source code. Confusingly, they sometimes call this "open source" or "nearly open source," etc., but that's misleading. The visibility of the source code is not the issue; the important question is what you're allowed to do with it. Thus, the difference between proprietary and closed-source is mostly irrelevant, and the two can be treated as synonyms.

Sometimes "commercial" is used as a synonym for "proprietary," but properly speaking, the two are not the same thing. Free software can be commercial software. After all, free software can be sold, as long as the buyers are not restricted from giving away copies themselves. It can be commercialized in other ways as well, for example by selling support, services, and certification. There are multimillion-dollar companies built on free software today, so it is clearly neither inherently anti-commercial nor anti-corporate. On the other hand, it is anti-proprietary by its nature, and this is the key way in which it differs from traditional per-copy license models.

public domain

Having no copyright holder, meaning that there is no one who has the right to restrict copying of the work. Being in the public domain is not the same as having no author. Everything has an author, and even if a work's author or authors choose to put it in the public domain, that doesn't change the fact that they wrote it.

When a work is in the public domain, material from it can be incorporated into a copyrighted work, and thereafter that copy of the material is covered under the same copyright as the whole work. But this does not affect the availability of the original work, which remains in the public domain. Thus, releasing something into the public domain is technically one way to make it "free," according to the guidelines of most free software certifying organizations. However, there are usually good reasons to use a license instead of just releasing into the public domain: even with free software, certain restrictions can be useful, not only to the copyright holder but to recipients as well, as the next section makes clear.

copyleft

A license that uses copyright law to achieve a result opposite to traditional copyright. Depending on whom you ask, this means either licenses that permit the freedoms under discussion here, or, more narrowly, licenses that not only permit those freedoms but enforce them, by stipulating that the freedoms must travel with the work. The Free Software Foundation uses the second definition exclusively; elsewhere, it's a toss-up: a lot of people use the term the same way the FSF does, but others—including some who write for mainstream media—tend to use the first definition. It's not clear that everyone using the term is aware that there's a distinction to be made.

The canonical example of the narrower, stricter definition is the GNU General Public License, which stipulates that any derivative works must also be licensed under the GPL; see the section called "The GPL and License Compatibility" later in this chapter for more.

Aspects of Licenses

Although there are many different free software licenses available, in the important respects they all say the same things: that anyone can modify the code, that anyone can redistribute it both in original and modified form, and that the copyright holders and authors provide no warranties whatsoever (avoiding liability is especially important given that people might run modified versions without even knowing it). The differences between licenses boil down to a few oft-recurring issues:

compatibility with proprietary licenses

Some free licenses allow the covered code to be used in proprietary programs. This does not affect the licensing terms of the proprietary program: it is still as proprietary as ever, it just happens to contain some code from a non-proprietary source. The Apache License, X Consortium License, BSD-style license, and the MIT-style license are all examples of proprietary-compatible licenses.

compatibility with other free licenses

Most free licenses are compatible with each other, meaning that code under one license can be combined with code under another, and the result distributed under either license without violating the terms of the other. The major exception to this is the GNU General Public License, which requires that any work using GPLed code be itself distributed under the GPL, and without adding any further restrictions beyond what the GPL requires. The GPL is compatible with some free licenses, but not with others. This is discussed in more detail in the section called "The GPL and License Compatibility" later in this chapter.

enforcement of crediting

Some free licenses stipulate that any use of the covered code be accompanied by a notice, whose placement and display is usually specified, giving credit to the authors or copyright holders of the code. These licenses are often still proprietary-compatible: they do not necessarily demand that the derivative work be free, merely that credit be given to the free code.

protection of trademark

A variant of credit enforcement. Trademark-protecting licenses specify that the name of the original software (or its copyright holders, or their institution, etc.) may not be used by derivative works without prior written permission. Although credit enforcement insists that a certain name be used, and trademark protection insists that it not be used, they are both expressions of the same desire: that the original code's reputation be preserved and transmitted, but not tarnished by association.

protection of "artistic integrity"

Some licenses (the Artistic License, used for the most popular implementation of the Perl programming language, and Donald Knuth's TeX license, for example) require that modification and redistribution be done in a manner that distinguishes clearly between the pristine original version of the code and any modifications. They permit essentially the same freedoms as other free licenses, but impose certain requirements that make the integrity of the original code easy to verify. These licenses have not caught on much beyond the specific programs they were made for, and will not be discussed in this chapter; they are mentioned here only for the sake of completeness.

Most of these stipulations are not mutually exclusive, and some licenses include several. The common thread among them is that they place demands on the recipient in exchange for the recipient's right to use and/or redistribute the code. For example, some projects want their name and reputation to be transmitted along with the code, and consider this worth the extra burden of a credit or trademark clause; depending on how onerous it is, that burden may result in some users choosing a package with a less demanding license.

The GPL and License Compatibility

By far the sharpest dividing line in licensing is that between proprietary-incompatible and proprietary-compatible licenses, that is, between the GNU General Public License and everything else. Because the primary goal of the GPL's authors is the promotion of free software, they deliberately crafted the license to make it impossible to mix GPLed code into proprietary programs. Specifically, among the GPL's requirements (see http://www.fsf.org/licensing/licenses/gpl.html for its full text) are these two:

  1. Any derivative work—that is, any work containing a nontrivial amount of GPLed code—must itself be distributed under the GPL.

  2. No additional restrictions may be placed on the redistribution of either the original work or a derivative work. (The exact language is: "You may not impose any further restrictions on the recipients' exercise of the rights granted herein.")

With these conditions, the GPL succeeds in making freedom contagious. Once a program is licensed under the GPL, its terms of redistribution are viral: they are passed on to anything else the code gets incorporated into, making it effectively impossible to use GPLed code in closed-source programs. However, these same clauses also make the GPL incompatible with certain other free licenses. The usual way this happens is that the other license imposes a requirement (for example, a credit clause requiring the original authors to be mentioned in some way) that is incompatible with the GPL's "You may not impose any further restrictions..." language. From the point of view of the Free Software Foundation, these second-order consequences are desirable, or at least not regrettable. The GPL not only keeps your software free, but effectively makes your software an agent in pushing other software to enforce freedom as well.
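The interaction between the GPL's two requirements and an extra clause in another license can be captured in a small toy model. The Python sketch below is illustrative only: the `License` class, the `combined_licenses` function, and the four license entries are hypothetical names invented for this example, and each license is reduced to just two properties (whether it is copyleft, and whether it adds requirements such as the advertising clause of the original BSD license, discussed later in this chapter). It is not a substitute for reading the actual license texts or the FSF's compatibility list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    """A toy description of a free license, reduced to two properties."""

    name: str
    # Must derivative works be distributed under this same license?
    copyleft: bool = False
    # Obligations beyond the usual free-software terms (e.g. an advertising
    # clause). The GPL forbids imposing "any further restrictions".
    extra_requirements: tuple = ()


# Hypothetical, simplified entries based only on what this chapter says.
GPL = License("GPL", copyleft=True)
MIT_X = License("MIT/X")
REVISED_BSD = License("revised BSD")
ORIGINAL_BSD = License("original BSD", extra_requirements=("advertising clause",))


def combined_licenses(a: License, b: License) -> set:
    """Return the licenses a work mixing code under a and b could ship under.

    An empty set means the two licenses conflict. Only one copyleft
    license (the GPL) is modeled here.
    """
    if a.copyleft or b.copyleft:
        gpl, other = (a, b) if a.copyleft else (b, a)
        # GPL requirement 2: no further restrictions on redistribution.
        if other.extra_requirements:
            return set()
        # GPL requirement 1: the derivative work must itself be under the GPL.
        return {gpl.name}
    # Two non-copyleft free licenses: the result may use either one.
    return {a.name, b.name}


print(combined_licenses(MIT_X, GPL))         # {'GPL'}
print(combined_licenses(GPL, ORIGINAL_BSD))  # set()
```

In this model, mixing MIT/X code into a GPLed program yields a work that must be distributed under the GPL, while mixing in code under the original BSD license yields no valid result, because the advertising clause is exactly the kind of further restriction the GPL forbids. Real compatibility questions turn on the full license texts, which the model deliberately ignores.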

The question of whether or not this is a good way to promote free software is one of the most persistent holy wars on the Internet (see the section called "Avoid Holy Wars" in Chapter 6, Communications), and we won't investigate it here. What's important for our purposes is that GPL compatibility is an important issue when choosing a license. The GPL is by far the most popular open source license; according to the statistics at http://freshmeat.net/stats/#license, it is at 68%, and the next most popular license is at 6%. If you want your code to be able to be mixed freely with GPLed code (and there's a lot of GPLed code out there), then you should pick a GPL-compatible license. Most of the GPL-compatible open source licenses are also proprietary-compatible: that is, code under such a license can be used in a GPLed program, and it can be used in a proprietary program. Of course, the results of these mixings would not be compatible with each other, since one would be under the GPL and the other would be under a closed-source license. But that concern applies only to the derivative works, not to the code you distribute in the first place.

Fortunately, the Free Software Foundation maintains a list showing which licenses are compatible with the GPL and which are not, at http://www.gnu.org/licenses/license-list.html. All of the licenses discussed in this chapter are present on that list, on one side or the other.

Choosing a License

When choosing a license to apply to your project, if at all possible use an existing license instead of making up a new one. There are two reasons why existing licenses are better:

  • Familiarity. If you use one of the three or four most popular licenses, people won't feel they have to read the legalese in order to use your code, because they'll have already done so for that license a long time ago.

  • Quality. Unless you have a team of lawyers at your disposal, you are unlikely to come up with a legally solid license. The licenses mentioned here are the products of much thought and experience; unless your project has truly unusual needs, it is unlikely you would do better.

To apply one of these licenses to your project, see the section called "Comment mettre en oeuvre cette licence au projet" in Chapter 2, Genèse d'un projet.

The MIT / X Window System License

If your goal is that your code be accessible by the greatest possible number of developers and derivative works, and you do not mind the code being used in proprietary programs, choose the MIT / X Window System license (so named because it is the license under which the Massachusetts Institute of Technology released the original X Window System code). This license's basic message is "You are free to use this code however you want." It is compatible with the GNU GPL, and it is short, simple, and easy to understand:

Copyright (c) <year> <copyright holders>

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

(Taken from http://www.opensource.org/licenses/mit-license.php.)

The GNU General Public License

If you prefer that your project's code not be used in proprietary programs, or if you at least don't care whether or not it can be used in proprietary programs, choose the GNU General Public License (http://www.fsf.org/licensing/licenses/gpl.html). The GPL is probably the most widely-used free software license in the world today; this instant recognizability is itself one of the GPL's major advantages.

When writing a code library that is meant mainly to be used as part of other programs, consider carefully whether the restrictions imposed by the GPL are in line with your project's goals. In some cases—for example, when you're trying to unseat a competing, proprietary library that does the same thing—it may make more strategic sense to license your code in such a way that it can be mixed into proprietary programs, even though you would otherwise not wish this. The Free Software Foundation even fashioned an alternative to the GPL for such circumstances: the GNU Library GPL, later renamed to the GNU Lesser GPL (most people just use the acronym LGPL, in any case). The LGPL has looser restrictions than the GPL, and can be mixed more easily with non-free code. However, it's also a bit complex and takes some time to understand, so if you're not going to use the GPL, I recommend just using the MIT/X-style license.

Is the GPL free or not free?

One consequence of choosing the GPL is the possibility—small, but not infinitely small—of finding yourself or your project embroiled in a dispute about whether or not the GPL is truly "free", given that it places some restrictions on what you can do with the code—namely, the restriction that the code cannot be distributed under any other license. For some people, the existence of this restriction means the GPL is "less free" than more permissive licenses such as the MIT/X license. Where this argument usually goes, of course, is that since "more free" must be better than "less free" (after all, who's not in favor of freedom?), it follows that those licenses are better than the GPL.

This debate is another popular holy war (see the section called "Avoid Holy Wars" in Chapter 6, Communications). Avoid participating in it, at least in project forums. Don't attempt to prove that the GPL is less free, as free, or more free than other licenses. Instead, emphasize the specific reasons your project chose the GPL. If the recognizability of the license was a reason, say that. If the enforcement of a free license on derivative works was also a reason, say that too, but refuse to be drawn into discussion about whether this makes the code more or less "free". Freedom is a complex topic, and there is little point talking about it if terminology is going to be used as a stalking horse for substance.

Since this is a book and not a mailing list thread, however, I will admit that I've never understood the "GPL is not free" argument. The only restriction the GPL imposes is that it prevents people from imposing further restrictions. To say that this results in less freedom has always seemed to me like saying that outlawing slavery reduces freedom, because it prevents some people from owning slaves.

(Oh, and if you do get drawn into a debate about it, don't raise the stakes by making inflammatory analogies.)

What About The BSD License?

A fair amount of open source software is distributed under a BSD license (or sometimes a BSD-style license). The original BSD license was used for the Berkeley Software Distribution, in which the University of California released important portions of a Unix implementation. This license (the exact text may be seen in section 2.2.2 of http://www.xfree86.org/3.3.6/COPYRIGHT2.html#6) was similar in spirit to the MIT/X license, except for one clause:

All advertising materials mentioning features or use of this software must display the following acknowledgement: This product includes software developed by the University of California, Lawrence Berkeley Laboratory.

The presence of that clause not only made the original BSD license GPL-incompatible, it also set a dangerous precedent: as other organizations put similar advertising clauses into their free software—substituting their own organization's name in place of "the University of California, Lawrence Berkeley Laboratory"—software redistributors faced an ever-increasing burden in what they were required to display. Fortunately, many of the projects that used this license became aware of the problem, and simply dropped the advertising clause. In 1999, even the University of California did so.

The result is the revised BSD license, which is simply the original BSD license with the advertising clause removed. However, this history makes the phrase "BSD license" a bit ambiguous: does it refer to the original, or the revised version? This is why I prefer the MIT/X license, which is essentially equivalent, and which does not suffer from any ambiguity. That said, there is perhaps one reason to prefer the revised BSD license to the MIT/X license: the BSD license includes this clause:

Neither the name of the <ORGANIZATION> nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

It's not clear that without such a clause, a recipient of the software would have had the right to use the licensor's name anyway, but the clause removes any possible doubt. For organizations worried about trademark control, therefore, the revised BSD license may be slightly preferable to MIT/X. In general, however, a liberal copyright license does not imply that recipients have any right to use or dilute your trademarks — copyright law and trademark law are two different beasts.

If you wish to use the revised BSD license, a template is available at http://www.opensource.org/licenses/bsd-license.php.

Copyright Assignment and Ownership

There are three ways to handle copyright ownership of free code contributed to by many people. The first is to ignore the issue of copyright entirely (I don't recommend this). The second is to collect a contributor license agreement (CLA) from each person who works on the project, explicitly granting the project the right to use that person's code. This is usually enough for most projects, and the nice thing is that in some jurisdictions, CLAs can be sent in by email. The third way is to get actual copyright assignments from contributors, so that the project (i.e., some legal entity, usually a nonprofit) is the copyright owner for everything. This is the most legally airtight way, but it's also the most burdensome for contributors; only a few projects insist on it.

Note that even under centralized copyright ownership, the code remains free, because open source licenses do not give the copyright holder the right to retroactively proprietize all copies of the code. So even if the project, as a legal entity, were to suddenly turn around and start distributing all the code under a restrictive license, that wouldn't cause a problem for the public community. The other developers would simply start a fork based on the latest free copy of the code, and continue as if nothing had happened. Because they know they can do this, most contributors cooperate when asked to sign a CLA or an assignment of copyright.

Doing Nothing

Most projects never collect CLAs or copyright assignments from their contributors. Instead, they accept code whenever it seems reasonably clear that the contributor intended it to be incorporated into the project.

Under normal circumstances, this is okay. But every now and then, someone may decide to sue for copyright infringement, alleging that they are the true owner of the code in question and that they never agreed to its being distributed by the project under an open source license. For example, the SCO Group did something like this to the Linux project; see http://en.wikipedia.org/wiki/SCO-Linux_controversies for details. When this happens, the project will have no documentation showing that the contributor formally granted the right to use the code, which could make some legal defenses more difficult.

Contributor License Agreements

CLAs probably offer the best tradeoff between safety and convenience. A CLA is typically an electronic form that a developer fills out and sends in to the project. In many jurisdictions, email submission is enough. A secure digital signature may or may not be required; consult a lawyer to find out what method would be best for your project.

Most projects use two slightly different CLAs, one for individuals, and one for corporate contributors. But in both types, the core language is the same: the contributor grants the project "...perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute [the] Contributions and such derivative works." Again, you should have a lawyer approve any CLA, but if you get all those adjectives into it, you're probably fine.

When you request CLAs from contributors, make sure to emphasize that you are not asking for actual copyright assignment. In fact, many CLAs start out by reminding the reader of this:

This is a license agreement only; it does not transfer copyright ownership and does not change your rights to use your own Contributions for any other purpose.

Here are some examples:

Transfer of Copyright

Copyright transfer means that the contributor assigns to the project copyright ownership on her contributions. Ideally, this is done on paper and either faxed or snail-mailed to the project.

Some projects insist on full assignment because having a single legal entity own the copyright on the entire code base can be useful if the terms of the open source license ever need to be enforced in court. If no single entity has the right to do it, all the contributors might have to cooperate, but some might not have time or even be reachable when the issue arises.

Different organizations apply different amounts of rigor to the task of collecting assignments. Some simply get an informal statement from a contributor on a public mailing list, something to the effect of "I hereby assign copyright in this code to the project, to be licensed under the same terms as the rest of the code." At least one lawyer I've talked to says that's really enough, presumably because it happens in a context where copyright assignment is normal and expected anyway, and because it represents a bona fide effort on the project's part to ascertain the developer's true intentions. On the other hand, the Free Software Foundation goes to the opposite extreme: they require contributors to physically sign and mail in a piece of paper containing a formal statement of copyright assignment, sometimes for just one contribution, sometimes for current and future contributions. If the developer is employed, the FSF asks that the employer sign it too.

The FSF's paranoia is understandable. If someone violates the terms of the GPL by incorporating some of their software into a proprietary program, the FSF will need to fight that in court, and they want their copyrights to be as airtight as possible when that happens. Since the FSF is copyright holder for a lot of popular software, they view this as a real possibility. Whether your organization needs to be similarly scrupulous is something only you can decide, in consultation with lawyers. In general, unless there's some specific reason why your project needs full copyright assignment, just go with CLAs; they're easier for everyone.

Dual Licensing Schemes

Some projects try to fund themselves by using a dual licensing scheme, in which the makers of proprietary derivative works pay the copyright holder for the right to use the code, while the code still remains free for use by open source projects. This tends to work better with code libraries than with standalone applications, naturally. The exact terms differ from case to case. Often the license for the free side is the GNU GPL, since it already bars others from incorporating the covered code into their proprietary product without permission from the copyright holder, but sometimes it is a custom license that has the same effect. An example of the former is the MySQL license, described at http://www.mysql.com/company/legal/licensing/; an example of the latter is Sleepycat Software's licensing strategy, described at http://www.sleepycat.com/download/licensinginfo.shtml.

You might be wondering: how can the copyright holder offer proprietary licensing for a mandatory fee if the terms of the GNU GPL stipulate that the code must be available under less restrictive terms? The answer is that the GPL's terms are something the copyright holder imposes on everyone else; the owner is therefore free to decide not to apply those terms to itself. A good way to think of it is to imagine that the copyright owner has an infinite number of copies of the software stored in a bucket. Each time it takes one out of the bucket to send into the world, it can decide what license to put on it: GPL, proprietary, or something else. Its right to do this is not tied to the GPL or any other open source license; it is simply a power granted by copyright law.

The attractiveness of dual licensing is that, at its best, it provides a way for a free software project to get a reliable income stream. Unfortunately, it can also interfere with the normal dynamics of open source projects. The problem is that any volunteer who makes a code contribution is now contributing to two distinct entities: the free version of the code and the proprietary version. While the contributor will be comfortable contributing to the free version, since that's the norm in open source projects, she may feel funny about contributing to someone else's semi-proprietary revenue stream. The awkwardness is exacerbated by the fact that in dual licensing, the copyright owner really needs to gather formal, signed copyright assignments from all contributors, in order to protect itself from a disgruntled contributor later claiming a percentage of royalties from the proprietary stream. The process of collecting these assignment papers means that contributors are starkly confronted with the fact that they are doing work that makes money for someone else.

Not all volunteers will be bothered by this; after all, their contributions go into the open source edition as well, and that may be where their main interest lies. Nevertheless, dual licensing is an instance of the copyright holder assigning itself a special right that others in the project do not have, and is thus bound to raise tensions at some point, at least with some volunteers.

What seems to happen in practice is that companies based on dual licensed software do not have truly egalitarian development communities. They get small-scale bug fixes and cleanup patches from external sources, but end up doing most of the hard work with internal resources. For example, Zack Urlocker, vice president of marketing at MySQL, told me that the company generally ends up hiring the most active volunteers anyway. Thus, although the product itself is open source, licensed under the GPL, its development is more or less controlled by the company, albeit with the (extremely unlikely) possibility that someone truly dissatisfied with the company's handling of the software could fork the project. To what degree this threat preëmptively shapes the company's policies I don't know, but at any rate, MySQL does not seem to be having acceptance problems either in the open source world or beyond.

Patents

Software patents are the lightning rod issue of the moment in free software, because they pose the only real threat against which the free software community cannot defend itself. Copyright and trademark problems can always be gotten around. If part of your code looks like it may infringe on someone else's copyright, you can just rewrite that part. If it turns out someone has a trademark on your project's name, at the very worst you can just rename the project. Although changing names would be a temporary inconvenience, it wouldn't matter in the long run, since the code itself would still do what it always did.

But a patent is a blanket injunction against implementing a certain idea. It doesn't matter who writes the code, nor even what programming language is used. Once someone has accused a free software project of infringing a patent, the project must either stop implementing that particular feature, or face an expensive and time-consuming lawsuit. Since the instigators of such lawsuits are usually corporations with deep pockets—that's who has the resources and inclination to acquire patents in the first place—most free software projects cannot afford the latter possibility, and must capitulate immediately even if they think it highly likely that the patent would be unenforceable in court. To avoid getting into such a situation in the first place, free software projects are starting to code defensively, avoiding patented algorithms in advance even when they are the best or only available solution to a programming problem.[25]

Surveys and anecdotal evidence show that not only the vast majority of open source programmers, but a majority of all programmers, think that software patents should be abolished entirely.[26] Open source programmers tend to feel particularly strongly about it, and may refuse to work on projects that are too closely associated with the collection or enforcement of software patents. If your organization collects software patents, then make it clear, in a public and irrevocable way, that the patents would never be enforced on open source projects, and that they are only to be used as a defense in case some other party initiates an infringement suit against your organization. This is not only the right thing to do, it's also good open source public relations.[27]

Unfortunately, collecting patents for defensive purposes is a rational action. The current patent system, at least in the United States, is by its nature an arms race: if your competitors have acquired a lot of patents, then your best defense is to acquire a lot of patents yourself, so that if you're ever hit with a patent infringement suit you can respond with a similar threat—then the two parties usually sit down and work out a cross-licensing deal so that neither of them has to pay anything, except to their intellectual property lawyers of course.

The harm done to free software by software patents is more insidious than just direct threats to code development, however. Software patents encourage an atmosphere of secrecy among firmware designers, who justifiably worry that by publishing details of their interfaces they will be giving technical help to competitors seeking to slap them with patent infringement suits. This is not just a theoretical danger; it has apparently been happening for a long time in the video card industry, for example. Many video card manufacturers are reluctant to release the detailed programming specifications needed to produce high-performance open source drivers for their cards, thus making it impossible for free operating systems to support those cards to their full potential. Why would the manufacturers do this? It doesn't make sense for them to work against software support; after all, compatibility with more operating systems can only mean more card sales. But it turns out that, behind the design room door, these shops are all violating one another's patents, sometimes knowingly and sometimes accidentally. The patents are so unpredictable and so potentially broad that no card manufacturer can ever be certain it's safe, even after doing a patent search. Thus, manufacturers dare not publish their full interface specifications, since that would make it much easier for competitors to figure out whether any patents are being infringed. (Of course, the nature of this situation is such that you will not find a written admission from a primary source that it is going on; I learned it through a personal communication.)

Some free software licenses have special clauses to combat, or at least discourage, software patents. The GNU GPL, for example, contains this language:

  7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License.  If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all.  For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

[...]

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices.  Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

The Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0) also contains anti-patent requirements. First, it stipulates that anyone distributing code under the license must implicitly include a royalty-free patent license for any patents they might hold that could apply to the code. Second, and most ingeniously, it punishes anyone who initiates a patent infringement claim on the covered work, by automatically terminating their implicit patent license the moment such a claim is made:

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except
as stated in this section) patent license to make, have made, use,
offer to sell, sell, import, and otherwise transfer the Work, where
such license applies only to those patent claims licensable by such
Contributor that are necessarily infringed by their Contribution(s)
alone or by combination of their Contribution(s) with the Work to
which such Contribution(s) was submitted. If You institute patent
litigation against any entity (including a cross-claim or counterclaim
in a lawsuit) alleging that the Work or a Contribution incorporated
within the Work constitutes direct or contributory patent
infringement, then any patent licenses granted to You under this
License for that Work shall terminate as of the date such litigation
is filed.

Although it is useful, both legally and politically, to build patent defenses into free software licenses this way, in the end these steps will not be enough to dispel the chilling effect that the threat of patent lawsuits has on free software. Only changes in the substance or interpretation of international patent law will do that. To learn more about the problem, and how it's being fought, go to http://www.nosoftwarepatents.com/. The Wikipedia article http://en.wikipedia.org/wiki/Software_patent also has a lot of useful information on software patents. I've also written a blog post summarizing the arguments against software patents, at http://www.rants.org/2007/05/01/how-to-tell-that-software-patents-are-a-bad-idea/.

Further Resources

This chapter has only been an introduction to free software licensing issues. Although I hope it contains enough information to get you started on your own open source project, any serious investigation of licensing issues will quickly exhaust what this book can provide. Here is a list of further resources on open source licensing:

  • Understanding Open Source and Free Software Licensing by Andrew M. St. Laurent. Published by O'Reilly Media, first edition August 2004, ISBN: 0-596-00581-4.

    This is a full-length book on open source licensing in all its complexity, including many topics omitted from this chapter. See http://www.oreilly.com/catalog/osfreesoft/ for details.

  • Make Your Open Source Software GPL-Compatible. Or Else. by David A. Wheeler, at http://www.dwheeler.com/essays/gpl-compatible.html.

    This is a detailed and well-written article on why it is important to use a GPL-compatible license even if you don't use the GPL itself. The article also touches on many other licensing questions, and has a high density of excellent links.

  • http://creativecommons.org/

    Creative Commons is an organization that promotes a range of more flexible and liberal copyrights than traditional copyright practice encourages. They offer licenses not just for software, but for text, art, and music as well, all accessible via a user-friendly license selector; some of the licenses are copylefts, some are non-copyleft but still free, others are simply traditional copyrights but with some restrictions relaxed. The Creative Commons web site gives extremely clear explanations of what it's about. If I had to pick one site to demonstrate the broader philosophical implications of the free software movement, this would be it.



[25] Sun Microsystems and IBM have also made at least a gesture at the problem from the other direction, by freeing large numbers of software patents—1600 and 500 respectively—for use by the open source community. I am not a lawyer and thus can't evaluate the real utility of these grants, but even if they are all important patents, and the terms of the grants make them truly free for use by any open source project, it would still be only a drop in the bucket.

[27] For example, Red Hat has pledged that open source projects are safe from its patents; see http://www.redhat.com/legal/patent_policy.html.

Appendix A. Free Version Control Systems

These are all the open source version control systems I was aware of as of mid-2007. The only one I use on a regular basis is Subversion. I have little or no experience with most of these systems, except for Subversion and CVS; the information here is taken from their web sites. See also http://en.wikipedia.org/wiki/List_of_revision_control_software.

CVS has been around for a long time, and many developers are already familiar with it. In its day it was revolutionary: it was the first open source version control system with wide-area network access for developers (as far as I know), and the first to offer anonymous read-only checkouts, which gave new developers an easy way to get involved in projects. CVS versions files only, not directories; it offers branching, tagging, and good client-side performance, but doesn't handle large files or binary files very well. It also does not support atomic commits. [Disclaimer: I was active in CVS development for about five years, before helping to start the Subversion project to replace it.]

Subversion was written first and foremost to be a replacement for CVS—that is, to approach version control in roughly the same way CVS does, but without the problems and feature omissions that most frequently annoy users of CVS. One of Subversion's goals is for people already accustomed to CVS to find the transition to Subversion relatively smooth. There is not space here to go into detail about Subversion's features; see its web site for more information. [Disclaimer: I am involved in Subversion development, and it is the only one of these systems that I use on a regular basis.]

Although it is built on top of Subversion, SVK probably resembles some of the decentralized systems below more than it does Subversion. SVK supports distributed development, local commits, sophisticated change merging, and the ability to mirror trees from non-SVK version control systems. See its web site for details.

Mercurial is a distributed version control system that offers, among other things, "complete cross-indexing of files and changesets; bandwidth and CPU efficient HTTP and SSH sync protocols; arbitrary merging between developer branches; integrated stand-alone web interface; [portability to] UNIX, MacOS X, and Windows" and more (the preceding feature list was paraphrased from the Mercurial web site).

GIT is a project started by Linus Torvalds to manage the Linux kernel source tree. At first GIT was rather narrowly focused on the needs of kernel development, but it has expanded beyond that and is now used by projects other than the Linux kernel. Its home page says it is "...designed to handle very large projects with speed and efficiency; it is used mainly for various open source projects, most notably the Linux kernel. Git falls in the category of distributed source code management tools, similar to e.g. GNU Arch or Monotone (or BitKeeper in the proprietary world). Every Git working directory is a full-fledged repository with full revision tracking capabilities, not dependent on network access or a central server."

Bazaar is still under development. It will be an implementation of the GNU Arch protocol, will retain compatibility with the GNU Arch protocol as it evolves, and will work with the GNU Arch community process for any protocol changes that might be required for user friendliness.

Bazaar-NG — http://bazaar-ng.org/

Bazaar-NG (or bzr) is currently under development by Canonical (http://canonical.com/). It offers a choice between centralized and decentralized work within a single project. For example, when in the office, you can work on a shared central branch; for experimental changes or offline work, you can make a branch on your laptop and merge back in later.

"David's Advanced Revision Control System is yet another replacement for CVS. It is written in Haskell, and has been used on Linux, MacOS X, FreeBSD, OpenBSD and Microsoft Windows. Darcs includes a cgi script, which can be used to view the contents of your repository."

GNU Arch supports both distributed and centralized development. Developers commit their changes to an "archive", which may be local, and the changes can be pushed and pulled to other archives as the managers of those archives see fit. As such a methodology implies, Arch has more sophisticated merge support than CVS. Arch also allows one to easily make branches of archives to which one does not have commit access. This is only a brief summary; see the Arch web pages for details.

"monotone is a free distributed version control system. it provides a simple, single-file transactional version store, with fully disconnected operation and an efficient peer-to-peer synchronization protocol. it understands history-sensitive merging, lightweight branches, integrated code review and 3rd party testing. it uses cryptographic version naming and client-side RSA certificates. it has good internationalization support, has no external dependencies, runs on linux, solaris, OSX, and windows, and is licensed under the GNU GPL."
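"Cryptographic version naming" means, roughly, that a version's identifier is computed from a cryptographic hash of what it contains, so identical content gets the same name everywhere and any change produces a new name. The sketch below is a generic illustration under that assumption, not monotone's actual manifest or revision format: the function `version_id` and its input layout are hypothetical, and SHA-1 is used only because Python's standard `hashlib` module provides it.

```python
import hashlib


def version_id(files: dict, parents: list) -> str:
    """Derive a version name from the content it identifies.

    Hypothetical scheme for illustration only; it does not reproduce
    monotone's real on-disk or network format.
    """
    outer = hashlib.sha1()
    # Sort parents and paths so the same tree and history always hash identically,
    # regardless of the order in which they were supplied.
    for parent in sorted(parents):
        outer.update(f"parent {parent}\n".encode("utf-8"))
    for path in sorted(files):
        # Each file contributes the hash of its bytes, not the bytes themselves.
        file_hash = hashlib.sha1(files[path]).hexdigest()
        outer.update(f"file {path} {file_hash}\n".encode("utf-8"))
    return outer.hexdigest()


tree = {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}
v1 = version_id(tree, parents=[])
# Changing a single byte of one file yields a completely different name.
v2 = version_id({**tree, "README": b"hello!\n"}, parents=[])
print(v1 != v2)  # True
```

Because parent identifiers are fed into the hash, a name in a scheme like this also commits to the history leading up to the version, which is what lets peers in a distributed system compare versions without trusting one another's labels.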

Codeville — http://codeville.org/

"Why yet another version control system? All other version control systems require that you keep careful track of the relationships between branches so as not [to] have to repeatedly merge the same conflicts. Codeville is much more anarchic. It allows you to update from or commit to any repository at any time with no unnecessary re-merges."

"Codeville works by creating an identifier for each change which is done, and remembering the list of all changes which have been applied to each file and the last change which modified each line in each file. When there's a conflict, it checks to see if one of the two sides has already been applied to the other one, and if so makes the other side win automatically. When there's an actual not automatically mergeable version conflict, Codeville behaves in almost exactly the same way as CVS."
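The per-line rule quoted above can be sketched as follows. This is a minimal model under stated assumptions, not Codeville's real code: `LineVersion`, `merge_line`, and `MergeConflict` are hypothetical names, and each side is reduced to the set of change IDs it has applied plus the text and last-modifying change of a single line.

```python
from dataclasses import dataclass


class MergeConflict(Exception):
    """Raised when neither side has seen the other's change to a line."""


@dataclass(frozen=True)
class LineVersion:
    # Hypothetical model, not Codeville's actual data structures:
    # `applied` is the set of change IDs this copy of the file has applied,
    # `text` is the line's content, and `last_change` is the ID of the
    # change that most recently modified this line.
    applied: frozenset
    text: str
    last_change: str


def merge_line(ours: LineVersion, theirs: LineVersion) -> str:
    """Resolve one line following the rule in the quoted description."""
    if ours.last_change == theirs.last_change:
        # Both copies carry the same modification: nothing to merge.
        return ours.text
    if ours.last_change in theirs.applied:
        # Their copy already contains our change and edited the line after it,
        # so their side wins automatically.
        return theirs.text
    if theirs.last_change in ours.applied:
        # Symmetric case: our copy already absorbed their change.
        return ours.text
    # Neither side has seen the other's edit: a genuine conflict.
    raise MergeConflict(
        f"line changed independently by {ours.last_change} and {theirs.last_change}"
    )


if __name__ == "__main__":
    # Change c1 created the line; Alice then made change c2.
    alice = LineVersion(frozenset({"c1", "c2"}), "x = 2", "c2")
    # Bob pulled c2 before making c3, so c2 is in his applied set.
    bob = LineVersion(frozenset({"c1", "c2", "c3"}), "x = 3", "c3")
    print(merge_line(alice, bob))  # x = 3
```

Because Bob's history already records c2, the merge picks his line without asking anyone to re-resolve Alice's edit; only when both sides changed the line independently does the sketch fall back to reporting a conflict, mirroring the CVS-like behavior the quote describes.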

"Vesta is a portable SCM [Software Configuration Management] system targeted at supporting development of software systems of almost any size, from fairly small (under 10,000 source lines) to very large (10,000,000 source lines)."

"Vesta is a mature system. It is the result of over 10 years of research and development at the Compaq/Digital Systems Research Center, and it was in production use by Compaq's Alpha microprocessor group for over two and a half years. The Alpha group had over 150 active developers at two sites thousands of miles apart, on the east and west coasts of the United States. The group used Vesta to manage builds with as much as 130 MB of source data, each producing 1.5 GB of derived data. The builds done at the eastern site in an average day produced about 10-15 GB of derived data, all managed by Vesta. Although Vesta was designed with software development in mind, the Alpha group demonstrated the system's flexibility by using it for hardware development, checking their hardware description language files into Vesta's source code control facility and building simulators and other derived objects with Vesta's builder. The members of the former Alpha group, now a part of Intel, are continuing to use Vesta today in a new microprocessor project."

"Aegis is a transaction-based software configuration management system. It provides a framework within which a team of developers may work on many changes to a program independently, and Aegis coordinates integrating these changes back into the master source of the program, with as little disruption as possible."

CVSNT — http://cvsnt.org/

"CVSNT is an advanced multiplatform version control system. Compatible with the industry standard CVS protocol it now supports many more features. ... CVSNT is Open Source, Free software licensed under the GNU General Public License." Its feature list includes authentication via all standard CVS protocols, plus Windows specific SSPI and Active Directory; secure transport support, via sserver or encrypted SSPI; cross platform (runs in Windows or Unix environments); NT version is fully integrated with Win32 system; MergePoint processing means no more tagging to merge; under active development.

"Meta-CVS is a version control system built around CVS. Although it retains most of the features of CVS, including all of the networking support, it is more capable than CVS, and easier to use." The features listed on Meta-CVS's web site include: directory structure versioning, improved file type handling, simpler and more user-friendly branching and merging, support for symbolic links, property lists attached to versioned data, improved third-party data importing, and easy upgrading from stock CVS.

"OpenCM is designed as a secure, high-integrity replacement for CVS. A list of the key features can be found on the features page. While not as 'feature rich' as CVS, it supports some useful things that CVS lacks. Briefly, OpenCM provides first-class support for renames and configuration, cryptographic authentication and access control, and first-class branches."

"Stellation is an advanced, extensible software configuration management system, originally developed at IBM Research. While Stellation provides all of the standard functions available in any SCM system, it is distinguished by a number of advanced features, such as task-oriented change management, consistent project versioning and lightweight branching, intended to ease the development of software systems by large groups of loosely coordinated developers."

"PRCS, the Project Revision Control System, is the front end to a set of tools that (like CVS) provide a way to deal with sets of files and directories as an entity, preserving coherent versions of the entire set. ... Its purpose is similar to that of SCCS, RCS, and CVS, but (according to its authors, at least), it is much simpler than any of those systems."

ArX is a distributed version control system offering branching and merging features, cryptographic data integrity verification, and the ability to publish archives easily on any HTTP server.

SourceJammer — http://sourcejammer.org/

"SourceJammer is a source control and versioning system written in Java. It consists of a server-side component that maintains the files and version history, and handles check-in, check-out, etc. and other commands; and a client-side component that makes requests of the server and manages the files on the client-side file system."

"A 'modern' system that uses changesets over file revisions and distributed operation rather than centralized control. As long as you have an e-mail account you can use FastCST. For larger distribution you only need an FTP server and/or an HTTP server or use the built in 'serve' command to serve your stuff up directly. All changesets are universally unique and have tons of meta-data so you can reject anything you don't [want] before you try it. Merging is done by comparing a merged changeset against the current directory contents, rather than trying to merge it with another changeset."

Superversion — http://www.superversion.org/

"Superversion is a multi-user distributed version control system based on change sets. It aims to be an industrial-strength, open source alternative to commercial solutions that is equally easy to use (or even easier) and similarly powerful. In fact, intuitive and efficient usability has been one of the top priorities in Superversion's development from the very beginning."

Appendix B. Free Bug Trackers

No matter what bug tracker a project uses, some developers always like to complain about it. This seems to be more true of bug trackers than of any other standard development tool. I think it's because bug trackers are so visual and so interactive that it's easy to imagine the improvements one would make (if one only had the time), and to describe those improvements out loud. Take the inevitable complaints with a grain of salt—many of the trackers below are pretty good.

Throughout these listings, the word "issue" is used to refer to the items the trackers track. But remember that each system may have its own terminology, in which the corresponding term might be "artifact" or "bug" or something else.

Bugzilla is very popular, actively maintained, and seems to make its users pretty happy. I've been using a modified variant of it in my work for four years now, and like it. It's not highly customizable, but in an odd way, that may be one of its features: Bugzilla installations tend to look pretty much the same wherever they are found, which means many developers are already accustomed to its interface and will feel they are in familiar territory.

GNU GNATS is one of the oldest open source bug trackers, and is widely used. Its biggest strengths are interface diversity (it can be used not just through a web browser, but also through email or command-line tools), and plaintext issue storage. The fact that all issue data is stored in text files on disk makes it easier to write custom tools to trawl and parse the data (for example, to generate statistical reports). GNATS can also absorb emails automatically by various means, and add them to the appropriate issues based on patterns in the email headers, which makes logging user/developer conversations very easy.
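Because GNATS keeps each issue as a plain text file, a statistical report can be a few lines of script. The sketch below assumes that each problem report is a file containing a field line such as `>State: open`, following the `>Field:` convention of GNATS problem reports; the directory layout and the names `state_of` and `count_by_state` are illustrative assumptions, not part of GNATS itself.

```python
import os
import re
from collections import Counter

# Assumed GNATS field syntax: a line such as ">State:  open".
STATE_FIELD = re.compile(r"^>State:[ \t]*(\S+)", re.MULTILINE)

def state_of(text):
    """Return the value of the >State: field in one problem report, or None."""
    match = STATE_FIELD.search(text)
    return match.group(1) if match else None

def count_by_state(db_root):
    """Walk db_root and tally the problem reports found there by state."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(db_root):
        for name in filenames:
            with open(os.path.join(dirpath, name), encoding="utf-8",
                      errors="replace") as f:
                state = state_of(f.read())
            if state is not None:  # skip files that are not problem reports
                counts[state] += 1
    return counts
```

Skipping files without a `>State:` line is a cheap way to ignore whatever administrative files a real database directory may also hold.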

RequestTracker (RT) — http://www.bestpractical.com/rt/

RT's web site says "RT is an enterprise-grade ticketing system which enables a group of people to intelligently and efficiently manage tasks, issues, and requests submitted by a community of users," and that about sums it up. RT has a fairly polished web interface, and seems to have a pretty wide installed base. The interface is a bit visually complex, but that becomes less distracting as you get used to it. RT is licensed under the GNU GPL (for some reason, their web site doesn't make this clear).

Trac is a bit more than a bug tracker: it's really an integrated wiki and bug tracking system. It uses wiki linking to connect issues, files, version control changesets, and plain wiki pages. It's fairly simple to set up, and integrates with Subversion (see Appendix A, Free Version Control Systems).
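To see how wiki linking ties things together, here is a rough sketch that scans a commit message or wiki text for references in three shorthand forms Trac recognizes: `#123` for a ticket, and `r456` or `[456]` for a changeset. It only extracts the references; Trac's own link resolver handles many more forms, and `find_links` is a hypothetical name.

```python
import re

# A small subset of Trac's link shorthands (assumption: real TracLinks
# support many more forms, such as wiki:, source: and report links).
TICKET_REF = re.compile(r"(?<![\w&])#(\d+)\b")   # "#123"  -> ticket 123
REVISION_REF = re.compile(r"\br(\d+)\b")          # "r456"  -> changeset 456
BRACKET_REF = re.compile(r"\[(\d+)\]")            # "[456]" -> changeset 456

def find_links(text):
    """Return the ticket and changeset numbers that text refers to."""
    tickets = {int(n) for n in TICKET_REF.findall(text)}
    changesets = {int(n) for n in REVISION_REF.findall(text)}
    changesets |= {int(n) for n in BRACKET_REF.findall(text)}
    return {"tickets": sorted(tickets), "changesets": sorted(changesets)}
```

For instance, a commit message reading "Fixes #12; see r345 and [346]." yields ticket 12 and changesets 345 and 346.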

Roundup is pretty easy to install (only Python 2.1 or higher is required), and simple to use. It has web, email, and command-line interfaces. The issue data templates and web interface are customizable, as is some of its state-transition logic.
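The kind of state-transition rule a tracker like Roundup lets you customize can be pictured as a table of allowed moves. The sketch below is generic Python, not Roundup's own customization API; the status names, the `ALLOWED` table and `change_status` are all hypothetical.

```python
# Hypothetical workflow: each status maps to the statuses it may move to.
ALLOWED = {
    "unread": {"open", "rejected"},
    "open": {"testing", "rejected"},
    "testing": {"open", "resolved"},
    "resolved": {"open"},   # a resolved issue may be reopened
    "rejected": set(),      # terminal state: no further moves
}

class TransitionError(ValueError):
    """Raised when a status change is not permitted by ALLOWED."""

def change_status(current, new):
    """Return the new status if the move is allowed, else raise TransitionError."""
    if new not in ALLOWED.get(current, set()):
        raise TransitionError(f"cannot move from {current!r} to {new!r}")
    return new
```

Keeping the rules in a single table means a project can change its workflow by editing data rather than logic.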

Mantis is a web-based bug tracking system, written in PHP and using a MySQL database for storage. It has the features you'd expect. Personally, I find the web interface clean, intuitive, and easy on the eyes.

Scarab is meant to be a highly customizable, full-featured bug tracker, offering more or less the union of the features offered by other bug trackers: data entry, queries, reports, notifications to interested parties, collaborative accumulation of comments, and dependency tracking.

It is customizable through administrative web pages. You can have multiple "modules" (projects) active in a single Scarab installation. Within a given module, you can create new issue types (defects, enhancements, tasks, support requests, etc.), and add arbitrary attributes, to tune the tracker to your project's specific requirements.
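The per-module customization described above amounts to a small data model: modules own issue types, and each issue type carries its own list of attributes. This dataclass sketch is one illustration of that reading, not Scarab's actual schema; all class, method and field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class IssueType:
    name: str
    attributes: list = field(default_factory=list)  # names of extra fields

@dataclass
class Module:
    name: str
    issue_types: dict = field(default_factory=dict)

    def add_issue_type(self, name, attributes=()):
        """Register a new issue type with its own set of attributes."""
        self.issue_types[name] = IssueType(name, list(attributes))
        return self.issue_types[name]

    def new_issue(self, type_name, **values):
        """Create an issue, rejecting attributes its type does not define."""
        itype = self.issue_types[type_name]
        unknown = set(values) - set(itype.attributes)
        if unknown:
            raise KeyError(f"unknown attributes for {type_name}: {sorted(unknown)}")
        return {"module": self.name, "type": type_name, **values}
```

Several `Module` objects could coexist in one installation, each tuned independently.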

As of late 2004, Scarab was getting close to its 1.0 release.

Debian Bug Tracking System (DBTS) — http://www.chiark.greenend.org.uk/~ian/debbugs/

The Debian Bug Tracking System is unusual in that all input and manipulation of issues is done via email: each issue gets its own dedicated email address. The DBTS scales pretty well: http://bugs.debian.org/ has 277,741 issues, for example.
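In the Debian system, the per-issue address takes the form `NNN@bugs.debian.org`, and variants such as `NNN-done@bugs.debian.org` act on that bug (in this case, closing it). New reports go to `submit@bugs.debian.org` and commands to `control@bugs.debian.org`. The sketch below classifies such addresses; it covers only a few of the suffixes, and `route` is a hypothetical helper, not part of debbugs.

```python
import re

# Per-bug address: NNN@bugs.debian.org, optionally with a suffix such as
# "-done" (close the bug) or "-submitter" (reach the original reporter).
BUG_ADDR = re.compile(r"^(\d+)(?:-(done|submitter))?@bugs\.debian\.org$")

ACTIONS = {None: "follow-up", "done": "close", "submitter": "to-submitter"}

def route(address):
    """Map a recipient address to (action, bug_number or None)."""
    addr = address.strip().lower()
    if addr == "submit@bugs.debian.org":
        return ("new-report", None)
    if addr == "control@bugs.debian.org":
        return ("control-commands", None)
    m = BUG_ADDR.match(addr)
    if not m:
        return ("unknown", None)
    return (ACTIONS[m.group(2)], int(m.group(1)))
```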

Since interaction is done via regular mail clients, an environment which is familiar and easily accessible to most people, the DBTS is good for handling high volumes of incoming reports that need quick classification and response. There are disadvantages too, of course. Developers must invest the time needed to learn the email command system, and users must write their bug reports without a web form to guide them in choosing what information to write. There are tools available to help users send better bug reports, such as the command-line reportbug program or the debbugs-el package for Emacs. But most people won't use these tools; they'll just write email manually, and they may or may not follow the bug reporting guidelines posted by your project.
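Tools such as reportbug mostly spare the user from remembering the expected format. A new Debian report is mailed to `submit@bugs.debian.org` with pseudo-header lines such as `Package:` and `Version:` at the very top of the message body. The sketch below builds, but does not send, such a message with Python's standard `email` library; `build_report` and its parameters are hypothetical.

```python
from email.message import EmailMessage

def build_report(sender, package, version, subject, description, severity=None):
    """Build (but do not send) a bug-report email with debbugs pseudo-headers."""
    lines = [f"Package: {package}", f"Version: {version}"]
    if severity:
        lines.append(f"Severity: {severity}")
    # A blank line separates the pseudo-headers from the free-form report.
    body = "\n".join(lines) + "\n\n" + description.strip() + "\n"
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "submit@bugs.debian.org"
    msg["Subject"] = subject
    msg.set_content(body)
    return msg
```

The resulting message could then be handed to `smtplib` or any mail client; that step is left out here.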

The DBTS has a read-only web interface, for viewing and querying issues.

Trouble-Ticket Trackers

These are more oriented toward help desk ticket tracking than software bug tracking. You'll probably do better with a regular bug tracker, but these are listed for the sake of completeness, and because there could conceivably be unusual projects for which a trouble-ticket system might be more appropriate than a traditional bug tracker.

Bluetail Ticket Tracker (BTT) — http://btt.sourceforge.net/

BTT is somewhere between a standard trouble-ticket tracker and a bug tracker. It offers privacy features that are somewhat unusual among open source bug trackers: users of the system are categorized as Staff, Friend, Customer, or Anonymous, and more or less data is available depending on one's category. It offers some email integration, a command-line interface, and mechanisms for converting emails into tickets. It also has features for maintaining information not associated with any specific ticket, such as internal documentation or FAQs.
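The category-based privacy model can be sketched as an ordered set of access levels, where each ticket field declares the minimum level needed to see it. The ordering Staff > Friend > Customer > Anonymous and the per-field levels below are assumptions for illustration; BTT's real rules may differ, and `visible_view` is a hypothetical name.

```python
from enum import IntEnum

class Level(IntEnum):
    ANONYMOUS = 0
    CUSTOMER = 1
    FRIEND = 2
    STAFF = 3

# Hypothetical minimum level required to see each ticket field.
FIELD_LEVEL = {
    "title": Level.ANONYMOUS,
    "status": Level.ANONYMOUS,
    "description": Level.CUSTOMER,
    "customer_contact": Level.FRIEND,
    "internal_notes": Level.STAFF,
}

def visible_view(ticket, viewer):
    """Return only the fields of ticket that the viewer's level allows."""
    # Unlisted fields default to STAFF, i.e. hidden unless explicitly allowed.
    return {k: v for k, v in ticket.items()
            if viewer >= FIELD_LEVEL.get(k, Level.STAFF)}
```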

Appendix C. Why Should I Care What Color the Bikeshed Is?

You shouldn't; it doesn't really matter, and you have better things to spend your time on.

Poul-Henning Kamp's famous "bikeshed" post (an excerpt from which appears in Chapter 6, Communications) is an eloquent disquisition on what tends to go wrong in group discussions. It is reprinted here with his permission. The original URL is http://www.freebsd.org/cgi/getmsg.cgi?fetch=506636+517178+/usr/local/www/db/text/1999/freebsd-hackers/19991003.freebsd-hackers.

Subject: A bike shed (any colour will do) on greener grass...
From: Poul-Henning Kamp <phk@freebsd.org>
Date: Sat, 02 Oct 1999 16:14:10 +0200
Message-ID: <18238.938873650@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
Bcc: Blind Distribution List: ;
MIME-Version: 1.0


[bcc'ed to committers, hackers]

My last pamphlet was sufficiently well received that I was not
scared away from sending another one, and today I have the time
and inclination to do so.

I've had a little trouble with deciding on the right distribution
of this kind of stuff, this time it is bcc'ed to committers and
hackers, that is probably the best I can do.  I'm not subscribed
to hackers myself but more on that later.

The thing which have triggered me this time is the "sleep(1) should
do fractional seconds" thread, which have pestered our lives for
many days now, it's probably already a couple of weeks, I can't
even be bothered to check.

To those of you who have missed this particular thread: Congratulations.

It was a proposal to make sleep(1) DTRT if given a non-integer
argument that set this particular grass-fire off.  I'm not going
to say anymore about it than that, because it is a much smaller
item than one would expect from the length of the thread, and it
has already received far more attention than some of the *problems*
we have around here.

The sleep(1) saga is the most blatant example of a bike shed
discussion we have had ever in FreeBSD.  The proposal was well
thought out, we would gain compatibility with OpenBSD and NetBSD,
and still be fully compatible with any code anyone ever wrote.

Yet so many objections, proposals and changes were raised and
launched that one would think the change would have plugged all
the holes in swiss cheese or changed the taste of Coca Cola or
something similar serious.

"What is it about this bike shed ?" Some of you have asked me.

It's a long story, or rather it's an old story, but it is quite
short actually.  C. Northcote Parkinson wrote a book in the early
1960'ies, called "Parkinson's Law", which contains a lot of insight
into the dynamics of management.

You can find it on Amazon, and maybe also in your dads book-shelf,
it is well worth its price and the time to read it either way,
if you like Dilbert, you'll like Parkinson.

Somebody recently told me that he had read it and found that only
about 50% of it applied these days.  That is pretty darn good I
would say, many of the modern management books have hit-rates a
lot lower than that, and this one is 35+ years old.

In the specific example involving the bike shed, the other vital
component is an atomic power-plant, I guess that illustrates the
age of the book.

Parkinson shows how you can go in to the board of directors and
get approval for building a multi-million or even billion dollar
atomic power plant, but if you want to build a bike shed you will
be tangled up in endless discussions.

Parkinson explains that this is because an atomic plant is so vast,
so expensive and so complicated that people cannot grasp it, and
rather than try, they fall back on the assumption that somebody
else checked all the details before it got this far.   Richard P.
Feynmann gives a couple of interesting, and very much to the point,
examples relating to Los Alamos in his books.

A bike shed on the other hand.  Anyone can build one of those over
a weekend, and still have time to watch the game on TV.  So no
matter how well prepared, no matter how reasonable you are with
your proposal, somebody will seize the chance to show that he is
doing his job, that he is paying attention, that he is *here*.

In Denmark we call it "setting your fingerprint".  It is about
personal pride and prestige, it is about being able to point
somewhere and say "There!  *I* did that."  It is a strong trait in
politicians, but present in most people given the chance.  Just
think about footsteps in wet cement.

I bow my head in respect to the original proposer because he stuck
to his guns through this carpet blanking from the peanut gallery,
and the change is in our tree today.  I would have turned my back
and walked away after less than a handful of messages in that
thread.

And that brings me, as I promised earlier, to why I am not subscribed
to -hackers:

I un-subscribed from -hackers several years ago, because I could
not keep up with the email load.  Since then I have dropped off
several other lists as well for the very same reason.

And I still get a lot of email.  A lot of it gets routed to /dev/null
by filters:  People like [omitted] will never make it onto my
screen, commits to documents in languages I don't understand
likewise, commits to ports as such.  All these things and more go
the winter way without me ever even knowing about it.

But despite these sharp teeth under my mailbox I still get too much
email.

This is where the greener grass comes into the picture:

I wish we could reduce the amount of noise in our lists and I wish
we could let people build a bike shed every so often, and I don't
really care what colour they paint it.

The first of these wishes is about being civil, sensitive and 
intelligent in our use of email.

If I could concisely and precisely define a set of criteria for
when one should and when one should not reply to an email so that
everybody would agree and abide by it, I would be a happy man, but
I am too wise to even attempt that.

But let me suggest a few pop-up windows I would like to see
mail-programs implement whenever people send or reply to email
to the lists they want me to subscribe to:

      +------------------------------------------------------------+
      | Your email is about to be sent to several hundred thousand |
      | people, who will have to spend at least 10 seconds reading |
      | it before they can decide if it is interesting.  At least  |
      | two man-weeks will be spent reading your email.  Many of   |
      | the recipients will have to pay to download your email.    |
      |                                                            |
      | Are you absolutely sure that your email is of sufficient   |
      | importance to bother all these people ?                    |
      |                                                            |
      |                  [YES]  [REVISE]  [CANCEL]                 |
      +------------------------------------------------------------+

      +------------------------------------------------------------+
      | Warning:  You have not read all emails in this thread yet. |
      | Somebody else may already have said what you are about to  |
      | say in your reply.  Please read the entire thread before   |
      | replying to any email in it.                               |
      |                                                            |
      |                      [CANCEL]                              |
      +------------------------------------------------------------+

      +------------------------------------------------------------+
      | Warning:  Your mail program have not even shown you the    |
      | entire message yet.  Logically it follows that you cannot  |
      | possibly have read it all and understood it.               |
      |                                                            |
      | It is not polite to reply to an email until you have       |
      | read it all and thought about it.                          |
      |                                                            |
      | A cool off timer for this thread will prevent you from     |
      | replying to any email in this thread for the next one hour |
      |                                                            |
      |                       [Cancel]                             |
      +------------------------------------------------------------+

      +------------------------------------------------------------+
      | You composed this email at a rate of more than N.NN cps    |
      | It is generally not possible to think and type at a rate   |
      | faster than A.AA cps, and therefore you reply is likely to |
      | incoherent, badly thought out and/or emotional.            |
      |                                                            |
      | A cool off timer will prevent you from sending any email   |
      | for the next one hour.                                     |
      |                                                            |
      |                       [Cancel]                             |
      +------------------------------------------------------------+

The second part of my wish is more emotional.  Obviously, the
capacities we had manning the unfriendly fire in the sleep(1)
thread, despite their many years with the project, never cared
enough to do this tiny deed, so why are they suddenly so enflamed
by somebody else so much their junior doing it ?

I wish I knew.

I do know that reasoning will have no power to stop such "reactionaire
conservatism".  It may be that these people are frustrated about
their own lack of tangible contribution lately or it may be a bad
case of "we're old and grumpy, WE know how youth should behave".

Either way it is very unproductive for the project, but I have no
suggestions for how to stop it.  The best I can suggest is to refrain
from fuelling the monsters that lurk in the mailing lists:  Ignore
them, don't answer them, forget they're there.

I hope we can get a stronger and broader base of contributors in
FreeBSD, and I hope we together can prevent the grumpy old men
and the [omitted]s of the world from chewing them up, spitting
them out and scaring them away before they ever get a leg to the 
ground.

For the people who have been lurking out there, scared away from
participating by the gargoyles:  I can only apologise and encourage
you to try anyway, this is not the way I want the environment in
the project to be.

Poul-Henning

Appendix D. Example Instructions for Reporting Bugs

This is a lightly-edited copy of the Subversion project's online instructions to new users on how to report bugs. See the section titled "Treat Every User as a Potential Volunteer" in Chapter 8, Managing Volunteers, for why it is important that a project have such instructions. The original document is located at http://svn.collab.net/repos/svn/trunk/www/bugs.html.

                       Reporting Bugs in Subversion

This document tells how and where to report bugs. (It is not a list of
all outstanding bugs; you can get that from the issue tracker instead.)

Where To Report A Bug
---------------------

    * If the bug is in Subversion itself, send mail to
      users@subversion.tigris.org. Once it's confirmed as a bug,
      someone, possibly you, can enter it into the issue tracker. (Or
      if you're pretty sure about the bug, go ahead and post directly
      to our development list, dev@subversion.tigris.org. But if
      you're not sure, it's better to post to users@ first; someone
      there can tell you whether the behavior you encountered is
      expected or not.)

    * If the bug is in the APR library, please report it to both of
      these mailing lists: dev@apr.apache.org, dev@subversion.tigris.org.

    * If the bug is in the Neon HTTP library, please report it to:
      neon@webdav.org, dev@subversion.tigris.org.

    * If the bug is in Apache HTTPD 2.0, please report it to both of
      these mailing lists: dev@httpd.apache.org,
      dev@subversion.tigris.org. The Apache httpd developer mailing
      list is high-traffic, so your bug report post has the
      possibility to be overlooked. You may also file a bug report at
      http://httpd.apache.org/bug_report.html.

    * If the bug is in your rug, please give it a hug and keep it snug.

How To Report A Bug
-------------------

First, make sure it's a bug. If Subversion does not behave the way you
expect, look in the documentation and mailing list archives for
evidence that it should behave the way you expect. Of course, if it's
a common-sense thing, like Subversion just destroyed your data and
caused smoke to pour out of your monitor, then you can trust your
judgement. But if you're not sure, go ahead and ask on the users
mailing list first, users@subversion.tigris.org, or ask in IRC,
irc.freenode.net, channel #svn.

Once you've established that it's a bug, the most important thing you
can do is come up with a simple description and reproduction
recipe. For example, if the bug, as you initially found it, involves
five files over ten commits, try to make it happen with just one file
and one commit. The simpler the reproduction recipe, the more likely a
developer is to successfully reproduce the bug and fix it.

When you write up the reproduction recipe, don't just write a prose
description of what you did to make the bug happen. Instead, give a
literal transcript of the exact series of commands you ran, and their
output. Use cut-and-paste to do this. If there are files involved, be
sure to include the names of the files, and even their content if you
think it might be relevant. The very best thing is to package your
reproduction recipe as a script; that helps us a lot.
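A reproduction script can also produce the literal transcript asked for above. This minimal sketch, assuming Python 3.7 or later, runs each command in order, echoes it after a `$ ` prompt, and captures its output; `run_recipe` is a hypothetical helper, not a Subversion tool.

```python
import shlex
import subprocess

def run_recipe(commands):
    """Run each command in order and return a literal transcript of the session.

    Each command is a list of arguments, e.g. ["svn", "status"]; stderr is
    merged into stdout so error messages appear where they happened.
    """
    lines = []
    for argv in commands:
        lines.append("$ " + " ".join(shlex.quote(arg) for arg in argv))
        result = subprocess.run(argv, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        if result.stdout:
            lines.append(result.stdout.rstrip("\n"))
        if result.returncode != 0:
            lines.append(f"[exit status {result.returncode}]")
    return "\n".join(lines)
```

For example, `run_recipe([["svnadmin", "create", "/tmp/repos"], ["svn", "checkout", "file:///tmp/repos", "/tmp/wc"]])` would record both commands and whatever they print; substitute the steps of your own recipe.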

Quick sanity check: you *are* running the most recent version of
Subversion, right? :-) Possibly the bug has already been fixed; you
should test your reproduction recipe against the most recent
Subversion development tree.

In addition to the reproduction recipe, we'll also need a complete
description of the environment in which you reproduced the bug. That
means:

    * Your operating system
    * The release and/or revision of Subversion
    * The compiler and configuration options you built Subversion with
    * Any private modifications you made to your Subversion
    * The version of Berkeley DB you're running Subversion with, if any
    * Anything else that could possibly be relevant. Err on the side
      of too much information, rather than too little.
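Part of the environment description can be gathered automatically. This sketch records the operating system with Python's standard `platform` module and asks the installed client for its version with `svn --version --quiet`; compiler options, private modifications and the Berkeley DB version still have to be added by hand. `describe_environment` is a hypothetical name.

```python
import platform
import subprocess

def describe_environment(svn_cmd="svn"):
    """Return a dict describing the OS and the svn client version, if found."""
    env = {"os": platform.platform()}
    try:
        result = subprocess.run([svn_cmd, "--version", "--quiet"],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                text=True)
        env["svn"] = result.stdout.strip() or "unknown"
    except FileNotFoundError:
        # The client is not on the PATH; record that instead of failing.
        env["svn"] = "not found"
    return env
```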

Once you have all this, you're ready to write the report. Start out
with a clear description of what the bug is. That is, say how you
expected Subversion to behave, and contrast that with how it actually
behaved. While the bug may seem obvious to you, it may not be so
obvious to someone else, so it's best to avoid a guessing game. Follow
that with the environment description, and the reproduction recipe. If
you also want to include speculation as to the cause, and even a patch
to fix the bug, that's great — see
http://subversion.apache.org/docs/community-guide/#patches for
instructions on sending patches.
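The ordering recommended above (expected versus actual behavior, then the environment, then the reproduction recipe, then any speculation) can be captured in a small template function. This is only a sketch; the section titles and the name `format_report` are assumptions, not a format the Subversion project requires.

```python
def format_report(expected, actual, environment, recipe, speculation=None):
    """Assemble a plain-text bug report in the recommended order."""
    sections = [
        ("Expected behavior", expected),
        ("Actual behavior", actual),
        ("Environment", "\n".join(f"{k}: {v}" for k, v in environment.items())),
        ("Reproduction recipe", recipe),
    ]
    if speculation:
        sections.append(("Possible cause", speculation))
    parts = []
    for title, text in sections:
        # Underline each title, matching the plain-text style of this page.
        parts.append(f"{title}\n{'-' * len(title)}\n{text.strip()}\n")
    return "\n".join(parts)
```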

Post all of this information to dev@subversion.tigris.org, or if you
have already been there and been asked to file an issue, then go to
the Issue Tracker and follow the instructions there.

Thanks. We know it's a lot of work to file an effective bug report,
but a good report can save hours of a developer's time, and make the
bug much more likely to get fixed.

Appendix E. Copyright

This work is licensed under the Creative Commons
Attribution-ShareAlike License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to
Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305,
USA.  A summary of the license is given below, followed by the full
legal text.  If you wish to distribute some or all of this work under
different terms, please contact the author, Karl Fogel
<kfogel@red-bean.com>.

-*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*-

You are free:

    * to Share — to copy, distribute and transmit the work
    * to Remix — to adapt the work

Under the following conditions:

    * Attribution. You must attribute the work in the manner specified
      by the author or licensor (but not in any way that suggests that
      they endorse you or your use of the work).

    * Share Alike. If you alter, transform, or build upon this work,
      you may distribute the resulting work only under the same,
      similar or a compatible license.

    * For any reuse or distribution, you must make clear to others the
      license terms of this work.  The best way to do this is with a
      link to this web page.

    * Any of the above conditions can be waived if you get permission
      from the copyright holder.

    * Nothing in this license impairs or restricts the author's moral
      rights.

-*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*-

   Creative Commons Legal Code: Attribution-ShareAlike 3.0 Unported

CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
LEGAL SERVICES. DISTRIBUTION OF THIS LICENSE DOES NOT CREATE AN
ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
REGARDING THE INFORMATION PROVIDED, AND DISCLAIMS LIABILITY FOR
DAMAGES RESULTING FROM ITS USE.

License:

THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS
CREATIVE COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS
PROTECTED BY COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE
WORK OTHER THAN AS AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS
PROHIBITED.

BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND
AGREE TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS
LICENSE MAY BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU
THE RIGHTS CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH
TERMS AND CONDITIONS.

1. Definitions

   a. "Adaptation" means a work based upon the Work, or upon the Work
      and other pre-existing works, such as a translation, adaptation,
      derivative work, arrangement of music or other alterations of a
      literary or artistic work, or phonogram or performance and
      includes cinematographic adaptations or any other form in which
      the Work may be recast, transformed, or adapted including in any
      form recognizably derived from the original, except that a work
      that constitutes a Collection will not be considered an
      Adaptation for the purpose of this License. For the avoidance of
      doubt, where the Work is a musical work, performance or
      phonogram, the synchronization of the Work in timed-relation
      with a moving image ("synching") will be considered an
      Adaptation for the purpose of this License.

   b. "Collection" means a collection of literary or artistic works,
      such as encyclopedias and anthologies, or performances,
      phonograms or broadcasts, or other works or subject matter other
      than works listed in Section 1(f) below, which, by reason of the
      selection and arrangement of their contents, constitute
      intellectual creations, in which the Work is included in its
      entirety in unmodified form along with one or more other
      contributions, each constituting separate and independent works
      in themselves, which together are assembled into a collective
      whole. A work that constitutes a Collection will not be
      considered an Adaptation (as defined below) for the purposes of
      this License.

   c. "Creative Commons Compatible License" means a license that is
      listed at http://creativecommons.org/compatiblelicenses that has
      been approved by Creative Commons as being essentially
      equivalent to this License, including, at a minimum, because
      that license: (i) contains terms that have the same purpose,
      meaning and effect as the License Elements of this License; and,
      (ii) explicitly permits the relicensing of adaptations of works
      made available under that license under this License or a
      Creative Commons jurisdiction license with the same License
      Elements as this License.

   d. "Distribute" means to make available to the public the original
      and copies of the Work or Adaptation, as appropriate, through
      sale or other transfer of ownership.

   e. "License Elements" means the following high-level license
      attributes as selected by Licensor and indicated in the title of
      this License: Attribution, ShareAlike.

   f. "Licensor" means the individual, individuals, entity or entities
      that offer(s) the Work under the terms of this License.

   g. "Original Author" means, in the case of a literary or artistic
      work, the individual, individuals, entity or entities who
      created the Work or if no individual or entity can be
      identified, the publisher; and in addition (i) in the case of a
      performance the actors, singers, musicians, dancers, and other
      persons who act, sing, deliver, declaim, play in, interpret or
      otherwise perform literary or artistic works or expressions of
      folklore; (ii) in the case of a phonogram the producer being the
      person or legal entity who first fixes the sounds of a
      performance or other sounds; and, (iii) in the case of
      broadcasts, the organization that transmits the broadcast.

   h. "Work" means the literary and/or artistic work offered under the
      terms of this License including without limitation any
      production in the literary, scientific and artistic domain,
      whatever may be the mode or form of its expression including
      digital form, such as a book, pamphlet and other writing; a
      lecture, address, sermon or other work of the same nature; a
      dramatic or dramatico-musical work; a choreographic work or
      entertainment in dumb show; a musical composition with or
      without words; a cinematographic work to which are assimilated
      works expressed by a process analogous to cinematography; a work
      of drawing, painting, architecture, sculpture, engraving or
      lithography; a photographic work to which are assimilated works
      expressed by a process analogous to photography; a work of
      applied art; an illustration, map, plan, sketch or
      three-dimensional work relative to geography, topography,
      architecture or science; a performance; a broadcast; a
      phonogram; a compilation of data to the extent it is protected
      as a copyrightable work; or a work performed by a variety or
      circus performer to the extent it is not otherwise considered a
      literary or artistic work.

   i. "You" means an individual or entity exercising rights under this
      License who has not previously violated the terms of this
      License with respect to the Work, or who has received express
      permission from the Licensor to exercise rights under this
      License despite a previous violation.

   j. "Publicly Perform" means to perform public recitations of the
      Work and to communicate to the public those public recitations,
      by any means or process, including by wire or wireless means or
      public digital performances; to make available to the public
      Works in such a way that members of the public may access these
      Works from a place and at a place individually chosen by them;
      to perform the Work to the public by any means or process and
      the communication to the public of the performances of the Work,
      including by public digital performance; to broadcast and
      rebroadcast the Work by any means including signs, sounds or
      images.

   k. "Reproduce" means to make copies of the Work by any means
      including without limitation by sound or visual recordings and
      the right of fixation and reproducing fixations of the Work,
      including storage of a protected performance or phonogram in
      digital form or other electronic medium.

2. Fair Dealing Rights.

   Nothing in this License is intended to reduce, limit, or restrict
   any uses free from copyright or rights arising from limitations or
   exceptions that are provided for in connection with the copyright
   protection under copyright law or other applicable laws.

3. License Grant.

   Subject to the terms and conditions of this License, Licensor
   hereby grants You a worldwide, royalty-free, non-exclusive,
   perpetual (for the duration of the applicable copyright) license to
   exercise the rights in the Work as stated below:

   a. to Reproduce the Work, to incorporate the Work into one or more
      Collections, and to Reproduce the Work as incorporated in the
      Collections;

   b. to create and Reproduce Adaptations provided that any such
      Adaptation, including any translation in any medium, takes
      reasonable steps to clearly label, demarcate or otherwise
      identify that changes were made to the original Work. For
      example, a translation could be marked "The original work was
      translated from English to Spanish," or a modification could
      indicate "The original work has been modified.";

   c. to Distribute and Publicly Perform the Work including as
      incorporated in Collections; and,

   d. to Distribute and Publicly Perform Adaptations.

   e. For the avoidance of doubt:

         i. Non-waivable Compulsory License Schemes. In those
            jurisdictions in which the right to collect royalties
            through any statutory or compulsory licensing scheme
            cannot be waived, the Licensor reserves the exclusive
            right to collect such royalties for any exercise by You of
            the rights granted under this License;

        ii. Waivable Compulsory License Schemes. In those
            jurisdictions in which the right to collect royalties
            through any statutory or compulsory licensing scheme can
            be waived, the Licensor waives the exclusive right to
            collect such royalties for any exercise by You of the
            rights granted under this License; and,

       iii. Voluntary License Schemes. The Licensor waives the right
            to collect royalties, whether individually or, in the
            event that the Licensor is a member of a collecting
            society that administers voluntary licensing schemes, via
            that society, from any exercise by You of the rights
            granted under this License.

   The above rights may be exercised in all media and formats whether
   now known or hereafter devised. The above rights include the right
   to make such modifications as are technically necessary to exercise
   the rights in other media and formats. Subject to Section 8(f), all
   rights not expressly granted by Licensor are hereby reserved.

4. Restrictions. 

   The license granted in Section 3 above is expressly made subject to
   and limited by the following restrictions:

   a. You may Distribute or Publicly Perform the Work only under the
      terms of this License. You must include a copy of, or the
      Uniform Resource Identifier (URI) for, this License with every
      copy of the Work You Distribute or Publicly Perform. You may not
      offer or impose any terms on the Work that restrict the terms of
      this License or the ability of the recipient of the Work to
      exercise the rights granted to that recipient under the terms of
      the License. You may not sublicense the Work. You must keep
      intact all notices that refer to this License and to the
      disclaimer of warranties with every copy of the Work You
      Distribute or Publicly Perform. When You Distribute or Publicly
      Perform the Work, You may not impose any effective technological
      measures on the Work that restrict the ability of a recipient of
      the Work from You to exercise the rights granted to that
      recipient under the terms of the License. This Section 4(a)
      applies to the Work as incorporated in a Collection, but this
      does not require the Collection apart from the Work itself to be
      made subject to the terms of this License. If You create a
      Collection, upon notice from any Licensor You must, to the
      extent practicable, remove from the Collection any credit as
      required by Section 4(c), as requested. If You create an
      Adaptation, upon notice from any Licensor You must, to the
      extent practicable, remove from the Adaptation any credit as
      required by Section 4(c), as requested.

   b. You may Distribute or Publicly Perform an Adaptation only under
      the terms of: (i) this License; (ii) a later version of this
      License with the same License Elements as this License; (iii) a
      Creative Commons jurisdiction license (either this or a later
      license version) that contains the same License Elements as this
      License (e.g., Attribution-ShareAlike 3.0 US); (iv) a Creative
      Commons Compatible License. If you license the Adaptation under
      one of the licenses mentioned in (iv), you must comply with the
      terms of that license. If you license the Adaptation under the
      terms of any of the licenses mentioned in (i), (ii) or (iii)
      (the "Applicable License"), you must comply with the terms of
      the Applicable License generally and the following provisions:
      (I) You must include a copy of, or the URI for, the Applicable
      License with every copy of each Adaptation You Distribute or
      Publicly Perform; (II) You may not offer or impose any terms on
      the Adaptation that restrict the terms of the Applicable License
      or the ability of the recipient of the Adaptation to exercise
      the rights granted to that recipient under the terms of the
      Applicable License; (III) You must keep intact all notices that
      refer to the Applicable License and to the disclaimer of
      warranties with every copy of the Work as included in the
      Adaptation You Distribute or Publicly Perform; (IV) when You
      Distribute or Publicly Perform the Adaptation, You may not
      impose any effective technological measures on the Adaptation
      that restrict the ability of a recipient of the Adaptation from
      You to exercise the rights granted to that recipient under the
      terms of the Applicable License. This Section 4(b) applies to
      the Adaptation as incorporated in a Collection, but this does
      not require the Collection apart from the Adaptation itself to
      be made subject to the terms of the Applicable License.

   c. If You Distribute, or Publicly Perform the Work or any
      Adaptations or Collections, You must, unless a request has been
      made pursuant to Section 4(a), keep intact all copyright notices
      for the Work and provide, reasonable to the medium or means You
      are utilizing: (i) the name of the Original Author (or
      pseudonym, if applicable) if supplied, and/or if the Original
      Author and/or Licensor designate another party or parties (e.g.,
      a sponsor institute, publishing entity, journal) for attribution
      ("Attribution Parties") in Licensor's copyright notice, terms of
      service or by other reasonable means, the name of such party or
      parties; (ii) the title of the Work if supplied; (iii) to the
      extent reasonably practicable, the URI, if any, that Licensor
      specifies to be associated with the Work, unless such URI does
      not refer to the copyright notice or licensing information for
      the Work; and (iv), consistent with Section 3(b), in the case
      of an Adaptation, a credit identifying the use of the Work in
      the Adaptation (e.g., "French translation of the Work by
      Original Author," or "Screenplay based on original Work by
      Original Author"). The credit required by this Section 4(c) may
      be implemented in any reasonable manner; provided, however, that
      in the case of an Adaptation or Collection, at a minimum such
      credit will appear, if a credit for all contributing authors of
      the Adaptation or Collection appears, then as part of these
      credits and in a manner at least as prominent as the credits for
      the other contributing authors. For the avoidance of doubt, You
      may only use the credit required by this Section for the purpose
      of attribution in the manner set out above and, by exercising
      Your rights under this License, You may not implicitly or
      explicitly assert or imply any connection with, sponsorship or
      endorsement by the Original Author, Licensor and/or Attribution
      Parties, as appropriate, of You or Your use of the Work, without
      the separate, express prior written permission of the Original
      Author, Licensor and/or Attribution Parties.

   d. Except as otherwise agreed in writing by the Licensor or as may
      be otherwise permitted by applicable law, if You Reproduce,
      Distribute or Publicly Perform the Work either by itself or as
      part of any Adaptations or Collections, You must not distort,
      mutilate, modify or take other derogatory action in relation to
      the Work which would be prejudicial to the Original Author's
      honor or reputation. Licensor agrees that in those jurisdictions
      (e.g. Japan), in which any exercise of the right granted in
      Section 3(b) of this License (the right to make Adaptations)
      would be deemed to be a distortion, mutilation, modification or
      other derogatory action prejudicial to the Original Author's
      honor and reputation, the Licensor will waive or not assert, as
      appropriate, this Section, to the fullest extent permitted by
      the applicable national law, to enable You to reasonably
      exercise Your right under Section 3(b) of this License (right to
      make Adaptations) but not otherwise.

5. Representations, Warranties and Disclaimer

UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING,
LICENSOR OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR
WARRANTIES OF ANY KIND CONCERNING THE WORK, EXPRESS, IMPLIED,
STATUTORY OR OTHERWISE, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF
TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY,
OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED
WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU.

6. Limitation on Liability.

EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE LAW, IN NO EVENT WILL
LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR ANY SPECIAL,
INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES ARISING OUT
OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS BEEN
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

7. Termination

   a. This License and the rights granted hereunder will terminate
      automatically upon any breach by You of the terms of this
      License. Individuals or entities who have received Adaptations
      or Collections from You under this License, however, will not
      have their licenses terminated provided such individuals or
      entities remain in full compliance with those licenses. Sections
      1, 2, 5, 6, 7, and 8 will survive any termination of this
      License.

   b. Subject to the above terms and conditions, the license granted
      here is perpetual (for the duration of the applicable copyright
      in the Work). Notwithstanding the above, Licensor reserves the
      right to release the Work under different license terms or to
      stop distributing the Work at any time; provided, however that
      any such election will not serve to withdraw this License (or
      any other license that has been, or is required to be, granted
      under the terms of this License), and this License will continue
      in full force and effect unless terminated as stated above.

8. Miscellaneous

   a. Each time You Distribute or Publicly Perform the Work or a
      Collection, the Licensor offers to the recipient a license to
      the Work on the same terms and conditions as the license granted
      to You under this License.

   b. Each time You Distribute or Publicly Perform an Adaptation,
      Licensor offers to the recipient a license to the original Work
      on the same terms and conditions as the license granted to You
      under this License.

   c. If any provision of this License is invalid or unenforceable
      under applicable law, it shall not affect the validity or
      enforceability of the remainder of the terms of this License,
      and without further action by the parties to this agreement,
      such provision shall be reformed to the minimum extent necessary
      to make such provision valid and enforceable.

   d. No term or provision of this License shall be deemed waived and
      no breach consented to unless such waiver or consent shall be in
      writing and signed by the party to be charged with such waiver
      or consent.

   e. This License constitutes the entire agreement between the
      parties with respect to the Work licensed here. There are no
      understandings, agreements or representations with respect to
      the Work not specified here. Licensor shall not be bound by any
      additional provisions that may appear in any communication from
      You. This License may not be modified without the mutual written
      agreement of the Licensor and You.

   f. The rights granted under, and the subject matter referenced, in
      this License were drafted utilizing the terminology of the Berne
      Convention for the Protection of Literary and Artistic Works (as
      amended on September 28, 1979), the Rome Convention of 1961, the
      WIPO Copyright Treaty of 1996, the WIPO Performances and
      Phonograms Treaty of 1996 and the Universal Copyright Convention
      (as revised on July 24, 1971). These rights and subject matter
      take effect in the relevant jurisdiction in which the License
      terms are sought to be enforced according to the corresponding
      provisions of the implementation of those treaty provisions in
      the applicable national law. If the standard suite of rights
      granted under applicable copyright law includes additional
      rights not granted under this License, such additional rights
      are deemed to be included in the License; this License is not
      intended to restrict the license of any rights under applicable
      law.

Creative Commons Notice

Creative Commons is not a party to this License, and makes no warranty
whatsoever in connection with the Work. Creative Commons will not be
liable to You or any party on any legal theory for any damages
whatsoever, including without limitation any general, special,
incidental or consequential damages arising in connection to this
license. Notwithstanding the foregoing two (2) sentences, if Creative
Commons has expressly identified itself as the Licensor hereunder, it
shall have all rights and obligations of Licensor.

Except for the limited purpose of indicating to the public that the
Work is licensed under the CCPL, Creative Commons does not authorize
the use by either party of the trademark "Creative Commons" or any
related trademark or logo of Creative Commons without the prior
written consent of Creative Commons. Any permitted use will be in
compliance with Creative Commons' then-current trademark usage
guidelines, as may be published on its website or otherwise made
available upon request from time to time. For the avoidance of doubt,
this trademark restriction does not form part of the License.

Creative Commons may be contacted at http://creativecommons.org/.