You may have heard the argument, or seen the poster, in software development organizations: Reuse the Code, Do Not Re-invent the Wheel. Using off-the-shelf code to accelerate software development and reduce costs is nothing new. If it is available, and does the job, then use it. Open source software is probably the ultimate manifestation of code reuse, widely recognized in software organizations. Without open source, many of the technology phenomena of the last 15 years, from social networking to web applications to mobile communications and more, would not be with us in their current form.
With the accelerated use of third-party software comes the task of managing the list of components of a software project (the Bill of Materials, or BOM). Tracking third-party and open source components in a software project helps manage the quality and security aspects of the project. It also ensures compliance with the terms specified in the license.
A Real-Life Story of Compliance
Our software assessment team regularly carries out software audits for companies. Most of the audits focus on ensuring that open source software license obligations are in line with the business model and that those obligations are met. To carry out IP due diligence on a software project that contains anything more than tens of files, code portfolios are scanned using automated code-scanning tools. (Most projects run between five and 100,000 files.) The automated scan results are then reviewed manually to confirm the findings or to fill in missing information.
Confirming those detected open source projects with licenses is quick and easy. Confirming those pieces of code that are proprietary and that sometimes may have clear headers is also painless. Normally the same authors (developers) show up in the header, or a quick conversation with the engineering group resolves the identity of the code.
By far the most time-consuming aspect of an audit project is publicly available code that carries no license but has clear copyright ownership (with a copyright statement in the file, or a match to open source code that is held in an open source reference database). In most jurisdictions, you do not need to put a copyright statement on what you write: If you created it, you own it. Nobody else can use your code unless you explicitly give them permission.
And here is where the trouble starts.
When we identify such code (and we almost always find unlicensed, publicly available code in a portfolio), our clients then have to track down the author of the code and get permission in writing. Assuming that the author or copyright owners of the original code can be tracked down (a big assumption), the process becomes a patchwork of detective work intermixed with licensing and corporate decision-making. We have had cases where the offending code had to be pulled out of the portfolio and possibly replaced with proprietary software. When you are trying to ship your product, or are involved in an M&A where IP due diligence is one of the last activities, the delay caused by using unlicensed code can be very expensive.
So it was with interest that we came across this article by Simon Phipps, well known for his activities in the open source arena and his experience with open source licenses. His argument revolved around the fact that most code on GitHub does not have a specific license. Moreover, there is a movement that believes “software licenses are outdated” and encourages code forking without considering the licensing of either the original or the end result. Although GitHub is singled out here, the behavior is not unique to GitHub. SourceForge has a good number of project pages with no license listing, or with just a mention of an “approved OSI License” against the project. In all fairness, though, and according to our own Global IP Signatures database, GitHub is probably the biggest source of unlicensed projects.
It is unreasonable to expect repository administrators, like GitHub, to enforce license requirements on anyone who posts or stores code on their forge. Unlicensed code will appear on other sites, if not on GitHub. Instead, developers need to be made aware that publicly posted code has little chance of adoption if it doesn’t carry explicit permission for others to use it. The explicit permission, called a license, need not be complicated, and certainly doesn’t have to be invented from scratch. The Open Source Initiative has a collection of open source licenses, categorized and updated as needed.
Put a License On It
If you are creating code that you want the public to use without any hindrance, then simply say so, in the file header (embedded license) or in a simple text file in the project folder (calling the file COPYING or License.txt makes it easier for a prospective licensee to find). If all you want is for the public to reference you as the original author, then try either the MIT or BSD license. If you want a more comprehensive license, for example one covering indemnification, patents, etc., try the Apache 2.0 license. Or go all out, make your code available and have the users contribute back to the community by using the latest GNU General Public License version.
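As a rough illustration of the "simple text file in the project folder" advice, the Python sketch below lists the top-level files in a project whose names look like a license file. The set of accepted names is our own assumption, built around the COPYING and License.txt examples above, and the function name `find_license_files` is hypothetical.

```python
from pathlib import Path

# Assumption: conventional license file names, compared case-insensitively.
# Extend this set to match whatever your own policy accepts.
LICENSE_NAMES = {"license", "license.txt", "license.md", "copying", "copying.txt"}


def find_license_files(project_dir):
    """Return the names of top-level files that look like a license file."""
    root = Path(project_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.name.lower() in LICENSE_NAMES
    )
```

An empty result only means that no conventionally named file sits in the top-level folder. The license may still be embedded in the file headers, so a check like this is a first pass and not a verdict.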
Whatever you do, make sure you include a disclaimer of any warranties for use for any specific purpose or market. Many litigation-happy jurisdictions automatically assign responsibility to the creator of the code unless the responsibility is explicitly disclaimed.
If you are using open source code and mixing it with your own proprietary code for your projects, wonderful! But make the task easier for yourselves and your licensing team, and avoid headaches down the road when you are shipping your product to a client or find yourself involved in an acquisition. Only use open source projects or components that are aligned with your organization’s open source policy.
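A policy check of this kind can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the approved list, the use of SPDX-style license identifiers, the component names in the usage note, and the function name `review_components`. A real policy would come from your own legal or licensing team.

```python
# Hypothetical policy: licenses this organization has approved,
# written as SPDX-style identifiers.
APPROVED_LICENSES = {"MIT", "BSD-3-Clause", "Apache-2.0"}


def review_components(components, approved=APPROVED_LICENSES):
    """Sort a bill of materials into approved, unapproved and unlicensed.

    `components` maps a component name to its license identifier,
    or to None / "" when no license could be found.
    """
    report = {"approved": [], "unapproved": [], "unlicensed": []}
    for name, license_id in components.items():
        if not license_id:
            report["unlicensed"].append(name)
        elif license_id in approved:
            report["approved"].append(name)
        else:
            report["unapproved"].append(name)
    return report
```

Called with a small made-up bill of materials such as `{"parser": "MIT", "codec": "GPL-3.0-only", "snippet": None}`, the report places `parser` under approved, `codec` under unapproved and `snippet` under unlicensed. The last two groups are the ones that need a conversation before the product ships.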
Part of this alignment means that licenses for the open source code that you are using do exist and are approved. Also, always, always put a standard header at the top of your source files, one that contains your copyright statement, a date and, if applicable, license information. If you are using part of someone else’s code in your program, say so in the source file, and copy their license information into your source file too. Automated solutions these days can detect snippets of publicly available code as short as five lines within a file. The license associated with that snippet of code, or the absence of a license for that snippet, can cause problems down the road.
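The header advice is easy to check mechanically. The Python sketch below is a deliberately naive illustration, not how commercial scanning tools work: it assumes a header "counts" if the first 20 lines of a file contain a copyright statement with a four-digit year and some mention of a license. The patterns, the 20-line window and the function name `check_header` are all our own assumptions.

```python
import re

# Assumption: a copyright statement looks like "Copyright 2013",
# "Copyright (c) 2013" or "Copyright © 2013".
COPYRIGHT_RE = re.compile(r"copyright\s+(?:(?:\(c\)|©)\s*)?\d{4}", re.IGNORECASE)
# Assumption: any mention of "license" or "licence" counts as license information.
LICENSE_RE = re.compile(r"licen[sc]e", re.IGNORECASE)


def check_header(text, max_lines=20):
    """Report which header elements appear in the first `max_lines` lines."""
    head = "\n".join(text.splitlines()[:max_lines])
    return {
        "copyright": bool(COPYRIGHT_RE.search(head)),
        "license": bool(LICENSE_RE.search(head)),
    }
```

A file that fails either check is not necessarily a problem, but it is a file that someone will have to look at by hand during an audit.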
Mahshad Koohgoli is CEO of Protecode, an innovative provider of open source license management solutions. He has more than 25 years of experience in the high-tech industry, specializing in technology start-up businesses, and holds several patents in the computer and communications field. For more information, visit protecode.com or follow @Protecode on Twitter.