Press Clipping
Guest Column: Blockchain And Music Data's Byzantine Problem

This guest column on music and blockchain technology is by Annie Lin, general counsel for Loudr.

Nine generals are stationed at separate points around the walls of a massive kingdom. A majority of the generals wish to launch an attack, but without direct communication channels, they must rely on messengers to reach a consensus on whether and how to attack.

If one or more generals betrays the group and flees, or one or more messengers relays false information, what can be done to enable the remaining generals to proceed as a unanimous group with a successful attack?

Now imagine a group of nine Nashville country songwriters who collaborate on a Billboard-charting song. Each writer gets fractional ownership of the resulting work, but not all of the shares are equal. A few of the writers own their shares outright, while others are represented by publishers who administer those shares.

If one of the publishers insists that it is the sole owner of the entire song, and the other publishers’ claims can’t be confirmed, what can be done to determine who owns the song and who needs to sign off on a license deal?

(This actually once happened on a large scale when two Italian producers fraudulently claimed ownership of more than 200 songs, scoring a payday of 250,000 euros and, eventually, some jail time as well.)

These are two simplified examples of the so-called Byzantine Generals problem, a mathematical conundrum about reaching agreement when some participants or messages cannot be trusted, and one that blockchain technology aims to address.
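For readers who want to see the generals’ problem in motion, here is a minimal Python sketch of OM(1), the one-round “oral messages” algorithm described by Lamport, Shostak and Pease. It is the classical answer to the problem, not the mechanism blockchains use (Bitcoin, for instance, relies on proof-of-work). The traitors’ lying strategy and the function names are hypothetical. Each lieutenant takes a majority vote over the order it received directly and the orders its peers say they received. OM(1) tolerates one traitor given at least four generals. More generally, the algorithm needs more than three times as many generals as traitors, so nine generals could withstand at most two traitors, and that would require the deeper OM(2).

```python
from collections import Counter

ATTACK, RETREAT = "attack", "retreat"


def majority(votes, default=RETREAT):
    """Return the most common vote, falling back to `default` on a tie."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return default
    return ranked[0][0]


def send(sender, recipient, value, traitors):
    """Deliver a message. Loyal generals pass `value` on faithfully; a traitor
    (hypothetical strategy) tells even-numbered recipients "attack" and
    odd-numbered ones "retreat", whatever it was actually told."""
    if sender in traitors:
        return ATTACK if recipient % 2 == 0 else RETREAT
    return value


def om1(n, order, traitors):
    """Simulate OM(1): general 0 commands, generals 1..n-1 are lieutenants.
    Returns the decision reached by each loyal lieutenant."""
    lieutenants = range(1, n)
    # Round 1: the commander sends its order to every lieutenant.
    received = {i: send(0, i, order, traitors) for i in lieutenants}
    decisions = {}
    for i in lieutenants:
        # Round 2: every other lieutenant reports what it claims to have received.
        votes = [received[i]]
        votes += [send(j, i, received[j], traitors) for j in lieutenants if j != i]
        decisions[i] = majority(votes)
    return {i: d for i, d in decisions.items() if i not in traitors}


# Loyal commander, one traitorous lieutenant: loyal lieutenants follow the order.
print(om1(4, ATTACK, traitors={3}))  # {1: 'attack', 2: 'attack'}
# Traitorous commander: the loyal lieutenants still agree with one another.
print(om1(4, ATTACK, traitors={0}))  # {1: 'retreat', 2: 'retreat', 3: 'retreat'}
```

With one traitor among four or more generals, any lying strategy leaves the loyal generals in agreement. With a second traitor, OM(1) no longer guarantees this, which is why the number of message rounds must grow with the number of traitors.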

A major selling point of blockchain technology has been its promise of information systems that can remain operational despite missing links, communication misfires, and data failures. This is achieved through timestamped “blocks” of data, each linked to the block before it and shared across a network of peers, each of whom also shares responsibility for maintaining the data.
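The linking described above can be sketched in a few lines of Python. This is a simplified, single-machine model that assumes each block stores the SHA-256 hash of its predecessor. Real blockchains layer peer-to-peer replication and a consensus mechanism on top, and the record fields below are hypothetical. The sketch shows why a shared, hash-linked history is tamper-evident: change one record and any peer holding a copy can detect the mismatch.

```python
import hashlib
import json
import time


def block_hash(block):
    """SHA-256 over the block's contents, excluding its own hash field."""
    body = {k: block[k] for k in ("index", "timestamp", "data", "prev_hash")}
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(index, data, prev_hash):
    """Create a timestamped block that points at its predecessor's hash."""
    block = {"index": index, "timestamp": time.time(), "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block


def build_chain(records):
    chain = [make_block(0, "genesis", "0" * 64)]
    for i, record in enumerate(records, start=1):
        chain.append(make_block(i, record, chain[-1]["hash"]))
    return chain


def is_valid(chain):
    """Every block must match its own hash and link to the previous block."""
    if chain[0]["hash"] != block_hash(chain[0]):
        return False
    for prev, block in zip(chain, chain[1:]):
        if block["prev_hash"] != prev["hash"] or block["hash"] != block_hash(block):
            return False
    return True


chain = build_chain([
    {"work": "Example Song", "party": "Writer A", "share": 50},
    {"work": "Example Song", "party": "Publisher X", "share": 50},
])
print(is_valid(chain))           # True
chain[1]["data"]["share"] = 100  # quietly inflate one claim
print(is_valid(chain))           # False
```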

No Clean Slate

While blockchain has emerged as a hot topic in the digital music space, the technical and operational aspects of adoption have been largely absent from the conversation. Most would agree that the music industry has an extensively documented metadata problem, but relatively few people realize that publishing metadata and metadata standards already exist today.

Likewise, beyond a raft of data wranglers on the operations teams of various publishing companies, relatively few people are familiar with what publishing metadata looks like today, let alone what it might look like on a blockchain.

Even if blockchain offers a smarter approach to data management, re-encoding the entire history of published music raises a host of issues and requires significant resources. There is no clean slate, only partial or missing data, competing standards, and the kind of technical debt incurred by a failure to account for transfers of rights over time.

In addition, the landscape of expertise is uneven on the rights holder side: some publishers are highly sophisticated entities with dedicated technology teams, while others are creators or resource-limited content purveyors who may not understand the music rights they hold.

Music Data Today

North American publishers may use a variety of proprietary and non-proprietary data standards, but Common Works Registration (CWR) is a commonly adopted music publishing data standard. For the purposes of comparing today’s data to tomorrow’s blockchain, let’s take a closer look at what a CWR file contains.

Rights administration companies are responsible for matching tracks to songs and paying out royalties on behalf of clients. To do this, they usually receive metadata deliveries from publishers.

If you work at a rights administration company like Loudr, as I do, and your company receives regular data deliveries from music publishers, much of that data will arrive as unencrypted UTF-8 text files with file names that look like this: “CWXXXXXSA.v21”. You can open one of these files in a text editor like Sublime Text and see a string of data that looks like this excerpt:

It is possible to obtain a user manual that explains how information about songwriters, publishers, and ownership shares is encoded in CWR format. However, publishers generally use software (like Counterpoint) to encode song data into CWR files.
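CWR is a fixed-width, record-based text format. Each line begins with a three-letter record type (for example NWR for a new work registration, SPU for a publisher and SWR for a writer), followed by sequence numbers and positional fields. The Python sketch below only groups lines into per-work transactions by record type. It assumes, based on our reading of CWR 2.1, that a work transaction opens with an NWR or REV record and that characters 1 to 19 hold the record type and two 8-digit sequence numbers. The sample lines and the `fake_line` helper are hypothetical stand-ins, and decoding the remaining fields should follow the official user manual.

```python
from dataclasses import dataclass, field

# Assumed CWR 2.1 record prefix (1-based positions): record type 1-3,
# transaction sequence number 4-11, record sequence number 12-19.
TRANSACTION_TYPES = {"NWR", "REV"}            # new work registration / revision
CONTROL_TYPES = {"HDR", "GRH", "GRT", "TRL"}  # file and group envelope records


@dataclass
class Record:
    rtype: str        # e.g. "NWR", "SPU", "SWR"
    transaction: int  # transaction sequence number
    sequence: int     # record sequence number within the transaction
    payload: str      # remaining positional fields, left undecoded here


@dataclass
class Transaction:
    header: Record
    details: list = field(default_factory=list)


def parse_record(line):
    return Record(line[0:3], int(line[3:11]), int(line[11:19]), line[19:].rstrip())


def group_transactions(lines):
    """Group detail records (SPU, SWR, ...) under the work record that opens them."""
    transactions = []
    for line in lines:
        if line[:3] in CONTROL_TYPES or len(line) < 19:
            continue  # skip envelope records and blank or truncated lines
        record = parse_record(line)
        if record.rtype in TRANSACTION_TYPES:
            transactions.append(Transaction(record))
        elif transactions:
            transactions[-1].details.append(record)
    return transactions


def fake_line(rtype, tx, seq, payload):
    """Hypothetical helper that builds an abbreviated record with a valid prefix."""
    return f"{rtype}{tx:08d}{seq:08d}{payload}"


sample = [
    "HDRPBEXAMPLE SENDER",  # header layout differs from other records; skipped
    fake_line("NWR", 0, 0, "EXAMPLE SONG"),
    fake_line("SPU", 0, 1, "EXAMPLE PUBLISHING"),
    fake_line("SWR", 0, 2, "WRITER ONE"),
    fake_line("NWR", 1, 0, "ANOTHER SONG"),
    fake_line("SWR", 1, 1, "WRITER TWO"),
]

for tx in group_transactions(sample):
    print(tx.header.payload, [r.rtype for r in tx.details])
```

Even this small step hints at why publishers and administrators lean on software: the meaning of every character depends on its position and on the record type that precedes it.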

In turn, rights administrators use software to read and extract actionable rights data from those files — that is, rights information that can support licensing and other commercial activity. For each song ingested as a CWR file, Loudr also creates a variety of abbreviated, human-readable summaries, one of which is excerpted here:

As you can see, this file contains data about the song entitled “Around The World,” which is linked to a long list of songwriters including Christina Aguilera. The song is also linked to a number of catalogs, each of which is linked to publishers or parent administrators, and each of which may claim one or more strands in a bundle of music rights.

The information is complex, as is the case for any song with commercial traction. Commercial activity increases the likelihood that assets will be bought, sold, licensed, and/or split up, and this in turn increases the number of parties with an interest in the rights.

Rights claims are divided into two main categories, ownership and collection, to allow for the possibility that the owner of a song might not be the same party that you pay for a license. For a single song, a publisher may administer any of several combinations of rights claims in one or more territories.

Because of this, each claim is further broken down by type of right, including mechanical, public performance, and synchronisation, reflecting the reality that the licensor for a film use might be very different from the licensor for a cover song.

Each claim is also associated with a list of territories and a percentage of the right claimed. This allows for documentation of the ownership or administration of fractional shares, which are common in music publishing.
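The ownership/collection split, per-right categories, territories, and percentages described above can be modeled as a flat list of claims. The Python sketch below is a hypothetical data model, not the CWR record layout. The field names, the parties, and the `WORLD` wildcard are assumptions, and real data uses standardized territory codes. For one right in one territory, it answers the question the text raises: who owns it, and whom do you pay?

```python
from dataclasses import dataclass

WORLD = "WORLD"  # hypothetical wildcard; real data uses standardized territory codes


@dataclass(frozen=True)
class Claim:
    party: str
    role: str               # "ownership" or "collection"
    right: str              # "mechanical", "performance" or "sync"
    territories: frozenset  # e.g. frozenset({"US", "CA"}) or frozenset({WORLD})
    share: float            # percentage of this right claimed


def shares_for(claims, role, right, territory):
    """Total each party's percentage for one role and right in one territory."""
    totals = {}
    for c in claims:
        covers = territory in c.territories or WORLD in c.territories
        if c.role == role and c.right == right and covers:
            totals[c.party] = totals.get(c.party, 0.0) + c.share
    return totals


# An abbreviated, hypothetical claim set for one song.
claims = [
    Claim("Writer A", "ownership", "mechanical", frozenset({WORLD}), 50.0),
    Claim("Publisher X", "ownership", "mechanical", frozenset({WORLD}), 50.0),
    # Writer A's share is collected by an administrator in North America only.
    Claim("Admin Y", "collection", "mechanical", frozenset({"US", "CA"}), 50.0),
    Claim("Writer A", "collection", "mechanical", frozenset({"GB"}), 50.0),
    Claim("Publisher X", "collection", "mechanical", frozenset({WORLD}), 50.0),
    Claim("Publisher X", "collection", "sync", frozenset({WORLD}), 100.0),
]

print(shares_for(claims, "ownership", "mechanical", "US"))   # who owns it
print(shares_for(claims, "collection", "mechanical", "US"))  # who gets paid
print(shares_for(claims, "collection", "sync", "US"))        # who licenses a film use
```

The same song yields three different answers depending on the role, the right, and the territory asked about. Any format, CWR or blockchain, has to preserve all three dimensions.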

While CWR does capture most of the important music rights data points and has seen fairly widespread adoption, it is still an imperfect standard. A common complaint about CWR is that the files cannot be easily read or understood without software, much or all of which is proprietary.

Another common complaint is that data documenting the transactions that may occur during the life of a song copyright, such as ownership transfers, must be delivered in a separate format and is not always disseminated by rights holders.

Some also note that the complexity of the standard makes it challenging to validate data and/or pinpoint invalid data, to the point where rights holders may not be aware of all the required fields and data. Moreover, the standard lacks the kind of ongoing development and evolution that an open source community drives.
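Even basic validation of this kind requires deliberate rules. The Python sketch below checks a hypothetical claim schema for missing fields, out-of-range percentages, and ownership or collection shares that do not total 100% for a given right. It ignores territories for brevity, and both the schema and the 100% rule are simplifying assumptions rather than CWR’s own validation rules.

```python
REQUIRED_FIELDS = ("party", "role", "right", "share")  # hypothetical schema


def validate(claims):
    """List problems: missing fields, out-of-range shares, and role/right
    totals that do not add up to 100% (territories are ignored for brevity)."""
    problems = []
    totals = {}
    for n, claim in enumerate(claims):
        missing = [f for f in REQUIRED_FIELDS if claim.get(f) in (None, "")]
        if missing:
            problems.append(f"claim {n}: missing {', '.join(missing)}")
            continue
        share = claim["share"]
        if not 0 < share <= 100:
            problems.append(f"claim {n}: share {share} out of range")
            continue
        key = (claim["role"], claim["right"])
        totals[key] = totals.get(key, 0) + share
    for (role, right), total in sorted(totals.items()):
        if abs(total - 100) > 1e-6:
            problems.append(f"{role}/{right}: shares total {total:g}%, expected 100%")
    return problems


claims = [
    {"party": "Writer A", "role": "ownership", "right": "performance", "share": 50},
    {"party": "Publisher X", "role": "ownership", "right": "performance", "share": 40},
    {"party": "", "role": "ownership", "right": "mechanical", "share": 100},
]
for problem in validate(claims):
    print(problem)
```

Note that the claim with a blank party is skipped entirely, so the missing mechanical ownership never shows up as a bad total. Each rule is simple on its own, but errors in one field can mask errors in another.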

Music Data as a Block in the Chain

There are a number of companies developing blockchain technologies for the music rights space, but let’s focus on one for the purpose of evaluating the same data in a different format. Mediachain Labs, recently acquired by Spotify, provides information about its data standards on GitHub and on the company website.

While Mediachain does not appear to offer examples of what blockchain music rights data might look like, it is possible to take a sample art data schema and imagine what a song would look like:

The above is an abbreviated mockup that does not reflect the complete ownership picture, let alone the engineering thought that should go into structuring the data. Even as a hypothetical, however, it raises a variety of practical questions. For example, how can entities be consistently linked to each other when each may be known by multiple names, appear under many spelling variations, or carry several identification codes, not all of which are present, accurate, or unique?
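One common, if partial, approach to the linking problem is to normalize names and then merge records that share a normalized name or any identifier. The Python sketch below does this with a union-find structure. The records and the IPI-style identifiers are hypothetical, and exact matching of this kind is an illustrative assumption rather than the way any particular registry resolves entities.

```python
import re
import unicodedata


def normalize(name):
    """Fold case and strip accents and punctuation: 'Renée Doe' -> 'renee doe'."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def link_entities(records):
    """Group records that share a normalized name or any identifier.
    Each record is a dict with a 'name' and an optional 'ids' list."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        parent[find(a)] = find(b)

    seen = {}  # matching key -> index of the first record that had it
    for i, rec in enumerate(records):
        keys = [("name", normalize(rec["name"]))] + [("id", x) for x in rec.get("ids", [])]
        for key in keys:
            if key in seen:
                union(i, seen[key])
            else:
                seen[key] = i

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(records[i]["name"])
    return sorted(groups.values())


records = [
    {"name": "Renée Doe", "ids": ["IPI-0001"]},
    {"name": "RENEE DOE"},
    {"name": "R. Doe", "ids": ["IPI-0001"]},
    {"name": "Rene Doe"},
]
print(link_entities(records))
# [['Rene Doe'], ['Renée Doe', 'RENEE DOE', 'R. Doe']]
```

The sketch fails in both directions. “Rene Doe” stays separate even though it may be a typo, and two different writers who happen to share a name would be merged. Fuzzy matching can catch the first case only by making the second more likely.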

It may be difficult to consistently capture that information when it is created by users of varying sophistication across the network. Mediachain allows its users to create upserts — that is, data insertions or updates which do not require knowledge of the current state of the database — but reliance on data which is unordered, unvalidated, and potentially incorrect may lead to liability in a commercial setting.
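A toy Python model shows why unordered, unvalidated upserts are risky. It is not Mediachain’s API. It simply merges each write into a record without reading the current state first, and the work ID and fields are hypothetical. Replaying the same two conflicting claims in different orders leaves the store in different final states, so the answer to “who publishes this work?” depends on arrival order.

```python
from itertools import permutations


def apply_upserts(upserts):
    """Merge each upsert into a store keyed by work ID without reading or
    checking the current state: whichever write lands last wins."""
    store = {}
    for work_id, fields in upserts:
        store.setdefault(work_id, {}).update(fields)
    return store


# Two hypothetical, conflicting claims on the same work from different users.
upserts = [
    ("work-1", {"title": "Example Song", "publisher": "Publisher X", "share": 50}),
    ("work-1", {"publisher": "Publisher Z", "share": 100}),
]

for order in permutations(upserts):
    print(apply_upserts(order)["work-1"])
```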

In theory, blockchain would allow for the expression of rich relationships between objects. In practice, music rights are inherently complex and the data will be a reflection of that complexity, regardless of the format or standard.

To fully accommodate the needs of music rights users, blockchain would need to capture all of the nuances of rights, relationships, and transactions over time, including all of the information currently conveyed in CWR files. While blockchain may provide a better means of conveying that information in the long run, the end result will be far from simple, and far from a panacea.

Beyond the logistical challenges, there are larger business and legal questions. Complete transparency may not be attractive to a publisher who has made investments in a catalog and would not want to widely circulate sensitive contractual information, such as the end date of the deal.

Some blockchain providers acknowledge the necessity of involving rights societies as data arbiters to handle conflict resolution and other administrative functions, but music data validation at scale is a large task that calls for natural language processing and other advanced technologies.

A Resource Question

Who will be responsible for the cost of implementing blockchain? While the blockchain conversation is promising, this is arguably the most important question, and it remains unanswered in an industry known for its razor-thin margins.

It’s hard to imagine that every indie artist or publisher has the knowledge or resources to boot up Terminal and type in complex data strings, so it will be necessary to build a user layer that provides seamless interaction with blockchain. Someone will need to fund the tools to validate and administer blockchain data.

Also, it’s hard to imagine that rights societies and publishers would suddenly abandon existing data systems and dedicate all their resources to migration. It will be necessary to incentivise market players to invest resources in adopting this technology.

Music data standards may seem very far removed from the financial realities of artists trying to scrape together a living, but they have long-term implications for songwriters.

Streaming topped all forms of US music consumption in 2016, to the tune of 1.2 billion songs per day. At that scale, music services and administrators have to rely on music metadata to determine who should get royalties and how much should be paid.
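To put that figure in perspective, spreading it evenly over a day works out to roughly fourteen thousand streams every second. This is a simplification, since real listening peaks and dips, but it shows why matching plays to rights holders cannot be done by hand:

```latex
\frac{1.2 \times 10^{9}\ \text{streams}}{86\,400\ \text{s}} \approx 1.39 \times 10^{4}\ \text{streams per second}
```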

If music data is the pipeline that delivers songwriter and publisher royalties, it’s important not only to plug the leaks, but to lay down a sustainable infrastructure of pipes as an investment in the future of music.

Special thanks to Loudr engineer Adrian Moses and licensing director Jesse Buddington for insight and guidance on CWR and music metadata generally.