The potential of data and data sharing is now widely recognised as a principal driver of efforts for public benefit. At its core, data is merely information – and thus something that has always informed policy, governance and corporate decision-making at every rung of institutions. It is the swell and omnipresence of computational tools, machine learning and artificial intelligence that has heralded a new and global focus on data, particularly given the massive digital economy that it now sustains. In this context, it is urgent to consider how the data economy can be moulded both to counter its current inequities and to forge channels for broader societal value. To realise this, notions of collective governance have found voice in ‘data stewardship’ – where stewards act as trusted intermediaries that centre agency and value creation for data subjects. In practice, however, it is clear that such intermediaries require fundamental legislative support in the jurisdictions they inhabit. Countries and policymakers are moving swiftly to regulate, govern and define the guiding principles for the protection, management and use of data, and it is necessary to marry the move for data stewardship with enabling policies. As various global approaches to data policy emerge, there is space to begin outlining the theoretical jurisprudence and enabling features that law-making and regulation must embody. Thus, this paper explores data stewardship in the particular context of its legislative needs.
A data steward can be defined as a trusted intermediary between data subjects or data generators, and data requesters. In this regard, a steward is envisioned as a body that acts in the interest of data subjects: it works to protect their rights over data, enable greater agency and transparency for subjects, negotiate with data requesters, and seek avenues for value creation from data that benefit subjects or the communities they belong to. The role of a steward is both rights-preserving and value-generative. It is founded in the fact that existing models of data collection, usage and sharing embody marked power imbalances that largely favour private, profit-driven interests and increasingly disempower data generators – consequently neglecting the potential societal value of data. Stepping beyond the paradigm of protection alone, stewardship strives to empower and to make value chains circular – not only to benefit those who most crucially drive the data economy, but also to use data as a leveller for pre-existing vulnerabilities in society. However, data is difficult to govern: its value is defined most critically by how it is used, this value often changes over time, and different data types necessitate different rights, needs and management. Consequently, data stewardship cannot be seen uni-dimensionally – there are multiple structures that stewardship can embody.
Data cooperatives present a good example. By allowing for democratic voting on decision-making, data sharing and more, data cooperatives let citizens take an active and agential role in how their data is pooled, managed and shared. Health data stewards have been found to spur citizen-driven research, and in some cases, stewards like Open Humans have garnered proactive and tech-involved communities of patients that work to maintain and refine the steward itself. Cooperatives like Drivers’ Seat, which works to empower platform workers in the ride-sharing community, provide analytic services to workers so they may have greater visibility into the algorithms and data that determine their earnings. In the case of Drivers’ Seat, aggregated data is shared with local transport and governance agencies to enable informed policy decisions – and financial returns on this sharing are divided amongst the cooperative’s members. Data trusts, which have seen numerous pilots, envision a codification of fiduciary responsibility for trustees that manage data and data-related decisions for a group of beneficiaries.
Numerous other models of stewardship exist in implementation, optimised for various contexts – personal data stores, account aggregators, data collaboratives and more – presenting novel ways of governing data. Typically problem-led, each model can differ across dimensions like governance, consent mechanisms, data flows and access provisions to third parties – leading to multiple permutations and combinations of design choices. We are now seeing stewardship evolve for various data types, including environmental, cultural and agricultural data.
Stewardship is complex for a number of reasons. By definition, it involves at least two parties, and often more. The interests of these parties may not always align, and breaches or violations often affect parties asymmetrically. The organisational, logistical and technical burden of managing data is high, and data usage can be incredibly dynamic. These challenges are compounded when there is a lack of regulatory or legislative clarity around fundamental data issues. Further, a lack of technical infrastructure, or of standardised sharing agreements and pathways, makes these models arduous to develop and sustain.
The legal and policy-led needs of a steward can be distilled into a few fundamentals, informed by the functions of a steward. Given the two primary roles – to empower and create value – let us consider the necessary environment for an intermediary to carry out such roles.
Protection situated within rights
In order for stewards to adequately protect the individuals they serve, foundational data privacy and protection are essential. Crystallising consent frameworks and requirements at the stage of data collection is a regulatory necessity – this cannot be enabled by stewards alone. As stewards are expected to provide data subjects with suitable recourse mechanisms in the event of violations or harms arising from data sharing, judicial access and intervention in instances of violation form the basis for such protection. To enable a degree of autonomy over one’s data, protection must go beyond the prevention of harm and include factors like data transparency, accessibility, findability and portability. For example, a number of data stewardship pilots in the European Union have been enabled by such provisions within the GDPR – allowing individuals to download their data from various sources, and in some cases, to pool it with others doing the same. However, the outcomes of most evolving data regimes are still unfolding, and the effects of more mature data rights remain unclear.
Alongside the individual, there is a need to conceptualise frameworks for community rights over data, and how communities may be protected from harms that arise at a collective level, or affect groups of people. To do this, policy moves must balance the rights of the individual with the rights of individuals as community members. This remains a nascent, albeit important, area of research. Discourse on community data rights is founded in the notion that communities may be able to govern a resource like data as a commons, particularly in the case of non-personal data such as environmental, aggregated, geospatial or other machine data.
Data sharing situated within value
The key to value generation from data lies in data sharing – and the problem is two-pronged. On the one hand, vast quantities of existing datasets are held in silos by corporations and by state actors, largely inaccessible to the wider public or to civil society actors. On the other hand, beyond access, there is a need to build routes from data sharing to value creation. Both of these elements are intrinsic to the vision of stewardship, and can be significantly spurred by focussed policy initiatives.
This need is slowly being recognised by countries globally, as many move to regulate data sharing and adopt various approaches to open up existing data for societal good. For example, the EU’s proposed Data Governance Act adopts voluntary modes of data sharing by building clear outlines on data use, sharing and re-use. Beyond this, voluntary data sharing structures rely on an altruistic basis, where it is hoped that once provided with a trusted and streamlined data network, data holders will be willing to partake in sharing. While voluntary data sharing, as opposed to mandatory sharing, is often criticised for adopting a ‘soft’ approach to existing data silos, there is definite value in the ecosystem it strives to build. This value lies not only in the dissolution of regulatory barriers and uncertainty around data sharing, but also in the articulation of the role of intermediaries in data sharing and innovation. Regulatory clarity can be a powerful incentive in a space that has been striving for data liberation, but has been hampered by legal barriers, overlaps and risks.
Along with a focus on such clarity, a robust data sharing ecosystem requires investment in the building blocks of technical infrastructure. In order for data sharing to be a responsible yet substantive and durable reality, policy pathways cannot ignore technical capacity builders like interoperability, standardisation, storage formats and data exchange platforms. As seen in projects like GAIA-X or X-Road, nationwide efforts that enable different data systems to communicate are an important supplement to regulatory measures for data sharing. By creating standardised and facilitated data sharing, they help reduce the burden on sharing parties of aligning data formats, taxonomies and storage. Often, digital public infrastructure serves as a base on which data sharing initiatives can take effect, or enables broader innovation through standardisation. In India, for example, the existence of the Unified Payments Interface (UPI) has enabled swift innovation in fintech, with multiple players able to build upon the layer.
Sufficient, collaborative infrastructure and technical pathways allow for greater innovation and reduced risks in data sharing, and also act as a powerful incentive for it. As we consider the optimal policy approaches to data sharing, these pathways can work to steadily ease burdens and build trust within the ecosystem, potentially minimising the need for stringent or alienating policy moves. An ecosystem-oriented approach can work to maximise benefit, trust and responsible innovation, not only by enabling data stewardship models but by spurring data sharing and innovation more broadly. While these approaches are recommended, they remain tested only in limited and early ways, which makes it crucial for most countries to adopt a flexible and iterative pace as they move toward layered data regimes. By prioritising incentives, collaboration and testing, risks can be minimised while also building pathways for value. In the case of data stewards as well, a staggered and collaboration-oriented strategy can allow practitioners to build upon use cases, refine design choices, and build towards value.
There are certainly challenges to data stewardship that remain at the level of structure, internal governance or incentivisation. At this stage, it is difficult to identify the ideal pathways to maintain the neutrality of an intermediary while ensuring that stewards are sustainable in their revenue models. However, the right policy environment in which to discover and build on these aspects is crucial in allowing data stewards to become a reality. Without the actionable policy supports that can embolden data stewardship, we risk testing stewardship on isolated roads that are not part of a larger network for value.