Data quality concerns continue to plague artificial intelligence. But blockchain-based incentive mechanisms could significantly change that. (Photo by Michel Porro/Getty Images)
The Covid-19 outbreak has overwhelmed health systems around the world. At one point, hospital beds and ventilators for patients, as well as protective gear for health workers, were not enough to go around. This meant that health systems, especially in developed countries, had to employ certain technologies to allocate resources efficiently. AI is one of those technologies, and its importance in the fight against coronavirus continues to grow.
An app developed by researchers at New York University, which uses AI and big data to predict the severity of Covid-19 cases, is a good example of how the technology helps in resource allocation, at least in theory. The researchers used patient data from 160 hospitals in Wuhan, China to identify four biomarkers that were significantly higher in patients who died of the virus than in those who recovered. Based on the data fed into the AI model, the app assigns a severity score to each patient, which a clinician can use to make informed care and resource allocation decisions.
Despite the positive impact that AI could bring to the coronavirus battlefield, the flaws in the underlying data being employed could deepen the inequities that already exist across gender and racial groups, wrote Genevieve Smith and Ishita Rustagi, both of the Center for Equity, Gender and Leadership at the UC Berkeley Haas School of Business, in an article published in the Stanford Social Innovation Review.
Interestingly, these data reliability concerns aren’t unique to the coronavirus era. In fact, AI as a whole, including subsets such as machine learning and deep learning, is plagued by the data bias and data quality conundrum.
The main discussion here is about how blockchain could help in tackling these data reliability concerns. But it’d be valuable to first understand the source of data bias.
How Data Bias Crawls Into Artificial Intelligence
Data bias can creep into an AI setup at different stages, including the problem framing, data collection and data preparation stages. The business goal a company is looking to reach will be fundamental to the framing of the problem. The goal, in itself, could be discriminatory or unfair.
Also, during the data collection stage, bias could slip in through data that’s either unrepresentative of reality or reflective of existing prejudices. If, for instance, you feed a deep learning model more photos of one skin color than another, the resulting facial recognition system will fare better at identifying the skin color predominant in the training data.
With regard to collecting data that reflects existing prejudices, Amazon reportedly ditched an AI-based recruitment system after finding that it was biased against women. Bringing it back to healthcare, an algorithm used by many U.S. hospitals to forecast risk and subsequently allocate resources favored white patients over Black patients with the same conditions, a group of researchers found in 2019.
Two Blockchain-Based Approaches for Improving Data Quality
The deeper you dig, the more of these biases you’ll find. There isn’t a single solution to these issues given their complexity. One thing that experts agree on, though, is the need for data diversity. Improved transparency of data and robust collaboration could help achieve that diversity. This is where blockchain technology comes in. By design, the technology works only through collaboration between several parties to maintain the network. This could bring transparency, decentralization and verifiability to machine learning models and the data they’re fed.
Incentivizing the Contribution of Quality Training Data
Last year, Microsoft introduced an initiative called Decentralized & Collaborative AI on Blockchain. The goal is to leverage public blockchains, Ethereum in this case, for collaborative and continuous model training and maintenance. A key component of this is developing a mechanism that incentivizes participants to contribute “good data,” according to Justin Harris, a senior software developer at Microsoft, who works on this initiative.
In this system, participants must commit a certain amount up front to the smart contract to contribute their data for the training. If the system determines that the data is good (i.e., it meets certain requirements), they get a refund. Contributing bad data would, therefore, result in the loss of the initial commitment, with the funds distributed to contributors of good data.
However, the focus here isn’t on the economic incentive of contributing good data, but on the cost of bringing bad data.
Taking a facial recognition model, for instance, you could use a smart contract to require a variety of skin colors in the image data participants submit. As such, any dataset that doesn’t meet the requirement is deemed bad and the contributor is penalized. This is a simplified illustration. Things are likely to get more complex when dealing with diverse and complex data sets. Still, the intention here is to point out how a blockchain-based incentive system could help improve data quality.
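To make the deposit-and-refund idea concrete, here is a minimal Python sketch of the logic described above. It is an off-chain toy, not Microsoft’s implementation and not a real smart contract: the class name `IncentiveLedger`, the deposit size, the group labels and the even split of forfeited deposits are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IncentiveLedger:
    """Toy stand-in for a deposit-and-refund contract (hypothetical design)."""
    deposit: int = 10  # assumed stake required per submission
    # Assumed placeholder labels for the groups a submission must cover.
    required_groups: frozenset = frozenset({"A", "B", "C"})
    balances: dict = field(default_factory=dict)
    good: list = field(default_factory=list)
    forfeited: int = 0

    def is_good(self, labels):
        # "Good" here only means every required group appears at least once.
        return self.required_groups <= set(labels)

    def submit(self, contributor, labels):
        # The contributor stakes the deposit up front.
        self.balances[contributor] = self.balances.get(contributor, 0) - self.deposit
        if self.is_good(labels):
            self.balances[contributor] += self.deposit  # refund the stake
            self.good.append(contributor)
            return True
        self.forfeited += self.deposit  # stake is lost
        return False

    def distribute_forfeits(self):
        # Split forfeited stakes evenly across good submissions;
        # any indivisible remainder stays in the pool.
        if not self.good:
            return
        share, self.forfeited = divmod(self.forfeited, len(self.good))
        for contributor in self.good:
            self.balances[contributor] += share


ledger = IncentiveLedger()
ledger.submit("alice", ["A", "B", "C"])       # covers every group: refunded
ledger.submit("bob", ["A", "A"])              # misses groups: stake forfeited
ledger.submit("carol", ["C", "B", "A", "A"])  # covers every group: refunded
ledger.distribute_forfeits()
```

In this sketch a contributor’s only gain comes from other people’s forfeited stakes, which mirrors the point that the mechanism works mainly by making bad data costly. A real system would need a far richer quality check than counting labels.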
Streamr, a blockchain-based data marketplace, is also developing a system that could potentially contribute to improving the quality and depth of data used in AI models. Through what it calls “data unions,” Streamr wants to give internet users the ability to sell their data. As an example, the startup created a data union dubbed Swash, which uses browser extensions to collect user data. That data is then bundled with the data of every other user in the Swash network, and the bundle eventually becomes available for sale on Streamr’s marketplace. Users retain the right to exclude any data that they’re not willing to share.
What’s interesting here is that a data union can be set up to generate a specific type of data, based on gender or race, for instance. Additionally, pooling data in unions makes it more usable and valuable than data held by individuals. Such a system could potentially bring data used in AI closer to reality in certain areas.
Using Blockchain to Open Access to Siloed Data
A few projects are also exploring the potential for blockchain-based federated learning, so to speak, in improving AI outcomes. Federated learning makes it possible for AI algorithms to amass experience from a wide range of siloed data. Instead of having the data moved to the computation venue, the computation happens at the data location. Federated learning allows data providers to retain control over their data. However, privacy risks lurk whenever federated learning is employed.
Blockchain is able to alleviate this risk thanks to its superior traceability and transparency. Also, a smart contract could be used to discourage malicious players by requiring a security deposit, which is only refundable if the algorithm doesn’t violate the network’s privacy standards.
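The core federated learning idea described above, where computation travels to the data rather than the other way around, can be sketched in a few lines of Python. This is a simplified federated-averaging loop for a one-variable linear model; it contains no blockchain component and is not taken from Ocean Protocol or GNY. The function names, the learning rate and the two toy “silos” are assumptions chosen for illustration.

```python
def local_update(weights, data, lr=0.05, epochs=20):
    """Run gradient descent on one silo's data for the model y = w*x + b."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Combine local models, weighting each by its silo's number of records."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train(silos, rounds=50):
    """Each round, every silo trains locally; only model weights are shared."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in silos]
        weights = federated_average(updates, [len(d) for d in silos])
    return weights


# Two hypothetical data holders; both sets of points lie on y = 2x + 1.
silos = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = train(silos)
```

The raw records never leave `local_update`; the coordinator sees only the weight pairs. Those shared weights can still leak information about the underlying data, which is the kind of privacy risk the deposit mechanism above is meant to discourage.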
Ocean Protocol and GNY are two projects exploring blockchain-based federated learning. Ocean recently launched a product called Compute-to-Data, which allows data providers and data consumers to leverage blockchain to securely buy and sell data. The Singapore-based startup already counts some enterprise names among its users, including Roche Diagnostics, the diagnostics division of multinational healthcare company F. Hoffmann-La Roche AG.
GNY, which plans to launch its mainnet later this year, recently demoed how researchers could set up a side chain on its blockchain and privately run a comparative analysis of daily Covid-19 mortalities across U.S. cities. This is a slightly different approach in that it encourages data providers to put their data on-chain, eliminating the well-known single-point-of-attack risk.
In its demo, GNY employed a support vector machine (SVM) algorithm on-chain to analyze Covid-19 related mortality data. The SVM model then forecasts which cities are likely to see increasing or decreasing infection rates. Such predictions could help cities and states tighten or loosen control measures ahead of time.
Will Data Providers and Consumers Turn to Blockchain?
The big question here is whether the incentives that blockchain offers are significant enough for businesses and decision makers in AI to embrace it. Over the last few years, blockchain has been touted as having the potential to disrupt everything. The reality hasn’t changed much. The possibility that blockchain could help alleviate problems or foster greater efficiency across industries hasn’t been enough incentive to abandon the status quo. Proponents of blockchain-based artificial intelligence will be looking to buck the trend.