Compilation: The Way of DeFi, Kyle
Artificial intelligence (AI) is rapidly changing the way we live and work. Meanwhile, the challenges posed by AI data bias have come to the fore. As we move towards a Web3 future, we will naturally see innovative products, solutions and services that use both Web3 and AI. And, while some commentators believe that decentralized technology can solve the problem of data bias, this is not the case.
The Web3 market is still relatively small and hard to quantify, because the ecosystem is in its early stages of development and the exact definition of Web3 is still evolving. The market was estimated at close to $2 billion in 2021. Various analysts and research firms project a compound annual growth rate (CAGR) of around 45%; coupled with rapid growth in Web3 solutions and consumer adoption, they expect the market to be worth around $80 billion by 2030.
While Web3 is growing rapidly, the current state of the industry, combined with other factors in the tech sector, is why AI data bias is heading in the wrong direction.
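The growth figures above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a $2 billion base in 2021 and nine compounding years to 2030, and the function names are hypothetical. The analysts' own estimates may rest on different base years or rounding.

```python
def project_market_size(base_value, cagr, years):
    """Compound a base value forward for `years` at an annual rate `cagr`."""
    return base_value * (1 + cagr) ** years


def implied_cagr(base_value, target_value, years):
    """Annual growth rate needed to go from base_value to target_value."""
    return (target_value / base_value) ** (1 / years) - 1


# Assumed inputs, taken from the figures quoted in the text (USD billions).
base_2021 = 2.0
years = 2030 - 2021  # nine compounding periods

projected_2030 = project_market_size(base_2021, 0.45, years)
required_rate = implied_cagr(base_2021, 80.0, years)

print(f"2030 size at 45% CAGR: ${projected_2030:.1f}B")
print(f"CAGR needed to reach $80B: {required_rate:.1%}")
```

Under these assumptions, 45% annual growth yields roughly $57 billion by 2030, while reaching $80 billion would take a rate closer to 51%. That gap suggests the quoted figures are rounded or drawn from different sources.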
The Link Between Data Bias, Quality, and Quantity
AI systems rely on large amounts of high-quality data to train their algorithms. OpenAI’s GPT-3 family of models, on which ChatGPT is built, was trained on a large amount of high-quality data. OpenAI did not disclose the exact amount of data used for training, but it is estimated to be on the order of hundreds of billions of words or more.
The data is filtered and preprocessed to ensure it is high quality and relevant for language generation tasks. OpenAI uses advanced machine learning (ML) techniques such as the Transformer architecture to train models on this large dataset, enabling them to learn patterns and relationships between words and phrases and to generate high-quality text.
The quality of AI training data has a significant impact on the performance of ML models, and the size of the dataset is also a key factor in determining the model’s ability to generalize to new data and tasks. However, it is also true that both quality and quantity can have a significant impact on data bias.
Unique Risks of Data Bias
Data bias in AI is an important issue because it can lead to unfair, discriminatory, and harmful outcomes in areas such as employment, credit, housing, and criminal justice.
In 2018, Amazon was forced to scrap an AI recruiting tool that was shown to be biased against women. The tool was trained on resumes submitted to Amazon over a 10-year period, most of which came from male candidates, causing the AI to downgrade resumes that contained words like “female” and “woman.”
In 2019, researchers found that a commercial AI algorithm used to predict patient outcomes was biased against black patients. The algorithm was trained primarily on white patient data, resulting in a higher false positive rate for black patients.
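To make "a higher false positive rate" concrete, the sketch below compares false positive rates across groups. It is a minimal illustration, not the method used in the study mentioned above: the function name, group labels, and records are all made up for the example.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Return {group: FP / (FP + TN)} from (group, actual, predicted) tuples.

    `actual` and `predicted` are booleans. Only records whose actual label
    is negative contribute to the false positive rate.
    """
    false_pos = defaultdict(int)
    true_neg = defaultdict(int)
    for group, actual, predicted in records:
        if actual:
            continue  # actual positives do not affect the false positive rate
        if predicted:
            false_pos[group] += 1
        else:
            true_neg[group] += 1
    groups = set(false_pos) | set(true_neg)
    return {g: false_pos[g] / (false_pos[g] + true_neg[g]) for g in groups}


# Synthetic records for illustration only: (group, actual, predicted).
records = [
    ("group_a", False, True), ("group_a", False, False),
    ("group_a", False, False), ("group_a", False, False),
    ("group_b", False, True), ("group_b", False, True),
    ("group_b", False, False), ("group_b", False, False),
    ("group_a", True, True), ("group_b", True, False),
]

rates = false_positive_rate_by_group(records)
for group in sorted(rates):
    print(group, rates[group])
```

A large gap between groups on a metric like this is one signal that a model treats them differently, although a full audit would look at several metrics rather than one.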
The decentralized nature of Web3 solutions combined with AI presents a unique risk of bias. The quality and availability of data in this environment can be a challenge, making it difficult to train AI algorithms accurately, not only because few Web3 solutions are in use, but also because few people are able to use them.
We can draw parallels with the genomic data collected by companies like 23andMe, which underrepresents poor and marginalized communities. The cost, availability, and targeted marketing of DNA testing services such as 23andMe limit access for people from low-income communities and for people living in areas where the service does not operate, which tend to be poorer, less developed countries.
As a result, the data collected by these companies may not accurately reflect the genomic diversity of the broader population, leading to potential bias in genetic research and the development of healthcare and medicine.
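One simple way to check whether a dataset reflects the broader population is to compare each group's share of the sample with its share of the population. The sketch below assumes hypothetical group names and invented numbers. It is not based on any real company's data.

```python
def representation_gap(sample_counts, population_shares):
    """Return each group's sample share minus its population share.

    Positive values mean the group is over-represented in the sample,
    negative values mean it is under-represented.
    """
    total = sum(sample_counts.values())
    return {
        group: sample_counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Invented numbers for illustration only.
sample_counts = {"high_income": 700, "middle_income": 250, "low_income": 50}
population_shares = {"high_income": 0.2, "middle_income": 0.5, "low_income": 0.3}

gaps = representation_gap(sample_counts, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

In this toy example the high-income group is heavily over-represented and the other two groups are under-represented, which is the pattern the paragraph above describes.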
This brings us to another reason why Web3 increases bias in AI data.
Industry Bias and Ethical Concerns
The lack of diversity in the Web3 startup industry is a major problem. As of 2022, women held 26.7 percent of tech jobs. Of those, 56 percent were women of color. The proportion of women in top executive positions in the technology industry is even lower.
In Web3, this imbalance is exacerbated. According to various analysts, less than 5% of Web3 startups have female founders. This lack of diversity means that founders, who are predominantly white and male, are likely to overlook AI data bias without realizing it.
To overcome these challenges, the Web3 industry must prioritize diversity and inclusion within its data sources and teams. Additionally, the industry needs to change the narrative about why diversity, equity and inclusion are necessary.
From a financial and scalability standpoint, products and services designed from diverse perspectives are more likely to serve billions of customers rather than millions, making startups with diverse teams more likely to achieve high returns and global scale. The Web3 industry must also focus on data quality and accuracy, ensuring that the data used to train AI algorithms is free of bias.
Can Web3 Solve the AI Data Bias Problem?
One solution to these challenges is the development of decentralized data marketplaces that allow the secure and transparent exchange of data between individuals and organizations. Such marketplaces could help reduce the risk of data bias because they allow a wider range of data to be used when training AI algorithms. In addition, blockchain technology can be used to help ensure the transparency and accuracy of data, reducing the chance that algorithms trained on it are biased.
Ultimately, however, finding sufficiently broad data sources will remain a significant challenge for years, until Web3 solutions are adopted by mainstream audiences.
While Web3 and blockchain continue to be in the mainstream news, such products and services are most likely to appeal to those in the start-up and tech communities. We know that these communities lack diversity, and they also make up a relatively small share of the global market.
It is difficult to estimate the percentage of the world’s population working in Web3 startups. In recent years, the tech industry has created approximately 3 million jobs in the United States. Compare that figure with the total U.S. population, even without factoring in lost jobs, and it is clear that the tech industry is far from representative of working-age citizens.
Obtaining sufficient quantities of high-quality data to train AI systems will remain a significant hurdle until Web3 solutions become more mainstream, extend their appeal and use beyond those with an intrinsic interest in the technology, and become affordable and accessible to a wider population. The industry must take steps now to address this issue.