The quality of cryptocurrency market data is critical for academic research and financial applications. Yet the industry faces a number of challenges, including pervasive mislabeling, measurement errors and discrepancies in reported market metrics.
A new academic study by Gustavo Schwenkler of Santa Clara University, alongside Aakash Shah and Darren Yang of Indicia Labs, highlights these persistent issues, emphasizing the difficulty of relying on a single provider for all crypto market data needs.
Data inconsistencies across providers
The research reviewed 20 of the most common crypto market data providers, analyzing their crypto data from January 2022 to October 2024. An in-depth analysis of a subset of eight providers, namely CoinCap, CoinGecko, CoinMarketCap, Coinpaprika, CryptoCompare, Live Coin Watch, Nomics, and Santiment, between November 2018 and October 2024 revealed pervasive quality issues, including repeated mislabeling of cryptocurrencies within and across providers, as well as substantial measurement errors.
The research uncovered substantial inconsistencies in how cryptocurrencies are identified. In particular, it found that as many as 21% of all coins in a provider’s dataset could undergo ID changes without disclosure. Furthermore, some providers kept the same ID for a coin even after a fork or a swap that produced a new coin, effectively misrepresenting the asset’s identity.
Additionally, certain providers used the same ID for distinct cryptocurrencies, a problem found in 16% of the sample from CoinGecko.
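To make the two identification problems concrete, the following is a minimal sketch of how a data user might screen a provider's listings for them. It is not the method used in the study. The input format, the function name `find_id_problems`, and the idea of an independent `asset_key` (for example a contract address) are all assumptions made for illustration.

```python
from collections import defaultdict


def find_id_problems(listings):
    """Screen (provider, provider_id, asset_key) tuples for ID problems.

    asset_key is a hypothetical independent identifier of the underlying
    asset, such as a contract address. It is an assumption of this sketch.
    """
    assets_per_id = defaultdict(set)
    ids_per_asset = defaultdict(set)
    for provider, provider_id, asset_key in listings:
        assets_per_id[(provider, provider_id)].add(asset_key)
        ids_per_asset[(provider, asset_key)].add(provider_id)

    # Same provider ID pointing at more than one distinct asset.
    reused_ids = {k: v for k, v in assets_per_id.items() if len(v) > 1}
    # Same asset appearing under more than one ID at the same provider.
    changed_ids = {k: v for k, v in ids_per_asset.items() if len(v) > 1}
    return reused_ids, changed_ids


# Made-up sample data, for illustration only.
sample_listings = [
    ("provider_a", "coin-x", "0xaaa"),
    ("provider_a", "coin-x", "0xbbb"),      # ID reused for a different asset
    ("provider_a", "coin-y", "0xccc"),
    ("provider_a", "coin-y-old", "0xccc"),  # same asset under a second ID
    ("provider_b", "coin-z", "0xddd"),
]
reused_ids, changed_ids = find_id_problems(sample_listings)
print(reused_ids)
print(changed_ids)
```

A check like this only works when an identifier independent of the provider is available, which is itself part of the difficulty the research describes.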
Discrepancies in market metrics
The research also uncovered significant inconsistencies in the reported data across providers. In particular, the daily close price for a cryptocurrency can vary significantly from one provider to another, sometimes by extreme amounts. These inconsistencies arise because providers collect and aggregate data from public platforms. Because they have discretion over how they source data, inconsistencies are inevitable.
Beyond price discrepancies, trading volume metrics can also vary widely. In almost 70% of the daily observations in the six-year sample, the daily aggregate volume for a coin reported by a provider deviated by more than 5% from the median volume reported across providers.
This issue is particularly prominent for large coins that are listed on many exchanges. It’s exacerbated by the practice of wash trading, where exchanges artificially inflate their reported trading volumes to appear more liquid.
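The volume comparison described above, a provider's reported figure measured against the cross-provider median, can be sketched in a few lines. This is an illustrative reconstruction rather than the study's code: the record layout, the function name `deviation_share`, and the sample figures are assumptions.

```python
from collections import defaultdict
from statistics import median


def deviation_share(records, threshold=0.05):
    """Share of provider observations deviating from the daily median.

    records: iterable of (coin, date, provider, volume) tuples.
    An observation is flagged when its volume differs from the
    cross-provider median for that coin and day by more than threshold.
    """
    by_coin_day = defaultdict(list)
    for coin, date, _provider, volume in records:
        by_coin_day[(coin, date)].append(volume)

    flagged = total = 0
    for volumes in by_coin_day.values():
        if len(volumes) < 2:
            continue  # no cross-provider comparison possible
        med = median(volumes)
        if med == 0:
            continue  # relative deviation is undefined
        for volume in volumes:
            total += 1
            if abs(volume - med) / med > threshold:
                flagged += 1
    return flagged / total if total else 0.0


# Made-up sample data, for illustration only.
sample = [
    ("BTC", "2024-10-01", "provider_a", 100.0),
    ("BTC", "2024-10-01", "provider_b", 102.0),
    ("BTC", "2024-10-01", "provider_c", 150.0),
    ("ETH", "2024-10-01", "provider_a", 50.0),
    ("ETH", "2024-10-01", "provider_b", 51.0),
    ("ETH", "2024-10-01", "provider_c", 49.0),
]
share = deviation_share(sample)
print(f"{share:.1%} of observations deviate by more than 5% from the median")
```

In this toy sample only one of the six observations, the 150.0 figure for BTC, falls outside the 5% band around the median.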
Growing demand for high-quality crypto data
In traditional capital markets, data is standardized across large vendors or by regulatory bodies. However, the crypto industry lacks such standardization, with providers differing in data definitions, delivery technologies, and reporting methodologies.
As digital assets gain traction among institutional investors, demand for robust and reliable market data infrastructure has intensified, with investors seeking high-quality data for informed trading strategies and regulatory compliance.
This has fueled the growth of crypto data providers offering cleansed and normalized data from multiple independent sources, despite the absence of agreed data quality standards or auditing. These sources include blockchains themselves, but also centralized exchanges, decentralized finance (DeFi), and derivatives markets. Beyond consolidating and normalizing data, some vendors also deliver derived metrics, signals, and indicators, empowering clients with actionable information.
The booming crypto data industry
Crypto data is an emerging sub-industry that’s playing a significant role in the broader digital asset ecosystem. Explored in a new report by Financial Technology (FT) Partners, a fintech-focused investment bank, this ecosystem encompasses five main verticals:
- Centralized exchange data providers, such as CoinGecko, CoinMarketCap and The Block, which aggregate and provide market data, price tracking and analytics for cryptocurrencies traded on centralized exchanges;
- On-chain and DeFi data providers like DeepDAO, Kaiko, and Dune Analytics, which collect, analyze and offer insights into blockchain transactions, DeFi activities and smart contract integrations;
- Transaction surveillance and forensic analysis companies, such as Chainalysis, Elliptic, and TRM, which monitor blockchain activity to detect illicit transactions and track stolen funds;
- Know-your-customer (KYC) and anti-money laundering (AML) monitoring providers, such as iComply, Sumsub, and Coinfirm, which offer identity verification and compliance solutions to prevent fraud and money laundering in crypto transactions; and
- Tax reporting and compliance companies, such as Taxbit, CoinTracking and Lukka, which provide tools and services that help individuals and businesses calculate crypto-related taxes, generate reports and comply with regulatory requirements.

This sector has drawn significant investor interest, with major funding rounds such as Chainalysis’ US$170 million Series F, Lukka’s US$110 million Series E, and Kaiko’s US$53 million Series B.
Mergers and acquisitions (M&A) activity is also surging, with prominent players such as Chainalysis, Lukka and Amberdata acquiring smaller startups to enhance technological capabilities and expand their product offerings.
Chainalysis has acquired startups such as Excygent, a cybercrime investigation specialist; Transpose, a blockchain data and infrastructure company; and Alterya, an AI-powered fraud detection solution provider. Lukka has acquired Blox Finance, a crypto accounting and financial data management software business; Venato, a Web3 blockchain analytics startup; and Coinfirm, a top-tier European-based blockchain analytics software company. Meanwhile, Kaiko has acquired Kesitys, a data analytics company, and Vinter, a leading European crypto index provider.