Issue #3: Big Tech is killing independent research

The Monolith by Kalim
8 min read · May 8, 2024


Photo: Created using GenAI

More than a month ago, Meta announced that it would be shutting down CrowdTangle, a widely used social media monitoring and transparency tool employed by academic researchers, journalists, and open-source investigators to track, among other things, the spread of hate speech, misinformation, and disinformation on platforms like Facebook and Instagram. The announcement comes against a backdrop in which more than 60 nations, home to roughly 45% of the global population, are holding national elections, even as significant global events unfold: Russia’s war against Ukraine, now in its third year; the wars in Gaza and Sudan; and a worrying worldwide erosion of civil liberties, accelerated by technologies such as commercial spyware and facial recognition technology (FRT).

Social platforms are a crucial component of this complex landscape, where the convergence of politics, technology, and global events shapes contemporary society and governance. Yet in recent years, major tech corporations have recognized that they can undermine research efforts without facing repercussions. For many of these companies, platform accountability and research initiatives have become burdensome, and they now see a solution to this perceived problem. Meta regarded CrowdTangle as a headache, so it crippled the tool by starving it of resources and eventually announced that it would cease to function on August 14, 2024.

Meta isn’t the sole player in this trend of dismantling independent research; it’s a growing phenomenon among tech companies. Since Elon Musk assumed control of X (previously Twitter), the platform has restricted API (application programming interface) access. Technically, access still exists, but the free tier is severely limited, while the enterprise tier can cost as much as 42,000 USD a month. In another blow to transparency and research, the platform also stopped sharing copies of takedown requests with Lumen, an initiative of the Berkman Klein Center for Internet and Society at Harvard University that aggregates external requests, such as legal notices and government orders, directing online platforms to remove content. Similarly, Reddit ended free API access last year, driven by its desire to go public.

Other platforms, such as TikTok (banned in India and facing a ban or forced sale in the United States) and YouTube, provide academic API access under conditions that are antithetical to academic research principles. TikTok’s API Terms of Service (ToS) require researchers to refresh data every fifteen days, bar them from sharing data, and demand that they submit their papers to TikTok thirty days prior to publication. YouTube’s researcher program likewise requires regular refreshing of data (though it allows refreshes to pause when data must remain fixed for analysis), bars researchers from sharing data, and requires that researchers provide YouTube with a copy of their research in advance of publication. Moreover, as many in the academic research community have pointed out, before data collection, analysis, publication, or replication can even begin, researchers must apply for developer access, undergo screening with no assurance of acceptance, and accept that access can be revoked at any moment. Essentially, this leaves them at the mercy of the very companies they wish to study.
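To make the burden of those refresh clauses concrete, here is a minimal sketch in Python of the compliance loop such terms imply. The endpoint URL, token, and field names are placeholders, not TikTok’s actual Research API: the point is only that every cycle must re-query the records a study already holds and purge anything the platform no longer returns.

```python
import json
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path

import requests

# Hypothetical endpoint, token, and field names: TikTok's actual
# Research API differs. This only illustrates the refresh-and-purge
# cycle that a fifteen-day refresh clause implies.
API_URL = "https://api.example.com/research/videos/query"
TOKEN = "YOUR_ACCESS_TOKEN"
REFRESH_INTERVAL = timedelta(days=15)
STORE = Path("dataset.json")


def fetch_current_records(video_ids: list[str]) -> dict:
    """Re-query the platform for records the study already holds."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"video_ids": video_ids},
        timeout=30,
    )
    resp.raise_for_status()
    return {rec["id"]: rec for rec in resp.json().get("data", [])}


def refresh_dataset() -> None:
    """Replace stale rows and purge anything the platform no longer
    returns: deleted or restricted content must leave the dataset."""
    dataset = json.loads(STORE.read_text()) if STORE.exists() else {}
    live = fetch_current_records(list(dataset))
    purged = set(dataset) - set(live)
    STORE.write_text(json.dumps(live))
    print(f"{datetime.now(timezone.utc):%Y-%m-%d}: "
          f"kept {len(live)}, purged {len(purged)} records")


if __name__ == "__main__":
    while True:  # the clock never stops for the duration of the study
        refresh_dataset()
        time.sleep(REFRESH_INTERVAL.total_seconds())
```

The design consequence is that a study spanning more than fifteen days can never guarantee its underlying sample is stable: each mandated purge can silently shrink it, which is precisely the reproducibility problem researchers have raised.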

This prevailing trend significantly hampers research, forcing experts either to collect data manually, a laborious and time-consuming endeavor, or to rely on third-party services. That is an intentional choice by these tech companies, which know that manual collection imposes inherent limitations. Researchers are then compelled to draw conclusions from a restricted dataset, making their findings easy for tech executives to dismiss on the grounds that the data is insufficient or that third-party services lack the insight needed to draw unbiased conclusions. This strategic move not only obstructs comprehensive research but also empowers the companies to challenge and disregard findings based on the limited scope of the collected data. This is not a hypothetical scenario; it is straight out of Elon Musk’s playbook. And when simply dismissing research doesn’t suffice, figures like Musk litigate against the institutions conducting such studies, even when the companies lose and vow to appeal. This not only squanders everyone’s time, including the court’s, but also buys them enough time to sow public doubt about the research findings.

Although Meta’s decision to permit researchers to study the 2020 US elections through industry-academic partnerships is commendable, it is not without flaws. It is also worth noting that Meta’s Ad Library has undoubtedly enabled a good deal of research into surrogate advertising and disinformation campaigns thriving on the platform, and the company provides a level of transparency that is relatively better than its contemporaries. Yet this research has no policy-level impact on the company’s day-to-day operations, so influence operations continue to thrive in new packaging while exploiting the same fundamental policy inadequacies. The broader picture also tells a different story. Last year, a leading disinformation researcher accused Harvard University of forcing her out and terminating her work to protect the institution’s ties with Meta’s co-founder. While the university refutes these claims, reporting by Semafor suggests the accusations may have some validity. Furthermore, there is no guarantee that Meta or any other tech company will permit similar studies in 2024, and confining research solely to US elections is in itself inadequate.

In defense of shutting down CrowdTangle, the company said it would replace the service with new research tools, namely the Meta Content Library and Content Library API, which require researchers and nonprofits to apply for access to the company’s data but exclude journalists working in for-profit newsrooms. The decision has sparked protests from 140 civil society organizations, including the Mozilla Foundation, urging the company not to shut down CrowdTangle until January 2025 and to swiftly onboard all existing CrowdTangle organizations through an expedited application process. Thus far, Meta’s actions have only impeded independent research.

So if tech companies can strip away the tools used for independent research, resort to legal intimidation against researchers, or exploit their influence as donors to major universities to hinder research, we are left with just one recourse: government-mandated intervention and legislation that compels these companies to open their platforms to researchers and grant access to vital data. The European Union’s Digital Services Act (DSA), which went into full effect this year, provides some mandate for platform data access and transparency. Under Article 40, which concerns data access and scrutiny, Very Large Online Platforms (VLOPs) and Very Large Online Search Engines (VLOSEs) must furnish real-time public data to “vetted researchers” investigating “systemic risks” to the European Union, including threats to public health and security, civic discourse, freedom of expression, electoral integrity, and the other areas outlined in Article 34(1).

While the law does not spell out how platforms must grant access to the requested data, there is a positive aspect: under the DSA, “vetted researchers” have their requests approved by the designated Digital Services Coordinator, so non-EU researchers could conceivably be vetted if they meet all of the Act’s requirements. This is speculation at best, however; the law is new, and much remains to be clarified. Moreover, the DSA’s vagueness is not necessarily a bad thing. As France’s Digital Services Coordinator, Benoît Loutrel, has rightly argued, being too precise is a recipe for merely addressing yesterday’s problems.

Though the DSA is being championed as the European Union’s “rulebook” for making the internet safer, fairer, and more transparent, a look back at the General Data Protection Regulation (GDPR), once championed as the gold standard for data protection law, shows there is no guarantee that tech companies will comply with the rule of law as innocently as presumed. A German citizen named Matthias Marx filed a complaint against Clearview AI, an American facial recognition company, for unlawfully scraping his face from the internet without his consent. German regulators ordered Clearview AI to delete the hash value it had mathematically derived from his face and to confirm the deletion. The decision also noted that although Clearview is registered in the US and maintains no EU establishment, it must still comply, because the GDPR applies to the processing of personal data of data subjects based in the EU. However, Matthias and many other privacy-focused activists argue that it is technically impossible for Clearview to permanently delete a face: its technology, which constantly crawls the internet for faces, would simply find and catalog him all over again. Moreover, even when European regulators impose fines for numerous violations, tech companies often neglect to pay them, and the penalties fail to deter unlawful conduct.
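Marx’s objection is easier to see in code. The sketch below uses a cryptographic hash of image bytes as a deliberately simplified stand-in for the learned face embeddings real systems compute (which match across different photos of the same person); the names and data are invented for illustration. Either way, the deletion problem is the same: the signature function, not the stored value, encodes the ability to re-identify.

```python
import hashlib

# Simplified stand-in: a real system derives a learned face embedding
# that matches across *different* photos of the same person, whereas a
# cryptographic hash only matches identical bytes. The point survives
# the simplification: delete the stored value, and the next crawl
# regenerates it.

def face_signature(image_bytes: bytes) -> str:
    """Hypothetical stand-in for hashing a scraped face."""
    return hashlib.sha256(image_bytes).hexdigest()

database: dict[str, str] = {}

scraped_photo = b"pixels of a publicly posted photo"
database["matthias"] = face_signature(scraped_photo)

# A regulator-ordered deletion removes the stored value...
del database["matthias"]

# ...but the crawler finds the same public web again and re-catalogs.
database["matthias"] = face_signature(scraped_photo)
assert database["matthias"] == face_signature(scraped_photo)
print("re-cataloged after deletion:", database["matthias"][:16], "...")
```

On this reading, the only durable remedy is stopping the crawl itself, which is exactly what Marx doubts Clearview will do.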

In an ideal world, the laws the EU enacts would be flawless, and Big Tech would comply with regulators seamlessly, with no friction over regulatory adherence. That is not the case, despite lawmakers’ good intentions in crafting these laws. Look beyond the EU to just India and Bangladesh, and regulatory intervention becomes a nightmare for democratic values. In India, there is growing concern that the competition regulator refrains from holding tycoons with close ties to the government to account, fueling monopolies and duopolies across sectors such as airlines, ports, and sports broadcasting and entertainment. At the same time, Indian lawmakers move swiftly and stringently against Big Tech, from imposing fines to issuing blocking orders targeting numerous social media accounts. While some of these fines may have merit, the arbitrary blocking orders often do little more than stifle individual freedom of speech and expression, which is already considerably restricted. This approach fails to address underlying issues and aggravates concerns about censorship and the erosion of digital liberties.

Likewise, Bangladesh’s draconian Digital Security Act, passed in 2018, poses significant challenges to civil liberties. With reports of arrests targeting journalists, farmers, authors, and even minors, the law’s broad application reaches individuals across sectors and demographics, raising serious concerns about its impact on freedom of expression and human rights, especially online. After years of criticism, and months ahead of the January 2024 elections, Bangladesh’s Parliament approved the Cyber Security Act, intended to replace the Digital Security Act as a supposedly milder successor. Critics, however, characterize it as “old wine in a new bottle,” arguing that it fails to address fundamental concerns and merely rehashes the previous legislation.

In the broader public interest, Big Tech must facilitate access to platform data for academic research and journalism while refraining from deceptive practices. Failure to do so may trigger a domino effect, encouraging authoritarian and authoritarian-adjacent regimes to exploit the resulting information vacuum, ultimately creating the very headache these companies so desperately try to avoid.


The Monolith by Kalim

a non-award-winning blog by a non-award-winning journalist