
Machine Learning, Indian Social Media’s Biggest Challenge Yet – Analytics India Magazine

“Freedom of expression is not absolute and it is subject to reasonable restrictions.”
MeitY, citing Article 19 (2)

Earlier this month, the Government of India reprimanded Twitter for allowing fake, unverified, anonymous and automated bot accounts to operate on its platform. The Secretary of MeitY raised doubts about the platform’s commitment to transparency and healthy conversation. The way Twitter and Facebook handled the events leading up to the elections in the US, and their aftermath, has served as a wake-up call to governments around the world. Leaders of countries such as France and Germany openly opposed the “problematic” nature of these platforms. As the debate around privacy and freedom of speech grows stronger, individuals and nations are looking for better, more transparent alternatives.

The differential treatment in how these platforms practise censorship has been exposed quite often. Recently, Twitter and the Indian Government had a brief stand-off during the farmers’ protests. In a strongly worded statement, the government rebuked Twitter for treating what happened at the Red Fort differently from the recent Capitol Hill protests. Twitter caved in. The platform banned a few accounts under the guidance of MeitY.

In a recent meeting with Twitter executives, MeitY said that the social media company is free to formulate its own rules and guidelines like any other business entity, but that the laws enacted by the Parliament of India “must be followed” irrespective of Twitter’s own rules and guidelines.

A private entity with the power to silence local elements in a foreign land is problematic in many ways. This is the reason why nations across the world root for their own spin-offs of these popular platforms. Indian app makers have responded whenever there has been a radical push for digitisation: demonetisation propelled Paytm, and the ban on Chinese apps birthed half a dozen startups. Koo is the latest entrant to the club: the Twitter alternative was launched 10 months ago and is the winner of the Aatmanirbhar App Challenge organised by the Indian government. The app raked in nearly a million users in the second week of this month. Parler is a good case study in this regard. Launched in 2018, Parler was supposed to be the free speech alternative to Twitter. But the company’s lacklustre content moderation allowed radicals from all walks of life to disrupt conversations on the platform. Google and Apple disowned the app. AWS kicked it off its servers and the app disappeared overnight.

If WhatsApp’s dubious policies helped Signal, then Twitter’s selective censorship incentivised the likes of Parler and Koo. However, Twitter and Facebook are machine learning powerhouses, tirelessly working on building scalable systems that can skim through the petabytes of data being generated every day. The local spin-offs have their task cut out as their user bases grow. Content moderation eventually has to be automated. Mayank Bidawatka of Koo recently said that his team “will use algorithms and machine learning” to tackle malicious content on their platform. Using machine learning for content moderation is not new. Twitter and Facebook do it. However, the jury is still out on the outcomes.

Hits And Misses with Algorithms

(Image credits: Twitter engineering blog)

Twitter and Facebook are among the most used social media platforms across the world. They have become modern-day gatekeepers of information and are even powerful enough to ‘cancel’ the President of the United States. These platforms are built around the idea of freedom of expression, but as they attract more people, content moderation becomes a challenge.

Twitter and Facebook use algorithms to ease the burden on human moderators. But how good are these algorithms?

What ML can tackle when it comes to content moderation:

The spread of misinformation
Malicious use of bots to undermine and disrupt the public conversation
Multiple or overlapping accounts used for spamming
Bulk or aggressive tweeting, engaging, or following
Spammy hashtags
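
Two of the behaviours listed above, bulk tweeting and spammy hashtags, can be illustrated with simple rules that might sit in front of a learned model. The following is a minimal sketch, not how Twitter or Facebook actually do it: the class name `BulkPostingDetector`, the function `hashtag_ratio` and the threshold values are all hypothetical, chosen only for illustration.

```python
from collections import deque


class BulkPostingDetector:
    """Flags an account that posts more than `max_posts` times
    within a sliding window of `window_seconds` (hypothetical limits)."""

    def __init__(self, max_posts=5, window_seconds=60.0):
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        self._timestamps = {}  # account id -> deque of recent post times

    def record_post(self, account_id, timestamp):
        """Record a post and return True if the account now exceeds the limit."""
        recent = self._timestamps.setdefault(account_id, deque())
        recent.append(timestamp)
        # Drop posts that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_posts


def hashtag_ratio(text):
    """Fraction of whitespace-separated tokens that are hashtags."""
    tokens = text.split()
    if not tokens:
        return 0.0
    hashtags = sum(1 for t in tokens if t.startswith("#") and len(t) > 1)
    return hashtags / len(tokens)
```

In practice, signals like these would more likely be fed to a classifier as features than used as hard rules, since fixed thresholds are easy for spammers to work around.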

(Image credits: Facebook AI)

For instance, Facebook deployed AI systems that can fuse different types of inputs, such as text and speech (as shown above), to predict whether two pieces of content are making the same claim, even when they look very different from each other. To tackle deepfakes, Facebook uses generative adversarial networks (GANs) to generate examples similar to the deepfakes its model encounters on the feed; these serve as large-scale training data for the deepfake detection model. The company has also developed LASER for cross-language sentence-level embedding. This technique embeds multiple languages jointly in a single shared embedding space, which allows the semantic similarity of sentences to be evaluated more accurately.
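
To see why a shared embedding space helps, consider how two posts in different languages could be compared once each has been turned into a vector. The sketch below does not call LASER or any Facebook system; the three-dimensional vectors are made-up toy values, and `same_claim` with its 0.8 threshold is a hypothetical helper. It only shows the comparison step, cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def same_claim(emb_a, emb_b, threshold=0.8):
    """Hypothetical rule: treat two posts as the same claim if their
    embeddings are close enough in the shared space."""
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy embeddings (assumed values, not real model output).
english_post = [0.9, 0.1, 0.3]
hindi_post = [0.85, 0.15, 0.35]    # same claim, different language
unrelated_post = [0.1, 0.9, 0.0]   # a different topic

print(same_claim(english_post, hindi_post))
print(same_claim(english_post, unrelated_post))
```

Because every language maps into the same space, one threshold and one similarity function can serve all of them, which is the appeal of the approach for a multilingual platform.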

Twitter’s technology, meanwhile, proactively identifies and removes malicious behaviour across the service.

Questions that the Indian social media platforms will eventually face:

Does your ML model rightly detect a malicious user?
How can you prove your model is devoid of bias?
How does your ML model scale when you have 100 million active users every day?
Is the language model good at understanding all languages equally?
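
The questions about bias and language coverage can at least be measured, even if a complete absence of bias is hard to prove. One common check is to compare how often benign posts are wrongly flagged in each language. The sketch below assumes a hand-labelled evaluation set; the function name and the record layout `(group, is_malicious, flagged)` are hypothetical.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Return {group: false positive rate} for a moderation model.

    `records` is an iterable of (group, is_malicious, flagged) tuples,
    where `group` might be the post's language, `is_malicious` is the
    human label and `flagged` is the model's decision.
    """
    false_positives = defaultdict(int)
    benign_total = defaultdict(int)
    for group, is_malicious, flagged in records:
        if not is_malicious:
            benign_total[group] += 1
            if flagged:
                false_positives[group] += 1
    # Groups with no benign examples are left out to avoid dividing by zero.
    return {g: false_positives[g] / n for g, n in benign_total.items()}
```

A large gap between groups, say Hindi posts being wrongly flagged far more often than English ones, would suggest the model does not understand all languages equally. A small gap on one metric does not by itself show the model is fair.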

Using machine learning and other automated techniques to look over content brings in another challenge: productionising the ML models.

For enterprises, it is an uphill task to deploy machine learning models at full scale. Lack of talent, lack of quality data and improper model selection are a few of the pressing challenges that impede an organisation’s ML ambitions. It is usually recommended to build service-specific tools and frameworks instead of relying on third-party providers. For example, Twitter’s success with scaling ML systems is the result of its continuous re-engineering of infrastructure by taking advantage of the latest open standards in technology. According to a 2019 report, Twitter has over 300 million monthly active users and stores 1.5 petabytes of logical time series data while handling 25K query requests per minute.

This is one of the reasons why Twitter chose Google Cloud to perform ML operations. Google’s tools such as BigQuery and Data Studio come in handy for building simple pipelines. For more complex pipelines, however, Twitter had to build its own infrastructure on top of Airflow, the open-source workflow orchestrator. In the area of data governance, BigQuery’s services for authentication and auditing do well, but for metadata management and privacy compliance, Twitter had to custom-design systems.

Koo wants the Indian diaspora to listen to the views of some of the sharpest Indian minds and also “speak their mind” by sharing their thoughts. But speaking one’s mind is where all the problems start. It is a regulatory nightmare, more so for platforms that rely on AI. From cropping photos to running sentiment analysis, machine learning is a crucial part of any social media platform, and as state-of-the-art models get better at grasping nuances, the GPTs of the world might someday be able to tell malice from humour and misinformation from banter!
