
We begin in Wayanad, Kerala. It is voting day.

You step into the voting booth. The electronic voting machine hums in the converted classroom as the fan whirs overhead. Other voters behind you wait their turn. Your eyes search for the familiar symbol and name amid the rows of text.

This is a little weird. There's the Rahul Gandhi that you recognize, but below him sits another, Raghul Gandhi, with a different symbol. There's another man with the Gandhi last name right below. And one more, Rahul Gandhi KE. That's a lot of Gandhis in one place.

In that election, 2196 voters chose Rahul Gandhi KE, a candidate whose campaign began at the last moment and whose political aspirations ended right after the polls. Whether these votes came from confusion with the other Rahul Gandhi or from real conviction is unclear.

In 2014, before photos of the candidates were put on EVMs, voters of a constituency in Chhattisgarh faced a choice between the BJP candidate Chandu Lal Sahu and 11 other Chandu Lal Sahus. All spelt almost the same. Symbols, names, but no photos.

There is one real Chandu Lal Sahu in here..

...did you keep track of him?

The real Chandu still won, but would have gotten at least 30,000 more votes without his duplicates in the running.

This doesn't just happen to a single party. Between 1960 and 2024, over 8000 candidates shared very similar names with their rivals: close enough to give careful voters pause, and perhaps to misdirect the less aware into voting for namesakes who would disappear after election day.

Diagram Chasing presents
✦ नाम तो सुना ही होगा (You must have heard the name) ✦

Votes in a name

Namesakes in Indian Elections and How to Find Them

By Aman Bhargava
Art by Reechik Banerjee

India never really stops voting. At any given time, you’ll probably find some state gearing up for polls, managing an electorate that spans from city professionals to farmers across 28 states and dozens of languages.

To handle this diversity, the system relies heavily on symbols - INC’s hand, BJP’s lotus, AAP’s broom. These visual markers help voters navigate ballots that appear in multiple languages and help unify the message in a country where illiteracy is still high. While party candidates carry these established symbols, independents choose from Election Commission-approved ones, which are usually common objects like pressure cookers or ceiling fans.

But here’s where it gets interesting. While the system carefully prevents any confusion with party symbols, names are something of a blind spot. No two parties can share a lotus or a hand, but almost nothing stops multiple candidates from having nearly identical names. Until recently, the situation was made worse by the fact that voters saw only names and symbols (which could also be alike), with no photographs of the people they were voting for. That changed less than a decade ago. Even so, voters can still find themselves doing double-takes at the ballot box, faced with candidates whose names are suspiciously similar to those of major contenders.

That list is quite long. In their constituencies, names like Alimineti Uma Madhava Reddy faced off with Alimineti Madhava Reddy, Raghubir Singh appeared on the same ballot as Raghvir Singh, H.D. Kumaraswamy faced D. Kumaraswamy, S.V. Ramani competed with S. Veeramani, Virender Singh and Virandar Singh, a specific Mohan Lal Badoli against a more ambiguous Mohan Lal, Rajendran A. versus Rajendaran B… you get the idea.

If you’re thinking, ‘Well of course these are confusing to me, I just heard of them right now’, that’s sort of the point. Unless you’re getting the same coverage as major candidates that generally dominate the headlines, a part of your electorate may be just as clueless if your campaign has limited reach. And that’s precisely what makes this potentially effective — it doesn’t need to fool everyone, just enough voters in what might be a tight race.

It’s of course unlikely that most of these namesakes could effectively sway an entire electorate, and some cases might be genuine coincidence — after all, common names exist. But it is an interesting exercise to find where all such cases appear and just how similar these names can get.

Name games

The Rahul and Raghul case, while high-profile, is less interesting precisely because it’s so visible. Voters are unlikely to be confused by such a well-known example. Most voters first see their local candidate’s written name at the EVM, where they must match what they’ve heard to what they see, in whichever script or language that may be. The more interesting cases lie in smaller constituencies and lesser-known candidates, where similar names have a greater impact.

The numbers are big, though. Between 1960 and 2023, across general and state elections, we’re examining approximately individual names, so I took some calls on how I filter it to a smaller, more relevant pool. After cleaning the data, the people in each constituency were divided into two groups — candidates who might face a namesake and the rest, for whom I had to figure out if they might be a namesake.

Name similarity can be measured in two ways: look-alike and sound-alike comparisons. To understand potential voter confusion, I needed ways to capture as much name similarity as possible (ideally, both visual and phonetic, but for this story, only visual).

The usual approaches to measuring similarity proved inadequate on their own. Levenshtein distance, which counts character edits, misses structural patterns in names. Dice’s letter-pair matching can stumble when similar names have different arrangements. Even the Jaro-Winkler algorithm, typically reliable for ordered names, struggles with variations of name order on ballots and spelling differences. Another, BI-SIM, offers a solution by analysing sequential letter pairs while maintaining word order, which can capture the flow of names much better, somewhat avoiding the shortcomings of simple letter-matching.
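To make the comparison concrete, here is a minimal Python sketch of three of these measures. It is not the code used for this story (that analysis was done in R); the function names are made up for illustration, and `bisim_similarity` is a simplified reading of BI-SIM that counts only exact letter-pair matches in order, after padding each name with a copy of its first letter. Jaro-Winkler is left out to keep the sketch short.

```python
from collections import Counter


def levenshtein_similarity(a, b):
    """Edit distance, turned into a 0-1 score by dividing by the longer length."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1 - prev[-1] / max(len(a), len(b))


def bigrams(s):
    """All adjacent letter pairs, spaces included."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice_similarity(a, b):
    """Dice coefficient over letter pairs; ignores where in the name a pair occurs."""
    a_pairs, b_pairs = Counter(bigrams(a.lower())), Counter(bigrams(b.lower()))
    total = sum(a_pairs.values()) + sum(b_pairs.values())
    if total == 0:
        return 1.0 if a.lower() == b.lower() else 0.0
    shared = sum((a_pairs & b_pairs).values())
    return 2 * shared / total


def bisim_similarity(a, b):
    """Simplified BI-SIM (assumption): longest common subsequence of letter pairs,
    so order matters, divided by the length of the longer name."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 1.0 if a == b else 0.0
    # Pad with the first letter so the first real letter also sits in a pair.
    pairs_a, pairs_b = bigrams(a[0] + a), bigrams(b[0] + b)
    table = [[0] * (len(pairs_b) + 1) for _ in range(len(pairs_a) + 1)]
    for i, pa in enumerate(pairs_a, 1):
        for j, pb in enumerate(pairs_b, 1):
            if pa == pb:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1] / max(len(a), len(b))


ref, other = "Prodip Hazarika", "Pradip Hazarika"
print(round(levenshtein_similarity(ref, other), 3))  # 0.933
print(round(dice_similarity(ref, other), 3))         # 0.857
print(round(bisim_similarity(ref, other), 3))        # 0.867
```

A single changed letter costs Levenshtein one edit out of fifteen, but it breaks two letter pairs, which is why the pair-based scores come out lower. Scores for other names need not match the published figures, since the cleaning steps and exact implementations are not reproduced here.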

Have a look at a sample of the matches and how each of these algorithms scores them.

Differences in scoring

State | Reference Name | Similar Name | JW¹ | BI-SIM¹ | Dice¹
Assam | Prodip Hazarika | Pradip Hazarika | 81.3% | 86.7% | 85.7%
Bihar | Devandra Prasad Yadav | Devendra Prasad Yadav | 83.5% | 90.5% | 80%
Gujarat | Rajeshbhai Chimanbhai Vasava | Ramanbhai Chimanbhai Vasava | 77.0% | 82.1% | 64.2%
Madhya Pradesh | Mahendra Singh Bhadoriya | Devendra Singh Bhadoriya | 75.0% | 81.2% | 87%
Tamil Nadu | Nageswari Thirumathi.n. | Maheswari Thirumathi .a. | 76.9% | 84.1% | 66.7%
Uttar Pradesh | Dr. Narendra Kumar Singh Gaur | Narendra Kumar Singh Yadava | 70.8% | 78.6% | 71.7%
¹ JW = Jaro-Winkler, BI-SIM = BI-SIM (Kondrak, 2003), Dice = Dice Coefficient

They all do well in most cases, but some are more eager than others to falsely match names, and some understate an obvious similarity. So instead of relying on a single algorithm, I used all of them with different weights and only considered a pair of names to be a similar match if at least two algorithms agreed they were.

Let the algorithms vote too, if you will.
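The voting rule can be sketched in a few lines of Python. The thresholds and weights below are hypothetical placeholders, not the values used in the analysis, and the function names are invented for this example; only the "at least two algorithms agree" rule comes from the description above.

```python
# Hypothetical per-algorithm cut-offs and weights (assumptions for illustration).
THRESHOLDS = {"jw": 0.75, "bisim": 0.78, "dice": 0.70}
WEIGHTS = {"jw": 1.0, "bisim": 2.0, "dice": 1.0}


def count_votes(scores, thresholds):
    """Number of algorithms whose score clears that algorithm's own threshold."""
    return sum(1 for metric, score in scores.items() if score >= thresholds[metric])


def is_similar(scores, thresholds, min_votes=2):
    """A pair counts as a match only if at least `min_votes` algorithms agree."""
    return count_votes(scores, thresholds) >= min_votes


def weighted_total(scores, weights):
    """Weighted average of the individual scores, as one overall similarity."""
    return sum(scores[m] * weights[m] for m in scores) / sum(weights[m] for m in scores)


# Scores for the first two pairs are taken from the table above.
assam = {"jw": 0.813, "bisim": 0.867, "dice": 0.857}
gujarat = {"jw": 0.770, "bisim": 0.821, "dice": 0.642}
made_up = {"jw": 0.62, "bisim": 0.55, "dice": 0.74}  # invented, not from the data

print(is_similar(assam, THRESHOLDS))    # True: all three agree
print(is_similar(gujarat, THRESHOLDS))  # True: JW and BI-SIM agree, Dice does not
print(is_similar(made_up, THRESHOLDS))  # False: only Dice clears its threshold
```

Requiring agreement means that a single over-eager algorithm cannot declare a match on its own, while a single strict one cannot veto it.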

Even so, this process turned out to be incredibly sensitive to small changes in what we considered a threshold for matching. Names in India vary greatly by region, so after processing the entire dataset, I plotted the kinds of names each algorithm identifies and how that varied by state.

[Chart: similarity scores of matched names by state, on a scale from 0.5 (less similarity) to 1.0 (more similarity). States shown: Bihar, Uttar Pradesh, Punjab, Tamil Nadu, Madhya Pradesh, Haryana, Karnataka, Rajasthan, Kerala, Andhra Pradesh, Jammu & Kashmir, West Bengal, Maharashtra, Odisha, Gujarat, Jharkhand, Delhi, Assam, Himachal Pradesh, Chhattisgarh.]

I think the above plot is fascinating because it shows you the range of names that each metric manages to capture, but we also see how certain states have names that are easier to capture. You’ll also notice that this is being shown to three decimal places. Since these values are sensitive to small changes, the difference between a 0.754 and 0.775 is not insignificant.

Each algorithm brings its bias - Jaro-Winkler tends toward much more optimistic matching (clustering toward higher similarity scores), BI-SIM remains conservative and grades similarity much more strictly, while Dice coefficient usually splits the difference. When two or more agree, we’re more likely to catch genuine name similarities rather than algorithmic quirks. Given the mixed salad that these ballots can be - last names before first names, fondly used terms (’bhaiya’, ’sahab’) popping up randomly, honorifics appearing wherever they may, or transliteration chaos switching between scripts, this is useful.

We’re not that different, you and I

That was a long explanation, but now I could finally stop looking for names and start looking at what I can learn about them. Up to this point, a lot of it had been numbers, numbers, numbers. Yes, alright, 0.9 is more similar to something than 0.57, but it would be nicer to translate that back into a visual.

Look-alike contests are all the rage now, want to have one of our own? $50 cash prize.

We’ll work our way back from the numbers and try to construct what visual similarity means for people who are looking at these names together for the first time and trying to make sense of which is which.

[Interactive: enter any two names (for example, JAGDISH YADAV and JAGDISH CHANDER) to see what their face cards look like on the EVM, along with their BI-SIM, Dice, JW and total similarity scores.]

Sufficiently different names are easy to spot, but when they’re not, it takes a bit of work.

I still can’t imagine having to sort between around a dozen Chandu Lal Sahus.

These similar candidates come in various forms. You’ve already seen how algorithms differ based on which state the names are from, so certainly there must be some underlying structure? Well, there is!

So what changed between us?

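[Chart: what changes between namesakes' names, by state. States shown: JK, PB, HP, HR, UK, AR, RJ, DL, UP, BR, SK, AS, NL, GJ, MP, JH, WB, ML, MN, MH, CG, OD, TR, MZ, TS, GA, KA, AP, KL, TN. Label: Punjab.]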

Certain states are much more represented here than others, but there are still some interesting patterns. First names are more likely to vary a little, and that seems understandable. It’s easier to find someone with a matching last name but a different first name in the same constituency, since surnames often indicate caste/community groups that cluster geographically. For instance, in Punjab, the last name ‘Singh’ usually remains the same and first names are more likely to differ, whereas in Tamil Nadu the trend goes the opposite way. Tamil Nadu also has the highest number of cases where only a change of ‘initial’ is required, as is the case with A.K. Moorthy’s namesake, K. Moorthy.
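One rough way to label which part of a name changed is sketched below in Python. It is an illustration rather than the method behind the chart: the category names and the `classify_difference` function are invented here, single letters are treated as initials, and the first and last full words are assumed to be the first name and the surname, which ballots do not always respect.

```python
import re


def name_tokens(name):
    """Lowercased pieces of a name, split on spaces and full stops."""
    return [t for t in re.split(r"[\s.]+", name.lower()) if t]


def classify_difference(reference, other):
    """Say which part differs: initials only, first name, last name, or other."""
    ref_tokens, other_tokens = name_tokens(reference), name_tokens(other)
    if ref_tokens == other_tokens:
        return "identical"
    # Assumption: single-letter tokens are initials, longer tokens are full words.
    ref_words = [t for t in ref_tokens if len(t) > 1]
    other_words = [t for t in other_tokens if len(t) > 1]
    if ref_words == other_words:
        return "initials only"
    if not ref_words or not other_words:
        return "other"
    same_first = ref_words[0] == other_words[0]
    same_last = ref_words[-1] == other_words[-1]
    if same_last and not same_first:
        return "first name"
    if same_first and not same_last:
        return "last name"
    return "other"


print(classify_difference("A.K Moorthy", "K. Moorthy"))         # initials only
print(classify_difference("Amarjeet Singh", "Harjeet Singh"))   # first name
print(classify_difference("Jagdish Yadav", "Jagdish Chander"))  # last name
```

Names written surname-first, or with honorifics attached, would need extra handling before a rule like this could be trusted.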

Does it ever make a difference?

So far, you’ve seen some ways we can identify and search for namesakes. The question that kept nagging at me was: were there cases where these namesake candidates actually flipped elections?

Most of these candidates barely register in the final tallies, typically getting less than 1% of the vote. Can even small vote shares matter in some circumstances?

[Chart: number of namesake candidates by vote share. X-axis: vote share (%), 0 to 1.2. Y-axis: namesake candidates, 0 to 8K.]

For each pair of similar names, I ran 400 simulations asking a simple question: what if those votes had gone elsewhere?

I gave the simulation different scenarios where varying portions of the namesake candidate’s votes (20%, 40%, 60%, etc.) went to the candidate they shared a name with. These scenarios were weighted based on our similarity scores from the name matching.

While it’s not perfect — voters are complex and elections are messy — it gives us a way to identify the races where namesake candidates might have had their biggest impact.
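A minimal Python sketch of this kind of simulation follows. The exact weighting used in the analysis is not spelled out here, so the scheme below is an assumption: each run picks one transfer scenario, and scenarios closer to the pair's similarity score are picked more often. The function name, the fixed seed and the scenario list are likewise illustrative.

```python
import random


def flip_probability(margin, namesake_votes, similarity,
                     runs=400, fractions=(0.2, 0.4, 0.6, 0.8, 1.0), seed=0):
    """Share of simulated runs in which the transferred votes exceed the margin.

    margin         -- winner's lead over the candidate who shares the name
    namesake_votes -- votes received by the namesake candidate
    similarity     -- name similarity between 0 and 1
    """
    rng = random.Random(seed)
    # Assumed weighting: a scenario is likelier the closer it is to the similarity.
    weights = [1 - abs(fraction - similarity) for fraction in fractions]
    flips = 0
    for _ in range(runs):
        fraction = rng.choices(fractions, weights=weights, k=1)[0]
        transferred = namesake_votes * fraction
        if transferred > margin:
            flips += 1
    return flips / runs


# Margin of 16 with 320 namesake votes: even a 20% transfer (64 votes) is enough.
print(flip_probability(margin=16, namesake_votes=320, similarity=0.845))  # 1.0
# Margin larger than the namesake's entire tally: no scenario can flip it.
print(flip_probability(margin=1000, namesake_votes=500, similarity=0.9))  # 0.0
```

For a race like Cheyyur (margin 304, namesake votes 803), only the scenarios at 40% and above would change the result, so the outcome falls somewhere between these two extremes.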

I narrowed this down to examples where the simulation worked in favour of the non-independent candidate at least 50% of the time.

Elections that could have flipped

Instances with similarly-named candidates and close margins

Year | Election (State/Constituency) | Candidate | Candidate votes | Election margin | Namesake | Namesake votes | Similarity
2023 | KA Jayanagar | C K Ramamurthy | 57,797 | 16 | B Ramamurthy | 320 | 84.5%
2023 | KA Jayanagar | Sowmya Reddy | 57,781 | 16 | Sowmya A Reddy | 320 | 84.5%
2019 | AP Vijayawada Central | Bonda Umamaheswara Rao | 70,696 | 25 | Gondesi Umamaheswara Reddy | 88 | 68.5%
2019 | AP Vijayawada Central | Malladi Vishnu | 70,721 | 25 | Malladi Vishnu Vardhan(vishnu) | 88 | 68.5%
2005 | BR Mokameh | Anant Kumar Singh | 35,877 | 94 | Anjani Kumar Sinha | 1,141 | 73.2%
2001 | KL Nedumangad | Palode Ravi | 62,114 | 156 | Vazhode Ravi | 442 | 68.0%
2008 | MP Sonkatch | Phool Chand Verma | 54,596 | 191 | Phoolchand Verma | 1,318 | 87.6%
2012 | PB Jagraon | Ishar Singh | 52,825 | 206 | Avtar Singh | 1,192 | 65.8%
2008 | MP Dimani | Shiv Mangal Singh Tomar | 24,777 | 256 | Shiv Singh Tomar (divakar Giri) | 6,340 | 77.8%
2016 | TN Cheyyur | Munusamy A | 63,142 | 304 | Munusamy S | 803 | 90.3%
2022 | UP Bilaspur | Amarjeet Singh | 101,691 | 307 | Harjeet Singh | 702 | 81.1%
2006 | TN Thirumayam | Radhakrishnan M | 47,044 | 314 | Radhakrishnan A | 1,462 | 93.6%
2001 | KL Irinjalakuda | T.sasidharan | 53,836 | 406 | Sasidharan | 1,867 | 79.4%
2020 | BR Bhorey | Jitendra Paswan | 73,605 | 462 | Jitendra Ram | 1,790 | 69.2%
2002 | UP Seorahi | Ramsakal Tiwari | 3,845 | 465 | Ramdhar Tiwari | 4,222 | 71.3%
2006 | KL Pattambi | C.p. Mohammed | 57,752 | 566 | A.p. Mohamed | 1,286 | 65.7%
2006 | KL Pattambi | K.e. Esmail | 57,186 | 566 | K. Ismail | 1,286 | 65.7%
2002 | UK Champawat | Hayat Singh Mahra | 7,743 | 637 | Madan Singh Mahrana | 7,985 | 68.4%
2002 | UK Champawat | Narayan Singh Mahar | 2,375 | 637 | Madan Singh Mahrana | 7,985 | 68.4%
2004 | KA Hosakote | Bachegowda B N | 74,973 | 835 | Bachegowda A N | 2,468 | 87.3%
2004 | KL Alleppey | V M Sudheeran | 334,485 | 1009 | V S Sudheeran | 8,282 | 85.7%
2005 | BR Bhore | Anil Kumar | 34,299 | 1218 | Anil Kumar Baitha | 3,849 | 70.1%
2019 | JH Khunti | Kali Charan Munda | 381,193 | 1445 | Masih Charan Munda | 6,964 | 75.3%
2014 | MH Hingoli | Wankhede Subhash Bapurao | 465,765 | 1632 | Wankhede Subhash | 6,157 | 76.3%
2009 | KL Palakkad | M.b. Rajesh | 338,070 | 1820 | N.v. Rajesh | 5,478 | 66.1%
2009 | KL Palakkad | Satheesan Pacheni | 336,250 | 1820 | Satheesan. E.v | 5,478 | 66.1%
2009 | KA Davanagere | G.m. Siddeswara | 423,447 | 2024 | G. N. Siddesh | 5,694 | 82.1%
2009 | KA Davanagere | S.s. Mallikarjuna | 421,423 | 2024 | L.s Mallikarjun | 5,694 | 82.1%
Only the state assembly elections are considered
¹ Table shows examples where the simulation returned a 50% or more probability of the candidate winning

Some of these are incredibly close margins! What was interesting to me was that all of these happened in state assembly elections and not the general elections. Which, I suppose, could make sense? The campaigns in states tend to be more local, and the candidates often don’t get as much attention and press coverage as they do for national races.

To repeat what I said in the beginning, it seems futile to prove, only statistically, that such examples are intentional rather than coincidental. At least we know something about how to look for them now.


Sources and notes

The data on election candidates comes from Trivedi Centre for Political Data. We ran the analysis using R. For the Jaro-Winkler algorithm, we used the stringdist package. Code for the BI-SIM algorithm was adapted from strcmp2. The Dice algorithm was adapted from the dice-coefficient package available here. This website is built using SvelteKit and uses D3 for visualizations. The website and analysis are available on GitHub.



The artwork has been made available under CC-BY-SA. Find them here.


This project won the 2024 Pudding Cup

References

(n.d.) http://www.rochester.edu/college/faculty/alexander_lee/wp-content/uploads/2022/12/paper2.pdf

(2019) Lok Sabha polls 2019: This election season, it’s all about the names. Hindustan Times.

Ananay Agarwal et al. (2021) TCPD Indian Elections Data v2.0. Trivedi Centre for Political Data, Ashoka University.

Niraj Aswani and Robert Gaizauskas (n.d.) English-Hindi Transliteration using Multiple Similarity Metrics.

Peter Christen (2006) A Comparison of Personal Name Matching: Techniques and Practical Issues. Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06). doi:10.1109/ICDMW.2006.2

Lynne Emmerton et al. (2020) Development and exploratory analysis of software to detect look-alike, sound-alike medicine names. International Journal of Medical Informatics. doi:10.1016/j.ijmedinf.2020.104119

Grzegorz Kondrak and Bonnie Dorr (n.d.) Identification of Confusable Drug Names: A New Approach and Evaluation Methodology.

Acknowledgements

I am incredibly thankful to Rhea, who came up with the idea of generating faces from numbers. It is my favorite part of this piece.

A nod to my faulty chair, who finally called off our year-long fight and let me sit at the proper height while I wrapped this project.

AI Disclosure

No LLM was used for data cleaning, annotation or generating insights. No words were generated with an LLM. The author uses Cursor and Claude as coding assistants.
