This data visualization project was completed by Zihui (Chris) Fang and Hongtao Hao, with equal contribution, as the term paper for Professor Yong-Yeol Ahn's Data Visualization course in Fall 2019.

GitHub repo

This study analyzed historical data on the athletes who competed in all past modern Olympic Games. We tried to answer four questions:

  • How did female participation change over the years and how did these changes differ between continents?
  • Is there a home-field advantage at the Olympics?
  • How “efficient” is each participating country or region at winning medals?
  • Which sports had the highest number of participants?


Results showed that, first, both the total number of athletes and the rate of female participation have been increasing over the past 120 years. Second, there appears to be a home-field advantage. Third, medal efficiency is highly correlated with the economic development of the participating country or region. Finally, athletics, gymnastics, and swimming have the highest numbers of athletes.
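Female participation rates of the kind behind the first finding can be computed per continent as a simple ratio: unique female athletes divided by all unique athletes, for each continent and each Games. The Python below is a minimal sketch, not the project's processing code. It assumes athlete-level records shaped like rows of rgriff23's Olympic_history table (dicts with `ID`, `Sex`, `Year`, `Season`, and `NOC` keys); the `NOC_TO_CONTINENT` lookup and the `female_share_by_continent` name are hypothetical.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny NOC-to-continent lookup; a real analysis
# needs a complete mapping for every National Olympic Committee code.
NOC_TO_CONTINENT = {"USA": "North America", "CHN": "Asia", "FRA": "Europe"}


def female_share_by_continent(rows):
    """Return {(continent, year): share of unique Summer Games athletes who are female}.

    Each row is assumed to be a dict with "ID", "Sex", "Year", "Season",
    and "NOC" keys. An athlete entered in several events appears in several
    rows, so athletes are deduplicated by ID before counting.
    """
    athletes = defaultdict(set)  # (continent, year) -> all athlete IDs
    females = defaultdict(set)   # (continent, year) -> female athlete IDs
    for row in rows:
        if row["Season"] != "Summer":
            continue  # only Summer Games are compared here
        continent = NOC_TO_CONTINENT.get(row["NOC"])
        if continent is None:
            continue  # skip NOCs missing from the lookup
        key = (continent, int(row["Year"]))
        athletes[key].add(row["ID"])
        if row["Sex"] == "F":
            females[key].add(row["ID"])
    return {key: len(females[key]) / len(ids) for key, ids in athletes.items()}


# Illustrative rows only, not real data.
rows = [
    {"ID": 1, "Sex": "F", "Year": 2016, "Season": "Summer", "NOC": "USA"},
    {"ID": 1, "Sex": "F", "Year": 2016, "Season": "Summer", "NOC": "USA"},  # second event
    {"ID": 2, "Sex": "M", "Year": 2016, "Season": "Summer", "NOC": "USA"},
    {"ID": 3, "Sex": "M", "Year": 2016, "Season": "Summer", "NOC": "CHN"},
    {"ID": 4, "Sex": "F", "Year": 2014, "Season": "Winter", "NOC": "CHN"},  # ignored
]
print(female_share_by_continent(rows))
# {('North America', 2016): 0.5, ('Asia', 2016): 0.0}
```

Deduplicating by athlete ID matters because counting raw rows would weight an athlete entered in five events five times.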

You can also download the PDF version of our paper, whose LaTeX source code can be found here.


  • One of the first attempts to compare changes in female participation in the Summer Olympics across continents over 120 years. This comparison matters because, although data show that overall female participation in the Olympics has increased dramatically, this might not have been the case for every continent. Examining the percentage of female athletes in each of the six continents (Asia, Europe, Africa, South America, North America, and Oceania) gives a more detailed picture of the trend in female athlete participation.

  • Introducing kernel density estimation (KDE) to examine home-field advantage in the Olympics. KDE shows the density of each host country’s performance distribution, which makes the comparison between “home” and “away” performance clearer.

  • Introducing the concept of “medal efficiency”. Earlier attempts either did not account for “medal points” or unfairly included GDP and population in their formulas. Our measure of medal efficiency, i.e., weighted medal points per participating athlete, estimates each country’s efficiency at winning medals in the Summer Olympics more accurately.
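Medal efficiency, as described in the last item, is a ratio of weighted medal points to participating athletes. The Python below is a minimal sketch of that ratio. The 3/2/1 weights for gold, silver, and bronze are an assumption for illustration, not necessarily the weights used in the paper, and `medal_efficiency` is a hypothetical helper.

```python
# Assumed weights (gold=3, silver=2, bronze=1); the paper's own weighting may differ.
MEDAL_WEIGHTS = {"Gold": 3, "Silver": 2, "Bronze": 1}


def medal_efficiency(medals, n_athletes, weights=MEDAL_WEIGHTS):
    """Return weighted medal points per participating athlete.

    medals: mapping of medal type to count, e.g. {"Gold": 2, "Silver": 1}.
    n_athletes: number of athletes the country or region sent to the Games.
    """
    if n_athletes <= 0:
        raise ValueError("n_athletes must be positive")
    points = sum(weights[kind] * count for kind, count in medals.items())
    return points / n_athletes


# Two hypothetical delegations with identical medal tables but different sizes.
small = medal_efficiency({"Gold": 2, "Silver": 1, "Bronze": 3}, 20)
large = medal_efficiency({"Gold": 2, "Silver": 1, "Bronze": 3}, 200)
print(small, large)  # 0.55 0.055
```

Dividing by delegation size rather than by GDP or population keeps the measure tied to the athletes who actually competed, so a small team that wins the same medals as a large one scores higher.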

The main data set used in this project is the awesome Olympic_history by rgriff23.

Other complementary data sets:

For more information about our data sources and our code for data manipulation and visualization, please check out our olymvis-data repository on GitHub.
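The KDE comparison of “home” and “away” performance listed among the contributions can be illustrated with a small, dependency-free Gaussian KDE. This is a sketch, not the visualization code in the repository. The performance values are made-up illustrative numbers (for example, a host country’s share of medals in each Games), `gaussian_kde` here is a hypothetical helper rather than SciPy's class of the same name, and the bandwidth uses the common normal-reference rule of thumb h = 1.06·σ·n^(-1/5), which the paper may not have used.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function that estimates the probability density of `samples`.

    Uses a Gaussian kernel. Without an explicit bandwidth, the
    normal-reference rule of thumb h = 1.06 * sd * n ** (-1/5) is applied.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive (are all samples equal?)")
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump centered on each sample, scaled so the total area is 1.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical medal shares for one host country, at home vs. away (not real data).
home = [0.12, 0.15, 0.10, 0.18]
away = [0.06, 0.08, 0.07, 0.05, 0.09]
home_density = gaussian_kde(home)
away_density = gaussian_kde(away)

# Evaluate both curves on a shared grid, as one would before plotting them together.
grid = [i / 100 for i in range(0, 26)]
for x in grid[::5]:
    print(f"{x:.2f}  home={home_density(x):6.2f}  away={away_density(x):6.2f}")
```

Plotting both curves on the same axes shows where each distribution places most of its mass, which is easier to read than comparing two lists of per-Games values.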

Data Processing


Zihui (Chris) Fang

Chris Fang earned his Bachelor’s degree in Management Information Systems (General) from the Kelley School of Business at Indiana University Bloomington, where he is now a Master’s student in the same field. He is passionate about hands-on projects that give him a more comprehensive understanding of how technology is used in different industries. His projects include database system configuration for a city hall, web database development for a local non-profit, a prediction model and visualization for stock data analysis in Python, and OCR prototyping with ABBYY.

Hongtao Hao

Hongtao Hao graduated with a Master’s degree in Media Arts and Sciences from the Media School of Indiana University in 2020, advised by Dr. Nicole Martins. He is currently a research assistant at YY Lab for Professor Yong-Yeol Ahn. He earned his first Master’s degree, in Journalism, from Renmin University of China, Beijing. You can read his journalism pieces in St. Gallen Symposium Magazine. He is a fan of The Three-Body Problem. His most memorable experience was attending the 47th St. Gallen Symposium in 2017.

This website is built with Hugo, using Yihui Xie's hugo-prose theme.

You can find the source code of this website here.

Reading the paper is much easier on a screen wide enough for the table of contents to float on the left.