There is no doubt that foreign players are important in the NBA. In recent seasons a foreign player, Giannis Antetokounmpo, has won the MVP award two years in a row, and players like Luka Dončić and Nikola Jokić have been dominant for their respective teams in the playoffs.
But how many foreign players are in the NBA today, and what proportion of all players do they represent? In this project we are going to answer these questions and get a better picture of the NBA's foreign players: how their numbers have changed over time, which positions they play and which teams have the most foreign players.
To do this analysis, we are going to use two datasets:
First, we are going to import the two datasets and have a quick look at them.
import pandas as pd
import numpy as np
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import json
import plotly
plotly.offline.init_notebook_mode()
nba_players = pd.read_csv('all_seasons.csv')
nba_players.head()
nba_players.info()
positions = pd.read_csv('positions.csv')
positions.head()
positions.info()
There aren't any null values in either dataset, but as we can see, the dataframes need some cleaning, such as removing unnecessary columns and making the names consistent between them.
We are going to start the cleaning with the nba_players dataframe. First, we isolate the columns of interest and generate two more columns: a Boolean column that flags whether a player is foreign, and a label column that marks a player as 'Foreign' or 'USA'.
# .copy() avoids pandas' SettingWithCopyWarning when adding columns to the selection
nba_players = nba_players[['player_name','team_abbreviation', 'age', 'country', 'season']].copy()
# Vectorized comparisons replace the row-wise apply calls
nba_players['foreign'] = nba_players['country'] != 'USA'
nba_players['foreign_label'] = np.where(nba_players['foreign'], 'Foreign', 'USA')
nba_players.tail()
From this dataframe, we are going to generate another one containing only the last season, 2019-20. This will be the input for the merge with the positions dataframe. We are also going to modify the names in this dataset to match the ones in the positions dataset.
The problem lies in how the names were written: in the nba_players dataset the names don't include special characters, while in the positions dataset they do. A few names also differ in their suffix or short form. So we are going to replace the names in the nba_players dataset with the versions used in the positions dataset.
Note: The analysis of which names needed to be changed was done beforehand.
names={'Goran Dragic':'Goran Dragić', 'Harry Giles III':'Harry Giles', 'Frank Mason':'Frank Mason III',
'Dzanan Musa':'Džanan Musa','Ersan Ilyasova':'Ersan İlyasova','Jonas Valanciunas':'Jonas Valančiūnas',
'Boban Marjanovic':'Boban Marjanović','Bogdan Bogdanovic':'Bogdan Bogdanović',
'Bojan Bogdanovic':'Bojan Bogdanović','Alen Smailagic':'Alen Smailagić','Ante Zizic':'Ante Žižić',
'Anzejs Pasecniks':'Anžejs Pasečņiks','Davis Bertans':'Dāvis Bertāns','Dennis Schroder':'Dennis Schröder',
'Dario Saric':'Dario Šarić','Cristiano Felicio':'Cristiano Felício','Skal Labissiere':'Skal Labissière',
'Svi Mykhailiuk':'Sviatoslav Mykhailiuk','Vlatko Cancar':'Vlatko Čančar','Willy Hernangomez':'Willy Hernangómez',
'Timothe Luwawu-Cabarrot':'Timothé Luwawu-Cabarrot','Tomas Satoransky':'Tomáš Satoranský',
'Kristaps Porzingis':'Kristaps Porziņģis','Juancho Hernangomez':'Juan Hernangómez',
'Luka Doncic':'Luka Dončić','Luka Samanic':'Luka Šamanić','Nicolo Melli':'Nicolò Melli',
'Nikola Jokic':'Nikola Jokić','Nikola Vucevic':'Nikola Vučević'}
nba_players_2020 = nba_players[nba_players['season'] == '2019-20'].copy()
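# Apply the mapping to the name column only, leaving the other columns untouched
nba_players_2020['player_name'] = nba_players_2020['player_name'].replace(names)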
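The mapping above was compiled beforehand. One way such mismatches could be found is to strip the diacritics from the accented names and compare the result with the plain names. The sketch below is only an assumption about how that check might look, not the method actually used for this analysis; the helper `strip_accents` and the two short name lists are hypothetical.

```python
import unicodedata

def strip_accents(name):
    """Return name with combining marks (accents, carons, dots) removed."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical samples of the two name columns
plain_names = ["Luka Doncic", "Nikola Jokic", "Harry Giles III"]
accented_names = ["Luka Dončić", "Nikola Jokić", "Harry Giles"]

# Map the accent-free form of each accented name back to its original spelling
lookup = {strip_accents(n): n for n in accented_names}

# Names that match once the accents are ignored
auto_fixed = {n: lookup[n] for n in plain_names if n in lookup and lookup[n] != n}
# Names that still have no counterpart and need a manual entry
still_unmatched = [n for n in plain_names if n not in lookup]

print(auto_fixed)
print(still_unmatched)
```

Differences that are not accent-related, such as 'Harry Giles III' versus 'Harry Giles', are not caught this way and would still need a manual entry in the mapping.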
Prior to the merge, we need to clean the player names in the positions dataset so that they have the same format as those in nba_players_2020.
positions['Player'] = positions.Player.replace(to_replace=r'\\.+', value='', regex=True)
positions.rename(columns={'Player': 'player_name'}, inplace=True)
positions.drop_duplicates(keep='first',inplace=True)
positions = positions[['player_name','Pos']]
positions.head()
The merge will be a left join of nba_players_2020 with the positions dataset.
nba_players_2020 = nba_players_2020.merge(positions, how='left', on='player_name')
nba_players_2020.head()
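A left join keeps every player from the left table, so a name with no match in the positions table ends up with a missing Pos, and a name that appears more than once in the positions table produces extra rows. The sketch below shows one way to check for both cases using the `indicator` and `validate` parameters of `DataFrame.merge`. The two small tables are hypothetical stand-ins, not the real data, and dropping duplicates by name is an assumption about how repeated rows should be handled.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables (not the real data)
players = pd.DataFrame({
    "player_name": ["Player A", "Player B", "Player C"],
    "foreign": [True, False, True],
})
pos = pd.DataFrame({
    "player_name": ["Player A", "Player B", "Player B"],
    "Pos": ["C", "PG", "PG"],
})

# Keep one position row per player so the left join cannot multiply rows
pos_unique = pos.drop_duplicates(subset="player_name", keep="first")

merged = players.merge(
    pos_unique,
    how="left",
    on="player_name",
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
    validate="many_to_one",  # raises MergeError if a name repeats on the right
)

# Players that found no position row
unmatched = merged.loc[merged["_merge"] == "left_only", "player_name"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would suggest that the name corrections covered every player.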
As we want to know how important foreign players are in the NBA, we are going to analyze the data in terms of four factors:
To get an idea of how the importance of foreign players has changed over time, we are going to count the foreign players in each season of the dataset, and also compute the proportion of foreign versus American players in the 1996-97 season and in the last season.
After that, we are going to plot this information to gain some insight.
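# Sum only the Boolean flag; summing every column would also try to aggregate the text columns
foreigns_overtime = nba_players.groupby('season')[['foreign']].sum()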
proportion_players_1996 = nba_players.loc[nba_players['season'] == '1996-97', 'foreign_label'].value_counts()
proportion_players_2020 = nba_players.loc[nba_players['season'] == '2019-20', 'foreign_label'].value_counts()
fig = make_subplots(
rows=2, cols=2,
specs=[[{"colspan": 2}, None],
[{"type": "pie"}, {"type": "pie"}]],
subplot_titles=('Foreign players in NBA from 1996 to 2020',"Proportion of foreign players in 1996", "Proportion of foreign players in 2020"))
fig.add_trace(go.Scatter(x=foreigns_overtime.index, y=foreigns_overtime.foreign, mode='lines',
showlegend=False, line={'color':'#e63946'},
hovertemplate='<b>Season %{x}</b>' + '<br>Number of foreigns: %{y}<extra></extra>'),
row=1, col=1)
fig.update_xaxes(tickangle=90,type='category', row=1, col=1)
fig.update_yaxes(title_text='Number of foreigns in NBA', row=1, col=1)
fig.add_trace(go.Pie(labels=proportion_players_1996.index, values=proportion_players_1996, textinfo='label+percent',
marker={'colors':['#1d3557','#e63946']}, hole=.4, name=''), row=2, col=1)
fig.add_trace(go.Pie(labels=proportion_players_2020.index, values=proportion_players_2020, textinfo='label+percent',
marker={'colors':['#1d3557','#e63946']}, hole=.4, name=''), row=2, col=2)
fig.update_layout(height=800, width=800, plot_bgcolor="#F9F9F9", legend={'y':.15})
fig.update_annotations(y=.45, selector={'text':'Proportion of foreign players in 1996'})
fig.update_annotations(y=.45, selector={'text':'Proportion of foreign players in 2020'})
plotly.offline.iplot(fig, filename='proportion_foreigns')
As we can see from the graphs above, the number of foreign players has increased dramatically from 1996 to the last season. In the 1996-97 season there were only 9 foreign players, just 2% of all players in the NBA. In the 2019-20 season the number had grown to 120, or 23.3% of the league, almost a quarter of all players.
Also, by looking at the top graph, we can see that the increase has been steady rather than a sudden jump.
Now we are going to find out which countries the foreign players of the 2019-20 season come from, and see if there are any trends in this information.
To do this, we are going to generate a choropleth map of the world showing the number of players per country, excluding the USA.
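The percentages quoted above are read from the pie charts. They can also be computed directly with `value_counts(normalize=True)`, which returns fractions instead of counts. This is a minimal sketch on a hypothetical four-player season, not the real data.

```python
import pandas as pd

# Hypothetical season with four players (not the real data)
labels = pd.Series(["USA", "USA", "USA", "Foreign"], name="foreign_label")

counts = labels.value_counts()                 # absolute number per label
shares = labels.value_counts(normalize=True)   # fraction of all players per label

foreign_pct = round(shares["Foreign"] * 100, 1)
print(counts["Foreign"], foreign_pct)
```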
# Sum only the Boolean flag to get the number of foreign players per country
players_by_country = nba_players[nba_players.season == '2019-20'].groupby('country')[['foreign']].sum()
players_by_country
players_by_country.drop('USA', inplace=True)
# The context manager closes the GeoJSON file once it has been loaded
with open('custom_coun.json') as json_file:
    countries = json.load(json_file)
palette = plotly.colors.make_colorscale(["#f5e0e4","#eeabb2","#ea6973","#e84c58","#e63946"])
fig = px.choropleth_mapbox(players_by_country, geojson=countries, color="foreign",
locations=players_by_country.index, featureidkey="properties.admin",
center={"lat": 23, "lon": 3},
mapbox_style="carto-positron", zoom=.5,
color_continuous_scale=palette, labels={'foreign':'Players'},
opacity=.9,custom_data=[players_by_country.index, 'foreign'],
title='Number of foreign players in NBA by country')
fig.update_layout(margin={"b":30,'l':30,'t':50})
fig.update_traces(
hovertemplate='<b>%{customdata[0]}</b>' + "<br>Number of players: %{customdata[1]}<extra></extra>")
fig.update_layout(title={
'y':0.95,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'})
plotly.offline.iplot(fig, filename='map_word')
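The map matches each row of players_by_country to a GeoJSON feature by comparing the index value with the property named in `featureidkey`, here `properties.admin`. A country whose name is spelled differently in the two sources finds no feature and is simply not drawn, so it can be worth checking for such names before plotting. The sketch below assumes a GeoJSON layout like the one the map code relies on; the fragment and the country labels are hypothetical.

```python
# Hypothetical GeoJSON fragment: each feature stores its country name
# under properties -> admin, as the featureidkey in the map code expects.
countries = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"admin": "Canada"}, "geometry": None},
        {"type": "Feature", "properties": {"admin": "France"}, "geometry": None},
    ],
}

# Hypothetical country labels as they might appear in the player data
data_countries = ["Canada", "France", "Country X"]

# Collect the names the GeoJSON knows and list the data labels it lacks
geojson_names = {feature["properties"]["admin"] for feature in countries["features"]}
missing = sorted(set(data_countries) - geojson_names)
print(missing)
```

Any name in `missing` would need to be renamed in the data, or in the GeoJSON, to appear on the map.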
The country with the most foreign players in the NBA is Canada, with 19. This makes sense, as Canada has close ties with the US and the two countries share a border. But proximity alone doesn't explain it: Mexico had no players in the NBA last season, and the same is true for almost all of Latin America except Brazil, with 3.
The country with the second most foreign players in the NBA is France, with 11. Many other European countries also have players in the NBA, as we can see in the map. This is no surprise, as Europe has a strong basketball culture and the most important leagues in the world besides the NBA.
Australia is third in terms of foreign players, with 9. Africa has a good number of players spread across different countries of the continent.
Asia is the continent with the fewest players in the NBA: it has just two, and both are Japanese.
In terms of positions, we are going to analyze whether foreign players are concentrated in a specific position or distributed evenly across all positions.
foreigns_by_position = nba_players_2020[nba_players_2020['foreign']==True].Pos.value_counts()
foreigns_by_position
foreigns_by_position.drop('SF-SG', inplace=True)
fig = px.bar(x=foreigns_by_position.index, y=foreigns_by_position, title='Number of foreign players by position in NBA Season 19-20')
fig.update_traces(marker_color=['#e63946',"#e28413","#81adc8","#8d86c9",'#1d3557'])
fig.update_traces(
hovertemplate='<b>%{x}</b>' + "<br>Number of players: %{y}<extra></extra>")
fig.update_layout(xaxis={'title':'Positions',
'tickvals':[0,1,2,3,4],
'ticktext':['Center', 'Power Forward', 'Shooting Guard', 'Small Forward', 'Point Guard']},
yaxis={'title':'Number of players'},
title={
'y':0.85,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'},
plot_bgcolor="#F9F9F9")
plotly.offline.iplot(fig, filename='number_by_position')
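The tick labels in the chart above are assigned by position in the list, so they rely on the order in which `value_counts` returns the positions. A variant that does not depend on that order is to relabel the index through a mapping before plotting. This is a minimal sketch with hypothetical counts; the dictionary `POSITION_NAMES` is an illustrative name, and its entries follow the tick labels used in the chart.

```python
import pandas as pd

# Hypothetical counts indexed by position abbreviation (not the real data)
counts = pd.Series({"C": 4, "PF": 3, "SG": 2, "SF": 2, "PG": 1})

# Full names for the abbreviations used in the Pos column
POSITION_NAMES = {
    "C": "Center",
    "PF": "Power Forward",
    "SF": "Small Forward",
    "SG": "Shooting Guard",
    "PG": "Point Guard",
}

# Relabel the index through the mapping so each bar keeps its own name
named_counts = counts.rename(index=POSITION_NAMES)
print(named_counts)
```

Passing `named_counts.index` as the x values would then label each bar correctly even if the ranking of the positions changed.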
The position with the most foreign players is Center, with 41 players, followed by Power Forward with 25. These two positions, where more than half of the foreign players play, are the ones with the highest average height, meaning that the majority of these players are tall.
Now that we know how many foreign players are in the NBA, where they come from and which positions they play, let's analyze how these players are distributed across the NBA teams.
foreigns_by_teams = nba_players_2020[nba_players_2020['foreign']==True].team_abbreviation.value_counts()
fig = px.bar(x=foreigns_by_teams, y=foreigns_by_teams.index, title='Number of foreign players by team in NBA Season 19-20',
color=foreigns_by_teams, color_continuous_scale='Blues')
fig.update_yaxes(autorange="reversed")
fig.update_traces(
hovertemplate='<b>%{y}</b>' + "<br>Number of players: %{x}<extra></extra>")
fig.update_layout(xaxis={'title':'Number of players'},
yaxis={'title':'Teams'},
title={
'y':0.9,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'},
plot_bgcolor="#F9F9F9",
height=700)
fig.update(layout_coloraxis_showscale=False)
plotly.offline.iplot(fig, filename='number_by_team')
Some insights can be derived from the graph above:
Foreign players have become a crucial part of the NBA: in the 2019-20 season they make up almost a quarter of all players. And it seems this share will only increase, as their numbers have grown steadily over previous years.
These foreign players come from different parts of the world, but the majority come from Europe, Canada and Australia. They tend to play in positions where height is a clear factor, such as Center or Power Forward.
The great majority of NBA teams have 3 or more foreign players; just four teams are below this level.
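The count of teams below the three-player level can be checked directly. One detail to keep in mind is that `value_counts` on the filtered rows only lists teams with at least one foreign player, so a team with none would be absent from the chart. The sketch below sums the Boolean flag per team instead, which keeps such teams with a count of zero. The roster and team codes are hypothetical, not the real data.

```python
import pandas as pd

# Hypothetical roster rows (not the real data)
roster = pd.DataFrame({
    "team_abbreviation": ["AAA", "AAA", "AAA", "BBB", "BBB", "CCC"],
    "foreign": [True, True, True, True, False, False],
})

# Summing the Boolean flag per team keeps teams with zero foreign players,
# which value_counts() on the filtered rows would leave out.
foreign_per_team = roster.groupby("team_abbreviation")["foreign"].sum()

# Teams with fewer than three foreign players
below_three = foreign_per_team[foreign_per_team < 3]
print(below_three)
```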