How To Scrape A Public Tableau Dashboard?
Every day I need to downlaod the data available on a public Tableau dashboard. After defining the parameters of interest (time series frequency, time series interval, etc) the dash
Solution 1:
Edit
I've made a tableau scraper library to extract the data from Tableau worksheets
You can get the data from worksheets in a pandas dataframe directly. Also, the parametered values are supported.
The following example get the data from worksheet Simples Demanda Máxima Ano
, then switch to daily mode, shows the worksheet Simples Demanda Máxima Semana Dia
data and then set start date to 01/01/2017
:
from tableauscraper import TableauScraper as TS
url = "https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima"
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()
# show dataframe with yearly data
ws = wb.getWorksheet("Simples Demanda Máxima Ano")
print(ws.data)
# switch to daily
wb = wb.setParameter("Escala de Tempo DM Simp 4", "Dia")
# show dataframe with daily data
ws = wb.getWorksheet("Simples Demanda Máxima Semana Dia")
print(ws.data)
# switch to daily
wb = wb.setParameter(
"Início Primeiro Período DM Simp 4", "01/01/2017")
# show dataframe with daily data from 01/01/2017
ws = wb.getWorksheet("Simples Demanda Máxima Semana Dia")
print(ws.data)
Original post
This answer is similar to this one but the initial URL page and tableau base URL differ. The process/algo remains the same essentially but I will details the steps :
The graphic is generated in JS from the result of an API :
POST https://tableau.ons.org.br/ROOT_PATH/bootstrapSession/sessions/SESSION_ID
The SESSION_ID
parameter is located (among other things) in tsConfigContainer
textarea in the URL used to build the iframe.
- there is a textarea with id
tsConfigContainer
with a bunch of json values - extract the
session_id
and root path (vizql_root
) - make a POST on
https://tableau.ons.org.br/ROOT_PATH/bootstrapSession/sessions/SESSION_ID
with thesheetId
as form data - extract the json from the result (result is not json)
Code :
import requests
from bs4 import BeautifulSoup
import json
import re
url = "https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima"
r = requests.get(
url,
params= {
":embed":"y",
":showAppBanner":"false",
":showShareOptions":"true",
":display_count":"no",
"showVizHome": "no"
}
)
soup = BeautifulSoup(r.text, "html.parser")
tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)
dataUrl = f'https://tableau.ons.org.br{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'
r = requests.post(dataUrl, data= {
"sheet_id": tableauData["sheetId"],
})
dataReg = re.search('\d+;({.*})\d+;({.*})', r.text, re.MULTILINE)
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))
print(data["secondaryInfo"]["presModelMap"]["dataDictionary"]["presModelHolder"]["genDataDictionaryPresModel"]["dataSegments"]["0"]["dataColumns"])
Post a Comment for "How To Scrape A Public Tableau Dashboard?"