Google Sheets in Observable

Observable JS
Google Sheets
Tutorial
Web Scraping
Author

Scott Franz

Published

July 7, 2023

Abstract
Tutorial on how to use Google Sheets with Observable in Quarto.

Introduction

Google Sheets are a great way to stream data into your Quarto workflow with Observable. D3 has a handy way of fetching different data files if you provide the URL. This is Observable’s recommended way of going about it. I find Google Sheets great for small datasets that are updated frequently.

Read a Google Sheet

Here is an example that is taken from Observable’s Google Sheets Starter Notebook.

getCsvUrl = url => {
  url = new URL(url);
  const id = url.pathname.split("/")[3]
  const gid = new URLSearchParams(url.hash.slice(1)).get("gid") || 0;
  return `https://docs.google.com/spreadsheets/d/${id}/export?format=csv&gid=${gid}`
}

// Above converts a Google Sheets website URL to its CSV URL.
// You can also go to “File → Publish to web”, select the “Comma-separated values (.csv)” 
// type, select the sheet with your data, and use that CSV URL directly with `d3.csv` above.
// If you need data to update faster, try the Google Sheets 
// API: https://stackoverflow.com/questions/30082277/accessing-a-new-style-public-google-sheet-as-json/44479726#44479726

data = d3.csv(getCsvUrl("https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4juclhjFgqIY8fQFMemwKL2c64vk/edit#gid=0"), d3.autoType)

I replaced the URL with the Data is Plural Google Sheet.1 I hooked it up to a search input and a table where you can select a row to see the details and dataset links.

Code
viewof search = Inputs.search(data)

viewof dataset = Inputs.table(search, {sort: "edition", reverse: true, multiple: false, required: false, value: search[1600]})
Code
check(dataset)
Code
check = (dataset) => {
  return dataset ? md`### ${dataset.headline}
${dataset.text}
${dataset.links}` : md`Ooops! Please select a row from the table.`
}

Try it Yourself

You can link to any public Google Sheet below and see the results. Go ahead and copy the URL and click submit. This is The Pudding’s backlog of ideas.

Code
viewof url = Inputs.text({label: "Google Sheet URL", width:width, value: "https://docs.google.com/spreadsheets/d/1neuenck0xy5piNZaaxPhPL0Wsmpe_-gqFjeXqV3MbZE/edit#gid=0", submit: true})

example = d3.csv(getCsvUrl(url), d3.autoType)

Inputs.table(example)

Wikipedia Tables

Google Sheets also have a super simple way of web scraping. I find it pretty useful for scraping Wikipedia tables. For example, here is a table of state capitals from Wikipedia with some relevant statistics. All I need to do is enter this formula =IMPORTHTML("url", "table", index) in Google Sheets. You just need the URL, the script tag2, and the index of the tag on the page.3 Here is the full formula I used below:

=IMPORTHTML("https://en.wikipedia.org/wiki/List_of_capitals_in_the_United_States", "table", 2)

When using tables from Wikipedia some of the column headers may have sub-categories that make the data a little messy. But it is easy to clean up in a Google Sheet. Just copy the values and paste only the values ctrl + shift + v back into the sheet. Then you can edit the values directly. d3.autoType did not detect the date variable here so I fixed it. Also I created new variables with a new names for convenience.

Code
wikipedia = d3.csv(getCsvUrl("https://docs.google.com/spreadsheets/d/1nBuDIKv67qX73UXsQy-Gd_cJc609NJPugLQy3Dlp4Tw/edit#gid=0"), d3.autoType)

parseDate = d3.utcParse("%Y")

clean_wiki = wikipedia.map(d => ({...d, population: d.MSA, date: parseDate(d.Since)}))

Inputs.table(clean_wiki)

The charts below are heavily inspired by Mike Bostock’s Bank Failures Plot.

State Capitals by Land Area

Code
Plot.plot({
  height: 500,
  width,
  insetRight: 70,
  y: {axis: null},
  r: {range: [0, 100]},
  marks: [
    Plot.dot(clean_wiki, Plot.dodgeY({x: "date", anchor: "bottom", r: "Area", title: (d) => `${d.Capital}\n${d.Area} square miles`,
        fill: "powderblue", strokeWidth: 1, stroke: "#000"})),
     Plot.text(
      clean_wiki,
      Plot.filter((d) => d.Area > 100,
       Plot.dodgeY({
        anchor: "bottom",
        padding: 2,
        x: "date",
        lineWidth: 5,
        r: "Area",
        text: (d) => d.Area > 350 
          ? `${d.Capital}\n${d.Area} sq. miles` 
          : `${d.Area}`,
        pointerEvents: "none",
        fill: "#000",
        stroke: "powderblue"})))
        ]
})

State Capitals by Population Size

Code
Plot.plot({
  height: 800,
  width,
  insetRight: 40,
  y: {axis: null},
  r: {range: [0, 60]},
  marks: [
    Plot.dot(clean_wiki, Plot.dodgeY({x: "date", anchor: "bottom", r: "population", title: (d) => `${d.Capital}\n${(d.population/ 1000000).toFixed(1)}M`,
        fill: "thistle", strokeWidth: 1, stroke: "#000"})),
     Plot.text(
      clean_wiki,
      Plot.filter((d) => d.population > 1000000,
       Plot.dodgeY({
        anchor: "bottom",
        //padding: 2,
        x: "date",
        //lineWidth: 5,
        r: "population",
        text: (d) => d.population > 1900000 
          ? `${d.Capital}\n${(d.population/ 1000000).toFixed(1)}M` 
          : `${(d.population/ 1000000).toFixed(1)}M`,
        pointerEvents: "none",
        fill: "#000",
        stroke: "thistle"})))
        ]
})

Write to a Google Sheet

You can also embed a Google Form in an <iframe>. After you submit the form, if you refresh this page the form data appears. Check it out!

Code
html`<iframe src="https://docs.google.com/forms/d/e/1FAIpQLSe_Cw481ZPBImeBoqhrYRmXaci6iHZcfBsWUMovS4dGNRl_kw/viewform?embedded=true" width="${width}$" height="800" frameborder="0" marginheight="0" marginwidth="0">Loading…</iframe>`
Code
google_form = d3.csv(getCsvUrl("https://docs.google.com/spreadsheets/d/1gV7Rg3Vz3nAQKQhi0fXGfa63qIABDYVr-msD96qSD1w/edit?resourcekey#gid=1990814415"), d3.autoType)

Inputs.table(google_form)
Code
Plot.plot({
  inset: 10,
  grid: true,
  marks: [
    Plot.line(google_form, {x: "Timestamp", y: "Pick a Number", curve: "catmull-rom", marker: true}),
    Plot.text(google_form, {x: "Timestamp", y: "Pick a Number", text: (d) => `${d["Write Something Here"]}`, dy: -8})
  ]
})

That is all I have for now. Thanks for reading!