Christian Martin


oEmbed Provider

oEmbed isn't an SEO feature, but it is another way to describe a web page: a format designed for embedding content such as photos and videos on other sites. I used Cloudflare Workers to build a serverless oEmbed provider that scrapes the requested page for metadata, including Schema.org markup and <meta> tags.

Here's an example using this blog post. The following Schema is generated for the post:

{
  "@context":"https://schema.org",
  "@type":"WebPage",
  "datePublished":"2020-04-19",
  "description":"Schema.org is a semantic, structured way to write metadata, and also part of how Google understands your site\u0026rsquo;s content.",
  "image":"https://ctmartin.me/blog/2020/04/schema-api-and-search/preview.jpg",
  "name":"Building a metadata API \u0026 Search",
  "url":"https://ctmartin.me/blog/2020/04/schema-api-and-search/"
}

The provider then turns that into the following oEmbed response:

{
  "type": "link",
  "version": "1.0",
  "title": "Building a metadata API & Search",
  "provider_name": "Christian Martin's Blog",
  "author_name": "Christian Martin"
}
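As a rough illustration of that transformation, here is a minimal TypeScript sketch, not the provider's actual code. The function name `schemaToOEmbed` and the `SiteDefaults` type are hypothetical. The Schema shown above does not contain the `provider_name` or `author_name` values, so the sketch assumes they are supplied separately as site-level defaults.

```typescript
// Subset of the Schema.org WebPage fields used in this sketch.
interface WebPageSchema {
  "@type"?: string;
  name?: string;
  url?: string;
}

// oEmbed 1.0 "link" response: `type` and `version` are required, the rest are optional.
interface OEmbedLink {
  type: "link";
  version: "1.0";
  title?: string;
  provider_name?: string;
  author_name?: string;
}

// Hypothetical site-level defaults (an assumption, not taken from the original code).
interface SiteDefaults {
  providerName?: string;
  authorName?: string;
}

function schemaToOEmbed(schema: WebPageSchema, site: SiteDefaults = {}): OEmbedLink {
  const out: OEmbedLink = { type: "link", version: "1.0" };
  // Only emit optional fields that actually have a value.
  if (schema.name) out.title = schema.name;
  if (site.providerName) out.provider_name = site.providerName;
  if (site.authorName) out.author_name = site.authorName;
  return out;
}

// JSON.parse decodes the \u0026 escape in the Schema into a plain "&".
const oembed = schemaToOEmbed(
  JSON.parse('{"@type":"WebPage","name":"Building a metadata API \\u0026 Search"}'),
  { providerName: "Christian Martin's Blog", authorName: "Christian Martin" }
);
```

Building the object field by field keeps empty properties out of the response, which suits oEmbed because everything except `type` and `version` is optional.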

The code uses Cloudflare's HTMLRewriter API to extract and parse Schema.org JSON-LD and <meta> tags. The data is then mapped to oEmbed properties, preferring Schema.org values and falling back to less specific or vendor-branded tags as needed. According to the Cloudflare dashboard, the median CPU time at the time of writing is 2.3ms.
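One detail worth sketching is how JSON-LD can be collected with HTMLRewriter: text handlers receive a script's contents in chunks, so the text has to be buffered until the chunk flagged `lastInTextNode` arrives. The class below is a minimal sketch of that idea, not the provider's actual code. `JsonLdCollector` and `TextChunk` are hypothetical names, and the chunks are simulated so the example runs outside the Workers runtime.

```typescript
// Minimal shape of the text chunk that HTMLRewriter passes to a `text` handler.
interface TextChunk {
  text: string;
  lastInTextNode: boolean;
}

class JsonLdCollector {
  private buffer = "";
  readonly documents: unknown[] = [];

  // Called once per chunk; parse only when the text node is complete.
  text(chunk: TextChunk): void {
    this.buffer += chunk.text;
    if (!chunk.lastInTextNode) return;

    const raw = this.buffer.trim();
    this.buffer = "";
    if (raw === "") return;

    try {
      this.documents.push(JSON.parse(raw));
    } catch {
      // Skip malformed JSON-LD rather than failing the whole request.
    }
  }
}

// Simulate the rewriter delivering one <script> body in two chunks.
const collector = new JsonLdCollector();
collector.text({ text: '{"@type":"WebPage","na', lastInTextNode: false });
collector.text({ text: 'me":"Example"}', lastInTextNode: true });

// In a Worker, the same object could be registered roughly like this:
//   new HTMLRewriter()
//     .on('script[type="application/ld+json"]', collector)
//     .transform(response);
// The handlers only run while the transformed body is being read.
```

After the two simulated chunks, `collector.documents` holds a single parsed object, ready to be mapped to oEmbed properties.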

When the Schema.org markup is missing something the provider needs, the code falls back to <meta> and other tags.

For example, the blog post also contains these tags, which could be used if the Schema.org markup weren't there:

<title>Building a metadata API &amp; Search | Christian Martin</title>
<meta name="application-name" content="Christian Martin">

<meta name="description" content="Schema.org is a semantic, structured way to write metadata, and also part of how Google understands your site&rsquo;s content.">
<meta name="generator" content="Hugo 0.77.0" />
<meta name="publisher" content="Christian Martin">
<meta property="og:title" content="Building a metadata API &amp; Search" />
<meta property="og:description" content="Schema.org is a semantic, structured way to write metadata, and also part of how Google understands your site&rsquo;s content." />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://ctmartin.me/blog/2020/04/schema-api-and-search/" />
<meta property="article:published_time" content="2020-04-19T00:00:00+00:00" />
<meta property="article:modified_time" content="2020-04-19T00:00:00+00:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Building a metadata API &amp; Search"/>
<meta name="twitter:description" content="Schema.org is a semantic, structured way to write metadata, and also part of how Google understands your site&rsquo;s content."/>
<meta itemprop="name" content="Building a metadata API &amp; Search">
<meta itemprop="description" content="Schema.org is a semantic, structured way to write metadata, and also part of how Google understands your site&rsquo;s content.">
<meta itemprop="datePublished" content="2020-04-19T00:00:00+00:00" />
<meta itemprop="dateModified" content="2020-04-19T00:00:00+00:00" />
<meta itemprop="wordCount" content="829">
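A fallback over tags like these could be sketched as an ordered list of candidate sources per oEmbed field. This is only an illustration: the key labels (such as `itemprop:name`), the priority order, and the helper `firstPresent` are assumptions rather than the provider's actual logic, and the values are assumed to have been extracted and entity-decoded already.

```typescript
// Tag values keyed by a label such as "og:title"; how they are collected is out of scope here.
type MetaTags = Record<string, string | undefined>;

// Candidate sources per field, most preferred first. The order is an assumption.
const TITLE_SOURCES = ["itemprop:name", "og:title", "twitter:title", "title"];
const AUTHOR_SOURCES = ["author", "publisher", "application-name"];

// Return the first non-empty value among the given keys, or undefined if none is set.
function firstPresent(tags: MetaTags, keys: string[]): string | undefined {
  for (const key of keys) {
    const value = tags[key]?.trim();
    if (value) return value;
  }
  return undefined;
}

// A few of the tags from the blog post, already decoded.
const exampleTags: MetaTags = {
  "title": "Building a metadata API & Search | Christian Martin",
  "og:title": "Building a metadata API & Search",
  "publisher": "Christian Martin",
  "application-name": "Christian Martin",
};

const resolvedTitle = firstPresent(exampleTags, TITLE_SOURCES);
const resolvedAuthor = firstPresent(exampleTags, AUTHOR_SOURCES);
```

With this ordering, the bare `<title>` is used only as a last resort, since it carries the site-name suffix that the more specific tags omit.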