Scraping an API

Used Components

Setup

For this setup, we will periodically scrape a public API, map it with RML to Linked Data, Transform it to a Version Object and write it to console.

RML Mapping

Since RML can sometimes be hard on human eyes, we’ll convert our YARRRML to RML via Matey.

Through this, we can convert this YARRRML to the following RML.

prefixes:
 ex: "http://example.com/"
 cs: "http://www.cheapshark.com/"
 ldi: "http://www.vlaanderen.be/ns/ldi#"

mappings:
  person:
    sources:
      - ['deals.json~jsonpath', '$[*]']
    s: http://www.cheapshark.com/gamedeals/$(gameID)
    g: http://www.cheapshark.com/gamedeals/$(gameID)/$(lastChange)
    po:
      - [a, cs:GameDeal]
      - [cs:title, $(title)]
      - [cs:metacriticLink, $(metacriticLink)]
      - [cs:thumb, $(thumb)]
      - p: cs:releaseDate
        o:
            function: ldi:epochToIso8601
            parameters:
            - [ldi:epoch, $(releaseDate) ]
            datatype: xsd:DateTime
      - p: cs:lastChange
        o:
            function: ldi:epochToIso8601
            parameters:
            - [ldi:epoch, $(lastChange) ]
            datatype: xsd:DateTime
            
      - [cs:isOnSale, $(isOnSale), xsd:Boolean]
      - [cs:normalPrice, $(normalPrice), xsd:Double]
      - [cs:salePrice, $(salePrice), xsd:Double]

mapping.ttl

Let’s save the mapping.ttl in our current directory.

ldio.config.yaml:

orchestrator:
  pipelines:
    - name: data
      input:
        name: Ldio:HttpInPoller
        config:
          url: https://www.cheapshark.com/api/1.0/deals?pageSize=1000
          interval: PT30M
        adapter:
          name: Ldio:RmlAdapter
          config:
            mapping: "mapping.ttl"
      transformers:
        - name: Ldio:VersionObjectCreator
          config:
            date-observed-property: "http://www.cheapshark.com/lastChange"
            member-type: "http://www.cheapshark.com/GameDeal"
            generatedAt-property: "https://w3id.org/ldes#timestampPath"
            versionOf-property: "https://w3id.org/ldes#versionOfPath"
      outputs:
        - name: Ldio:ConsoleOut
          config:
            content-type: text/turtle

Execution

Once started, you should be seeing data in your console similar to

<http://www.cheapshark.com/gamedeals/157072/2023-06-28T21:31:20.000Z>
    a       <http://www.cheapshark.com/GameDeal> ;
    <http://www.cheapshark.com/isOnSale> "1"^^<http://www.w3.org/2001/XMLSchema#Boolean> ;
    <http://www.cheapshark.com/lastChange> "2023-06-28T21:31:20.000Z"^^<http://www.w3.org/2001/XMLSchema#DateTime> ;
    <http://www.cheapshark.com/metacriticLink> "/game/pc/one-piece-burning-blood---gold-edition" ;
    <http://www.cheapshark.com/normalPrice> "74.98"^^<http://www.w3.org/2001/XMLSchema#Double> ;
    <http://www.cheapshark.com/releaseDate> "2016-09-01T00:00:00.000Z"^^<http://www.w3.org/2001/XMLSchema#DateTime> ;
    <http://www.cheapshark.com/salePrice> "6.45"^^<http://www.w3.org/2001/XMLSchema#Double> ;
    <http://www.cheapshark.com/thumb> <https://gamersgatep.imgix.net/a/3/4/026d064cc7e1fb721f497398a3435dfcfbe0c43a.jpg?auto=&w=> ;
    <http://www.cheapshark.com/title> "ONE PIECE BURNING BLOOD GOLD EDITION" ;
    <https://w3id.org/ldes#timestampPath> "2023-06-28T21:31:20.000Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
    <https://w3id.org/ldes#versionOfPath> <http://www.cheapshark.com/gamedeals/157072> .