LDIO LDES Client
The LDES Client is a component which can be used by data consumers to replicate and synchronize an LDES.
When replication or synchronization is halted, the LDES Client is able to resume where it has stopped. More information on how consumption of an LDES works can be found here.
Processing fragments
One or more URLs need to be configured in the LDES Client. If multiple URLs are configured, they must all belong to the same LDES.
The configured fragments (URLs) will be processed and all relations will be added to the (non-persisted) queue.
As long as the LDES Client runs, new fragments that need to be processed can be added to the queue. The LDES Client keeps track of the mutable and immutable fragments it has already processed. When an immutable fragment that has already been processed is added to the queue again, it is ignored.
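The queue behaviour described above can be sketched roughly as follows. This is an illustrative model, not the client's actual implementation: `crawl` and the `fetch` callable (URL in, `(is_immutable, relation_urls)` out) are hypothetical names, and re-queuing mutable fragments on expiry is left out.

```python
from collections import deque


def crawl(start_urls, fetch):
    """Process fragments from a queue, ignoring fragments already handled.

    `fetch` is a hypothetical callable: url -> (is_immutable, relation_urls).
    """
    queue = deque(start_urls)   # non-persisted work queue
    immutable_done = set()      # immutable fragments that were fully processed
    mutable_seen = set()        # mutable fragments seen in this pass
    processed = []
    while queue:
        url = queue.popleft()
        if url in immutable_done or url in mutable_seen:
            # Already processed: ignore. (A real client re-queues mutable
            # fragments when they expire; this single pass does not.)
            continue
        is_immutable, relations = fetch(url)
        processed.append(url)
        (immutable_done if is_immutable else mutable_seen).add(url)
        queue.extend(relations)  # every relation is added to the queue
    return processed


# Tiny in-memory LDES: "a" and "b" are immutable, "c" is still mutable.
graph = {"a": (True, ["b", "c"]), "b": (True, ["a", "c"]), "c": (False, ["a"])}
print(crawl(["a"], lambda url: graph[url]))
```

Keeping the immutable and mutable sets separate mirrors the text: only immutable fragments can be skipped permanently.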
Mutable fragments usually have a max-age set in the Cache-Control header. If this isn’t the case, a default expiration interval will be used to set an expiration date on the fragment. When the max-age or default expiration interval of a fragment expires, the fragment will be put into the queue again so that the LDES Client fetches it again.
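As a rough illustration of how an expiration date could be derived from the Cache-Control header, consider the sketch below. The function name `expiration_date` and the 60-second fallback are assumptions made for this example; the client's real default interval is not stated here.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed fallback for this sketch; the client's real default may differ.
DEFAULT_EXPIRATION = timedelta(seconds=60)


def expiration_date(cache_control, fetched_at, default=DEFAULT_EXPIRATION):
    """Return the moment at which a mutable fragment should be queued again."""
    # Look for a max-age directive such as "public, max-age=120".
    match = re.search(r"\bmax-age=(\d+)", cache_control or "", re.IGNORECASE)
    lifetime = timedelta(seconds=int(match.group(1))) if match else default
    return fetched_at + lifetime


fetched = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(expiration_date("public, max-age=120", fetched))  # two minutes after `fetched`
print(expiration_date(None, fetched))                   # falls back to the default
```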
Processing members within fragments
The LDES Client keeps track of the processed members of mutable fragments, to make sure members are only processed once within a fragment. When the fragment is marked as immutable, and no members can be added anymore, the LDES Client will stop keeping track of members processed within that fragment.
Members within a fragment can be processed in order of time based on a timestamp. The path to this timestamp needs to be configured.
If the path is missing, members will be processed in random order.
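A minimal sketch of timestamp-based ordering, assuming members are plain dictionaries and the timestamp is an ISO 8601 string under a given key. In the real client the timestamp is found by evaluating the configured property path on RDF data; `order_members` and `timestamp_key` are hypothetical names.

```python
from datetime import datetime


def order_members(members, timestamp_key=None):
    """Sort members chronologically when a timestamp key is configured."""
    if timestamp_key is None:
        # No path configured: members are returned as received, so no
        # particular order is guaranteed.
        return list(members)
    return sorted(members, key=lambda m: datetime.fromisoformat(m[timestamp_key]))


members = [
    {"id": "b", "generatedAtTime": "2024-01-02T00:00:00+00:00"},
    {"id": "a", "generatedAtTime": "2024-01-01T00:00:00+00:00"},
]
print([m["id"] for m in order_members(members, "generatedAtTime")])
```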
Filtering
Exactly-once-filter
To filter out members that were already received, the “exactly-once-filter” can be enabled in the configuration. The filter checks whether a member was already processed in other fragments.
The IDs of all processed members will be remembered by the filter and when a duplicate member is processed, it will be filtered out before sending it to the output of the Client.
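The idea behind this filter can be sketched with a set of remembered member IDs. The class below is an in-memory illustration with hypothetical names, not the client's code, which can also persist these IDs through the configured persistence strategy.

```python
class ExactlyOnceFilter:
    """Remembers member IDs and rejects any ID it has seen before."""

    def __init__(self):
        self._seen_ids = set()

    def allow(self, member_id):
        """Return True the first time an ID is offered, False afterwards."""
        if member_id in self._seen_ids:
            return False  # duplicate: filtered out before reaching the output
        self._seen_ids.add(member_id)
        return True


members = ["m1", "m2", "m1", "m3", "m2"]
exactly_once = ExactlyOnceFilter()
print([m for m in members if exactly_once.allow(m)])
```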
Latest-state-filter
When version materialisation is enabled, state objects that do not represent the latest state can be filtered out by enabling the “latest-state-filter” in the configuration.
Both the versionOf and the timestamp of the version object members will be remembered by the filter. When a new member is processed with the same versionOf and a timestamp that is before or equal to the latest remembered timestamp, the member will be filtered out. When a member is processed that has a later timestamp than the last remembered member, that last remembered member will be overwritten and the new member will be processed.
Flow of the Latest State Filter
flowchart LR;
CLIENT[LDIO LDES Client] --> Version_Object((Version\n object));
Version_Object --> Latest_State_Filter(Latest State\nFilter);
Latest_State_Filter --> Filtering{Newer than\n last processed\n member?};
Filtering -->|Yes| Version_Materialiser(Version\nMaterialiser);
Version_Materialiser --> State_Object((State\n object));
State_Object --> Sender[Ldio Sender];
Filtering ---->|No| Ignore[Ignore member];
This Latest State Filter is only available for the version materialiser within the LDIO LDES Client, not for the transformer component.
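The decision made by the latest-state filter can be sketched as a map from versionOf to the latest remembered timestamp. The class and method names below are hypothetical and the state is kept in memory only; the sketch only illustrates the "strictly newer" rule described above.

```python
from datetime import datetime


class LatestStateFilter:
    """Lets a version object through only if it is newer than the last one."""

    def __init__(self):
        self._latest = {}  # versionOf -> latest timestamp remembered

    def allow(self, version_of, timestamp):
        last = self._latest.get(version_of)
        if last is not None and timestamp <= last:
            return False  # before or equal to the remembered timestamp: ignore
        self._latest[version_of] = timestamp  # overwrite the remembered member
        return True


latest_state = LatestStateFilter()
print(latest_state.allow("http://example.org/thing/1", datetime(2024, 1, 2)))
print(latest_state.allow("http://example.org/thing/1", datetime(2024, 1, 1)))
```

The first call passes because nothing is remembered yet for that versionOf; the second is rejected because its timestamp is older.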
Persistence strategies
The Client offers different ways to persist state of the processed members:
Strategy | Description | Advantages | Disadvantages |
---|---|---|---|
Memory | Store the state of members in the memory of the LDES Client | | |
SQLite | A SQLite database is used to store state of members | | |
PostgreSQL | A PostgreSQL database is used to store state of the members | | |
** We use a transaction for every processed record and SQLite is limited by the CPU (source).
Config
General properties
Property | Description | Required | Default | Example | Supported values |
---|---|---|---|---|---|
urls | List of URLs of the LDES data sources | Yes | N/A | http://localhost:8080/my-ldes | HTTP and HTTPS URLs |
source-format | The ‘Content-Type’ that should be requested from the server | No | text/turtle | application/n-quads | Any type supported by Apache Jena |
state | ‘memory’, ‘sqlite’ or ‘postgres’ to indicate how the state should be persisted | No | memory | sqlite | ‘memory’, ‘sqlite’ or ‘postgres’ |
keep-state | Indicates if the state should be persisted on shutdown (n/a for in memory states) | No | false | false | true or false |
timestamp-path | The property-path used to determine the timestamp on which the members will be ordered, and used for the latest-state-filter when enabled | No | N/A | http://www.w3.org/ns/prov#generatedAtTime | A property path |
enable-exactly-once | Indicates whether a member must be sent exactly once or at least once | No | true | true | true or false |
The default source-format is text/turtle, as this RDF format supports relative URIs. However, if relative URIs are not used, application/n-quads or even the binary format application/rdf+protobuf are better options, as these formats are faster to parse.
Setting the keep-state property to true ensures that the state cannot be deleted through the pipeline-management API.
Version materialisation properties
Property | Description | Required | Default | Example | Supported values |
---|---|---|---|---|---|
materialisation.enabled | Indicates if the client should return state-objects (true) or version-objects (false) | No | false | true | true or false |
materialisation.version-of-property | Property that points to the versionOfPath | No | http://purl.org/dc/terms/isVersionOf | http://purl.org/dc/terms/isVersionOf | A property URI |
materialisation.enable-latest-state | Indicates whether all state or only the latest state must be sent | No | true | false | true or false |
Don’t forget to provide a timestamp-path in the general properties: although this property is not required, the latest-state-filter needs it to work properly!
LDIO Http Requester properties
Authentication properties
Property | Description | Required | Default | Supported values | Example |
---|---|---|---|---|---|
auth.type | The type of authentication required by the LDES server | No | NO_AUTH | NO_AUTH, API_KEY or OAUTH2_CLIENT_CREDENTIALS | OAUTH2_CLIENT_CREDENTIALS |
auth.api-key | The api key when using auth.type ‘API_KEY’ | No | N/A | String | myKey |
auth.api-key-header | The header for the api key when using auth.type ‘API_KEY’ | No | X-API-KEY | String | X-API-KEY |
auth.client-id | The client identifier when using auth.type ‘OAUTH2_CLIENT_CREDENTIALS’ | No | N/A | String | myId |
auth.client-secret | The client secret when using auth.type ‘OAUTH2_CLIENT_CREDENTIALS’ | No | N/A | String | mySecret |
auth.token-endpoint | The token endpoint when using auth.type ‘OAUTH2_CLIENT_CREDENTIALS’ | No | N/A | HTTP and HTTPS URLs | http://localhost:8000/token |
auth.scope | The OAuth2 scope when using auth.type ‘OAUTH2_CLIENT_CREDENTIALS’ | No | N/A | String | http://localhost:8000/token |
Retry properties
Property | Description | Required | Default | Supported values | Example |
---|---|---|---|---|---|
retries.enabled | Indicates if the http client should retry http requests when the server cannot be reached. | No | true | Boolean value | true |
retries.max | Max number of retries the http client should do when retries.enabled = true | No | 5 | Integer | 100 |
retries.statuses-to-retry | Custom comma separated list of http status codes that can trigger a retry in the http client. | No | N/A | Comma separated list of Integers | 410,451 |
When retries are enabled, the following statuses are always retried, regardless of the configured statuses-to-retry:
- 429
- 5xx (500 and above)
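The retry rule above can be expressed as a small predicate. This is a sketch under the assumption that retries are enabled; `should_retry` is a hypothetical helper, not part of the client's API.

```python
def should_retry(status, statuses_to_retry=""):
    """Decide whether a response status triggers a retry (retries enabled)."""
    # Parse the comma separated configuration value, e.g. "410,451".
    configured = {int(code) for code in statuses_to_retry.split(",") if code.strip()}
    # 429 and every status of 500 and above are always retried.
    return status == 429 or status >= 500 or status in configured


print(should_retry(503))             # always retried
print(should_retry(410))             # not retried unless configured
print(should_retry(410, "410,451"))  # retried because it is configured
```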
Rate limit properties
Property | Description | Required | Default | Supported values | Example |
---|---|---|---|---|---|
rate-limit.enabled | Indicates if the http client should limit http requests when calling the server. | No | false | true or false | false |
rate-limit.limit | Limit of requests per period, which is defined below, that the http client should do when rate-limit.enabled = true | No | 500 | Integer | 100 |
rate-limit.period | Period in which the limit of requests, which is defined above, can be reached by the http client when rate-limit.enabled = true | No | PT1M | ISO 8601 Duration | PT1H |
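To illustrate how `rate-limit.limit` and `rate-limit.period` interact, the sketch below parses a simple ISO 8601 duration and applies a fixed-window counter. Both the parser (days, hours, minutes and seconds only) and the fixed-window strategy are assumptions for this example; the algorithm the client actually uses is not described here, and all names are hypothetical.

```python
import re
import time
from datetime import timedelta

_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_period(text):
    """Parse a subset of ISO 8601 durations, e.g. PT1M, PT1H or P1D."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


class WindowRateLimiter:
    """Allows at most `limit` requests per `period` (fixed window)."""

    def __init__(self, limit, period, clock=time.monotonic):
        self._limit = limit
        self._period = period.total_seconds()
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def try_acquire(self):
        now = self._clock()
        if now - self._window_start >= self._period:
            self._window_start, self._count = now, 0  # start a new window
        if self._count < self._limit:
            self._count += 1
            return True
        return False  # limit reached: the caller should wait


limiter = WindowRateLimiter(limit=500, period=parse_period("PT1M"))
print(limiter.try_acquire())
```

The clock is injectable so the limiter can be exercised without waiting for real time to pass.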
Http headers
Property | Description | Required | Default | Supported values | Example |
---|---|---|---|---|---|
http.headers.[].key/value | A list of custom http headers can be added. A key and value have to be provided for every header. | No | N/A | String | role |
Example Http Requester config
config:
  http:
    headers:
      - key: role
        value: developer
      - key: alt-role
        value: programmer
  auth:
    type: API_KEY
    api-key: my-secret
    api-key-header: x-api-key
  retries:
    enabled: true
    max: 10
    statuses-to-retry: 410,451
  rate-limit:
    enabled: true
    period: P1D
    limit: 1000
SQLite properties
Property | Description | Required | Default | Example | Supported values |
---|---|---|---|---|---|
sqlite.directory | Directory wherein the .db file can be saved | No | N/A | /ldio/sqlite | String |
Postgres properties
Property | Description | Required | Default | Example | Supported values |
---|---|---|---|---|---|
postgres.url | JDBC URL of the Postgres database | No | N/A | jdbc:postgresql://test.postgres.database.azure.com:5432/sample | String |
postgres.username | Username used to connect to Postgres database | No | N/A | myUsername@test | String |
postgres.password | Password used to connect to Postgres database | No | N/A | myPassword | String |
Configuration Examples
input:
  name: Ldio:LdesClient
  config:
    urls:
      - http://localhost:8080/my-ldes
    source-format: text/turtle
    materialisation:
      enabled: true
    retries:
      enabled: true
    auth:
      type: OAUTH2_CLIENT_CREDENTIALS
      client-id: clientId
      client-secret: secret
      token-endpoint: http://localhost:8000/token
input:
  name: Ldio:LdesClient
  config:
    urls:
      - http://localhost:8080/my-ldes
    source-format: text/turtle
    retries:
      enabled: true
    state: postgres
    postgres:
      url: jdbc:postgresql://test.postgres.database.azure.com:5432/sample
      username: myUsername@test
      password: myPassword
Pausing the LDES Client
- When paused, the LDES Client will stop processing the current fragment and will not request new fragments from the server.
- When resumed, the LDES Client will continue processing the fragment where it stopped and will request new fragments from the server.