Metadata Search

Provides information about the metadata search API of SIDRE.

Endpoint Overview

You can query the search index on the fly and use the results directly in your application, or download the data in bulk for further use via PIT (point in time).

The metadata of the resources of any SIDRE instance is stored in an Elasticsearch index. The name of the metadata index of a specific SIDRE instance is shown in its application info. The Elasticsearch API is open and available to everyone (read-only). Search queries should always address the metadata index explicitly; otherwise conflicts with other indexes may occur. For information on using the Elasticsearch API, see the Elasticsearch documentation.

Metadata search endpoint: /api/search/

Search a fixed data set with PIT (point in time)

The most stable way to retrieve multiple result sets is Elasticsearch's PIT (point in time). See https://www.elastic.co/guide/en/elasticsearch/reference/current/point-in-time-api.html

The idea is to open a view of the data at a fixed point in time and then query this data in chunks using the same point in time. Data changes between searches have no effect on a PIT search. This is useful for bulk downloads in which a process fetches the data continuously, without a break between the requests. For other use cases, use _search directly without PIT.

API description

_search

Build your Elasticsearch query, submit it to the SIDRE metadata search endpoint, and use the results directly.

POST /api/search/{metadataIndexName}/_search

Parameters

name	type	data type	description
`metadataIndexName`	(path) required	string	Name of the metadata index - see applicationInfo

Request body

(required, application/json) Elasticsearch search request body including the query. See the Elasticsearch documentation on the search request body.

Responses

http code	content-type	response
`200`	`application/json`	metadata search result

Example cURL

curl -X 'POST' 'https://oersi.org/api/search/oer_data/_search?pretty' \
  -H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
  -d '{"size":20,"from":0,"query": { "multi_match": { "query": "Klimawandel", "fields": ["name", "description", "keywords"]}},"sort": [{"id":"asc"}]}'
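
The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper name `build_search_request` is hypothetical, and the base URL, index name, and User-Agent are copied from the cURL example above. The helper only builds the request; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Values copied from the cURL example; adjust them for your SIDRE instance.
BASE_URL = "https://oersi.org/api/search"
USER_AGENT = "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)"


def build_search_request(index_name, body):
    """Build (but do not send) a POST request for /{index_name}/_search."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "User-Agent": USER_AGENT,
        },
        method="POST",
    )


query = {
    "size": 20,
    "from": 0,
    "query": {
        "multi_match": {
            "query": "Klimawandel",
            "fields": ["name", "description", "keywords"],
        }
    },
    "sort": [{"id": "asc"}],
}
request = build_search_request("oer_data", query)

# Sending the request performs a network call:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```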

_pit (point in time)

Create and delete an Elasticsearch PIT (point in time).

POST /api/search/{metadataIndexName}/_pit

Parameters

name	type	data type	description
`metadataIndexName`	(path) required	string	Name of the metadata index - see applicationInfo
`keep_alive`	(query) required	string	Time to live of the point in time (e.g. 1m, see Time units). The value does not need to be long enough to process all data; it only needs to last until the next request.

Responses

http code	content-type	response
`200`	`application/json`	the created point in time

Example cURL

curl -X 'POST' 'https://oersi.org/api/search/oer_data/_pit?keep_alive=1m&pretty' \
  -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)'
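
A Python sketch of the same call, using only the standard library. The helper name `build_create_pit_request` is hypothetical, and the base URL and User-Agent are taken from the cURL example. It is an assumption here that the response is the unchanged Elasticsearch response, which carries the PIT id in an `id` field.

```python
import urllib.parse
import urllib.request

# Values copied from the cURL example; adjust them for your SIDRE instance.
BASE_URL = "https://oersi.org/api/search"
USER_AGENT = "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)"


def build_create_pit_request(index_name, keep_alive="1m"):
    """Build (but do not send) a POST request that opens a point in time."""
    params = urllib.parse.urlencode({"keep_alive": keep_alive})
    return urllib.request.Request(
        url=f"{BASE_URL}/{index_name}/_pit?{params}",
        headers={"Accept": "application/json", "User-Agent": USER_AGENT},
        method="POST",
    )


request = build_create_pit_request("oer_data", keep_alive="1m")

# Sending the request performs a network call. Assumption: the JSON
# response contains the PIT id under the key "id".
# import json
# with urllib.request.urlopen(request) as response:
#     pit_id = json.load(response)["id"]
```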

DELETE /api/search/_pit

Request body

(required, application/json)

name	type	data type	description
`id`	(field) required	string	the id of the pit to delete

Responses

http code	content-type	response
`200`	`application/json`	pit delete result

Example cURL

curl -X DELETE https://oersi.org/api/search/_pit \
  -H 'Content-Type: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
  -d '{"id":"<YOUR_PIT_ID>"}'
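
Deleting a PIT can be sketched in Python in the same way. The helper name `build_delete_pit_request` is hypothetical, and the base URL and User-Agent are taken from the cURL example. As in the cURL call, the PIT id travels in the JSON body and not in the URL.

```python
import json
import urllib.request

# Values copied from the cURL example; adjust them for your SIDRE instance.
BASE_URL = "https://oersi.org/api/search"
USER_AGENT = "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)"


def build_delete_pit_request(pit_id):
    """Build (but do not send) a DELETE request that closes a point in time."""
    return urllib.request.Request(
        url=f"{BASE_URL}/_pit",
        data=json.dumps({"id": pit_id}).encode("utf-8"),
        headers={"Content-Type": "application/json", "User-Agent": USER_AGENT},
        method="DELETE",
    )


request = build_delete_pit_request("<YOUR_PIT_ID>")

# Sending the request performs a network call:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```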

_search with point in time

Use a point in time to query the data in chunks.

Combine with Elasticsearch Query as described above for _search.

Notes:

Create a PIT first and use the PIT id in the search request.
You do not need to specify the metadata index name in the parameters, as the point in time is already bound to the metadata index.
Repeat the ongoing search until no more hits are found.
Delete the PIT after you are done with the search.

POST /api/search/_search

Request body

(required, application/json) Elasticsearch search request body including the query. See the Elasticsearch documentation on the search request body.

required fields:

name	type	data type	description
`pit`	(field) required	object	the information about the pit
`pit.id`	(field) required	string	the id of the pit
`pit.keep_alive`	(field) required	string	Time to live of the point in time until the next request (e.g. 1m, see Time units).
`sort`	(field) required	array	the sorting of the results (e.g. `[{"id":"asc"}]`, see Sort search results)
`search_after`	(field) not required for the initial search, but required for ongoing searches	array	the starting point for ongoing searches: the “sort” entry of the last hit in the previous result list

Responses

http code	content-type	response
`200`	`application/json`	metadata search result

The response contains the hits, and each hit carries a “sort” entry. The “sort” entry of the last hit is needed as search_after for the next search. The end of a response looks like:

        "sort" : [
          "https://resource.identifier",
          35297
        ]
      }
    ]
  }
}
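
Picking that value out of a parsed response can be sketched in Python as follows. The helper name `next_search_after` and the shortened sample response are illustrative assumptions; only the `hits.hits[].sort` structure is taken from the fragment above.

```python
import json


def next_search_after(result):
    """Return the "sort" entry of the last hit, or None if there are no hits."""
    hits = result["hits"]["hits"]
    return hits[-1]["sort"] if hits else None


# Shortened, made-up response with the same structure as the fragment above.
sample = json.loads(
    '{"hits": {"hits": ['
    '{"_id": "a", "sort": ["https://resource.first", 35296]},'
    '{"_id": "b", "sort": ["https://resource.identifier", 35297]}'
    "]}}"
)

search_after = next_search_after(sample)
empty = next_search_after({"hits": {"hits": []}})
```

A return value of `None` signals that the previous page was empty and the download is complete.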

Example cURL

curl -X 'POST' 'https://oersi.org/api/search/_search?pretty' \
-H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
-d '{"size":1000,"query": {"match": {"mainEntityOfPage.provider.name": "twillo"}},"pit": {"id": "<YOUR_PIT_ID>", "keep_alive": "1m"}, "sort": [{"id":"asc"}], "track_total_hits": true}'

curl -X 'POST' 'https://oersi.org/api/search/_search?pretty' \
-H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
-d '{"size":1000,"query": {"match": {"mainEntityOfPage.provider.name": "twillo"}},"pit": {"id": "<YOUR_PIT_ID>", "keep_alive": "1m"}, "sort": [{"id":"asc"}], "track_total_hits": true, "search_after": <YOUR_LAST_SORT_RESULT>}'
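
The loop formed by these two requests can be sketched in Python. This is a minimal sketch under stated assumptions: `iterate_hits` and `post_search` are hypothetical names, and `post_search` stands for any callable that POSTs a body to /api/search/_search and returns the parsed JSON response. Creating the PIT beforehand and deleting it afterwards is left to the caller. The sketch also assumes that the response may carry an updated PIT id in a `pit_id` field, as Elasticsearch responses do; if the field is absent, the original id is reused.

```python
def iterate_hits(post_search, pit_id, query, page_size=1000, keep_alive="1m"):
    """Yield all hits for `query`, fetching page after page via search_after."""
    search_after = None
    while True:
        body = {
            "size": page_size,
            "query": query,
            "pit": {"id": pit_id, "keep_alive": keep_alive},
            "sort": [{"id": "asc"}],
        }
        if search_after is not None:
            body["search_after"] = search_after
        result = post_search(body)
        # Assumption: an updated PIT id may be returned as "pit_id".
        pit_id = result.get("pit_id", pit_id)
        hits = result["hits"]["hits"]
        if not hits:
            return  # no more hits: the download is complete
        yield from hits
        # The "sort" entry of the last hit is the next starting point.
        search_after = hits[-1]["sort"]


# Stand-in for the real endpoint so the sketch runs without network access.
FAKE_HITS = [
    {"_id": str(i), "sort": [f"https://example.org/{i}", i]} for i in range(5)
]


def fake_post_search(body):
    start = body["search_after"][1] + 1 if "search_after" in body else 0
    return {"hits": {"hits": FAKE_HITS[start:start + body["size"]]}}


all_hits = list(
    iterate_hits(fake_post_search, "<YOUR_PIT_ID>", {"match_all": {}}, page_size=2)
)
```

Writing the loop as a generator keeps only one page in memory at a time, which suits bulk downloads.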
