Metadata Search

This page describes the metadata search API of SIDRE.

Endpoint Overview

You can query the search index on the fly and use the results directly in your application, or download the data in bulk via a PIT (point in time) for further use.

The metadata of all resources in a SIDRE instance is stored in an Elasticsearch index. The name of the metadata index of a specific SIDRE instance is shown in the application info. The Elasticsearch API is open and available to everyone (read-only). Search queries should always address the metadata index; otherwise conflicts with other indexes may occur. For information on using the Elasticsearch API, see the Elasticsearch documentation.

Metadata search endpoint: /resources/api/search/

Search a fixed data set with PIT (point in time)

The most stable way to query multiple result sets is Elasticsearch's PIT (point in time). See https://www.elastic.co/guide/en/elasticsearch/reference/current/point-in-time-api.html

The idea is to open a view of the data at a fixed point in time and then query this data in chunks using the same point in time. Data changes between searches have no effect on the PIT search. This is useful for bulk downloads in which a process fetches the data continuously, without a break between the requests. For other use cases, use _search directly without PIT.

API description

Build your Elasticsearch query, submit it to the SIDRE metadata search endpoint, and use the results directly.

POST /resources/api/search/{metadataIndexName}/_search
Parameters
name | type | data type | description
metadataIndexName (path) | required | string | Name of the metadata index - see applicationInfo
Request body

(required, application/json) Elasticsearch search request body including the query. See the Elasticsearch search API documentation.

Responses
http code | content-type | response
200 | application/json | metadata search result
Example cURL
curl -X 'POST' 'https://oersi.org/resources/api/search/oer_data/_search?pretty' \
  -H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
  -d '{"size":20,"from":0,"query": { "multi_match": { "query": "Klimawandel", "fields": ["name", "description", "keywords"]}},"sort": [{"id":"asc"}]}'
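
The same request can be prepared from Python. The following is a minimal sketch using only the standard library; the base URL, index name and User-Agent are taken from the cURL example above, and the helper name `build_search_request` is hypothetical. It only builds the request object; sending it is left as a commented step and assumes that the response follows the usual Elasticsearch layout (`hits.hits`).

```python
import json
import urllib.request

# Values copied from the cURL example; replace them with those of your instance.
BASE_URL = "https://oersi.org/resources/api/search"
USER_AGENT = "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)"


def build_search_request(index_name, term, size=20, offset=0):
    """Build (but do not send) a POST request for the _search endpoint."""
    body = {
        "size": size,
        "from": offset,
        "query": {
            "multi_match": {
                "query": term,
                "fields": ["name", "description", "keywords"],
            }
        },
        "sort": [{"id": "asc"}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "User-Agent": USER_AGENT,
        },
        method="POST",
    )


request = build_search_request("oer_data", "Klimawandel")

# To actually run the query (needs network access):
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]  # assumed Elasticsearch layout
```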

_pit (point in time)

Create and delete an Elasticsearch PIT (point in time).

POST /resources/api/search/{metadataIndexName}/_pit
Parameters
name | type | data type | description
metadataIndexName (path) | required | string | Name of the metadata index - see applicationInfo
keep_alive (query) | required | string | How long the time to live of the point in time should be. The value (e.g. 1m, see Time units) does not need to be long enough to process all data - it just needs to be long enough for the next request.
Responses
http code | content-type | response
200 | application/json | the created point in time
Example cURL
curl -X 'POST' 'https://oersi.org/resources/api/search/oer_data/_pit?keep_alive=1m&pretty' \
  -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)'

DELETE /resources/api/search/_pit
Request body

(required, application/json)

name | type | data type | description
id (field) | required | string | the id of the pit to delete
Responses
http code | content-type | response
200 | application/json | pit delete result
Example cURL
curl -X DELETE https://oersi.org/resources/api/search/_pit \
  -H 'Content-Type: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
  -d '{"id":"<YOUR_PIT_ID>"}'
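
Both PIT calls can be sketched in Python as well. The helper names `build_open_pit_request` and `build_delete_pit_request` are hypothetical, and the base URL and User-Agent are copied from the cURL examples. The sketch only constructs the requests, so that the URL, method and body of each call are easy to inspect.

```python
import json
import urllib.parse
import urllib.request

# Values copied from the cURL examples; replace them with those of your instance.
BASE_URL = "https://oersi.org/resources/api/search"
USER_AGENT = "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)"


def build_open_pit_request(index_name, keep_alive="1m"):
    """Build the POST request that creates a PIT on the metadata index."""
    query = urllib.parse.urlencode({"keep_alive": keep_alive})
    return urllib.request.Request(
        url=f"{BASE_URL}/{index_name}/_pit?{query}",
        headers={"Accept": "application/json", "User-Agent": USER_AGENT},
        method="POST",
    )


def build_delete_pit_request(pit_id):
    """Build the DELETE request that releases a PIT; the id goes in the body."""
    return urllib.request.Request(
        url=f"{BASE_URL}/_pit",
        data=json.dumps({"id": pit_id}).encode("utf-8"),
        headers={"Content-Type": "application/json", "User-Agent": USER_AGENT},
        method="DELETE",
    )


open_request = build_open_pit_request("oer_data")
delete_request = build_delete_pit_request("<YOUR_PIT_ID>")
```

Note that the index name appears only in the create call; the delete call identifies the PIT by its id alone.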

_search with point in time

Use a point in time to query the data in chunks.

Combine with Elasticsearch Query as described above for _search.

Notes:

  • You do not need to specify the metadata index name in the parameters, as the point in time is already bound to the metadata index.
  • Create a pit first and use the pit id in the search request.
  • Delete the pit after you are done with the search.
  • Repeat the ongoing search until no more hits are found.

POST /resources/api/search/_search
Request body

(required, application/json) Elasticsearch search request body including the query. See the Elasticsearch search API documentation.

required fields:

name | type | data type | description
pit (field) | required | object | the information about the pit
pit.id (field) | required | string | the id of the pit
pit.keep_alive (field) | required | string | How long the next time to live of the point in time should be (e.g. 1m, see Time units).
sort (field) | required | array | the sorting of the results (e.g. [{"id":"asc"}], see Sort search results)
search_after (field) | not required for the initial search, but required for ongoing searches | array | the starting point for ongoing searches. This is the "sort" entry of the last hit in the previous result list
Responses
http code | content-type | response
200 | application/json | metadata search result

The response contains the hits, and each hit carries a "sort" entry. The "sort" entry of the last hit is needed as search_after for the next search. The end of a response looks like this:

        "sort" : [
          "https://resource.identifier",
          35297
        ]
      }
    ]
  }
}
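
Picking the value for the next search_after out of a parsed response could look like the sketch below. The function name `next_search_after` is hypothetical, the sample response is made up for illustration, and the code assumes the standard Elasticsearch layout in which the hits sit under `hits.hits`.

```python
def next_search_after(response):
    """Return the "sort" entry of the last hit, or None when there are no hits."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None  # nothing left to page through
    return hits[-1]["sort"]


# Made-up, heavily shortened response used only to illustrate the lookup.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "a", "sort": ["https://resource.identifier/previous", 35296]},
            {"_id": "b", "sort": ["https://resource.identifier", 35297]},
        ]
    }
}

cursor = next_search_after(sample_response)
```

A return value of None signals that the previous page was empty and the loop can stop.
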
Example cURL
# Initial search: no search_after yet
curl -X 'POST' 'https://oersi.org/resources/api/search/_search?pretty' \
  -H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
  -d '{"size":1000,"query": {"match": {"mainEntityOfPage.provider.name": "twillo"}},"pit": {"id": "<YOUR_PIT_ID>", "keep_alive": "1m"}, "sort": [{"id":"asc"}], "track_total_hits": true}'

# Ongoing search: pass the "sort" entry of the last hit as search_after
curl -X 'POST' 'https://oersi.org/resources/api/search/_search?pretty' \
  -H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
  -d '{"size":1000,"query": {"match": {"mainEntityOfPage.provider.name": "twillo"}},"pit": {"id": "<YOUR_PIT_ID>", "keep_alive": "1m"}, "sort": [{"id":"asc"}], "track_total_hits": true, "search_after": <YOUR_LAST_SORT_RESULT>}'
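
Putting the steps together (create a PIT, search in chunks with search_after, stop when no hits remain, delete the PIT), a bulk download might be sketched as follows. The names `http_json` and `iter_all_hits` are hypothetical, and the base URL and User-Agent are copied from the cURL examples. The code assumes that responses follow the standard Elasticsearch layout: `id` for a newly created PIT, `hits.hits[].sort` in search results, and an optional `pit_id` in search results that should be used for the next request.

```python
import json
import urllib.parse
import urllib.request

# Values copied from the cURL examples; replace them with those of your instance.
BASE_URL = "https://oersi.org/resources/api/search"
USER_AGENT = "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)"


def http_json(method, url, body=None):
    """Send one JSON request and return the decoded JSON response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "User-Agent": USER_AGENT,
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_all_hits(index_name, query, page_size=1000, keep_alive="1m", send=http_json):
    """Yield every hit for `query`, paging through a PIT with search_after."""
    params = urllib.parse.urlencode({"keep_alive": keep_alive})
    pit_id = send("POST", f"{BASE_URL}/{index_name}/_pit?{params}")["id"]
    try:
        search_after = None
        while True:
            body = {
                "size": page_size,
                "query": query,
                "pit": {"id": pit_id, "keep_alive": keep_alive},
                "sort": [{"id": "asc"}],
            }
            if search_after is not None:
                body["search_after"] = search_after
            result = send("POST", f"{BASE_URL}/_search", body)
            hits = result["hits"]["hits"]
            if not hits:
                break  # no more hits: the download is complete
            yield from hits
            search_after = hits[-1]["sort"]
            # Assumption: use the PIT id returned with the result if there is one.
            pit_id = result.get("pit_id", pit_id)
    finally:
        # Always release the PIT, even if the caller stops early or a request fails.
        send("DELETE", f"{BASE_URL}/_pit", {"id": pit_id})


# Usage (needs network access):
# query = {"match": {"mainEntityOfPage.provider.name": "twillo"}}
# for hit in iter_all_hits("oer_data", query):
#     print(hit["_id"])
```

The `send` parameter exists so that the transport can be swapped out, for example for a stub in tests or for a client with retries; the default performs real HTTP requests.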