Metadata Search
Endpoint Overview
You can query the data from the search index on-the-fly and use it directly in your application, as well as download the data in bulk for further use via PIT.
The metadata of the resources of any SIDRE instance is stored in an Elasticsearch index. The name of the metadata index of a specific SIDRE instance is shown in the application info. The Elasticsearch API is open and available to everyone (read-only). Search queries should always address the metadata index; otherwise, conflicts with other indexes may occur. For information on using the Elasticsearch API, see:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
Metadata search endpoint: /resources/api/search/
Search a fixed data set with PIT (point in time)
The most stable way to retrieve a result set across multiple requests is Elasticsearch's PIT (point in time). See https://www.elastic.co/guide/en/elasticsearch/reference/current/point-in-time-api.html
The idea is to open a view of the data at a fixed point in time and then query that view in chunks, always using the same point in time. Data changes between searches have no effect on a PIT search. This is useful for bulk downloads in which a process fetches the data continuously, without a break between requests. For other use cases, use _search directly without PIT.
API description
_search
Build your Elasticsearch Query, submit it to the SIDRE metadata search endpoint and use the results directly.
POST
/resources/api/search/{metadataIndexName}/_search
Parameters
name | type | data type | description |
---|---|---|---|
metadataIndexName | (path) required | string | Name of the metadata index - see applicationInfo |
Request body
(required, application/json) Elasticsearch search request body including the query. See the Elasticsearch search API documentation linked in the endpoint overview.
Responses
http code | content-type | response |
---|---|---|
200 | application/json | metadata search result |
Example cURL
curl -X 'POST' 'https://oersi.org/resources/api/search/oer_data/_search?pretty' \
-H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
-d '{"size":20,"from":0,"query": { "multi_match": { "query": "Klimawandel", "fields": ["name", "description", "keywords"]}},"sort": [{"id":"asc"}]}'
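The same request can be issued from a script. The following is a minimal Python sketch using only the standard library. The helper names `build_search_request` and `run_search` are hypothetical and not part of SIDRE. The base URL, index name, query fields and User-Agent are copied from the cURL example above and need to be adjusted for your instance.

```python
import json
import urllib.request

# Assumptions: values copied from the cURL example; adjust for your instance.
BASE_URL = "https://oersi.org/resources/api/search"
USER_AGENT = "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)"


def build_search_request(index_name, query_text, size=20, offset=0):
    """Build (but do not send) a POST request for the _search endpoint."""
    body = {
        "size": size,
        "from": offset,
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": ["name", "description", "keywords"],
            }
        },
        "sort": [{"id": "asc"}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "User-Agent": USER_AGENT,
        },
        method="POST",
    )


def run_search(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `run_search(build_search_request("oer_data", "Klimawandel"))` should return the decoded search result. In a standard Elasticsearch response, the matching documents are found under `["hits"]["hits"]`.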
_pit (point in time)
Create and delete an Elasticsearch PIT (point in time).
POST
/resources/api/search/{metadataIndexName}/_pit
Parameters
name | type | data type | description |
---|---|---|---|
metadataIndexName | (path) required | string | Name of the metadata index - see applicationInfo |
keep_alive | (query) required | string | How long the point in time should be kept alive. The value (e.g. 1m, see Time units) does not need to cover the processing of all data; it only needs to last until the next request. |
Responses
http code | content-type | response |
---|---|---|
200 | application/json | the created point in time |
Example cURL
curl -X 'POST' 'https://oersi.org/resources/api/search/oer_data/_pit?keep_alive=1m&pretty' \
-H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)'
DELETE
/resources/api/search/_pit
Request body
(required, application/json)
name | type | data type | description |
---|---|---|---|
id | (field) required | string | The id of the PIT to delete |
Responses
http code | content-type | response |
---|---|---|
200 | application/json | PIT delete result |
Example cURL
curl -X DELETE https://oersi.org/resources/api/search/_pit \
-H 'Content-Type: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
-d '{"id":"<YOUR_PIT_ID>"}'
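The create and delete calls can also be prepared from a script. The following is a minimal Python sketch using only the standard library. The helper names (`build_create_pit_request`, `build_delete_pit_request`, `send_json`) are hypothetical, and the base URL, index name and User-Agent are copied from the cURL examples. It assumes the endpoint passes through the usual Elasticsearch PIT response, a JSON object with an `id` field.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: values copied from the cURL examples; adjust for your instance.
BASE_URL = "https://oersi.org/resources/api/search"
HEADERS = {
    "Accept": "application/json",
    "User-Agent": "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)",
}


def build_create_pit_request(index_name, keep_alive="1m"):
    """Build the POST request that opens a point in time on the given index."""
    query = urllib.parse.urlencode({"keep_alive": keep_alive})
    return urllib.request.Request(
        url=f"{BASE_URL}/{index_name}/_pit?{query}",
        headers=HEADERS,
        method="POST",
    )


def build_delete_pit_request(pit_id):
    """Build the DELETE request that closes a point in time."""
    return urllib.request.Request(
        url=f"{BASE_URL}/_pit",
        data=json.dumps({"id": pit_id}).encode("utf-8"),
        headers={**HEADERS, "Content-Type": "application/json"},
        method="DELETE",
    )


def send_json(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Under the assumption above, `send_json(build_create_pit_request("oer_data"))["id"]` yields the PIT id, and `send_json(build_delete_pit_request(pit_id))` closes it again.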
_search with point in time
Use a point in time to query the data in chunks. Combine it with an Elasticsearch query as described above for _search.
Notes:
- You do not need to specify the metadata index name in the parameters, as the point in time is already bound to the metadata index.
- Create a PIT first and use the PIT id in the search request.
- Repeat the ongoing search until no more hits are found.
- Delete the PIT once you are done with the search.
POST
/resources/api/search/_search
Request body
(required, application/json) Elasticsearch search request body including the query. See the Elasticsearch search API documentation linked in the endpoint overview.
required fields:
name | type | data type | description |
---|---|---|---|
pit | (field) required | object | The information about the PIT |
pit.id | (field) required | string | The id of the PIT |
pit.keep_alive | (field) required | string | How long the point in time should be kept alive after this request (e.g. 1m, see Time units). |
sort | (field) required | array | The sorting of the results (e.g. [{"id":"asc"}], see Sort search results) |
search_after | (field) not required for the initial search, but required for ongoing searches | array | The starting point for ongoing searches. This is the “sort” entry of the last hit in the previous result list. |
Responses
http code | content-type | response |
---|---|---|
200 | application/json | metadata search result |
The response contains the hits, and each hit carries a “sort” entry. The “sort” entry of the last hit is needed for the next search. The end of the response looks like this:
        "sort" : [
          "https://resource.identifier",
          35297
        ]
      }
    ]
  }
}
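Picking up the “sort” entry of the last hit can be sketched as a small helper. This is only an illustration: `next_search_body` is a hypothetical name, and the code assumes the response has the standard Elasticsearch shape with the hits under `hits.hits`.

```python
import copy


def next_search_body(previous_body, response):
    """Return the request body for the next page, or None if there are no more hits."""
    hits = response["hits"]["hits"]
    if not hits:
        return None  # empty page: the ongoing search is finished
    # Copy the previous body so the caller's dictionary is left untouched.
    body = copy.deepcopy(previous_body)
    # Continue directly after the "sort" entry of the last hit.
    body["search_after"] = hits[-1]["sort"]
    return body
```

The helper leaves query, `pit` and `sort` as they were and only sets `search_after`, which matches the difference between the initial search and an ongoing search.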
Example cURL
curl -X 'POST' 'https://oersi.org/resources/api/search/_search?pretty' \
-H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
-d '{"size":1000,"query": {"match": {"mainEntityOfPage.provider.name": "twillo"}},"pit": {"id": "<YOUR_PIT_ID>", "keep_alive": "1m"}, "sort": [{"id":"asc"}], "track_total_hits": true}'
curl -X 'POST' 'https://oersi.org/resources/api/search/_search?pretty' \
-H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
-d '{"size":1000,"query": {"match": {"mainEntityOfPage.provider.name": "twillo"}},"pit": {"id": "<YOUR_PIT_ID>", "keep_alive": "1m"}, "sort": [{"id":"asc"}], "track_total_hits": true, "search_after": <YOUR_LAST_SORT_RESULT>}'
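Put together, a bulk download creates a PIT, repeats the search until a page comes back empty, and then deletes the PIT. The following Python sketch shows one possible way to do this with the standard library. `http_send` and `download_all` are hypothetical names, and the base URL and User-Agent are copied from the cURL examples. The sketch assumes that the responses have the standard Elasticsearch shape: an `id` field for the created PIT, hits under `hits.hits`, and optionally a refreshed `pit_id` in each search response.

```python
import json
import urllib.request

# Assumptions: values copied from the cURL examples; adjust for your instance.
BASE_URL = "https://oersi.org/resources/api/search"
USER_AGENT = "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)"


def http_send(method, path, body=None):
    """Hypothetical transport: send one JSON request and decode the JSON response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url=BASE_URL + path,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "User-Agent": USER_AGENT,
        },
        method=method,
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def download_all(index_name, query, send=http_send, page_size=1000, keep_alive="1m"):
    """Yield every hit matching `query`, paging through a point in time."""
    pit_id = send("POST", f"/{index_name}/_pit?keep_alive={keep_alive}")["id"]
    try:
        body = {
            "size": page_size,
            "query": query,
            "pit": {"id": pit_id, "keep_alive": keep_alive},
            "sort": [{"id": "asc"}],
        }
        while True:
            response = send("POST", "/_search", body)
            hits = response["hits"]["hits"]
            if not hits:
                break  # no more hits: the download is complete
            yield from hits
            # Continue after the "sort" entry of the last hit.
            body["search_after"] = hits[-1]["sort"]
            # Elasticsearch may return a refreshed PIT id; keep using the latest.
            pit_id = response.get("pit_id", pit_id)
            body["pit"]["id"] = pit_id
    finally:
        # Runs on completion, on error, and when the generator is closed early.
        send("DELETE", "/_pit", {"id": pit_id})
```

For example, `for hit in download_all("oer_data", {"match": {"mainEntityOfPage.provider.name": "twillo"}}): ...` would iterate over the same result set as the two cURL calls above. Passing the transport as the `send` parameter keeps the paging logic testable without network access.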