Architecture and Components

Concept and architecture of the SIDRE components
graph TD
    subgraph SIDRE
        SIDRE_INDEX
    end
    SIDRE_INDEX[**Central Index**] --> repo1[**Repo 1**
    Edu-Sharing abc]
    SIDRE_INDEX --> repo2[**Repo 2**
    DSpace xyz]
    SIDRE_INDEX --> repo3[**Repo 3**
    sitemap]
    SIDRE_INDEX --> repo4[**Repo 4**
    OAI-PMH]
    SIDRE_INDEX --> repo5[**Repo 5**
    JSON-API]
    SIDRE_INDEX --> repo6[...]

Central Index

The metadata of the individual repositories is collected in a central index and made searchable. A scheduled process (e.g. daily) harvests metadata from the repositories, or more generally from any data source that offers metadata, and indexes it.
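
The harvest-and-index cycle described above can be sketched roughly as follows. This is a minimal illustration, not the SIDRE implementation: the `Record` type, the `harvest_all` function, and the in-memory dict standing in for the central index are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Record:
    """One metadata record as delivered by a data source (hypothetical shape)."""
    source: str       # name of the repository or data source
    identifier: str   # identifier that is unique within that source
    title: str


# A harvester is any callable that returns the current records of one source.
Harvester = Callable[[], Iterable[Record]]


def harvest_all(harvesters: Dict[str, Harvester],
                index: Dict[Tuple[str, str], Record]) -> int:
    """Run one harvest cycle: fetch from every source and upsert into the index.

    Returns the number of records written. Keying entries by
    (source, identifier) means a repeated run overwrites existing
    entries instead of duplicating them.
    """
    written = 0
    for harvest in harvesters.values():
        for record in harvest():
            index[(record.source, record.identifier)] = record
            written += 1
    return written


# Example run with two hypothetical sources.
index: Dict[Tuple[str, str], Record] = {}
harvesters: Dict[str, Harvester] = {
    "repo1": lambda: [Record("repo1", "a", "Intro to Algebra")],
    "repo2": lambda: [Record("repo2", "a", "Linear Maps"),
                      Record("repo2", "b", "Vector Spaces")],
}
harvest_all(harvesters, index)
harvest_all(harvesters, index)  # a second cycle does not create duplicates
print(len(index))  # prints 3
```

Keying by source and identifier is one possible way to keep the regular runs idempotent; the actual index may identify records differently.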

Components

  • ETL: The ETL module (extract, transform, load) connects to the individual repositories and fetches metadata updates on a configured schedule (e.g. daily).
  • Import Scripts: A second channel for connecting individual metadata repositories. It contains scripted imports (e.g. in Python) that import metadata on a configured schedule (e.g. daily).
  • API / Backend: Provides interfaces for retrieving data from the index (external) and for importing or updating data in the index (internal). Retrieval uses a read-only user. Created or updated data is written to an Elasticsearch index.
  • Frontend: Website that displays the data from the index and provides search.
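
To make the extract, transform, load steps concrete, here is a small sketch of what a sitemap-based import might look like. It uses only the Python standard library and the namespace defined by the sitemaps.org protocol. The function names, the record fields (`source`, `id`, `modified`), and the `load` callable standing in for the backend's internal write interface are assumptions for illustration, not the project's actual code.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> List[Dict[str, str]]:
    """Extract: read the <url> entries (loc, optional lastmod) from a sitemap."""
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if loc:  # skip entries without a location
            entries.append({"loc": loc, "lastmod": lastmod})
    return entries


def transform(entry: Dict[str, str], source: str) -> Dict[str, str]:
    """Transform: map a sitemap entry to a (hypothetical) index record shape."""
    return {"source": source, "id": entry["loc"], "modified": entry["lastmod"]}


def run_etl(sitemap_xml: str, source: str,
            load: Callable[[Dict[str, str]], None]) -> int:
    """Load: pass every transformed record to `load`; return the record count."""
    records = [transform(entry, source) for entry in extract_urls(sitemap_xml)]
    for record in records:
        load(record)
    return len(records)


# Example with an inline sitemap instead of a network request.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://repo.example.org/item/1</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://repo.example.org/item/2</loc></url>
</urlset>"""

loaded: List[Dict[str, str]] = []
count = run_etl(SAMPLE, "repo2", loaded.append)
print(count)  # prints 2
```

In a real import the sitemap would be fetched over HTTP and `load` would call the backend's write interface; both are left out here so the sketch stays self-contained.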
graph LR
    Repo1[**Repo1**
    OAI-PMH] <--> ETL_OAI-Client
    Repo2[**Repo2**
    Sitemap] <--> ETL_Sitemap-Client
    Repo2 <--> Import_Scripts_Sitemap-Client
    Repo3[**Repo3**
    Individual] <--> Import_Scripts_individual-Client
    subgraph ETL
        ETL_OAI-Client[OAI Client] <--> ETL_Scheduler[Scheduler]
        ETL_Sitemap-Client[Sitemap Client] <--> ETL_Scheduler[Scheduler]
        ETL_Source-Config[Source Config]
    end
    subgraph Import_Scripts[Import Scripts]
        Import_Scripts_individual-Client[individual Client] <--> Import_Scripts_Scheduler[Scheduler]
        Import_Scripts_Sitemap-Client[Sitemap Client] <--> Import_Scripts_Scheduler[Scheduler]
        Import_Scripts_Source-Config[Source Config]
    end
    ETL --> Backend_Write_Endpoint
    Import_Scripts --> Backend_Write_Endpoint
    subgraph Backend[Backend]
        Backend_Write_Endpoint[Create
        Update
        Delete] --> Backend_Service[Service]
        Backend_Service[Service] <--> Backend_Index[(Elasticsearch)]
        Backend_Service[Service] --> Backend_Read_Endpoint[Query Service]
    end
    subgraph Frontend
        Search
        Filter
        Results
    end
    Frontend --> Backend_Read_Endpoint
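
The split between the internal write path (create, update, delete) and the external read path (query service) shown in the diagram could be sketched like this. It is an assumption-laden illustration only: `IndexService` and `QueryService` are hypothetical names, a plain dict stands in for Elasticsearch, and the substring match is a placeholder for a real search query.

```python
from typing import Dict, List


class IndexService:
    """Service layer in front of the index. A dict stands in for Elasticsearch."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, str]] = {}

    # Write operations: intended for internal callers (ETL, import scripts).
    def create(self, doc_id: str, doc: Dict[str, str]) -> None:
        if doc_id in self._docs:
            raise KeyError(f"document {doc_id!r} already exists")
        self._docs[doc_id] = dict(doc)

    def update(self, doc_id: str, doc: Dict[str, str]) -> None:
        if doc_id not in self._docs:
            raise KeyError(f"document {doc_id!r} does not exist")
        self._docs[doc_id].update(doc)

    def delete(self, doc_id: str) -> None:
        self._docs.pop(doc_id, None)

    # Read operation: a case-insensitive substring match over all field values.
    def search(self, term: str) -> List[str]:
        needle = term.lower()
        return sorted(
            doc_id for doc_id, doc in self._docs.items()
            if any(needle in value.lower() for value in doc.values())
        )


class QueryService:
    """Read-only facade for external callers: exposes search, no write methods."""

    def __init__(self, service: IndexService) -> None:
        self._service = service

    def search(self, term: str) -> List[str]:
        return self._service.search(term)


# Internal side: imports write through the service.
service = IndexService()
service.create("1", {"title": "Open Textbook on Statistics"})
service.create("2", {"title": "Lecture Notes"})
service.update("2", {"title": "Lecture Notes on Statistics"})
service.delete("1")

# External side: the frontend only sees the read-only facade.
query = QueryService(service)
print(query.search("statistics"))  # prints ['2']
```

Handing the frontend a facade without write methods mirrors the idea of the read-only user: external callers cannot modify the index even by accident.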