SDC Newsletter (01/2021)

SDC got funded

@Svetlana Berdyugina

The establishment of the Science Data Center (SDC) as the third strategic pillar of KIS is an important development for the institute because the SDC connects the two other pillars, fundamental research and observatory operation. Establishing the SDC was identified as one of the strategic goals of KIS as early as 2013, and it was one of the recommendations of the Leibniz Evaluation in 2015. Our funding application to the national Joint Science Conference (GWK) for the strategic expansion of KIS dedicated to the SDC was finally approved on November 13, 2020. From 2021, 1.4 M€ per year will be allocated to the SDC. This will lead to a noticeable increase in staff, an innovative computing infrastructure, and broader scientific collaborations in the areas of scientific computing, big data, and artificial intelligence. In addition to enhancing scientific activities at KIS, the SDC will become a valuable big-data infrastructure in Germany and beyond.

 

Introduction to the SDC Team

The tasks at the start of the SDC are manifold. They range from establishing standards and policies, through curating and injecting data from the OT and other sources and setting up the necessary IT infrastructure, to creating a new project plan in record time.

As of January 2021, this challenge is being met by twelve members with overlapping responsibilities and expertise (at least two more engineers are expected to join in the course of 2021).

 

Name (% FTE in SDC) | Responsibility
Svetlana Berdyugina (10%) [Board, Science] | Project PI, internal & external collaborations, high-level data
Janek Beck (100%) [devops] | IT specialist trainee, application and pipeline development
Nazaret Bello González (50%) [Board, Science, Data] | Project Lead Scientist, data calibration, curation & dissemination, high-level data, internal and external scientific collaborations, workshops and training events
Peter Caligari (80%) [Board, IT, Technology] | Project head, technology development and realization, compute & data infrastructure, network, budget & personnel responsibility, head of IT
Vigeesh Gangadharan (60%) [Science, Data, devops] | High-level data tools development
Andriy Gorobets (50%) [Science, Data, devops] | Data curation & analysis, high-level data development
Marco Günter (20%) [IT, Technology, devops] | System IT, compute, infrastructure and data servers, network
Petri Kehusmaa (100%) [Board, Technology] | Project manager & senior systems architect, project plan development, governance, risk assessment
Markus Knobloch (40%) [IT, Data, devops] | OT interfaces and data injection (hardware side), meta-data standards, non-scientific IT, instructor for trainees
Sophie Müller (100%) [devops] | IT specialist trainee, application and pipeline development
Carl Schaffer (100%) [Data, Technology, devops] | System architect, lead software developer, OT data injection (software side), ML, AI, Web-applications
Taras Yakobchuk (50%) [Science, Data, Technology, devops] | System architect, high-level data & data management

[devops] We plan to outsource the development of APIs (Application Programming Interfaces) for access to science-ready data (L1 onwards) to Drew Leonard from Aperio, the software development firm that designed and implemented the prototype of the VTF pipeline in Python. API development will begin with Python, to be followed later by IDL and possibly other languages.

We are currently drawing up a contract and expect Aperio to begin as soon as possible. The development will be split into a design phase and an implementation phase. The design phase will require input from scientists who intend to use the SDC, so that the product benefits future users as much as possible. We expect the first version of these APIs to be ready for the initial launch of the SDC at the end of 2021.

@Peter Caligari

 


SDC Project Status 02-2021


@Petri Kehusmaa

SDC Project kick-off was held in October 2020. Since then, we have come a long way. We have had a workshop with scientists to gather requirements for this new service platform. We have also tested many different technology options to be selected as building blocks for the future infrastructure.

We have now entered the Solution Analysis and Design phase. In this phase, we will work with different conceptual design components, which will be selected for our final blueprint and implemented later this year.

The project team has also deployed Atlassian Jira for Service Management and Documentation. In the future, Jira Service Management will serve the SDC as a “Help Desk” ticketing system and knowledge base for platform users. It will provide automated workflows, a support portal, a knowledge base, a feedback system, and many more new services for the community.

Solution Analysis and Design Phase steps.

Summary Feb 4, 2021

Current project health: Green

Current project status: Solution Analysis and Design in progress. The project is slightly behind schedule.

Project constraints: Resources. Technology POCs taking more time than predicted.

Project status

Accomplishments

  • Workshop

  • Technology POCs

  • Requirements validation

  • Atlassian Jira deployment for the project team

Next steps

  • Solution Analysis and Design

  • Define data policies

  • Define embargo rules

 

Risks & project issues

  • Lack of resources


Definition & Status of the Project Requirements

In addition to political, technical and security requirements, the workshops in late 2020 helped us clarify the expectations that scientists from different disciplines have of the SDC. We continuously update a list of all identified demands. The list includes each requirement's current priority, the person responsible (if one has been assigned), and its current status.

https://leibniz-kis.atlassian.net/wiki/spaces/SP/pages/124813373

 


SDC Products & Tools


@Nazaret Bello Gonzalez

SDC data archive

http://sdc.leibniz-kis.de:8080/

Get access to data from GRIS/GREGOR and LARS/VTT instruments and the ChroTel full-disc telescope at OT.

Speckle reconstruction 

https://gitlab.leibniz-kis.de/sdc/speckle-cookbook

This tutorial helps the user run KISIP (Wöger & von der Lühe, 2008) on her favourite BBI and/or HiFI imaging data.
Contact: @Vigeesh Gangadharan (vigeesh@leibniz-kis.de)

Coming soon: 

A Jupyter Notebook by @Vigeesh Gangadharan to assist users with VFISV inversions of GRIS data, including features such as wavelength calibration. Stay tuned!

 


Conferences & Workshops


@Nazaret Bello Gonzalez

Forthcoming Conferences/Workshops of Interest 2021

Feb 11

PUNCH4NFDI Open Data Workshop (Registration required!)

Feb 8-12

ESCAPE School:  First Science with interoperable data

The Virtual Observatory (VO) is opening new ways of exploiting the huge amount of data provided by the ever-growing number of ground-based and space facilities, as well as by computer simulations. The goal of the school is twofold:

  • Expose participants to the variety of  VO tools and services available today so that they can use them efficiently for their own research.

  • Gather requirements and feedback from participants

Every second Thursday, 12:30-13:30 CET

PUNCH Lunch Seminar (see SDC calendar invitation for zoom links)

  • 11 Feb 2021: PUNCH4NFDI and ESCAPE - towards data lakes

  • 25 Feb 2021: PUNCH Curriculum Workshop

Week of April 12-16 (3 days, TBD)

ESCAPE WP4 Technology Forum 

June 10-11

13th International Workshop on Science Gateways | IWSG 2021

Topics:

  • Architectures, frameworks and technologies for science gateways

  • Science gateways sustaining productive collaborative communities

  • Support for scalability and data-driven methods in science gateways

  • Improving the reproducibility of science in science gateways

  • Science gateway usability, portals, workflows and tools

  • Software engineering approaches for scientific work

  • Aspects of science gateways, such as security and stability

June 28, 2021:

Data-intensive radio astronomy: bringing astrophysics to the exabyte era

Topics: 

  • Data-intensive radio astronomy, current facilities and challenges

  • Data science and the exascale era: technical solutions within astronomy

  • Data science and the exascale era: applications and challenges outside astronomy

 

SDC participation in Conferences & Workshops

Nov. 26, 2020:

2nd SOLARNET Forum Meeting for Telescopes and Databases

Talk:  Big Data Storage -- The KIS SDC case, NBG, PC & PK, 2nd SOLARNET Forum (Nov 26)
@Nazaret Bello Gonzalez @Petri Kehusmaa @Peter Caligari

 


SDC Collaborations


 @Nazaret Bello Gonzalez

SOLARNET https://solarnet-project.eu

KIS coordinates the SOLARNET H2020 Project, which brings together European solar research institutions and companies to provide access to the large European solar observatories, supercomputing power and data. KIS SDC actively participates in WP5 and WP2, coordinating and developing data curation and archiving tools in collaboration with European colleagues.
Contact on KIS SDC activities in SOLARNET: @Nazaret Bello Gonzalez nbello@leibniz-kis.de

 

 ESCAPE https://projectescape.eu/

KIS is a member of the European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures (ESCAPE H2020, 2019 - 2022) Project, which aims to bring together people and services to build the European Open Science Cloud. KIS SDC participates in WP4 and WP5 to bring ground-based solar data into the broader Astronomical VO and to develop tools for handling large solar data sets.

Contact on KIS SDC activities in ESCAPE: @Nazaret Bello Gonzalez nbello@leibniz-kis.de

 

EST https://www.est-east.eu/

KIS is one of the European institutes strongly supporting the European Solar Telescope project. KIS SDC represents the EST data centre development activities in a number of international projects like ESCAPE and the Group of European Data Experts (GEDE-RDA).

Contact on KIS SDC as EST data centre representative: @Nazaret Bello Gonzalez nbello@leibniz-kis.de

 

PUNCH4NFDI https://www.punch4nfdi.de

KIS is a participant (not a member) of the PUNCH4NFDI Consortium. PUNCH4NFDI is the NFDI (National Research Data Infrastructure) consortium of particle, astro-, astroparticle, hadron and nuclear physics, representing about 9,000 scientists with a Ph.D. in Germany, from universities, the Max Planck Society, the Leibniz Association, and the Helmholtz Association. Its goal is to set up a federated and "FAIR" science data platform that offers the infrastructures and interfaces necessary for access to and use of the data and computing resources of the involved communities and beyond. PUNCH4NFDI is currently competing with other consortia for DFG funding (final response expected in spring 2021). KIS SDC aims to become a full member of PUNCH and to federate our efforts on ground-based solar data dissemination with the broad particle and astroparticle communities.

Contact on KIS SDC as PUNCH4NFDI participant: @Nazaret Bello Gonzalez nbello@leibniz-kis.de & @Peter Caligari cale@leibniz-kis.de

 


IT news


@Peter Caligari

Ongoing & Future developments

Webpage

[KIS] Our website is based on a 5-year-old version of Typo3, which poses an increased security risk. Furthermore, the current design meets only the most basic requirements for accessibility for people with disabilities and for presentation on mobile devices with small displays, both of which the Ministry expected us to consider and implement.

Therefore, at the beginning of 2020, KIS-IT intended a relaunch based on a commercially available template for Typo3. However, this had to be stopped due to an increasing workload on IT from SDC and the fact that we were only 2 people in IT last year.

KIS therefore decided to outsource the relaunch. The aforementioned template will still be the basis, but an external web agency will do the design and initial implementation. We have the first design drafts and are currently streamlining our site tree. Feel free to drop by anytime if you are interested in the process.
Costs for the relaunch are of the order of 13 k€.

Network

Dedicated 10 Gbit line between KIS & OT

[KIS, OT] We are currently setting up a dedicated 10 Gbit line between OT and KIS (technically, we get a wavelength in a multiplexed dark fibre, a so-called lambda).

On Spanish territory, the line is completely free of charge. Nevertheless, the (throughput independent) annual costs for the remaining distance amount to approx. 45 k€ (incl. VAT). Additionally, there are one-time set-up costs of about 30 k€. The costs are

We expect a significantly lower latency on the new line than on the existing internet connection (which will not be affected in any way).
Therefore, we will mainly use it for remote observations and data transport within the framework of the SDC.

The SDC is striving for cooperation with the SCC/KIT in Karlsruhe. By coincidence, the 10 Gbit line enters German territory at the DFN node in Campus North of the KIT. We hope to connect KIT and KIS using the same line without significantly higher costs (despite the high expected load).

New (application) firewalls at KIS & OT

[KIS, OT] In parallel, we will replace the existing firewalls at both locations (KIS and OT) with modern application firewalls. The current ones are simple packet filters that classify traffic by port only (all traffic on ports 80 and 443 is treated as web traffic, and any web traffic that is not on port 80 or 443 is not recognized as such). Application firewalls do not rely on ports but examine all traffic for specific patterns to assign it to a particular application or usage.
These machines need to be kept under maintenance to cope with new threats efficiently. Per site, we expect initial costs of the order of 20-30 k€ for a redundant cluster of two machines, plus around 2-3 k€ in maintenance costs.
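
The difference between the two firewall types described above can be illustrated with a toy sketch in Python. It is not how any particular firewall product works: the function names and the two byte-pattern checks are hypothetical and only serve to contrast deciding by port with deciding by what the traffic looks like.

```python
# Illustrative sketch only: real application firewalls use far richer
# protocol decoders than the two hypothetical byte-pattern checks below.
WEB_PORTS = {80, 443}

# Request-line prefixes of common plain-text HTTP methods.
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ")


def classify_by_port(dst_port: int) -> str:
    """Packet-filter style: the destination port alone decides the label."""
    return "web" if dst_port in WEB_PORTS else "unknown"


def classify_by_payload(payload: bytes) -> str:
    """Application-firewall style: inspect the first bytes of the payload."""
    if payload.startswith(HTTP_METHODS):
        return "web"  # looks like a plain-text HTTP request line
    # A TLS handshake record starts with content type 0x16, major version 0x03.
    if len(payload) >= 2 and payload[0] == 0x16 and payload[1] == 0x03:
        return "tls"
    return "unknown"


# An HTTP request sent to port 8080 is missed by the port rule
# but recognized by the payload rule.
request = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"
print(classify_by_port(8080), classify_by_payload(request))
```

In this sketch the port-based rule labels the request on port 8080 as unknown, while the payload-based rule labels it as web traffic regardless of the port.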

Storage

[KIS] Storage at KIS is becoming increasingly scarce. Unlike the storage nodes used at OT, nodes for the KIS system can no longer be purchased.

We are currently scanning all files at KIS to determine how much data has not been accessed for a considerable time. We plan to invest in a slower tier to which such data will be moved automatically. The data will remain visible, but access will take longer than for frequently accessed data on the primary tier (accessing data on the slower tier will, however, automatically and transparently move it back to the near-line tier). Even this new slower tier will not primarily serve as an archive. Expected costs for such a system are 50-70 k€ (for approx. 0.5 PB, incl. VAT).
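
A scan of this kind could look roughly like the following Python sketch. It is a minimal illustration, not the tool actually used at KIS: the function name `cold_data_bytes` and the one-year threshold in the usage lines are assumptions, and access times are only meaningful if the file system records them (mount options such as `noatime` or `relatime` limit their accuracy).

```python
import os
import stat
import time
from typing import Optional, Tuple


def cold_data_bytes(root: str, max_idle_days: float,
                    now: Optional[float] = None) -> Tuple[int, int]:
    """Return (cold_bytes, total_bytes) for regular files below `root`.

    A file counts as cold if its last access time is older than
    `max_idle_days`. Hypothetical helper for illustration only.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds per day
    cold = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, devices, ...
            total += st.st_size
            if st.st_atime < cutoff:
                cold += st.st_size
    return cold, total


if __name__ == "__main__":
    # Example: share of data under the current directory idle for over a year.
    cold, total = cold_data_bytes(".", max_idle_days=365)
    share = 100.0 * cold / total if total else 0.0
    print(f"{cold} of {total} bytes ({share:.1f}%) not accessed for a year")
```

The resulting cold fraction gives a rough estimate of how much capacity a slower tier would need to absorb.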

[SDC] Mainly static files of common interest that are too large to store on offline media like external disks or tape will go to the SDC once it is operational.

[OT] We will upgrade the central storage at OT (jane) with two additional nodes of 32 TB each to cope with the tight storage situation experienced during observations in 2020 (partly because most observations were run remotely due to Covid-19). Jane will then consist of 6 nodes in total.

Current Resources

Compute nodes

hostname | # of CPUs & total cores | RAM [GB]
patty (KIS); marge & homer (KIS, coming soon…) | 2 x AMD EPYC 7742, 128 cores | 1024
itchy & selma (KIS) | 4 x Xeon(R) CPU E5-4657L v2 @ 2.40GHz, 48 cores | 512
scratchy (KIS); quake & halo (KIS/seismo); hathi (OT) | 4 x Intel(R) Xeon(R) CPU E5-4650L @ 2.60GHz, 32 cores | 512

Central storage space

Total available disk space for /home (KIS, OT), /dat (KIS, OT), /archive (KIS), /instruments (OT)

name | total [TB, gross] | free [TB, gross]
mars (KIS) | 758 | 39
quake (KIS/seismo) | 61 | 0
halo (KIS/seismo) | 145 | 44.5
jane (OT) | 130 (-> 198) | 23

 

