SDC Newsletter (01/2022)

Welcome to our first newsletter of the year.  

Hopefully, you will find this newsletter helpful and informative.

As always, feedback is welcome and helps us to deliver a better product.



Editorial


@Petri Kehusmaa

SDC Project launched on 15 November 2021

We started our five-year journey in the latter half of 2020, and in mid-November 2021 we released SDC v1.0, which means we are now in service delivery mode.

But the work is not done, not even close. In our first year we focused on gathering the science requirements for the SDC and on designing and building an infrastructure that best matches those requirements. The focus was on selecting building blocks that allow our services and infrastructure to be flexible and scalable. We also implemented lightweight service management functionality so that our users can request support or new features. In this newsletter you can see the latest status of some of our existing features and of some that are still to come.

The next step is to prepare our next release in June 2022, in which we hope to introduce Rucio as a key infrastructure component of the SDC. Rucio will play a key role in data transport and data integrity.

To remind you of our mission and goals, here they are again.

SDC Mission

The basis of the SDC will be a team of solar data experts and software and hardware engineers embedded in the research activities at KIS. The SDC will initially focus on the calibration, curation, archiving and dissemination of data from the largest European solar telescope, GREGOR, on Tenerife and from the Daniel K. Inouye Solar Telescope (DKIST) of the National Solar Observatory (US), the world's largest solar telescope, to which KIS contributes significantly with the Visible Tunable Filter (VTF) instrument.

SDC research will focus on the development of sophisticated methods and tools for high-level data products, including AI and machine learning (ML) techniques, to accelerate new science in solar physics. The SDC will also introduce user support and management processes to achieve high user satisfaction and tool utilization.

In the future, the SDC also welcomes all solar observatories to join this innovative and evolving platform.

SDC Goals

  • Provide an innovative data analytics platform for the solar community

  • Provide better user support to increase user satisfaction

  • Provide a user feedback channel to listen to the needs of the solar community

  • Increase asset utilization through automation and monitoring

  • Increase the user base to reduce the cost per transaction

  • Implement service management best practices to reduce service interruptions

 

Thank you to the project team for your dedication and hard work in getting this done. While we are just at the beginning, I truly believe that we can build a platform that will be very successful in serving solar scientists.

 


Policies, Frameworks & Governance


  • The ITIL v4 process model will be partially adopted for service management purposes

  • The definition of data policies has started

  • The SDC governance model and scope are still to be decided

 


Architecture & Technical Infrastructure


SDC setup, state of the transition towards Rucio

@Peter Caligari

In late 2021 we received all the hardware the SDC needs to make the switch to Rucio. It consists of:

  • 3 infrastructure servers 

  • 9 dCache tier 1 storage servers (near-line) with a total capacity of 3 PB (gross)

  • 1 dCache tier 2 storage server (long-term storage) with a capacity of 1 PB (gross)

  • 4 compute nodes of the same type as patty and legs (Dell R7525, 2 × AMD EPYC 7742 with 64 cores each, 1 TB RAM)

The 3 infrastructure servers and 3 of the dCache tier 1 servers are already deployed. The remaining hosts will be integrated into the cluster once the switch to Rucio has been completed on the installed hosts.

SDC’s Kubernetes Cluster

Except for the 4 compute servers, all hosts run the vSphere Hypervisor edition, the free version of VMware ESXi. All services are deployed in VMs, trading roughly a 10% performance loss for a much-reduced risk during updates, thanks to the ability to take snapshots.

Rucio essentially runs inside a Kubernetes (k8s) cluster. The infrastructure servers form the core of this highly available k8s cluster, with two VMs on each server (one k8s control-plane node and one k8s worker each). High availability in this context means that the cluster runs several instances of the control plane (through which the cluster is controlled) and of the critical etcd daemon.

Access to the control plane is provided via a REST interface behind a (redundant) haproxy that distributes requests to one of the underlying control-plane nodes. We use a stacked topology, where each control-plane node also runs one instance of k8s' etcd as a pod (a group of one or more containers providing a certain service/process within the cluster), with haproxy (controlled by keepalived) running as static pods. The following picture shows this neatly self-contained setup.
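
As a minimal illustration, assuming kubectl/kubeconfig access to this cluster, the official Kubernetes Python client could be used to confirm the HA layout described above (label names follow upstream k8s conventions; this is a sketch, not part of the SDC tooling):

    # Sketch: list the control-plane nodes and the static etcd/haproxy/keepalived
    # pods of the stacked HA setup described above.
    from kubernetes import client, config

    config.load_kube_config()          # or load_incluster_config() when run inside the cluster
    v1 = client.CoreV1Api()

    cp = v1.list_node(label_selector="node-role.kubernetes.io/control-plane")
    print("control-plane nodes:", [n.metadata.name for n in cp.items])

    for pod in v1.list_namespaced_pod("kube-system").items:
        if any(k in pod.metadata.name for k in ("etcd", "haproxy", "keepalived")):
            print(pod.metadata.name, pod.status.phase)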

 

Rucio setup

Rucio consists of many different components playing together. The most crucial one, however, is a database storing Rucio's internal state and references to all data (and metadata) within. Just like the k8s cluster itself, this database can be set up entirely within k8s in a redundant way. For this we chose Kubegres, with one read-write and two read-only database pods on the k8s workers. Kubegres provides automatic failover and restarts if one of the pods fails. The actual database files on the workers reside on persistent volume claims (PVCs, https://kubernetes.io/docs/concepts/storage/persistent-volumes/) mapping to a local directory in the underlying VM. See the picture below.
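
As a sketch of how the read-write/read-only split can be used from inside the cluster: Kubegres exposes the primary behind one k8s service and the replicas behind a second one. The resource name ("rucio-db"), database name and credentials below are hypothetical placeholders, not the actual SDC configuration:

    # Hypothetical example: connect to the Kubegres primary for writes and to the
    # replica service for reads (names and credentials are placeholders).
    import psycopg2

    def connect(host):
        return psycopg2.connect(host=host, port=5432,
                                dbname="rucio", user="rucio", password="...")

    rw = connect("rucio-db")           # writes always go to the read-write (primary) pod
    ro = connect("rucio-db-replica")   # reads may be served by a read-only replica

    with ro, ro.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone())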

All other components, like the actual Rucio server (with the REST interface) and Rucio's authentication servers, also run in a high-availability setup. Each server has an associated k8s service with a fixed cluster IP within the cluster. Each service acts as a proxy within the cluster to the application running in its associated pods. It is the services that talk to each other, not the individual pods on the workers. While pods may fail and cease to exist, services do not, as long as there is at least one pod with the respective application up and running.

From outside the cluster (KIS, other SDC hosts, or hosts on the Internet), the services for the Rucio server and the auth server can be accessed via an (ingress) proxy that maps both services to the publicly addressable host rucio.sdc.leibniz-kis.de.
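
For users, this means the standard Rucio Python client can simply be pointed at that host. A hedged sketch (the account name and authentication type below are assumptions; in practice they come from the user's rucio.cfg):

    # Sketch: talk to the SDC Rucio server through the ingress described above.
    from rucio.client import Client

    client = Client(rucio_host="https://rucio.sdc.leibniz-kis.de",
                    auth_host="https://rucio.sdc.leibniz-kis.de",
                    account="sdc_user",        # hypothetical account
                    auth_type="userpass")      # assumed authentication mechanism

    print(client.ping())                       # server version, via the REST API
    for rse in client.list_rses():             # storage elements known to this Rucio instance
        print(rse["rse"])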

dCache Setup

@Marco Günter

dCache forms the storage layer below Rucio. In contrast to Rucio, it does not run inside the k8s cluster but would normally run on bare metal; as above, however, we (SDC) chose to run the dCache nodes as VMs on VMware instead. Again, the central component is a PostgreSQL (https://www.postgresql.org) database storing metadata and pointers to the actual files on the filesystems of the participating nodes. All nodes sharing the same database belong to the same dCache cluster.

Redundancy between files within a cluster is defined in dCache and is transparent (not visible) to Rucio. Tier 1 will eventually consist of 9 dCache nodes with roughly 1/3 PB (gross) each. Any file will physically reside on at least 2 nodes. This redundancy is kept as long as the file is stored on that tier, and Rucio, one level up, is not aware of it. Redundancy between tiers (different dCache clusters) is managed via (lifetime- and location-aware) Rucio rules.
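
As an illustration of such a rule, the sketch below uses the Rucio Python client; the RSE names (KIS_TIER1, KIS_TIER2), scope and dataset name are hypothetical, not the SDC's actual naming scheme:

    # Sketch: pin a dataset to tier 1 and add a lifetime-limited copy on tier 2.
    from rucio.client import Client

    client = Client()
    did = [{"scope": "gris", "name": "gris_20200101_dataset"}]   # hypothetical DID

    # One Rucio copy on tier 1 (dCache internally stores it on two nodes anyway) ...
    client.add_replication_rule(dids=did, copies=1, rse_expression="KIS_TIER1")

    # ... plus a long-term copy on tier 2 with its own lifetime (in seconds).
    client.add_replication_rule(dids=did, copies=1, rse_expression="KIS_TIER2",
                                lifetime=5 * 365 * 24 * 3600)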

Another critical component for the operation of dCache is Apache ZooKeeper (https://zookeeper.apache.org), which dCache uses as a distributed directory and coordination service for the nodes of a cluster.

Redundancy of data

The following picture shows the structure and setup of the SDC that we will reach shortly (by mid-2022). The coloured dots represent files. Tier 1 at KIS will store two copies of each file on two physically different dCache nodes. This replication is handled solely by dCache and is transparent to Rucio.

In addition, a third copy of each file is stored on tier 2 (with a possibly longer lifetime than the data on tier 1). This redundancy is managed by Rucio, as different nodes and lifetimes are involved.

Files in dCache reside on ZFS. Each dCache node has 21 disks of 16 TB each, arranged in two raidz1 vdevs of 10 disks plus one global spare, and all metadata resides on an SSD mirror (two 800 GB SSDs).

Therefore, each pool can survive the loss of two disks (one in each raidz1); a failed disk is automatically replaced by the global spare. All data on ZFS is checksummed and self-healing (as long as enough redundancy remains within the node).
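
A quick back-of-the-envelope check of these numbers (gross capacity only; raidz1 parity and the spare reduce the usable space):

    # Tier 1 capacity check based on the layout described above.
    disk_tb        = 16
    disks_per_node = 21                  # 2 x raidz1 of 10 disks + 1 global spare
    nodes_tier1    = 9

    gross_per_node  = disks_per_node * disk_tb        # 336 TB gross per node
    usable_per_node = 2 * (10 - 1) * disk_tb          # ~288 TB after raidz1 parity

    print(f"tier 1 gross : {nodes_tier1 * gross_per_node / 1000:.1f} PB")    # ~3.0 PB
    print(f"tier 1 usable: {nodes_tier1 * usable_per_node / 1000:.1f} PB before 2x replication")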

 


SDC Products & Tools


SDC GRIS VFISV-Inversion pipeline

@Vigeesh Gangadharan

Several tests were carried out to assess the quality of the VFISV (Borrero et al. 2011, Sol. Phys. 273, 267) inversions provided by the SDC. This includes comparing the results of the VFISV inversions with inversions obtained with the SIR code (Ruiz Cobo & del Toro Iniesta, 1992, ApJ 398, 375). The weights used in the pipeline are computed for each observation based on its noise level, and a comparison of different weights was performed. For better reproducibility, we plan to implement fixed weights or an option for users to provide their own weights.
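
For illustration only, the sketch below shows one common way to derive per-observation weights from the noise level; it is not the pipeline's actual implementation, and the 1/sigma scaling and fixed values are assumptions:

    # Purely illustrative weighting scheme (NOT the SDC pipeline code).
    import numpy as np

    def stokes_weights(stokes, continuum_mask):
        """stokes: array of shape (4, n_lambda, ...) ordered I, Q, U, V;
        continuum_mask: boolean mask of line-free wavelength points."""
        noise = np.array([np.std(s[continuum_mask]) for s in stokes])
        w = 1.0 / noise                  # weight each Stokes parameter by 1/sigma
        return w / w[0]                  # normalise to the Stokes I weight

    # Fixed (or user-supplied) weights, as planned for better reproducibility:
    fixed_weights = np.array([1.0, 7.0, 7.0, 3.0])   # hypothetical values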

SDC data archive

The URL for the archive has changed! It is now available at

Access data from the GRIS/GREGOR and LARS/VTT instruments and from the ChroTel full-disc telescope at the OT.

Updates as of January 2022

  • IFU Integration: We are currently preparing IFU data for publication through the archive. The processing and preparation are largely done, and we will release the data as soon as the corresponding frontend changes are ready. Look out for the announcement.

  • A prototype of the Python API as an interface to the archive has been implemented by our partners. Find out more here.

  • GRIS reprocessing of 2020 data: Due to improvements in the reduction pipeline, we have reprocessed all data from 2020. It is available through the archive; check the HISTORY keyword in the FITS headers for the reduction date (see the snippet after this list). All data processed after October 2021 uses the new routines. For more information, check the SDC website.

  • GRIS 2021 data: We are currently processing all data from 2021 and will release it shortly through the archive. Most of it will not be immediately accessible due to the default one-year embargo, but all datasets will become visible as soon as the embargo expires. Keep an eye out for this data.

  • As always, we are collecting ideas for changes to the UI here, and we will dedicate some time in the first half of 2022 to rework the UI accordingly.
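
A short snippet (using astropy; the file name is a placeholder) for checking the reduction date of a downloaded GRIS file via its HISTORY cards, as mentioned in the reprocessing item above:

    # Inspect the FITS HISTORY cards, where the reduction pipeline records its run.
    from astropy.io import fits

    header = fits.getheader("gris_example_level1.fits")   # placeholder file name
    for card in header.get("HISTORY", []):
        print(card)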

 

GRISView – A visualisation tool for archived GRIS data

@Taras Yakobchuk

GRISView ver. 0.7.0

New features:

  • Easy quiet-Sun region selection and interactive continuum fitting using Chebyshev polynomials (see the sketch after this list)

  • More parameters for visualisation, derived from input Stokes parameters and normalized to continuum

  • IFU datasets support, including IFU time-series

  • Import of inversion maps and fitted Stokes profiles as obtained by the SDC VFISV pipeline

  • Plot FTS atlas intensity spectrum for comparison

  • Manual wavelength scale offset for better matching with FTS data

  • New view mode to quickly switch between single and multiple plots

  • Compass to show map orientation and direction to the disc center

  • Load/save sessions to save and restore program configuration for a given observation

  • Customizable line styles for spectral plots

  • Export visible map and spectrum plots as images

  • In-program user manual with GUI description and screenshots
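
A minimal numpy sketch of the kind of Chebyshev continuum fit and normalisation referred to in the first items above (wavelength range, degree and the line-free selection are placeholders, not GRISView's actual algorithm):

    # Fit a low-order Chebyshev polynomial to line-free points and normalise.
    import numpy as np

    wavelength = np.linspace(10824.0, 10844.0, 1000)               # hypothetical window [Angstrom]
    spectrum   = np.random.default_rng(0).normal(1.0, 0.01, 1000)  # stand-in quiet-Sun profile

    line_free = np.abs(spectrum - 1.0) < 0.02                      # crude continuum selection
    cont_fit  = np.polynomial.Chebyshev.fit(wavelength[line_free],
                                            spectrum[line_free], deg=3)

    normalised = spectrum / cont_fit(wavelength)                   # spectrum normalised to continuum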

Feedback welcome

We strongly encourage all colleagues to try out this new tool and provide feedback. Instructions for installing and using the program can be found on the tool's GitLab page:

https://gitlab.leibniz-kis.de/sdc/gris/grisview

Please report any issues and bugs on the program's GitLab page or via the direct link:

https://gitlab.leibniz-kis.de/sdc/gris/grisview/-/issues/new?issue

 


Conferences & Workshops


Participation of SDC in Conferences/Workshops

  • 21st OT Technical Meeting, Jan 24-25 (2022)

  • 2nd PUNCH General Meeting, Jan 18 (2022)

  • NFDI Onboarding Event, Jan 12 (2022)

  • Presentation of SDC to the Data Intensive Radio Astronomy Group, Dec 9 (2021)

  • 3rd SOLARNET Forum for Telescopes and Databases, Nov 15 (2021) (See Talks)

Forthcoming Conferences/Workshops/Grants

  • SOLARNET Simulation Metadata Meeting, Feb 2, 14:00 CET (Zoom link tbd)

  • ONLINE BOOTCAMP: N-Ways to GPU Programming
    Monday, March 14, 09:00 - Tuesday, March 15, 12:30

  • PUNCH Lunch Seminar (see SDC calendar invitation for zoom links)
    Every second Thursday, 12:30-13:30 CET

  • NVIDIA Academic Hardware Grant Program
    Submission window: July 2022

 


SDC Collaborations and Partners


 @Nazaret Bello Gonzalez

SOLARNET https://solarnet-project.eu

KIS coordinates the SOLARNET H2020 project, which brings together European solar research institutions and companies to provide access to the large European solar observatories, supercomputing power and data. KIS SDC actively participates in WP5 and WP2, coordinating and developing data curation and archiving tools in collaboration with European colleagues.
Contact on KIS SDC activities in SOLARNET: @Nazaret Bello Gonzalez nbello@leibniz-kis.de

 

 ESCAPE https://projectescape.eu/

KIS is a member of the European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures (ESCAPE, H2020, 2019-2022) project, which aims to bring together people and services to build the European Open Science Cloud. KIS SDC participates in WP4 and WP5 to bring ground-based solar data into the broader astronomical VO and to develop tools for handling large solar data sets.

Contact on KIS SDC activities in ESCAPE: @Nazaret Bello Gonzalez nbello@leibniz-kis.de

 

EST https://www.est-east.eu/

KIS is one of the European institutes strongly supporting the European Solar Telescope (EST) project. KIS SDC represents the EST data centre development activities in a number of international projects such as ESCAPE and the Group of European Data Experts (GEDE-RDA).

Contact on KIS SDC as EST data centre representative: @Nazaret Bello Gonzalez nbello@leibniz-kis.de

 

PUNCH4NFDI https://www.punch4nfdi.de

KIS is a participant (not a member) of the PUNCH4NFDI consortium. PUNCH4NFDI is the NFDI (National Research Data Infrastructure) consortium of particle, astro-, astroparticle, hadron and nuclear physics, representing about 9,000 scientists with a Ph.D. in Germany, from universities, the Max Planck Society, the Leibniz Association, and the Helmholtz Association. PUNCH4NFDI is setting up a federated and "FAIR" science data platform, offering the infrastructure and interfaces necessary for access to and use of the data and computing resources of the involved communities and beyond. PUNCH4NFDI officially started its activities on October 1, 2021. KIS SDC aims to become a full member of PUNCH and to federate our efforts on ground-based solar data dissemination with the broader particle and astroparticle communities.

Contact on KIS SDC as PUNCH4NFDI participant: @Nazaret Bello Gonzalez nbello@leibniz-kis.de & @Peter Caligari cale@leibniz-kis.de

 

Aperio Software Ltd. (UK) 

The SDC partners with Aperio Software for the development of support services such as APIs. SDC and Aperio are also partners in the development of a data calibration pipeline for the VTF (KIS) instrument for DKIST within the SOLARNET H2020 project.


IT news


@Peter Caligari

Ongoing & Future developments

Webpage

The new KIS website is online (at the usual address: www.leibniz-kis.de). Many thanks to all who contributed content before Christmas. However, not all content from the old website has been transferred yet. Please stay on the ball.

The old website can still be reached at oldwww.leibniz-kis.de. However, one can no longer log in to its backend. If you absolutely need access to the backend, please contact us.

We have created a small internal style guide for KIS. You can find it, together with the design drafts of the agency that created the website and the script from the editor training, in the cloud (only internally accessible).

The SDC pages were finished earlier and were therefore temporarily hosted separately. The agency will move them back into the regular site tree in the next few days. There may still be short interruptions over the next few weeks, as the agency carries out further finishing work. The definitive site tree and menus have also not been finalised yet.

There is no longer an intranet, at least for the time being. This simplifies our life with the GDPR (DSGVO) enormously and, among other things, saves us the annoying cookie dialogues. The documents that you used to find there are now in the cloud in the "Forms & Templates" folder. This group folder must be explicitly subscribed to on desktops (under Windows or macOS) for existing accounts. However, it should be visible immediately in the browser.

Cloud

Due to ongoing problems when editing documents online (collaboratively in the browser), we moved OnlyOffice from Nextcloud to a separate (virtual) machine. For the time being, the latter is only accessible from the KIS network or via VPN.

Network

Status of the dedicated 10 Gbit line between KIS & OT

All equipment on the German side is installed, and the KIS side is up. The line between the University of Freiburg and KIS proved to be of extraordinary quality with extremely low attenuation, which allowed us to use a very cheap passive CWDM. RedIRIS made some late changes to the connection at the Residencia at the OT and is still missing the interface card towards the VTT. It will be installed at the end of January 2022.

We will then try to establish a temporary layer-2 link between Freiburg and the OT to test connectivity. The final setup requires a renumbering of the VLANs at the OT and will be done once we can travel to the OT again.

Test of (application) firewalls at KIS

The firewalls have been ordered and will be delivered to KIS in January. In the end, we decided on firewalls from Palo Alto Networks (a PA-440 for the OT and a PA-460 for KIS). On both sides we will use a redundant HA setup that allows one of the two devices to fail without losing the service. This is especially important for the OT, as there is no IT staff permanently on site.

The 10 Gbit line will not pass through these firewalls but will directly connect the switches at the OT with the switches at KIS.

Storage

All hardware for the SDC has been bought and is currently being set up (see above). This amounts to nearly 5 PB (gross). However, by the nature of this storage, it is not intended as general-purpose read-write near-line storage like mars, because files put there cannot be deleted again.

The first nodes of mars are now 5 years old (time flies). We are currently looking into an affordable extension of the service contract. The first offer by Dell was prohibitively expensive (a one-year extension was of the order of the initial cost of the whole storage cluster, including maintenance for five years).

The 100 TB of space on Microsoft Azure is ready for use. The remaining issues with files not being transferred to the cloud have been solved by the manufacturer.

Again: do not copy any sensitive data there. Do not recursively copy whole trees with zillions of small files there, as transferring and accessing such files is extremely inefficient. Use it for large scientific files you do not plan to access during the next month, but don’t want to delete either.
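
One way to follow that advice when a directory tree with many small files really has to go there is to bundle it into a single archive first; a small sketch (paths are placeholders):

    # Bundle many small files into one tar archive before moving it to the cloud tier.
    import tarfile
    from pathlib import Path

    source = Path("/dat/project/many_small_files")    # hypothetical source tree
    bundle = Path("/dat/project/many_small_files.tar")

    with tarfile.open(bundle, "w") as tar:            # use "w:gz" to also compress
        tar.add(source, arcname=source.name)

    print(f"created {bundle} ({bundle.stat().st_size / 1e9:.1f} GB)")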

Software

  • We extended the maintenance contract for Matlab for another 3 years (same toolboxes as before).

  • We also extended our 5-seat floating license of Zemax for another 3 years.

 

Current Resources

Compute nodes

  • patty, legs & louie (KIS): 2 × AMD EPYC 7742, 128 cores, 1024 GB RAM

  • itchy & selma (KIS): 4 × Intel Xeon E5-4657L v2 @ 2.40 GHz, 48 cores, 512 GB RAM

  • scratchy (KIS); quake & halo (KIS/seismo)

  • hathi (OT): 4 × Intel Xeon E5-4650L @ 2.60 GHz, 32 cores, 512 GB RAM

Central storage space

Total available disk space for /home (KIS, OT), /dat (KIS, OT), /archive (KIS) and /instruments (OT), in TB (gross):

  • mars (KIS): 758 total, 39 free

  • quake (KIS/seismo): 61 total, 0 free

  • halo (KIS/seismo): 145 total, 44.5 free

  • jane (OT): 130 (→ 198) total, 23 free

 

