SDC Newsletter (01/2021)
SDC got funded
@Svetlana Berdyugina
The establishment of the Science Data Center (SDC) as the third strategic pillar of KIS is an important development for the institute, because the SDC connects the two other pillars, fundamental research and observatory operation. Establishing the SDC was identified as one of the strategic goals of KIS as early as 2013, and it was one of the recommendations of the Leibniz Evaluation in 2015. Our funding application to the national Joint Science Conference (GWK) for the strategic expansion of KIS dedicated to the SDC was finally approved on November 13, 2020. From 2021, 1.4 M€ per year will be allocated to the SDC. This will lead to a noticeable increase in staff, an innovative computing infrastructure, and broader scientific collaborations in the areas of scientific computing, big data, and artificial intelligence. In addition to enhancing scientific activities at KIS, the SDC will become a valuable big-data infrastructure in Germany and beyond.
Introduction to the SDC Team
The tasks at the start of the SDC are manifold. They range from establishing standards and policies, through curating and injecting data from the OT and other sources and setting up the necessary IT infrastructure, to creating a new project plan in record time.
As of January 2021, this challenge is being met by twelve members with overlapping responsibilities and expertise (at least two more engineers are expected to join in the course of 2021).
Name (% FTE in SDC) | Responsibility |
---|---|
Svetlana Berdyugina (10%) | Project PI, internal & external collaborations, high-level data |
Janek Beck (100%) | IT specialist trainee, application and pipeline development |
Nazaret Bello González (50%) | Project Lead Scientist, data calibration, curation & dissemination, high-level data, internal and external scientific collaborations, workshops and training events |
Peter Caligari (80%) | Project head, technology development and realization, compute & data infrastructure, network, budget & personnel responsibility, head of IT |
Vigeesh Gangadharan (60%) | High-level data tools development |
Andriy Gorobets (50%) | Data curation & analysis, high-level data development |
Marco Günter (20%) | System IT, compute, infrastructure and data servers, network |
Petri Kehusmaa (100%) | Project manager & senior systems architect, project plan development, governance, risk assessment |
Markus Knobloch (40%), devops | OT interfaces and data injection (hardware side), meta-data standards, non-scientific IT, instructor for trainees |
Sophie Müller (100%) | IT specialist trainee, application and pipeline development |
Carl Schaffer (100%) | System architect, lead software developer, OT data injection (software side), ML, AI, Web-applications |
Taras Yakobchuk (50%) | System architect, high-level data & data management |
devops: We plan to outsource the development of APIs (Application Programming Interfaces) for access to science-ready data (L1 onwards) to Drew Leonard from Aperio, the software development firm that designed and implemented the prototype of the VTF pipeline in Python. API development will begin with Python, to be followed later by IDL and possibly other languages.
We are currently drawing up a contract and expect Aperio to begin as soon as possible. The development will be split into a design phase and an implementation phase. The design phase will require input from scientists who intend to use the SDC, so that the product benefits future use as much as possible. We expect the first version of these APIs to be ready for the initial launch of the SDC at the end of 2021.
@Peter Caligari
SDC Project Status 02-2021
@Petri Kehusmaa
The SDC project kick-off was held in October 2020. Since then, we have come a long way. We have held a workshop with scientists to gather requirements for this new service platform. We have also tested many different technology options as candidate building blocks for the future infrastructure.
We have now entered the Solution Analysis and Design phase. In this phase we will work with different conceptual design components, select those that go into our final blueprint, and implement them later this year.
The project team has also deployed Atlassian Jira for service management and documentation. In the future, Jira Service Management will serve the SDC as a "Help Desk" ticketing system and knowledge base for platform users. It will provide automated workflows, a support portal, a feedback system and many more new services for the community.
Solution Analysis and Design Phase steps.
Summary Feb 4, 2021
Current project health | Current project status | Project constraints |
---|---|---|
Green | Solution Analysis and Design in progress. The project is slightly behind schedule. | Resources. Technology POCs taking more time than predicted. |
Project status
Accomplishments
Workshop
Technology POCs
Requirements validation
Atlassian Jira deployment for the project team
Next steps
Solution Analysis and Design
Define data policies
Define embargo rules
Risks & project issues
Lack of resources
Definition & Status of the Project Requirements
In addition to political, technical and security requirements, the workshops in late 2020 helped us clarify the expectations that scientists from different disciplines have of the SDC. We continuously update a list of all identified requirements. For each requirement, the list gives its current priority, the person responsible (if one has been assigned), and its current status.
https://leibniz-kis.atlassian.net/wiki/spaces/SP/pages/124813373
SDC Products & Tools
@Nazaret Bello Gonzalez
SDC data archive
http://sdc.leibniz-kis.de:8080/
Get access to data from GRIS/GREGOR and LARS/VTT instruments and the ChroTel full-disc telescope at OT.
Speckle reconstruction
https://gitlab.leibniz-kis.de/sdc/speckle-cookbook
This tutorial helps the user run KISIP (Wöger & von der Lühe, 2008) on her favourite BBI and/or HiFI imaging data.
Contact: @Vigeesh Gangadharan (vigeesh@leibniz-kis.de)
Coming soon:
A Jupyter Notebook by @Vigeesh Gangadharan to assist the user with VFISV inversions of GRIS data, including features such as wavelength calibration. Stay tuned!
Conferences & Workshops
@Nazaret Bello Gonzalez
Forthcoming Conferences/Workshops of Interest 2021
Feb 11
PUNCH4NFDI Open Data Workshop (Registration required!)
Feb 8-12
ESCAPE School: First Science with interoperable data
The Virtual Observatory (VO) is opening new ways of exploiting the huge amount of data provided by the ever-growing number of ground-based and space facilities, as well as by computer simulations. The goal of the school is twofold:
Expose participants to the variety of VO tools and services available today so that they can use them efficiently for their own research.
Gather requirements and feedback from participants
Every second Thursday, 12:30-13:30 CET
PUNCH Lunch Seminar (see SDC calendar invitation for zoom links)
11 Feb 2021: PUNCH4NFDI and ESCAPE - towards data lakes
25 Feb 2021: PUNCH Curriculum Workshop
Week of April 12-16 (3 days, TBD)
ESCAPE WP4 Technology Forum
June 10-11
13th International Workshop on Science Gateways | IWSG 2021
Topics:
Architectures, frameworks and technologies for science gateways
Science gateways sustaining productive collaborative communities
Support for scalability and data-driven methods in science gateways
Improving the reproducibility of science in science gateways
Science gateway usability, portals, workflows and tools
Software engineering approaches for scientific work
Aspects of science gateways, such as security and stability
June 28, 2021:
Data-intensive radio astronomy: bringing astrophysics to the exabyte era
Topics:
Data-intensive radio astronomy, current facilities and challenges
Data science and the exascale era: technical solutions within astronomy
Data science and the exascale era: applications and challenges outside astronomy
SDC participation in Conferences & Workshops
Nov. 26, 2020:
2nd SOLARNET Forum Meeting for Telescopes and Databases
Talk: Big Data Storage -- The KIS SDC case, NBG, PC & PK, 2nd SOLARNET Forum (Nov 26)
@Nazaret Bello Gonzalez @Petri Kehusmaa @Peter Caligari
SDC Collaborations
@Nazaret Bello Gonzalez
SOLARNET https://solarnet-project.eu
KIS coordinates the SOLARNET H2020 Project, which brings together European solar research institutions and companies to provide access to the large European solar observatories, supercomputing power and data. KIS SDC actively participates in WP5 and WP2, coordinating and developing data curation and archiving tools in collaboration with European colleagues.
Contact on KIS SDC activities in SOLARNET: @Nazaret Bello Gonzalez nbello@leibniz-kis.de
ESCAPE https://projectescape.eu/
KIS is a member of the European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures (ESCAPE H2020, 2019 - 2022) Project, which aims to bring together people and services to build the European Open Science Cloud. KIS SDC participates in WP4 and WP5 to bring ground-based solar data into the broader astronomical VO and to develop tools for handling large solar data sets.
Contact on KIS SDC activities in ESCAPE: @Nazaret Bello Gonzalez nbello@leibniz-kis.de
KIS is one of the European institutes strongly supporting the European Solar Telescope project. KIS SDC represents the EST data centre development activities in a number of international projects like ESCAPE and the Group of European Data Experts (GEDE-RDA).
Contact on KIS SDC as EST data centre representative: @Nazaret Bello Gonzalez nbello@leibniz-kis.de
PUNCH4NFDI https://www.punch4nfdi.de
KIS is a participant (not a member) of the PUNCH4NFDI Consortium. PUNCH4NFDI is the NFDI (National Research Data Infrastructure) consortium of particle, astro-, astroparticle, hadron and nuclear physics, representing about 9,000 scientists with a Ph.D. in Germany, from universities, the Max Planck Society, the Leibniz Association, and the Helmholtz Association. PUNCH4NFDI aims to set up a federated and "FAIR" science data platform, offering the infrastructures and interfaces necessary for access to and use of the data and computing resources of the involved communities and beyond. PUNCH4NFDI is currently competing with other consortia for funding by the DFG (final response expected in spring 2021). KIS SDC aims to become a full member of PUNCH and to federate its efforts on ground-based solar data dissemination with the broad particle and astroparticle communities.
Contact on KIS SDC as PUNCH4NFDI participant: @Nazaret Bello Gonzalez nbello@leibniz-kis.de & @Peter Caligari cale@leibniz-kis.de
IT news
@Peter Caligari
Ongoing & Future developments
Webpage
KIS: Our website is based on a 5-year-old version of Typo3, which poses an increased security risk. Furthermore, the current design meets only the most basic requirements for accessibility for disabled people and for presentation on mobile devices with small displays, both of which the Ministry expects us to consider and implement.
Therefore, at the beginning of 2020, KIS-IT planned a relaunch based on a commercially available template for Typo3. However, this had to be stopped because of the increasing workload on IT from the SDC and the fact that IT consisted of only 2 people last year.
KIS therefore decided to outsource the relaunch. The aforementioned template will still be the basis, but an external web agency will do the design and initial implementation. We have the first design drafts and are currently streamlining our site tree. Feel free to drop by anytime if you are interested in the process.
Costs for the relaunch are of the order of 13 k€.
Network
Dedicated 10 Gbit line between KIS & OT
KIS, OT: We are currently setting up a dedicated 10 Gbit line between the OT and the KIS (technically, we get a wavelength in a multiplexed dark fibre, a so-called lambda).
On Spanish territory, the line is completely free of charge. Nevertheless, the (throughput independent) annual costs for the remaining distance amount to approx. 45 k€ (incl. VAT). Additionally, there are one-time set-up costs of about 30 k€. The costs are
We expect a significantly lower latency on the new line than on the existing internet connection (which will not be affected in any way).
Therefore, we will mainly use it for remote observations and data transport within the framework of the SDC.
The SDC is striving for cooperation with the SCC/KIT in Karlsruhe. By coincidence, the 10 Gbit line enters German territory at the DFN node in Campus North of the KIT. We hope to connect KIT and KIS using the same line without significantly higher costs (despite the high expected load).
New (application) firewalls at KIS & OT
KIS, OT: In parallel, we will replace the existing firewalls at both locations (KIS and OT) with modern application firewalls. The current ones are simple packet filters that classify traffic by port only: all traffic on ports 80 and 443 is treated as web traffic, and any web traffic that is not on 80 or 443 is not recognized as such. So-called application firewalls do not rely on ports but rather examine all traffic for specific patterns to assign it to a particular application or usage.
These machines need to be kept under maintenance to cope with new threats efficiently. Per site, we expect initial costs of the order of 20-30 k€ for a redundant cluster of two machines, and maintenance costs of around 2-3 k€ each.
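The difference between the two approaches can be illustrated with a toy example. The following is only a minimal sketch, not how any real firewall product works: the function names and the tiny signature table are made up for illustration, and it only looks at the first bytes of unencrypted payloads.

```python
# Toy comparison of port-based and pattern-based traffic classification.
# All names and signatures here are hypothetical and chosen for illustration.

WEB_PORTS = {80, 443}

# Assumed toy signatures: byte prefixes that typically start a plain-text
# HTTP message or an SSH banner. A real rule set is far more extensive.
SIGNATURES = {
    "web": (b"GET ", b"POST ", b"HEAD ", b"HTTP/"),
    "ssh": (b"SSH-",),
}


def classify_by_port(port: int) -> str:
    """Packet-filter view: the destination port alone decides the label."""
    return "web" if port in WEB_PORTS else "unknown"


def classify_by_payload(payload: bytes) -> str:
    """Application-firewall view: the start of the payload decides the label."""
    for application, prefixes in SIGNATURES.items():
        if payload.startswith(prefixes):
            return application
    return "unknown"


if __name__ == "__main__":
    request = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"
    # A web server listening on port 8080 is invisible to the port rule ...
    print(classify_by_port(8080))
    # ... but the same request is recognised from its content.
    print(classify_by_payload(request))
```

In this sketch the port rule labels a request to port 8080 as unknown, while the payload check still identifies it as web traffic, which is the behaviour described above.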
Storage
KIS: Storage at the KIS is becoming increasingly scarce. Unlike the storage nodes used at OT, nodes for the KIS system can no longer be bought.
We are currently scanning all files at KIS to determine the amount of data that has not been accessed for a considerable time. We are planning to invest in a slower tier to which such data will be moved automatically. The data would remain visible, but access would take longer than for frequently accessed data on the primary tier (however, accessing data on the slower tier will automatically and transparently move it back to the near-line tier). Even this new slower tier will not primarily have the character of an archive. Expected costs for such a system are 50-70 k€ (for approx. 0.5 PB, incl. VAT).
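A scan of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the tool actually used at KIS: the function name `stale_bytes` and its parameters are hypothetical, and it relies on the file system recording access times (`st_atime`), which is not the case on volumes mounted with options such as `noatime`.

```python
import os
import stat
import time


def stale_bytes(root, max_idle_days, now=None):
    """Return (stale, total) sizes in bytes for regular files below root.

    A file counts as stale if its last access time is more than
    max_idle_days in the past. Hypothetical helper for illustration.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds per day
    stale = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # do not follow symbolic links
            except OSError:
                continue  # file vanished or is not readable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore links, sockets and other special files
            total += info.st_size
            if info.st_atime < cutoff:
                stale += info.st_size
    return stale, total
```

Calling `stale_bytes("/dat", 365)`, for example, would return the number of bytes below `/dat` that have not been read for a year together with the total size, which gives a first estimate of how much data a slower tier would have to hold.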
SDC: Mainly static files of common interest that are too large to store on offline media like external disks or tape will go to the SDC once it is operational.
OT: We will upgrade the central storage at OT (jane) with two additional nodes of 32 TB each to cope with the tight storage situation experienced during observations in 2020 (partly because most observations were run remotely due to Covid-19). Jane will then consist of a total of 6 nodes.
Current Resources
Compute nodes
hostname (site) | # of CPUs & total cores | RAM [GB] |
---|---|---|
patty (KIS) | 2 x AMD EPYC 7742, 128 cores | 1024 |
itchy & selma (KIS) | 4 x Xeon(R) CPU E5-4657L v2 @ 2.40GHz, 48 cores | 512 |
scratchy (KIS), hathi (OT) | 4 x Intel(R) Xeon(R) CPU E5-4650L @ 2.60GHz, 32 cores | 512 |
Central storage space
Total available disk space for /home (KIS, OT), /dat (KIS, OT), /archive (KIS), /instruments (OT)
name (site) | total [TB, gross] | free [TB, gross] |
---|---|---|
mars (KIS) | 758 | 39 |
quake (KIS/seismo) | 61 | 0 |
halo (KIS/seismo) | 145 | 44.5 |
jane (OT) | 130 (-> 198) | 23 |
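To make the storage situation easier to read, the total and free values from the table above can be converted into fill levels. The snippet below is a small worked calculation only; the list `STORAGE` and the function `fill_level` are names introduced here for illustration, and the planned upgrade of jane is not taken into account.

```python
# (name, total TB, free TB) as listed in the table above.
STORAGE = [
    ("mars", 758.0, 39.0),
    ("quake", 61.0, 0.0),
    ("halo", 145.0, 44.5),
    ("jane", 130.0, 23.0),
]


def fill_level(total_tb, free_tb):
    """Fraction of the total capacity that is currently in use."""
    return (total_tb - free_tb) / total_tb


if __name__ == "__main__":
    for name, total, free in STORAGE:
        # Print the fill level as a percentage with one decimal place.
        print(f"{name:6s} {fill_level(total, free):6.1%}")
```

With these numbers, mars is roughly 95% full and quake is completely full.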
References
Products & Tools
SDC data archive: http://sdc.leibniz-kis.de:8080/
Speckle reconstruction: https://gitlab.leibniz-kis.de/sdc/speckle-cookbook
Forthcoming Conferences/Workshops
Feb 11, 2021: PUNCH4NFDI Open Data Workshop (Registration required!)
Feb 8-12, 2021: ESCAPE School: First Science with interoperable data
June 10-11, 2021: 13th International Workshop on Science Gateways | IWSG 2021
June 28, 2021: Data-intensive radio astronomy: bringing astrophysics to the exabyte era
Collaborations
SOLARNET: https://solarnet-project.eu
ESCAPE: https://projectescape.eu/
PUNCH4NFDI: https://www.punch4nfdi.de
Quick links
Computer load: http://ganglia.leibniz-kis.de