Info

It's a pleasure to present our second newsletter. We try to keep the release schedule close to one month and not exceeding two, balancing being informative against being too chatty.

Apart from the regular project progress and IT news, there are several chapters on policies that will affect how observations are done and what will be required to access data in the future. There is also a section on licensing of data that is no longer embargoed. Taras Yakobchuk introduces the new tool he is developing for visualizing and analyzing calibrated GRIS/GREGOR data. The tool is not only intended to help experts analyze data offered by SDC, but it should also open up access to users who are not experts in dealing with this type of data.

Welcome to our third newsletter. Regretfully, we are closer to a two-month release schedule than to the initially envisioned one month, as much work is going on in parallel. Hopefully, you will still find this newsletter helpful and informative. The tools section and the IT news are up to date.

As always, we encourage you to comment openly on any part, give feedback on any subject, or raise new topics that you think we do not pay enough attention to in the context of SDC. Feedback is always welcome and helps us deliver a better product.



Editorial


📰 Editorial

🔒 Embargos: Definitions, Planned Realization, and Proof of Concept 

Peter Caligari

Embargoes restrict access to observational data and derived higher data products for a certain period after the observation campaign. All data in SDC is planned to be subject to such an embargo. The envisioned period for data where no Ph.D. student is involved (neither in the observation nor in the later evaluation of the data) is 1 year; if there are Ph.D. students involved, the period is prolonged to 2 years. 

From a technical standpoint, raw data (and derived data products) belong to a particular group of people. Initially, this group of people consists of the PI of the observation and all participating COIs. We assume that this group of people is relatively static during the embargo, and people are added to or removed from this list only very rarely. It is not intended that users can make such changes themselves, but instead need to contact an admin. 

A campaign is a particular group of instruments used during a specific time interval. Any specific instrument can only pertain to one campaign at a time. All data gained with that instrument during that time belongs to the associated group of observers specified in the observing proposal. A posteriori, it's sufficient to know when and with what instrument the original data was taken to associate it to a campaign. Naturally, campaigns do not overlap, but each observer (PIs and COIs) might be part of multiple campaigns (even simultaneous ones if different instruments are involved). 

To keep track of campaigns, we are currently developing an online database that will provide a web interface so that any observer can register with this database before observation. This website's functionality is not yet completely defined, but it might well replace the entire current process of submitting observation requests by PDF and mail.

Authentication on the website is done via certificates, which are a prerequisite for registering, submitting observation requests, and accessing data. The certificates will then be mapped to Linux users, which are in turn sorted into the ephemeral Linux groups used to restrict access to embargoed data (one group per campaign). 

Any data that does not belong to a campaign cannot be subject to an embargo and becomes freely available from the start!

Embargoes on derived higher-level products

Enforcing the embargo on raw data is relatively straightforward; however, this is not the case for higher-level data products. Nevertheless, derived data products should be subject to the same embargo as the original data. How this will be done is not completely clear at the moment. Users will probably be able to put data back into their personal scope (a namespace used to distinguish files with the same name). Each entity (an instrument, telescope, user, camera, etc.) will have its own scope. By default, personal scopes are world-readable temporary storage within Rucio (see below). This is a global setting (the same for all users) and can only be changed by admins. We might need to remove this world access to those scopes to guarantee embargoes. That, however, would mean that not even other members of the campaign (COIs) would have access to derived data products during the embargo period. Should that be required, we might end up with one scope per campaign. We would appreciate feedback on the necessity of this feature. 
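To illustrate what scopes look like in practice: Rucio identifies every file or dataset by a scope:name pair, and listings can be restricted to a scope. The scope and dataset names below are hypothetical and not the final SDC convention:

Code Block
rucio list-dids user.pcaligari:*        # a user's personal scope
rucio list-dids gris:*2021*             # an instrument scope
rucio list-dids campaign_2021_03:*      # a possible per-campaign scope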

Data products of general interest to a greater public (data that will remain in SDC and become openly available after the embargo) will probably need to be reviewed anyway and be put back by some kind of privileged embargo-aware procedure. 

Visibility of embargoed data

We use Rucio to distribute, manage and access data in SDC. A side effect of that decision will be that even embargoed data is listable and will show up as a result of matching data searches. The data will, however, not be downloadable while embargoed.

Third-party users (not part of the original campaign) need to contact the PI to get access. The PI would then have this new user added to the campaign by an administrator or provide the desired data by means other than direct access via Rucio.

Even embargoed data should provide some context data, like quick looks, to make searches useful. It still needs to be discussed whether such low-quality data, which is not scientifically exploitable, can be exempt from the original embargo.

Proof of concept

We currently run a demo setup of Rucio with a storage unit based on dCache. Uploads will be performed using Webdav. Technically, this is very close to the envisioned design of SDC.

In its simplest version, the one we will probably use in V 1.0 of SDC at the end of 2021, users will point any client to their certificate by setting environment variables appropriately. A user's key and certificate need to reside in (adequately protected) files in PEM format within the user's home directory: 

Code Block
export CLIENT_CERT=~/usercert.pem
export CLIENT_KEY=~/userkey.pem
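Assuming the Rucio client is configured for X.509 authentication (typically auth_type = x509 in the client's rucio.cfg; the exact configuration keys may differ between Rucio versions), a quick sanity check is to ask Rucio which account it maps you to:

Code Block
rucio whoami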

The user in Rucio needs a so-called identity corresponding to this certificate's Subject for the authentication mechanism X509: 

Code Block
[root@client tmp]# rucio-admin account list-identities root
Identity: tutorial, type: USERPASS
Identity: /C=DE/O=GridGermany/OU=Leibniz-Institut fuer Sonnenphysik (KIS)/OU=SDC/CN=Peter Caligari, type: X509
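For reference, such an X.509 identity could be attached to a Rucio account with the admin CLI roughly as follows; the account name and e-mail are hypothetical, and the exact subcommand and options may differ between Rucio versions:

Code Block
rucio-admin identity add --account pcaligari --type X509 \
    --id "/C=DE/O=GridGermany/OU=Leibniz-Institut fuer Sonnenphysik (KIS)/OU=SDC/CN=Peter Caligari" \
    --email cale@leibniz-kis.de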

On the storage units, this identity is then mapped to local user-IDs. For this proof of concept, this is done manually in a hard-coded file on the dCache node: 

Code Block
[root@dcache0 ~]# cat /etc/dcache/multi-mapfile 
"dn:/C=DE/O=GridGermany/OU=Leibniz-Institut fuer Sonnenphysik (KIS)/OU=SDC/CN=Peter Caligari" username:tester uid:1000 gid:1000,true

Now uploads by any means (Rucio, gfal-copy, the dCache native method, or even curl using WebDAV) will result in files with an owner and group ID of 1000. Files are world-readable by default! 

Enforcing an embargo is relatively straightforward (a rough sketch follows the list below): 

  1. upload the file to Rucio (but do not register it yet) 

  2. remove world-readability from the file

  3. associate it from the default group of the PI to the group representing the campaign

  4. register it with Rucio (now the file would pop up in searches and listings). 
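For illustration, the four steps could look roughly as follows against the proof-of-concept dCache/WebDAV endpoint. Host name, paths, and the campaign group are hypothetical, and whether mode and group changes are honoured over WebDAV depends on the dCache configuration:

Code Block
# 1. upload the file to the storage element (not yet known to Rucio)
gfal-copy /data/gris_20210301_001.fits \
    davs://dcache0.example.org:2880/sdc/gris/gris_20210301_001.fits

# 2. remove world-readability from the replica
gfal-chmod 640 davs://dcache0.example.org:2880/sdc/gris/gris_20210301_001.fits

# 3. move the file from the PI's default group to the campaign group;
#    in the proof of concept this is done server-side on the dCache node, e.g.
#    chgrp campaign_2021_03 /sdc/gris/gris_20210301_001.fits

# 4. register the existing replica with Rucio (e.g. via the Python client's
#    add_replicas call); only now does the file appear in searches and listings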

In a real-world setup, the mapping between dCache users and certificates will probably be done via an online database instead of a hard-coded file. This mechanism still needs to be developed. 

Info

Feedback welcome

We would welcome any comments on anything said above, especially on the assumptions behind embargoes, such as the infrequency of adding users to campaign groups, the inheritance of embargoes by higher-level data products, the lack thereof for quick-look data, and the like. Please submit any comments and suggestions, preferably before the end of April 2021. 

Project Status

SDC Project Status 02-2021

Petri Kehusmaa


Solution Analysis and Design Phase steps.

The SDC project team has been working hard to find the best possible hardware and software components to build a robust platform for the solar community. The project has now entered a phase where we are creating a detailed solution design, meaning that we have already identified many technical pieces that are going to be included in the final version of SDC. The team is now trying to find the best possible ways to integrate these different pieces. This means a lot of investigation into technical details and testing of different scenarios.

📋 Summary

Current project health: GREEN

Current project status: “Create Detailed Solution Design” phase in progress.

Project constraints: Resources and their availability; technology POCs taking more time than predicted.

At the core of everything at this point is RUCIO. It's going to be the most essential piece of software for SDC. RUCIO is going to take care of data transfers, data dissemination, data embargoes, data security, and a lot of automation at the same time. SDC's primary goal is to automate data transfers from OT to the SDC archive and to make sure that all necessary data policies are applied at the same time. RUCIO is also going to take care of the data lifecycle, meaning that the most relevant data is always kept available while the oldest data is archived to long-term storage.
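To give a flavour of how such placement and lifecycle policies are expressed in Rucio, a replication rule states how many copies of a dataset must exist on which storage elements. The DID, RSE names, and lifetime below are purely hypothetical:

Code Block
# keep two copies of a (hypothetical) campaign dataset: one on the near-line
# tier at KIS and one on long-term storage
rucio add-rule gris:campaign_2021_03 2 "SDC_TIER1|SDC_TAPE"

# keep one scratch copy for a year (lifetime in seconds); Rucio cleans it up afterwards
rucio add-rule --lifetime 31536000 gris:campaign_2021_03 1 SDC_SCRATCH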

The SDC team is also developing pipelines and analysis tools to allow solar scientists to get quick results out of our science-ready L1 data. These tools include GRIS inversions, BBI speckle image reconstruction, and GRIS data visualization.


SDC Technical Components (image by Carl Schaffer and Petri Kehusmaa).

 📊 Project status

Tip

Accomplishments

  • High-level solution design

  • Started collecting data policies

  • Clarified embargo policies

  • Listed essential use cases for SDC

 

Next steps

  • Continue selecting solution components and planning component integrations

 

Warning

Risks & project issues

  • Lack of resources

  • Resource availability

  • Multiple process implementations at the same time


Governance

👩‍⚖️ Policies, Frameworks & Governance

📜 Proposal for Data License in SDC

Peter Caligari

There is a fundamental difference between copyright (in the sense of ownership) and the right to use data. It's the latter that is clarified by a license specifying how third parties can use the data.

After an initial embargo period (envisioned: 1 year, or 2 years if Ph.D. students are involved), all data in SDC is expected to become freely available. Nevertheless, the first question that must be answered in this context is whether or not post-embargo data remains under copyright, held either by the original observers or by SDC. The choice of the licence to which post-embargo data is subject depends on the answer to this question (see below). 

Either way, post-embargo data should be put under a license (not to be confused with the license given to software tools and workflows developed in the framework of SDC). Basically, all treatises on this subject agree that it's a terrible idea to publish data without any license at all. Even data in the public domain should be subject to a licence whose sole purpose is to make that fact irrevocably clear. Similarly, most sources agree that one should not come up with a completely novel licence of one's own either: incompatibilities with different national legal norms would be almost inevitable in that case. Likewise, the combined use of the data offered with data from other sources would practically inevitably lead to licensing conflicts. 

The Creative Commons Licenses are standardized data licenses, compatible with most national standards, that can modularly be amended by attributes to prohibit specific data usage. The licences of CC relevant in the context of SDC are: 

  • CC0 1.0 Universal: data is completely public domain, no rights whatsoever are retained, no citation of the source is required upon its use, and data can be freely reused, modified and redistributed (even commercially). Suitable if no copyright is retained. 

  • CC BY 4.0 Attribution 4.0 International: requires giving credit to the source of the data upon reuse; reuse can further be restricted with the following attributes:

    • NC: non-commercial use, only

    • ND: no derivatives (data must stay as-is and cannot be redistributed in modified form) 

    • SA: share-alike (any data derivatives must also be redistributed under a similar license). 

These additions are cumulative, so a license stating exclusively non-commercial use and redistribution only if the derived data products are offered freely under the same license would be: 

CC BY-NC-SA 4.0

Any additional restriction might prevent combining data from different sources, though. Suppose we subjected our data to the SA building block; it could then only be combined with third-party data if the latter's license also allows publication under SA (see, e.g., the FAQs of CC).

If we allow commercial use, even re-licensing the data and products thereof would probably be possible (there are licenses outside CC, such as ODC-By, that explicitly exclude only re-licensing while still allowing commercial use; we did not look into those in detail, though). 

Discussions within SDC tend towards using plain CC0 1.0 or CC BY 4.0 for any non-embargoed data and, following general recommendations, declaring metadata completely public domain by using CC0 1.0. CC BY 4.0 is mainly used for publications and rarely for scientific data, though; the latter is primarily put under CC0 1.0, often due to a lack of copyright in the first place. If the sole intention of using CC BY 4.0 is to have users acknowledge the use of our data in their publications, CC0 1.0 is probably enough, as attribution can probably be encouraged by other means. 

SDC might require written (or implied) consent from the observers for their data to be published under the chosen licence after the embargo expires. The application form for observing proposals should be modified to include such consent.

It's worth noting that attributing a license of this type to a particular data set is a once-and-forever decision: once put under such a license, it cannot be put under a different license later if one changes one's mind! Therefore, this is not a decision to be made hastily but demands a consciously well-considered and balanced judgment, right from the start.

We would welcome your views and feedback on the above subjects and on the intention to use either plain CC0 1.0 or CC BY 4.0, especially if you do not agree with either! If you are fine with one of them, which one would you consider more appropriate for SDC? If you would like to comment, please send an email to 

sdc-board@leibniz-kis.de

We will consider any contributions until the end of April 2021. 

Links for further reading: 

http://creativecommons.org.au/content/licensing-flowchart.pdf

https://chooser-beta.creativecommons.org

https://creativecommons.org/licenses/

https://www.dfg.de/foerderung/info_wissenschaft/2014/info_wissenschaft_14_68/

https://openaccess.mpg.de/Berliner-Erklaerung

https://www.forschungsdaten.info/typo3temp/secure_downloads/72984/0/cda7fa0aa53a45b87c0f97d34c3c96ab7b1e7346/Leitfaden_-_Verantwortungsvoller_Umgang_mit_Forschungsdaten.pdf

For the positions of Leibniz, the EU, and others, see e.g.:

https://www.mdc-berlin.de/system/files/migrated_files/fiona/ag-oa_0.pdf

🇪🇸->🇩🇪 Data Transport from OT to KIS before Rucio

Peter Caligari

Background

As long as on-the-spot campaigns were possible, data was transported between OT and KIS on external hard drives, which also served as backup disks for this valuable data. The raw data was then copied to the central storage at KIS, mainly to process it and produce higher-level products. These products were then written back to external disks and tapes to free space for further data processing.

Similarly, OT's central storage is meant as temporary storage between the observation and the transport of the raw data to the observer's home institution. As of the beginning of 2021, it provides about 150 TB of usable disk space.  

Current Situation

Due to Covid-19, all observations at OT are done remotely only. Contrary to our expectations at the beginning of 2021, it currently looks more likely that the current lockdown restrictions will be tightened rather than relaxed. Coordinating the data transport of different campaigns over the network has therefore become a pressing issue. We decided to mirror /instruments at OT to the same directory at KIS. Whatever data is written to the former will be replicated on a best-effort basis to KIS. Data on the KIS side is read-only, retains the permissions and ownership of the original data at OT, and will remain even if the source data at OT is deleted. Copying data to other directories at KIS for further processing should not consume additional disk space, as identical data is really kept only once on the disks (the technology behind this is called deduplication).

Problems with this approach

While entirely automated (and as such very comfortable from a user's point of view), there are several drawbacks to this approach: 

  1. Any data is copied: excellent data just as well as rubbish. As described above, once transferred to KIS, even the rubbish cannot be deleted there again. 

  2. Some data is post-processed right at OT (after arriving on /instruments), and only the outcomes of this process are worth being transported. The procedure described above is prone to unnecessarily copying the unprocessed data to KIS and would not consider the processed data for transport (unless copied back to /instruments on Tenerife).

  3. All raw data is copied. Even data from partners or international campaigns that were not intended for KIS but would instead have been transported directly to their respective home institutions is copied.

  4. The procedure used for copying is intended for an entirely different use case. It's meant to maintain an off-site copy on a best-effort basis for disaster recovery. As such, it does not bother signalling the successful replication of individual files; what makes it to the replica before failure, made it, and about those files that did not, nothing can be done anyway. In our scenario, that means it becomes difficult to see when a file was replicated entirely (and can thus be deleted at OT to free up space there for buffering data from future observation). 

To cope with these problems, we might again (as already proposed in 2020) introduce an intermediate folder for copying data to KIS. We would then no longer automatically copy all data in /instruments, but only data copied to that folder. That would allow for post-processing on-site and have only the outcome of that process transferred to KIS. Data not intended for KIS, or data not worth keeping, would simply not be copied to that folder at all. However, this reduces the degree of automation, as an additional manual step is required. 

Once this intermediate directory is set up, we will announce the switch away from copying /instruments by mail. For the time being, one would still have to delete successfully replicated data by hand from the intermediate directory on Tenerife. 

In late 2020 we tried to automate the process of checksumming the original data on Tenerife, replicating it to KIS, and, upon success, removing the original on Tenerife. This, however, turned out to be quite tricky to implement, as it requires the synchronization of two totally independent processes at two sites with no shared access. Even though we failed then, we might look into that effort again. Should we succeed this time, we'll let you know.  
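A minimal sketch of such a checksum-verify-delete cycle is shown below, assuming passwordless ssh from the OT buffer host to a KIS host; host names, paths, and the use of rsync are assumptions, and the actual difficulty lies in running this reliably and unattended on both sides:

Code Block
FILE=gris_20210301_001.fits
SRC=/instruments/outbox/$FILE            # intermediate folder at OT
DST=kis-archive:/instruments/inbox/      # replica target at KIS

sum_ot=$(sha256sum "$SRC" | cut -d' ' -f1)     # checksum the original at OT
rsync -a "$SRC" "$DST"                         # replicate to KIS
sum_kis=$(ssh kis-archive sha256sum "/instruments/inbox/$FILE" | cut -d' ' -f1)

# delete the original at OT only if both checksums agree
if [ "$sum_ot" = "$sum_kis" ]; then
    rm -f "$SRC"
else
    echo "checksum mismatch for $FILE, keeping original at OT" >&2
fi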

Space requirements & costs

Tenerife's disk space can accommodate data from around 3 campaigns and is thus sufficient if data is deleted promptly.
We have looked at different solutions to store new data from campaigns in 2021 at KIS before SDC V1.0 is fully established. A simple expansion of the existing system is impossible due to a manufacturer's switch in technology (DELL/EMC). One would have to buy a completely new cluster (again with minimal redundancy and size), making this option rather expensive. The price per TB of usable disk space ranges from approximately 360 €/TB to 800 €/TB (including VAT).
We also looked into moving some dormant data (data not accessed for more than, let's say, one year) to the public cloud. Costs there, however, strongly depend on the access pattern; should, against the established access pattern of that data, the need arise to download big chunks of it, traffic costs become prohibitive. A traffic-independent flat rate amounts to about 5000 €/month for 300 TB of cloud storage, i.e. 200 €/TB per year; on top of that, license costs would need to be added to access this outsourced data seamlessly (those would not be required, however, if we used a local cloud appliance consisting of many slow disks; the third variant we looked at). Overall, the price per TB per year ranges from approximately 60 €/TB to 200 €/TB (including VAT), the upper end being the traffic-independent flat rate.
Please note that SDC will solve this problem: all raw data and most of the large static simulation outputs will go to SDC, freeing up enough space on the existing system for everyday work. So once SDC is up and running, there is no need to buy storage for this use case for the foreseeable future. Investing considerable amounts of money now, just to integrate smoothly with the existing system and without any future need, would be a waste.

Solution

We, therefore, decided to buy a first storage node of SDC and use it as temporary storage for new raw data and existing static data. SDC nodes consist of 19" two-socket servers with 24x16 TB disks, two SSDs for caching, and a separate disk mirror for the system. One tier at SDC will span several of these. We might well already use the technology envisioned for each node in SDC to present this temporary solution to the network. This would give us storage at a price of around 170€/TB (usable and including VAT).

Products & Tools

🛠 SDC Products & Tools

GRISView (working title)

Taras Yakobchuk


GRISView (working title) is an upcoming visualization and analysis tool to work with calibrated GREGOR/GRIS observational datasets. It is intended to facilitate easy data preview and present interactive tools for quick plotting, analysis and export. Tested features in the first release will include:

  • Advanced view, pan and zoom functions for map images and spectra

  • Both single map and time-series observations support

  • Multiple POI (point-of-interest) and ROI (rectangle-of-interest) to study spatial features

  • Interactive map isocurves generation and profile cuts

  • Distance measurements between map pixels in different units

  • Spectral line identification, markers and relative wavelength scale

  • Supports data format that is distributed by the SDC web archive

  • Written entirely in Python with GUI using Qt cross-platform framework

Nazaret Bello Gonzalez

SDC data archive

https://sdc.leibniz-kis.de/

Get access to data from GRIS/GREGOR and LARS/VTT instruments and the ChroTel full-disc telescope at OT.

Updates as of April 2021

  • Overview Calendar fields are now clickable and will link you to an overview of all observations performed on a given day

  • HTTPS has been implemented port numbers have been removed from URLs

Speckle reconstruction 

https://gitlab.leibniz-kis.de/sdc/speckle-cookbook

This tutorial helps the user run KISIP (Wöger & von der Lühe, 2008) on their favourite BBI and/or HiFI imaging data.
Contact: Vigeesh Gangadharan (vigeesh@leibniz-kis.de)

Coming soon: 

A Jupyter Notebook by Vigeesh Gangadharan to assist the user with VFISV inversions of GRIS data, including features such as wavelength calibration. Stay tuned!

Conferences & Workshops

📊 Conferences & Workshops

Nazaret Bello Gonzalez

Forthcoming Conferences/Workshops of Interest 2021

Every second Thursday, 12:30-13:30 CET

PUNCH Lunch Seminar (see SDC calendar invitation for zoom links)

  • 11 Feb 2021: PUNCH4NFDI and ESCAPE - towards data lakes

  • 25 Feb 2021: PUNCH Curriculum Workshop

April week 12-16 (3 days, TBD)

ESCAPE WP4 Technology Forum 

June 01-02 (16:00 - 17:30)

15th International dCache Workshop

June 10-11

13th International Workshop on Science Gateways |  IWSG 2021

Topics:

  • Architectures, frameworks and technologies for science gateways

  • Science gateways sustaining productive collaborative communities

  • Support for scalability and data-driven methods in science gateways

  • Improving the reproducibility of science in science gateways

  • Science gateway usability, portals, workflows and tools

  • Software engineering approaches for scientific work

  • Aspects of science gateways, such as security and stability

June 28, 2021:

Data-intensive radio astronomy: bringing astrophysics to the exabyte era

Topics: 

  • Data-intensive radio astronomy, current facilities and challenges

  • Data science and the exascale era: technical solutions within astronomy

  • Data science and the exascale era: applications and challenges outside astronomy

SDC participation in Conferences & Workshops

Nov. 26, 2020:

2nd SOLARNET Forum Meeting for Telescopes and Databases

Talk: Big Data Storage -- The KIS SDC case, Nazaret Bello Gonzalez, Petri Kehusmaa & Peter Caligari, 2nd SOLARNET Forum (Nov 26)

SDC Collaborations

🤲 SDC Collaborations

 Nazaret Bello Gonzalez

SOLARNET https://solarnet-project.eu

KIS coordinates the SOLARNET H2020 Project that brings together European solar research institutions and companies to provide access to the large European solar observatories, supercomputing power and data. KIS SDC is actively participating in WP5 and WP2 in coordinating and developing data curation and archiving tools in collaborations with European colleagues.
Contact on KIS SDC activities in SOLARNET: Nazaret Bello Gonzalez nbello@leibniz-kis.de

 ESCAPE https://projectescape.eu/

KIS is a member of the European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures (ESCAPE H2020, 2019 - 2022) Project, which aims to bring together people and services to build the European Open Science Cloud. KIS SDC participates in WP4 and WP5 to bring ground-based solar data into the broader Astronomical VO and to develop tools to handle large solar data sets. 

Contact on KIS SDC activities in ESCAPE: Nazaret Bello Gonzalez nbello@leibniz-kis.de

 

EST https://www.est-east.eu/

KIS is one of the European institutes strongly supporting the European Solar Telescope project. KIS SDC represents the EST data centre development activities in a number of international projects like ESCAPE and the Group of European Data Experts (GEDE-RDA).

Contact on KIS SDC as EST data centre representative: Nazaret Bello Gonzalez nbello@leibniz-kis.de

 

PUNCH4NFDI https://www.punch4nfdi.de

KIS is a participant (not a member) of the PUNCH4NFDI Consortium. PUNCH4NFDI is the NFDI (National Research Data Infrastructure) consortium of particle, astro-, astroparticle, hadron and nuclear physics, representing about 9,000 scientists with a Ph.D. in Germany, from universities, the Max Planck Society, the Leibniz Association, and the Helmholtz Association. PUNCH4NFDI aims to set up a federated and "FAIR" science data platform, offering the infrastructures and interfaces necessary for access to and use of data and computing resources of the involved communities and beyond. PUNCH4NFDI is currently competing with other consortia to be funded by the DFG (final response expected in spring 2021). KIS SDC aims to become a full member of PUNCH and to federate our efforts on ground-based solar data dissemination to the broad particle and astroparticle communities.

Contact on KIS SDC as PUNCH4NFDI participant: Nazaret Bello Gonzalez nbello@leibniz-kis.de & Peter Caligari mailto:cale@leibniz-kis.de 

 IT news

🖥 IT news

Peter Caligari

Ongoing & Future developments

Webpage

Status
colourYellow
titleKIS
The relaunch of our web page continues. We now have the design agency's first drafts for various page types. The page tree has been adapted to reflect the new departmental structure of the KIS.

I have put both the preliminary designs and the envisioned page tree into our cloud for anybody to inspect and comment on. Comments are really welcome. I will collect and consolidate them and, if feasible, try to include them in the next design round.

Network

Status of the dedicated 10 Gbit line between KIS & OT

Status
colourYellow
titleKIS
Status
colourPurple
titleOT
The 10 Gbit link between OT and KIS (see previous newsletter 01/21) is nearly finished. All intermediate networks are linked, and the line already reaches the VTT.

We still need to connect it to the switches at OT, though, and we need to buy some network equipment for the connection between the University of Freiburg and KIS. We expect the line to be functional within a few weeks. We'll keep you updated.

Test of (application) firewalls at KIS

Status
colourYellow
titleKIS
Status
colourPurple
titleOT
We now have an appliance of the new firewall for testing at KIS. Physically, it sits in front of our current firewall towards the internet. It does hardly any active filtering; it mostly just monitors our traffic and alerts us to any problems. This setup allows us (IT) to get used to this kind of machine.

We might not buy exactly this machine (the test device is a mid-sized appliance from Palo Alto Networks); this is just a first test of this kind of router/firewall for IT at KIS.

We expect the test to finish in the next few weeks so that we can decide which machines we would like for OT and KIS and how and when to buy them.

Storage

Status
colourYellow
titleKIS
Status
colourRed
titleSDC
We are currently in the process of ordering the first storage node for SDC. This node consists of a DELL R740XD2 (2 CPUs, 24x16 TB disks, 10 Gbit Ethernet). The price (including VAT) per usable TB storage is of the order of 160€/TB.

We will use this machine as a test bed for the technology envisioned for SDC; all raw data from observations in 2021, as well as the large simulation outputs accumulating on mars, will already be stored there.

SDC will consist of at least 4 similar nodes; this is the first one. As soon as the remaining hosts are set up, we will move any data still on this first host to the new SDC cluster and then join this host to the cluster as well.

Status
colourRed
titleSDC
In parallel, we are looking into outsourcing rarely accessed files to the public cloud. Within the framework of SDC, it is planned to use the public cloud mainly to flexibly cover short-term peaks in demand. 

The costs per TB of storage space in the cloud are strongly dependent on capacity and, above all, the access pattern. They vary between approx. 60-200 €/TB/a. Access-independent models, in which only a fixed fee is charged per stored GB, but no fees for downloading or uploading, are at the upper end of this scale. At the lower end are public providers such as Amazon, Google and Microsoft, which charge a relatively high fee for each type of data access in addition to the (relatively cheap) price of simple storage.

Additionally, licence fees of a similar magnitude are required for the software that moves files between the cloud and the local storage at KIS. 

We are currently obtaining concrete offers to outsource 100 TB for 1 year to a public cloud. The pricing models are so complicated that we can determine the resulting costs only through a limited real-world test. 

We will intentionally design the integration so that it will become apparent to all users which files are in the cloud and which are not. Although this is cumbersome (and artificially induced), we deem this awareness essential (at least initially, where we have no experience of the potential costs involved). The exact model is still to be worked out, and we will inform you about it again in due course. 

Status
colourPurple
titleOT
The two new nodes for jane have arrived at OT. The installation will be done as soon as either Peter Caligari can travel there or we can get a DELL technician up to the telescopes. Due to Covid-19, the time scale for this installation remains unclear. We will keep you informed.

📜Aperio

Traditionally, all data processing in solar physics is done on files. While this option will prevail in SDC, it is not the best way to deal with large data sets, where computations need to be done where the data resides and not vice versa. To interface with data in SDC programmatically, APIs are needed for the most common programming languages like Python and IDL. 

We are pleased that we were able to win Aperio Software to develop a Python API for SDC. Aperio Software is heavily involved in the development of SunPy and Astropy, community efforts to develop Python packages for solar physics and astronomy. Drew Leonard, one of the founders of Aperio, developed the prototype for the VTF pipeline. The contract is divided into a design phase and an implementation phase. During the former, Drew will clarify what is expected of the future API and what requirements it must meet through workshops and one-on-one meetings. The first version is expected in mid-2022. 

Project Status


SDC Project Status 03-2021 (06.07.2021)


Petri Kehusmaa

Solution Development and Integration Phases.

Solution Development and Integration

The project has now shifted into a phase where we are building the actual SDC platform and creating/acquiring all necessary components. These components include in-house developed software for instrument pipelines and analysis; compute, network and storage hardware; middleware (RUCIO, Kubernetes, Docker, etc.); and governance/management/documentation software like Jira Service Management and Confluence.

There is still some work to be done to find all suitable solution components and thus shape the final scope of SDC. We aim to build SDC as a service platform for the solar community with a continuous focus on users and platform development.

📋 Summary

Current project health: YELLOW

Current project status: Finalizing some tasks for solution design and creating solution components. Governance model not finalized and implementation not started yet.

Project constraints: Resources and their availability; technology POCs taking more time than predicted.

 📊 Project status

Tip

Accomplishments

  • High-level solution design

  • Some software components created (GRIS Viewer)

  • The hardware acquisition process started

  • RUCIO test environment established

 

Next steps

  • Continue selecting solution components and creating solution components

 

Warning

Risks & project issues

  • Lack of resources

  • Resource availability

  • Multiple process implementations at the same time

  • No agreed governance model

Governance


👩‍⚖️ Policies, Frameworks & Governance


  • ITIL v4 process model going to be partially adopted for service management purposes

  • Data policies definition started

  • SDC governance model and scope to be decided

Products & Tools


🛠 SDC Products & Tools


Standardized GRIS Pipeline

Carl Schaffer (Unlicensed)

The GRIS reduction pipeline has been merged into a common version in collaboration with M. Collados (IAC, GRIS PI). The versions running at OT and in Freiburg now both produce data that is compatible with downstream SDC tools. The latest version of the pipeline can always be found on the KIS GitLab server. The current OT version will be synced to the ulises branch and merged into the main production branch periodically.

SDC GRIS VFISV-Inversion pipeline

Vigeesh Gangadharan

A pipeline code for performing Milne-Eddington inversions of GRIS spectropolarimetric data is now available at:

https://gitlab.leibniz-kis.de/sdc/grisinv

The pipeline uses the Very Fast Inversion of the Stokes Vector code (VFISV, Borrero et al. 2011), v5.0 (node for spectrograph data), as the main backend to carry out a Milne-Eddington Stokes inversion for individual spectral lines.
The current implementation of the pipeline is a Python MPI wrapper around the VFISV code that makes it easy to work with GRIS data. The inversion for the desired spectral line is performed by VFISV, and the buffer with the inversion results is passed back to the Python module. The Python module propagates the keywords from level 1 (L1), packages the inversion results, and outputs a FITS file (when used as a command-line interface) or returns an NDarray (when called within a Python script).

For more information on installing and using the pipeline, check the above GitLab repository.

Please report any issues with the code using the link below,

https://gitlab.leibniz-kis.de/sdc/grisinv/-/issues/new?issue

SDC data archive

https://sdc.leibniz-kis.de/

Get access to data from GRIS/GREGOR and LARS/VTT instruments and the ChroTel full-disc telescope at OT.

Updates as of July 2021

  • The detail pages for observations have been reworked; see an example here:

    • Added dynamic carousel of preview data products

    • Added flexible selection for downloading associated data

  • VFISV inversion results have been added for most of the GRIS observations. The website now includes information on line of sight velocity and magnetic field strength

  • The development process has been streamlined:

    • automated test deployments for quicker iterations and fixes

    • Changes to the UI will occur in regular sprints. We’re currently collecting ideas here

  • Added historic ChroTel data for 2013, thanks to Andrea Diercke from AIP for contacting us and providing us with this supplemental archive.

GRISView

Taras Yakobchuk

GRISView is a new visualization and analysis tool for working with calibrated GRIS/GREGOR datasets as distributed by the SDC website. It is written in Python with a GUI built on the Qt cross-platform framework.


Currently implemented features include:

  • Quick panning and zooming of map images and spectra using mouse

  • Multiple POI (point-of-interest) and ROI (rectangle-of-interest) for easy inspection of spectral changes across the map

  • Distance measurement between multiple map points given in different units

  • Intensity profile plots along a given line segment, with the option to link several profiles, e.g. for checking radial profiles

  • Interactive color bars used to view histogram, adjust image contrast, select and modify the viewing color scheme

  • Generating contours for map images, easy levels adjustment, and color setting

  • Browsing spectra with cursor moving using keyboard and mouse shortcuts, quick navigation using marker list

  • Relative scale for quick evaluation of wavelength differences at the cursor position

  • Viewing observation FITS file headers

  • Support for both individual observations and time-series

Next, it is planned to add the following:

  • Exporting current spectra and map plots as images and data files

  • Visualization of derived quantities, e.g. Q/I, V/I, DOLP (degree of linear polarization), etc.

  • Various normalizations of spectra e.g. to a selected signal level, local continuum, quiet Sun

  • Spectral line fitting and line parameters determination

  • Saving and restoring working sessions

Info

Feedback welcome

We strongly encourage all colleagues to try out this new tool and provide feedback. Instructions for installing and using the program can be found on the tool's GitLab page:

https://gitlab.leibniz-kis.de/sdc/gris/grisview

Please report any issues and bugs on the program GitLab page or using the direct link:

https://gitlab.leibniz-kis.de/sdc/gris/grisview/-/issues/new?issue

Conferences & Workshops


📊 Conferences & Workshops


Forthcoming Conferences/Workshops of Interest 2021

Every second Thursday, 12:30-13:30 CET (currently on summer break)

PUNCH Lunch Seminar (see SDC calendar invitation for zoom links)

KIS internal Typo3 Editors' training

July 13 & 14, 2021, 10:00 - 12:00 CEST registration needed!

SDC Collaborations


🤲 SDC Collaborations


 Nazaret Bello Gonzalez

SOLARNET https://solarnet-project.eu

KIS coordinates the SOLARNET H2020 Project that brings together European solar research institutions and companies to provide access to the large European solar observatories, supercomputing power and data. KIS SDC is actively participating in WP5 and WP2 in coordinating and developing data curation and archiving tools in collaborations with European colleagues.
Contact on KIS SDC activities in SOLARNET: Nazaret Bello Gonzalez nbello@leibniz-kis.de

 ESCAPE https://projectescape.eu/

KIS is a member of the European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures (ESCAPE H2020, 2019 - 2022) Project, which aims to bring together people and services to build the European Open Science Cloud. KIS SDC participates in WP4 and WP5 to bring ground-based solar data into the broader Astronomical VO and to develop tools to handle large solar data sets. 

Contact on KIS SDC activities in ESCAPE: Nazaret Bello Gonzalez nbello@leibniz-kis.de

 

EST https://www.est-east.eu/

KIS is one of the European institutes strongly supporting the European Solar Telescope project. KIS SDC represents the EST data centre development activities in a number of international projects like ESCAPE and the Group of European Data Experts (GEDE-RDA).

Contact on KIS SDC as EST data centre representative: Nazaret Bello Gonzalez nbello@leibniz-kis.de

 

PUNCH4NFDI https://www.punch4nfdi.de

KIS is a participant (not a member) of the PUNCH4NFDI Consortium. PUNCH4NFDI is the NFDI (National Research Data Infrastructure) consortium of particle, astro-, astroparticle, hadron, and nuclear physics, representing about 9,000 scientists with a Ph.D. in Germany, from universities, the Max Planck Society, the Leibniz Association, and the Helmholtz Association. PUNCH4NFDI aims to set up a federated and "FAIR" science data platform, offering the infrastructures and interfaces necessary for access to and use of data and computing resources of the involved communities and beyond. PUNCH4NFDI has been granted funding and will officially start its activities on October 1, 2021. KIS SDC aims to become a full member of PUNCH and to federate our efforts on ground-based solar data dissemination to the broad particle and astroparticle communities.

Contact on KIS SDC as PUNCH4NFDI participant: Nazaret Bello Gonzalez nbello@leibniz-kis.de & Peter Caligari mailto:cale@leibniz-kis.de 

 IT news


🖥 IT news


Peter Caligari

Ongoing & Future developments

Webpage

Status
colourYellow
titleKIS
The design of the new website is essentially complete. We are currently making some final technical adjustments to the webserver and Typo3. The website is already running on the deployment (VMware) server at KIS and is publicly available at:

https://newwww.leibniz-kis.de

After the content has been moved, the server will take over the address http://www.leibniz-kis.de, and the old site will be shut down.

One of the reasons for the relaunch was to improve support for the particular browsers used by people with disabilities. This requires specific fields in the back end to be filled in so that the page content can be appropriately classified. We will hold a training course on handling the Typo3 back end in general, with a focus on the above points, on 

July 13 & 14, 2021, 10:00 CEST (Editors' training)

We currently plan to avoid any user login in the front end. This would allow us to not use cookies at all, rendering the annoying GDPR popups obsolete. However, it also means we might not have any restricted areas on the website at all (including an Intranet)! This is a radical approach, and we might not be able to follow through with it stringently (see below). In that case, the Intranet on the website will be limited to purely informational pages; any documents now downloadable from the old website should be migrated to the cloud (wolke7). In any case, Typo3 allows hosting multiple websites under a single installation, sharing the basic design and resources. Therefore, any websites requiring user registration and login (like the Intranet or a possible OT webpage) might be built as separate websites, keeping the publicly accessible website login-free. 

Network

Status of the dedicated 10 Gbit line between KIS & OT

Status
colourYellow
titleKIS
Status
colourPurple
titleOT
The missing network equipment for the end at KIS will be installed in the second week of July. We will then try to establish the link remotely from Freiburg with the help of personnel at the telescopes.

Test of (application) firewalls at KIS

Status
colourYellow
titleKIS
Status
colourPurple
titleOT
Firewall testing at KIS (see https://leibniz-kis.atlassian.net/l/c/rF8kmXjv ) has terminated. Two manufacturers are still being considered, and a final choice will be made as soon as possible.

We (IT) still very much advocate high-availability setups for both KIS and OT: for KIS (in Freiburg) because it will host a significant part of SDC, and for OT because there is no trained personnel on-site and replacements take time to reach the Canary Islands.

Storage

Status
colourYellow
titleKIS
Status
colourRed
titleSDC
We are currently setting up one DELL R740XD2 as a (fake) dCache cluster running two (redundant) dCache pools on VMware. This host serves as a testbed to simulate hardware and network failures in the dCache cluster to come, while providing a (hopefully) failure-tolerant net capacity of about 100 TB to KIS, alleviating the currently pressing storage shortage.

Starting in July, six more comparable hosts will be purchased through a public tender. These will have a similar setup and form storage Tier1 (near-line) of SDC at KIS. We expect the hosts to arrive in late September.

We use ZFS on virtualized Debian servers as the basis for the individual dCache nodes. ZFS uses copy-on-write, checksums all blocks on disk, and provides self-healing. Zpools will most probably use RAIDZ or RAIDZ2, and any file will reside on at least 2 different servers. At the time of writing, the only other file system offering similar features is BTRFS, but support for BTRFS was recently pulled from some major distributions (e.g. CentOS, the distribution that has mainly been used at KIS so far).
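As an illustration of the intended layout, a RAIDZ2 pool on one node could be created roughly as follows; device, pool, and dataset names are hypothetical, and the final geometry is still being tested:

Code Block
# two RAIDZ2 vdevs of 12 disks each (24 x 16 TB data disks per node)
zpool create -o ashift=12 sdc01 \
    raidz2 sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm \
    raidz2 sdn sdo sdp sdq sdr sds sdt sdu sdv sdw sdx sdy

# dataset backing one dCache pool; ZFS checksums every block by default and
# repairs corrupted blocks from parity on read or during a scrub
zfs create -o compression=lz4 -o atime=off sdc01/dcache-pool1
zpool status sdc01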

Status
colourRed
titleSDC
The 100 TB space on Microsoft Azure for cold data still needs some configuration. As of today, the third-party software responsible for moving files between our Isilon storage cluster (mars) and the cloud has problems doing so from Linux clients in a satisfactory way. The manufacturer of the software is working on the issue.

Current Resources

Compute nodes

hostname | location | # of CPUs & total cores | RAM [GB]
patty, legs & louie | KIS (installed but not publicly available yet, nearly there…) | 2 x AMD EPYC 7742, 128 cores | 1024
itchy & selma | KIS | 4 x Xeon(R) CPU E5-4657L v2 @ 2.40GHz, 48 cores | 512
scratchy | KIS |  | 
quake & halo | KIS/seismo |  | 
hathi | OT | 4 x Intel(R) Xeon(R) CPU E5-4650L @ 2.60GHz, 32 cores | 512

Central storage space

Total available disk space for /home (KIS, OT), /dat (KIS, OT), /archive (KIS), and /instruments (OT):

name | location | total [TB, brutto] | free [TB, brutto]
mars | KIS | 758 | 39
quake | KIS/seismo | 61 | 0
halo | KIS/seismo | 145 | 44.5
jane | OT | 130 (-> 198) | 23

