It's a pleasure to present our second newsletter. We aim to publish roughly once a month, and at least every two months, balancing being informative against being too chatty.


Apart from the regular project progress and IT news, there are several chapters on policies that will affect how observations will be done and what is required to access data in the future. There's also a section on licensing issues of data no longer embargoed.


Taras Yakobchuk introduces the new tool he is developing for visualizing and analyzing calibrated GRIS/GREGOR data. The tool is not only intended to help experts analyze data offered by SDC, but should also give access to laypersons who are not experts in dealing with this type of data.


We would like to encourage you to comment openly on any part. Feedback is always welcome and helps us to deliver a better product.


Editorial


📰 Editorial


🔒 Embargoes: Definitions, Planned Realization, and Proof of Concept

Peter Caligari

Embargoes restrict access to observational data and derived higher data products for a certain period after the observation campaign. All data in SDC is planned to be subject to such an embargo. The envisioned period for data where no Ph.D. student is involved (neither in the observation nor in the later evaluation of the data) is 1 year; if there are Ph.D. students involved, the period is prolonged to 2 years. 
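
The envisioned rule can be written down in a few lines. The following is only a minimal sketch: the function name is hypothetical, and it assumes that the embargo is counted from the last day of the campaign, which the text above does not fix precisely.

```python
from datetime import date


def embargo_end(campaign_end: date, phd_involved: bool) -> date:
    """Return the first day on which the data is no longer embargoed.

    Assumption: the period starts at the end of the campaign.
    """
    years = 2 if phd_involved else 1
    try:
        return campaign_end.replace(year=campaign_end.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year: use 1 March.
        return campaign_end.replace(year=campaign_end.year + years, month=3, day=1)


print(embargo_end(date(2021, 4, 30), phd_involved=False))
print(embargo_end(date(2021, 4, 30), phd_involved=True))
```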

From a technical standpoint, raw data (and derived data products) belong to a particular group of people. Initially, this group consists of the PI of the observation and all participating COIs. We assume that this group is relatively static during the embargo, and that people are added to or removed from it only very rarely. Users are not meant to make such changes themselves; they need to contact an admin instead. 

A campaign is a particular group of instruments used during a specific time interval. Any specific instrument can only belong to one campaign at a time. All data gained with that instrument during that time belongs to the associated group of observers specified in the observing proposal. A posteriori, it's sufficient to know when and with what instrument the original data was taken to associate it with a campaign. Naturally, campaigns on the same instrument do not overlap, but each observer (PI or COI) might be part of multiple campaigns (even simultaneous ones if different instruments are involved). 
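
The lookup "instrument plus observation time gives campaign" could look roughly as follows. This is an illustrative sketch only: the `Campaign` class, its fields, and the demo entries are assumptions made for the example and do not describe the database under development.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Campaign:
    name: str
    instruments: FrozenSet[str]
    start: datetime
    end: datetime
    observers: FrozenSet[str]  # PI and COIs


def find_campaign(campaigns: Iterable[Campaign], instrument: str,
                  taken_at: datetime) -> Optional[Campaign]:
    """Return the campaign owning data from `instrument` at `taken_at`, if any."""
    for campaign in campaigns:
        if instrument in campaign.instruments and campaign.start <= taken_at < campaign.end:
            return campaign
    return None  # no campaign: the data cannot be embargoed


# Hypothetical demo data: two simultaneous campaigns on different instruments.
campaigns = [
    Campaign("demo-a", frozenset({"GRIS"}), datetime(2021, 5, 1),
             datetime(2021, 5, 8), frozenset({"pi_a", "coi_a"})),
    Campaign("demo-b", frozenset({"LARS"}), datetime(2021, 5, 1),
             datetime(2021, 5, 8), frozenset({"pi_b"})),
]
```

Because an instrument belongs to at most one campaign at a time, the first match is the only match.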

To keep track of campaigns, we are currently developing an online database that will provide a web interface so that any observer can register with this database before observation. This website's functionality is not yet completely defined, but it might well replace the entire current process of submitting observation requests by PDF and mail.

Authentication on the website is done via certificates, which are a prerequisite for registering, submitting observation requests and accessing data. The certificates will then be mapped to Linux users, which are in turn sorted into the ephemeral Linux groups used to restrict access to embargoed data (one group per campaign). 

Any data that does not belong to a campaign cannot be subject to an embargo and becomes freely available from the start!

Embargoes on derived higher level products

Enforcing the embargo on raw data is relatively straightforward; this is not the case for higher-level data products. Nevertheless, derived data products should be subject to the same embargo as the original data. How this will be done is not yet completely clear. Users will probably be able to put data back into their personal scope (a scope is a namespace used to distinguish files with the same name). Each entity (an instrument, telescope, user, camera, etc.) will have its own scope. By default, personal scopes are world-readable temporary storage within Rucio (see below). This is a global setting (the same for all users) and can only be changed by admins. We might need to remove world access from those scopes to guarantee embargoes. That, however, would mean that not even other members of the campaign (COIs) would have access to derived data products during the embargo period. Should such access be required, we might end up with one scope per campaign. We would appreciate feedback on the necessity of this feature. 

Data products of general interest to a greater public (data that will remain in SDC and become openly available after the embargo) will probably need to be reviewed anyway and be put back by some kind of privileged embargo-aware procedure. 

Visibility of embargoed data

We use Rucio to distribute, manage and access data in SDC. A side effect of that decision will be that even embargoed data is listable and will show up as a result of matching data searches. The data will, however, not be downloadable while embargoed.

Third-party users (not part of the original campaign) need to contact the PI to get access. The latter would then have this new user added to the campaign by an administrator or provide the desired data by other means than direct access via Rucio.
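
The intended behaviour (everything is listable, but only campaign members can download during the embargo) can be summarized in a small sketch. The function names and arguments are hypothetical; in the real system this would be enforced through group permissions on the storage, not through application code like this.

```python
from datetime import date
from typing import AbstractSet


def can_list(user: str) -> bool:
    """Embargoed data still shows up in listings and searches for everyone."""
    return True


def can_download(user: str, campaign_members: AbstractSet[str],
                 embargo_end: date, today: date) -> bool:
    """Downloads are open to all after the embargo, to members before."""
    if today >= embargo_end:
        return True
    return user in campaign_members


members = {"pi_a", "coi_a"}  # hypothetical campaign group
print(can_download("guest", members, date(2022, 5, 8), date(2021, 6, 1)))
```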

Even embargoed data should provide some context data, such as quick looks, to make searches useful. It needs to be discussed whether such low-quality data, which is not scientifically exploitable, can be exempt from the original embargo.

Proof of concept

We currently run a demo setup of Rucio with a storage unit based on dCache. Uploads will be performed using WebDAV. Technically, this is very close to the envisioned design of SDC.

In its simplest version, the one we will probably use in V 1.0 of SDC at the end of 2021, users will point any client to their certificate by setting environment variables appropriately. A user's key and certificate need to reside in (adequately protected) files in PEM format within the user's home directory: 

Code Block
export CLIENT_CERT=~/usercert.pem
export CLIENT_KEY=~/userkey.pem

The user in Rucio needs a so-called identity corresponding to this certificate's Subject for the authentication mechanism X509: 

Code Block
[root@client tmp]# rucio-admin account list-identities root
Identity: tutorial, type: USERPASS
Identity: /C=DE/O=GridGermany/OU=Leibniz-Institut fuer Sonnenphysik (KIS)/OU=SDC/CN=Peter Caligari, type: X509

On the storage units, this identity is then mapped to local user-IDs. For this proof of concept, this is done manually in a hard-coded file on the dCache node: 

Code Block
[root@dcache0 ~]# cat /etc/dcache/multi-mapfile 
"dn:/C=DE/O=GridGermany/OU=Leibniz-Institut fuer Sonnenphysik (KIS)/OU=SDC/CN=Peter Caligari" username:tester uid:1000 gid:1000,true

Now uploads by any means (Rucio, gfal-copy, the dCache native method, or even curl using WebDAV) will result in files with an owner and group ID of 1000. By default, files are world-readable! 
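
To make the structure of the mapping line shown above explicit, here is a small parser sketch. It is not part of dCache or Rucio; it only handles lines of exactly the shape used in this proof of concept, and reading the `,true` suffix as a primary-group flag is our interpretation.

```python
import re

# A quoted "dn:..." predicate followed by space-separated key:value tokens.
LINE = re.compile(r'^"dn:(?P<dn>[^"]+)"\s+(?P<rest>.+)$')


def parse_mapping(line: str) -> dict:
    """Split one mapping line into DN, username, uid and gid."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError("unsupported mapping line: " + line)
    entry = {"dn": match.group("dn")}
    for token in match.group("rest").split():
        key, _, value = token.partition(":")
        if key == "gid":
            gid, _, primary = value.partition(",")
            entry["gid"] = int(gid)
            entry["primary_gid"] = primary == "true"  # assumption, see above
        elif key == "uid":
            entry["uid"] = int(value)
        else:
            entry[key] = value
    return entry


EXAMPLE = ('"dn:/C=DE/O=GridGermany/OU=Leibniz-Institut fuer Sonnenphysik (KIS)'
           '/OU=SDC/CN=Peter Caligari" username:tester uid:1000 gid:1000,true')
entry = parse_mapping(EXAMPLE)
print(entry["username"], entry["uid"], entry["gid"])
```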

Enforcing an embargo is relatively straightforward: 

  1. upload the file to Rucio (but do not register it yet) 

  2. remove world-readability from the file

  3. reassign it from the default group of the PI to the group representing the campaign

  4. register it with Rucio (now the file would pop up in searches and listings). 
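
Steps 2 and 3 of the list above can be illustrated on an ordinary POSIX file system. This is a sketch under assumptions: the function name and the `campaign_gid` argument are hypothetical, and on the real dCache storage the same change would go through dCache's own interfaces. Steps 1 and 4 (upload and registration) are deliberately left out.

```python
import os
import stat


def apply_embargo(path: str, campaign_gid: int) -> None:
    """Hide a file from the world and hand it to the campaign group."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Step 2: drop read, write and execute permission for "others".
    os.chmod(path, mode & ~stat.S_IRWXO)
    # Step 3: keep the owner (-1), switch the group to the campaign group.
    os.chown(path, -1, campaign_gid)
```

A file with mode 644 ends up as 640 and is owned by the campaign group, so only the owner and the members of that group can still read it.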

In a real-world setup, the mapping between dCache users and certificates will probably be done via an online database instead of a hard-coded file. This mechanism still needs to be developed. 

Feedback welcome

We would welcome comments on anything said above, especially on the assumptions made about embargoes, such as the infrequency of adding users to campaign groups, the inheritance of embargoes by higher-level data products, and the lack thereof for quick-look data. Please submit any comments and suggestions, preferably before the end of April 2021. 

Project Status

SDC Project Status 02-2021

Petri Kehusmaa

Solution Analysis and Design Phase steps.

The SDC project team has been working hard to find the best possible hardware and software components to build a robust platform for the solar community. The project has now entered a phase in which we are creating a detailed solution design, meaning that we have already identified many technical pieces that are going to be included in the final version of SDC. The team is now trying to find the best possible ways to integrate these pieces. This means a lot of investigation into technical details and testing of different scenarios.

📋 Summary

  • Current project health: GREEN

  • Current project status: “Create Detailed Solution Design” phase in progress.

  • Project constraints: Resources and their availability. Technology POCs taking more time than predicted.

At the core of everything at this point is Rucio. It is going to be the most essential piece of software for SDC. Rucio will take care of data transfers, data dissemination, data embargoes, data security and a lot of automation. SDC’s primary goal is to automate data transfers from OT to the SDC archive and to make sure that all necessary data policies are applied along the way. Rucio will also take care of the data lifecycle, meaning that the most relevant data stays readily available while the oldest data is archived to long-term storage.

The SDC team is also developing pipeline and analytic tools to allow solar scientists to get quick results from our science-ready L1 data. These tools include GRIS inversions, BBI speckle image reconstruction and GRIS data visualization.

SDC Technical Components (image by Carl Schaffer and Petri Kehusmaa).

 📊 Project status

Accomplishments

  • High-level solution design

  • Started collecting data policies

  • Clarified embargo policies

  • Listed essential use cases for SDC

 

Next steps

  • Continue selecting solution components and planning component integrations

 

Risks & project issues

  • Lack of resources

  • Resource availability

  • Multiple process implementations at the same time

Governance

👩‍⚖️ Policies, Frameworks & Governance

📜 Proposal for Data License in SDC

Peter Caligari

There is a fundamental difference between copyright (in the sense of ownership) and the right to use data. It's the latter that is clarified by a license specifying how third parties can use the data.

After an initial embargo period (envisioned: 1 year, 2 years if Ph.D. students are involved), all data in SDC is expected to become freely available. Nevertheless, the first question that must be answered in this context is whether copyright in post-embargo data is retained by either the original observers or SDC. The choice of license for post-embargo data depends on the answer to this question (see below). 

Either way, post-embargo data should be put under a license (not to be confused with the license given to software tools and workflows developed in the framework of SDC). Basically, all treatises on this subject agree that it's a terrible idea to publish data without any license at all. Even data in the public domain should be subject to a license whose sole purpose is to make that fact irrevocably clear. Similarly, most sources agree that one should not come up with a completely novel license of one's own either: incompatibilities with different national legal norms would be almost inevitable, and combining the data with data from other sources would practically always lead to licensing conflicts. 

The Creative Commons (CC) licenses are standardized licenses, compatible with most national legal systems, that can be modularly amended by attributes to prohibit specific uses of the data. The CC licenses relevant in the context of SDC are: 

  • CC0 1.0 Universal: data is completely public domain, no rights whatsoever are retained, no citation of the source is required upon its use, and data can be freely reused, modified and redistributed (even commercially). Suitable if no copyright is retained. 

  • CC BY 4.0 (Attribution 4.0 International): requires giving credit to the source of the data upon reuse. It can be amended with further restrictions:

    • NC: non-commercial use, only

    • ND: no derivatives (data must stay as-is and cannot be redistributed in modified form) 

    • SA: share-alike (any data derivatives must also be redistributed under a similar license). 

These additions are cumulative, so a license stating exclusively non-commercial use and redistribution only if the derived data products are offered freely under the same license would be: 

CC BY-NC-SA 4.0
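
How the building blocks combine into a license name can be sketched as follows. The helper is purely illustrative (its name and signature are made up for this example); the only rule it encodes beyond the text above is that ND and SA exclude each other, since share-alike only makes sense when derivatives are allowed.

```python
def cc_license(nc: bool = False, nd: bool = False, sa: bool = False,
               version: str = "4.0") -> str:
    """Build the short name of a CC BY license from its optional modules."""
    if nd and sa:
        raise ValueError("ND and SA cannot be combined")
    parts = ["BY"]
    if nc:
        parts.append("NC")
    if nd:
        parts.append("ND")
    if sa:
        parts.append("SA")
    return f"CC {'-'.join(parts)} {version}"


print(cc_license(nc=True, sa=True))
```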

Any additional restriction might prevent combining data from different sources, though. Suppose we subjected our data to the SA building block. You could then only combine it with third-party data if the latter's license also allowed publication under SA (see e.g. the CC FAQ).

If we allow commercial use, even re-licensing the data and products thereof might be possible (there are licenses outside CC, such as ODC-By, that explicitly exclude re-licensing while still allowing commercial use; we did not look into those in detail, though). 

Discussions within SDC tend towards using plain CC0 1.0 or CC BY 4.0 for any non-embargoed data and, following general recommendations, declaring metadata completely public domain by using CC0 1.0. CC BY 4.0 is mainly used for publications and rarely for scientific data, though. The latter is primarily put under CC0 1.0 due to a lack of copyright in the first place. If the sole intention of using CC BY 4.0 is to have users acknowledge the use of our data in their publications, CC0 1.0 might be enough, as attribution could probably be enforced by other means. 

SDC might require written (or implied) consent from the observer for their data to be published under the chosen license after the embargo expires. The application form for observing proposals should be modified to include such consent.

It's worth noting that attributing a license of this type to a particular data set is a once-and-forever decision: once put under such a license, it cannot be put under a different license later if one changes one's mind! Therefore, this is not a decision to be made hastily but demands a consciously well-considered and balanced judgment, right from the start.

We would welcome your view and feedback on the above subjects and on the intention to use either plain CC0 1.0 or CC BY 4.0. Especially if you do not agree with either! If you are fine with one of them, which one would you see more appropriate for SDC? If you would like to comment, please send a mail to 

sdc-board@leibniz-kis.de

We will consider any contributions until the end of April 2021. 

Links for further reading: 

http://creativecommons.org.au/content/licensing-flowchart.pdf

https://chooser-beta.creativecommons.org

https://creativecommons.org/licenses/

https://www.dfg.de/foerderung/info_wissenschaft/2014/info_wissenschaft_14_68/

https://openaccess.mpg.de/Berliner-Erklaerung

https://www.forschungsdaten.info/typo3temp/secure_downloads/72984/0/cda7fa0aa53a45b87c0f97d34c3c96ab7b1e7346/Leitfaden_-_Verantwortungsvoller_Umgang_mit_Forschungsdaten.pdf

For the positions of Leibniz, EU, and others, see e.g.:

https://www.mdc-berlin.de/system/files/migrated_files/fiona/ag-oa_0.pdf

🇪🇸->🇩🇪 Data Transport from OT to KIS before Rucio

Peter Caligari

Background

As long as on-the-spot campaigns were possible, data between OT and KIS was transported using external hard-drives. The latter also served as backup-disks for this valuable data. This raw data was then copied to the central storage at KIS, mainly used to process it and produce higher-level products. The latter were then written back to external disks and tapes again to free space for further data processing.

Similarly, OT's central storage is meant as temporary storage between observation and transport of the raw data to the observer's home institution. As of the beginning of 2021, we will have about 150 TB (usable) disk space.  

Current Situation

Due to Covid-19, all observations at OT are done remotely only. Contrary to our expectations at the beginning of 2021, it currently looks more likely that the current lockdown restrictions will be tightened rather than relaxed. Coordinating the data transport of different campaigns over the network thus becomes a pressing issue. We therefore decided to mirror /instruments at OT to the same directory at KIS. Whatever data is written to the former will be replicated to KIS on a best-effort basis. Data on the KIS side is read-only, retains the permissions and ownership of the original data at OT, and will remain even if the source data at OT is deleted. Copying data to other directories at KIS for further processing should not consume disk space, as identical data is kept only once on the disks (the technology behind that is called deduplication).

Problems with this approach

While entirely automated (and as such very convenient from a user's point of view), this approach has several drawbacks: 

  1. Any data is copied: excellent data just as well as rubbish. According to what was said above, once transferred to KIS, the latter cannot be deleted there again. 

  2. Some data is post-processed right at OT (after arriving on /instruments), and only the outcomes of this process are worth being transported. The procedure described above is prone to unnecessarily copying the unprocessed data to KIS and would not consider the processed data for transport (unless copied back to /instruments on Tenerife).

  3. All raw data is copied. Even data from partners or international campaigns that were not intended for KIS but would instead have been transported directly to their respective home institutions is copied.

  4. The procedure used for copying is intended for an entirely different use case. It's meant to maintain an off-site copy on a best-effort basis for disaster recovery. As such, it does not bother signalling the successful replication of individual files; what makes it to the replica before failure, made it, and about those files that did not, nothing can be done anyway. In our scenario, that means it becomes difficult to see when a file was replicated entirely (and can thus be deleted at OT to free up space there for buffering data from future observation). 

To cope with these problems, we might again (as already proposed in 2020) introduce an intermediate folder for copying data to KIS. We would then no longer automatically copy all data in /instruments, but only data copied to that folder. That would allow for post-processing on-site and have only the outcome of that process transferred to KIS. Data not intended for KIS or not worth keeping would simply not be copied to that folder at all. However, this reduces the degree of automation, as an additional manual step is required. 

Once this intermediate directory is set up, we will announce the switch away from copying /instruments by mail. For the time being, one would still have to delete successfully replicated data by hand from the intermediate directory on Tenerife. 

In late 2020 we tried to automate the process of checksumming the original data on Tenerife, replicating it to KIS, and, upon success, removing the original on Tenerife. This, however, turned out to be quite tricky to implement, as it requires the synchronization of two totally independent processes at two sites with no shared access. Even though we failed then, we might look into that effort again. Should we succeed this time, we'll let you know.  
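
The local half of that process is simple; the difficulty lies in getting a trustworthy checksum of the replica back to Tenerife. The sketch below assumes that this checksum is somehow available at OT (how it gets there is exactly the unsolved synchronization problem). The function names and the choice of SHA-256 are illustrative assumptions.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Checksum a file in chunks so large files do not fill the memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remove_if_replicated(source: str, replica_checksum: str) -> bool:
    """Delete the original only if the replica's checksum matches it."""
    if sha256_of(source) == replica_checksum:
        os.remove(source)
        return True
    return False
```

The original is kept whenever the checksums differ, so an interrupted or corrupted transfer never leads to data loss at OT.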

Space requirements & costs

Tenerife's disk space can accommodate data from around 3 campaigns and is thus sufficient if data is deleted promptly.

We have looked at different solutions to store new data from campaigns in 2021 at KIS before SDC V1.0 is fully established. A simple expansion of the existing system is impossible due to a manufacturer's switch in technology (DELL/EMC). One would have to buy a completely new cluster (again with minimal redundancy and size), making this option rather expensive. The price per TB of usable disk space ranges from approximately 360 €/TB to 800 €/TB (including VAT).

We also looked into moving some dormant data (data not accessed for more than, let's say, one year) to the public cloud. Costs there, however, strongly depend on the access pattern: should the need arise, contrary to the established access pattern, to download big chunks of that data, traffic costs become prohibitive. A traffic-independent flat rate amounts to about 5000 €/month for 300 TB of cloud storage; on top of that, license costs would need to be added to access this outsourced data seamlessly (those would not be required, however, if we used a local cloud appliance consisting of many slow disks, the third variant we looked at). The price per TB and year ranges from approximately 60 €/TB to 200 €/TB (including VAT); the upper end is the traffic-independent flat rate.

Please note that SDC will solve this problem: all raw data and most of the large static simulation outputs will go to SDC, freeing up enough space on the existing system for everyday work. So once SDC is up and running, there is no need to buy storage for this usage in the foreseeable future. Investing considerable amounts of money just to integrate smoothly with the existing system, without a need for it in the future, would be a waste.
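
As a quick cross-check of the cloud figures: the flat rate of 5000 €/month for 300 TB corresponds to 200 €/TB per year, which matches the upper end of the 60-200 €/TB range when that range is read per year (as in the IT news section). The snippet below only redoes this arithmetic; the function name is made up for the example.

```python
def cost_per_tb_per_year(monthly_fee_eur: float, capacity_tb: float) -> float:
    """Convert a monthly flat rate for a given capacity to EUR per TB and year."""
    return monthly_fee_eur * 12 / capacity_tb


flat_rate = cost_per_tb_per_year(5000, 300)
print(flat_rate)  # 200.0
```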

Solution

We, therefore, decided to buy a first storage node of SDC and use it as temporary storage for new raw data and existing static data. SDC nodes consist of 19" two-socket servers with 24x16 TB disks, two SSDs for caching, and a separate disk mirror for the system. One tier at SDC will span several of these. We might well already use the technology envisioned for each node in SDC to present this temporary solution to the network. This would give us storage at a price of around 170€/TB (usable and including VAT).

Products & Tools

🛠 SDC Products & Tools

GRISView (working title)

Taras Yakobchuk

GRISView (working title) is an upcoming visualization and analysis tool to work with calibrated GREGOR/GRIS observational datasets. It is intended to facilitate easy data preview and present interactive tools for quick plotting, analysis and export. Tested features in the first release will include:

  • Advanced view, pan and zoom functions for map images and spectra

  • Support for both single-map and time-series observations

  • Multiple POI (point-of-interest) and ROI (rectangle-of-interest) to study spatial features

  • Interactive map isocurves generation and profile cuts

  • Distance measurements between map pixels in different units

  • Spectral line identification, markers and relative wavelength scale

  • Support for the data format distributed by the SDC web archive

  • Written entirely in Python with GUI using Qt cross-platform framework


    Nazaret Bello Gonzalez

    SDC data archive

    https://sdc.leibniz-kis.de/

    Get access to data from GRIS/GREGOR and LARS/VTT instruments and the ChroTel full-disc telescope at OT.

    Updates as of April 2021

    • Overview Calendar fields are now clickable and will link you to an overview of all observations performed on a given day

    • HTTPS has been implemented, and port numbers have been removed from URLs

    Speckle reconstruction 

    https://gitlab.leibniz-kis.de/sdc/speckle-cookbook

    This tutorial helps the user run KISIP (Wöger & von der Lühe, 2008) on her favourite BBI and/or HiFI imaging data.
    Contact: Vigeesh Gangadharan (vigeesh@leibniz-kis.de)

    Coming soon: 

    A Jupyter Notebook by Vigeesh Gangadharan to assist the user with VFISV inversions of GRIS data, including features such as wavelength calibration. Stay tuned!

    Conferences & Workshops


    📊 Conferences & Workshops


    Nazaret Bello Gonzalez

    Forthcoming Conferences/Workshops of Interest 2021

    Every second Thursday, 12:30-13:30 CET

    PUNCH Lunch Seminar (see SDC calendar invitation for zoom links)

    • 11 Feb 2021: PUNCH4NFDI and ESCAPE - towards data lakes

    • 25 Feb 2021: PUNCH Curriculum Workshop

    April week 12-16 (3 days, TBD)

    ESCAPE WP4 Technology Forum 

    June 01-02 (16:00 - 17:30)

    15th International dCache Workshop

    June 10-11

    13th International Workshop on Science Gateways |  IWSG 2021

    Topics:

    • Architectures, frameworks and technologies for science gateways

    • Science gateways sustaining productive collaborative communities

    • Support for scalability and data-driven methods in science gateways

    • Improving the reproducibility of science in science gateways

    • Science gateway usability, portals, workflows and tools

    • Software engineering approaches for scientific work

    • Aspects of science gateways, such as security and stability

    June 28, 2021:

    Data-intensive radio astronomy: bringing astrophysics to the exabyte era

    Topics: 

    • Data-intensive radio astronomy, current facilities and challenges

    • Data science and the exascale era: technical solutions within astronomy

    • Data science and the exascale era: applications and challenges outside astronomy

    SDC participation in Conferences & Workshops

    Nov. 26, 2020:

    2nd SOLARNET Forum Meeting for Telescopes and Databases

    Talk: Big Data Storage -- The KIS SDC case, NBG, PC & PK, 2nd SOLARNET Forum (Nov 26)
    Nazaret Bello Gonzalez, Petri Kehusmaa, Peter Caligari

    SDC Collaborations


    🤲 SDC Collaborations


     Nazaret Bello Gonzalez

    SOLARNET https://solarnet-project.eu

    KIS coordinates the SOLARNET H2020 Project that brings together European solar research institutions and companies to provide access to the large European solar observatories, supercomputing power and data. KIS SDC is actively participating in WP5 and WP2, coordinating and developing data curation and archiving tools in collaboration with European colleagues.
    Contact on KIS SDC activities in SOLARNET: Nazaret Bello Gonzalez nbello@leibniz-kis.de

     ESCAPE https://projectescape.eu/

    KIS is a member of the European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures (ESCAPE H2020, 2019 - 2022) Project, which aims to bring together people and services to build the European Open Science Cloud. KIS SDC participates in WP4 and WP5 to bring ground-based solar data into the broader Astronomical VO and to develop tools to handle large solar data sets. 

    Contact on KIS SDC activities in ESCAPE: Nazaret Bello Gonzalez nbello@leibniz-kis.de

     

    EST https://www.est-east.eu/

    KIS is one of the European institutes strongly supporting the European Solar Telescope project. KIS SDC represents the EST data centre development activities in a number of international projects like ESCAPE and the Group of European Data Experts (GEDE-RDA).

    Contact on KIS SDC as EST data centre representative: Nazaret Bello Gonzalez nbello@leibniz-kis.de

     

    PUNCH4NFDI https://www.punch4nfdi.de

    KIS is a participant (not a member) of the PUNCH4NFDI Consortium. PUNCH4NFDI is the NFDI (National Research Data Infrastructure) consortium of particle, astro-, astroparticle, hadron and nuclear physics, representing about 9,000 scientists with a Ph.D. in Germany, from universities, the Max Planck Society, the Leibniz Association, and the Helmholtz Association. PUNCH4NFDI aims to set up a federated and "FAIR" science data platform, offering the infrastructures and interfaces necessary for access to and use of the data and computing resources of the involved communities and beyond. PUNCH4NFDI is currently competing with other consortia to be funded by the DFG (final response expected in spring 2021). KIS SDC aims to become a full member of PUNCH and federate our efforts on ground-based solar data dissemination to the broad particle and astroparticle communities.

    Contact on KIS SDC as PUNCH4NFDI participant: Nazaret Bello Gonzalez nbello@leibniz-kis.de & Peter Caligari cale@leibniz-kis.de 

     IT news


    🖥 IT news


    Peter Caligari

    Ongoing & Future developments

    Webpage

    KIS

    The restart of our web page continues. We now have the design agency's first drafts for various page types. The page tree has been adapted to reflect the new departmental structure of KIS.

    I put both the preliminary designs and the envisioned page tree in our cloud for anybody to inspect and comment on. Comments are very welcome. I will collect them and, if feasible, try to include them in the next design round.

    Network

    Status of the dedicated 10 Gbit line between KIS & OT

    KIS, OT

    The 10 Gbit link between OT and KIS (see previous newsletter 01/21) is nearly finished. All intermediate networks are linked and the line is already at VTT.

    We still need to connect it to the switches at OT, though, and need to buy some network equipment for the connection between the University of Freiburg and KIS. We expect the line to be functional within a few weeks. We’ll keep you updated.

    Test of (application) firewalls at KIS

    [KIS] [OT]
    In the meantime, we have received a test appliance of the new firewall at KIS. Physically, it sits in front of our current firewall, on the side facing the internet. It does hardly any active filtering; it mainly monitors our traffic and alerts us to any problems. This setup allows us (IT) to get used to this kind of machine.

    We might not buy exactly this machine (a mid-sized appliance from Palo Alto Networks). This is only a first test of this kind of router/firewall for IT at KIS.

    We expect the test to conclude in the next few weeks, so that we can decide which machines we would like for OT and KIS, and how and when to buy them.

    Storage

    [KIS] [SDC]
    We are currently ordering the first storage node for SDC. This node consists of a Dell R740XD2 (2 CPUs, 24 x 16 TB disks, 10 Gbit Ethernet). The price per usable TB of storage (including VAT) is on the order of 160 €/TB.

    We will use this machine as a test bed for the technology envisioned for SDC. It will also hold all raw data from observations in 2021 as well as the large simulation files accumulating on mars.

    SDC will consist of at least 4 similar nodes; this is the first one. As soon as the remaining hosts are set up, we will move any data still on this first host to the new SDC cluster and then join the host to the cluster as well.

    [SDC]
    In parallel, we are looking into outsourcing rarely accessed files to the public cloud. Within the framework of SDC, we plan to use the cloud mainly to cover short-term peaks in demand flexibly.

    The cost per TB of cloud storage depends strongly on capacity and, above all, on the access pattern. It varies between approximately 60 and 200 €/TB per year. Access-independent models, which charge only a fixed fee per stored GB and nothing for uploads or downloads, are at the upper end of this range. At the lower end are public providers such as Amazon, Google, and Microsoft, which charge a relatively high fee for each type of data access in addition to the (relatively cheap) price of plain storage.

    In addition, the software that moves files between the cloud and the local storage at KIS requires licence fees of a similar magnitude.

    We are currently obtaining concrete offers to outsource 100 TB for 1 year to a public cloud. The pricing models are so complicated that we can determine the resulting costs only through a limited real-world test. 
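
As a rough orientation only, the figures quoted above can be turned into a cost envelope for the planned test. The following Python sketch uses the 100 TB volume, the 1-year duration, and the approximate 60 to 200 €/TB per year range from the text. The function name `cost_envelope` and the `LICENCE_FACTOR` value are assumptions made for illustration: the text only says licence fees are "of a similar magnitude", so a factor of 1.0 (licence cost equal to storage cost) is a placeholder, not a quoted price. Access fees, which dominate the uncertainty for the cheaper providers, are deliberately not modelled.

```python
# Rough cost envelope for the planned cloud test, based on the newsletter's figures.
CAPACITY_TB = 100                         # volume to be outsourced
DURATION_YEARS = 1                        # length of the test
PRICE_RANGE_EUR_PER_TB_YEAR = (60, 200)   # approximate range quoted in the text

# Assumption for illustration only: licence fees equal the storage cost
# ("of a similar magnitude"); the real factor is not known yet.
LICENCE_FACTOR = 1.0


def cost_envelope(capacity_tb, years, price_range, licence_factor):
    """Return (low, high) cost in EUR; access fees are NOT included."""
    low, high = (
        capacity_tb * years * price * (1 + licence_factor)
        for price in price_range
    )
    return low, high


# Storage only, without licence fees
storage_low, storage_high = cost_envelope(
    CAPACITY_TB, DURATION_YEARS, PRICE_RANGE_EUR_PER_TB_YEAR, 0.0
)
# Storage plus the assumed licence fees
total_low, total_high = cost_envelope(
    CAPACITY_TB, DURATION_YEARS, PRICE_RANGE_EUR_PER_TB_YEAR, LICENCE_FACTOR
)

print(f"storage only:      {storage_low:,.0f} to {storage_high:,.0f} EUR")
print(f"incl. licence fee: {total_low:,.0f} to {total_high:,.0f} EUR")
```

Under these assumptions, storage alone would cost between 6,000 and 20,000 € for the year, and roughly twice that once the assumed licence fees are included. Because access fees are left out, this is a lower bound for access-dependent pricing models.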

    We will intentionally design the integration so that all users can see which files are in the cloud and which are not. Although this is cumbersome (and artificially induced), we deem this awareness essential, at least initially, while we have no experience of the potential costs involved. The exact model is still to be worked out, and we will inform you in due course.

    [OT]
    The two new nodes for jane have arrived at OT. They will be installed as soon as either Peter Caligari can travel there or we can get a Dell technician up to the telescopes. Due to Covid-19, the time scale for this installation remains unclear. We will keep you informed.

    Current Resources

    Compute nodes

    | hostname | # of CPUs & total cores | RAM [GB] |
    |---|---|---|
    | patty [KIS]; legs & louie [KIS] (installed but not publicly available yet; nearly there…) | 2 x AMD EPYC 7742, 128 cores | 1024 |
    | itchy & selma [KIS] | 4 x Xeon(R) CPU E5-4657L v2 @ 2.40GHz, 48 cores | 512 |
    | scratchy [KIS]; quake & halo [KIS/seismo]; hathi [OT] | 4 x Intel(R) Xeon(R) CPU E5-4650L @ 2.60GHz, 32 cores | 512 |

    Central storage space

    Total available disk space for /home ([KIS] [OT]), /dat ([KIS] [OT]), /archive ([KIS]), /instruments ([OT])

    | name | total [TB, gross] | free [TB, gross] |
    |---|---|---|
    | mars [KIS] | 758 | 39 |
    | quake [KIS/seismo] | 61 | 0 |
    | halo [KIS/seismo] | 145 | 44.5 |
    | jane [OT] | 130 (-> 198) | 23 |


    📎 References

    Products & Tools

    Forthcoming Conferences/Workshops

    Collaborations