DataPLANT needs a solid technical base for collaboration within projects and between (inter)national research groups. This can be achieved through a framework which supports data versioning and sharing. The starting point is the Annotated Research Context (ARC), which was presented at the kick-off of Task Area 2 "Software / Services". A widely used platform, well beyond its original purpose of maintaining code in collaborative software projects, is the versioning software Git.

As the ARC consists of multiple file formats, including large files of raw data from various inputs, the framework needs to deal with large files as well. As Git was originally created with source code in mind, the plain version is not well suited in this regard, as it is implemented as a distributed version control system (DVCS). It is not centralized by default and does not implement an inherent repository hierarchy. All clones contain the full history by default. Git offers sparse clones and sparse checkouts but still performs poorly with larger files. A possible solution for large repositories is to use sparse checkouts via the git sparse-checkout command, which was introduced in Git 2.25.0, released at the beginning of this year.

The idea is to store smaller (text) files with Git, and larger files outside of Git. The versioning is handled by storing references to externally stored (large) files in Git. There are several implementations available for this purpose: Git Large File Storage (LFS), Git-annex / DataLad and Data Version Control (DVC).

Git LFS is developed and maintained by GitHub and written in the Go language. It uses the Git smudge filter to replace the pointer file with the actual file content. It works transparently for the user (Git LFS needs to be installed for that to work, though). LFS uses reflinks (if possible) or deep copies. It stores the pointer files in Git and the file contents in a special LFS storage. It requires a dedicated server for managing LFS objects.

Git-annex is used by DataLad, which is popular in the neuroscience community. It is programmed in Haskell (Git-annex) and Python (DataLad). It deploys symlinks by default but also supports hardlinks, reflinks or copying of data. It maintains file information in a dedicated annex branch. Git-annex directly supports a large number of different storage systems.

DVC is a popular framework in the machine learning community and is written in Python. It uses reflinks by default but can also support symlinks, hardlinks or copying. It keeps file information in .dvc files (YAML format) and directly supports S3, GCS, SFTP, HDFS or a filesystem as a backend.

There are a couple of Free and/or Open Source Software Git collaboration platforms out there, like GitLab (Community Edition) and Gitea (a community-driven fork of Gogs), as well as non-free or service-only ones like GitHub (cloud service, on-premises enterprise product available), GitLab (cloud service, on-premises Enterprise Edition) or Atlassian Bitbucket (cloud service only). GitLab, the most relevant to the purposes of DataPLANT, is one of the "Big Three" players (GitHub, GitLab, Bitbucket). It can handle a large number of repositories and users. It provides many useful collaboration features like an issue tracker, a wiki and an online editor. It incorporates a well-established integrated CI/CD system, but the Community Edition provides only a significantly reduced feature set.

This is part two of a series about encrypted file storage/archive systems. My plan is to try out duplicity, git using transparent encryption, S3-based storage systems, git-annex and encfs+sshfs as alternatives to Dropbox/Wuala/SpiderOak. The conclusion will be a blog post containing a comparison a.k.a. This post tries some filesystems that directly access S3. I’ll focus on Amazon’s S3 offering, but there should be many alternatives, i.e.
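To make the pointer-file idea behind Git LFS a bit more concrete, here is a minimal Python sketch. It is not the Git LFS implementation: the function names clean, smudge and parse_pointer are only illustrative, and the in-memory dictionary is an assumed stand-in for the external LFS storage. The pointer layout (a version line, a SHA-256 oid and a size) follows the published Git LFS pointer format.

```python
import hashlib

# Version line used by Git LFS pointer files.
LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def clean(content: bytes, store: dict) -> str:
    """Put the content into the external store and return the pointer text.

    `store` is a plain dict standing in for the LFS storage (an assumption
    made for this sketch only).
    """
    oid = hashlib.sha256(content).hexdigest()
    store[oid] = content
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(content)}\n"


def parse_pointer(pointer: str) -> dict:
    """Split each 'key value' line of a pointer file into a dict entry."""
    return dict(line.split(" ", 1) for line in pointer.splitlines() if line)


def smudge(pointer: str, store: dict) -> bytes:
    """Replace a pointer with the actual file content from the store."""
    fields = parse_pointer(pointer)
    _algorithm, oid = fields["oid"].split(":", 1)
    content = store[oid]
    if len(content) != int(fields["size"]):
        raise ValueError("stored object does not match the pointer size")
    return content


if __name__ == "__main__":
    storage = {}
    raw = b"raw measurement data\n" * 1000
    pointer_text = clean(raw, storage)
    print(pointer_text)                         # small text file kept in Git
    print(smudge(pointer_text, storage) == raw)
```

Only the few lines of pointer text would be versioned in Git, while the large content lives in the separate store. That is the same split between small references and externally stored files that Git-annex and DVC implement with their own formats.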