Over the last few months we’ve all been affected in some way by the Coronavirus (COVID-19) outbreak, which has slightly changed our habits and our lives. As the official data on the spread of the virus came out, my friend Enrichman had the idea to merge the different sources and expose some static APIs returning geo-based information. Yes, a huge number of repositories, websites, dashboards and visualizers have sprung up and evolved during this time, but we set out to do our part in late February, when the spread of the virus was still contained. Anyway, I had the opportunity to help him set up an automated flow that pushes to our repository the updated data published on the official repository we were monitoring. This flow is entirely based on GitHub Actions, so I’d like to share my experience with this tool.
GitHub Actions has become increasingly popular since it was declared generally available on November 13, 2019. Born to ease Continuous Integration and Continuous Delivery flows, GitHub Actions makes it possible to automate any workflow that builds, tests or deploys a project on any platform, simply by declaring the actions in a YAML file.
What we did
My main focus on this project has been automating data updates from the official data repositories. To achieve this, we set up a GitHub Actions workflow that’s available here. Let’s dig into it! We’ll walk through it to give a brief overview of how to set up a GitHub Actions workflow.
First, we have to choose a name for our workflow:
name: Update data from CSSEGISandData
We now have to select the trigger for our workflow: it can be one or more events related to activity on the repository, such as fork. It’s also possible to narrow the trigger down to a specific branch or a specific attribute of the event we want to subscribe to.
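As a minimal sketch of narrowing a trigger, the fragment below subscribes to pushes on a single branch and to a subset of pull request activity. It is not part of our workflow: the branch name and the chosen activity types are assumptions picked only for illustration.

```yaml
# Illustrative only: not taken from our workflow.
on:
  push:
    branches:
      - master              # assumed branch name; pushes elsewhere are ignored
  pull_request:
    types: [opened, reopened]   # react only to these activity types
```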
In our specific case, we need a recurring trigger, so instead of binding our workflow to a specific event we are going to schedule it using cron syntax.
on:
  schedule:
    - cron: '0 * * * *'
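For readers less familiar with cron, here is a hedged sketch of how the five fields are read. Only the first expression comes from our workflow; the other two are hypothetical alternatives added for comparison. Scheduled workflows are evaluated in UTC.

```yaml
on:
  schedule:
    # Fields, left to right: minute, hour, day of month, month, day of week
    - cron: '0 * * * *'      # minute 0 of every hour (the schedule we use)
    - cron: '*/30 * * * *'   # hypothetical: every 30 minutes
    - cron: '0 6 * * 1'      # hypothetical: 06:00 UTC every Monday
```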
It’s now time to define the core of our workflow, which consists of a set of jobs. Jobs are executed in parallel by default, but it’s possible to run them sequentially using the needs keyword to express a dependency. Each job must have an identifier and a set of parameters for its configuration: we used runs-on to define the type of machine the job will run on. Other parameters are available too: for example, env lets you set environment variables used in the job.
jobs:
  clone-source:
    runs-on: ubuntu-latest
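Our workflow has a single job, so it uses neither needs nor env. As a rough sketch of how they fit together, the fragment below defines two hypothetical jobs: the job identifiers, the variable name and the echo commands are all made up for illustration, and the steps entries are placeholders for the step syntax described next.

```yaml
# Illustrative only: job names, variable and commands are hypothetical.
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      DATA_DIR: ./data          # environment variable visible to this job's steps
    steps:
      - run: echo "Building with data in $DATA_DIR"
  publish:
    runs-on: ubuntu-latest
    needs: build                # wait for build to succeed instead of running in parallel
    steps:
      - run: echo "Runs only after build has finished"
```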
Under steps we list the sequence of tasks needed for our purpose: each step is either a list of shell commands or an invocation of an action, which can live in a repository or be published as a Docker image in a registry. We used both flavors in our workflow: the former is defined under the run keyword, while the latter needs a reference to the action after the uses keyword. There are lots of Actions that help develop any workflow, but you can also implement and publish your own Action.
The following snippet shows an example of both kinds of steps: we leveraged an existing Action to check out a data source repository, and we defined a custom sequence of commands to compile and execute our Go code.
steps:
  - name: Clone source
    uses: actions/checkout@v2
    with:
      repository: 'CSSEGISandData/COVID-19'
  ...
  - name: Generate new files
    run: |
      go build .
      ./covid19
The complete workflow syntax reference is available here.
Obviously, this is quite a particular workflow, but it has been a good chance to explore and learn how to use the tool. GitHub Actions is flexible and highly customizable to different needs, even though it’s primarily aimed at automating development workflows and empowering Continuous Integration and Continuous Delivery processes.