Exploring GitHub Actions

Introduction

Over the last few months we’ve all been affected in some way by the Coronavirus (COVID-19) outbreak, which has slightly changed our habits and our lives. As official data on the spread of the virus came out, my friend Enrichman had the idea of merging the different sources and exposing some static APIs that return geo-based information. Yes, a huge number of repositories, websites, dashboards and visualizers have sprung up and evolved during this time, but we tried to do our part in late February, when the spread of the virus was still contained. I had the opportunity to help him set up an automated flow that pushes to our repository the updated data published in the official repository we were monitoring. This flow is entirely based on GitHub Actions, so I’d like to share my experience with this tool.

GitHub Actions

GitHub Actions has become increasingly popular since it was declared generally available on November 13, 2019. Born to ease Continuous Integration and Continuous Delivery flows, GitHub Actions makes it possible to automate any workflow that builds, tests or deploys a project on any platform, simply by declaring the actions in a YAML file.

What we did

My main focus on this project was automating data updates from the official data repositories. To achieve this, we set up a GitHub Actions workflow that’s available here. Let’s dig into it! We’ll walk through it to give a brief overview of how to set up a GitHub Actions workflow.

First, we have to choose a name for our workflow:

name: Update data from CSSEGISandData

Next we have to select the trigger for our workflow: it can be one or more events related to activity on the repository, such as push, pull_request or fork. It’s also possible to narrow the trigger down to a specific branch or a specific attribute of the event we want to subscribe to. In our case we need a recurring trigger, so instead of binding our workflow to a specific event we schedule it using cron syntax.

on:
  schedule:
    # Run at minute 0 of every hour (schedule times are in UTC)
    - cron: '0 * * * *'
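
For comparison, here is a minimal sketch of the event-based alternative described above, narrowed to a single branch. We did not use it in our workflow, and the branch name master is only an assumption for illustration.

```yaml
# Hypothetical trigger: run on pushes and pull requests targeting master
on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master
```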

It’s now time to define the core of our workflow, which consists of a set of jobs. Jobs are executed in parallel by default, but it’s possible to run them sequentially using the needs keyword to express a dependency. Each job must have an identifier and a set of parameters for its configuration: we used runs-on to define the type of machine the job will run on. Another example is env, which lets you set environment variables used in the job.

jobs:
  clone-source:
    runs-on: ubuntu-latest
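
To illustrate needs and env, which our own workflow does not use, here is a small sketch with two jobs. The job identifiers, the DATA_DIR variable and the echo commands are hypothetical placeholders. Each job has a single run step only so that the fragment is complete.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      # Hypothetical variable, visible to every step in this job
      DATA_DIR: data
    steps:
      - run: echo "Building into $DATA_DIR"
  publish:
    # Without needs, this job would run in parallel with build
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: echo "Publishing"
```

With needs in place, publish starts only after build has completed successfully.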

Under steps we list the sequence of tasks needed for our purpose: each one can either be a list of commands or an invocation of an action published in a GitHub repository or as a Docker image in a registry. We used both flavors in our workflow: the former is defined under the run keyword, while the latter needs a reference to the action after the uses keyword. There are lots of actions available to help build any workflow, but you can also implement and publish your own. The following snippet shows an example of both kinds of steps: we leveraged an existing action to check out a data source repository, and we defined a custom sequence of commands to compile and execute our Go code.

    steps:
    - name: Clone source
      uses: actions/checkout@v2
      with:
        repository: 'CSSEGISandData/COVID-19'

    ...

    - name: Generate new files
      run: |
        go build .
        ./covid19
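
As a side note, the uses keyword accepts more than one kind of reference. The sketch below lists the forms I am aware of. Only the first one appears in our workflow. The local path is hypothetical, and the Docker image is just an illustrative public image.

```yaml
steps:
  # Action from a public GitHub repository, pinned to a tag
  - uses: actions/checkout@v2
  # Action defined in the same repository (hypothetical path)
  - uses: ./.github/actions/my-action
  # Action packaged as a Docker image in a registry
  - uses: docker://alpine:3.8
```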

The complete workflow syntax reference is available here.

Conclusion

Obviously, this is quite a particular workflow, but it was a good chance to explore the tool and learn how to use it. GitHub Actions is flexible and highly customizable for different needs, even though it’s primarily aimed at automating development workflows and empowering Continuous Integration and Continuous Delivery processes.