data | 10 December 2021

Desidata Package for R

A permanent gallery of projects by students at Srishti.

Currently in development

We’ve been learning R at Srishti, and one recurring issue has been finding India-related datasets that we can use to teach the tidy methods of data wrangling and analysis. R’s built-in datasets are good for most cases, but Venkat, my facilitator, is keen on teaching with India-specific datasets, both to build visualizations and to create an archive that can be referred to in his other classes.

Now, I love making archival tools.

Introducing the desidata package, which provides datasets and documentation right within RStudio! The package has two parts: the first, written in R, serves data from the repository’s API; the second, written in Python, automates documenting datasets and adding them to the repository.

One of the problems with datasets about India is that they are scattered across formats and often lack clear documentation for each variable. This is not ideal for students who are just getting started with R, and it can put them off. The Python command-line utility solves this by reading the dataset and requiring the author to document every variable. These descriptions are compiled into a neat README and a JSON file that the package can access. Students can then not only search for and download datasets through the package, but also read well-written data dictionaries and source information without leaving RStudio.
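The post doesn’t show the utility’s code, so here is a minimal Python sketch of the documentation step, not desidata’s actual implementation. It assumes the dataset is a CSV with a header row, and the function names (`prompt_for_descriptions`, `document_dataset`), the `metadata.json` file name, and its fields are all hypothetical; the real tool’s layout may differ.

```python
import csv
import json
from pathlib import Path


def prompt_for_descriptions(columns, ask=input):
    """Hypothetical helper: keep asking until every variable has a description."""
    descriptions = {}
    for column in columns:
        text = ""
        while not text.strip():  # reject empty answers, so documentation is enforced
            text = ask(f"Describe variable '{column}': ")
        descriptions[column] = text.strip()
    return descriptions


def document_dataset(csv_path, descriptions, source, out_dir):
    """Hypothetical sketch: write README.md and metadata.json for one CSV dataset.

    Nothing is written unless every column in the header has a description.
    """
    csv_path, out_dir = Path(csv_path), Path(out_dir)
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        columns = next(reader, [])         # header row holds the variable names
        n_rows = sum(1 for _ in reader)    # count the remaining data rows
    if not columns:
        raise ValueError(f"{csv_path} has no header row")

    missing = [c for c in columns if not descriptions.get(c, "").strip()]
    if missing:
        raise ValueError(f"Undocumented variables: {', '.join(missing)}")

    metadata = {
        "name": csv_path.stem,
        "source": source,
        "rows": n_rows,
        "variables": [{"name": c, "description": descriptions[c]} for c in columns],
    }

    out_dir.mkdir(parents=True, exist_ok=True)
    # Machine-readable data dictionary that a client package could fetch.
    (out_dir / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")

    # Human-readable README with a Markdown data-dictionary table.
    lines = [f"# {metadata['name']}", "", f"Source: {source}", "", f"Rows: {n_rows}", "",
             "| Variable | Description |", "| --- | --- |"]
    for c in columns:
        lines.append(f"| {c} | {descriptions[c].replace('|', chr(92) + '|')} |")  # escape pipes
    (out_dir / "README.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return metadata
```

Keeping the prompt separate from the writer means the writer can be tested without a terminal, and a dataset only gets its README and JSON once every variable is described.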

As of this writing, the package is still in development (v0.1), with most of the API functionality in place.

Snippets

GUI for desidata
README files generated by Python