scheme-markup-language

module

v0.1.0 Latest Latest Go to latest Published: Apr 18, 2024 License: GPL-3.0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/Indus-Action-Initiatives/scheme-markup-language

Links

Open Source Insights

README ¶

Scheme Markup Language (SML)

Why?

SML is needed to manage the documentation and criteria of the schemes. A text file is the abstraction chosen to encapsulate the details about a scheme. It allows for version control, readability, easy extensions and standard software engineering principles like peer reviews, roll back, testing before productionising etc.

We have chosen to represent the SML as a indentation-sensitive key-value representation.

Structure

Following are the components of a standard SML setup:

Project
Tables
Dataset
Schemes

Project

A project is the outermost container holding all the logic and documentation regarding any scheme eligibility. It contains table files, dataset files and schemes files. There can be multiple table files and scheme files. But there has to be exactly one dataset file in the project. A project can be backed by a git repository to allow for version control and peer reviews, environments etc. You can imagine the project as the root directory in a filesystem.

Table

Table files capture semantics of the logical entities from the beneficiary database. Usually it has one-to-one mapping with the beneficiary table. But depending on the structure of the schemes and beneficiary information structure, there can be multiple logical table. In this case, there will be multiple table files in the project e.g. if certain schemes need family information as well as the individual information for the same criteria, then family and the beneficiary will be a separate table; an example of such a criteria can be family_income < 100000 AND beneficiary_gender = 'Male'

Dataset

A dataset includes entries for all the table files and relationship between them using JOIN conditions. There must be exactly one dataset file in the project.

Scheme

A scheme has two main elements to it:

Set of Criteria: A criteria represents a particular evalutation evaluation criteria for a scheme. The criteria can have nested conditions. For example:

The age for males should be greater than or equal to 21 and for females it should be greater than equal to 18

can be represented by

 criteria age
     combine OR
         combine AND
             column gender
                 operator equals
                 value male
             column age
                 operator gte
                 value 21
         combine AND
             column gender
                 operator equals
                 value female
             column age
                 operator gte
                 value 18

Evaluation: An evalutation is a combination of the criteria in the scheme file using logical operators && and ||. Generally, the evaluation will be a conjunction of all the criteria in scheme file. There must be exactly one evaluation in a scheme file.

Formatting columns

The numeric and date columns can be formatted using the standard Excel-like format strings. For example:

The percentage marks in the tenth & the twelfth class can be formatted as below:

column tenth_percentage_marks
    type float
    sql tenthPercentageMarks
    format %.2f
column twelfth_percentage_marks
    type float
    sql twelfth_percentage_marks
    format %.2f

Auxiliary columns

Sometimes the values needed to evaluate the criteria do not come from the same row. They may come from a different row from the same time. In such cases, we need to generate and execute a query involving a common table expression. For example:

Imagine a table named family_members with each member as separate rows, with a column called family_role denoting the role of the member (father, mother, child, etc.) in the family. Suppose a criteria involves ascertaining if the father or mother of the child has a BOCW card. This criteria can be represented as follows:

criteria parent_bocw
    combine OR            
        column mother_bocw
            table fm
            operator equals
            value True
        column father_bocw
            table fm
            operator equals
            value True
criteria family_role
    column family_role
    table fm
    operator equals
    value child

In the example above mother_bocw and father_bocw are auxiliary columns. Their definitions are as follows:

auxiliary_column mother_bocw
    type bool
    sql SELECT mother.has_bocw_card FROM family_members as mother INNER JOIN families ON mother.family_id = families.id WHERE mother.family_role='mother' GROUP BY 1 LIMIT 1
auxiliary_column father_bocw
    type bool
    sql SELECT father.has_bocw_card FROM family_members as father INNER JOIN families ON  father.family_id = fm.family_id WHERE father.family_role='father' GROUP BY 1 LIMIT 1

Lambda transformers

Sometimes, SQL transformations aren't enough inorder to calculate the values needed to evaluate the criteria. These cases are the ones which are better solved using the more powerful languages like Golang. In EE, such transformations are called Lambda transformations. For example:

Collecting the pregnancy data can be quite tricky during campaigns. In order to get around the potentially offending question about the pregnancy status of a girl child, the survey is designed to collect the names of the pregnant members in the family if any. But now to attribute the pregnancy_status to the potential pregnant family member, one must deduce the precisely figure out which is the member of the family that is pregnant using the names collected during the survey.

This can be using the lambda transformers like the ones below. Following the definition of the column in the table file.

column pregnancy_status
    type status
    transformer lambda
    name populatePregnancyStatus

The corresponding lambda function for the transformer (in Python) is shown as follows:

def populatePregnacyStatus(self, family, beneficiary):
    # get the names of the pregnant women of the family from the pregnancy mapping
    pregnantWomenCombinedDict = getMappedDict(
        config["pregnancyMapping"], beneficiary
    )
    # current data has a provision of only two pregnant women, so pass two
    pregnantWomen = splitCombinedDict(pregnantWomenCombinedDict, 2)

    for woman in pregnantWomen:
        if "name" not in woman:
            return
        name = woman["name"].strip()
        if name == "":
            return
        fuzzyScore, index = fuzzy_matching(name, family["members"])
        if index >= 0 and index < (len(family["members"]) - 1):
            family["members"][index]["pregnancy"] = "yes"
        # elif index == (len(family['members']) - 1):
        #     respondent['pregnancy'] = 'yes'
    for member in family["members"]:
        if member["gender"] == "male":
            member["pregnancy"] = "no"
        elif "pregnancy" not in member or member["pregnancy"] == "":
            member["pregnancy"] = UNKNOWN_STRING

MakeSQL

MakeSQL deals with turning the project into a set of valid SQL statements that can be run against the in-memory DuckDB database, i.e. the sql queries are generated in the DuckDB dialect.

It has two main components:

1. Parser

The SML "parser" takes in the project in SML form and outputs a structured JSON representation of the project. It does more than just parsing including lexical analysis and semantic analysis as well. The SML parser thus also finds the inconsistencies in the project and flags them.

2. SQL Generator

The SQL generator takes in a "parsed" SML project in JSON format, a selection JSON (set of columns, filter strings etc.) and generates valid evaluation SQL to be executed against a populated DuckDB instance.

Please note that itt is not a responsibility of MakeSQL to ensure that the SQL statements specified in the criteria are in the DuckDB namespace. MakeSQL assumes that it is the case.

SAMPLE input.json

TODO

Ensuring that criteria SQL are in the DuckDB namespace

TODO

DDL Queries

TODO

Auto-generating a SML project from a sample input

TODO

Directories ¶

Path	Synopsis
main
sml

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL