DIML (Dot-Indicated Markup Language)

A new way to YAML: a proposal for a new data format

Mike R
Nov 3, 2017

Background

I wanted to get some input on an idea and see what other people think about it.

Problem: I like using YAML to exchange data and configure applications, but a few things bother me:

  1. it's a syntactic nightmare: if you miss a double space, a colon or a dash, YAML goes nuts, so a YAML linter is always necessary to check the syntax
  2. very long and deeply nested data structures are hard to follow; it's hard to visually separate 2-space indentation levels in long data trees
  3. you cannot add comments to JSON, so if you convert YAML to JSON, you lose the comments
  4. to exchange data between machines, a YAML file is converted into JSON (which is a subset of YAML), so now there are 2 formats to deal with, since JSON looks completely different from YAML (and JSON does not support comments)
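To make points 3 and 4 concrete, here is a minimal sketch using only Python's standard `json` module. JSON has no comment syntax, so a `#` comment that YAML would happily accept is simply a parse error for a JSON parser. The helper name `parses_as_json` and the sample strings are made up for illustration.

```python
import json

# A trailing comment is fine in YAML, but JSON has no comment syntax at all.
text_with_comment = '{"city": "san francisco"}  # where the character lives'


def parses_as_json(text):
    """Return True if the standard json module accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_json('{"city": "san francisco"}'))  # True
print(parses_as_json(text_with_comment))            # False ("Extra data")
```

This is also why a YAML-to-JSON converter has nowhere to put the comments: the target format cannot represent them.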

YAML Problems

For example, here's a simple YAML data structure of some book characters:

valid YAML

Linting this results in various errors and possible showstoppers:

yamllint (py)

And perhaps the biggest issue I have with YAML is that for any API work, YAML needs to be converted to JSON, so the previous example in JSON would look like this:

prettified JSON; note that the comments in the YAML file are lost in the generated JSON

And a one-liner would look even worse:
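The difference between the prettified and the one-liner JSON is purely layout, which is easy to see with Python's `json.dumps`. This sketch uses the address data from the Python example later in this post rather than the book characters, whose exact values aren't reproduced here.

```python
import json

# Address data taken from the name + address example used later in this post.
data = {
    "address": {
        "street": "125 broadway",
        "apartment": "19D",
        "city": "san francisco",
    }
}

pretty = json.dumps(data, indent=2)                   # human-readable layout
one_liner = json.dumps(data, separators=(",", ":"))   # compact API layout

print(pretty)
print(one_liner)
# {"address":{"street":"125 broadway","apartment":"19D","city":"san francisco"}}

# Both strings decode to exactly the same data; only the whitespace differs.
assert json.loads(pretty) == json.loads(one_liner) == data
```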

So for both YAML and its JSON cousin, linters are necessary for basic syntax checking.
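For JSON, the basic syntax check can come from the parser itself. The sketch below (the `check_json` name is illustrative) uses the line and column that Python's `json.JSONDecodeError` reports to locate a missing colon, the kind of error mentioned in the next paragraph.

```python
import json


def check_json(text):
    """Report the first syntax error in a JSON document, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # JSONDecodeError carries the message plus the line and column of the problem.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# The colon after "city" is missing.
broken = '{\n  "street": "125 broadway",\n  "city" "san francisco"\n}'
print(check_json(broken))
```

This only catches grammar errors; it says nothing about whether the right keys are present, which is the kind of "logical" check DIML aims for below.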

Anyone used to working with these 2 formats and comfortable with their limitations is probably thinking it's not a big deal. But think about the first time you saw a large JSON file, or had to fix a missing colon in that file for it to be processed correctly. My eyes glazed over having to dig through a messy JSON or YAML structure.

What if there is a simpler, cleaner and better way?

Enter DIML

DIML features/benefits:

  1. simple dot-separation of nested values: no dashes, brackets or begin/end characters, and no equals or colon characters to denote key: value; the only character ever used is a dot (.)
  2. doesn't care about line breaks
  3. doesn't care about missing line endings
  4. doesn't care about spaces or tabs
  5. same syntax whether it's a 1-liner or prettified, so the same file can be used for both human-readable config and API work
  6. for human readability and basic data checks, it should be able to prettify a 1-liner and then do logical syntax checking; for example, it will warn if a logical key is missing
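Features 1 to 6 suggest a very small core: the number of dots in front of a token gives its nesting level, and every other bit of whitespace is just layout. The Python sketch below is one possible reading of that idea, not a specification. The function names `tokenize` and `prettify`, and the rule that a run of N dots marks level N, are assumptions made for illustration. The sample uses the address values from the Python example later in this post.

```python
import re

# Assumption: a token is a run of dots followed by everything up to the next dot.
# The number of dots is the nesting level; spaces, tabs and line breaks are layout only.
TOKEN = re.compile(r"(\.+)([^.]*)")


def tokenize(text):
    """Split DIML-like text into (level, text) pairs, ignoring layout whitespace."""
    tokens = []
    for dots, body in TOKEN.findall(text):
        body = " ".join(body.split())  # collapse spaces, tabs and newlines
        if body:  # skip bare dot runs such as a "..." header line
            tokens.append((len(dots), body))
    return tokens


def prettify(text, indent="  "):
    """Put every token on its own line, indented one step per extra dot."""
    return "\n".join(
        indent * (level - 1) + "." * level + body for level, body in tokenize(text)
    )


one_liner = ".address ..street ...125 broadway ..apartment ...19D ..city ...san francisco"
print(prettify(one_liner))
```

Under this reading, a 1-liner and its prettified form produce the same token list, which is what feature 5 asks for. One consequence is that a value can never contain a literal dot (such as `3.14`), so a real format would need some escaping rule.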

For example, this is valid DIML syntax:

valid DIML

The following is invalid syntax: DIML detects that the city key is not declared (there is no 2-dot in front of 'city'), so the 2-dot "San Francisco" value is missing its key.

invalid DIML

(Note: this only works in the prettified state. In 1-liner format, DIML will process "city" as a continuation of the "19D" value, because of the missing dot in front of city.)

This can work because, in prettified format, DIML would read a line break as a logical separation between a key and a value, and then check that a dot is present there. In the case above, it would detect a line break between '19D' and 'city' but no dot in front of 'city', and would therefore issue a warning to the user. Note that the file would still be valid; it would just have warnings.
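Here is a rough sketch of that line-break check in Python, assuming prettified DIML where every non-blank line should begin with a run of dots. The `lint` name and the warning text are invented for illustration. Since the original invalid example isn't reproduced here, the input below is a reconstruction built from the address values in the Python example later in this post.

```python
def lint(pretty_text):
    """Return warnings for prettified DIML lines that do not start with a dot.

    Assumption: in prettified form every line break separates tokens, so each
    non-blank line must begin with a dot run. Anything else is reported as a
    missing key marker; the file is still accepted, only a warning is produced.
    """
    warnings = []
    for number, line in enumerate(pretty_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped and not stripped.startswith("."):
            warnings.append(f"line {number}: '{stripped}' has no leading dot")
    return warnings


invalid = """.address
  ..street
    ...125 broadway
  ..apartment
    ...19D
  city
    ...san francisco"""

for warning in lint(invalid):
    print(warning)
# line 6: 'city' has no leading dot
```

Because the check relies on line breaks, it cannot run on a 1-liner, which matches the note above.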

Here's an example of a large valid DIML structure; the 3 dots on top declare it to be DIML.

And here's an API 1-liner of the same file; DIML interprets a new key as a pipe symbol.

Comments are included in the DIML 1-liner (DIML-OL)

Also, you can add values without line breaks for easier, more compact reading:

compact HR (human readable) DIML (DIML-HR)

Or yet another way:

It should also be easy to parse a large DIML structure without the usual loop:

for key, value in json_data.items():
    print(key, ':', value)

The above works, but for deeply nested JSON you need multiple nested loops.
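The usual workaround for the nested loops is a single recursive generator. The sketch below (the name `walk` is illustrative) flattens nested JSON of any depth into dotted paths, which happens to look a lot like the dot notation DIML proposes.

```python
import json


def walk(node, path=()):
    """Yield (dotted_path, value) for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, path + (str(key),))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, path + (str(index),))
    else:
        yield ".".join(path), node


json_data = json.loads(
    '{"address": {"street": "125 broadway", "apartment": "19D", "city": "san francisco"}}'
)
for key, value in walk(json_data):
    print(key, ":", value)
# address.street : 125 broadway
# address.apartment : 19D
# address.city : san francisco
```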

Instead of looping like that, printing keys and values in DIML would look something like this, using the name + address example from above:

Python example

import diml  # hypothetical module: DIML has not been implemented yet

d = diml.load('mydata.diml')

# get all keys with 1 dot (dot level 1)
d.get_key(1)
# first_name
# last_name
# age
# address

# get all keys with 2 dots (dot level 2)
d.get_key(2)
# street
# apartment
# city
# state

# get all values of dot level 3
d.get_val(3)
# 125 broadway
# 19D
# san francisco

# get all keys and values of any dot level, let's say 3
d.get_all(3)
# address.street.125 broadway
# address.apartment.19D
# address.city.san francisco
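To check that the `get_key` / `get_val` / `get_all` idea hangs together, here is a toy, self-contained Python sketch. It is not an implementation of DIML (none exists), and it reads from a string instead of a file. It assumes that a run of N dots marks level N, that a token followed by a deeper token is a key, and that any other token is a value. The `parse` helper, the `Diml` class and the placeholder values John, Doe and 35 are made up for illustration; the `state` key is left out because its value isn't given above.

```python
import re

# Assumption: a token is a run of dots plus everything up to the next dot.
TOKEN = re.compile(r"(\.+)([^.]*)")


def parse(text):
    """Return (level, body, dotted_path, is_key) for each token in DIML-like text."""
    tokens = []
    for dots, body in TOKEN.findall(text):
        body = " ".join(body.split())  # layout whitespace does not matter
        if body:  # skip bare dot runs such as the "..." header
            tokens.append((len(dots), body))

    entries, ancestors = [], []
    for i, (level, body) in enumerate(tokens):
        del ancestors[level - 1:]  # forget siblings and anything deeper
        ancestors.append(body)
        # Assumption: a token followed by a deeper token is a key, otherwise a value.
        is_key = i + 1 < len(tokens) and tokens[i + 1][0] > level
        entries.append((level, body, ".".join(ancestors), is_key))
    return entries


class Diml:
    """Hypothetical in-memory reader mirroring the dot-level queries above."""

    def __init__(self, text):
        self.entries = parse(text)

    def get_key(self, level):
        return [b for lvl, b, _, key in self.entries if lvl == level and key]

    def get_val(self, level):
        return [b for lvl, b, _, key in self.entries if lvl == level and not key]

    def get_all(self, level):
        return [path for lvl, _, path, _ in self.entries if lvl == level]


# John, Doe and 35 are placeholder values; the address values come from the example above.
sample = """...
.first_name ..John
.last_name ..Doe
.age ..35
.address
  ..street ...125 broadway
  ..apartment ...19D
  ..city ...san francisco
"""

d = Diml(sample)
print(d.get_key(1))  # ['first_name', 'last_name', 'age', 'address']
print(d.get_key(2))  # ['street', 'apartment', 'city']
print(d.get_val(3))  # ['125 broadway', '19D', 'san francisco']
print(d.get_all(3))
```

In this reading, whether a token is a key or a value depends only on what follows it, so no separator character besides the dot is needed. The trade-off is that an empty key (a key with no value yet) would be indistinguishable from a value.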

Finally

Keep in mind this is only a hypothetical proposal; no actual work has been done to make DIML a reality. I'd like to hear some criticism of this concept. Thanks!
