[RFC] EON: A simpler configuration format
April 23, 2022
Do you like YAML’s relative readability but dislike all the ways a user could shoot themselves in the foot? If you’re like me, you probably agree that it’s not the user’s job to remember that the file format they’re using automatically converts true
, on
, yes
, or y
to a boolean TRUE1. In other words, the semantics of the token true
is not inherent to that token, but is determined by the field you’re currently configuring.
For example, say you want a list of bash commands, where each command is represented as a list, with commands and arguments separated out:
commands:
- [true]
- [yes]
- [chmod, 0700, foo.txt]
- [git, add, *.txt]
This currently has the following issues:
- The first command would be the boolean
true
, not the Bash commandtrue
- The second command would also be the boolean
true
, not the Bash commandyes
- Depending on your YAML library implementation, the third command may parse
0700
as the number700
or448
- The last command would throw an error, since
*
starts a YAML alias, and.
is not a valid character in a YAML alias.
As the user, this is very much violating the principle of least surprise. As the developer, I can’t do anything about this, since YAML does the type-conversion automatically before I get the value at all. So there’s absolutely nothing I could do to get the user’s actual input (e.g. I have no way of knowing if the user typed in yes
); the onus is on the user to quote the value.
Introducing the Easy Object Notation format
EON files are suffixed with .eon
and should generally work with YAML syntax highlighting. Conceptually, EON parses to a JSON object2 containing objects, lists, strings, and nothing else. EON also provides a standard specification for other types, and EON implementations should provide parsers out-of-the-box for these types, but conceptually, all other types are parsed from strings.
This allows the user to write x: true
, and the developer to determine whether that true
is a boolean or a string. The core intuition here is that you generally aren’t decoding data in a vacuum; you’re generally deserializing some object that knows the types it wants for the fields. If you’ve ever decoded enums from JSON, you’re probably already doing this extra step anyway.
This is still a rough draft, but it covers all the major points. If there’s even a modicum of interest in this, I might put up an actual spec.
-
Objects are basically the same as YAML: can use either indentation-based keys or
{}
syntax- Duplicate keys is an error
- Order not guaranteed
- Keys may optionally be quoted, to allow a key to include characters like
:
-
Lists are also basically the same as YAML:
-
or[]
syntax for lists -
All scalars are initially strings. Developers decide how to deserialize strings per-field.
- There would be some first-class support for multiline strings, without needing quotes or
long_description: |
syntax or anything. - Scalars may optionally be quoted, to allow a string to start with
[
or{
or something
- There would be some first-class support for multiline strings, without needing quotes or
-
EON implementations must provide parsers out-of-the-box for basic types:
- Boolean:
true
andfalse
(case insensitive) map to TRUE and FALSE - Integer: arbitrary precision, supports binary, octal, hex
- Float: arbitrary precision, supports scientific notation
- Null:
null
should map to the language’s idiomatic representation of NULL - Future: datetime types like TOML?
- Boolean:
API examples
Some potential API ideas for parsing the given config:
str_field: hello world
multi_field: { a: 1.5, b: 2.2 }
list_field:
- 1
- 2
- 0xDEADBEEF
baz:
state: ON
required_nullable: null
Python
@dataclass
class Config:
str_field: str
optional_bool: bool | None
multi_field: float | dict[str, float]
list_field: list[int]
baz: Baz
@classmethod
def __decode_eon__(cls, v: eon.Value) -> Config:
o = v.object()
return Config(
str_field=o["str_field"].string(),
optional_bool=o.optional(
"optional_bool",
eon.boolean(),
),
multi_field=o["multi_field"].one_of(
# helper equivalent to: lambda v: v.float()
eon.float(),
eon.object_of(eon.float()),
),
list_field=o["list_field"].list_of(
eon.integer(),
),
baz=o["baz"].custom(Baz),
)
@dataclass
class Baz:
state: State
required_nullable: str | None
@classmethod
def __decode_eon__(cls, v: eon.Value) -> Baz:
o = v.object()
return Baz(
state=o["state"].custom(State),
required_nullable=o["required_nullable"].one_of(
eon.string(),
eon.null(),
),
)
class State(Enum):
ON = "ON"
OFF = "OFF"
@classmethod
def __decode_eon__(cls, v: eon.Value) -> State:
s = v.string()
try:
return cls[s.upper()]
except KeyError:
raise eon.DecodeError(f"Invalid State: {s}")
def main():
cfg = eon.loads(s, Config)
print(cfg)
Typescript/Javascript
type Config = {
str_field: string
optional_bool?: boolean
multi_field: number | Record<string, number>
list_field: number[]
baz: Baz
}
type Baz = {
state: State
required_nullable: string | null
}
type State = 'ON' | 'OFF'
const decodeConfig = (v: eon.Value): Config => {
const o = v.object()
return {
str_field: o.str_field.string(),
optional_bool: o.optional_bool?.boolean(),
multi_field: o.multi_field.oneOf(
// helper equivalent to: (v) => v.float()
eon.float(),
eon.objectOf(eon.float()),
),
list_field: o.list_field.list_of(eon.integer()),
baz: o.baz.decodeWith(decodeBaz),
}
}
const decodeBaz = (v: eon.Value): Baz => {
const o = v.object()
return {
state: o.state.decodeWith(decodeState),
required_nullable: o.required_nullable.oneOf(
eon.string(),
eon.null(),
),
}
}
const decodeState = (v: eon.Value): State => {
const s = v.string()
switch (s) {
case 'ON':
case 'OFF':
return s
default:
throw new eon.DecodeError(`Invalid State: ${s}`)
}
}
const main = () => {
const cfg = eon.parse(s).decodeWith(decodeConfig)
console.log(cfg)
}
Haskell
data Config = Config
{ strField :: Text
, optionalBool :: Maybe Bool
, multiField :: Either Double (Map Text Double)
, listField :: [Int]
, baz :: Baz
}
deriving (Show)
data Baz = Baz
{ state :: State
, requiredNullable :: Maybe Text
}
deriving (Show)
data State = ON | OFF
deriving (Show)
instance FromEON Config where
parseEON = parseObject "Config" $
-- the type to parse a field as is determined
-- automatically by FromEON instance resolution
Config
<$> parseField "str_field"
<*> parseField "optional_bool"
<*> parseField "multi_field"
<*> parseField "list_field"
<*> parseField "baz"
instance FromEON Baz where
parseEON = parseObject "Baz" $
Baz
<$> parseField "state"
<*> parseField "required_nullable"
instance FromEON State where
parseEON = parseText >>= \case
"ON" -> pure ON
"OFF" -> pure OFF
s -> fail $ "Invalid State: " <> show s
main :: IO ()
main = either (error . show) print $ Eon.decode s
Miscellaneous
These are some random musings I have that would be nice to include in the spec, but I’m not tied to them.
-
I would probably enforce that lists are indented one more level, e.g.
a_list: - a - b - c
and disallow
a_list: - a - b - c
since it annoys me that there are two ways, and two developers editing the same file might be inconsistent with the indentation
-
Smart quotes have bitten people before when copy-pasting, would probably want to require smart quotes or apostrophes to be quoted, always
Comparison to other formats
- XML is a bit verbose for me, and very difficult to read. The line between attributes and subnodes are a bit fuzzy, and it’s annoying to have to repeat the whole tag name to open and close the node. One thing that XML shares with EON, though, is that it also treats node contents as strings, with type deserialization being defined in the application.
- JSON is really good for transferring data between machines, but is not good for user generation + consumption. It doesn’t support comments, it doesn’t support trailing commas, and, while unambiguous in terms of what things parse to, is relatively hard to read.
- YAML is rather nice, but the spec includes way too many features, and has some confusing behaviors. Feature-wise, it supports anchors (making
&
and*
reserved characters), type-defining pragmas (which led to arbitrary code execution in the initial Python implementation), and automatic value conversions ([x,y,z]
=>["x", true, "z"]
), which are way too featureful, A. for what I want to support in a config file, and B. for library implementers to be completely spec-compliant. - TOML is really good for flat-ish configs, but if you want to nest lists and objects a few levels, TOML really starts to stretch to its limit.
- Typed configuration languages like Dhall or Nickel have a lot of syntax, are a bit difficult to learn, and, similar to YAML, is a bit too featureful than I need/want.
Yes, obligatory XKCD:
I think all of the above formats are really good for certain things, but I think the one thing currently lacking is a readable format (like YAML) with very simple semantics (unlike YAML). EON fills this gap, and introduces a new concept of a configuration format that doesn’t force users to know about types. If you have a field where you really need to allow the user to specify if they mean the number 1
or the string "1"
, then sure, go with YAML. But I would imagine 90% of the time, the user doesn’t need the distinction, which is where EON’s simplicity shines over YAML.
Next steps
What are your thoughts? Would you find this format useful? Vote and/or comment here!
Huge thanks to Paul Craddick + Greg Lorence for fleshing out some of these thoughts with me.
-
Yes, YAML 1.2 only allows
true
orfalse
now, but does your YAML parsing library using 1.2? If your library useslibyaml
, you’re still on 1.1 ↩ -
Actual implementation may avoid the intermediate JSON object for performance reasons. But this is the conceptual model that one should have. ↩