
XML tag reading

Nightinggale edited this page Oct 4, 2018 · 1 revision

The info classes all have a read(pXML) function, which reads xml data into the C++ class variables. This page explains the different ways to do that, how incorrect data in xml can cause bugs, and how to detect such data while reading.

Vanilla

GetChildXmlValByName

This simply reads the contents of a specific tag. It works like this:

pXML->GetChildXmlValByName(var pointer, "tag name", default = 0);

It's overloaded, meaning it works for int, bool, string etc. The variable is assigned the default value if the tag isn't present in xml. No automatic assert checks, but none are really needed.
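The default value rule can be sketched outside the game. This is a minimal stand-in, not the real loader: MockNode and readChild are hypothetical names, and the bool return value and the "1 means true" rule are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-in for the child tags of the current xml node.
typedef std::map<std::string, std::string> MockNode;

// int overload: stores the tag value, or iDefault when the tag is missing.
bool readChild(const MockNode& node, int* pVal, const char* szName, int iDefault = 0)
{
    MockNode::const_iterator it = node.find(szName);
    if (it == node.end())
    {
        *pVal = iDefault;
        return false;
    }
    *pVal = std::atoi(it->second.c_str());
    return true;
}

// bool overload: same rule, the tag content "1" counts as true (assumption).
bool readChild(const MockNode& node, bool* pVal, const char* szName, bool bDefault = false)
{
    MockNode::const_iterator it = node.find(szName);
    if (it == node.end())
    {
        *pVal = bDefault;
        return false;
    }
    *pVal = (it->second == "1");
    return true;
}
```

The compiler picks the overload from the pointer type of the first variable, which is why the call looks the same for every type.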

Reading Type tags

Sometimes the xml tag is a string, which is a type from another file. Vanilla does this in two lines:

pXML->GetChildXmlValByName( szTextVal, "BuildingType");
m_iBuilding = GC.getInfoTypeForString(szTextVal);

Don't do this. Use GetEnum instead (explained further down). The problem is that nothing is really verified. If the tag is missing or empty, the index is just set to -1. If it's present, the index is looked up, but there is no check for which file the string belongs to. The code example clearly expects a result of type BuildingTypes. If some xml modder writes a string of type BuildingClassTypes, the code will get that index, assume it's an index into buildings, and we have a bug because it points to the wrong building.

Another example could be somebody writing UNIT_FOOD instead of YIELD_FOOD. Reading accepts it and stores the index of UNIT_FOOD. Later, when the value is used, it could end up as an index into an array, possibly the vector of CvYieldInfos. If the index of UNIT_FOOD is NUM_YIELD_TYPES or higher, the game will crash with an out-of-bounds error.
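The UNIT_FOOD case can be reproduced with a tiny model. This is only a sketch: MockRegistry, isValidIndex and makeRegistry are hypothetical, the index values are made up, and the member getInfoTypeForString only mimics the behaviour described above (one shared lookup for all files, -1 when the string is unknown).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical miniature of a global type registry: one map shared by all files.
struct MockRegistry
{
    std::map<std::string, int> m_types;

    void add(const std::string& szType, int iIndex) { m_types[szType] = iIndex; }

    // Returns -1 when the string is unknown, otherwise the index.
    // Nothing records which file the string came from.
    int getInfoTypeForString(const std::string& szType) const
    {
        std::map<std::string, int>::const_iterator it = m_types.find(szType);
        return it == m_types.end() ? -1 : it->second;
    }
};

// The bounds check the two line pattern leaves out.
bool isValidIndex(int iIndex, int iNumEntries)
{
    return iIndex >= 0 && iIndex < iNumEntries;
}

MockRegistry makeRegistry()
{
    MockRegistry registry;
    registry.add("YIELD_FOOD", 0);
    registry.add("YIELD_LUMBER", 1);
    registry.add("UNIT_FOOD", 40); // made-up index in the unit file
    return registry;
}
```

With two yields in this model, looking up UNIT_FOOD happily returns 40, and only an explicit check like isValidIndex reveals that it can't be used as a yield index.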

Arrays

Vanilla does this with SetVariableListTagPair. It sets a C style array to a specific length and assigns some default value to it. After that it will load xml data and write it into the array.

It works when used correctly, but just like GC.getInfoTypeForString, it leaves a lot of room to mess up in xml, possibly crashing the game. It also doesn't provide useful assert messages, if it asserts at all before crashing.

Use InfoArray instead (explained further down).

Go to child list

if (gDLL->getXMLIFace()->SetToChildByTagName(pXML->GetXML(), "tag name"))
{
    // some read code here
    gDLL->getXMLIFace()->SetToParent(pXML->GetXML());
}

This code opens a tag and makes it possible to read its child tags, meaning we can use xml as a file system like hierarchy. SetToParent returns to the starting point and is very important. You will likely run into problems if the function returns at a different level than the one it was called from.

The reason why it's in an if statement is that SetToChildByTagName returns false if the tag isn't present, in which case going back to the parent would mess up xml reading.

This layout can be used to group tags together if they are at least somewhat related and avoid having a long list of not really related tags in random order. It can also be used for various types of lists, though list reading should ideally be done with InfoArray.
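The rule about always returning at the same level can be checked with a small model of the xml cursor. This is a sketch under assumptions: MockXml and countChildrenOf are hypothetical, and the flat node list is just a convenient way to fake a tree, not how the game stores xml.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flat xml tree: every node stores the index of its parent.
struct MockNode
{
    std::string name;
    int parent; // -1 for the root
};

class MockXml
{
public:
    MockXml() : m_iCurrent(0) { addNode("root", -1); }

    int addNode(const std::string& name, int parent)
    {
        MockNode node = { name, parent };
        m_nodes.push_back(node);
        return static_cast<int>(m_nodes.size()) - 1;
    }

    // Moves into the first child with the given name.
    // Returns false and leaves the cursor untouched if there is none.
    bool setToChildByTagName(const std::string& name)
    {
        for (std::size_t i = 0; i < m_nodes.size(); ++i)
        {
            if (m_nodes[i].parent == m_iCurrent && m_nodes[i].name == name)
            {
                m_iCurrent = static_cast<int>(i);
                return true;
            }
        }
        return false;
    }

    void setToParent()
    {
        if (m_nodes[m_iCurrent].parent >= 0) m_iCurrent = m_nodes[m_iCurrent].parent;
    }

    int countChildren() const
    {
        int iCount = 0;
        for (std::size_t i = 0; i < m_nodes.size(); ++i)
            if (m_nodes[i].parent == m_iCurrent) ++iCount;
        return iCount;
    }

    int current() const { return m_iCurrent; }

private:
    std::vector<MockNode> m_nodes;
    int m_iCurrent;
};

// Same shape as the snippet above: setToParent only runs if the child was entered.
int countChildrenOf(MockXml& xml, const std::string& szTag)
{
    int iCount = -1; // -1 means the tag is missing
    if (xml.setToChildByTagName(szTag))
    {
        iCount = xml.countChildren();
        xml.setToParent();
    }
    return iCount;
}
```

Whether the tag exists or not, the cursor ends where it started, so the caller can keep reading the next tag at the same level.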

Modded interface

The CivEffect branch has added some functions/classes to deal with the shortcomings of vanilla.

GetEnum

GetEnum loads a string and looks up its index in the file it's from. It's called like this:

pXML->GetEnum(const char* szType, T *eEnum, const char* szTagName, bool bMandatory = true)
  • szType should always be getType() and is used for assert messages.
  • eEnum can be any enum which is assigned to a file, like UnitTypes.
  • szTagName is the name of the tag to read.
  • bMandatory tells if an assert should be used if no index is found.

It reads the string A from the xml tag. It then loops through the file in question to locate the index where getType() == A. This ensures that the xml modder has used the correct file. The enum can be of various types like UnitTypes, BuildingTypes, YieldTypes etc. All files listed in the enum JITarrayTypes are supported, which currently means 42 files.

It asserts on failure to locate an index if bMandatory is set. It also asserts if the tag contains a string, other than empty or NONE, for which no index can be found. The assert messages aim to be so detailed that xml modders should be able to figure out what went wrong without any help other than the xml file itself and the assert message.
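The lookup and the assert rules can be sketched as follows. This is one reading of the rules above, not the actual GetEnum code: mockGetEnum is hypothetical, a file is modelled as a plain list of type strings, and messages are collected in a vector instead of triggering real asserts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Looks up szValue in one specific file only. Returns the index or -1.
// Problems are appended to errors instead of asserting (sketch only).
int mockGetEnum(const std::vector<std::string>& file, const std::string& szValue,
                const std::string& szType, const std::string& szTagName,
                bool bMandatory, std::vector<std::string>& errors)
{
    for (std::size_t i = 0; i < file.size(); ++i)
    {
        if (file[i] == szValue) return static_cast<int>(i);
    }

    const bool bBlank = szValue.empty() || szValue == "NONE";
    if (!bBlank)
    {
        // A real string which isn't in the expected file is always reported.
        errors.push_back(szType + ": tag " + szTagName + " contains " + szValue
                         + ", which is not in the expected file");
    }
    else if (bMandatory)
    {
        errors.push_back(szType + ": mandatory tag " + szTagName + " is missing, empty or NONE");
    }
    return -1;
}
```

Because the search only covers the expected file, UNIT_FOOD in a yield tag is reported right away instead of silently turning into an index from the wrong file.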

InfoArray

This is a class written to handle arrays read from xml. It stores a list of tokens, each consisting of one to four 16 bit ints. The type of each int and how many are used are set by the InfoArray constructor. It uses JITarrayTypes for types and extends it to various non-xml file types as well.

Data is read with a get function, which takes two arguments. The first is the token index, the second is the index of the variable within that token.

For instance, if the constructor is given JIT_ARRAY_PROFESSION, it will be a list of professions and the second argument in get is always 0. If the constructor is given JIT_ARRAY_PROFESSION, JIT_ARRAY_INT, the second argument becomes 0 = profession, 1 = int.
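The token layout and the two argument get can be modelled like this. It's only a sketch of the idea: MockInfoArray, addToken and getLength are hypothetical names, short stands in for a 16 bit int, and returning -1 when out of range is a choice made for the sketch, not a statement about the real class.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: tokens of 1-4 small ints stored back to back.
class MockInfoArray
{
public:
    explicit MockInfoArray(int iDimensions) : m_iDimensions(iDimensions) {}

    // Appends one token. pValues must point to m_iDimensions values.
    void addToken(const short* pValues)
    {
        m_data.insert(m_data.end(), pValues, pValues + m_iDimensions);
    }

    int getLength() const
    {
        return static_cast<int>(m_data.size()) / m_iDimensions;
    }

    // iToken selects the entry, iVar selects the variable inside that entry.
    int get(int iToken, int iVar) const
    {
        if (iToken < 0 || iToken >= getLength() || iVar < 0 || iVar >= m_iDimensions)
        {
            return -1; // sketch only: out of range
        }
        return m_data[iToken * m_iDimensions + iVar];
    }

private:
    int m_iDimensions;
    std::vector<short> m_data;
};
```

With two variables per token, get(i, 0) would be the profession and get(i, 1) the int of entry i.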

When calling the read function, it will read the given tag and then walk through the child tags to read the tokens. It just reads the tags in order and doesn't care much about the layout.

Take for instance an InfoArray of type JIT_ARRAY_PROFESSION.

<professions>
    <profession>PROFESSION_FARMER</profession>
    <profession>PROFESSION_MINER</profession>
</professions>

Now take an InfoArray of type JIT_ARRAY_PROFESSION, JIT_ARRAY_PROMOTION:

<professions>
    <entry>
        <profession>PROFESSION_FARMER</profession>
        <promotion>PROMOTION_A</promotion>
    </entry>
    <entry>
        <profession>PROFESSION_FARMER</profession>
        <promotion>PROMOTION_B</promotion>
    </entry>
</professions>

This is read as JIT_ARRAY_PROFESSION, JIT_ARRAY_PROMOTION, JIT_ARRAY_PROFESSION, JIT_ARRAY_PROMOTION, hence two entries. However, the same can also be written:

<professions>
    <entry>
        <profession>PROFESSION_FARMER</profession>
        <promotions>
            <promotion>PROMOTION_A</promotion>
            <promotion>PROMOTION_B</promotion>
        </promotions>
    </entry>
</professions>

This works because once a token is full, the reader creates the next one, but reuses the start of the token from the parent. In this case promotions was entered with PROFESSION_FARMER already set, meaning all child tags of promotions get a PROFESSION_FARMER prefix. The result is the same: two entries, both with farmer, one with A and one with B.
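The prefix rule can be tried out with a toy reader. This is a sketch of the idea only, not the InfoArray read code: readTokens is hypothetical and the xml is shrunk to a string where '(' opens a tag with children, ')' closes it and any other character is a leaf tag with that character as its value. F stands for PROFESSION_FARMER, a and b for PROMOTION_A and PROMOTION_B.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Reads tokens of iDimensions values from the toy layout string.
std::vector<std::string> readTokens(const std::string& layout, std::size_t iDimensions)
{
    std::vector<std::string> tokens;
    std::vector<std::string> prefixes; // token content at the time each tag was opened
    std::string current;               // token being filled right now

    for (std::size_t i = 0; i < layout.size(); ++i)
    {
        const char c = layout[i];
        if (c == '(')
        {
            // Child tags start from whatever the parent has collected so far.
            prefixes.push_back(current);
        }
        else if (c == ')')
        {
            if (!prefixes.empty())
            {
                // Leaving a tag restores what the parent had when it opened the tag.
                current = prefixes.back();
                prefixes.pop_back();
            }
        }
        else
        {
            current += c;
            if (current.size() == iDimensions)
            {
                // Token is full: store it and restart from the parent's prefix.
                tokens.push_back(current);
                current = prefixes.empty() ? std::string() : prefixes.back();
            }
        }
    }
    return tokens;
}
```

In this model the two entry layout is "((Fa)(Fb))" and the nested layout is "((F(ab)))". Both give the tokens Fa and Fb.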

Reading xml will assert if the strings aren't from the specified files, and the assert provides a useful error message. This makes it hard to cause problems in xml without triggering an assert.

Using an InfoArray in C++ usually means looping through the entries. You can ask for a specific entry, but it's not optimized for that. If you really want to be able to ask for specific entries, consider using BoolArray or JustInTimeArray. Both can take an InfoArray as argument and fill out data from it. This will allow looking up specific entries without looping through all entries.
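The difference between looping and a filled out lookup array can be sketched with standard containers. This is not the BoolArray or JustInTimeArray code: containsByLoop and toLookup are hypothetical, a list of ints stands in for a one variable InfoArray and std::vector<bool> stands in for the lookup array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Asking a list for one specific entry means walking through it.
bool containsByLoop(const std::vector<int>& entries, int iIndex)
{
    for (std::size_t i = 0; i < entries.size(); ++i)
    {
        if (entries[i] == iIndex) return true;
    }
    return false;
}

// Fills a lookup array once, after which each question is a single array access.
// iNumTypes is the number of entries in the file the indexes refer to.
std::vector<bool> toLookup(const std::vector<int>& entries, int iNumTypes)
{
    std::vector<bool> lookup(static_cast<std::size_t>(iNumTypes), false);
    for (std::size_t i = 0; i < entries.size(); ++i)
    {
        const int iEntry = entries[i];
        if (iEntry >= 0 && iEntry < iNumTypes) lookup[iEntry] = true;
    }
    return lookup;
}
```

The loop is fine when every entry is needed anyway, while the lookup array pays off when the same question is asked many times.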