Weird "require" error during server init

Summary:
When loading a save game, a server init script gets aborted when one of its require statements returns, for some completely inexplicable reason, the string value of an error experienced earlier during loading. (The aborting when causing a Lua error by treating a string as a table is understandable; the condition that allows this situation to happen is not.)

Steps to reproduce:
Not sure exactly, but I could attach a save game that, depending on what mods were enabled, would reliably cause it. My hunch is that it has something to do with missing entities from manifests and/or missing components on existing entities (e.g., the component was removed from the entity in an update to the mod, but the saved entity still has the component).

Expected Results:
Running require on valid paths should work all the time, including during normal server init script running (e.g., after 'radiant:required_loaded'). The paths load fine under other conditions, so it’s not a problem with that.

Actual Results:
A failed require causes the server init script to fail out, causing a chain reaction of failures because other things aren’t loaded properly.

Notes:
I don’t know how this could be happening without some sort of low-level corruption, like a buffer overflow. I’d love to be wrong on that.

Attachments: Will attach if someone would actually get use out of it. In this case, when require was run on the ace_farming_task_group lua file, the resulting string was this:

stonehearth/components/portal/portal_component.lua:9: attempt to index local 'json' (a nil value)

which was an error experienced from the earlier loading of an entity whose mod was disabled. Once the game finished loading (in a broken-mod state, since many other monkey-patches weren’t applied), running require on that file from the Lua debug console worked normally.

Version Number and Mods in use: Latest SH, latest unstable ACE, and various other mods in this save file.

Maybe someone particularly technically-minded, like @RepeatPan or @max99x, would know what’s going on.

require in Stonehearth is overloaded in radiant/lib/env.lua and not your vanilla require, as it has some magic included for whole path stuff.

I don’t quite understand the error though - a require at some point returns a string instead of the mod? Or the error message? Latter could point towards a pcall going haywire somewhere, and would indicate that the requiree failed.

You can also try to replace this require with a proper _host:require, which is what require internally calls. You’ll need to specify the absolute path though (i.e. including the mod name, like radiant.lib.env). If that also returns a string, and also an error, then I guess the internals are broken.

However, figuring this out will be quite underwhelming - since mods are allowed (I’d guess) to return strings and are not bound by technical limitations to return tables, you can’t just overload require and say if type(mod) == 'string' then error(mod) end - unless you wish to add this as a convention.

Edit: There’s one thing that is dangerous when doing this kind of requiring external scripts: You don’t know what they are doing whenever they are included (e.g. static initializers). Some may assume that the game is already running/some service is already available, and fail if this is not the case (i.e. are not using event-based initialisation).

Hard to tell, but it’s possible that passing unexpected things to error() will mess up the Lua stack. The script host tries to stay alive through errors, but really, you’re playing with undefined behavior at that point.

Earlier during the loading process, the game tried to load some component data for an entity in the save file whose mod was disabled/non-existent, and thus whose json data didn’t exist. The component caused a Lua error, which was then printed in the log by whatever C++ process is running the Lua scripts (and later in the Lua error window in-game after loading finished), and continued trying to load the rest of the game. This included running about 2/3 of the monkey-patching procedure for ACE and inexplicably stumbling on a very simple patch file that works fine on its own, returning the previous error string in place of the normal result of require.

I’m not trying to change general require loading functionality or anything, and I’m not worrying about trying to load someone else’s scripts, I just want my own normal patch files to run fine and not return completely unrelated strings instead. :stuck_out_tongue:

Here’s the code for monkey-patching (monkey_patches is a locally defined table that isn’t modified by anything):

   for from, into in pairs(monkey_patches) do
      local monkey_see = require('monkey_patches.' .. from)
      local monkey_do = radiant.mods.require(into)
      radiant.log.write_('stonehearth_ace', 0, 'ACE server monkey-patching sources \'' .. from .. '\' => \'' .. into .. '\'')
      radiant.log.write_('stonehearth_ace', 0, 'ACE server monkey-patching data \'' .. tostring(monkey_see) .. '\' => \'' .. tostring(monkey_do) .. '\'')
      radiant.mixin(monkey_do, monkey_see)
   end

Here’s a partial result when returning to the main menu from the mods screen:

2019-02-07 13:51:39.237244 |       server |  0 |                             mod stonehearth_ace | ACE server monkey-patching sources 'ace_collection_quest_shakedown' => 'stonehearth.services.server.game_master.controllers.scripts.collection_quest_shakedown'
2019-02-07 13:51:39.237244 |       server |  0 |                             mod stonehearth_ace | ACE server monkey-patching data 'table: 000002AF91852670' => 'table: 000002AF916A8550'
2019-02-07 13:51:39.239201 |       server |  0 |                             mod stonehearth_ace | ACE server monkey-patching sources 'ace_farming_task_group' => 'stonehearth.ai.task_groups.farming_task_group'
2019-02-07 13:51:39.239201 |       server |  0 |                             mod stonehearth_ace | ACE server monkey-patching data 'table: 000002AFA064ADA0' => 'table: 000002AF8FEA2920'

Here’s what happens during that same part when loading save game with some unrelated component json errors:

2019-02-07 13:53:55.835308 |       server |  0 |                             mod stonehearth_ace | ACE server monkey-patching sources 'ace_collection_quest_shakedown' => 'stonehearth.services.server.game_master.controllers.scripts.collection_quest_shakedown'
2019-02-07 13:53:55.835308 |       server |  0 |                             mod stonehearth_ace | ACE server monkey-patching data 'table: 000002B01B7F3B20' => 'table: 000002B01B7F3920'
2019-02-07 13:53:55.837261 |       server |  0 |                             mod stonehearth_ace | ACE server monkey-patching sources 'ace_farming_task_group' => 'stonehearth.ai.task_groups.farming_task_group'
2019-02-07 13:53:55.838238 |       server |  0 |                             mod stonehearth_ace | ACE server monkey-patching data '...components/resource_node/resource_node_component.lua:10: attempt to index local 'json' (a nil value)' => 'table: 000002B01B7F3FA0'
2019-02-07 13:53:55.839215 |       server |  1 |                         lua.code | generating traceback...
2019-02-07 13:53:55.839215 |       server |  0 |                         lua.code | -- Script Error (lua) Begin ------------------------------- 
2019-02-07 13:53:55.839215 |       server |  0 |                         lua.code |    radiant/lib/util.lua:476: bad argument #1 to 'pairs' (table expected, got string)
2019-02-07 13:53:55.839215 |       server |  0 |                         lua.code |    stack traceback:
2019-02-07 13:53:55.839215 |       server |  0 |                         lua.code |    	[C]: ?
2019-02-07 13:53:55.839215 |       server |  0 |                         lua.code |    	[C]: in function 'pairs'
2019-02-07 13:53:55.839215 |       server |  0 |                         lua.code |    	radiant/lib/util.lua:476: in function 'mixin'
2019-02-07 13:53:55.839215 |       server |  0 |                         lua.code |    	stonehearth_ace/stonehearth_ace_server.lua:86: in function 'monkey_patching'
2019-02-07 13:53:55.839215 |       server |  0 |                         lua.code |    	stonehearth_ace/stonehearth_ace_server.lua:120: in function <stonehearth_ace/stonehearth_ace_server.lua:119>
2019-02-07 13:53:55.839215 |       server |  0 |                         lua.code |    	[C]: ?
2019-02-07 13:53:55.839215 |       server |  0 |                         lua.code |    	[C]: in function 'trigger'
2019-02-07 13:53:55.839215 |       server |  0 |                         lua.code |    	radiant/modules/events.lua:55: in function <radiant/modules/events.lua:55>
2019-02-07 13:53:55.839215 |       server |  0 |                         lua.code | -- Lua Error End   ------------------------------- 

It doesn’t matter if you are trying to change how require works, because require already works differently than in the lua manual. No intervention is required for it to be different. I think it’s important to point it out, because the whole process of how lua usually loads a file/script is possibly not applying here.

The error message later on is irrelevant and is caused by you trying to use a string as a table, as you’ve already identified. The first one is more interesting though, and I’ll assume that it’s the Stonehearth vanilla file.

In the end, this is all just guesswork without a simple, reproducible test case that is as small as possible (i.e. specifically not all of ACE).

Assuming we’re talking about this file, then I’m not sure how this patch could work in the first place.

You aren’t returning anything. In the best case, the require would therefore return nothing; which lua will pad with nils. You’ll have a hard time mixing in anything because monkey_see is nil. Not entirely sure how Stonehearth’s mixin deals with that.

However, you get an error message instead, which is odd. I suppose it could be caused by some sort of internal corruption/error (your mod returns nothing, so require is returning whatever error message happens to be on the stack at this time - maybe it has a mechanic that says “if no result, return last_error”, assuming that no result == an error somewhere).

To possibly fix this error, add a return. Assuming that declare_somethingsomething is modifying the original object, rather than creating a deep-copy of the modified one, it’s not even necessary to monkey patch anything - load the class, call the method, and then call it a day. It’s a simple method invocation, and trying to mixing/patch anything here seems like a terrible idea.

But as you can see, in other/normal circumstances it works just fine with that very file. It’s kind of a hacky (isn’t everything about this?) but simple way of patching task groups that generally works and easily fits into the existing system.

I tried removing that particular file from the patch list, expecting it to simply error out on the next one, but instead it worked fine with that save file. Adding a return of an empty table also seems to work fine. Perhaps setting up a separate non-monkey-patching function for this sort of modification in the init script would make more sense. So it does seem to be an issue with returning nil from that function and automatically replacing the nil with something else somehow.