Week 8: Submodule Implementation

Hi everyone, welcome to the eighth blog post of my GSoC'25 journey. The week started with a lot of hustle and confusion about the correct approach for Submodule Implementation, but it ended on a good note with a clear direction and a sound approach.

I started this week by rejecting the ExternalSymbol approach because of an edge case (link) where the Submodule Implementation lacked the Function Signature, args etc. Here, we could not directly point the Parent Module Function to the Submodule Function, as it is not complete and would not execute properly. So, we decided to move back to our original approach, where we directly modify the Parent Module Function.

It took me around 2 days to implement this approach. The main points of this approach were:

In each Parent Module Function, we first change its deftype from interface to implementation.
Then, we add the variables and ExternalSymbols created in the Submodule Function to the Symbol Table of the Parent Module Function.
We also add the variables and ExternalSymbols of the Submodule itself to the Symbol Table of the Parent Module.
We then populate the body of the Parent Module Function by duplicating the statements of the Submodule Function, so that the symbols involved resolve in the new scope.
Finally, the loaded from mod flag of the Parent Module is set to false so that the updated Parent Module ASR is saved again.
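
The steps above can be sketched in a few lines of Python. This is only a toy model to make the order of operations concrete: the `Module` and `Function` classes, their fields, and `merge_submodule` are hypothetical names invented for this illustration and are not LFortran's actual ASR classes or API.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    deftype: str  # "interface" or "implementation" (toy stand-in for the ASR deftype)
    symtab: dict = field(default_factory=dict)
    body: list = field(default_factory=list)


@dataclass
class Module:
    name: str
    symtab: dict = field(default_factory=dict)
    functions: dict = field(default_factory=dict)
    loaded_from_mod: bool = True


def merge_submodule(parent: Module, sub: Module) -> None:
    """Copy a submodule's implementations into its parent module (toy model)."""
    for fname, sub_fn in sub.functions.items():
        parent_fn = parent.functions.get(fname)
        if parent_fn is None or parent_fn.deftype != "interface":
            continue  # only interfaces declared in the parent are filled in
        # Step 1: the interface becomes an implementation.
        parent_fn.deftype = "implementation"
        # Step 2: symbols local to the submodule function move into the parent function.
        for sym_name, sym in sub_fn.symtab.items():
            parent_fn.symtab.setdefault(sym_name, sym)
        # Step 4: duplicate the statements so they live in the parent function.
        parent_fn.body = list(sub_fn.body)
    # Step 3: symbols declared at submodule level move into the parent module.
    for sym_name, sym in sub.symtab.items():
        parent.symtab.setdefault(sym_name, sym)
    # Step 5: mark the parent as modified so it is saved again.
    parent.loaded_from_mod = False
```

Note that everything here mutates the parent module after it has been built, which is exactly the property that matters for the discussion that follows.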

The Pull Request that implements this approach is Pull #7961. This approach was able to handle submodules in a single file and in multiple files, but it failed for Separate Compilation. In Separate Compilation, we directly generate the object code of the file we are currently compiling, and this object code can't be modified once generated. Thus, if the Module and Submodule are in separate files, the Parent Module's object code already exists by the time the Submodule is compiled, and the Parent Module Function can no longer be updated. This was a major drawback, which is why we finally rejected this approach.

After this, I researched a bit to find a better approach for Submodules but was not able to come up with one. After a detailed discussion with Pranav Goswami, we agreed upon an approach where the linker directly links the LLVM Functions of the module and submodule. So, an LLVM Function is generated for the Module Interface, which is left undefined as it lacks the actual implementation. Similarly, an LLVM Function is generated for the Submodule Implementation, which is complete with the actual implementation. For the linker to link these LLVM Functions, their LLVM function names (and therefore their mangling prefixes) must be exactly the same. So, we decided to use the Parent Module name as the Mangling Prefix and update the LLVM Function names with this prefix. This needs to be done only for Functions or Subroutines declared with the module keyword, because the module keyword means that the Function or Subroutine is either a Module Interface or a Submodule Implementation. We don't want to affect other Subroutines or Functions with these changes.
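
To illustrate why the shared prefix matters, here is a small Python sketch of the idea. The `__module_<prefix>_<name>` scheme, the `mangle` and `unresolved_symbols` helpers, and the module names are all assumptions made up for this example; they are not taken from LFortran's code, and the real linker of course works on object files rather than Python sets.

```python
def mangle(name, enclosing, parent_module=None, has_module_keyword=False):
    """Build a toy mangled name.

    Procedures declared with the `module` keyword use the parent module's
    name as prefix; everything else keeps the enclosing unit's name.
    """
    if has_module_keyword and parent_module is not None:
        prefix = parent_module
    else:
        prefix = enclosing
    return f"__module_{prefix}_{name}"


def unresolved_symbols(objects):
    """Return the names some object needs but no object defines (toy linker)."""
    defined = set()
    for obj in objects:
        defined |= obj["defined"]
    missing = set()
    for obj in objects:
        missing |= obj["undefined"] - defined
    return sorted(missing)


# Interface in parent module `shapes`, implementation in submodule `shapes_impl`.
iface = mangle("area", enclosing="shapes", parent_module="shapes", has_module_keyword=True)
impl = mangle("area", enclosing="shapes_impl", parent_module="shapes", has_module_keyword=True)

main_obj = {"defined": set(), "undefined": {iface}}   # program calls the interface
sub_obj = {"defined": {impl}, "undefined": set()}     # submodule provides the body
```

In this model `iface` and `impl` are the same string, so `unresolved_symbols([main_obj, sub_obj])` comes back empty. If the submodule procedure were mangled with its own name instead, the call in the program would stay unresolved, which is the failure the shared prefix avoids.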

From then onwards, I worked on implementing this approach. It was not a difficult task and I was able to implement it quite easily. As part of this approach, I also made changes in the Unused Functions pass so that Submodule Implementations are not removed from the ASR before the LLVM phase. The merged Pull Request that implements this complete approach is Pull #8010. I also handled class interfaces in the Parent Module, so that its Submodule Implementation links to them, by making a small modification in the LLVM phase. To avoid any regression, I added integration tests (regular ones and ones with Separate Compilation) to cover all the progress made in Submodule Implementation. The merged Pull Request for these is Pull #8013.

With these changes, our Submodules were executing correctly in a single file, and in multiple files with separate compilation. For multiple files without separate compilation, there was still a problem. In this case, we were not able to generate the LLVM Function for the Submodule Implementation because the Program only imports the Parent Module, which has no link to its Submodule. In a meeting with Ondrej Certik, we identified multiple approaches to tackle this:

For each module we load, we scan for submodule files, using the parent module name to find them, and load them.
We store, for each parent module, which submodules it has, either by modifying the parent module file or by storing this information separately in another file.
We give the user an option to directly provide the submodule file to be used when compiling the main program, and then we simply load it.

Of all these approaches, we agreed to settle on the third one for now. Next week, I am planning to work on this approach so that we can support submodules in separate files without separate compilation. We will also need to extend this approach to the CMake build system, maybe by adding a new compiler option like --add-submodule to provide the submodule to be used when compiling the main program through CMake. I will work on the CMake part once I complete the main implementation of the third approach next week.
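
As a rough picture of what the third approach could look like from the command line, here is a small Python sketch using `argparse`. It is not LFortran's actual driver: the flag name is just the `--add-submodule` idea mentioned above, which does not exist yet, and the function names and file names are placeholders for illustration.

```python
import argparse


def parse_args(argv):
    """Parse a toy compiler command line with a repeatable submodule option."""
    parser = argparse.ArgumentParser(prog="compiler-sketch")
    parser.add_argument("source", help="file containing the main program")
    parser.add_argument(
        "--add-submodule",
        action="append",      # the option may be given more than once
        default=[],
        dest="submodules",
        metavar="FILE",
        help="submodule file to load alongside the main program",
    )
    return parser.parse_args(argv)


def files_to_load(argv):
    """Return the main source followed by every user-supplied submodule file."""
    args = parse_args(argv)
    return [args.source] + args.submodules
```

For example, `files_to_load(["main.f90", "--add-submodule", "shapes_impl.f90"])` gives `["main.f90", "shapes_impl.f90"]`. The point of this option is that the compiler does not have to discover submodules on its own; the user states which ones belong to the build.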

Overall, I worked for 29 hours this week and enjoyed the work that I did in the eighth week. I would like to thank Ondrej Certik, Harshita Kalani, Pranav Goswami and all the other LFortran members for their reviews and suggestions, which helped me a lot in tackling new difficulties. I am looking forward to continuing my journey next week with the same excitement and enthusiasm, and I plan to complete my proposed tasks as quickly as I can.