Custom chunk optimization, how code splitting works in Rspack/Webpack

Zack Jackson
10 min read · Sep 3, 2024

Chunk splitting is one of the most powerful features of modern bundlers like Webpack and Rspack, yet its inner workings and how to programmatically optimize or modify the module graph remain relatively obscure.

Module Federation heavily relies on the robust chunk splitting mechanisms provided by these bundlers, but there are often tricky issues when it comes to combining Federation with user-defined chunk splitting configurations — certain settings can inadvertently disrupt code sharing and cause unexpected issues.

I have been working on a pull request to address a few long-standing gripes I've had, and this led me down a three-week rabbit hole into how the optimization actually works.

Optimization stages in Compilation

Webpack has many hooks, and the Compilation class has even more. This is where we begin.

It all starts with seal

You can look at the seal method on Webpack's Compilation class in lib/Compilation.js.

Sealing the build is the “slowest” part of a compile. It's also where most of the good stuff happens, after we link and transform your modules into their final forms. In the future we have plans for Rspack to perform distributed compilation since, in theory, the module graph can be generated elsewhere and sealed by another process.

Here’s a breakdown of what the seal method does:

1: Initialization:

  • The function starts by defining finalCallback, which clears various queues (factorizeQueue, buildQueue, rebuildQueue, processDependenciesQueue, addModuleQueue) and then calls the provided callback.
  • It creates a new ChunkGraph, which is a data structure that holds information about how modules are split into chunks.

2: Sealing the Compilation:

  • The seal hook is called, signaling that the compilation is now transitioning into the final stages.

3: Optimize Dependencies:

  • The function enters a loop that repeatedly calls the optimizeDependencies hook until no more optimizations can be applied. This stage allows plugins to optimize the module dependency graph.

4: Chunk Creation:

  • The beforeChunks hook is called before chunk creation begins.
  • The moduleGraph is frozen, meaning that no more changes should be made to it unless the compilation is unsealed again (see the unseal hook).
  • Chunks are created for each entry point in the application. The code assigns modules to chunks, handles chunk dependencies, and associates chunks with their entry points.
  • If any entry point has circular dependencies or invalid configurations (like using both dependOn and runtime), errors are logged.

5: Optimize Chunks:

  • Various optimization hooks are called in sequence: optimize, optimizeModules, optimizeChunks, optimizeTree, and optimizeChunkModules.
  • These hooks allow plugins to further optimize the chunks, such as by merging, splitting, or reordering them.

6: Hashing and Code Generation:

  • The function calculates hashes for modules and chunks, which are used to determine whether the content has changed (useful for caching).
  • Code generation occurs where modules are transformed into the final JavaScript code that will be included in the bundles.

7: Runtime Requirements:

  • The function processes runtime requirements, ensuring that all necessary runtime code (like module loaders) is included.

8: Asset Creation:

  • Assets (the final output files) are created for both modules and chunks.
  • The processAssets hook is called, allowing plugins to modify or add assets before they are finalized.

9: Finalization:

  • The afterSeal hook is called, signaling the end of the sealing process. If any additional sealing is required, the process can be repeated.
  • Finally, the callback is invoked, and the compilation process is complete.
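
To watch these stages fire in order, a small plugin can tap the hooks named in the breakdown and log each one. This is only a sketch: the class name SealOrderLoggerPlugin and its log argument are made up for illustration, and the hook list covers just the hooks mentioned above. Hooks that a given bundler version does not expose are skipped.

```javascript
// Hypothetical helper plugin: logs the seal-phase hooks as they fire.
// Hook names come from the breakdown above; the class name is illustrative.
const SEAL_HOOKS = [
  'seal',
  'optimizeDependencies',
  'beforeChunks',
  'afterChunks',
  'optimize',
  'optimizeModules',
  'optimizeChunks',
  'optimizeTree',
  'optimizeChunkModules',
  'processAssets',
  'afterSeal',
];

class SealOrderLoggerPlugin {
  constructor(log = console.log) {
    this.log = log;
  }

  apply(compiler) {
    compiler.hooks.thisCompilation.tap('SealOrderLoggerPlugin', (compilation) => {
      for (const hookName of SEAL_HOOKS) {
        const hook = compilation.hooks[hookName];
        if (!hook) continue; // skip hooks this bundler version does not expose

        hook.tap('SealOrderLoggerPlugin', () => {
          this.log(`[seal] ${hookName}`);
          // Deliberately return undefined. In Webpack, a truthy return value
          // from optimizeDependencies, optimizeModules, or optimizeChunks
          // asks the bundler to run that hook again.
        });
      }
    });
  }
}
```

Adding new SealOrderLoggerPlugin() to the plugins array of a config should print one line per hook for each compilation.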

Where split chunks takes place

The optimization phase in the compilation process follows a specific order of hooks. Since compilation.hooks.seal has already been called, this indicates that the bundler should not expect any additional modules to be added to the module graph at this point. Instead, our focus shifts to modifying the connections between these modules and the chunks they belong to.

  1. compilation.hooks.optimizeDependencies: Immediately after sealing the build, you can hook into this phase to perform actions like setting the used exports, dead code elimination, or tree shaking. This involves updating information about who uses each module, what exports are utilized, and so on.
  2. compilation.moduleGraph.freeze: At this stage, the bundler freezes the module graph, ensuring no further changes are made to its structure or usage.
  3. compilation.hooks.afterChunks: Unoptimized initial chunks are created, including those for dynamic imports and entry points, but no custom chunk splitting has occurred yet.
  4. compilation.hooks.optimize: The optimization phase is just beginning. If you need to perform any actions as optimization starts, this is the place to do it.
  5. compilation.hooks.optimizeModules: Here, you can make changes to the modules, such as replacing one module with another or removing unused modules.
  6. compilation.hooks.optimizeChunks: This is where the actual chunk splitting takes place.
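
One way to see the effect of this ordering is to count chunks at afterChunks, before any splitting, and again once every optimizeChunks tap has finished. The sketch below assumes Webpack 5's afterOptimizeChunks hook for the second reading. The plugin name and the shape of the report object are hypothetical.

```javascript
// Hypothetical plugin: compares the chunk count before and after the
// optimizeChunks phase. Assumes Webpack 5 hook names.
class ChunkCountDiffPlugin {
  constructor(report = console.log) {
    this.report = report;
  }

  apply(compiler) {
    compiler.hooks.thisCompilation.tap('ChunkCountDiffPlugin', (compilation) => {
      let before = 0;

      // afterChunks: entry and dynamic-import chunks exist, nothing is split yet.
      compilation.hooks.afterChunks.tap('ChunkCountDiffPlugin', (chunks) => {
        before = Array.from(chunks).length;
      });

      // afterOptimizeChunks: every optimizeChunks tap, including chunk
      // splitting, has already run.
      compilation.hooks.afterOptimizeChunks.tap('ChunkCountDiffPlugin', (chunks) => {
        const after = Array.from(chunks).length;
        this.report({ before, after, delta: after - before });
      });
    });
  }
}
```

A positive delta means optimization created chunks, for example through cache groups. A negative one means chunks were merged or removed.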

The SplitChunksPlugin

The actual SplitChunksPlugin is quite large and complex, so I've written a basic version to demonstrate how the splitChunks.cacheGroups configuration in Webpack/Rspack is implemented.

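// Optimization stage constants live in webpack/lib/OptimizationStages
// (STAGE_BASIC = -10, STAGE_DEFAULT = 0, STAGE_ADVANCED = 10).
const { STAGE_ADVANCED } = require('webpack/lib/OptimizationStages');

const PLUGIN_NAME = 'BasicSplitChunksPlugin';

class BasicSplitChunksPlugin {
  apply(compiler) {
    compiler.hooks.thisCompilation.tap(PLUGIN_NAME, (compilation) => {
      compilation.hooks.optimizeChunks.tap(
        {
          name: PLUGIN_NAME,
          stage: STAGE_ADVANCED, // Using the advanced stage for optimization
        },
        (chunks) => {
          const { chunkGraph } = compilation;
          const { splitChunks } = compilation.options.optimization;
          // Cache groups defined by the user (splitChunks may be disabled)
          const cacheGroups = (splitChunks && splitChunks.cacheGroups) || {};

          // `chunks` is an Iterable (a Set in Webpack 5). Copy it so it can be
          // filtered, and so chunks added below are not visited in this pass.
          const existingChunks = Array.from(chunks);

          for (const [cacheGroupName, cacheGroup] of Object.entries(cacheGroups)) {
            // A cache group set to `false` is disabled
            if (!cacheGroup) continue;

            // Filter chunks based on the user-defined chunks option in the cache group
            const selectedChunks = existingChunks.filter((chunk) => {
              if (cacheGroup.chunks === 'all') return true;
              if (cacheGroup.chunks === 'initial') return chunk.canBeInitial();
              if (cacheGroup.chunks === 'async') return !chunk.canBeInitial();
              return false;
            });

            for (const chunk of selectedChunks) {
              // Find modules matching the cache group's test criteria
              const modulesToSplit = chunkGraph.getChunkModules(chunk).filter((module) => {
                if (cacheGroup.test instanceof RegExp) {
                  // Only file-backed modules have a `resource` path
                  return (
                    typeof module.resource === 'string' &&
                    cacheGroup.test.test(module.resource)
                  );
                }
                if (typeof cacheGroup.test === 'function') {
                  return Boolean(cacheGroup.test(module));
                }
                return false;
              });

              if (modulesToSplit.length === 0) continue;

              // Create (or reuse) a named chunk for the selected modules
              const newChunk = compilation.addChunk(`${cacheGroupName}-split`);
              newChunk.chunkReason = `split chunk for cache group: ${cacheGroupName}`;
              // Add the new chunk to every chunk group that loads the original chunk
              chunk.split(newChunk);

              for (const module of modulesToSplit) {
                // ChunkGraph replacement for the deprecated chunk.moveModule()
                chunkGraph.disconnectChunkAndModule(chunk, module);
                chunkGraph.connectChunkAndModule(newChunk, module);
              }
            }
          }
        }
      );
    });
  }
}

module.exports = BasicSplitChunksPlugin;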

// in user land
const BasicSplitChunksPlugin = require('./path/to/BasicSplitChunksPlugin');

module.exports = {
  // ...other configurations
  optimization: {
    splitChunks: {
      cacheGroups: {
        commons: {
          test: /[\\/]node_modules[\\/]/,
          name: 'vendors',
          chunks: 'all',
        },
        styles: {
          test: /\.css$/,
          name: 'styles',
          chunks: 'all',
        },
      },
    },
  },
  plugins: [new BasicSplitChunksPlugin()],
};

Customizing the chunk split

Chunk splitting occurs at STAGE_ADVANCED = 10, so I need to ensure that my custom optimization phases run after the SplitChunksPlugin has already executed. Otherwise, my changes to the graph could potentially be overridden by the SplitChunksPlugin. To achieve this, I would use stage: STAGE_ADVANCED + 1 or stage: 11 to ensure my plugin processes the chunks after the default splitting has taken place.
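
The effect of the stage option can be shown with a tiny stand-in for a hook. This is a simplified model written for illustration, not tapable's actual implementation: taps are kept sorted by stage, lower stages run first, and taps that share a stage keep their registration order. The StagedHook class and the plugin names are hypothetical.

```javascript
// Simplified model of stage ordering on a hook: lower stage runs first,
// and taps with the same stage keep their registration order.
class StagedHook {
  constructor() {
    this.taps = [];
  }

  tap({ name, stage = 0 }, fn) {
    // Insert before the first tap with a strictly higher stage.
    let index = this.taps.findIndex((t) => t.stage > stage);
    if (index === -1) index = this.taps.length;
    this.taps.splice(index, 0, { name, stage, fn });
  }

  call(...args) {
    for (const { fn } of this.taps) fn(...args);
  }
}

const STAGE_ADVANCED = 10;
const optimizeChunks = new StagedHook();
const order = [];

// Registered first, but runs second because of its higher stage.
optimizeChunks.tap({ name: 'MyHoistPlugin', stage: STAGE_ADVANCED + 1 }, () =>
  order.push('MyHoistPlugin'),
);
optimizeChunks.tap({ name: 'SplitChunksPlugin', stage: STAGE_ADVANCED }, () =>
  order.push('SplitChunksPlugin'),
);

optimizeChunks.call();
console.log(order); // [ 'SplitChunksPlugin', 'MyHoistPlugin' ]
```

Because ordering depends on the stage and not on where a plugin sits in the plugins array, stage 11 is enough to see the chunks after the default splitting.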

Real-World Example

In my work with custom chunk and module optimization, my goal is to ‘hoist’ dependencies into runtime chunks. If you’ve used Module Federation, you may have encountered issues when setting runtimeChunk: 'single'. These problems arise from an outdated design decision made back in 2020 when Webpack 5 was first implemented.

Figure: Structure when runtimeChunk: 'single' is set

The core issue is that runtime chunks are added after the initial compilation, meaning they don’t have any modules or entry points assigned to them during the standard compilation process. With Module Federation, particularly when consuming ‘remotes’ or sharing code with ‘eager,’ all references are hoisted to the top of the entry point chunk as expected. However, this hoisting can create problems with Module Federation, as the runtime chunks may not correctly initialize or manage these references, leading to potential issues during execution.

Module Federation is implemented as a RuntimeModule within the Webpack/Rspack runtime and is executed during the startup phase. During this process, it attempts to initialize remote modules found in the build, such as webpack/container/reference/home_app. However, these references may not be present in __webpack_modules__ because the runtime chunk is empty, with all modules residing in the entry points, which might not be loaded upfront. This can result in missing remotes that are not initialized during startup, leading to runtime errors later on. This issue primarily occurs when runtimeChunk: 'single' is set. When the runtime instead goes into the main chunk, for example, the references are already hoisted because main also serves as the entry point.

Figure: When runtimeChunk is not set to 'single', everything is alright

The Need

I want to develop a plugin that hoists all components related to Module Federation startup into the appropriate runtime chunks. Additionally, I want to ensure that the splitChunks: { chunks: 'all' } configuration doesn't interfere with or break the functionality of Module Federation.

compilation.hooks.optimizeChunks.tap(
  {
    name: PLUGIN_NAME,
    // advanced stage is where SplitChunksPlugin runs.
    stage: 11, // advanced + 1
  },
  (chunks: Iterable<Chunk>) => {
    const runtimeChunks = this.getRuntimeChunks(compilation);
    this.hoistModulesInChunks(
      compilation,
      runtimeChunks,
      chunks,
      logger,
    );
  },
);

Next, I have a method called hoistModulesInChunks. In my case, I create an additional entry in another plugin—this entry point is typically named with the suffix _partial if the corresponding chunk exists. I use this as a reference to locate the desired modules; otherwise, I perform a more extensive lookup. While this approach could be optimized, it effectively serves its purpose.

// Method to hoist modules in chunks
private hoistModulesInChunks(
  compilation: Compilation,
  runtimeChunks: Set<Chunk>,
  chunks: Iterable<Chunk>,
  logger: ReturnType<Compilation['getLogger']>,
): void {
  const { chunkGraph, moduleGraph } = compilation;
  // when runtimeChunk: single is set - ContainerPlugin will create a "partial" chunk we can use to
  // move modules into the runtime chunk
  const partialChunk = this.containerName
    ? compilation.namedChunks.get(this.containerName)
    : undefined;
  let runtimeModule;
  if (!partialChunk) {
    // No partial chunk: search the entry chunks for the module at entryFilePath
    for (const chunk of chunks) {
      if (
        chunkGraph.getNumberOfEntryModules(chunk) > 0 &&
        this.entryFilePath
      ) {
        runtimeModule = this.findModule(
          compilation,
          chunk,
          this.entryFilePath,
        );

        if (runtimeModule) break;
      }
    }
  } else {
    // Partial chunk exists: use its container entry module as the starting point
    const entryModules =
      chunkGraph.getChunkEntryModulesIterable(partialChunk);
    runtimeModule = entryModules
      ? Array.from(entryModules).find(
          (module) => module instanceof ContainerEntryModule,
        )
      : undefined;
  }

  if (!runtimeModule) {
    logger.error(
      '[Federation HoistContainerReferences] unable to find runtime module:',
      this.entryFilePath,
    );
    return;
  }

  const allReferencedModules = this.getAllReferencedModules(
    compilation,
    runtimeModule,
    'initial',
  );

  // If single runtime chunk, copy the remoteEntry into the runtime chunk to allow for embed container
  // this will not work well if there are multiple runtime chunks from entrypoints (like next)
  // need better solution to multi runtime chunk hoisting
  if (partialChunk) {
    for (const module of chunkGraph.getChunkModulesIterable(partialChunk)) {
      allReferencedModules.add(module);
    }
  }

  // Connect every referenced module to every runtime chunk
  for (const chunk of runtimeChunks) {
    for (const module of allReferencedModules) {
      if (!chunkGraph.isModuleInChunk(module, chunk)) {
        chunkGraph.connectChunkAndModule(chunk, module);
      }
    }
  }

  // Remove the hoisted modules from non-runtime chunks and delete chunks left empty
  this.cleanUpChunks(compilation, allReferencedModules);
}

Once I find the referencing module I want, I'll call getAllReferencedModules.

This allows me to locate all outgoing connections, and build up the entire tree of modules needed.

// Helper method to collect all referenced modules recursively
private getAllReferencedModules(
  compilation: Compilation,
  module: Module,
  type?: 'all' | 'initial',
): Set<Module> {
  const collectedModules = new Set<Module>([module]);
  const stack = [module];

  while (stack.length > 0) {
    const currentModule = stack.pop();
    if (!currentModule) continue;
    const mgm = compilation.moduleGraph._getModuleGraphModule(currentModule);
    if (mgm && mgm.outgoingConnections) {
      for (const connection of mgm.outgoingConnections) {
        if (type === 'initial') {
          const parentBlock = compilation.moduleGraph.getParentBlock(
            connection.dependency,
          );
          if (parentBlock instanceof AsyncDependenciesBlock) {
            continue;
          }
        }
        if (connection.module && !collectedModules.has(connection.module)) {
          collectedModules.add(connection.module);
          stack.push(connection.module);
        }
      }
    }
  }

  return collectedModules;
}
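
The traversal is easier to follow on a toy graph. The sketch below is a stand-in, not bundler code: the module graph is a plain object, each edge carries a hypothetical async flag in place of the AsyncDependenciesBlock check, and the module names are invented.

```javascript
// Toy stand-in for the module graph: each module lists its outgoing
// connections, and `async: true` marks an edge behind a dynamic import().
function collectReferenced(graph, start, type = 'all') {
  const collected = new Set([start]);
  const stack = [start];

  while (stack.length > 0) {
    const current = stack.pop();
    for (const { to, async } of graph[current] ?? []) {
      // In 'initial' mode, do not follow edges behind a dynamic import.
      if (type === 'initial' && async) continue;
      if (!collected.has(to)) {
        collected.add(to);
        stack.push(to);
      }
    }
  }
  return collected;
}

const graph = {
  entry: [{ to: 'runtime-core' }, { to: 'lazy-page', async: true }],
  'runtime-core': [{ to: 'utils' }, { to: 'entry' }], // cycle back to entry
  'lazy-page': [{ to: 'chart-lib' }],
  utils: [],
  'chart-lib': [],
};

console.log([...collectReferenced(graph, 'entry', 'initial')].sort());
// [ 'entry', 'runtime-core', 'utils' ]
console.log(collectReferenced(graph, 'entry', 'all').size); // 5
```

The Set doubles as the visited list, so the cycle between entry and runtime-core terminates. The explicit stack avoids deep recursion on large graphs.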

Once I’ve gathered all the referenced modules, I can then use connectChunkAndModule to ensure the bundler outputs the module in the newly connected chunk. Finally, I'll clean up the chunk graph by removing any references to split chunks, as the modules are now included in the runtime chunks and will always be available. This cleanup involves disconnecting and removing any empty chunks that are no longer needed.

private cleanUpChunks(compilation: Compilation, modules: Set<Module>): void {
  const { chunkGraph } = compilation;
  for (const module of modules) {
    for (const chunk of chunkGraph.getModuleChunks(module)) {
      if (!chunk.hasRuntime()) {
        chunkGraph.disconnectChunkAndModule(chunk, module);
        if (
          chunkGraph.getNumberOfChunkModules(chunk) === 0 &&
          chunkGraph.getNumberOfEntryModules(chunk) === 0
        ) {
          chunkGraph.disconnectChunk(chunk);
          compilation.chunks.delete(chunk);
          if (chunk.name) {
            compilation.namedChunks.delete(chunk.name);
          }
        }
      }
    }
  }
  modules.clear();
}
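
The combined effect of hoisting and cleanup can be modeled with plain Maps and Sets. This is a toy model under stated assumptions, not the ChunkGraph API: chunks are entries in a Map, hasRuntime is a boolean field, and the chunk and module names are invented.

```javascript
// Toy chunk graph: chunk name -> { hasRuntime, modules }.
function hoistAndCleanUp(chunks, modulesToHoist) {
  // 1. Connect every hoisted module to every runtime chunk.
  for (const chunk of chunks.values()) {
    if (!chunk.hasRuntime) continue;
    for (const mod of modulesToHoist) chunk.modules.add(mod);
  }

  // 2. Disconnect the hoisted modules from non-runtime chunks, and drop a
  //    chunk only if removing hoisted modules is what left it empty.
  for (const [name, chunk] of Array.from(chunks)) {
    if (chunk.hasRuntime) continue;
    let removed = false;
    for (const mod of modulesToHoist) {
      removed = chunk.modules.delete(mod) || removed;
    }
    if (removed && chunk.modules.size === 0) chunks.delete(name);
  }
  return chunks;
}

const chunks = new Map([
  ['runtime', { hasRuntime: true, modules: new Set() }],
  ['main', { hasRuntime: false, modules: new Set(['app', 'remote-ref']) }],
  ['vendors-split', { hasRuntime: false, modules: new Set(['federation-runtime']) }],
]);

hoistAndCleanUp(chunks, new Set(['remote-ref', 'federation-runtime']));

console.log([...chunks.keys()]); // [ 'runtime', 'main' ]
console.log([...chunks.get('runtime').modules]); // [ 'remote-ref', 'federation-runtime' ]
console.log([...chunks.get('main').modules]); // [ 'app' ]
```

The vendors-split chunk disappears because its only module now lives in the runtime chunk. The main chunk survives because it still holds app.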

You can see the full plugin (in its current state at time of writing)

Tree shaking

Tree shaking a module is also possible with optimizeDependencies.

In my case, I inject modules into all runtime chunks. Some of those chunks might not have originally had the module in their part of the graph, so the bundler would mark it as unused and remove the seemingly dead code.

While this is a basic solution that meets my needs, a lot more can be done in this hook. Using moduleGraph.getExportsInfo, you can assign used exports per runtime, or remove them if you want to tree shake additional code.

// Hook into the optimizeDependencies phase
compilation.hooks.optimizeDependencies.tap(
  {
    name: PLUGIN_NAME,
    // basic optimization stage - it runs first
    stage: -10,
  },
  (modules: Iterable<Module>) => {
    if (this.entryFilePath) {
      // Merge the runtimes of all entries into a single RuntimeSpec
      let runtime: RuntimeSpec | undefined;
      for (const [name, { options }] of compilation.entries) {
        runtime = compiler.webpack.util.runtime.mergeRuntimeOwned(
          runtime,
          compiler.webpack.util.runtime.getEntryRuntime(
            compilation,
            name,
            options,
          ),
        );
      }
      for (const module of modules) {
        if (
          module instanceof NormalModule &&
          module.resource === this.bundlerRuntimeDep
        ) {
          const allRefs = this.getAllReferencedModules(
            compilation,
            module,
            'initial',
          );
          for (const refModule of allRefs) {
            const exportsInfo: ExportsInfo =
              compilation.moduleGraph.getExportsInfo(refModule);
            // Since I don't use the imported federation variable, tree shaking would eliminate it.
            // Also, because the runtime is currently copied into all runtime chunks,
            // some might not have the runtime import in the tree to begin with.
            exportsInfo.setUsedInUnknownWay(runtime);
            compilation.moduleGraph.addExtraReason(refModule, this.explanation);
            if (refModule.factoryMeta === undefined) {
              refModule.factoryMeta = {};
            }
            refModule.factoryMeta.sideEffectFree = false;
          }
        }
      }
    }
  },
);

Wrapping up

This is ultimately how I solved my issues, and hopefully it provides a glimpse into how one can use the optimization stages.

Don't forget to try out Rspack ❤ + We are hiring

Written by Zack Jackson

Infra Architect @ ByteDance. Creator of Module Federation. Specializing in Bundler and Javascript Orchestration at scale.
