the question of the day is, can i beat tiny-remapper?
i think there are 4 phases to remapping
- parse mappings file
- hard to parallelize this! disk IO be like that
- read class hierarchy
- scan every class (both in the classes-to-remap and the remap classpath), look for non-private methods, superclasses, and superinterfaces
- this is embarrassingly parallel. for JDK classes we can even precompute it
- compute method owners
- if class B extends class A and both classes define “public void foo()”, then we need to recognize B.foo overrides A.foo
- for each class, walk up the superclass tree looking for someone else who defined that method
- this is sorta embarrassingly parallel? if class C extends class B and we compute serially, we can reuse the knowledge that B.foo overrides A.foo to infer that C.foo also overrides A.foo
- i think in practice class hierarchies tend to go wide not deep, so parallelizing this is probably fine. might need a profiler whack.
- (basically the design question is: should we write to methodmetadata.owner right away so later walks can stop early, or defer all the writes to the end to reduce contention)
- actually remap classes
- class and field renames are easy, they’re straight from the mappings. “total” set of method renames should be inferrable from the hierarchy analysis
- what i mean by that: we have metadata for every non-private method that explains what its “true” owner is, and that can be easily fed into mappings lookup
- this should be enough information to asm every input class and write it back in a parallel fashion
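a minimal sketch of the "compute method owners" walk from the phase list above. everything here is hypothetical naming (OwnerSketch, addClass, owner), not tiny-remapper's API, and it assumes methods are keyed by name+descriptor strings. it deliberately ignores access and package rules, and with several superinterfaces it just lets the last match win.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// hypothetical names throughout; not tiny-remapper's API
final class OwnerSketch {
    // class name -> superclass + superinterfaces
    final Map<String, List<String>> parents = new HashMap<>();
    // class name -> non-private methods it declares, keyed as name+descriptor
    final Map<String, Set<String>> methods = new HashMap<>();

    void addClass(String name, List<String> supers, String... declared) {
        parents.put(name, supers);
        methods.put(name, new HashSet<>(List.of(declared)));
    }

    // walk up from cls and return the highest ancestor (or cls itself) that
    // declares the method, or null if nobody does.
    // assumption: access/package rules are ignored, a real remapper can't do that
    String owner(String cls, String method) {
        String best = methods.getOrDefault(cls, Set.of()).contains(method) ? cls : null;
        for (String parent : parents.getOrDefault(cls, List.of())) {
            String found = owner(parent, method);
            if (found != null) best = found; // something higher up wins
        }
        return best;
    }
}
```

this is the "defer nothing, cache nothing" version: every lookup re-walks the tree, so it is trivially safe to call from many threads once the maps are filled, at the cost of repeated walks.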
fork-join with three phases. map reduce. idk.
INTERESTING: if it is acceptable to hold everything in-memory (which is probably okay), we can read all classes from the jar then start parsing mappings in parallel while we compute method owners (i’m assuming you can only read one file at a time cause of the realities of disks). while we’re doing the final remap phase we can also drop our copy of the original classes as soon as the ASM ClassReaders get their hands on them
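rough shape of that overlap, assuming everything fits in memory. Pipeline, Remap and the four phase parameters are made-up names, the phases are passed in as plain functions so the sketch stays self-contained, and the only real API used is CompletableFuture.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.Supplier;

// hypothetical pipeline shape; the phases are passed in as plain functions
final class Pipeline {
    interface Remap<C, M, O, R> { R apply(C classes, M mappings, O owners); }

    static <C, M, O, R> R run(Supplier<C> readClasses,
                              Supplier<M> parseMappings,
                              Function<C, O> computeOwners,
                              Remap<C, M, O, R> remap) {
        C classes = readClasses.get(); // the disk is busy with the jar here
        // the jar is in memory now, so the disk is free for the mappings file
        CompletableFuture<M> mappings = CompletableFuture.supplyAsync(parseMappings);
        // meanwhile the cpu-bound hierarchy work runs on the common pool
        CompletableFuture<O> owners =
                CompletableFuture.supplyAsync(() -> computeOwners.apply(classes));
        // the final remap starts only once both halves are done
        return mappings.thenCombine(owners, (m, o) -> remap.apply(classes, m, o)).join();
    }
}
```

dropping the original class bytes early isn't shown here; that would have to happen inside whatever gets passed as remap.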
observations
- zip processing is… weird
- duct-tape solution: zipfilesystem like tiny-remapper uses
- non-class resources are annoying to copy with zfs though… really need a way to copy a file without recompressing it
- fancy solution: something that reads the central directory and dispatches the decompressing work to each worker thread
- and some way of collecting them all at the end. ideally in a stable iteration order?
- the problem of course is that class remaps can change filenames, and in some cases order is not preserved
- read each input class name, remap them into a treeset or something, write them back in that order
- and by that i mean, whoever’s in charge of writing holds on to a sorted list of “expected next filename”, if it receives the “correct” filename, great, else store it in a temporary side map
- there is probably a lot of stuff tiny-remapper does that this does not do. especially, especially error checking.
- i have no idea how generic signatures work. are they all stripped? spacewalkermc/sparrow-and-raven?
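the "expected next filename" writer from the zip observations above could look something like this. OrderedWriter and its sink callback are hypothetical, and it assumes the full set of remapped names is known before any worker finishes. a name that was never announced just sits in the side map forever, so a real version wants error checking.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.TreeSet;
import java.util.function.BiConsumer;

// hypothetical helper: emits entries in sorted-name order,
// no matter what order the workers finish in
final class OrderedWriter {
    private final Queue<String> expected;                        // remapped names, sorted
    private final Map<String, byte[]> parked = new HashMap<>();  // arrived too early
    private final BiConsumer<String, byte[]> sink;               // e.g. writes a zip entry

    OrderedWriter(Collection<String> remappedNames, BiConsumer<String, byte[]> sink) {
        this.expected = new ArrayDeque<>(new TreeSet<>(remappedNames));
        this.sink = sink;
    }

    // workers call this from any thread
    synchronized void submit(String name, byte[] data) {
        parked.put(name, data);
        // flush for as long as the next expected entry has already arrived
        while (!expected.isEmpty() && parked.containsKey(expected.peek())) {
            String next = expected.poll();
            sink.accept(next, parked.remove(next));
        }
    }
}
```

the single lock means writes are serialized, which is probably fine since the jar is one output stream anyway.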
some tiny-remapper options and what they do
- ignorefielddesc: fields use a class/name model instead of a class/desc/name model
- useful for MCP which doesn’t list field descs
- forcepropagation
- knownindybsm
- something about invokedynamics (often called “indys” here)
- propagateprivate
- probably controls whether private methods still participate in the ownership dance
- propagatebridges
- “enabled”, “disabled”, “compatible” options
- removeframes
- probably strips stackmap frames (?)
- ignoreconflicts
- checkpackageaccess
- fixpackageaccess
- resolvemissing
- rebuildsourcefilenames
- skiplocalvariablemapping
- renameinvalidlocals
- invalidlvnamepattern
- nonclasscopymode
- what to do with non .class files
- FIX_META_INF mode removes digital signatures (the thing you “delete meta inf” for), and also remaps Main-Class, Launcher-Agent-Class, and services
- threads
- mixin
- enables a little MixinExtension that also remaps mixins
what are some non-obvious features of tiny-remapper to look at?
- can process individual .class files as well as jars (TinyRemapper#readFile)
- supports multi-release jars (this is what the Mrj stuff is about in the code)
- does not parse module-info.class
- something something, “input tags”? have to investigate those more
- i think the purpose is so you can shove every fabric mod jar into the same tiny-remapper with different input tags, and pull them out jar-by-jar
- something about bridge methods
- it kind of has a similar structure to what im thinking of.. hmm
- usage (as prescribed by Main) is
- addNonClassFiles (one pass over the jar)
- readInputs
- filters down to read(Path[], boolean, InputTag)
- for(Path input : inputs) futures.addAll(read(…))
- that is read(Path, boolean, InputTag[], boolean, fsToClose)
- walks the ZipFileSystem and adds a new CompletableFuture for each class
- so class reading is parallelized per class..!
- readClassPath
- same thing but saveData is set to false
- which doesn’t seem to do anything, actually?
- apply
- synchronized(this), just in case
- first call refresh()
- ensures all pending read operations are done
- loadMappings
- checkClassMappings
- sometimes remapped classes are diverted to a ConcurrentHashMap (if fixPackageAccess || inputTags) but other times the remapped class is sent directly to the immediateOutputConsumer
- ok so class ordering is probly not stable lol
- the OutputConsumerPath (writes to the jar with zfs) has a “threadSyncWrites” option, if enabled a reentrantlock prevents writes from more than one thread
- unused in org:fabricmc. apparently unused on all of github! ok
but yeah it basically has the same structure as i’m proposing :skull: read in parallel (with zipfs), propagate in parallel, write classes in parallel (with zipfs). i guess the main differences in mine are that i want to read mappings concurrently with propagation, i want propagation to be unidirectional, and i want to use plain java instead of zipfilesystem
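for the "plain java instead of zipfilesystem" part, a sketch of reading a jar in parallel with java.util.zip.ZipFile. ParallelJarRead and readAll are hypothetical names. it leans on my understanding that concurrent getInputStream calls on one ZipFile are safe (the raw file reads share a lock, inflating happens per thread), which is worth verifying before trusting it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.ZipFile;

// hypothetical helper: slurp every entry of a jar into memory, inflating in parallel
final class ParallelJarRead {
    static Map<String, byte[]> readAll(Path jar) throws IOException {
        Map<String, byte[]> out = new ConcurrentHashMap<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            // entries() is backed by the central directory; nothing is inflated yet
            Collections.list(zip.entries()).parallelStream()
                    .filter(e -> !e.isDirectory())
                    .forEach(e -> {
                        try (InputStream in = zip.getInputStream(e)) {
                            out.put(e.getName(), in.readAllBytes()); // inflate on a worker
                        } catch (IOException ex) {
                            throw new UncheckedIOException(ex);
                        }
                    });
        }
        return out;
    }
}
```

this only covers the reading half. it does nothing for the "copy a resource without recompressing it" problem.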
tiny-remapper propagation is kinda weird, i think it renames at the same time. some insightful comments left in the source
/*
* initial private member or static method in interface: only local
* non-virtual: up to matching member (if not already in this), then down until matching again (exclusive)
* virtual: all across the hierarchy, only non-private|static can change direction - skip private|static in interfaces
*/
- “down propagation from static member matching the signature starts its own namespace” - hmm
class A{void foo()}, B extends A{static void foo()}, C extends B{void foo()}?
- the “forcePropagation” file comes into play here
- THEORY: might have something to do with correcting private->public ATs?
- something wacky with bridge methods
as expected “up” propagation calls propagate() on superclasses/superinterfaces, and “down” propagation calls propagate() on subclasses
oough there’s also isAssignableFrom logic? i think it’s used as part of resolveMethod, the comments there mention the jvm spec
article from proguard about it https://www.guardsquare.com/blog/behind-the-scenes-of-jvm-method-invocations#Virtual_methods
https://docs.oracle.com/javase/specs/jvms/se17/html/jvms-5.html#jvms-5.4.6
interesting: hotspot will throw IncompatibleClassChangeError in certain cases where method lookup is actually ambiguous https://jvilk.com/blog/java-8-specification-bug/
while i’m here, the specific overriding rules: https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-5.html#jvms-5.4.5
- mC can override mA if:
- mC and mA have identical names and descriptors (good)
- mC is not PRIVATE
- one of the following constraints on the base method mA is true
- mA is PUBLIC or PROTECTED
- mA is package-private and is declared in the same (run-time) package as mC
- mA is package-private and there’s a method mB in a class between them such that mC can override mB and mB can override mA
that final constraint is very complicated, basically what it means is that if you’re scanning upwards for an overridden method and you find a public one, you are permitted to continue scanning upwards even through package-private methods you can’t see. the example from the spec is
//package P
public class A { void m() {} } //D.m can override this (!)
public class B extends A { public void m() {} } //D.m can override this
public class C extends B { void m() {} } //D.m cannot override this
//different package
public class D extends P.C { void m() {} }
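to convince myself the rule reads the way i think it does, here is a small sketch of the "can override" check run over a superclass chain. OverrideRules, Decl and Access are hypothetical names. it assumes the chain only lists classes that actually declare the method (same name and descriptor), that all of them are instance methods, and that "package" is a plain string compare (the real rule is about run-time packages, which also involves the class loader).

```java
import java.util.List;

// hypothetical model of the JVMS 5.4.5 "can override" rule
final class OverrideRules {
    enum Access { PUBLIC, PROTECTED, PACKAGE, PRIVATE }

    // one declaration of the method somewhere in a superclass chain
    static final class Decl {
        final String pkg;
        final Access access;
        Decl(String pkg, Access access) { this.pkg = pkg; this.access = access; }
    }

    // chain is ordered from the subclass (index 0) up to the root.
    // returns whether chain[c] can override chain[a], for c < a
    static boolean canOverride(List<Decl> chain, int c, int a) {
        Decl mC = chain.get(c), mA = chain.get(a);
        if (mC.access == Access.PRIVATE || mA.access == Access.PRIVATE) return false;
        if (mA.access != Access.PACKAGE) return true;   // mA is public or protected
        if (mA.pkg.equals(mC.pkg)) return true;         // package-private, same package
        // package-private, different package: need some mB strictly in between
        for (int b = c + 1; b < a; b++) {
            if (canOverride(chain, c, b) && canOverride(chain, b, a)) return true;
        }
        return false;
    }
}
```

feeding it the spec example as the chain D, C, B, A (index 0 to 3) gives false for D vs C, true for D vs B, and true for D vs A via B, which matches the comments on the example.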