Monstera: new branch


The_French_Rat
Posts: 101
Joined: Sun Oct 21, 2012 11:57 pm
Location: Melbourne, Victoria, Australia
Contact:

Monstera: new branch

Post by The_French_Rat »

What is it?
Just think of ReactOS as the XP beta, Whistler.

PurpleGurl
Posts: 1788
Joined: Fri Aug 07, 2009 5:11 am
Location: USA

Re: Monstera: new branch

Post by PurpleGurl »

Good question. All Fireball gave was this:

"Create a new branch for my experimental pet project. The branch will be organized the same way as the arwinss branch - as a diff to trunk plus its own files, so no copying from trunk is involved."

Aeneas
Posts: 470
Joined: Sat Oct 10, 2009 10:09 pm

Re: Monstera: new branch

Post by Aeneas »

(Though the correct Latin plural of monstrum would be monstra... ;) )

oldman
Posts: 1146
Joined: Sun Dec 20, 2009 1:23 pm

Re: Monstera: new branch

Post by oldman »

Monstera is a new implementation of a memory manager (along with a cache manager) compatible with the ReactOS kernel
Please keep the Windows classic (9x/2000) look and feel.
The layman's guides to - debugging - bug reporting - compiling - ISO remaster.
They may help you with a problem, so do have a look at them.

gonzoMD
Posts: 1062
Joined: Fri Oct 20, 2006 7:49 am
Location: Germany
Contact:

Re: Monstera: new branch

Post by gonzoMD »

oldman wrote:
Monstera is a new implementation of a memory manager (along with a cache manager) compatible with the ReactOS kernel
And what about ARM3? Will it be developed in parallel, or abandoned? (Wasn't it a reimplementation of the MM too?)

cruonit
Posts: 251
Joined: Mon Jun 29, 2009 12:57 am

Re: Monstera: new branch

Post by cruonit »

A post I found on Stack Overflow:

The primary reason that Linux isn't written in C++ is of course that Linus Torvalds hates it.

There are also technical reasons why C might be preferred over C++ for things like kernels.

New architectures and platforms will typically have a C compiler long before they have a C++ compiler. C is a much simpler and easier language to implement.

Portability of C code between compilers has been far better. Portability of C++ code was long something that required a lot of discipline to achieve. See the (now historic) Mozilla portability guide for insight into the lengths that programmers go to to create portable C++.

C++ requires a more complicated runtime to support things like exception handling and RTTI. This can be hard to provide in an unhosted environment. Compilers do permit you to switch them off.

Apparently simple statements can hide expensive operations, thanks to operator-overloading. This would normally be considered a Good Thing, but in an embedded/kernel development world people like to be able to see where expensive operations are being performed.

Here are some non-reasons:

"C++ is slower than C." C++ has the same overheads as C. Additional overheads typically only arise when using features C doesn't support.

"Virtual dispatch is slow." Virtual dispatch is slower than static dispatch, but the performance penalty is modest, particularly when used judiciously. The Linux kernel already makes wide use of jump tables for performing dynamic dispatch.

"Templates cause massive code bloat." This is potentially true. However the Linux kernel uses macros to perform similar code generation effects, for instance creating typed data structures, or for retrieving a containing structure from a pointer to a member.

"Encapsulation hurts performance."

Here are some reasons that a kernel in C++ might be a good idea:

Less boilerplate code for the common dynamic dispatch pattern.

Templates give a safer way to perform simple code generation, with no performance penalty over macros.

The class mechanism encourages programmers to encapsulate their code.

jonaspm
Posts: 585
Joined: Mon Nov 21, 2011 1:10 am
Location: Mexico
Contact:

Re: Monstera: new branch

Post by jonaspm »

Aleksey Bragin, r60847: added 28 files (monstera in reactos)

Initial commit of a small subproject I wanted to do for years. If you like it, please feel free to join me as there is more than enough place for improvement.
This commit brings the first very small implemented part (mainly everything related to phase 0 initialization).
A cut from the readme file:
Monstera is a new implementation of a memory manager (along with a cache manager) compatible with the ReactOS kernel at source code level and providing the same binary compatible Native API through a lightweight wrapper.
Monstera is implemented in a subset of the C++ programming language. A document outlining specific restrictions, coding style and other considerations is available.
Key ideas:
1. Object oriented language for object oriented kernel. When NT was implemented, C++ wasn't that good.
2. Simplicity > optimization. Nowadays we can sacrifice a bit of performance in favor of more robust implementation.
3. Same Native API. Internal implementation and external interfaces are two different things.
4. Don't drift away too much. It's still based on NT architecture, but think of it as if Microsoft Research would decide to reimplement NT in C++ for fun.
Credits:
- ReactOS Portable Systems Group for the code which is used as a base in many places of Monstera.
- Alex Ionescu for his invaluable contribution to the ReactOS kernel.
- Timo Kreuzer, Johannes Anderwald for their C++-in-the-kernel-mode help.
- Amine Khaldi for help with the build environment.
- ReactOS team for their great work.

mrugiero
Posts: 482
Joined: Sun Feb 14, 2010 9:12 am

Re: Monstera: new branch

Post by mrugiero »

This sounds like a really fun project. Also, I have much faith in the stability and development speed (relative to the hours spent) of this, although at the already-stated cost of a bit of a performance penalty.

PurpleGurl
Posts: 1788
Joined: Fri Aug 07, 2009 5:11 am
Location: USA

Re: Monstera: new branch

Post by PurpleGurl »

Personally, I don't see the point of a memory manager with a performance hit. I do like the idea of stability. Writing one in pure assembly would be nice, but tedious and lacking in portability. I'd love to see someone port some crucial things back to assembly, but long after major bugs are squashed and other major parts are added. That would be a step to do at the end.

I found out in my QuickBasic coding that there are times when assembly is appropriate. QuickBasic used to create rather large .EXEs. Then I started using the Crescent PDQ pack instead of the default linking library, which reduced code sizes. However, it could not always be used, since PDQ was weak on graphics support and certain other things. Then I learned how to call the PDQ library from assembly (using the same programming model and parameter-passing style, i.e. .Model Medium, Basic), and later how to include some of the functions and modules inline. In the process, I learned how inefficient BC.EXE (the QB compiler) was for certain types of operations, since I studied the list dump files. I found that If blocks were very inefficient in both size and speed; nested Ifs were better. Put the test most likely to fail in the outer position, so if it fails, the other tests never execute. The If block structure, by contrast, evaluated everything, assigned a number to each condition, used the AND operator to combine the numbers, and only then branched. So every single test ran every time, plus additional comparisons that were not necessary. That is how inefficient the generated assembly code was.

Worse was how coarse the QB commands were. For instance, let's take keyboard handling. QuickBasic used Inkey$ to read single characters from the keyboard. Now consider how that value is generated and how it is used. To get it, you would call a software interrupt to DOS or the BIOS, which returned the value as an integer in the AX register. Then Inkey$ copied that to memory, gave it a string descriptor, and returned the address of the string descriptor. That makes me ask two questions. Why does it have to be in memory at all? And why does it have to be a string (so why not return an integer)? In QB, functions returning integers used the AX register, and functions returning strings passed back pointers to string descriptors. Now consider how programmers used the returned string value. Most likely, they would use the VAL command to convert it to an integer. So you do all that work to get an integer, convert it to a string, and convert it back to an integer. Even worse, using VAL often pulled in the entire floating-point emulator package. PDQ simplified this by passing back an integer rather than a string in their PDQInkey% function. That was a little better, since you received integers just as they came from the CPU (passed back through the AX register, not memory).

Oh, and the worst was handling interrupts. If you wanted to call interrupts from QB, you had to set up a complex struct and pass it to the Interrupt or InterruptX routine. That added a bunch of exception-handling code, stack code, etc. And worse, many of the calls returned things as unsigned integers even though QB only handled signed numbers, so you'd have to use long integers to convert to signed, pulling in the entire FP emulator package. Interrupts are MUCH easier to do in assembly: just stuff the registers and call the interrupt using the native CPU Int instruction. No structs, no conversion, no hand-holding, no debugger code, etc.

I gave up coding and never learned how to code for Windows. I don't know how to code for protected mode nor Win32. I guess I could learn if I had the coaching and prodding to do so.

Anyway, I am not sure why coding the MM in C++ would help. It might improve stability and portability, but I am interested in performance. I'd rather see it coded in assembly, where things are the most granular and where you can choose the instructions. But if this project will help in getting a more correct implementation, who am I to complain? The first thing you have to get down is an initial strategy for doing it. Once you know what you are doing like the back of your hand, then it won't be all that bad rewriting it in any other language. Right now, I think it is fair to say that we all just want it to work.

Another story is that I wrote my own Input routine for QuickBasic. I found that the default one included too much functionality. All I wanted was to be able to type and backspace for a limited distance and not ruin the display while an illiterate user played with it. So I wrote it in QB at first. It was hard figuring out how to do what I wanted. I finally came up with a block of code and optimized it. Then I played at converting it to .asm and reduced the compiled size even more. But had I not played in QB and created the original code, I would not have written the assembly version with the keyboard code inline. So I see the value of coding the MM in C++ as a playground that helps solidify what you are doing. Any rewrites back from that may be quite mature.

milon
Posts: 969
Joined: Sat Sep 05, 2009 9:26 pm

Re: Monstera: new branch

Post by milon »

PurpleGurl, am I correct in understanding your post to mean that Monstera should be pursued in C++ and then later translated into assembly for performance reasons? If so, then I think we're on the same page. Let's get it set up and working before we try to optimize it. I'm excited for Monstera - a better MM means a better ROS with fewer crashes!

PurpleGurl
Posts: 1788
Joined: Fri Aug 07, 2009 5:11 am
Location: USA

Re: Monstera: new branch

Post by PurpleGurl »

milon wrote:PurpleGurl, am I correct in understanding your post to mean that Monstera should be pursued in C++ and then later translated into assembly for performance reasons? If so, then I think we're on the same page. Let's get it set up and working before we try to optimize it. I'm excited for Monstera - a better MM means a better ROS with fewer crashes!
Yes. The devs are going to do what they want anyway. Hopefully, at the least, it will be used to explore the shortcomings the MM has now in regular C, and to tweak the MM already in the library. If Fireball wants to play with it, he might stumble across what may currently be broken. Like ARWINSS, I don't think the goal will ever be to integrate it back into the kernel; it is more of a learning experience.

Now, the disadvantage of making anything in assembly would be portability. It would be nice if ROS could run on other platforms, including good ones which were orphaned (maybe Itanium or certain Power PC), and ones that are leading platforms in other areas (ARM, for instance). But still, it would be nice to have fine-tuned code for specific platforms, like PCs for instance.

I brought up my QB coding experiences to show how slow and clunky larger, higher level languages make things. I imagined managed code would work similarly. At least Fireball is proposing C++ and not C#. The problem with higher level languages is the lack of granularity.

I described how keyboard input worked under DOS to show how code can easily gain bloat and lose performance through inefficiency. Getting a keystroke involved stuffing values into registers and calling the interrupt, then getting the value back in the AX register, already an integer. If you needed an integer, that would be good, but QB would do extra work and return it as a string. That involved adding a string descriptor and involving the program's memory manager. But if you needed it as an integer, you'd have to convert the string into one, and that might pull in floating-point code, since the code was intended for 16-bit machines and they would want to make sure there were no overflows. So just polling the keyboard could pull in about 15K more code, and there could be a lag. There was no need to do all that conversion back and forth, since the CPU and the OS already returned it as an integer, and that would actually be a handier way to input a single character: you could use logical operators to change the case, subtract to change ASCII codes to numeric values, etc.

PascalDragon
Posts: 123
Joined: Wed Aug 04, 2010 7:34 pm

Re: Monstera: new branch

Post by PascalDragon »

PurpleGurl wrote:Now, the disadvantage of making anything in assembly would be portability. It would be nice if ROS could run on other platforms, including good ones which were orphaned (maybe Itanium or certain Power PC), and ones that are leading platforms in other areas (ARM, for instance). But still, it would be nice to have fine-tuned code for specific platforms, like PCs for instance.
I would simply let the compiler do its job. Today's C(++) compilers are rather good at optimization (though of course one needs to take care that they don't optimize things that they shouldn't :roll: ), and thus, in my opinion, using assembly for anything except low-level interaction with the system is unnecessary (and, as you already wrote, hinders portability); the developers' time is better spent fixing bugs in the system or implementing necessary/wished-for features.

Regards,
Sven
Free Pascal compiler developer

PurpleGurl
Posts: 1788
Joined: Fri Aug 07, 2009 5:11 am
Location: USA

Re: Monstera: new branch

Post by PurpleGurl »

PascalDragon wrote:I would simply let the compiler do its job. Today's C(++) compilers are rather good at optimization (though of course one needs to take care that they don't optimize things that they shouldn't :roll: ), and thus, in my opinion, using assembly for anything except low-level interaction with the system is unnecessary (and, as you already wrote, hinders portability); the developers' time is better spent fixing bugs in the system or implementing necessary/wished-for features.

Regards,
Sven
I know what you are saying, and I disagree, as I have before. I know very well how compilers can add bloat and slow things down; I've seen listings and disassembled things I've written and compiled. But I think you misunderstand me. What I said was all in the context of after things are developed. That is what I keep saying: get ReactOS finished and locked down first, then fine-tune it. If nobody here wants to do that, then nobody here has to. A third-party branch or fork would be nice, or even an unofficial service pack. I use Waterfox as a browser, though stability is an issue (it seems to be for all browsers these days). Waterfox exists because Firefox decided to stop building for 64-bit, and it uses third-party optimized libraries.

You want to argue about the coding "logic," and that is exactly what I am talking about - compilers often change the logic. I guess you missed what I said above about compilers converting integers to strings and forcing you to convert back, adding many unnecessary cycles, or the example of the "And If" blocks. Guess how the compiler does those versus how I would do it? The compiler executes and evaluates all the code in the block first, assigns each condition a boolean value, and then evaluates that value. That is very inefficient. What I would do is evaluate from the test most likely to fail to the one most likely to succeed, so the entire block does not have to execute if one part is going to fail. And the decision to branch and break out of the code would be made immediately and from the registers, not deferred, placed in memory, and then run through unnecessary logic operators. Yes, there was a workaround in the language I was using, and that was to not use If blocks but individual If statements. Nesting is a better approach and saves about a paragraph of space per operation used.

If nobody else will support the conversion to tight assembly tuned for modern machines, then maybe I will be forced to do it, and maybe I should then make a license prohibiting anyone who criticized the conversion to highly optimized hand assembly from using it. ;-) (Actually, I cannot; I'd have to adhere to GPL3, etc.) If you want mediocrity, that is fine, but please don't try to force it on the rest of us. In many ways, I wish this were the '80s or so, when *only* the "geeks" knew enough to code. If it were up to me, coding would require a license and only the geeks would get it, not rich brats trying to make a bunch of money. But that is just fantasy.

The notion of portability at all costs is much like "socialism" (no, I don't want to discuss that or get into healthcare, etc.) in that it is mediocre but universal, sacrificing performance for platform diversity. Like I said, I don't see hand assembly as a waste at all. If you want portability and ease of development, sure, use a common and portable language. That helps you get more hands on deck: everyone wants code for their own platform, and if they can all agree on a common language that hides the differences in hardware, you get more cooperation. However, *AFTER* that is all said and done and the code becomes more mature, then it is time for programmers of whatever individual platforms to tune things for their platform of choice. I would start with the most critical portions that could benefit from tight hand assembly. And yes, there are some complex pieces where the compiler is about as good as you can get, too. I once tried to rewrite a GIF89 decoder/display module that some high-school kid wrote in QuickBasic. There was even a line in the comments challenging conversion to assembly, and I decided to give it a shot. I gave up not long into it. What little I coded of it turned out to be slower; I expected faster decoding and drawing.

Z98
Release Engineer
Posts: 3379
Joined: Tue May 02, 2006 8:16 pm
Contact:

Re: Monstera: new branch

Post by Z98 »

I have a hard time believing that your experiences reflect modern compilers. For at least x86/x64 compilers, the current state of the art is very good at producing performant code and will often do things that would never occur to a programmer. For example, an optimizing compiler may inject nops into the instruction stream that might look like "bloat" but are actually there for alignment, so that cache accesses are aligned. A programmer is far more likely to view these nops as "bloat," but removing them worsens performance. Your comment regarding how condition blocks are handled also sounds out of date, as what you've described is basically predication, which hasn't been necessary on desktop processors for years at this point. GPUs still use it, but CPUs have had pretty good branch prediction for quite some time now.

Several of the developers have done experiments with hand written assembly versus C code compiled to machine code and in almost all cases the compiler beat them because the compiler started pulling tricks that were non-intuitive.

PurpleGurl
Posts: 1788
Joined: Fri Aug 07, 2009 5:11 am
Location: USA

Re: Monstera: new branch

Post by PurpleGurl »

Z98 wrote:I have a hard time believing that your experiences reflect modern compilers. For at least x86/x64 compilers, the current state of the art is very good at producing performant code and will often do things that would never occur to a programmer. For example, an optimizing compiler may inject nops into the instruction stream that might look like "bloat" but are actually there for alignment, so that cache accesses are aligned. A programmer is far more likely to view these nops as "bloat," but removing them worsens performance. Your comment regarding how condition blocks are handled also sounds out of date, as what you've described is basically predication, which hasn't been necessary on desktop processors for years at this point. GPUs still use it, but CPUs have had pretty good branch prediction for quite some time now.

Several of the developers have done experiments with hand written assembly versus C code compiled to machine code and in almost all cases the compiler beat them because the compiler started pulling tricks that were non-intuitive.
Yes, I have probably been out of the loop for too long.

Yes, I used the .EVEN directive when optimizing by hand, so I know about 90h (nop) and all. I know how that works, and it was great for running on 286 machines (and the 8086/80186). That exact method was more crucial in the old days, since there was no cache at all. I imagine it is more complex now that CPUs are larger and you have to be double- or quad-word aligned. (There are other uses for nops, too, such as hand-patching, and things that should be discouraged, like cracking.) I first thought the nops were bloat until I read a book on x86 optimization. While I treated the author's advice as absolute for a number of years, with great results, it is still a good read even though things have changed, since the basics of how he arrived at it remain the same (profiling and benchmarks). So if I get back into programming, I certainly want to do what you have done and run real test cases on the code, maybe doing my own comparisons.

Another thing that can be mistaken for bloat is initialized data segments. Sure, you could code the data as uninitialized and get a smaller file size, but in reality there is no improvement in execution, since the same amount of memory would likely get allocated anyway; one approach just hides it.

And what I found about the inefficient compilation of If...Then blocks (as opposed to independent If...Then statements) highlights a principle that is still useful: use generated listings or disassembled code to see whether what is happening is what you intended. While I still used PDQ and my own assembly code, I found ways to get the compiler to compile closer to what I wanted. Just using regular If statements instead of blocks with "And If" helped in compile size (16 bytes less per line) and speed (it wasn't all executed at once and then evaluated, but executed only as needed, particularly when ordering the nesting from most likely to fail to most likely to pass). So reading the listings showed me how to make my QB code more efficient. I also figure that much of the time (unless inlining code as a speed optimization), the smaller the compiled size the better, to a point, since the shortest distance is a straight line.

I remember all sorts of tricks I used, some of which won't work in protected mode or Win32, such as self-modifying code. I learned first-hand why self-modifying code is a bad practice: all it took was a new CPU with a different cache scheme, and you'd likely corrupt something (a race condition, I assume). I imagine a lot of tricks don't bring the performance gains they once did. I did things like use local jumps as much as possible, use XOR to clear registers, use the .EVEN directive as discussed above to align code with 90h padding, shift bits to multiply or divide, use INC and DEC for adding and subtracting by one, etc.

The only reason I took the discussion down this road was the granularity issue. I have never coded in C++, but I assume it is less granular than C, judging by the comments about performance hits. Granularity issues are what I ran into with QuickBasic, which is quite a high-level language. The language itself forced the bloat, not so much the compiler. When a language has commands that are awkward for the CPU rather than natural, that adds more code, and if you want what is more native, you have to add yet more code to get back to it. In those cases, assembly is a better option, since you can write the exact functions you need, calling interrupts or APIs yourself rather than taking the long detour through other code.

In QB, the reason the FP library got pulled in for string handling was converting from a string back to the integer you'd get directly from the AX register after the service call. The compiler assumed a 16-bit CPU, and such a conversion could overflow, so a long integer was likely used to leave room to check before converting (the conversion would count off the number of characters first, and five characters can be either valid or an overflow, so there had to be room). But if you need 32-bit support on a 16-bit CPU, the compiler has to pull in long-integer support, and that might also pull in FP support, since the work could be done more efficiently if that hardware existed. It was interesting how that worked: software interrupts were remapped so the machine "thought" an FPU was present, but if a real one was detected, everything was mapped back to the actual hardware. It was a weird system of fixup values and cryptic names for library components.
