Reverse engineering IRIX: The beginning.
#11
RE: Reverse engineering IRIX: The beginning.
Hey Raion, here are some points to watch out for after taking a look at your code:


I tend to divide coding into two phases which get repeated iteratively until the code is complete.

Phase no. 1 is getting the main code path done. You will see there's always at least one code path which has to be executed most of the time. This would be your main code path/code line.

Phase no. 2 is adding secondary code paths like error/exception handling, securing code, adjusting style aspects and commenting code.

In abstract:

while (CODE_INCOMPLETE)
{
    phase_no1();
    phase_no2();
}


Once you get your main line of code working, start completing the code with the different error/exception handling paths. Since you are starting out as a developer, don't repeat the mistakes others have already made:

Code without error handling (i.e. code that just works for some cases) is __incomplete__ code. Error handling is tedious to add, yes, but it is necessary to get correct code, not just working code (i.e. only the main code path).


Code:
fam.cpp:115:

snprintf(msg, sizeof(msg),"%c%d %d %d %s\n", code, reqnum, getuid(), groups[0],
filename);
msg2 = msg + strlen(msg) + 1;


Take the above code as an example. You would read through man snprintf and add the corresponding ifs or switches to handle all possible return values.

From IRIX's man snprintf:


if (snprintf() == EXPECTED_NUM)
{
    // Main code line
}
else if (... < 0)
{
    // Error/exception code line
}


This would be error/exception handling.

And since we are at it: you'll see in the above code that you need the return value from snprintf() to evaluate it in more than one case. At least two approaches can help you here:

1. ret and if

ret = snprintf();

if (ret == EXPECTED_NUM)
{
    // Main code line
}
else if (ret < 0)
{
    // Error/exception code line
}


2. switching out

switch (snprintf())
{
case EXPECTED_NUM:
        // Main code line
        break;
default:
        // Every other value, including the < 0 case
        break;
}
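
To make the "ret and if" variant concrete, here is a minimal C sketch. build_request() and its parameters are hypothetical names used for illustration, not part of libfam. It also treats truncation as an error, assuming a C99-style snprintf() that returns the length it would have written:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of libfam: formats a request line into buf.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if formatting failed or the output did not fit. */
int build_request(char *buf, size_t size, char code, int reqnum,
                  int uid, int gid, const char *filename)
{
    int ret;

    if (buf == NULL || size == 0 || filename == NULL)
    {
        return -1;
    }

    ret = snprintf(buf, size, "%c%d %d %d %s\n",
                   code, reqnum, uid, gid, filename);

    if (ret < 0)
    {
        /* Error path: snprintf() reported an output error */
        return -1;
    }
    else if ((size_t)ret >= size)
    {
        /* Error path: the string was truncated (C99 semantics assumed) */
        return -1;
    }

    /* Main path: the whole string is in buf */
    return ret;
}
```

A caller would pass its local buffer, for example build_request(msg, sizeof(msg), code, reqnum, getuid(), groups[0], filename), and take the error path whenever the result is negative.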


Okay, so now the code is handled in all possible paths, which makes it robust, more stable, and complete. What else can be done? Securing it.


Securing code means guaranteeing a correct code path for every possible variable value. The state of your program must always be deterministic, whatever the input:


ret = snprintf();

if (ret >= 9)
{
    // Main code line
}
else
{
    // Error/exception code line
}


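This code handles all cases, assuming groups and filename have been sanitized, validated and secured, and that msg is large enough to hold the whole string (a C99 snprintf() returns the length it wanted to write, so ret >= sizeof(msg) means the output was truncated). Where does the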

if (ret >= 9)

come from?


It comes from snprintf()'s format string: "%c%d %d %d %s\n" .

The resulting string, after snprintf(), will contain at least: 1 byte for %c, 3 bytes for the three %d (at least one digit each), 3 bytes for the three spaces, 1 byte for %s and 1 byte for \n: 1 + 3 + 3 + 1 + 1 = 9.

This assumes the filename can be relative, so at least one byte must be given. If filename must be absolute, you would add +2, not +1 ('/' + at least one more byte for the filename):

1 + 3 + 3 + 2 + 1 = 10
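
The same arithmetic can be written down in code so the magic number explains itself. This is only a sketch: the constant names and length_is_plausible() are made up for illustration, and the upper bound assumes a C99-style snprintf():

```c
#include <stddef.h>

/* Minimum output length of "%c%d %d %d %s\n", piece by piece. */
enum
{
    LEN_CODE     = 1, /* %c: exactly one byte */
    LEN_NUMBERS  = 3, /* three %d, at least one digit each */
    LEN_SPACES   = 3, /* three literal spaces */
    LEN_NEWLINE  = 1, /* trailing \n */
    LEN_REL_NAME = 1, /* relative filename: at least one byte */
    LEN_ABS_NAME = 2  /* absolute filename: '/' plus at least one byte */
};

/* 1 + 3 + 3 + 1 + 1 = 9 */
#define MIN_LEN_RELATIVE \
    (LEN_CODE + LEN_NUMBERS + LEN_SPACES + LEN_REL_NAME + LEN_NEWLINE)

/* 1 + 3 + 3 + 2 + 1 = 10 */
#define MIN_LEN_ABSOLUTE \
    (LEN_CODE + LEN_NUMBERS + LEN_SPACES + LEN_ABS_NAME + LEN_NEWLINE)

/* Returns 1 if ret (the value returned by snprintf()) is a plausible
 * length for a buffer of 'size' bytes, 0 otherwise. */
int length_is_plausible(int ret, size_t size, int absolute)
{
    int min_len = absolute ? MIN_LEN_ABSOLUTE : MIN_LEN_RELATIVE;

    if (ret < min_len)
    {
        /* Covers negative return values and impossibly short output */
        return 0;
    }
    if ((size_t)ret >= size)
    {
        /* Output would not have fit into the buffer */
        return 0;
    }
    return 1;
}
```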


Ok, what else?

Sanitization vs. validation

Sanitizing concerns the structure of code or values (the real things): the raw bytes they are made of.
Validation concerns the architecture of code or values (the logical things): whether they make sense in context.

Sanitizing would be checking for correct characters in the filename string. Validation would be checking whether the given filename is actually valid as input:


if (ret >= 9)
{
        // Sanitize filename string
        // * Check if correct bytes are given


        // Validate filename
        // * Check if file exists, is accessible, ...
}
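
As a rough sketch of the difference, the two checks could be split into two functions. Both function names and the policy they enforce (no control characters; absolute path that exists) are assumptions for illustration, not what libfam does. access() is the POSIX call from unistd.h:

```c
#include <ctype.h>
#include <stddef.h>
#include <unistd.h> /* access(), F_OK (POSIX) */

/* Sanitize: look only at the bytes of the string.
 * Hypothetical policy: non-empty and free of control characters.
 * Returns 1 if the string is acceptable, 0 otherwise. */
int filename_is_sane(const char *filename)
{
    const char *p;

    if (filename == NULL || filename[0] == '\0')
    {
        return 0;
    }
    for (p = filename; *p != '\0'; p++)
    {
        if (iscntrl((unsigned char)*p))
        {
            return 0;
        }
    }
    return 1;
}

/* Validate: look at what the string means.
 * Hypothetical policy: an absolute path naming something that exists.
 * Returns 1 if the filename is valid input, 0 otherwise. */
int filename_is_valid(const char *filename)
{
    if (filename == NULL || filename[0] != '/')
    {
        return 0;
    }
    return access(filename, F_OK) == 0;
}
```

Keep in mind that an existence check is only a snapshot. The file can vanish between the check and its use, so whatever opens it later still needs its own error path.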


A last few words about coding style. There are many coding styles, but there's one important concept: consistency.

If you have chosen your style, stick to it, always:

Inlined types:

int FAMMonitorFile2(FAMConnection *fc,

fam.cpp:162


Stacked types:

int
FAMMonitorCollection(FAMConnection* fc,

fam.cpp:181

Same for parameters.


Line iffing:

if (userData) FAMStoreUserData(reqnum, userData);

fam.cpp:201


Float iffing:

if (!filename || filename[0] != '/')
        return -1;

fam.cpp:90


Wrapped iffing:

if ((reqnum = FAMFindFreeReqnum()) == ILLEGAL_REQUEST) {
return(-1);
    }

fam.cpp:161


Spacing and indentation:

return(-1);

fam.cpp:162

return (-1);

fam.cpp:169


Benefits of consistent coding style:

* more readable
* easier to maintain


As an example, if you grep for a function name after mixing both function declaration/definition styles like above (inlined and stacked), you will probably need to grep twice or wade through much more output:

grep FuncName *          # Lots of output
grep '^FuncName' *       # Only catches FuncName if stacked
grep '^int FuncName' *  # Only catches FuncName if inlined


Some other points worth noting:

Code:
    while (*c != '\0') {
        fe->filename[i++] = *c++;
    }
    last = fe->filename[i-1];
    fe->filename[i-1] = '\0';

fam.c:293

Just to verify this is correct: you copy a portion of a string, c, to filename and then overwrite the last copied character of filename with \0?
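
If that reading is right, the same operation can be written with bounds checks. This is a sketch under that assumption only: copy_drop_last() is a hypothetical helper, and it does not claim to match what the original library does. It refuses an empty source, because there i-1 would point before the start of the buffer (assuming i starts at 0), and it refuses a source that does not fit:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst, then cuts off the last copied
 * character and stores it in *last.
 * Returns the resulting length of dst, or -1 on bad arguments, an empty
 * src, or a src that does not fit into dst. */
int copy_drop_last(char *dst, size_t dstsize, const char *src, char *last)
{
    size_t len;

    if (dst == NULL || src == NULL || last == NULL)
    {
        return -1;
    }

    len = strlen(src);
    if (len == 0 || len >= dstsize)
    {
        /* Nothing to cut off, or no room for src plus a NUL (conservative) */
        return -1;
    }

    memcpy(dst, src, len);
    *last = dst[len - 1];
    dst[len - 1] = '\0';

    return (int)(len - 1);
}
```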


Code:
// A Necessary Global
static char* userEndExistArray;

fam.cpp:49


static int* userDataIndexArray;

void  FAMInit()
{
    static int firstTime = 1;

    // allocate arrays
    if (firstTime) {
        userDataArray = (void**) malloc(maxUserData * sizeof(void*));
        userDataIndexArray = (int*) malloc(maxUserData * sizeof(int*));
        userEndExistArray = (char*) malloc(maxUserData * sizeof(char*));
        for (int i=0; i<maxUserData; i++) {
            userDataIndexArray[i] = -1;
        }
        firstTime = 0;
    }

helpers.cpp:25

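The last two allocations multiply by the size of a pointer instead of the size of an element. Here that only over-allocates, but the same mistake the other way round would be a buffer overflow. Should be: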

Code:
userDataIndexArray = (int*) malloc(maxUserData * sizeof(int));
userEndExistArray = (char*) malloc(maxUserData * sizeof(char));
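
One way to make this class of mistake harder to write, in plain C, is to take the element size from the pointer itself with sizeof *ptr. The sketch below uses hypothetical stand-in names rather than the real helpers.cpp globals, adds the missing NULL checks, and zero-fills the char array with calloc(), which is a choice of this example and not something the original code does:

```c
#include <stddef.h>
#include <stdint.h> /* SIZE_MAX */
#include <stdlib.h>

/* Hypothetical stand-ins for the globals in helpers.cpp */
static int  *index_array;
static char *end_exist_array;

/* Allocates both arrays with 'count' elements each.
 * Returns 0 on success, -1 on failure (nothing is leaked on failure).
 * Meant to be called once, like the firstTime block in FAMInit(). */
int alloc_arrays(size_t count)
{
    int   *idx;
    char  *end;
    size_t i;

    /* Reject empty requests and multiplications that would overflow */
    if (count == 0 || count > SIZE_MAX / sizeof *idx)
    {
        return -1;
    }

    /* sizeof *idx is the element size, whatever type idx points to */
    idx = malloc(count * sizeof *idx);
    end = calloc(count, sizeof *end);
    if (idx == NULL || end == NULL)
    {
        free(idx);
        free(end);
        return -1;
    }

    for (i = 0; i < count; i++)
    {
        idx[i] = -1;
    }

    index_array = idx;
    end_exist_array = end;
    return 0;
}
```

In C++ the malloc() results would still need the casts shown in the original code.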



(12-29-2021, 02:42 AM)Raion Wrote:  As I speak I was successful in reconstruction of it from the documentation he provided. He provided a significant analysis of each and I did my best to match the CFront era C++ function design and setup. As a result the code quality is atrocious and likely will not be my finest work, now or ever. I am currently working in getting it compiling and will build a few programs for testing purposes.


If you did all this code by yourself then I'm impressed. It's not what I would expect from someone without system programming experience. This is huge, Raion.



Cheers
TruHobbyist
Developer

12-29-2021, 12:19 PM
#12
A lot of the code here is not really consistent. A significant reason is that I took three or four breaks from it, each weeks long, and much of the coding style was inherited from whatever I was working on around the same time.

There were three things that we did to ensure compatibility with the original library:

1. My colleague documented every function extensively and even gave me advice on starting points, as well as examples written in pseudocode of how I should tackle these things.

2. As I built every function I passed the code to him, and he would verify that its functionality was identical to the original code. If it wasn't, he would explain to me what the original was doing differently. This explains a lot of the strangeness, especially around some of the string handling.

3. A lot of the weirdness comes from this being basically early-90s C++, from before C++98 was a standard.

My colleague is under the strong impression that the original library was written in C before C++ elements were tacked on, and that much of it was written by multiple people, which would explain a lot of the weirdness of the code. I certainly would not have done things the way the original authors did. Our first stage was to match functionality; then I corrected code, warnings and various small bugs.

I'm going to be taking a break from this, but when I come back my primary goal will be to move a lot of this code to proper C++ and see if we can get a functional FAM daemon built from this. Between me and my colleague, we probably put close to 100 hours into this project, and while it was definitely fun, there were a lot of stressful aspects to it. This is kind of a nondescript part of the OS, but something that I was very interested in seeing brought to light.

One person cannot reverse engineer the OS at this rate; it would take me an eternity even if I became a master at doing this. However, I'm going to press on with doing what I can, and where I can't reverse engineer I will hopefully be able to pass the torch to somebody who's smarter than I am.

I'm the system admin of this site. Private security technician, licensed locksmith, hack of a c developer and vintage computer enthusiast. 

https://contrib.irixnet.org/raion/ -- contributions and pieces that I'm working on currently. 

https://codeberg.org/SolusRaion -- Code repos I control

Technical problems should be sent my way.
Raion
Chief IRIX Officer

12-29-2021, 04:36 PM
#13
Well, the C-style languages are far from my favorite. I agree with a few of these points, but the biggest one, which I would say wasn’t directly brought up in the previous post, is the in-line IF or in-line conditional.

I remember a very famous Apple bug that appeared in 2014, I believe (the “goto fail” SSL bug), where a whole certificate check didn’t take place because the person who modified the code at Apple didn’t understand the difference between an in-line IF and a block-style one.

It actually made it out to production and made the news. That’s when I sort of learned about that shorthand, because I was never taught it originally.


So, personal advice number one when it comes to C-style programming: do not use implied coverage like single-line conditional statements, unless you have an incredibly complex nested functional-expression one-liner.

Even though it “wastes space”, use a block-style conditional for everything that isn’t a functional expression.
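
A simplified illustration of that hazard in C follows. It is loosely modelled on the kind of bug described above and is not the actual Apple code; both function names are hypothetical. Each function is meant to return 0 only when both steps report 0:

```c
/* Brace-less version with one accidentally duplicated line. */
int verify_unbraced(int step1_err, int step2_err)
{
    int err;

    if ((err = step1_err) != 0)
        goto fail;
        goto fail;              /* duplicated line: NOT guarded by the if */
    if ((err = step2_err) != 0) /* never reached */
        goto fail;

fail:
    return err;
}

/* Block-style version of the same checks. */
int verify_braced(int step1_err, int step2_err)
{
    int err;

    if ((err = step1_err) != 0)
    {
        goto fail;
    }
    if ((err = step2_err) != 0)
    {
        goto fail;
    }

fail:
    return err;
}
```

With a passing first step and a failing second step, verify_unbraced() still reports success because the second check is skipped, while verify_braced() reports the failure. In the braced form a duplicated goto inside the block would be harmless, and one left outside the block is much easier to spot in review. Modern compilers may also warn about the misleading indentation in the first function.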
weblacky
I play an SGI Doctor, on daytime TV.

12-29-2021, 09:44 PM
#14
That's just how CFront handled it; I was using MIPSPro in CFront mode to match functions.

I am going to refactor code for this, but the first priority was compatibility, otherwise it's not really libfam anymore.

Raion
Chief IRIX Officer

12-29-2021, 09:58 PM

