Tag Archives: crash

Murex logs and nothing to do with lognormal

When you work with Murex, it is likely that you will encounter issues (if you don’t, you are either very far-sighted or you should definitely buy a lottery ticket).

Anyway, it is very likely that you will soon end up in the logs directory of the app directory, and then what? You have the launcher logs, process pids, many folders, etc… Knowing which logs to open or check is quite daunting, but fortunately Unix/Linux has so many tools to do the search for you that you’re in for a treat.

When browsing the logs, you have 2 best friends: a more experienced consultant (if he has a PAC background, you definitely need him as a friend) and… Google! Google is my go-to whenever I have a question about how to find something based on any criteria.

Example:

Last night I was running our EOD script and wanted to check which answer files had terminated successfully. A quick Google search (“how to find files containing string unix”) gave me:

find / -type f -exec grep -l "text-to-find-here" {} \;

Transformed it into:

find . -name '*answer*' -exec grep -l "Successfully" {} \;

Well, it worked, but then I realized that I actually only cared about the ones which had failed, and picking those out by hand was getting too painful (I was also getting tired, which does not help). Another quick Google search and:

find . -name '*answer*' -exec grep -H -c "Successfully" {} \; | grep ':0$'

And it gave me the list of logs which had failed (grep -c prints a match count per file, so the failures are the files with a count of 0), making it much easier to investigate them.
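A shorter variant of the same search, as a minimal sketch: grep -L (supported by GNU and BSD grep) prints the names of files that contain no match at all, so there is no count to filter afterwards. The function name list_failed_answers and its two parameters are made up for illustration, and "Successfully" is simply the marker from my example.

```shell
# List answer files that never mention the success marker.
# grep -L prints the names of files with NO matching line (the inverse of -l).
# The function name and its defaults are illustrative, not a Murex convention.
list_failed_answers() {
    # $1: directory to search (defaults to the current directory)
    # $2: success marker (defaults to "Successfully", as in the example above)
    find "${1:-.}" -type f -name '*answer*' -exec grep -L -- "${2:-Successfully}" {} +
}
```

Calling list_failed_answers . from the logs directory should print one failing answer file per line.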

So that was my end-of-day example. But you could also find the files which were modified in the last 5 minutes, which greatly narrows down the number of items to browse through.

find . -cmin -5

And then you can do another search for OutOfMemory or similar if the filename/filepath is not enough to tell you which logs you should check.
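The two steps can be combined, as a rough sketch: first keep only the recently modified files, then list those which contain the string you are after. I use -mmin here, which tests the modification time, whereas the -cmin used above tests the last status change (which also moves when permissions or ownership change). Both are extensions found in GNU and BSD find. The function name recent_files_matching is hypothetical.

```shell
# Among files modified in the last N minutes, list those containing a string.
# The function name is illustrative; -mmin is a GNU/BSD find extension.
recent_files_matching() {
    # $1: directory to search
    # $2: maximum age in minutes
    # $3: fixed string to look for (grep -F, so no regex surprises)
    find "$1" -type f -mmin "-$2" -exec grep -l -F -- "$3" {} +
}
```

For example, recent_files_matching . 5 OutOfMemory lists the files touched in the last 5 minutes that mention OutOfMemory.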

And I would recommend having a quick go through the logs yourself first. If you really can’t find anything, of course you can call in a friend, as there is no 50-50 or ask-the-audience sort of joker. My problem with calling in a friend is that it becomes ever more tempting as a quick way to solve a problem, and you don’t build up your skills as much.

What’s more frustrating than having a problem, calling Murex, and then finding in the very first file checked, <servicenamelogfile>.log, the message: user XXXX not authorized to log in? (Or any similar error which could have been fixed in 2 minutes once you knew its cause.)

What about you, dear reader? Any other commands or tips you would recommend?

Murex error messages through the ages

Today, when you have a crash, the message is pretty standard and very intelligible, but it was not always like that. Let’s review what the messages used to be.

Murex error messages : the X-windows era

Time: before 2000/2001

At the time of X windows, there was no error message as such. If a session was crashing, basically the process would stop straight away on the server side and the window would simply disappear on the client side. Did you really have a crash? Or did you close it by mistake? You can never be sure and it was always confusing. Probably the best time for supporting customers 🙂

Murex error messages : Fatal error

Time: from 2000/2001 to ca 2005

When Murex released the Mxg2000 version, the software architecture changed. On the client side, you had a thin client running. As such, if the process crashed on the server side, the client then had to display something on the screen. So they settled for probably the largest error message possible: Fatal error for session detected for process XXX. Session will now close (I’m not completely sure I nailed it exactly as it’s been some time!). All I remember is that the error could span your whole screen, especially if you were just using a 15″ screen.

Murex error messages : Maybe?

Time: from ca 2005 to ca 2009

Alright, to be fair, this was the error message that motivated me to write this post: the error message ended in “Service is maybe dead”. Technically it was correct: the client could not connect anymore to the server process. So maybe it was dead.

Problem is that for end users the result was the same: close and restart a new session. And if you had a few crashes in a row, you would wonder if Murex was mocking you. Yeah, maybe it was 🙂

Murex error messages : Modern times

Time: from ca 2009 to now

Sadly (for this post I mean), Murex is crashing less and less and we have probably reached the clearest and most intuitive message to date: “Process is not valid anymore. Session will now close.” It’s not too long and it’s concise. Unfortunately, you can still get access to the Java stack, which many people still like to send, but as the crash occurred on the server side, the Java stack just shows that the client cannot connect anymore.
Anyway, after a few iterations, the message is now crystal clear!

Dear reader, my memory is not what it used to be. So if you see an error (haha) or something missing, please let me know and I’ll correct it.

Murex assistance or handover, how much information does one need?

I had that conversation last week: in order to provide Murex assistance, how much information is required?

Murex consultants (from Murex) providing support usually don’t need design documents. They can be useful sometimes, but most of the time the consultants will intervene on single specific problems. Access to an environment is ample information and they will dig in to retrieve whatever information is necessary. If you ask them why a crash is occurring, why a task stopped working or why a formula is not triggered, they do not need to look at the full picture. They can focus only on the single issue and all the information will be at hand (remember that Murex usually gives easy access to all the information used to produce a result: rate curve will give all pricing details, formula debugger will show code execution as well as variable values).

The other reason is that Murex consultants are usually supporting multiple customers, and reading through the design document would take a significant amount of time while yielding no direct advantage in answering questions.

On the other hand, internal support team members need to read the design documents if they haven’t been involved in the project. The design document will effectively describe the big picture and explain how everything fits together.

But thinking back to the time I was supporting different customers at Murex, we would ask the customers whenever something did not seem right. Sometimes it would indeed be a design decision and it would not make sense to go against it.
An example: a few weeks ago I wrote about rate propagation, and basically KMQA/IS is the most logical choice. But you will find customers using KZQC, especially in the FX world, and this is perfectly normal.

Of course, there is a real added value for the Murex consultants to know about the design choices. This knowledge could come from reading the design document or more often from frequent support and proximity to the customer. And I was indeed more efficient in my efforts when I clearly knew why such a configuration was selected.

Does this mean that Murex consultants should read and go through design documents? I don’t think so, except if they are supporting only that customer (or very few customers). Their job is to provide information about the software, help you get the most out of it and solve your issues. But the internal support team can provide them with assistance and guidance as to why a solution would not be suitable in the customer’s context. Good support is always at the intersection of Murex knowledge and customer configuration knowledge.

Debugging crashes

Alright, this post will be very much focused on how to debug issues and will be low level.

Murex does crash, and a number of times, when I was working for Murex, we did not get all the information we needed to pinpoint the problem precisely.

The system crashes… what to do?

The first thing is to gather information from the user encountering the crash and understand what the user was doing, whether it happens regularly, etc… One important thing to check if the crash is recent: was anything changed recently (a change of configuration or setting, for example)?

The second step is to reproduce the crash on your side (hopefully without the user). If the crash is too random or cannot be reproduced at will, the best is then to gather the core files and share them with Murex to obtain more information as to what the session was doing at the time.

Assuming the crash can be reproduced, the next step is to narrow it down. This step will actually greatly speed up the turnaround on the problem from the Murex side. And it might also lead to a workaround for the crash. Depending on where the crash is happening, there could be multiple ways of proceeding. I’ll cover the most common case: a crash happening while loading multiple transactions.

If you’re lucky, the transaction scan watching might give you the answer. Transaction scan watching basically logs an entry when a transaction is opened for valuation and another one when it is closed. So each transaction should appear an even number of times. If one appears an odd number of times (and it’s the last one on the list), you might then have a winner. To confirm that the transaction is indeed the culprit, just try your process (accounting entries generation, simulation) on that single trade and see if the crash occurs.

To use transaction scan watching, go to help-monitor, SPB Information and choose transaction scan watching (Zap). This will delete the existing log (transaction scan watching is only used for debugging, you do not endanger anything by zapping it out, except of course if someone else is doing debugging at the same time!). Then choose activate and run your process again. Once it crashes, with a new session choose Transaction scan watching (Display) to check the log.
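If you manage to save the displayed log to a text file, the even/odd check can be automated. The sketch below rests on an assumption I cannot vouch for: one event per line with the transaction identifier in the first whitespace-separated field. The real layout of the log may well differ, so adjust the field accordingly. The function name odd_transactions is made up for illustration.

```shell
# Print "last_line_number transaction_id" for every transaction that appears
# an odd number of times, ordered by where it last appears in the log.
# ASSUMPTION: one event per line, transaction ID in the first field;
# the actual transaction scan watching log layout may be different.
odd_transactions() {
    # $1: path to the saved log file
    awk 'NF { count[$1]++; last[$1] = NR }
         END { for (id in count) if (count[id] % 2) print last[id], id }' "$1" |
        sort -n
}
```

The last line of the output is the transaction which was opened closest to the crash and never closed, so that is the first one to test on its own.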

If the transaction scan watching fails to single out one trade, it will still give you the trades which have already been processed and might also give you a trend: all IRS have passed, it failed while doing loans. Be careful with this info, as the order in which transactions are scanned is not always clear.
Without the transaction scan watching, to narrow it down, you then need to break it down by portfolio or deal type (usually knowing what was changed recently helps, as you can focus first on what you suspect to be wrong). Don’t hesitate to use filters on trades inserted today or which had events performed on them during the day.

Once you have the trade, check it out (check also the static data and market data it uses) to understand where the issue is coming from. If you can fix it, it’s a win.
If you can’t, you can then test whether the issue occurs with a new trade for which you picked the same pattern. And in that case, congrats, you’ve found the bug and can raise it as such.