
String comparison in OpenInsight – Part 3 – Linguistic Mode

Welcome to the final part of this mini series on the string comparison mechanics in OpenInsight. In the first two parts we reviewed how this task is currently handled in both ANSI and UTF8 modes, but this time we’ll take a look at a new capability introduced for the next release which is called the “Linguistic String Comparison Mode”.

As we’ve seen previously, there is certainly room for improvement when dealing with string comparisons and sorting in non-English languages, mainly due to the burden placed on the developer to maintain the sorting parameters, especially once the requirements extend beyond the basic ANSI character set. There is also no advantage taken of the capabilities of Windows itself, which provides a comprehensive National Language Support (NLS) API for testing strings for linguistic equality.

What is “linguistic equality”?

If you’re unfamiliar with the term “linguistic equality” it essentially means comparing strings according to the language rules of a specific locale thereby providing appropriate results for a user of that locale. For example, consider the following cases that illustrate how comparisons differ for the same characters in different locales:

  • Many locales equate the ae ligature (æ) with the letters ae. However, Icelandic (Iceland) considers it a separate letter and places it after Z in the sorting sequence.
  • The A Ring (Å) normally sorts with merely a diacritic difference from A. However, Swedish (Sweden) places the A Ring after Z in the sorting sequence.
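
To make the A Ring case concrete, here is a small Python model (not OpenInsight code) that sorts the same three words under two different rules. The `SWEDISH_ALPHABET` table and both key functions are illustrative assumptions that only mimic the locale rules described above; they do not call the Windows NLS API.

```python
import unicodedata

# Assumption: a simplified Swedish alphabet in which å, ä and ö follow z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def strip_diacritics(text):
    """Remove combining marks so that 'Å' is treated as a plain 'A'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def generic_key(word):
    # Base letters decide the order; the original spelling only breaks ties.
    return (strip_diacritics(word).lower(), word)

def swedish_key(word):
    # Each letter is ranked by its position in the Swedish alphabet.
    # (Raises ValueError for characters outside the table above.)
    normalized = unicodedata.normalize("NFC", word).lower()
    return [SWEDISH_ALPHABET.index(ch) for ch in normalized]

words = ["Zebra", "Åre", "Apa"]
print(sorted(words, key=generic_key))   # ['Apa', 'Åre', 'Zebra']
print(sorted(words, key=swedish_key))   # ['Apa', 'Zebra', 'Åre']
```

The same input produces two different orders, and that difference is what a locale-aware comparison has to get right.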

In a standard OI system these sorts of rules would require the developer to define collation sequence records that represent them, which simply duplicates effort when Windows itself can handle this for us.

Using Linguistic Mode

In order to utilize this API without impacting current systems we have introduced a new “mode” into OpenInsight that allows you to determine exactly when you wish to enable linguistic support. This mode comprises three elements:

  1. Mode ID – this is the mode itself, which can be one of the following values:
    • (0) Normal, non-linguistic mode.
    • (1) Linguistic mode.
  2. Mode Flags – A set of bit-wise flags for use with the Linguistic mode.
  3. Mode Locale – A locale identifier for use with the Linguistic mode (defaults to the current user’s locale).

It’s simply a case of setting the mode when you want it to apply to sorting and case-insensitive operations, and turning it off when you don’t. Just like with Extended Precision Mode you can set a default mode for your application and then adjust this at runtime as desired.

(Note that using the Linguistic mode is not affected by OpenInsight’s ANSI or UTF8 mode, as the string comparisons are processed “outside” in Windows itself.)

The following five functions are used to control the Linguistic Mode:

  • GetDefaultStrCmpMode – returns the default application mode settings.
  • SetDefaultStrCmpMode – sets the default application mode.
  • GetStrCmpMode – returns the current mode settings.
  • SetStrCmpMode – sets the current mode.
  • GetStrCmpStatus – returns the status of a string comparison operation.

Along with this set of equates:

  • RTI_STRCMPMODE_EQUATES
  • MSWIN_COMPARESTRING_EQUATES

Example:

$Insert RTI_StrCmpMode_Equates
$Insert MSWin_CompareString_Equates

// Set the mode to Linguistic, sorting digits as numbers, case-insensitive, 
// and with linguistic casing, using the "en-GB" locale (the Windows locale
// name for English - United Kingdom)
SCFlags = BitOr( LINGUISTIC_IGNORECASE$, NORM_LINGUISTIC_CASING$ )
SCFlags = BitOr( SCFlags, SORT_DIGITSASNUMBERS$ )

Call SetStrCmpMode( STRCMPMODE_LINGUISTIC$, SCFlags, "en-GB" ) 

// Now do some sorting ...
Call V119( "S", "", "A", "L", data, "" )
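
To picture what "sorting digits as numbers" means for the result, here is a rough Python model of the idea. The `digits_as_numbers_key` function is a hypothetical helper written for this illustration; it only approximates the behavior Windows applies for the SORT_DIGITSASNUMBERS flag.

```python
import re

def digits_as_numbers_key(text):
    """Split text into alternating text/digit runs and rank digit runs numerically."""
    # re.split with a capturing group yields text at even positions and
    # digit runs at odd positions, so the two kinds are never compared directly.
    parts = re.split(r"(\d+)", text)
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]

items = ["file10", "file2", "file1"]
print(sorted(items))                             # ['file1', 'file10', 'file2']
print(sorted(items, key=digits_as_numbers_key))  # ['file1', 'file2', 'file10']
```

A plain character-by-character sort puts "file10" before "file2", while the numeric treatment gives the order most users would expect.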

Full details on each of these functions can be found at the end of this post, but let’s take a look in more detail at each of the mode settings:

Mode ID

This is an integer value that controls how string comparisons are made:

When set to “0” then the application will run in “normal” mode, which means that string comparisons will use the methods described in parts 1 and 2 of this series. The Mode Flags and Mode Locale settings are ignored.

When set to “1” the application uses the Windows CompareStringEx function for string comparisons instead. The Mode Flags and the Mode Locale settings will also be used with this.

Mode Flags

This setting is an integer comprising one or more optional bit-flags that are passed to the Windows CompareStringEx function when running in Linguistic Mode (it may be set to 0 to apply the default behavior). A full description of their use can be found in the Microsoft documentation for the CompareStringEx function, but briefly these are:

  • LINGUISTIC_IGNORECASE$ – Ignore case, as linguistically appropriate.
  • LINGUISTIC_IGNOREDIACRITIC$ – Ignore nonspacing characters, as linguistically appropriate.
  • NORM_IGNORECASE$ – Ignore case.
  • NORM_IGNOREKANATYPE$ – Do not differentiate between hiragana and katakana characters.
  • NORM_IGNORENONSPACE$ – Ignore nonspacing characters.
  • NORM_IGNORESYMBOLS$ – Ignore symbols and punctuation.
  • NORM_IGNOREWIDTH$ – Ignore the difference between half-width and full-width characters.
  • NORM_LINGUISTIC_CASING$ – Use the default linguistic rules for casing, instead of file system rules.
  • SORT_DIGITSASNUMBERS$ – Treat digits as numbers during sorting.
  • SORT_STRINGSORT$ – Treat punctuation the same as symbols.
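
Because the flags are bit values, several of them can be combined into a single integer. The Python sketch below uses the numeric values from the Windows SDK header (winnls.h); that the OpenInsight equates carry the same values is an assumption on our part, and `flag_names` is a hypothetical helper for illustration only.

```python
# CompareStringEx flag values as defined in the Windows SDK (winnls.h).
# Assumption: the OpenInsight "$" equates of the same name hold these values.
FLAGS = {
    "NORM_IGNORECASE":            0x00000001,
    "NORM_IGNORENONSPACE":        0x00000002,
    "NORM_IGNORESYMBOLS":         0x00000004,
    "SORT_DIGITSASNUMBERS":       0x00000008,
    "LINGUISTIC_IGNORECASE":      0x00000010,
    "LINGUISTIC_IGNOREDIACRITIC": 0x00000020,
    "SORT_STRINGSORT":            0x00001000,
    "NORM_IGNOREKANATYPE":        0x00010000,
    "NORM_IGNOREWIDTH":           0x00020000,
    "NORM_LINGUISTIC_CASING":     0x08000000,
}

def flag_names(flags):
    """List the names of all flags that are set in the given bitmask."""
    return [name for name, value in FLAGS.items() if flags & value]

# Combine three flags with bitwise OR, as BitOr() does in Basic+.
sc_flags = (FLAGS["LINGUISTIC_IGNORECASE"]
            | FLAGS["NORM_LINGUISTIC_CASING"]
            | FLAGS["SORT_DIGITSASNUMBERS"])

print(hex(sc_flags))         # 0x8000018
print(flag_names(sc_flags))  # ['SORT_DIGITSASNUMBERS', 'LINGUISTIC_IGNORECASE', 'NORM_LINGUISTIC_CASING']
```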

Mode Locale

This can be the name of the locale to use (like “en-US”, “de-CH” etc.), or one of the following special values:

  • “0” or null – Use the current user locale (LOCALE_NAME_USER_DEFAULT).
  • “1” – Use the current OS locale (LOCALE_NAME_SYSTEM_DEFAULT).
  • “2” – Use an invariant locale that provides stable locale and calendar data (LOCALE_NAME_INVARIANT)

The Linguistic Mode and Basic+

The following Basic+ operators and functions are affected by the Linguistic Mode:

  • LT operator
  • LE operator
  • EQ operator
  • NE operator
  • GE operator
  • GT operator
  • _LTC operator
  • _LEC operator
  • _EQC operator
  • _NEC operator
  • _GEC operator
  • _GTC operator
  • IndexC function
  • V119 function
  • Locate By statement
  • LocateC statement

Note that when used with the case-insensitive operators and functions (such as _eqc, IndexC() etc.) the LINGUISTIC_IGNORECASE$ flag is always applied if the NORM_IGNORECASE$ flag has not been specified.
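
The rule for the case-insensitive operators can be written as a tiny function. This is a Python model of the rule as described, not the engine's actual code; `effective_flags` is a hypothetical name and the two flag values are taken from the Windows SDK header.

```python
NORM_IGNORECASE = 0x00000001        # value from winnls.h
LINGUISTIC_IGNORECASE = 0x00000010  # value from winnls.h

def effective_flags(mode_flags):
    """Flags used by a case-insensitive operation under the rule described above."""
    if mode_flags & NORM_IGNORECASE:
        # The developer already asked for a case-insensitive comparison.
        return mode_flags
    # Otherwise linguistic case-insensitivity is added automatically.
    return mode_flags | LINGUISTIC_IGNORECASE

print(hex(effective_flags(0x0)))  # 0x10
print(hex(effective_flags(0x1)))  # 0x1
print(hex(effective_flags(0x8)))  # 0x18
```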

Performance considerations

Using the Linguistic Mode can impact performance for two reasons:

  1. There is just more work to do – comparison of strings using more complex rules will always be slower than a simple comparison of ordinal byte values or code points.
  2. The strings must be copied and transformed into UTF16 (wide) strings before passing to the Windows CompareStringEx function. While this is not a slow operation in and of itself it will add some overhead.

Because of this, Linguistic Mode is not enabled by default – you are free to choose when to apply it yourself.

String Comparison Mode functions

GetDefaultStrCmpMode function

This function returns an @fm-delimited dynamic array containing the current default string comparison mode settings for the application in the format:

<1> Mode
<2> Flags
<3> Locale

Example:

$Insert RTI_StrCmpMode_Equates

DefSCM  = GetDefaultStrCmpMode()
DefMode = DefSCM<GETSTRCMPMODE_MODE$>

SetDefaultStrCmpMode function

This function sets the default string comparison mode for an application. The mode is set to these default values for each new request made to the engine (i.e. each event or web-request). This is to protect against situations where an error condition could force the engine to abort processing before the mode could be reset, thereby leaving it in an unknown state.

This function takes three arguments:

  • Mode – Specifies the default mode to set: “0” for Normal mode, or “1” for Linguistic Mode.
  • Flags – Bitmask integer that specifies the default flags to use when in Linguistic Mode.
  • Locale – Specifies the name of the default locale to use.

Example:

$Insert RTI_StrCmpMode_Equates
$Insert MSWin_CompareString_Equates

// Set the default mode to Linguistic, sorting digits as numbers, using the
// user's locale
SCFlags = SORT_DIGITSASNUMBERS$

Call SetDefaultStrCmpMode( STRCMPMODE_LINGUISTIC$, SCFlags, "" ) 

GetStrCmpMode function

This function returns an @fm-delimited dynamic array containing the current string comparison mode settings for the application in the format:

<1> Mode
<2> Flags
<3> Locale

Example:

$Insert RTI_StrCmpMode_Equates

CurrSCMode = GetStrCmpMode()<GETSTRCMPMODE_MODE$>

SetStrCmpMode function

This function sets the current string comparison mode for an application. Note that the mode is set to the default values for each new request made to the engine (i.e. each event or web-request). This is to protect against situations where an error condition could force the engine to abort processing before the mode could be reset, thereby leaving it in an unknown state.

This function takes three arguments:

  • Mode – Specifies the mode to set: “0” for Normal mode, or “1” for Linguistic Mode.
  • Flags – Bitmask integer that specifies the flags to use when in Linguistic Mode.
  • Locale – Specifies the name of the locale to use.

Example:

$Insert RTI_StrCmpMode_Equates
$Insert MSWin_CompareString_Equates

// Set the mode to Linguistic, sorting digits as numbers, case-insensitive, 
// and with linguistic casing, using the "en-GB" locale (the Windows locale
// name for English - United Kingdom)
SCFlags = BitOr( LINGUISTIC_IGNORECASE$, NORM_LINGUISTIC_CASING$ )
SCFlags = BitOr( SCFlags, SORT_DIGITSASNUMBERS$ )

Call SetStrCmpMode( STRCMPMODE_LINGUISTIC$, SCFlags, "en-GB" ) 

// Now do some sorting ...
Call V119( "S", "", "A", "L", data, "" )

GetStrCmpStatus function

While it is unlikely that the CompareStringEx function will raise any errors it is possible if incompatible flags or parameters are used. In this case Windows returns an error code which may be accessed in Basic+ via this function (See the CompareStringEx documentation for more details on error values).

Example:

$Insert RTI_StrCmpMode_Equates
$Insert MSWin_CompareString_Equates

// Set the mode to Linguistic, sorting digits as numbers, case-insensitive, 
// and with linguistic casing, using the "en-GB" locale (the Windows locale
// name for English - United Kingdom)
SCFlags = BitOr( LINGUISTIC_IGNORECASE$, NORM_LINGUISTIC_CASING$ )
SCFlags = BitOr( SCFlags, SORT_DIGITSASNUMBERS$ )

Call SetStrCmpMode( STRCMPMODE_LINGUISTIC$, SCFlags, "en-GB" ) 

// Now do some sorting ...
Call V119( "S", "", "A", "L", data, "" )

SCError = GetStrCmpStatus()
If SCError Then
   ErrorText = RTI_ErrorText( "WIN", SCError )
End

Conclusion

This concludes this mini-series on OpenInsight string comparison processing. Hopefully you’ll find the new Linguistic Mode useful in your own applications, bearing in mind that some of the custom sorting options, such as “Treat Digits As Numbers”, can have a use in any application beyond simply dealing with non-English language sets.

Some of the more astute readers among you may have noticed that no mention of indexing has been made so far with respect to Linguistic Mode. This is because work is currently ongoing in this part of the system, and we’ll give you more details regarding this at a later date.

String comparison in OpenInsight – Part 2 – UTF8 Mode

Welcome to the second part of our mini-series explaining the mechanics of how string comparisons are handled in OpenInsight. In our previous post we looked at the inner workings when running in ANSI mode – this time we’ll look at UTF8 mode instead.

(Note that we’ve included some Basic+ pseudo-code examples in this post to illustrate more clearly how some parts of the comparison routines work. These are simplifications of the actual C++ internal functions and not actual code from the system itself.)

String comparison in UTF8 Mode

In UTF8 mode characters can be multi-byte and therefore have a value greater than 255 (normally referred to as their “code point”, or in Basic+ terms, the Seq() value of a character), so this means that the standard ANSI-mode method described previously cannot be used. Instead, a slightly different approach is taken to allow higher code points to be included in custom sorting.

When the system is loaded the UTF8 library creates an internal character-map (called the “ANSI-map”) which is a 256-element array (0-255) of code-point values. This is initialized to the same values as the standard ANSI character set, i.e. position 65 will have the code point for the ANSI character with the value of 65, position 230 will have the code point for the ANSI character with the value of 230 and so on.

The ANSI-map can be changed at runtime so that code points higher than 255 can be included. Code points that appear in the ANSI-map are always sorted lower than those that don’t, regardless of their actual value. The following functions (exported from RevUTF8.dll) are used to query and update the ANSI-map:

GetAnsiToUnicode – returns the code point for a specified map element.

// MapIndex - must be an integer between 0 and 255 
CodePoint = GetAnsiToUnicode( MapIndex )  

SetAnsiToUnicode – updates the code point for a specified map element.

// MapIndex - must be an integer between 0 and 255
// NewCodePoint - integer value of the code point to set
Call SetAnsiToUnicode( MapIndex, NewCodePoint )

UTF8 comparison method

When comparing two characters we first need to find a “sort index” for a character which is determined as follows:

  • Get the code point value for the character being compared.
  • Look in the ANSI-map using the low byte value of the code point as the index. If the value at that position is the same as the character code point then the sort index is set to that index and it is marked as “found”.
    • E.g. If the character has a code point value of 458 (0x1CA) then its low-byte value is 202 (0xCA). If the ANSI-map contains the value 458 at index 202 then the sort index is set to 202 and it is marked as “found”.
  • Otherwise, scan backwards through the ANSI-map looking for an element that has the same value as the code-point for the character. If we match it then the sort index is set to the same position and it is marked as “found”.
// Pseudo-code
dim ansiMap( 255 )

sortIndex = -1 ; // Not found
codePoint = seq( ch )
testIndex = bitAnd( codePoint, 0xFF )
if ( ansiMap( testIndex ) == codePoint ) then
   // Found
   sortIndex = testIndex
end else
   // Not found
   for testIndex = 255 to 0 step -1
      if ( ansiMap( testIndex ) == codePoint ) then
         // Found and exit loop
         sortIndex = testIndex
      end         
   next
end
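
For readers who want to experiment with the lookup, the same steps can be modelled in a few lines of Python. This is a sketch under stated assumptions: the map starts as a plain identity table (the real map holds the code points of the ANSI character set), the backward scan stops at the first match, and the names `build_ansi_map` and `find_sort_index` are hypothetical.

```python
def build_ansi_map():
    # Assumption: an identity table (index N holds code point N) stands in
    # for the real ANSI-map that the UTF8 library creates at startup.
    return list(range(256))

def find_sort_index(ansi_map, code_point):
    """Return the sort index for a code point, or -1 if it is not in the map."""
    # Fast path: try the slot addressed by the low byte of the code point.
    test_index = code_point & 0xFF
    if ansi_map[test_index] == code_point:
        return test_index
    # Slow path: scan backwards through the whole map.
    for test_index in range(255, -1, -1):
        if ansi_map[test_index] == code_point:
            return test_index
    return -1

ansi_map = build_ansi_map()
ansi_map[202] = 458      # the example from the text: 458 (0x1CA) stored at index 202
ansi_map[10] = 0x20AC    # a code point stored away from its low-byte slot (0xAC)

print(find_sort_index(ansi_map, 458))     # 202 (found via the low byte)
print(find_sort_index(ansi_map, 0x20AC))  # 10  (found via the backward scan)
print(find_sort_index(ansi_map, 0x4E2D))  # -1  (not in the map)
```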

Once this has been done for both characters we use the following comparison procedure:

  • If one of the characters is marked as “not found” and the other as “found”, the latter is always sorted before the former.
  • Otherwise we now proceed in a manner similar to the ANSI comparison:
    • If we are using a collation sequence the sorting value for each character is extracted from the appropriate sequence using the sort index we determined above.
      • E.g. if the sort index was 202 then the sort value for the comparison is the value of the byte at position 203 (1-based) in the sequence.
    • If we are using a case-insensitive comparison without a collation sequence the two sort indexes (not values!) are masked with 0xDF and compared.
    • If we are using a case-sensitive comparison without a collation sequence the two original code-point values are compared.
// Pseudo-code
begin case
   case ( sortIndex1 == -1 ) and ( sortIndex2 == -1 )
      // Both Non-ANSI-mapped - use a simple code point compare
      cmpVal = codePoint1 - codePoint2
   case ( sortIndex1 == -1 )
      // sortIndex2 was found in the ANSI map so it's sorted lower
      cmpVal = 1
   case ( sortIndex2 == -1 )
      // sortIndex1 was found in the ANSI map so it's sorted lower
      cmpVal = -1
   case OTHERWISE$
      // Both are ANSI mapped
      begin case
         case hasCollationSequence
            sortVal1 = seq( collationSequence[sortIndex1+1,1] )
            sortVal2 = seq( collationSequence[sortIndex2+1,1] )
            
         case isCaseInsensitive         
            sortVal1 = bitAnd( sortIndex1, 0xDF )
            sortVal2 = bitAnd( sortIndex2, 0xDF )
            
         case OTHERWISE$
            sortVal1 = codePoint1
            sortVal2 = codePoint2
            
      end case
      
      cmpVal = sortVal1 - sortVal2
      
end case
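
The comparison procedure can be modelled in Python in the same way. Again this is only a sketch of the logic described above, not the internal C++ routine: `compare_mapped` is a hypothetical name, the sort indexes are passed in already resolved (-1 meaning "not found"), and the collation sequence is represented as a 256-byte `bytes` object indexed from zero.

```python
def compare_mapped(code_point1, sort_index1, code_point2, sort_index2,
                   collation=None, case_insensitive=False):
    """Return <0, 0 or >0, following the comparison rules described above."""
    if sort_index1 == -1 and sort_index2 == -1:
        # Neither character is in the ANSI-map: plain code point comparison.
        return code_point1 - code_point2
    if sort_index1 == -1:
        return 1    # character 2 is mapped, so it sorts lower
    if sort_index2 == -1:
        return -1   # character 1 is mapped, so it sorts lower
    if collation is not None:
        # Python indexes from 0, so this is position sort_index + 1 in Basic+.
        sort_val1 = collation[sort_index1]
        sort_val2 = collation[sort_index2]
    elif case_insensitive:
        sort_val1 = sort_index1 & 0xDF
        sort_val2 = sort_index2 & 0xDF
    else:
        sort_val1, sort_val2 = code_point1, code_point2
    return sort_val1 - sort_val2

reverse_sequence = bytes(reversed(range(256)))  # a collation that inverts the order

print(compare_mapped(ord("a"), 97, ord("A"), 65, case_insensitive=True))  # 0
print(compare_mapped(0x4E2D, -1, ord("A"), 65))                           # 1
print(compare_mapped(ord("A"), 65, ord("B"), 66))                         # -1
print(compare_mapped(ord("A"), 65, ord("B"), 66, reverse_sequence))       # 1
```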

So, this system works pretty well out of the box for languages that can be expressed using the ANSI character set, but for other languages much of the burden falls on the application developer to maintain and tune the language settings and collation sequences to their requirements. This could require considerable effort and ignores much of the functionality provided by the OS itself, so in the next post we’ll take a look at how this is being addressed in the next release.

String comparison in OpenInsight – Part 1 – ANSI Mode

Developers with systems that require Unicode processing will be pleased to know that the next release adds some new and much-needed string comparison functionality, to which we have given the super-catchy title of the “Linguistic String Comparison Mode”. However, before we get into the details of that, it’s worth taking a look at how string comparison is currently handled in the system as it has a huge effect on how your data is sorted, and it seems to be one of those murky and little understood areas which includes arcane terms like “collation sequences” and “ANSI Maps”.

(Note that we’ve included some Basic+ pseudo-code examples in this post to illustrate more clearly how some parts of the comparison routines work. These are simplifications of the actual C++ internal functions and not actual code from the system itself.)

String comparison in ANSI mode

String comparison in ANSI mode (i.e. where every character has a value between 0 and 255) is usually a straightforward exercise of comparing the byte value of characters against each other. However, it is possible to customize this using the classic ARev technique of “collation sequences”, which allow a developer to assign a custom weighting, or “sort value”, to a specific character when it is compared against others.

Collation sequences are contained in records in the SYSENV table with a prefix of “CM_”. By default OpenInsight includes the following “CM_” records:

  • CM_ANSI
  • CM_ISO
  • CM_US

A collation sequence may be attached to a language definition (“LND”) record by specifying the “CM_” key in field <10>, and when you load the LND record the collation sequence is loaded too. For example, in the German LND record (LND_GERMAN_D) you will see CM_ISO has been specified like so:

SYSENV – LND_GERMAN_D

Each CM record may contain one or two fields, each field containing a block of 256 bytes that form a collation sequence:

<1> Case-sensitive sequence (required)
<2> Case-insensitive sequence (optional)

To give a character a weighting simply find its 1-based index in the sequence, and enter the byte value you want to give it instead. For example, if we wanted to give the “3” character a sort value of “87” then we do the following:

  • Find the ANSI byte value of the character “3”, which is 51 (0-based value).
  • In field<1> we replace the character at index 52 (1-based value) with a character that has the ANSI byte value of 87, which is “W”.

Points to note:

  • The first field (sequence) in a CM record is required, while the case-insensitive sequence is not – the system will use a default method for case-insensitive comparison as discussed below.
  • A collation sequence must be 256 characters in length. If not the system will not use it.
  • The last two characters in the sequence (255 and 256, 1-based) are always set to the byte values of 254 and 255 (@fm and @rm).
  • Some LND records have data in field<11> – this is for historical reasons and may be ignored.

So, now that we know what collation sequences are, we can see how the system uses them at runtime when it needs to compare string values.

ANSI comparison when using a collation sequence

If a collation sequence has been specified (for either a case-sensitive operation, or a case-insensitive operation) the system uses it to extract the sorting value of the character being processed:

  • Get the ANSI byte value of the string character being compared.
  • Use the ANSI byte value as an index into the appropriate collation sequence.
  • The sorting value for the string character is the byte value at that index.
    • E.g. The character “3” has an ANSI byte value of 51 – using this as an index we get byte 52 (1-based) from the collation sequence – the value of that byte is the sorting value.
// Pseudo-code
ansiVal1 = seq( ch1 )
sortVal1 = seq( collationSequence[ansiVal1+1,1] )

ansiVal2 = seq( ch2 )
sortVal2 = seq( collationSequence[ansiVal2+1,1] )

cmpVal   = ( sortVal1 - sortVal2 )
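
As a worked example, here is a Python model of the same lookup that gives the “3” character a sort value of 87, as in the earlier walkthrough. The sequence starts as an identity table, which is an assumption made for illustration (the shipped CM_ records may contain different values), and the function names are hypothetical.

```python
def make_identity_sequence():
    # Assumption: every character starts with its own byte value as its weight.
    return bytearray(range(256))

def sort_value(sequence, ch):
    # Python indexes from 0, so byte value 51 addresses what Basic+ sees as
    # position 52 (1-based) in the sequence.
    return sequence[ord(ch)]

def compare_chars(sequence, ch1, ch2):
    return sort_value(sequence, ch1) - sort_value(sequence, ch2)

sequence = make_identity_sequence()
sequence[ord("3")] = 87   # give "3" the same weight as "W" (byte value 87)

print(compare_chars(sequence, "3", "V"))   # 1  ("3" now sorts after "V")
print(compare_chars(sequence, "3", "W"))   # 0  ("3" and "W" compare as equal)
print(sorted("3AZ", key=lambda ch: sort_value(sequence, ch)))  # ['A', '3', 'Z']
```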

ANSI case-insensitive comparison without a collation sequence

Case-insensitive comparison of two characters without a collation sequence uses the classic ASCII-style technique of bit-masking both character ANSI byte values with a value of 0xDF and then comparing the results.

// Pseudo-code
sortVal1 = bitAnd( seq( ch1 ), 0xDF )
sortVal2 = bitAnd( seq( ch2 ), 0xDF )

cmpVal   = ( sortVal1 - sortVal2 )
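
The effect of the 0xDF mask is easy to check outside the system. The short Python sketch below (with a hypothetical `fold` helper) shows that clearing bit 5 maps lower-case letters onto their upper-case partners, and also that the mask is applied to every byte value, so some non-letter characters end up folded together as well.

```python
def fold(ch):
    """Apply the 0xDF mask to a single-byte character value."""
    return ord(ch) & 0xDF

# Letters: the lower-case value collapses onto the upper-case one.
print(fold("a"), fold("A"))      # 65 65
print(fold("é") == ord("É"))     # True (0xE9 -> 0xC9)

# Non-letters are masked too, so these two compare as equal.
print(fold("{") == fold("["))    # True (0x7B -> 0x5B)
```

This side effect on non-letter characters is one reason to supply an explicit case-insensitive collation sequence when the default folding is not appropriate.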

ANSI case-sensitive comparison without a collation sequence

Case-sensitive comparison of two characters without a collation sequence simply compares the ANSI byte value of the characters against each other.

// Pseudo-code
cmpVal = ( seq( ch1 ) - seq( ch2 ) )

That concludes this look at ANSI string comparison – in the next post we’ll take a look at how string comparison is handled in UTF8 mode, which is, as you might expect, a little more complex.

LocateC, GSClear and Expendable

The next release of OpenInsight sees some updates to Basic+ with the addition of two new statements and the return of an old Arev compiler keyword.

The LocateC statement

A counterpart to the well-known IndexC function, the LocateC statement simply performs a case-insensitive “locate” operation on a string, using the normal Locate statement syntax like so:

LocateC substring In string Using delimiter Setting pos Then/Else
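
If it helps to see the behavior spelled out, here is a Python model of a case-insensitive locate over a delimited string. It is only an approximation: `locate_c` is a hypothetical function, Python's `casefold()` stands in for whatever case rules the engine applies, and returning a position one past the last element when nothing matches is our assumption.

```python
def locate_c(substring, string, delimiter):
    """Return (found, pos) where pos is the 1-based position of the match."""
    elements = string.split(delimiter)
    target = substring.casefold()
    for pos, element in enumerate(elements, start=1):
        if element.casefold() == target:
            return True, pos
    # Assumption: when not found, pos points one past the last element.
    return False, len(elements) + 1

print(locate_c("banana", "Apple,BANANA,Cherry", ","))  # (True, 2)
print(locate_c("kiwi", "Apple,BANANA,Cherry", ","))    # (False, 4)
```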

The GSClear statement

This statement clears down the internal GoSub return-stack of the currently executing program, so that a subsequent Return statement will return to the calling program rather than jump back to an originating GoSub statement as normal. This is usually used to handle severe error conditions where an “early return” to the caller is desirable and there is a need to pass a value back to the caller. E.g:

Compile Function Test( Void )

   GoSub DoTheThing

Return "The Thing Was OK"

DoTheThing:
   
   GoSub CheckTheStuff

Return

CheckTheStuff:

   If TheStuffIsReallyBad Then
      // Return directly to the caller
      GSClear 
      Return "The stuff is really bad!!!"
   End

Return

This is basically the same as performing an Abort operation, but allows you to return a value, which Abort does not.

The Expendable statement

Marking a program as “expendable” was a useful feature in Advanced Revelation that instantly removed a program from the cached program array after it had finished executing. This was very useful in networked development scenarios where programs being edited could be loaded by different workstations – they could still get an updated version without having to restart the application or issue a manual GarbageCollect statement to clear the cache. This is a similar scenario to using a current tool like the OpenInsight EngineServer, where individual engines can cache programs during development – it can become tedious to force them to load an updated version after making changes.

To mitigate this the Expendable keyword has been reintroduced and is simply added to the program header declaration like so:

Compile Expendable Function( Param1 )
   // Do stuff
Return RetVal

or, if you don’t use the optional “Compile” keyword:

Expendable Function( Param1 )
   // Do stuff
Return RetVal

With this the engine now deletes the program from the cache after all references to it have been removed from the call stack, forcing it to reload from disk the next time it is called.

What’s “this”?

As I’m sure many of you will know, when you’re working with object oriented languages like C++, JavaScript and VB, the compiler provides you with a keyword (e.g. ‘this’ or ‘Me’) that you can use as a reference to the specific instance of an object under which the code is currently executing.  This provides a neat and easy way to access details about the current context when responding to events and methods, and generally improves the clarity of the code.

However, a ‘this/Me’ construct is not something we’ve really had in OpenInsight, because when we write our event handling code the system explicitly passes the name of the current object as the first argument called “CtrlEntID”, so in our event scripts we can use that instead.  Obviously this works well enough, but over the years I’ve found some situations where it would be nice to go a little further:

  • In many of the OpenInsight training courses I’ve run, one of the most common questions I get asked from new students is: “what is the equivalent of ‘this’?”, or “where is the ‘Me’ keyword?”   Having some sort of “this/me” construct in Basic+ would make learning the system much easier for them, and the name “CtrlEntID” hardly seems slick!
  • With the trend away from using event scripts and moving code into commuter modules the name “CtrlEntID” is no longer enforced – it can be named anything the author of the commuter module wishes, leading to a possible loss of clarity (for example in my own commuter modules I always use the variable name “Object” in place of “CtrlEntID”, but that’s just my convention, and is something subsequent code maintainers must adopt).
  • As code becomes deeper and more nested passing the “CtrlEntID” variable to each subsequent procedure as an argument becomes more of a chore, and I’ve seen global variables used in place of this which can lead to code that is difficult to maintain.

Of course, we do have the “@Window” system variable, which contains the name of the parent WINDOW instance for the currently executing context, so we’re nearly there, but unfortunately that’s not the same as ‘this/Me’ unless you’re responding to a WINDOW event.

So, with the release of version 10.0.8 we’ve now gone the whole way and added a new system variable called:

@Self

When an event is triggered this variable contains the full name of the control instance under which the event code is executing, just like @Window contains the name of the parent WINDOW instance.

E.g:

// Using CtrlEntID
Name = Get_Property( CtrlEntID, "TEXT" )

// Using @Self
Name = Get_Property( @Self, "TEXT" )

We’ve also included two more system variable names as synonyms for @Self as well:

@This
@Me

You may use these in place of @Self if you’re more inclined to use names that are familiar from another language (@Self was chosen because it is already referenced in OpenInsight as a pseudo-variable name when defining QuickEvents).

Hopefully, moving forward, this small addition may help to maintain a cleaner code-base and make teaching new students a little easier.

 

Extended Precision Mode

One of the common problems faced by many programmers is ensuring the accuracy of floating point arithmetic, mainly due to the rounding errors that can occur when calculations are performed on numbers that cannot be fully represented in a binary floating-point format (the size of this format determines the precision, and therefore the accuracy of calculations).

OpenInsight is built with the Microsoft C++ compiler, which limits the floating-point format to 64-bits (known as the “Double” type), and Basic+ variables that represent non-integer numbers use this type internally.  The use of this 64-bit format is one of the problems commonly noticed by developers who have moved their systems from the older Advanced Revelation platform to OpenInsight, because the internal floating-point format for R/Basic variables was the 80-bit “Long Double” type instead.  This means that calculations from the same RBasic/Basic+ code running on ARev and OI may produce different results due to this reduction in precision.

(The Long Double is unfortunately not supported by the MS C++ compiler as, according to the VC++ compiler team back in 2006: “The major reason is that FP code generation has been switching to the use of SSE/SSE2/SSE3 instruction sets instead of the x87 FP stack since that is what both the AMD and Intel recent and future chip generations are focusing their performance efforts on. These instruction sets only support 32 and 64 bit FP formats”).

The “integer workaround”

A common workaround for precision problems is to control the calculations at each step so you are effectively dealing with integer operations – this can be done by using the “MD” IConv/OConv functions to control the precision, or by simply multiplying values by a known factor and dividing the result again afterwards.

Both of these methods can make the code messy and obscure the intent, and, depending on the values used, may result in integer overflow if they are too large (though this is less likely to happen on a 64-bit system like OpenInsight 10).
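
The “multiply by a known factor” approach is not specific to Basic+, so it can be shown with any language that uses 64-bit doubles. The Python sketch below illustrates the idea; `add_scaled` is a hypothetical helper and the choice of two decimal places is just an example.

```python
def add_scaled(a, b, places=2):
    """Add two values as scaled integers, then scale the result back down."""
    factor = 10 ** places
    return (round(a * factor) + round(b * factor)) / factor

# Plain double arithmetic picks up a representation error...
print(0.1 + 0.2)             # 0.30000000000000004
print(0.1 + 0.2 == 0.3)      # False

# ...while the scaled-integer version gives the expected result.
print(add_scaled(0.1, 0.2))  # 0.3
```

Every step has to be wrapped like this, which is why the technique tends to obscure the intent of the original calculation.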

The “Extended Precision Operators”

Another option to help mitigate these calculation problems is to use the Extended Precision Operators that were introduced in OpenInsight 9.3:

  • _addx
  • _subx
  • _mulx
  • _divx

These do allow you to specify the precision to use at each step, but suffer from the need to rewrite existing code to use them. It is also easy to lose precision if you inadvertently use a “normal” operator in between them as well, as the following example demonstrates:

a = _divx( c, b, 24 )   ; // "a" is full precision (24)
if ( a > 1 ) then       ; // "a" converted to "double" by ">" operator
   // do stuff
end
b = _addx( a, z )       ; // "a" is no longer full precision passed to _addx

The new “Extended Precision Mode”

With the upcoming release of version 10.0.7, OpenInsight now supports a new feature for dealing with high precision calculations called Extended Precision Mode (EP Mode). When this mode is enabled all the normal maths operators switch into an “extended mode” and the results of calculations are stored in Basic+ variables using a new internal type introduced specifically for maintaining the precision.

This means that existing code can be reused by simply adding statements to activate and deactivate the mode as needed via the new SetEPMode() function, e.g.:

Compile Function Mickey_Mouse_EPM_Test( void )

   Declare Function GetEPMode
   $Insert Logical
 
   epMode = GetEPMode()

   Call SetEPMode( FALSE$ )
   GoSub runTest
   
   Call SetEPMode( TRUE$ )
   GoSub runTest
   
   Call SetEPMode( epMode )

Return

runTest:
   a = 10.12346 * (22/7); 
   b = 100000; 
   For x = 1 To 1000 
      b -= a 
   Next
   Call Send_Dyn( b ) 
Return

Output:

(Normal) 68183.4114285739 
(EPMode) 68183.41142857142857142857142857143725
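
The difference between the two results can be reproduced in spirit with Python, using its standard `decimal` module as a stand-in for the extended type. This is an analogy only: we do not know how the EP Mode type is implemented, Python's `prec` counts significant digits rather than decimal places, and the trailing digits will not match the OpenInsight output exactly.

```python
from decimal import Decimal, localcontext

def run_test_float():
    """The same loop as runTest, using ordinary 64-bit doubles."""
    a = 10.12346 * (22 / 7)
    b = 100000.0
    for _ in range(1000):
        b -= a
    return b

def run_test_decimal(precision=32):
    """The same loop using decimal arithmetic at the given precision."""
    with localcontext() as ctx:
        ctx.prec = precision   # significant digits, not decimal places
        a = Decimal("10.12346") * (Decimal(22) / Decimal(7))
        b = Decimal(100000)
        for _ in range(1000):
            b -= a
        return b

print(run_test_float())     # close to 68183.4114285714, with drift in the last digits
print(run_test_decimal())   # begins 68183.41142857142857...
```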

The following operators and functions are affected by the EP Mode:

+
+=
-
-=
*
/
== or =
!= or <>
>
<
>=
<=
mod()
int()
abs()
atan()
cos()
exp()
ln()
pwr()
sin()
sqrt()
tan()

Performance considerations

By default EP Mode is not enabled because calculations are slower due to the extra processing needed to maintain precision, and this would have a detrimental effect on the performance of your applications if it were permanently enabled, so you should only use it when absolutely required.

You should also note that the EP Mode and Precision are set to their default values for each new request made to the engine (i.e. each event or web-request).  This is to protect against situations where an error condition could force the engine to abort processing before the EP Mode settings could be reset, thereby leaving it in an undesired state (this is similar to the way UTF-8 mode works so that data integrity is preserved).

Controlling the precision level

The actual level of precision (i.e. the number of decimal places) is controlled by another setting, which is updated by the new SetEPModePrecision() function. The greater the precision, the longer calculations take to perform. By default the precision is set to 32.

   decPlaces = GetEPModePrecision()
   Call SetEPModePrecision( 24 )

Default EP settings

The default settings can be changed in the Application Properties dialog launched from the IDE Settings menu.

Region Blocking

One small (but useful!) feature we added to the Basic+ editor was the use of “region blocks” to help with code organization in large programs. The blocks group together related sections of code under a descriptive name so they may be navigated and handled more easily (those of you who have programmed in other languages such as C# and C++ might be familiar with this concept already).

Essentially region blocks are simply a pair of statements (“#region” and “#endregion”) that you insert before and after a block of code to define it, along with a name that describes the region. Once you have done this the entire region becomes a “fold point”, so it can be folded to hide it, and it also appears as a “jump point” in the editor navigation dropdown so you can get to it quickly.

E.g.

#region ScrollMode

// Here's some code for handling the ScrollMode property in the FormDes etc...
onParseStruct_HandleScrollMode:
   if bitAnd( psPSStyleEx, PSSX_VIEW_SCROLLMODEPAGING$ ) then
      psWinStyle = bitOr( psWinStyle, WS_VSCROLL$ )
   end
return

// More stuff ....
#endregion ScrollMode

This now becomes a fold point in the editor:

Region Folding

And can be jumped to in the navigation dropdown like so:

Region Dropdown

So, if you do have some programs with large amounts of code, hopefully this feature will help you find your way around them more quickly.

Assertions in Basic+

For those of you who have done any Java or C/C++ programming in the past, assertions may be a familiar programming construct. For those who have not, an assertion is simply a way of embedding tests in your programs to check that a condition is true: if the condition evaluates to false then the program stops to display a message informing you of the failure, and presents a set of choices for dealing with it.

Assertions are basically “sanity checks” that you can employ anywhere in your programs to ensure that the state of your data is as you expect it.  You should use normal error-handling code for errors you expect; you should use assertions for errors that should never occur.

The $assert() statement

In order to support assertions a new statement, “$assert”, has been introduced into Basic+. This takes a simple comparison expression as an argument like so:

$assert( a > 3 )

The statement above checks that the value of the variable “a” is greater than 3.  If so then the program continues as normal, otherwise an assertion is raised and the assert message displayed.

When passing expressions to the $assert statement ensure you keep them simple.  The $assert is effectively turned into an “if () else” statement when it is compiled, so the resulting runtime code from the example above looks like this:

if ( a > 3 ) else <show assert message>

Therefore you can only pass what you would legally be able to pass to the normal “if” statement.

The $assert_here statement

This statement is an extension of the normal $assert statement and is used to raise an assertion unconditionally, without testing an expression. This is commonly used in an “end else” clause when you enter an unexpected code branch.

It has two forms:

  • $assert_here
  • $assert_here( <text> )

The latter form takes a text string (you don’t have to quote it) like so:

if bLen( winID ) then
   // All good
end else
   // We can't get here amiright?
   $assert_here( winID is null - how did this happen? )
end

Failed assertions

If the assertion fails you are presented with a message that looks like this:

Assertion message

It tells you the program name, the line number and the expression that caused the failure and gives you three options for processing:

  • Abort – This stops all program execution and returns the system to an idle state, just as if you had hit the debugger and then immediately closed it.
  • Debug – This loads the debugger at the point of the assertion.
  • Ignore – This continues the program execution as normal.

There is also a checkbox that allows you to turn off all assertions for the rest of the session.

Note that if you are running a program from the system monitor the assertion dialog looks slightly different as we rely on the Windows MessageBox function to display the error instead:

Assertion MessageBox

As you can see, the “Debug” button is replaced by a “Retry” button, but they both perform the same function.

Currently assertion messages will not display outside of an event context, because the message needs a UI to display and this would be problematic if called from something like a web server. In this case assertions are ignored.

Disabling assertions

Assertions present a tidy way to deal with some problems without crashing straight into the debugger, but even so you may not want your customers to see them in your released code. In this case you can ensure they are never executed by using one of the following methods:

Disabling assertions with the SetDebugger() function

The SetDebugger function can be used to programmatically turn assertions on or off at runtime using its “ASSERT” method. You can use it to turn assertions off when your application starts, and then turn them back on later for diagnostic purposes if you wish.

$insert logical

// Turn off assertions at runtime
call SetDebugger( "ASSERT", FALSE$ )

// Turn assertions back on
call SetDebugger( "ASSERT", TRUE$ )

Disabling assertions with the compiler

You can also disable assertions when compiling, in which case the checks are never included in the object code.  To do this simply use the NOASSERT token with the #define statement like so:

#define NOASSERT

You can add this to individual programs as you wish, or to an IDE build configuration, or to the compiler IFDEF list itself if you call it manually.

(Disclaimer: This article is based on preliminary information and may be subject to change in the final release version of OpenInsight 10).

BLen is the new GetByteSize

As one observant commenter noticed in our last post there’s a new Basic+ function called “BLen” in the version 10 compiler.  This is simply a synonym for the standard GetByteSize function, and was added to:

  1. Save me some typing effort (very important)
  2. Fit in with some of the other binary functions like BRemove and BCol2.

Of course, you may be wondering why GetByteSize/BLen is being used so much that I got tired of typing it?  It’s simply that as we progress through the v10 codebase we’re updating the code to be “UTF8-safe” – i.e. we’re aiming to ensure that we don’t lose any performance when running in UTF8 mode, and a common Basic+ programming pattern for detecting a non-null variable is this:

   If Len( someVar ) Then
      // Variable is not null
   End

Variables in Basic+ are length-encoded, i.e. they cache the number of bytes that they occupy in memory.  When running in ANSI mode the Len statement simply returns this number (because 1 byte always equals 1 character) so if it’s zero you know you don’t have any data. However, because UTF8 is a multi-byte character-encoding format, the Len statement in UTF8-mode has to scan the contents of the entire variable to count the number of characters – it can’t use the cached byte-count.  This means that a simple check with Len could trigger this counting process when all you really want to know is if the variable contains data, and this could impact performance when dealing with large strings or arrays.

So, the best option is to use GetByteSize rather than Len, which always returns the cached byte-count regardless of ANSI or UTF8-mode, but as I don’t like typing very much you can now use BLen instead.
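
To make the difference concrete, here is a small sketch in Python (not Basic+) of why a character count needs a scan while a byte count does not. The helper names are hypothetical and this is not how the Basic+ engine is implemented; it simply relies on the fact that in UTF-8 every character (code point) has exactly one byte that is not a continuation byte of the form 10xxxxxx.

```python
def byte_len(data: bytes) -> int:
    # Analogous to reading the cached byte count: no scan of the contents.
    return len(data)


def char_len_utf8(data: bytes) -> int:
    # Counting characters means visiting every byte and skipping the
    # continuation bytes (those whose top two bits are 10).
    return sum(1 for b in data if (b & 0xC0) != 0x80)


if __name__ == "__main__":
    data = "Ærøskøbing".encode("utf-8")
    print(byte_len(data), char_len_utf8(data))

    # For a simple "is there any data?" test the two always agree,
    # so the cheaper byte count is sufficient.
    for sample in (b"", data):
        assert (byte_len(sample) > 0) == (char_len_utf8(sample) > 0)
```

Both counts are zero only for an empty value, which is why a null check never needs the more expensive character count.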

If you’re interested in writing UTF8-safe code and you’re not familiar with the Basic+ binary functions, you can find more details on them in a series of posts I wrote a few years ago on the Sprezzatura blog.  You may also want to check out the Internationalization section in the OI Coding Standards document too for some more UTF8-mode hints and tips.

(Disclaimer: This article is based on preliminary information and may be subject to change in the final release version of OpenInsight 10).

Object Notation Syntax

The OpenInsight event compiler supports an enhanced “shorthand” syntax for working with the Presentation Server object model, much like that provided in standard Basic+ for use with OLE objects.  Like the OLE notation, this provides a more natural API to working with properties and methods, rather than the relatively verbose and flat interface provided by the familiar Get/Set_Property and Exec_Method functions.

In a nutshell, object notation provides the use of a special “->” operator to allow an object to refer to its properties and methods, along with the “{}” operators to specify object or property indexes. It may be used in place of the following function calls:

  • Get_Property
  • Set_Property_Only
  • Exec_Method

Using Properties

The general format for accessing properties via object notation is illustrated below. In all cases objects that support sub-objects (such as controls that support an IMAGE sub-object) may reference the sub-object by suffixing them to the main object with a “.” character as a delimiter.

Get_Property syntax

  value = object->property                        ; // Non-Indexed
  value = object{index}->property                 ; // Object-Indexed
  value = object->property{index}                 ; // Property-Indexed

  // With sub-object support
  value = object.subObject->property              ; // Non-Indexed
  value = object.subObject{index}->property       ; // Object-Indexed
  value = object.subObject->property{index}       ; // Property-Indexed

Set_Property_Only syntax

  object->property = value                        ; // Non-Indexed
  object{index}->property = value                 ; // Object-Indexed
  object->property{index} = value                 ; // Property-Indexed

  // With sub-object support
  object.subObject->property = value              ; // Non-Indexed
  object.subObject{index}->property = value       ; // Object-Indexed
  object.subObject->property{index} = value       ; // Property-Indexed

Where:

  • object is either:
    1. An equated constant (suffixed with a “$” symbol), or
    2. The contents of a variable (prefixed with an “@” symbol), or
    3. An embedded name (prefixed with the “$” symbol), or
    4. A path prefix (prefixed with the “.” symbol) that represents the name of the object’s parent window (i.e. “@Window”)
  • property can be an equated constant, the contents of a variable, or an embedded name. It may also be the special token “@@” which means use the DEFPROP property.
  • index is either a one- or two-dimensional index value surrounded by curly braces; the two dimensions are delimited by a “,” character.

Get_Property examples

 // Get_Property object notation using variable contents
 CtrlID = @Window : ".MY_LISTBOX"
 
 // PropVal = Get_Property( CtrlID, "TEXT" )
 PropVal = @CtrlID->Text
 
 // PropVal = Get_Property( CtrlID, "LIST", 4 )
 PropVal = @CtrlID->List{4}
 
 // PropVal = Get_Property( CtrlID, "LIST", ItemIdx )
 ItemIdx = Get_Some_Index()
 PropVal = @CtrlID->List{ItemIdx}
 
 // PropVal = Get_Property( CtrlID, "DEFPROP" )
 PropVal = @CtrlID->@@
 
 // PropVal = Get_Property( @Window, "TEXT" )
 PropVal = @@Window->Text
 
 EdtID   = @Window : ".MY_EDITTABLE"; Col = 2; Row = 3
 
 // PropVal = Get_Property( EdtID : ".CELLS", "TEXT", Col : @fm : Row )
 PropVal = @EdtID.Cells{Col,Row}->Text
 
 // Get_Property object notation using a path-prefix 
 
 // PropVal = Get_Property( @Window : ".MY_LISTBOX", "TEXT" )
 PropVal = .My_ListBox->Text
 
 // PropVal = Get_Property( @Window : ".MY_LISTBOX", "LIST", 4 )
 PropVal = .My_ListBox->List{4}
 
 // PropVal = Get_Property( @Window : ".MY_EDITTABLE.CELLS", "TEXT", 2 : @fm : 3 )
 PropVal = .My_EditTable.Cells{2,3}->Text
 
 // Get_Property object notation using equated constants  
 Equ CTRLID$ To "MYWIN.MY_LISTBOX"
 
 // PropVal = Get_Property( CTRLID$, "TEXT" )
 PropVal = CTRLID$->Text
 
 // PropVal = Get_Property( CTRLID$, "LIST", 4 ) 
 PropVal = CTRLID$->List{4}
 
 Equ EDTID$ To "MYWIN.MY_EDITTABLE"
 
 // PropVal = Get_Property( EDTID$ : ".CELLS", "TEXT", 2 : @fm : 3 )
 PropVal = EDTID$.Cells{2,3}->Text
 
 // Get_Property object notation using an embedded name
 
 // FocusID = Get_Property( "SYSTEM", "FOCUS" )
 FocusID = $System->Focus 

 // PropVal = Get_Property( "MYWIN.MY_CONTROL", "TEXT" )
 PropVal = $MyWin.My_Control->Text

Set_Property_Only examples

 // Set_Property_Only object notation using variable contents
 CtrlID = @Window : ".MY_LISTBOX"
 
 // Call Set_Property_Only( CtrlID, "TEXT", PropVal )
 @CtrlID->Text = PropVal
 
 // Call Set_Property_Only( CtrlID, "LIST", PropVal, 4 )
 @CtrlID->List{4} = PropVal
 
 // Call Set_Property_Only( CtrlID, "DEFPROP", PropVal )
 @CtrlID->@@ = PropVal
 
 // Call Set_Property_Only( @Window, "TEXT", PropVal )
 @@Window->Text = PropVal
 
 EdtID = @Window : ".MY_EDITTABLE"; Col = 2; Row = 3
 
 // Call Set_Property_Only( EdtID : ".CELLS", "TEXT", PropVal, Col : @fm : Row )
 @EdtID.Cells{Col,Row}->Text = PropVal
 
 // Set_Property_Only object notation using an embedded name
 
 // Call Set_Property_Only( "SYSTEM", "FOCUS", focusID )
 $System->Focus = FocusID
 
 // Call Set_Property_Only( "MYWIN.MY_CONTROL", "TEXT", PropVal )
 $MyWin.My_Control->Text = PropVal
 
 // Set_Property_Only object notation using a path-prefix 
 
 // Call Set_Property_Only( @Window : ".MY_LISTBOX", "TEXT", PropVal )
 .My_ListBox->Text  = PropVal
 
 // Call Set_Property_Only( @Window : ".MY_LISTBOX", "LIST", PropVal, 4 )
 .My_ListBox->List{4} = PropVal
 
 // Call Set_Property_Only( @Window : ".MY_EDITTABLE.CELLS", "TEXT", PropVal, |
 //                         2 : @fm : 3 )
 .My_EditTable.Cells{2,3}->Text = PropVal
 
 // Set_Property_Only object notation using equated constants
 Equ CTRLID$ To "MYWIN.MY_LISTBOX"
 
 // Call Set_Property_Only( CTRLID$, "TEXT", PropVal )
 CTRLID$->Text  = PropVal
 
 // Call Set_Property_Only( CTRLID$, "LIST", PropVal, 4 ) 
 CTRLID$->List{4} = PropVal
 
 Equ EDTID$ To "MYWIN.MY_EDITTABLE"
 
 // Call Set_Property_Only( EDTID$ : ".CELLS", "TEXT", PropVal, 2 : @fm : 3 )
 EDTID$.Cells{2,3}->Text = PropVal

Using Methods

The general format of the Exec_Method object notation is described below.  It may be used to execute the method as a subroutine (i.e. no return value) or as a function.

Exec_Method syntax

  object->method( arg1, arg2, … argN )            ; // Call as subroutine
  result = object->method( arg1, arg2, … argN )   ; // Call as function

Where:

  • object is either:
    1. An equated constant (suffixed with a “$” symbol), or
    2. The contents of a variable (prefixed with an “@” symbol), or
    3. An embedded name (prefixed with the “$” symbol), or
    4. A path prefix (prefixed with the “.” symbol) that represents the name of the object’s parent window (i.e. “@Window”)
  • method can be an equated constant, the contents of a variable, or an embedded name.

Exec_Method examples

 // Exec_Method object notation using variable contents
 CtrlID = @Window : ".MY_LISTBOX"
 
 // Pos = Exec_Method( CtrlID, "INSERT", -1, Item )
 Pos = @CtrlID->Insert( -1, Item )
 
 // Call Exec_Method( CtrlID, "DELETE", 4 )
 @CtrlID->Delete( 4 )
 
 // Exec_Method object notation using a path-prefix 
 
 // Pos = Exec_Method( @Window : ".MY_LISTBOX", "INSERT", -1, Item )
 Pos = .My_ListBox->Insert( -1, Item )
 
 // Call Exec_Method( @Window : ".MY_LISTBOX", "DELETE", 4 )
 .My_ListBox->Delete( 4 )

 // Call Exec_Method( @Window : ".MY_EDITTABLE.ROWS", "APPEND", RowData )
 .My_EditTable.Rows->Append( RowData )
 
 // Exec_Method object notation using equated constants
 Equ CTRLID$ To "MYWIN.MY_LISTBOX"
 
 // Pos = Exec_Method( CTRLID$, "INSERT", -1, Item )
 Pos = CTRLID$->Insert( -1, Item )
 
 // Call Exec_Method( CTRLID$, "DELETE", 4 ) 
 CTRLID$->Delete( 4 )
 
 // Exec_Method object notation using an embedded name
 
 // RetVal = Exec_Method( "SYSTEM", "CREATE", createStruct )
 RetVal = $System->Create( createStruct )
 
 // Call Exec_Method( "SYSTEM", "DESTROY", ctrlID )
 $System->Destroy( ctrlID )

Using Object Notation in Stored Procedures

Object Notation was originally designed for use with the event compiler, and therefore prior to version 10 could only be used with event scripts.  In the current version, however, it may be used in Stored Procedures by including the event pre-compiler in the compilation chain.  This is done by adding the following at the top of the program before the other statements:

  #Pragma PreComp Event_PreComp

You should also declare the following functions before you use any object notation – the pre-compiler does not insert these into the program itself:

  • Get_Property
  • Exec_Method

E.g.

 Compile Function MyWin_Events( CtrlEntID, Event, Param1, Param2 )
 
   #Pragma PreComp Event_PreComp
   
   Declare Function Get_Property, Exec_Method
   $Insert Logical
   
   Locate Event In "CREATE,CLICK,CLOSE" Using "," Setting Pos Then
      On Pos GoSub OnCreate,OnClick,OnClose
   End
   
 Return RetVal 

   // ... etc ... 

Unlike in previous versions, this object notation may also be used safely with OLE object notation in the same Stored Procedure.

Object Notation limitations

The current version of Object Notation is handled by a pre-compiler, rather than the actual Basic+ compiler itself, thus its parsing accuracy is somewhat limited in comparison.  Because of this, the following guidelines should be adhered to:

  1. The passing of complex expressions to the object notation Set_Property_Only and Exec_Method statements should be avoided; it is better to resolve them to a variable first, and then pass that variable as an argument instead.
  2. The curly-brace Calculate operators (“{” and “}”) are also used to resolve the value of a dictionary column at runtime, and should not be used on the same line as an object notation statement: these operators are interpreted as object or property index tokens instead, and will lead to parsing errors if used incorrectly.

Object Notation troubleshooting

Behind the scenes the pre-compiler converts the object notation syntax to actual Get_Property, Set_Property_Only and Exec_Method calls before passing them to the Basic+ compiler.  If you use object notation and run into problems that you cannot resolve easily you can see exactly what gets passed to the compiler by using the Output compiler directive, which will write the pre-compiler output to a specified record.

To enable this functionality, place the following statement at the top of your program (before or after the Event_PreComp statement), and replace <table> and <record> with the table and record names of your choice:

   #Pragma Output <table> <record>

E.g. send the output to the PRECMP_OUT record in SYSLISTS:

   #Pragma Output SYSLISTS PRECMP_OUT
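
To give a feel for the kind of rewriting you would expect to see in that output, here is a toy translator sketched in Python (not Basic+). It is emphatically not the real Event_PreComp pre-compiler: the names GET_FORM, object_expr and translate_get are hypothetical, and it only handles the simplest Get_Property forms (non-indexed and property-indexed), based purely on the examples shown earlier. Object-indexed forms and the “@@” DEFPROP token are rejected.

```python
import re

# <lhs> = <object>-><property>, with an optional {index} on the property.
GET_FORM = re.compile(
    r"^\s*(?P<lhs>\w+)\s*=\s*(?P<obj>\S+?)->(?P<prop>\w+)"
    r"(?:\{(?P<idx>[^}]*)\})?\s*$"
)


def object_expr(token):
    """Turn an object token into the expression passed as the object ID."""
    if "{" in token:
        raise ValueError("object-indexed forms are not handled: " + token)
    if token.startswith("@"):
        return token[1:]                          # contents of a variable
    if token.endswith("$"):
        return token                              # equated constant
    if token.startswith("$"):
        return '"%s"' % token[1:].upper()         # embedded name
    if token.startswith("."):
        return '@Window : "%s"' % token.upper()   # path prefix
    raise ValueError("unrecognised object token: " + token)


def translate_get(line):
    """Rewrite one simple Get_Property object notation line as a function call."""
    m = GET_FORM.match(line)
    if m is None:
        raise ValueError("not a simple Get_Property form: " + line)
    args = [object_expr(m.group("obj")), '"%s"' % m.group("prop").upper()]
    if m.group("idx") is not None:
        # A two-dimensional index is joined with @fm, as in the examples.
        parts = [p.strip() for p in m.group("idx").split(",")]
        args.append(" : @fm : ".join(parts))
    return "%s = Get_Property( %s )" % (m.group("lhs"), ", ".join(args))


if __name__ == "__main__":
    for src in ("PropVal = @CtrlID->Text",
                "PropVal = .My_ListBox->List{4}",
                "FocusID = $System->Focus"):
        print(translate_get(src))
```

Comparing the real pre-compiler output against what you expected, line by line in this fashion, is usually the quickest way to spot a statement that was parsed differently from how you intended.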

(Disclaimer: This article is based on preliminary information and may be subject to change in the final release version of OpenInsight 10).