Index fundamentals

Question: I would just like to understand what the difference is between DBFNSX and SIXCDX, please. Aren't they the same thing?

Answer: I have described it a few times in the past.

Let's forget about SIXCDX; it's a slightly modified DBFCDX and not worth talking about.

DBT, FPT and SMT are different MEMO formats. All of them are supported by Harbour and automatically recognized when a DBF file is opened. NTX, CDX and NSX are different index formats. They can be used in any combination with the MEMO formats, e.g. DBFCDX works perfectly well with DBT memo files, just as it does with FPT and SMT ones.
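To make the "any combination" point concrete, here is a minimal sketch that pairs a CDX index with a DBT memo file. It assumes the RDDI_MEMOTYPE setting and the DB_MEMO_DBT constant from dbinfo.ch; the table name "notes" and its fields are made up for the illustration.

```harbour
#include "dbinfo.ch"

REQUEST DBFCDX

PROCEDURE Main()

   // Ask DBFCDX to create DBT-style memo files for new tables
   // (assumption: without this it would use its own default memo type).
   rddInfo( RDDI_MEMOTYPE, DB_MEMO_DBT, "DBFCDX" )

   // "notes", TITLE and BODY are hypothetical names for this sketch.
   dbCreate( "notes", { { "TITLE", "C", 30, 0 }, ;
                        { "BODY",  "M", 10, 0 } }, "DBFCDX" )
   USE notes NEW EXCLUSIVE VIA "DBFCDX"

   // The index is still a CDX file, independent of the memo format.
   INDEX ON field->TITLE TAG title TO notes

   dbCloseArea()
   RETURN
```

When an existing table is opened, no such setting should be needed, since the memo format is recognized automatically as described above.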

In Harbour all core RDDs using the above index formats (DBFNTX, DBFCDX, DBFNSX) have nearly the same functionality, which covers nearly all index features known in the xbase world. Many of these features are unique to [x]Harbour, so they are not supported by other drivers.

With all of the above RDDs the user can use all ord*(), db*(), sx_*(), hsx_*(), … functions, create multi-tag indexes (many orders in a single file, also for the NTX format), and use autoorder, autoopen, production indexes, etc., so for the programmer the choice of RDD should not make any difference. It's also possible to enable or disable some features through the RDDI_* interface; e.g. this code changes the default DBFNTX behavior so that it behaves just like DBFCDX and even uses ".cdx" as the default file extension (of course internally it's still the NTX format with Harbour extensions; we support the CTX format from CLIP):

 // the RDDI_* constants are defined in dbinfo.ch
 #include "dbinfo.ch"
 // default index extension
 rddInfo( RDDI_ORDBAGEXT, ".cdx", "DBFNTX" )
 // support multiple tags in a single index file
 rddInfo( RDDI_MULTITAG , .t. , "DBFNTX" )
 // structural index support
 rddInfo( RDDI_STRUCTORD, .t. , "DBFNTX" )
 // record number is a hidden trailing part of the key during sorting
 rddInfo( RDDI_SORTRECNO, .t. , "DBFNTX" )
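As a usage sketch of the multi-tag setting, the program below creates two tags aimed at the same index bag with DBFNTX and then lists the open orders. The table "people" and its fields are hypothetical, and I have not verified the resulting file layout; the expectation, based on the description above, is that both tags end up in a single index file.

```harbour
#include "dbinfo.ch"

REQUEST DBFNTX

PROCEDURE Main()

   LOCAL n

   // Same switch as in the snippet above: allow many tags per index file.
   rddInfo( RDDI_MULTITAG, .T., "DBFNTX" )

   // "people", FNAME and LNAME are made-up names for this sketch.
   dbCreate( "people", { { "FNAME", "C", 40, 0 }, ;
                         { "LNAME", "C", 40, 0 } }, "DBFNTX" )
   USE people NEW EXCLUSIVE VIA "DBFNTX"

   // Two tags directed at the same index bag.
   INDEX ON field->LNAME TAG lname TO people
   INDEX ON field->FNAME TAG fname TO people

   // List the orders that are open now and the bag each one lives in.
   FOR n := 1 TO ordCount()
      ? n, ordName( n ), ordBagName( n )
   NEXT

   dbCloseArea()
   RETURN
```

Replacing "DBFNTX" with "DBFCDX" or "DBFNSX" throughout should leave the rest of the program unchanged, which is the point made above about the RDD choice being transparent to the programmer.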

There are only a few minor exceptions, unique to particular RDDs and rather unimportant for most users. The two most important are:

1. I implemented dynamic unique indexes only in DBFCDX. It means that ordUnique( ,, .t. ) can enable/disable unique mode only in DBFCDX (and SIXCDX).

2. Only in DBFNTX and DBFNSX have I implemented a special mode which allows page numbers to be used instead of file offsets in index pages. In this mode indexes are not binary compatible with other languages, but their maximum size is greatly extended: for NTX and NSX files it is 2^32 * index_page_size, which gives 2^42 bytes for the default 1024-byte pages in these formats; 2^42 bytes is 4TB. I haven't implemented it in DBFCDX so far, and for this format the maximum index size is still 4GB. Maybe in some spare time I'll do that and also add support for different page sizes. BTW in ADS, .adi indexes are slightly modified CDX files where page numbers are used instead of offsets and the index page size can be changed.
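The size limits in point 2 follow from simple arithmetic, which can be checked with a few lines of Harbour. This is only a sketch of the calculation, assuming a 32-bit field holds either a file offset or a page number; the variable names are made up.

```harbour
PROCEDURE Main()

   LOCAL nSlots   := 2 ^ 32   // distinct values of a 32-bit field
   LOCAL nPageLen := 1024     // default page size in bytes, as stated above

   // File-offset mode: the field addresses bytes directly.
   ? "offset mode, max bytes:     ", hb_ntos( Int( nSlots ) )

   // Page-number mode: the field addresses whole pages.
   ? "page-number mode, max bytes:", hb_ntos( Int( nSlots * nPageLen ) )

   RETURN
```

The first figure is 2^32 bytes (4GB) and the second is 2^32 * 2^10 = 2^42 bytes (4TB), matching the limits quoted above.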

Of course there are very serious differences in the low-level implementation and the structures used by these formats.

NTX is a simple B-tree without any compression. The operations are extremely fast, but indexes are much bigger than in CDX or NSX format, so performance is usually strongly reduced by the cost of IO operations. Still, in theory, for a very powerful server application with a lot of RAM, where all data is accessed from memory rather than from hard disks, this is the best choice.

CDX and NSX compress leaf nodes, so the total size of index files is much smaller than in NTX format. NSX uses a simple B-tree, whereas CDX uses a tree of the most significant keys. This means that an update in a CDX file can be more expensive than in NSX format, especially when we add keys that sort last (the most common situation, e.g. when adding records with the current date), because all nodes from leaf to root have to be updated. All keys are repeated in leaf nodes and there are internal bindings between nodes on the same level (BTW some RDDs like SIX3 do not update them correctly for interior nodes), so the CDX format is also a little bit redundant. On the other hand, because all keys are present in the leaf nodes and all leaves are linked to each other, skipping can be a little bit faster, etc.

Page size in CDX is smaller than in NSX, so for very long keys, e.g. over 100 bytes, the NSX format should be much more efficient. It also uses a different compression method, which should be better for keys with long runs of spaces inside, e.g. from concatenating a few longer fields like: FNAME[40] + LNAME[40]

In general this is too big a subject to describe here in a few words.

In include/hbrddnsx.h I wrote a short description of the NSX format while I was implementing it.

best regards,
Przemek
Source : https://groups.google.com/d/msg/harbour-devel/9nT9lZmtztk/Q3X-s81UpYYJ
