Black Cipher Box



A cryptographic framework for web applications and file automations.


Project changelog

@may 2022; the Black Cipher Box engine has been fully integrated into our development framework and has started trickling into staging environments. Work is now proceeding on extending the functionality to file cipher constructs.

@april 2022; beta-testing and documentation revisions. Polishing the details of the implementation as we test and discover minor hiccups. The engine performs relatively well for ciphering/deciphering (10x slower than our previous engine, but that was entirely expected); we're now testing the hashing functions.

@march 2022; beta version, with tidbits already making it into production.

Rationale behind this project

This implementation is a prototype meant to solve a couple of problems that I find troubling for proper crypto adoption within the enterprise. It attempts to attack head-on the issue of key handling for programmers.

It abstracts the key-handling concerns away from the programmer's tasks and moves them to the backend, while offering application-level cryptographic services.

It's extremely different from the typical cloud-based key services: rather than exploiting hardware-level ciphering to simplify programmers' lives, this engine assumes the infrastructure can never be secured against exploiters. By ciphering at the application layer we also gain the benefit of;

  1. handling key material at the application level (no need to talk to BIOS)
  2. permitting the secure transfer of ciphered content, in its original ciphered form, along with the keys
  3. delegating the security "seed" where it belongs, with security technicians and automated processes alike

I found the prototype so sweet in its simplicity that I decided to integrate it in a flash.

Key management

Keys should typically have the following attributes in an enterprise;

  • be transferable to another employee on a shift change
  • be recoverable by the IT
  • not cause service delays or issues due to key handling
  • be upgradable, either through parallel scripts or in-engine algorithm & key selection mechanisms
  • support different types of constructs (secret, public, anonymous, multi-key sigs & ciphers, time locks, etc...)
  • be relatively agile in their devops cycle, meaning that if you find an exploit it shouldn't cost your team 3 months to upgrade.

A mechanism to forge & distribute keys

I think it's essential to consider a strong mechanism to secure the creation and distribution of keys.

Furthermore, key revocations are an essential element of a key management infrastructure. -Delete does not exist!-

Keeping keys apart from the ciphered data

As a basic recommendation found in the many books on cryptography, we should never keep the keys right next to the data. That makes sense. But what a chore this can translate into for a programmer! :O

It is therefore essential to kick the right motions into play: start considering the quantity of keys that are really required (so many it makes most programmers dizzy), and accept that as programmers, we have to manage Key Rings, or Wallets of Keys, from the start. To this effect, I decided to transfer some of that complexity into the class objects themselves. (And this becomes another reason for the Black Box reference in the name.)

In this paper, our approach involves ciphering the keys for storage, albeit in the same database. A master password is used to cipher those keys, and that password remains at the application level for now. We therefore comply with this requirement by storing Passwords and Keys in different locations, in different formats, and with different accessibility.

Later on, it will be possible to pass the storage of the Master Password to a network-based Key service, with its own ACL filtering.

Wallet of Keys

So the idea is to have a cryptographic engine handle a Wallet of Keys internally, and expose a pair of methods for loading and unloading keys from the protected Wallet. Programmers shouldn't have to handle the keys or the ciphering constructs themselves; this task is transferred to the class, which takes care of;

  1. implementing a Key Identification process in the key generation procedures
  2. storing and indexing Keys based on this identification in an easy-to-manage location
  3. figuring out the appropriate class objects to instantiate for the loading of keys (contextually)
  4. annexing the key identifier in the cipher/decipher methods when handling ciphered content; the decipher method can thus determine which key to load, and load it just in time for the work to be done, in a secure fashion
  5. maintaining the keys in a hardened memory location during usage, until destruction

Key Identifiers

So, in order to identify keys in ciphered messages (which may be relatively short and many in some environments like a database), we opt for a systematically short identifier. We could make it shorter by a couple of bytes at this point, but I want to be on the safe side when it comes to file crypto and magic bytes.

Technically, each ciphered blob will contain this extra key identifier (and another layer of identifying data as we wrap everything up), so we have to make it count.

The construct is as follows: Key-class-id.Key-Ident

Key-class-id is a 4-byte construct that should appear in the following table (this could change, we're still prototyping; remember that note about files and magic bytes.)

name | hex class identifier | details
"Sodium Keys" (previous key constructs) | \x05\x01\xdb\x1a | Implemented in previous web apps, and mentioned here because we need to upgrade from those. Our efforts therefore include this particular key.
BoxedKey | \x01\x01\xbb\x10 | Symmetric keys protected by a password. These passwords are to be kept in the include/conf-*.php files of the web apps. As it stands: lose the password, lose the keys associated with it. These function with our old constructs, and with the more recent ones based on symmetric crypto boxes ( ref: https://libsodium.gitbook.io/doc/secret-key_cryptography/encrypted-messages ), for database and variable storage.
PrivateKey | \x01\x01\xbb\x40 | Asymmetric keys protected by a password; these are Private/Public key pairs. The same password rules apply. These keys are used by the CryptoBox ( ref: https://libsodium.gitbook.io/doc/public-key_cryptography/authenticated_encryption ) and CryptoBoxSeal ( ref: https://libsodium.gitbook.io/doc/public-key_cryptography/sealed_boxes ) constructs, for private messaging and our "black box" data collectors respectively.
PGP keys | \x10\x10\xVersion\xMinor | It is possible to handle third-party keys in our engine, by developing an extended class and wrapping the desired key system with our own. The result is a ciphered box containing a third-party key (ciphered on their side or not, we don't really care).
SSH keys | \x10\x20\xVersion\xMinor | idem
IKE keys | \x10\x30\xVersion\xMinor... | idem
3rd-party certs | \x10\x.. | to be determined; idem
FileStreamKey | \x20\x01\xbb\x01 | Symmetric keys associated with file objects. Implemented from the Sodium secretstream constructs using the XChaCha20-Poly1305 algorithm.
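To illustrate how the class-id table could drive dispatch, here is a minimal sketch. The engine itself is PHP; this Python version, including the registry and function names, is purely hypothetical:

```python
# Hypothetical registry mirroring the table above: 4-byte class IDs mapped to key
# class names. In the real engine, the class id selects which Key class to instantiate.
KEY_CLASS_REGISTRY = {
    b"\x05\x01\xdb\x1a": "SodiumKeys (legacy)",
    b"\x01\x01\xbb\x10": "BoxedKey",
    b"\x01\x01\xbb\x40": "PrivateKey",
    b"\x20\x01\xbb\x01": "FileStreamKey",
}

def parse_key_tag(tag: bytes):
    """Split a Key-class-id.Key-Ident tag and resolve the class to instantiate."""
    class_id, key_ident = tag[:4], tag[4:]
    if class_id not in KEY_CLASS_REGISTRY:
        raise ValueError("unknown key class id: %r" % class_id)
    return KEY_CLASS_REGISTRY[class_id], key_ident
```

The fixed 4-byte prefix is what keeps the parse linear: no delimiter scanning is needed before the engine knows which key class it is dealing with.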



Key-Ident should therefore be a fixed size for the corresponding identifier category. In practice we also note that the crypto engine version (the wrapped methodology handling the cipher constructs that follow) really dictates how the Keys are integrated and managed at the application level. In our research, we found we can more easily solve and program an application's cryptographic framework by containing the complexities between storage, key handling and UX interfacing, provided the application's framework is built on said cryptographic framework, so that the programmer doesn't need to pre-determine the keys to use.

Key constructs

(new section, May 23rd 2022)

Keys are therefore compiled, or constructed, in a pre-determined fashion which we lay out in the following way:

Key-Class-ID | Key-Identifier | NONCE | ciphered-key-content | hash

segment | length | notes
Key-Class-ID | 4 bytes | detailed in the section above
Key-Identifier | originally 4 bytes, now 20-32 bytes? | see our discussion concerning Key-Identifier revisions below
NONCE | 24 bytes, algorithm dependent | 24 bytes for SODIUM_CRYPTO_SECRETBOX_NONCEBYTES
ciphered-key-content | variable | the actual key material, ciphered with the Master Password and NONCE
Hash | 32 bytes | sha256 hash of Key-Class-ID|Key-Identifier|NONCE|ciphered-key-content
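The layout above can be exercised with a small sketch. Python is used for illustration only (the engine is PHP), and the function names are assumptions:

```python
import hashlib

NONCE_LEN = 24  # SODIUM_CRYPTO_SECRETBOX_NONCEBYTES
HASH_LEN = 32   # sha256 digests are 32 bytes

def pack_key_construct(class_id: bytes, key_ident: bytes, nonce: bytes,
                       ciphered_key: bytes) -> bytes:
    """Assemble Key-Class-ID | Key-Identifier | NONCE | ciphered-key-content | hash."""
    assert len(class_id) == 4 and len(nonce) == NONCE_LEN
    body = class_id + key_ident + nonce + ciphered_key
    return body + hashlib.sha256(body).digest()

def unpack_key_construct(blob: bytes, ident_len: int):
    """Split the construct apart, verifying the trailing sha256 against corruption."""
    body, digest = blob[:-HASH_LEN], blob[-HASH_LEN:]
    if hashlib.sha256(body).digest() != digest:
        raise ValueError("key construct failed its integrity check")
    class_id = body[:4]
    key_ident = body[4:4 + ident_len]
    nonce = body[4 + ident_len:4 + ident_len + NONCE_LEN]
    return class_id, key_ident, nonce, body[4 + ident_len + NONCE_LEN:]
```

Note that the trailing hash is an integrity check only, not authentication, which is exactly the limitation discussed below.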

We implement our own Hash on the construct so as to provide a minimum of integrity checking for our key structure. I realize that it might not be optimal contextually, but it serves to provide a modicum of validation when passing, storing and retrieving keys. A minimal protection against simple corruption, really.

To implement a proper HMAC, we would need to use a keyed hash method instead. But tagging our keys with an HMAC forces us to introduce portability in the HMAC keys as well if we want to be able to port keys between different applications. At the very minimum, the keys used to calculate the HMAC would need to be public keys that can be shared in our engine as well, putting a different burden on our master passwords.

The Key-Identifier requires a bit more investigation. After some beta testing in different sites (sites that need to share keys between themselves), I realized that the serial generation of the Identifier field is a source of portability problems: invariably, starting our identifier series at 1 generates conflicts when merging keys from different application systems. For this reason I'm considering using the sha-224 hash of the deciphered key data as the identifier. It results in a 224-bit string though, getting a bit long for data fields in a database (28 bytes in binary form, 56 characters in hex form).
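A content-derived identifier along those lines could look like this (Python sketch, hypothetical function name):

```python
import hashlib

def content_derived_ident(key_material: bytes) -> bytes:
    """Derive the Key-Identifier from the deciphered key bytes, so that two
    applications generating identifiers independently cannot collide on serials."""
    return hashlib.sha224(key_material).digest()  # 28 bytes in binary form
```

Because the identifier is a pure function of the key material, merging wallets from two application systems can no longer produce two different keys claiming the same serial.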

Lately I've also been considering adding a reference to the Master-Password in the key constructs, perhaps in the form of its hash. The usefulness would be the ability to identify the Master-Password, if one knows the original password itself. But... since it looks like the Master-Password might need to be looked up and retrieved in order to implement proper key portability, I'm reconsidering the entire concept of Master-Passwording at this point.

Key storage

In our current implementation we're relying on a simple relational database to store and manage our Keys. I've considered different scenarios: a NoSQL database, a Message-Queue-derived REST framework, implementing a network-based Key Management System, and the cloud KMS solutions. But for our practical purposes in the office, the relational database simply makes more sense, as it doesn't add any considerations to our current resiliency plans (see Backups and Redundancy). (Yes, I'm a bit biased towards SQL myself. It's quite useful when you have a good grasp of your query optimizations, and we can manage a central cluster of databases quite easily.) Plus, extending beyond SQL as needs require is a very simple operation.

Keys are compiled in a "safe" format within the Key class object itself (a method of the parent Key class which shouldn't need to be overridden) and saved to the database by the BlackCipherBox engine in a base64_encoded() format. The BlackCipherBox engine also extends the storage with its own annotations, a Key ID derived from the SHA3-512 hash of the entire key is used as primary key, and a serial Key IDENT is generated from the database and used during key generation.
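The storage annotations described above might be sketched like so (Python for illustration; the column names are assumptions, not the engine's actual schema):

```python
import base64
import hashlib

def storage_record(compiled_key: bytes) -> dict:
    """Sketch of how the engine could annotate a compiled key for SQL storage:
    a SHA3-512 hash of the whole key as primary key, base64 for the payload."""
    return {
        "key_id": hashlib.sha3_512(compiled_key).hexdigest(),        # primary key
        "key_data": base64.b64encode(compiled_key).decode("ascii"),  # "safe" format
    }
```

Hashing the full compiled key for the primary key has a nice property: re-importing the same key into another application's database produces the same row identity, which helps with the portability concerns discussed below.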

The underlying compiled key is of course encrypted using an application level password. This "Master Password" is currently maintained in clear text format in the web application's configuration file.

Keys remain somewhat portable: they can be migrated from one web application (in our current implementation form) to another as long as the corresponding Master Password is maintained. Currently we haven't programmed the facilities to supply a Master Password in our integration, as it is part of our global configurations. For the time being, suffice it to remember this detail and make sure web applications sharing their keys also share their Master Password. I foresee programming a facility for exporting Wallets of Keys while assigning a new Master Password and re-ciphering the Keys. Easy peasy, normally.

Master Password

Here lies the crux of the security hierarchy. I've decided to use the config variable approach to storing the password in our web application frameworks because in general we don't have to worry too much about exploits reaching these configuration files. But this is very specific to our environment, its protection and its avoidance of public access. There's a whole philosophical discussion to be had on the subject as well.

Considering that we're contextually working with web requests, and that our application frameworks are angled towards REST interfaces (for the convenience of the administrative and programming tasks), an obvious solution would be to converge on the usage of a network-based KMS. But here's the rub: implementing a (packaged) KMS system leaves us with very few options, and the options we could approach from an open-source perspective are quasi-nil.

Because we're facing some serious legal and contractual constraints at the enterprise level.

So, point #1: I feel it would be a mistake to store this master password anywhere ELSE than our internal systems at the current time. We still have to figure out the legal complexities on paper. The choice of a solution would also constrain our development processes.

The other issue with all these cryptographic constructs is that their usage is relatively expensive in terms of web requests. Considering that an average request takes around 200 to 300ms to generate on the server, the cryptographic layer adds an additional 100ms as it is right now. Basically it represents a good third of a request's processing.

Having the Master Password on the network, gated behind its own ACL logic, would add another 100-150ms to a request's overhead. (If we were to deploy a simple non-optimized prototype, that is. It could be optimized up the wazoo, into the 10-50ms range, using speedy or dedicated hardware.)

But comparing a 150ms network call to a sub-2ms variable declaration is where we realize there will be plenty of room for optimization. :)

Branded or packaged KMS solutions

Searching and scouring for adequate key management solutions, for some reason, is quite arduous. The subject matter has been greatly distorted over the years: snake-oil merchants fester in search results, and security companies overload us with white papers that fail to detail "how" their key management solutions can integrate with an enterprise's services. And if we look at the other side of the equation, attempting to surmise what our deployed services CAN use in the commercial realm, all we find in the common-denominator section is AWS KMS.

The issue with AWS KMS is that implementing it is actually a lead-developer job. The lead must then mentor the rest of the team on how to use the AWS engine, code requires adaptation, and one must somehow implement the AWS API in internal libraries. That's a 2-year+ integration off the bat.

In our corporate environment, with our security-hardening constraints, developers are required to stay on top of security issues contained in their managed code. Adding an AWS API client to the mix is a project in itself, and a project with proprietary flavors acting as a disruption on internal development. Eventually the developers become biased towards AWS and start showing up at my desk with prototypes built on the cloud. Which, contractually, we cannot support.

Key matters are crucial matters for the security architecture. And the security architecture is now directly shaped by the legal context coming at us from all sides.

It would be senseless to lose track of how keys are generated, stored and managed in the cloud when our contractual obligations mandate us to fully grasp our cryptographic approaches (all the more if they further impose geographical constraints).

HSM and TPMs

I shall briefly draw your attention to these technologies, as to me they represent an improvement over using a network-based KMS solution. HSMs (Hardware Security Modules) and TPMs (Trusted Platform Modules) normally sit inside the hardware box. TPMs are capable of generating new keys from a base seed, at the hardware level, and of providing said keys to the BIOS of the system. HSMs and TPMs are logical equivalents, from a programmer's perspective, but HSMs present something of a quirk for the desperate: they can be constructed using virtual machines. (Not a NIST recommendation, but a usable fallback in the absence of physical hardware for the task, according to the NIST.) Personally I prefer to dive into TPMs because they are supported by our hardware park, and our preferred OSes.

The only gotcha I got with these is that they contain a seed key which can be hard to overwrite. Normally the seed key on these devices should be randomly generated on first usage, and our system should figure a way to poll that information (obtaining its public key for example) to knit it into the security hierarchy.

HSM technology should also have a facility for writing a seed key coming from our central management system in order to derive local keys which can then be validated against the centrally-generated seed key. This would come in the form of chipped cards or devices that we can format on-site.

TPMs are devices that are permanently connected to the motherboard, whereas HSMs are devices that are external to computers, but can be connected to the computers to process crypto messages and keys.

Whether we Pull or Push the seed key will dictate how we validate and derive new keys over time in our environment. And this is the devilish detail.

The devil in the cipher details

One example implementation of an HSM is the YubiKey device.

HSM devices use and integrate at the PKCS#11 certificate level; I've found the necessary arguments in OpenSSH 8.6p1+ (PKCS11Provider).

The HSM device integrates with OpenSSH's agent to forward the cryptographic materials in the message exchanges.

Some details can be gleaned on this page: https://jpmens.net/2021/06/16/ssh-with-a-smartcard-hsm/

The issue with this is that the HSMs have nothing to do with our application-level cryptographic approaches, which execute in web interfaces. For this, our best approach is using TPMs in the servers.

In order to integrate HSMs, we'd need to bridge the HSM devices through web interfaces (at which point JavaScript presents its own range of constraints to access these devices, I think). TPMs keep the complexities on the servers, and leave the User-ACLs to the web programmers. I know this sounds a wee bit overtechnical at this point.

But the detail that rises to the top of this iceberg is that TPM and HSM keys cannot be directly overwritten or set (no support for pushing data onto the devices). They can only be read from their public properties and integrated into the central KMS using said public-key counterparts. (Thus, deriving a key for this user on the server would mean deriving a private/public key from a public key. I remember seeing some gotchas in the cryptocoin domain using this approach: something about the back & forth generation from Public & Private material ending up revealing the original private seed to savvy hackers.)

Cryptographic Seeding

In the absence of TPM or HSM devices, our engine will by default depend on the libsodium libraries to generate the seeding material. Libsodium offers convenient methods to maintain memory and process isolation during the generation of seed material. Typically, the engine will handle its own key generation internally (automatically generating keys when necessary, and storing them in the central database location, wrapped in their own application-level ciphered constructs.)

In a future version, once we have our hands on TPM and HSM modules, we'll be looking at expanding our engine to make use of them. I believe TPMs should be easier to integrate than HSMs, with their OS hooks. They could provide the required random seeding components as a replacement for the libsodium seeding methods. The downside of using TPMs and HSMs, though, is that the secret material remains on the hardware and cannot be backed up to our central storage locations. It can also bind some datasets to specific hardware in case of application problems (losing a decryption key, for example, which would require moving the dataset to the original seed machine to decipher using the primary key).

Cipher constructs

In my previous implementations, when ciphering for database or web variables, I would use a construct as such:

ciphered = nonce . sodium_crypto_secretbox( message, nonce, secret_key )

So it was assumed that the web application would use the same key for its entire dataset (or at least, we had to dedicate a key to each task, quite manually).

Now, we plan on implementing it like this:

ciphered = engine-version . key-identifier . nonce . sodium_crypto_secretbox( message, nonce, secret_key )

And this allows us to mix keys of the same type, and even of different types, in our data sets.

Technically, our construct allows us to parse the entire message in a deterministic byte-by-byte process which remains linear from the start of the ciphered message. Let me explain;

1) in a first step, the engine-version is read. It is composed of a 4-byte string, which should give us some room. The engine-version should be unique for each cryptographic engine revision (when changes are introduced in the ciphered message format, for example), and when deciphering a message, the engine knows which branch to execute from the first 4 bytes.
2) the second step consists in parsing the key-identifier, at this point another 4-byte segment. Currently we base36_encode this numerical value to obtain a range of some 1.6 million unique IDs. The engine now knows which key to retrieve from the storage system in order to decipher the message.
3) we then read the nonce, which is a fixed size for the particular engine-version (by default it corresponds to the value of SODIUM_CRYPTO_SECRETBOX_NONCEBYTES defined in the Sodium library, currently the int 24). Note: this parcel is currently in clear text, but it's entirely possible to further wrap this segment with a different keyed cipher.
4) the loaded key and nonce are then used to decipher the contained message.
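The four parsing steps can be sketched as a linear byte walk (Python for illustration; the framing sizes follow the text above, and the function name is hypothetical):

```python
NONCE_LEN = 24  # SODIUM_CRYPTO_SECRETBOX_NONCEBYTES

def parse_ciphered_message(blob: bytes):
    """Linear parse of: engine-version (4 bytes) | key-identifier (4 bytes,
    base36-encoded) | nonce (24 bytes) | secretbox payload."""
    engine_version = blob[:4]                        # step 1: pick the engine branch
    key_ident = int(blob[4:8].decode("ascii"), 36)   # step 2: 36**4 ~ 1.68M unique IDs
    nonce = blob[8:8 + NONCE_LEN]                    # step 3: clear-text nonce
    payload = blob[8 + NONCE_LEN:]                   # step 4: hand to secretbox_open
    return engine_version, key_ident, nonce, payload
```

The actual deciphering (step 4) is then a straight call to the secretbox construct with the loaded key and nonce; only the framing is shown here.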

Since the Sodium SecretBox handles the MAC computation and verification for us (Poly1305, built into the ciphered content), this current version doesn't need to implement it in this message construct.

When using public-key cryptography, our construct differs a tidbit, as we rely on a different branch of libsodium to accomplish this, using sealed boxes. (https://libsodium.gitbook.io/doc/public-key_cryptography/sealed_boxes)

In practice, I find that keys are quite proprietary to their applied algorithm or service. I've personally never managed (or had the time to dig into) making my SSH use PGP keys, or my IKE service use the same Certificate Authority as my Web services. There's always a byte that differs somewhere down the line, and thus, perhaps dedicating and even routing keys to their corresponding services is not such a bad thing? Here, all we're striving to do is develop a >Programmer Wallet< that can store all kinds of keys for corporate management and proper procedures built by the same programmer.

The same cipher construct is used to wrap Keys, and wrap data snippets (using symmetric ciphering) destined for database storage, or config storage, as well as files.

Ciphering with an automatic rifle

(section revised on may 19th 2022)

As it is, with the Wallet ability built into the engine, we came across a little discovery: using a pseudo-random key from the Wallet to cipher data actually makes for cleaner code than pushing the pre-determination of specific keys onto the programmer doing the higher-level coding.

So by default, the engine will pick a key from the available keys in the system, using the one with the lowest usage and some pseudo-randomness. We've defined a number of constants that put limits on the algorithm: a maximum number of reuses for individual keys, the number of keys to load when requesting a key for ciphering, the number of keys to maintain in the memory wallet, and so on. In theory the rifling ability presents a higher processing cost during ciphering, but considering the contexts it is quite acceptable. At reading time, when going through long lists of data ciphered with different keys, we might have some optimization surprises to take care of. But reasonably, by tuning the constants we should obtain pretty decent results.

We've recently optimized the rifling engine and improved both its performance and its key toggling during ciphering operations. We've implemented a "local" limit on reusing keys (local in the sense of the runtime / current script execution; we assume most of our scripts are short-lived, like web requests and cron jobs), and improved the loading time by preloading a number of existing and virgin keys. Our current settings preload 4 ciphering keys and reuse them up to 10 times within an execution. Following this implementation and our reality tests, we've realized that our engine now needs a programmer-accessible method for controlling this rifling aspect: were a programmer attempting to cipher a million records, we'd want to avoid cycling keys so aggressively. We would also assume the programmer knows what he's doing (ciphering a data table with the same key is feasible, under particular security constraints). But the goal of this project being to implement the most proper and recommendable methodologies for ciphering, key toggling should really come on a 1:1 basis. (Which is feasible by adjusting the parameters, of course.)
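The rifling behaviour described above (preload a few keys, cap their local reuse, bias toward the least-used one) can be sketched as follows. Python for illustration; the class, method and constant names are all hypothetical:

```python
import random

MAX_LOCAL_REUSE = 10  # per-execution reuse cap (mirrors the current settings)
PRELOAD_COUNT = 4     # ciphering keys preloaded per execution

class KeyRifler:
    """Sketch of a pseudo-random key selector with local reuse limits."""

    def __init__(self, key_idents, rng=None):
        self.rng = rng or random.Random()
        # ident -> local use count, for the preloaded subset of the wallet
        self.pool = {k: 0 for k in key_idents[:PRELOAD_COUNT]}

    def pick(self):
        candidates = [k for k, uses in self.pool.items() if uses < MAX_LOCAL_REUSE]
        if not candidates:
            raise RuntimeError("wallet exhausted: load or generate fresh keys")
        least = min(self.pool[k] for k in candidates)  # bias toward least-used keys
        choice = self.rng.choice([k for k in candidates if self.pool[k] == least])
        self.pool[choice] += 1
        return choice
```

Loosening MAX_LOCAL_REUSE is exactly the programmer-accessible control mentioned above for bulk jobs like ciphering a million records.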

Technically, each new key encountered during deciphering requires a database trip, for now. So limiting the number of used keys within an application to a reasonable 5 or 10 should help tremendously with performance. We also foresee the possibility of programming a method for loading all the necessary keys in one database call, after we've covered the philosophical aspects of such an approach.

We also foresee offering new Ciphering & Deciphering methods for the programmers that do desire to predetermine their keys. After all, we're attempting to provide a vertical approach to cryptographic keys.

Natural key rotations

(new section, 19th of May 2022)

Another side-effect discovered during our experimentations was the natural rotation of cryptographic material as users accessing the web applications were updating and modifying their data. Basically, presenting a field for updating will invariably re-cipher the data using a new key. It would be statistically very improbable for a particular field to reuse the same key, just as it's statistically very probable that the key used on a particular data field was also used on other data fields.

This poses a major challenge to our key-usage counters. It requires that the deciphering and ciphering engines be connected (between independent requests, a devilish detail), or integrated in a 4th/5th-generation framework (a framework where the database calls are encapsulated by the 3rd layer, such as DBTables, or painfully knitted into a 3rd-gen framework like Wordpress or Laravel). ... just nobody goes there yet... it's just insane coding to maintain.

It might work better in a React application environment, where updating a field in the client -could- propagate changes back to the server in a more real-time fashion. It'd be possible at that point to transmit additional information (such as the previous value) and have the engine recover the previous key IDENT, decrement its usage, and recipher using a new key. (come to think of it, that is an interesting methodology that could fit better in 3rd gen frameworks.).

But again, this pesky detail, although brain-numbingly crazy, is better left for later considerations when we'll be attacking the Client cryptographic environment using the newest cryptographic browser APIs. (-may 2022)

Meanwhile, the effect of the natural key rotations is quite nice. And I think that my plans for rotating expiring key material in existing data sets would serve far better, considering that we can use that opportunity to compile, verify and certify some security parameters (such as in a hardened security environment where these details need to be reported during audits). Technically, our scripted key rotations will be quite capable of identifying the keys, whether they're expiring or not.

HKDF

More investigation is required in the realm of Key Derivation functions, because there would be a need for them when generating new keys. If we could introduce the notion of parenthood in the key rollout procedures (we could, and I think we should, because keys can and should always be traceable to a "Creator", unlike us humans...), then using HKDF, I think we could get rid of the chicken & egg problem of generating unique key IDs (beyond doing a sha512 of a sha512 of a random string of 512 bytes...?).

If my memory serves me right, PHP has an HKDF function, hash_hkdf(), alongside its sha implementations.
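The extract-then-expand scheme behind HKDF (RFC 5869) is compact enough to sketch. Python stdlib is used here for illustration; the salt/info parameter choices are assumptions:

```python
import hashlib
import hmac

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """Minimal RFC 5869 HKDF over SHA-256: extract a PRK from the parent key
    material, then expand it into `length` bytes bound to `info` (e.g. a key ID)."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                            # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]
```

Binding the info parameter to a child key identifier is what would give us the "parenthood" traceability: every derived key is deterministically reachable from its Creator key plus its ID.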

( foray into HKDF is being kept on ice for now; April 2022. We have enough playground with our current constructs to keep us busy for a couple of months. :D )

Class Implementation Details

In this section we detail the modeling and hierarchy that goes in the class objects. Firstly I'd like to explain some of the reasoning that goes into this intended architecture.

The goal of the project is to centrally deploy a class object that can be reused by internal developers. It should make it easier for our developers to use cryptographic functions, it should ideally be seamless in their coding efforts, and it should introduce a minimum of additional brain-numbing learning. Cryptography is a really complex subject which I've personally taken 40 years to truly master (and I'm somewhat of a genius), so we can't expect new hires to fully grasp the cryptographic implications of software development from the get-go.

Also, in my model of "transferring complexity" for business purposes, it just sits better. Eventually internal developers will reach a mastery level making them capable of participating in the cryptographic development area, which would normally come a bit of time after their initial enrolment. We want people to be running data projects instead of rewriting complex and political modules.

So with this in mind, the engine currently takes care of:

  1. Generating the appropriate keys for the task at hand, just in time, and automatically. (It will preemptively generate a set of additional keys when it detects it's running out, according to our constant flags and rules inside the class.) The developers don't need to generate keys, ever.
  2. Tagging ciphered content with key IDENTs that serve to locate and retrieve the exact key from the database.
  3. Locating the appropriate keys when deciphering.

I think this is what makes the engine really useful already. :)

To instantiate the engine from PHP (outside of the Core framework described later) we need to call it as such:

$CryptoEngine = new \BlackCipherBox\BlackCipherBox( 
    \CORE\QueryDB &$QueryDB, 
    \CORE\StrongString $Master_Password_Override = null
    );

Later, when we want to use the engine, we simply need to call its Hashing, Ciphering or Deciphering methods described below:

Hashing functions

In our previous implementations, we relied on an HMAC() hashing function to generate parallel values (for database indexing) for ciphered fields. The logic was that since these normally-indexed fields required an alternative to keep the indexing meaningful, we figured we could privately hash the data, in its deciphered form, to generate a keyed hash that could then be looked up in a b-tree index in the database. The resultant data cannot be sanely ordered (we couldn't produce a list ordered on the passwords or the ciphered emails, after all; it would reveal a bit too much), but it can be indexed in a b-tree fashion for quick lookups, say, on an email login prompt.

Hashing keys are stored as HashedKey types, but they are implemented as BoxedKey classes. (Just so we don't mix Hash keys with Ciphering keys later.)

In the BlackCipherBox implementation, I was considering splitting the hashing functions in two, as such:

BlackCipherBox->generic_Hash($input_str), privately hash the data using a preset application-level key ($key_APP_HASH) from the config file.

QA Status: Testing, April 22. Functional May 22.

example:

$hash_value = $CryptoEngine->generic_Hash($input);

The function will return a String containing the resulting hash value. The hash value can be compared to a previously generated hash (on the exact same input, of course) using any string comparison. (It doesn't represent a security risk per se, because we do not use this method for passwords.)

BlackCipherBox->validate_generic_Hash($input_str, $hashed_string), validate that the hashed_string corresponds to the supplied input_str after the same generic_Hash() application. (We never had to use this before, since the hashes are used for db-lookups instead.) Might be useful one day, but it's limited to validating against the app-level $key_APP_HASH key. Generally we just compared hashes as simple strings and were done with it. This is very different from password-hash validations though, which we now delegate directly to PHP's native password_*() functions. (They look to be up to standards nowadays.)
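As an illustration of the keyed-hash idea behind generic_Hash(), here is a minimal sketch using libsodium's keyed BLAKE2b (sodium_crypto_generichash) as a stand-in for whatever primitive the engine actually uses internally. The function names and the $key_APP_HASH variable are placeholders, not the real API.

```php
<?php
// Illustrative sketch: a keyed hash of the cleartext produces a
// deterministic value that can sit in a b-tree index for lookups.
// $key_APP_HASH stands in for the app-level key from the config file.

$key_APP_HASH = sodium_crypto_generichash_keygen();

function generic_hash_sketch(string $input, string $key): string {
    // Keyed BLAKE2b: same input + same key => same hash, indexable in the db.
    return base64_encode(sodium_crypto_generichash($input, $key));
}

function validate_generic_hash_sketch(string $input, string $hashed, string $key): bool {
    // Constant-time comparison, even though these are not password hashes.
    return hash_equals($hashed, generic_hash_sketch($input, $key));
}
```

Because the output is deterministic for a given key, two rows holding the same email produce the same hash, which is exactly what makes the b-tree lookup work.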

* BlackCipherBox->private_Hash($input_str, $keyIDENT), privately hash a data string, but this time using a key chosen by the programmer. This would allow the creation of keys at a record level, leaving it up to the developer to re-use such a key to hash record-related data (for validation purposes, for example). The benefit would be that it could accept a pre-determined key in order to build an externally validated hash. (Our generic_Hash() would then remain private to the application that uses it, and private_Hash() would allow data portability between apps.)

Ciphering function

BlackCipherBox-> cipher( $input_str [ , $keyIDENT ] ) : string

Cipher an input string, optionally using a specific key (which can be Private or Boxed, provided in base64 string format, or as a key IDENT string, or as an actually loaded key.)

By default, the method will look up and use any available key from the database, prioritizing the one with the least usage in the (default) universe of "webapps" keys.

QA Status:  Load tested, march 22. Functional April 22.

example:

$ciphered = $CryptoEngine->cipher($input_text);

The method returns a String which can be transmitted or stored in a database. (Because the content is ciphered, after all, it doesn't require memory protection anymore.) The input can be a String or a StrongString. (We recommend using StrongStrings when the input in cleartext would be too sensitive.)

* BlackCipherBox-> cipher_File( $input_filename, $output_filename [ , $keyIDENT ] ) : boolean

@todo ! We never used this internally, but we're looking at implementing it now for some file services. This builds on our R&D constructs BlackCipherBox/FileCipherCryptoBox*.

With a special twist: we also tag an extra header with the Crypto Engine version + Key IDENT, along with the Sodium headers that come with the Sodium SecretStream_XCHACHA20POLY1305 construct, so an extra 32 bytes per file is to be expected.

QA Status:  Concept tested, feb 22. integrated in the class object, may 22, load tested, may 22.
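For illustration, here is a sketch of the file construct described above: libsodium's secretstream (XChaCha20-Poly1305) with an extra engine tag prepended before the Sodium header. The 'BCB1' marker, the 12-byte padded IDENT field, and the function names are illustrative assumptions, not the engine's real on-disk format; real code would also stream the file in chunks rather than reading it whole.

```php
<?php
// Sketch of file ciphering with secretstream plus an engine header.

function cipher_file_sketch(string $in, string $out, string $key, string $ident): void {
    [$state, $header] = sodium_crypto_secretstream_xchacha20poly1305_init_push($key);
    $fh = fopen($out, 'wb');
    // Engine tag (version + key IDENT, padded) then the Sodium stream header.
    fwrite($fh, 'BCB1' . str_pad($ident, 12) . $header);
    $data = file_get_contents($in); // single chunk for brevity
    fwrite($fh, sodium_crypto_secretstream_xchacha20poly1305_push(
        $state, $data, '', SODIUM_CRYPTO_SECRETSTREAM_XCHACHA20POLY1305_TAG_FINAL));
    fclose($fh);
}

function decipher_file_sketch(string $in, string $out, string $key): void {
    $raw   = file_get_contents($in);
    $ident = trim(substr($raw, 4, 12)); // recover the key IDENT from the tag
    $hlen  = SODIUM_CRYPTO_SECRETSTREAM_XCHACHA20POLY1305_HEADERBYTES;
    $state = sodium_crypto_secretstream_xchacha20poly1305_init_pull(
        substr($raw, 16, $hlen), $key);
    [$msg, ] = sodium_crypto_secretstream_xchacha20poly1305_pull(
        $state, substr($raw, 16 + $hlen));
    file_put_contents($out, $msg);
}
```

In the real engine the recovered IDENT would drive a database key lookup; here the key is simply passed in.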

Deciphering function

BlackCipherBox-> decipher( $ciphered_str ) : StrongString

Decipher a previously ciphered string (using the above method!). Internally the method will identify and retrieve the necessary key from the ciphered content, which now contains the keyIDENT string that uniquely identifies it.

QA Status:  Load tested, march 22. Functional April 22.

example:

$deciphered = $CryptoEngine->decipher( $ciphered );

The return value at this point would be a StrongString because our engine assumes that if you deciphered something, it simply is sensitive. To read the deciphered content in your code remember to use its ->getString() method which will properly retrieve the value from the protected memory space. (See the StrongString article for more information)

* BlackCipherBox-> decipher_File( $input_filename, $output_filename ) : boolean

Built on our R&D constructs BlackCipherBox/FileCipherCryptoBox* , where files are ciphered using the Sodium SecretStream_XCHACHA20POLY1305 construct, and wrapped with a key IDENT header allowing the engine to retrieve the concerned key from the configured database. Part of the magic engine described above.

QA Status:  Concept tested, feb 22. integrated and load-tested in May 22, ready for application QA tests.

Key-Related functions

Generating new keys

Keys require a password to protect them against access on the filesystems. When stored in the database (the default behavior for our engine now), keys are also wrapped in a cipher using the same password. (Although, be warned, the engine might decide to re-forge the passwords on first import, for security purposes.)

When integrating in a web app, the password is to be stored in the configuration files (which themselves should be outside of your webroot/) as a global variable. We are specifically avoiding storing that password in the database for now, because we want to avoid some unnecessary roundtrips.

Security Analysis: we assume that if an intruder is capable of accessing the password, they still need to access the keys, which are stored in the database (usually off-server as well). We could store the password in a different remote location and gate-protect it using an independent ACL system. (geuk!) Currently there IS a facility for getting/storing keys over REST using the Core framework's CODEX/ API. (But that also needs to be extended to handle the wallet and latest formats? not sure. -apr22)

The same process exists for all the supported key types in the BlackCipherBox engine. Generate a password using the engine's recommended settings (a must!), and then generate a new_Key() using the generated password. The returned Key will correspond to the class object it represents. But typically, developers will only want to use this when inserting data into a system using a pre-defined key. The engine is fully capable of generating its own keys now.

What follows is a snippet example of how one would go about creating and saving keys. But remember, this is handled magically internally; there is no need to do this. We're essentially just explaining how the Master Password fits in the model.

$password = \BlackCipherBox\BoxedKey::generate_Password();
$datakey  = \BlackCipherBox\BoxedKey::new_Key( $db, $password );
if ( ! $datakey->save_key( $db, $key_Universe, $key_Type ) ){
   print "we're in trouble!";
}

When initializing a new web application, the recommended method is to use the framework skeleton script located under Scripts/Generate_newCryptoKeys.php. The script will provide instructions on how to install the generated keys in your web application's configuration. (a manual step, but very simple).

The preferred methodology for transporting keys between web applications should be by exporting/importing Wallets which contain the entire web application's key sets. (The devil being in the details though, the engine is not implementing such methodology for now. We need to figure out some details first. Some keys can be portable whereas others should not.)

->load_a_key(string $key_Universe = 'webapps', string $key_Type = 'BoxedKey'): this method should be seldom used by developers. It loads a "random" key out of the existing and available keys (from the database) for a specific universe/realm. If the engine cannot locate an available key, it creates new ones, according to predefined constants in the class object. Combined with the engine's internal feature of key-tagging ciphered content, we are now able to manage all this automatically. The method is used internally by the Wallet management in the BlackCipherBox. Loaded keys are kept in protected memory space (using StrongStrings) for the duration of the script's lifetime to avoid database roundtrips.

->load_key_from_db( &$QueryDB, string $keyIDENT ), loads a particular key from the database onto the in-memory wallet. Seldom used by programmers; this one is called internally from the decipher() functions.

->export_key($keyIDENT, $password_for_Exported_string): this method should be seldom used as well, but it's meant as a facility to export keys with the goal of transferring them to another system (in a format safe for public networking). The receiving end must be able to decipher the exported key at importation time, of course.
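One plausible shape for such a password-wrapped export, sketched with real libsodium primitives (Argon2id key derivation via sodium_crypto_pwhash, then a secretbox around the raw key), might look like the following. The function names and the salt+nonce+box layout are illustrative assumptions, not the engine's actual export format.

```php
<?php
// Hedged sketch: derive a wrapping key from the export password, then
// seal the raw key so it can travel over public networks as base64.

function export_key_sketch(string $rawKey, string $password): string {
    $salt  = random_bytes(SODIUM_CRYPTO_PWHASH_SALTBYTES);
    $wrap  = sodium_crypto_pwhash(
        SODIUM_CRYPTO_SECRETBOX_KEYBYTES, $password, $salt,
        SODIUM_CRYPTO_PWHASH_OPSLIMIT_INTERACTIVE,
        SODIUM_CRYPTO_PWHASH_MEMLIMIT_INTERACTIVE);
    $nonce = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    // salt + nonce + sealed key, base64-encoded for transport
    return base64_encode($salt . $nonce . sodium_crypto_secretbox($rawKey, $nonce, $wrap));
}

function import_key_sketch(string $exported, string $password) {
    $raw   = base64_decode($exported);
    $salt  = substr($raw, 0, SODIUM_CRYPTO_PWHASH_SALTBYTES);
    $nonce = substr($raw, SODIUM_CRYPTO_PWHASH_SALTBYTES, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    $box   = substr($raw, SODIUM_CRYPTO_PWHASH_SALTBYTES + SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    $wrap  = sodium_crypto_pwhash(
        SODIUM_CRYPTO_SECRETBOX_KEYBYTES, $password, $salt,
        SODIUM_CRYPTO_PWHASH_OPSLIMIT_INTERACTIVE,
        SODIUM_CRYPTO_PWHASH_MEMLIMIT_INTERACTIVE);
    return sodium_crypto_secretbox_open($box, $nonce, $wrap); // false on bad password
}
```

A wrong password derives a wrong wrapping key, so the authenticated secretbox simply refuses to open rather than producing garbage.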

Integration in the Core PHP Frameworks

Because the purpose is to use this cryptographic engine in our web applications, we break the ice by implementing it in our own Core PHP Framework. This allows us to replicate and centralize the inner workings of our web applications in our enterprise, so as to maximize our code quality and time.

During the beta phase program, we are testing this in a branch of our latest core web tree called blackcore. It is located under our sites/blackcore.dev.kopel.ca/ development folder, and can be accessed online at https://blackcore.dev.kopel.ca/ and https://admin.blackcore.dev.kopel.ca/.

The engine is currently integrated with the QueryDB and is NOT integrated in the Session classes. (For Sessions we maintain the old methodology because we want to avoid ciphering the key in the configs, to save CPU resources during web requests.)

The cryptographic subsystem is located under core2022.dev.kopel.ca/include/BlackCipherBox. (outside of the include/CORE hierarchy for now).

The engine requires a minimal starting configuration to be included in every script instance (including cli-based instances). This configuration can be generated with the Scripts/Generate_newCryptoKeys.php script. It includes the app-level password to decipher the keys, a specially-shortened session key (for internal usage) and an app-level Hashing key (see the above section on Hashing functions).

Important Note: the app-level password needs to be instanced after the autoloader declaration has been executed. This is because it depends on the autoloader to instance itself. Therefore, the app-level password is normally configured in the autoloader logic at the bottom of the config files, according to our framework structures.
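A config file honoring that ordering might be laid out as follows. This is a hypothetical sketch only: the file name, environment variable names, and globals are invented for illustration; only the autoloader-first ordering and the \CORE\StrongString dependency come from the text above.

```php
<?php
// config.php (kept outside webroot/) -- illustrative layout only.

require_once __DIR__ . '/autoload.php';   // 1. autoloader declaration first

// 2. only after the autoloader runs can the StrongString-backed password
//    instance itself, since it depends on autoloaded classes:
$GLOBALS['BCB_MASTER_PASSWORD'] = new \CORE\StrongString(getenv('BCB_MASTER_PW'));
$GLOBALS['key_APP_HASH']        = getenv('BCB_APP_HASH_KEY'); // app-level hashing key
```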

Version compatibility in our Core PHP frameworks

Because continuity is important to us, and because our resources are constantly limited, we've prototyped, tested and decided on a preferred method for adapting our versioning of ciphering methods.

Two solutions were available: either (a) upgrade a web app's dataspace using a crypto-upgrade script that executes against all the data tables and ciphered fields (and files), or (b) implement version toggling in the ciphering+deciphering methods of the BlackCipherBox class, to fall back on the previous methods in case the content to be deciphered is recognized as belonging to the old system.

During prototyping and tests, I found the second method presented far fewer complexities for a stable implementation. The first solution presented a lot of issues localizing encrypted content in the database, since the previous engine lacked identifying headers. The second approach, by contrast, literally involved less than 20 lines of code that don't need constant revisions (the first approach was generating a scripted engine of some 2,000 lines that remained quirky depending on the target application; see, custom for each application).

Downside of the chosen approach though is that we do not have a method, currently, for making assertions about the overall coverage of the ciphering methods in a dataset. We will need to program a script that will parse the database looking for ciphered content and determining which cryptographic engine (and keys) are still in use. Once assertions are built and confirmed, then it becomes possible to phase-out the previous cryptographic engine.

Newly developed applications will, by default, use the new cryptographic engine, as long as their configuration files include the setting:

$BLACKBOX_ENFORCED_VERSION=1;   

The ciphering methods will automatically prefer the BlackBox ciphering modules, and the deciphering methods will still parse & toggle based on the ciphered content's headers.
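The parse-and-toggle logic can be sketched as a tiny dispatcher: inspect the ciphered content for the new engine's header and fall back to the legacy routine otherwise. The 'BCB1:' marker and all function names here are illustrative stand-ins; the stub bodies only exist so the sketch runs.

```php
<?php
// Sketch of version toggling on decipher: fewer than 20 lines, as noted.

const BCB_HEADER = 'BCB1:'; // hypothetical new-engine content marker

function decipher_any(string $ciphered): string {
    if (strncmp($ciphered, BCB_HEADER, strlen(BCB_HEADER)) === 0) {
        // New-engine content: strip the header and use the BlackBox path.
        return blackbox_decipher(substr($ciphered, strlen(BCB_HEADER)));
    }
    // Old-engine content has no identifying header; fall back.
    return legacy_decipher($ciphered);
}

// Stand-ins so the sketch is runnable; the real methods live in the engine.
function blackbox_decipher(string $c): string { return 'new:' . $c; }
function legacy_decipher(string $c): string   { return 'old:' . $c; }
```

Ciphering never needs a toggle: it always writes the new format, so old content drains out naturally as records get rewritten.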

First beta applications making use of the BlackCipherBox cryptographic constructs

I've determined that the best candidate for the BlackCipherBox beta phase would be data.kopel.ca. The reasoning is that in this particular application, data of concern remains on the system for a very limited time. (See, less than 1 year; with fine-tuning it should drop to less than 3 months, in general.) The application also presents an interesting facet through some usage quirks, where files and data are copied to another production system internally rather quickly, and through human intervention in our internal data classification procedures. So this presents an environment where humans are already revising each and every entry, on fresh data, which can always be corrected with the help of good customer relations. It is also the bastion of our digital exchanges with customers, requiring the latest innovations in cryptographic development that we can provide and support.

In order to fit in our development model, I've had to implement the BlackCipherBox class hierarchy under core2022.dev.kopel.ca/. data2022.dev.kopel.ca/ is bound to the first coding framework through local folder links in the development area (at the include/ hierarchical level). Other internally developed applications also link against that code repository in the same manner; they should be: stech.dev.kopel.ca, automation.dev.kopel.ca and the *.cryptochocolate.com websites that I personally develop and where I can more freely test the limits of the developed engines without affecting customer data.

Beta phase discoveries

Portability issues

Because we depend on an over-optimized serial ID, and because we toggle keys aggressively, we're putting a lot of strain and stress on the portability front. From our first implementations, in particular in cryptochocolate & Tonantzin (side R&D projects where we test our object models), where we end up having to share part of the keys between both web applications, we quickly discovered the limitations of our Key-IDENT in base36 format. A key assigned the IDENT # of 1 will quickly clash when transferred to a different webapp, which will invariably compete for the same serial IDs in its own little universe.

We therefore have to rethink how we define the key IDs which we then include in the ciphered strings and bits. This will take a heavy toll on our ciphering in the database, where our field sizes will invariably increase by an extra 20-32 bytes. (That's an extra 20-32 chars, and up to 80-128 bytes of reserved space if stored in a utf8mb4 charset field.)
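To ground the size estimate above, here is one candidate scheme (an illustration, not a decision): a collision-resistant IDENT built from 128 random bits lands squarely in the 20-32 character range cited, depending on the encoding chosen.

```php
<?php
// Size math for a 128-bit random key IDENT under two encodings.

$ident_bin = random_bytes(16);    // 128 bits, negligible clash probability
$ident_hex = bin2hex($ident_bin); // hex: 32 chars

// URL-safe base64 without padding: 22 chars
$ident_b64 = rtrim(strtr(base64_encode($ident_bin), '+/', '-_'), '=');

echo strlen($ident_hex), ' / ', strlen($ident_b64), "\n"; // 32 / 22
```

Unlike the base36 serials, such IDENTs never compete across webapp universes, which is exactly the clash described above.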

There is another approach to Key Identification which we could implement, but I feel it takes the goal of this project a bit off track; that would be to maintain a Key Usage Index in the database, where we maintain a relationship data row indicating where each key has been used, namely tablename.fieldname + rowID. It's feasible with the architecture of our web apps, -but- it makes the engine even less portable to other projects, and presents its own exceptional headaches for file handling.

I therefore think it would be wiser to focus on redefining the identification markers we wish to use inside the ciphered boxes.