Digitalglen

Caching was hard, so I concentrated instead on something more concrete: the display of CKRecords and their fields.

I've also been considering how to make it easier to describe and upload reasonably complex object graphs. Say you have Artists and Artworks.


	Artist
	   UID
	   Name
	   Image // reference
	   ImageThumbnail // Asset
	Artwork
	   UID
	   Title
	   Artist // reference
	   Image // reference
	   ImageThumbnail // Asset

This small example already includes 4 images and 3 references, each of which represents a possible failure point. The developer has to:

1. Store the full-sized image files somewhere accessible to the Builder app.
2. Create the thumbnail image files and store them somewhere accessible to the Builder app.
3. List in an import file the Artists and Artworks, including the various image files by UID as references and Assets.
4. List in an import file the Image records that are referenced by the Artists and Artworks.
5. Import and upload.

There's a lot that can go wrong there. What can be simplified? A few possibilities:

- Have the builder auto-generate the thumbnail files from the full-size images, based on dimensions included in the import file? (Doable and probably useful, but how to specify which images need thumbnails?)
- Perhaps even more useful, though unhelpful for batch processing, would be to support individual record creation, including wells into which you could drop things:
  - A dropped image would be copied to the correct directory, a thumbnail generated and copied to the correct directory, a reference created for the image and connected to the record, and the thumbnail embedded in the record as an asset.
  - A dropped record would generate a reference to that record.

  But is it worth doing all this macOS work when the workflows are not yet clear?
- Establish naming conventions to standardize UID, image UID and file name, and thumbnail file name? For instance, an Artwork with UID 17 might be accompanied by an image named 17.jpg and a thumbnail named 17_t.jpg. (Terse, but fragile and limiting.)
- Establish naming conventions for directories? For instance: import/ (record 17 is mentioned in an import file), images/17.jpg, thumbnails/17.jpg. (Terse, but still fragile and limiting.)
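
To get a feel for how little code the directory convention would need, here is a minimal sketch. The `ImagePaths` type, the `imagePaths(forUID:root:fileExtension:)` helper, and the `/tmp/import` root are all made up for illustration; none of this exists in the Builder yet.

```swift
import Foundation

// Hypothetical pair of file locations derived from a record UID.
struct ImagePaths {
    let image: URL
    let thumbnail: URL
}

// Assumes the images/ and thumbnails/ directory convention, where the
// full-size image and its thumbnail share the file name "<uid>.<ext>".
func imagePaths(forUID uid: String, root: URL, fileExtension: String = "jpg") -> ImagePaths {
    let fileName = "\(uid).\(fileExtension)"
    return ImagePaths(
        image: root.appendingPathComponent("images").appendingPathComponent(fileName),
        thumbnail: root.appendingPathComponent("thumbnails").appendingPathComponent(fileName)
    )
}

let paths = imagePaths(forUID: "17", root: URL(fileURLWithPath: "/tmp/import"))
print(paths.image.path)
print(paths.thumbnail.path)
```

The fragility is visible right away: nothing here checks that either file exists, and a record with more than one image has nowhere to go.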

Thinking this morning about how the Builder caches and syncs with the cloud. In addition to large-scale importing and uploading to the cloud, the Builder should let you edit existing records and save them back to the cloud.

There are two ways to approach this: records live only in the cloud, or they're shadowed on the client. I'm strongly inclined towards cloud-only but can appreciate benefits to either approach.

The entire test object graph now uploads successfully including references and assets. For now, it's only 3 BaseballCards, 2 Persons, and 2 Images, and BaseballCard and Image both include an image asset. Still, even that took seconds to upload.

Also, reference actions are now supported. Self-deleting references are specified by appending a ! to the fieldType as in <RecordType!> and [RecordType!].
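
A rough sketch of how a field-type token with an optional bang could be parsed. This is not the Builder's actual parser; `ParsedFieldType` and `parseFieldType` are names invented here. In the real build step the `deleteSelf` flag would presumably map to CloudKit's `.deleteSelf` reference action, and its absence to `.none`.

```swift
import Foundation

// Hypothetical parsed form of a field-type token from an import file header.
enum ParsedFieldType: Equatable {
    case plain(String)                                        // e.g. "String", "Asset"
    case reference(recordType: String, deleteSelf: Bool)      // <RecordType> or <RecordType!>
    case referenceList(recordType: String, deleteSelf: Bool)  // [RecordType] or [RecordType!]
}

func parseFieldType(_ token: String) -> ParsedFieldType {
    let trimmed = token.trimmingCharacters(in: .whitespaces)

    // Strips the given delimiters, then a trailing "!" if present.
    func unwrap(_ open: Character, _ close: Character) -> (String, Bool)? {
        guard trimmed.count > 2, trimmed.first == open, trimmed.last == close else { return nil }
        var inner = String(trimmed.dropFirst().dropLast())
        let deleteSelf = inner.hasSuffix("!")
        if deleteSelf { inner.removeLast() }
        return (inner, deleteSelf)
    }

    if let (recordType, bang) = unwrap("<", ">") {
        return .reference(recordType: recordType, deleteSelf: bang)
    }
    if let (recordType, bang) = unwrap("[", "]") {
        return .referenceList(recordType: recordType, deleteSelf: bang)
    }
    return .plain(trimmed)
}

print(parseFieldType("<Image!>"))
print(parseFieldType("[Person]"))
print(parseFieldType("String"))
```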

Good progress on the builder app. Record production occurs in three phases: loading, building, and uploading.

class RecordImporter: NSObject {
    func importRecords() {
        let files = ["BaseballCard.txt"]

        // Load proto records from disk, build CKRecords, then upload them.
        let protoMap = RecordLoader().load(fromBundle: files)
        let records = RecordBuilder().build(from: protoMap)
        RecordUploader().upload(records)
    }
}

Loading
The Loader's job is to load data from disk and get it ready for the Builder.
As files are parsed, the various pieces needed to construct each discovered record type are collected in a RecordProtoList, which comprises the record's type along with its field types and values.
```
class RecordProtoList {
    let recordType: String
    var protoFieldTypes: [ProtoFieldType] = []
    var fieldValues: [Any] = []

    init(recordType: String) {
        self.recordType = recordType
    }
}
```
These are not yet CKRecords but rather constructs that provide enough information for the Builder to do its job. Some of the field types, for instance, are placeholders that require additional work (and, for references, knowledge of other records for validation) before a CKRecord can be built. As the proto lists are loaded, each proto record field is given a ProtoFieldType that during the build phase will be processed and replaced with a real field type.
```
enum ProtoFieldType {
    case Standard(StandardFieldType, String) // standard type, field name
    case Reference(String, String) // ref record type, field name
    case Asset(String) // field name
    case ReferenceList(String, String) // ref record type, field name
}
```
When the loader's job is done, everything is in memory and ready for handing off to the Builder for making actual CKRecords.
Building
The Builder's job is to transform the data loaded from disk into actual CKRecords.
First, the values have to be repackaged properly. Most values for the proto field types require processing before they're ready to be added to a CKRecord:
- .Standard
  This value maps directly to one supported by CloudKit, but still needs to be packaged properly. For instance, a Location value stored on disk as "-143.03945, 34.54923" must be parsed into two numbers and repackaged as a CLLocation.
- .Reference
  A string representing the referenced record's recordID is repackaged as a CKReference.
  CKReference includes a CKReferenceAction that determines whether the record holding the reference is deleted when the target record is deleted. I haven't implemented this yet, but it will be represented on disk as an appended bang in the proto field type as <RecordType!>.
- .Asset
  A string representing a file on disk is loaded into memory and extracted as Data.
- .ReferenceList
  An array of strings representing the referenced records' recordIDs is repackaged as [CKReference].
  CKReference includes a CKReferenceAction that determines whether the record holding the reference is deleted when the target record is deleted. I haven't implemented this yet, but it will be represented on disk as an appended bang in the proto field type as [RecordType!].
Once all values have been repackaged appropriately, the proto lists are compiled into actual CKRecords and handed off to the Uploader.
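
As an illustration of the .Standard repackaging step, here is a small sketch of parsing a Location value from disk. `parseCoordinatePair` is a hypothetical helper, and it deliberately doesn't decide which number is latitude and which is longitude; the import format would have to pin that down before the pair is handed to `CLLocation(latitude:longitude:)`.

```swift
import Foundation

// Hypothetical helper: splits an on-disk value such as "-143.03945, 34.54923"
// into two Doubles, keeping the order they appear in the file.
// Returns nil if the value is not exactly two comma-separated numbers.
func parseCoordinatePair(_ raw: String) -> (first: Double, second: Double)? {
    let parts = raw.split(separator: ",").map { $0.trimmingCharacters(in: .whitespaces) }
    guard parts.count == 2,
          let first = Double(parts[0]),
          let second = Double(parts[1]) else {
        return nil
    }
    return (first, second)
}

if let pair = parseCoordinatePair("-143.03945, 34.54923") {
    print(pair.first, pair.second)
}
```

Returning nil for malformed values gives the Builder a place to report the bad row instead of uploading a broken record.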
Uploading
The Uploader's job is to upload the CKRecords from the Builder app into the iOS app's public database. Still a work in progress.
I was happy to discover it's easy to associate a macOS app with an iOS app's CloudKit container. And it's easy to upload individual records as shown below. But this is inadequate for several reasons:
1. Performance - there will be many thousands of records. These need to be grouped efficiently.
2. UI - this will take a long time and needs to indicate progress. Uploading thousands of images, for instance...
3. Interruptibility - it needs to be cancelable.
func upload(_ records: [CKRecord]) {
    for record in records {
        publicDatabase.save(record) { (record, error) in
            if let error = error {
                // print error and continue
                print(error)
            }
        }
    }
}
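
One possible shape for the three requirements (grouping, progress, cancellation), sketched without CloudKit so it can run anywhere. Everything here is hypothetical: `batches(of:size:)`, `uploadInBatches`, and the batch size are placeholders, and `uploadBatch` stands in for whatever actually talks to the cloud (each batch could, for example, be handed to a `CKModifyRecordsOperation`). The real thing would be asynchronous; this version is synchronous to keep the control flow visible.

```swift
import Foundation

// Splits items into consecutive batches of at most `size` elements.
func batches<T>(of items: [T], size: Int) -> [[T]] {
    precondition(size > 0, "batch size must be positive")
    return stride(from: 0, to: items.count, by: size).map { start in
        Array(items[start..<min(start + size, items.count)])
    }
}

// Hypothetical driver: uploads batch by batch, reports progress after each
// batch, and stops before the next batch once `isCancelled` returns true.
// Returns the number of items handed to `uploadBatch`.
@discardableResult
func uploadInBatches<T>(_ items: [T],
                        batchSize: Int,
                        isCancelled: () -> Bool,
                        progress: (_ done: Int, _ total: Int) -> Void,
                        uploadBatch: ([T]) -> Void) -> Int {
    var done = 0
    for batch in batches(of: items, size: batchSize) {
        if isCancelled() { break }
        uploadBatch(batch)
        done += batch.count
        progress(done, items.count)
    }
    return done
}

// Usage: ten stand-in "records", cancelled once eight have been uploaded.
var cancelled = false
let uploaded = uploadInBatches(Array(1...10),
                               batchSize: 4,
                               isCancelled: { cancelled },
                               progress: { done, total in
                                   print("\(done)/\(total)")
                                   if done >= 8 { cancelled = true }
                               },
                               uploadBatch: { _ in })
print(uploaded)
```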

Need to think about how to represent a large database (5000+ records) in a way that the macOS builder app can import and push to the cloud.

From previous experience, I'd like to store all data in simple text files. Easy to read and modify. So whatever UI the builder presents, it would persist things into flat files, probably delimited lists.

Each record type could be stored in a separate text file:

    BaseballCard
        // name  |  image asset  |  (list of Person references)
        "Mickey Mantle"  |  mantle-card.jpg  |  ?
        "Willie Mays"  |  mays-card.jpg  |  ?
        "Manager's Dream"  |  managers-dream-card.jpg  |  ?

    Person    
        // name  |  image reference
        "Mickey Mantle"  |  ?
        "Willie Mays"  |  ?

    Image  
        // image asset
        mantle-person.jpg
        mays-person.jpg

But how to represent those references? Perhaps by adding build-time-only UIDs?

    BaseballCard
        // uid  |  name  |  image asset  |  (list of Person references)
        C_MANTLE  |  "Mickey Mantle"  |  mantle-card.jpg  |  P_MANTLE
        C_WILLY_MAYS  |  "Willie Mays"  |  mays-card.jpg  |  P_MAYS
        C_MANAGERS_DREAM  |  "Manager's Dream"  |  managers-dream-card.jpg  |  P_MANTLE, P_MAYS

    Person    
        // uid  |  name  |  image reference
        P_MANTLE  |  "Mickey Mantle"  |  IMAGE_MANTLE
        P_MAYS  |  "Willie Mays"  |  IMAGE_MAYS

    Image  
        // uid  |  image asset
        IMAGE_MANTLE  |  mantle-person.jpg
        IMAGE_MAYS  |  mays-person.jpg

Once all records were loaded, the UIDs could be used to create CKReferences and then assign the references appropriately.

    let rawBaseballCards : [UID: Dict] = loadRawBaseballCards()
    let rawPersons : [UID: Dict] = loadRawPersons()
    let rawImages : [UID: Dict] = loadRawImages()
    
    let baseballCardRecords : [CKRecord] = createCloudRecords(rawBaseballCards)
    let personRecords : [CKRecord] = createCloudRecords(rawPersons)
    let imageRecords : [CKRecord] = createCloudRecords(rawImages)
    
    let baseballCardReferences : [CKReference] = connect(baseballCardRecords, personRecords)
    let personReferences : [CKReference] = connect(rawBaseballCards, baseballCardRecords, personRecords)
    let imageReferences : [CKReference] = connect(rawPersons, rawImages)
    
    pushToCloud(baseballCardRecords)
    pushToCloud(personRecords)
    pushToCloud(imageRecords)

    pushToCloud(baseballCardReferences)
    pushToCloud(personReferences)
    pushToCloud(imageReferences)

This might work, but it's too custom. Need a more general approach. Perhaps preface each file with a header line that describes the columns: the data type and name for most columns, and for references the referenced record type as well.

    RecordType:RecordName  |  UID  |  FieldDataType:FieldName

DataTypes currently include the following, but others would be added as needed:

String
Asset
<RecordType> Reference
[RecordType] ReferenceList

    RecordType:BaseballCard  |  UID  |  String:Name  |  Asset:Image  |  [Person]:People
    C_MANTLE  |  Mickey Mantle  |  mantle-card.jpg  |  P_MANTLE
    C_WILLY_MAYS  |  Willie Mays  |  mays-card.jpg  |  P_MAYS
    C_MANAGERS_DREAM  |  Manager's Dream  |  managers-dream-card.jpg  |  P_MANTLE, P_MAYS

    RecordType:Person  |  UID  |  String:Name  |  <Image>:Image
    P_MANTLE  |  Mickey Mantle  |  I_MANTLE
    P_MAYS  |  Willie Mays  |  I_MAYS

    RecordType:Image  |  UID  |  Asset:Data
    I_MANTLE  |  mantle-person.jpg
    I_MAYS  |  mays-person.jpg

Now the builder could parse a single file containing multiple record types, instantiate the records, then create and link references properly.
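
A minimal sketch of what parsing that format might look like, assuming header lines always start with "RecordType:" and every other non-empty line is a row for the most recent header. `RawRecord`, `parseImport`, and `danglingReferences` are hypothetical names, and no CKRecords are created here; the point is only that references can be validated by UID before anything is built.

```swift
import Foundation

// Hypothetical in-memory form of one parsed row.
struct RawRecord {
    let recordType: String
    let uid: String
    var fields: [String: String]   // field name -> raw value
    var referencedUIDs: [String]   // UIDs found in <..> and [..] columns
}

func parseImport(_ text: String) -> [RawRecord] {
    var records: [RawRecord] = []
    var recordType = ""
    var columns: [(kind: String, name: String)] = []

    // Splits a line on "|" and trims each cell.
    func cells(_ line: String) -> [String] {
        return line.components(separatedBy: "|").map { $0.trimmingCharacters(in: .whitespaces) }
    }

    for line in text.components(separatedBy: .newlines) {
        let parts = cells(line)
        guard let first = parts.first, !first.isEmpty else { continue }

        if first.hasPrefix("RecordType:") {
            recordType = String(first.dropFirst("RecordType:".count))
            // Skip the record type and UID columns; the rest are "Type:Name".
            columns = parts.dropFirst(2).compactMap { (column: String) -> (kind: String, name: String)? in
                let pair = column.components(separatedBy: ":")
                guard pair.count == 2 else { return nil }
                return (kind: pair[0], name: pair[1])
            }
            continue
        }

        var record = RawRecord(recordType: recordType, uid: first, fields: [:], referencedUIDs: [])
        for (column, value) in zip(columns, parts.dropFirst()) {
            record.fields[column.name] = value
            if column.kind.hasPrefix("<") || column.kind.hasPrefix("[") {
                record.referencedUIDs += value.components(separatedBy: ",")
                    .map { $0.trimmingCharacters(in: .whitespaces) }
                    .filter { !$0.isEmpty }
            }
        }
        records.append(record)
    }
    return records
}

// Returns every referenced UID that no loaded record defines.
func danglingReferences(in records: [RawRecord]) -> [String] {
    let known = Set(records.map { $0.uid })
    return records.flatMap { $0.referencedUIDs }.filter { !known.contains($0) }
}

// Usage: I_MAYS is referenced but never defined.
let sample = """
RecordType:Person  |  UID  |  String:Name  |  <Image>:Image
P_MANTLE  |  Mickey Mantle  |  I_MANTLE
P_MAYS  |  Willie Mays  |  I_MAYS

RecordType:Image  |  UID  |  Asset:Data
I_MANTLE  |  mantle-person.jpg
"""
let parsed = parseImport(sample)
print(parsed.count)
print(danglingReferences(in: parsed))
```

Catching a dangling UID at this stage is much cheaper than discovering it after thousands of records have been uploaded.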

Things I need to learn

Caching

How aggressive should the cache be?
Where? The Caches directory?
Might different RecordTypes need different caching policies?
How should instantiated assets be cached and referenced? If BaseballCard.image is cached as a UIImage, what's the best way to associate that UIImage with the CKRecord? And how about with a referenced image?
When should cached items be invalidated?
How does the cache get purged? Need to learn more about the Caches directory.

Request

How important is canceling a request?

Updates

When the app is notified of a remote update, how fine-grained should the response be? Right now, the app just refreshes everything, but that's a poor user experience. You can't just download everything that changed, but perhaps you can ask what changed, then compare that against the cache, and invalidate those cache elements.
How to respond to updates to public/private/shared databases?
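
To make the "ask what changed, then invalidate only those cache elements" idea concrete, here is a tiny sketch. `RecordCache` is a made-up type, and how the set of changed record names would be obtained from CloudKit is exactly the open question above; this only shows the compare-and-invalidate step.

```swift
import Foundation

// Hypothetical in-memory cache keyed by record name.
struct RecordCache<Value> {
    private var storage: [String: Value] = [:]

    mutating func store(_ value: Value, for recordName: String) {
        storage[recordName] = value
    }

    func value(for recordName: String) -> Value? {
        return storage[recordName]
    }

    // Drops only the entries whose record names were reported as changed or
    // deleted, and returns the names that were actually cached (sorted).
    @discardableResult
    mutating func invalidate(_ changedNames: Set<String>) -> [String] {
        let hits = changedNames.filter { storage[$0] != nil }
        for name in hits { storage[name] = nil }
        return hits.sorted()
    }
}

var cache = RecordCache<String>()
cache.store("Mickey Mantle", for: "C_MANTLE")
cache.store("Willie Mays", for: "C_MAYS")

// Pretend the cloud reported these two record names as changed.
let dropped = cache.invalidate(["C_MAYS", "C_UNKNOWN"])
print(dropped)
```

The returned names could tell the app which view controllers actually need to refetch, rather than refreshing everything.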

Subscriptions

How to subscribe optimally to all desired changes in public/private/shared databases?
How can a new device subscribe without receiving an "already-subscribed" error? For the public database, the developer can subscribe one-time-only to get the app primed, but this won't work for the private and shared databases. I suppose the "has already subscribed" flag can be synced across devices via the key-value mechanism, but that seems pretty high-latency.

Populating

What's the best way to populate a huge database?

Working through CloudKit subscriptions.

I have coarse subscriptions working, where any change to the public database fires a remote notification, but the app for now is responding coarsely as well by instructing all view controllers to refetch any data they're using. The view controllers also pop themselves off the nav stack if they find that their primary model object no longer exists—if the BaseballCard page, for instance, finds that the baseball card it is displaying was deleted.

I'm using baseball cards and the people featured on the card for initial exploration.

Public database

A BaseballCard includes an image asset which gets downloaded along with the BaseballCard. This is the full-sized baseball card image, which is unwise obviously, but lets me test the difference in approach with Person.image which includes an image by reference.

BaseballCard
- Name string
- Image asset
- People list of Person references
Person
- Name string
- Image reference to an Image record
Image
- Data asset

These three classes let me explore a few things:

- Fetching all records: fetch all records of recordType "BaseballCard"
- Resolving a reference list: fetch all records in "BaseballCard.people"
- Instantiating embedded image data: "BaseballCard.image" → UIImage
- Resolving an image reference, then instantiating the image data: fetch "Person.image" → UIImage

App

The app includes classes that model the cloud's CKRecords directly, all derived from a common parent class, Model.


class Model {
	let record : CKRecord!
	let database : CKDatabase
	var recordName : String { return "?" }	
		
	required init(record: CKRecord?, database: CKDatabase = .publicCloudDatabase)
}

class BaseballCard: Model {
	override var recordName : String { return "BaseballCard" }
	var name : String
	func image(_ completion: @escaping (_ image: UIImage?) -> ())
	func people(_ completion: @escaping (_ people: [Person]) -> ())
	func cards(for person: Person, completion: @escaping (_ cards: [BaseballCard]) -> ())
}

class Person: Model {
	override var recordName : String { return "Person" }
	var name : String
	func image(_ completion: @escaping (_ image: UIImage?) -> ())
	func cards(_ completion: @escaping (_ cards: [BaseballCard]) -> ()) 
}