My goal is to integrate an .mlmodel into the Apple Vision Pro environment. However, I have had trouble finding appropriate code to achieve this integration.
I would like to share my progress so far. My primary objective is to detect the presence of a real phone in the real world using Apple Vision Pro. This involves accessing the Vision Pro camera to capture video input, processing it with the model, and then determining the appropriate actions based on the detection results.
Any guidance or suggestions on how to effectively implement this would be greatly appreciated.
// Image recognition handler
import Foundation
import CoreML
import Vision
import SwiftUI

class ImageRecognitionHandler: ObservableObject {
    @Published var recognizedObjects: [String] = []
    private var model: VNCoreMLModel

    init() {
        do {
            // Wrap the generated Core ML class in a Vision model.
            let configuration = MLModelConfiguration()
            let phoneRecognitionModel = try PhoneRecognition1(configuration: configuration)
            model = try VNCoreMLModel(for: phoneRecognitionModel.model)
        } catch {
            fatalError("Failed to load CoreML model: \(error.localizedDescription)")
        }
    }

    func recognizeImage(_ image: UIImage) {
        // Run the object-detection model on a still image and log whether a phone was found.
        let request = VNCoreMLRequest(model: model) { request, error in
            guard let results = request.results as? [VNRecognizedObjectObservation], error == nil else {
                print("Failed to perform image recognition: \(error?.localizedDescription ?? "Unknown error")")
                return
            }
            let recognizedObjectIdentifiers = results.compactMap { $0.labels.first?.identifier }
            if recognizedObjectIdentifiers.contains("phone") {
                print("phone")
            } else {
                print("no phone")
            }
        }

        guard let cgImage = image.cgImage else { return }
        let handler = VNImageRequestHandler(cgImage: cgImage)
        do {
            try handler.perform([request])
        } catch {
            print("Failed to perform request: \(error.localizedDescription)")
        }
    }
}
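In case I eventually get access to camera frames as CVPixelBuffers, I also sketched this untested helper that routes a frame through the same Vision request (it assumes it lives in the same file as ImageRecognitionHandler so it can reach the private model property):
import Vision
import CoreVideo

extension ImageRecognitionHandler {
    // Untested sketch: runs the same Core ML model directly on a raw camera frame.
    func recognizeFrame(_ pixelBuffer: CVPixelBuffer) {
        let request = VNCoreMLRequest(model: model) { request, error in
            guard let results = request.results as? [VNRecognizedObjectObservation], error == nil else {
                print("Failed to perform image recognition: \(error?.localizedDescription ?? "Unknown error")")
                return
            }
            let identifiers = results.compactMap { $0.labels.first?.identifier }
            print(identifiers.contains("phone") ? "phone" : "no phone")
        }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
        do {
            try handler.perform([request])
        } catch {
            print("Failed to perform request: \(error.localizedDescription)")
        }
    }
}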
// Main app
import SwiftUI
import Vision

@main
struct TestVisionTrackerApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}
// ContentView
import SwiftUI
import RealityKit
import AVFoundation

struct ContentView: View {
    @StateObject private var imageRecognitionHandler = ImageRecognitionHandler()

    var body: some View {
        VStack {
            Button("Recognize Image") {
                if AVCaptureDevice.authorizationStatus(for: .video) == .authorized {
                    imageRecognitionHandler.recognizeImage(.????) // I don't know what to put in here
                } else {
                    print("Camera access not authorized")
                }
            }
            .padding()
        }
    }
}

#Preview(windowStyle: .automatic) {
    ContentView()
}
In visionOS 2.0, Apple introduced the ability to track real-world objects, giving you functionality similar to what ARKit's ARObjectAnchor provides on iOS. To implement this, you'll need macOS 15 Sequoia (or later) and Xcode 16. Using RealityKit's Photogrammetry API, create a USDZ model of your phone. Then, in the Create ML app, create an Object Tracking template, place your 3D model at the origin of the XYZ coordinates, and export a .referenceobject file. Based on the data in this file, you can now generate a trackable ObjectAnchor.
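For the photogrammetry step, a minimal sketch of the RealityKit Object Capture API on the Mac might look like this (the input and output paths are hypothetical, and .reduced detail is just one possible choice):
import Foundation
import RealityKit

func buildPhoneModel() async throws {
    // Hypothetical paths: a folder of photos of the phone and the USDZ file to produce.
    let imagesFolder = URL(fileURLWithPath: "/path/to/phone-photos", isDirectory: true)
    let outputUSDZ = URL(fileURLWithPath: "/path/to/iPhoneX.usdz")

    let session = try PhotogrammetrySession(input: imagesFolder)
    try session.process(requests: [.modelFile(url: outputUSDZ, detail: .reduced)])

    // The session reports progress, errors, and completion through an async output stream.
    for try await output in session.outputs {
        switch output {
        case .requestProgress(_, fractionComplete: let fraction):
            print("Reconstruction progress: \(Int(fraction * 100))%")
        case .processingComplete:
            print("USDZ written to \(outputUSDZ.path)")
        case .requestError(_, let error):
            print("Reconstruction failed: \(error)")
        default:
            break
        }
    }
}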
Here's the visionOS code that anchors content to the tracked phone:
import SwiftUI
import RealityKit
import RealityKitContent

struct ContentView: View {
    let rkcb = realityKitContentBundle
    let url = Bundle.main.url(forResource: "iPhoneX",
                         withExtension: ".referenceobject")!

    var body: some View {
        RealityView { rvc in
            // Entity from the RealityKitContent bundle that will follow the tracked phone.
            let description = try! await Entity(named: "forPhone", in: rkcb)

            // Anchor driven by the .referenceobject file created in the Create ML app.
            let anchor = AnchorEntity(.referenceObject(from: .init(url)),
                                      trackingMode: .predicted)
            anchor.addChild(description)
            rvc.add(anchor)
        }
    }
}
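If you also need a programmatic signal that the phone was detected, rather than just anchoring content to it, ARKit's ObjectTrackingProvider in visionOS 2.0 can consume the same .referenceobject file. A minimal sketch, assuming it runs inside an ImmersiveSpace (object tracking is not available in a plain window) and that the resource name matches yours:
import ARKit
import Foundation

@MainActor
final class PhoneTracker {
    private let session = ARKitSession()

    func start() async {
        do {
            // Load the .referenceobject file produced by the Create ML Object Tracking template.
            guard let url = Bundle.main.url(forResource: "iPhoneX",
                                            withExtension: "referenceobject") else { return }
            let referenceObject = try await ReferenceObject(from: url)

            let provider = ObjectTrackingProvider(referenceObjects: [referenceObject])
            try await session.run([provider])

            // React to detection results as anchors are added, updated, or removed.
            for await update in provider.anchorUpdates {
                switch update.event {
                case .added:
                    print("Phone detected at \(update.anchor.originFromAnchorTransform)")
                case .removed:
                    print("Phone lost")
                default:
                    break
                }
            }
        } catch {
            print("Object tracking failed: \(error)")
        }
    }
}
ObjectTrackingProvider exposes the anchor's transform and tracking events directly, so you can drive app logic from it, while the AnchorEntity approach above is the simpler choice when you only need to attach visual content to the phone.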