JT 1078 governs how a commercial vehicle’s video reaches a regulator’s platform, nothing more. It governs the link and says nothing about how good the camera itself has to be.
JT 1078, written in full as JT/T 1078, is a video communication protocol. It defines the rules a vehicle terminal and a monitoring platform follow to move audio and video between them. It covers how a live feed is asked for and streamed, how a stored clip is found and pulled back, and how the video tied to an alarm is pushed up as evidence. It is a language for the link between the cab and the centre. Reading it as anything larger is where the confusion starts.
The standard came out in 2016, alongside JT/T 1076 for the terminal and JT/T 1077 for the platform. Between them those three carve up a monitoring system: 1076 says what the box in the cab must be, 1077 what the platform must be, and 1078 fixes the conversation that crosses between them. Everything the number certifies lives inside that conversation. The codecs, the live view, the alarm upload, the playback path: all of it is JT 1078. None of it is the camera.
That last point is the one buyers trip on, so it comes first. The rest follows the protocol from the live view, through the alarm attachment that carries the safety case, down to the weak-signal reality and the provincial-platform docking where a compliant box still has to be proven. A reader who follows that order can tell what a line of JT 1078 compliance does and does not promise.
Live view, playback and attachment upload are its three main flows.

JT 1078 does not say a camera must hit a certain resolution. It does not grade one recorder against another, or test whether a lens holds focus after dark. A terminal can implement the protocol to the letter and still produce footage too soft to read a number plate at any distance. The standard governs the exchange with the platform. What the box does internally to make the picture sits outside it.
This matters because a compliance line is so easy to misread. A buyer who sees JT 1078 on a datasheet and takes it as a mark of image quality has read a transport protocol as a camera grade. The two have nothing to do with each other. The protocol guarantees the platform can ask for video and the terminal can deliver it in a form the platform can decode. Whether that video is any good is a question for the sensor, the lens and the encoder, none of which JT 1078 speaks to.
Before a common protocol, every supplier spoke its own dialect. A fleet that bought terminals from three vendors needed three platforms to watch them, or one platform welded together with a custom adaptor for each make. The regulator, who wanted a single view of every bus and tanker in the province, faced boxes that could not understand one another. The cost of that fragmentation landed on everyone and grew with the fleet.
JT 1078 settled the video half of the problem by writing the conversation down. With the messages fixed, a platform author builds once and reaches every compliant terminal. A terminal maker sells into any province without porting to each platform by hand. The vehicle owner changes supplier without scrapping the back end. The parts no longer have to be bought as a set from one company. That interoperability is also why a regulator chose to mandate a single protocol: the market on its own had not converged on one.
JT 1078 does not stand on its own. It is a video extension to JT/T 808, the older protocol that already carries a terminal’s registration, position, status and basic alarms up to the platform. The 808 link handles the housekeeping and the general messages. JT 1078 puts its signalling on that same session: the request that starts a live stream is a numbered platform message, 0x9101 in the protocol’s tables, naming the address the terminal must push to, a server IP with its TCP and UDP ports, plus the logical channel and stream type wanted. The media itself then flows on a fresh connection to that named server, leaving the 808 session clear for commands. A terminal that cannot do 808 cannot do 1078, because the signalling assumes the 808 link is there underneath it, carrying the identity and the location that every video frame is implicitly tied to.
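To make the signalling concrete, here is a minimal Python sketch of packing and unpacking the body of a 0x9101 live-stream request. It assumes the field order commonly read from the protocol's tables (a one-byte address length, the server address as text, TCP port, UDP port, logical channel, data type, stream type) and big-endian integers as in JT/T 808. The class and function names, and the 0/1 codes used for data type and stream type, are assumptions for illustration; check them against the standard text. Only the body is built: the 808 message header, escaping and checksum that wrap it are left out.

```python
import struct
from dataclasses import dataclass

MAIN_STREAM = 0  # assumed code for the full-quality main stream
SUB_STREAM = 1   # assumed code for the lighter sub stream


@dataclass
class LiveStreamRequest:
    server_ip: str    # address the terminal must push media to
    tcp_port: int
    udp_port: int
    channel: int      # logical camera channel on the terminal
    data_type: int    # assumed: 0 = audio and video, 1 = video only
    stream_type: int  # MAIN_STREAM or SUB_STREAM


def encode_0x9101_body(req: LiveStreamRequest) -> bytes:
    """Pack the request body only (assumed field order, big-endian)."""
    ip = req.server_ip.encode("ascii")
    if len(ip) > 255:
        raise ValueError("address too long for a one-byte length field")
    return (
        struct.pack(">B", len(ip))
        + ip
        + struct.pack(">HHBBB", req.tcp_port, req.udp_port,
                      req.channel, req.data_type, req.stream_type)
    )


def decode_0x9101_body(body: bytes) -> LiveStreamRequest:
    """Inverse of encode_0x9101_body, as a terminal would read it."""
    ip_len = body[0]
    ip = body[1:1 + ip_len].decode("ascii")
    tcp, udp, ch, dt, st = struct.unpack(">HHBBB", body[1 + ip_len:1 + ip_len + 7])
    return LiveStreamRequest(ip, tcp, udp, ch, dt, st)
```

On the terminal side, the decoded address is where it opens the outbound media connection, which fits the fact that a platform cannot dial a vehicle sitting on a roaming mobile address.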
The live view is the function people picture first. A platform operator clicks a camera on a moving truck and expects an image within a second or two. Under JT 1078 that click becomes a request that names the channel and the stream type and points the terminal at the server that will receive the media. The terminal opens an audio-video session to that server. The feed begins to flow. A separate control message lets the platform steer the session afterwards, to switch streams, pause, or close it when the window is shut.
The terminal does the connecting, not the platform. The reason is structural. The terminal sits behind a mobile network on an address that changes as it roams, unreachable from the outside. So the platform names what it wants and then waits. The terminal reaches out to the server it was told to use. Everything in the live path is shaped by the fact that the vehicle is the side that has to start the conversation.
Live video also leans harder on the cellular link than anything else the terminal does, so the protocol has to behave when that link is poor. The session carries its own keep-alive and its own teardown, so a feed that drops does not leave the terminal pushing bytes into a dead socket and burning a driver’s data allowance for nothing. When the platform shuts the view, the control message tells the terminal to stop. How cleanly a terminal handles that housekeeping is one of the things that separates two boxes both claiming the same standard.
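The keep-alive and teardown behaviour can be sketched as a small watchdog on the terminal side. This is an illustrative model, not the protocol's message set: the class, the timeout parameter and the clock values passed in are hypothetical, and the real keep-alive and control messages are not modelled.

```python
from dataclasses import dataclass


@dataclass
class MediaSession:
    """Terminal-side view of one live stream (hypothetical model)."""
    keepalive_timeout: float  # assumed seconds of silence before giving up
    last_heard: float = 0.0   # time of the last keep-alive from the platform
    open: bool = True

    def on_keepalive(self, now: float) -> None:
        self.last_heard = now

    def on_close_command(self) -> None:
        # The platform's control message asked the terminal to stop this stream.
        self.open = False

    def should_push(self, now: float) -> bool:
        """False once the session is closed or the platform has gone quiet."""
        if self.open and now - self.last_heard > self.keepalive_timeout:
            self.open = False  # tear down instead of pushing into a dead socket
        return self.open
```

Passing the time in, rather than reading a clock inside the class, keeps the teardown rule easy to test against a simulated signal drop.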
JT 1078 separates a main stream from a sub stream. The split matters more than it first sounds. The main stream is full quality, heavy on bandwidth, the one kept for evidence and close inspection. The sub stream is a smaller, lighter version of the same camera, meant for a platform watching many vehicles at once or working over a thin signal. The terminal offers both. The platform asks for whichever the moment can afford.
That choice is what keeps a wall of live feeds from overwhelming the network behind it. A control room showing a hundred vehicles pulls sub streams and the picture stays fluid; an investigator looking hard at one truck pulls the main stream and accepts the bandwidth. Encoding both at once is a steady load the terminal carries through the whole shift, which is one of the quiet demands the standard places on the hardware without ever naming a chip.
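A platform's choice between the two streams can be reduced to a bandwidth check. The sketch below is hypothetical: the function name, the idea of dividing a fixed ingest budget across the wall, and the 1024 kbps main-stream figure are assumptions chosen for illustration, not values from the standard.

```python
MAIN_STREAM = 0
SUB_STREAM = 1


def choose_stream(feeds_on_wall: int, ingest_kbps: float,
                  uplink_kbps: float, main_kbps: float = 1024.0) -> int:
    """Ask for the main stream only when both the platform's share and the
    vehicle's uplink can carry it; otherwise fall back to the sub stream."""
    if feeds_on_wall < 1:
        raise ValueError("need at least one feed on the wall")
    per_feed_kbps = ingest_kbps / feeds_on_wall  # platform bandwidth per tile
    if min(per_feed_kbps, uplink_kbps) >= main_kbps:
        return MAIN_STREAM
    return SUB_STREAM
```

A wall of a hundred feeds on a 50,000 kbps ingest leaves 500 kbps per tile and gets sub streams; the same budget spent on one truck with a 4,000 kbps uplink gets the main stream.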

The part of JT 1078 that bears hardest on the safety case is the alarm attachment. When an active-safety device fires, a forward collision warning or a fatigue alert, the bare alarm is not enough for a regulator. They want the seconds of video and the still images that show what happened. The standard defines how that evidence travels from the recorder to a server, on its own path, kept apart from the live stream.
The flow runs as a chain of defined messages, each one keyed to the alarm it belongs to. The alarm carries a type, built from the identifier of the peripheral that raised it and the kind of event, so a forward-collision warning and a fatigue alert arrive tagged as distinct things the platform can sort. The terminal reports the alarm and that an attachment exists. The platform acknowledges it and names the attachment server. The terminal connects there and sends the information for each file it is about to upload. Only after that handshake does it begin pushing the files themselves.

The attachments are ordinary formats chosen so any platform can open them: images as JPG or PNG, video as a raw .264 stream, audio as WAV, a text record as a bin file. Each file is announced with its name and size before it crosses, so the server knows what to expect and can tell a complete transfer from a truncated one, then confirms receipt as the files land.

The whole sequence is built so the platform can tie every uploaded file back to the exact event that caused it, with nothing orphaned and nothing guessed at. That binding, an alarm number that threads from the first message to the last file, is what turns a heap of clips into an auditable record, the same across every brand of terminal a province has to accept.

The server the files go to is not assumed either. The platform names it when it acknowledges the alarm, with an instruction numbered 0x9208 in the active-safety profiles built on this stack. A province can therefore route attachments to dedicated storage held apart from the live-streaming servers, and move that storage later without touching a single terminal. The terminal connects where it is told, for that one alarm, and closes the connection when the upload is done, so a vehicle firing a run of events opens a short-lived upload for each in turn; no channel stays open all shift.
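The ordering of that chain, and the alarm number that ties every file to its event, can be modelled as a small state object. The class, its method names and the extension table are hypothetical; apart from 0x9208, which the text names, message IDs are deliberately left out, and a real server speaks binary messages rather than method calls.

```python
ATTACHMENT_TYPES = {".jpg": "image", ".png": "image", ".264": "video",
                    ".wav": "audio", ".bin": "text"}


class AttachmentUpload:
    """Order-enforcing sketch of the alarm attachment handshake (hypothetical)."""

    def __init__(self, alarm_number: str):
        self.alarm_number = alarm_number  # threads through every step
        self.server = None                # set when the platform acknowledges (0x9208)
        self.files = {}                   # name -> [declared size, bytes received]

    def acknowledge(self, server: str) -> None:
        self.server = server

    def announce(self, name: str, size: int) -> None:
        if self.server is None:
            raise RuntimeError("no acknowledgement yet: nowhere to upload to")
        ext = name[name.rfind("."):].lower()
        if ext not in ATTACHMENT_TYPES:
            raise ValueError(f"unsupported attachment type: {name}")
        self.files[name] = [size, 0]

    def send(self, name: str, n_bytes: int) -> None:
        if name not in self.files:
            raise RuntimeError(f"{name} was never announced")
        self.files[name][1] += n_bytes

    def complete(self) -> dict:
        """Per file: (alarm number, arrived whole?), so nothing is orphaned."""
        return {name: (self.alarm_number, got == size)
                for name, (size, got) in self.files.items()}
```

Refusing to announce before the acknowledgement, and refusing data for an unannounced file, is what keeps every byte on the server attributable to one alarm.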
The size declared up front for every file lets the server catch a truncated transfer and ask for the rest, so a half-uploaded clip is never quietly filed as the whole of the evidence. The result is a trail a regulator can trust without knowing or caring which company built the box that produced it. For a province signing off a fleet of mixed makes, that brand-blind uniformity is the whole reason the upload was written into a shared standard at all.
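The check a declared size makes possible is simple arithmetic over byte ranges. This sketch assumes the server records each received piece as an (offset, length) pair lying within the file; how the protocol actually reports the gaps back to the terminal is not modelled, and the function name is hypothetical.

```python
def missing_ranges(declared_size: int,
                   received: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the [start, end) byte ranges not yet received.
    `received` holds (offset, length) pairs in any order, possibly overlapping."""
    gaps = []
    cursor = 0  # end of the contiguous prefix covered so far
    for offset, length in sorted(received):
        if offset > cursor:
            gaps.append((cursor, offset))  # a hole before this piece
        cursor = max(cursor, offset + length)
    if cursor < declared_size:
        gaps.append((cursor, declared_size))  # the file was cut short
    return gaps
```

An empty result is the only condition under which the file can be filed as complete evidence.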
The attachment travels to its own server for a plain reason. The files are large. A fleet of thousands of vehicles can raise alarms in bursts when the traffic turns bad, so sending that weight down the same path as the live streams would overload both at once. A separate channel lets the evidence upload in the background, at its own pace, the live wall staying responsive for the operators relying on it. The standard also lets a stalled upload resume from where it stopped when a cellular link breaks mid-file, which on a moving vehicle happens often; without it every dropped bar would mean starting a large file from the top, with many never finishing.
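On the terminal side, resuming means keeping one number, the offset the server last confirmed, across reconnects. The `send_chunk` callable is a hypothetical stand-in for the transport: it is assumed to return once the server confirms a chunk and to raise `ConnectionError` when the link drops. Chunk size and retry limit are illustrative.

```python
def upload_with_resume(data: bytes, send_chunk, chunk_size: int = 4096,
                       max_attempts: int = 10) -> int:
    """Send `data` in chunks, resuming from the last confirmed offset after
    each drop. Returns the number of connection attempts used."""
    confirmed = 0  # bytes the server has acknowledged so far
    attempts = 0
    while confirmed < len(data):
        attempts += 1
        if attempts > max_attempts:
            raise TimeoutError(f"gave up at offset {confirmed}")
        try:
            while confirmed < len(data):
                chunk = data[confirmed:confirmed + chunk_size]
                send_chunk(confirmed, chunk)  # returns only once confirmed
                confirmed += len(chunk)
        except ConnectionError:
            continue  # link dropped: reconnect and resume from `confirmed`
    return attempts
```

Because `confirmed` only advances after a chunk is acknowledged, a drop never costs more than the chunk in flight.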
This mechanism is what turns a safety alarm into something a regulator can check. An alert with no clip behind it cannot be checked. The attachment flow exists to carry the proof. By fixing how the clip is announced, uploaded and confirmed, JT 1078 makes the active-safety record verifiable. It also makes that record uniform across suppliers, since a province cannot write a different evidence handler for every make in a fleet of tens of thousands.
The clip itself comes from the recorder’s own storage, pulled for the window around the trigger. So the terminal has to be recording all its cameras continuously, holding the recent past, ready to surrender the seconds an alarm points back at long after the moment has gone. The upload reads from that stored buffer, which is why a vehicle out of signal still has the evidence waiting on the card when the link returns.
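Continuous recording with overwrite behaves like a ring buffer, and pulling the alarm window is a filter over what is still held. In this sketch one entry per second stands in for the card, a deliberate simplification; the class name and parameters are hypothetical.

```python
from collections import deque


class LoopRecorder:
    """Continuous recording that overwrites the oldest footage when full."""

    def __init__(self, capacity_seconds: int):
        # A bounded deque drops its oldest entry when a new one is appended.
        self.buffer = deque(maxlen=capacity_seconds)

    def record(self, t: int, frame: bytes) -> None:
        self.buffer.append((t, frame))

    def window(self, trigger: int, before: int,
               after: int) -> list[tuple[int, bytes]]:
        """Return what is still held for [trigger - before, trigger + after]."""
        lo, hi = trigger - before, trigger + after
        return [(t, f) for t, f in self.buffer if lo <= t <= hi]
```

An alarm whose window has already rolled off the buffer returns nothing, which is the same failure a real card shows when an upload waits too long.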
For any of this to work across vendors, the bytes have to be agreed. JT 1078 names the codecs a compliant device is expected to use, so a platform decoder is never handed a stream it has no way to read. The packing is fixed at the same level of detail: each media packet rides a frame whose header carries the terminal’s SIM identity, the logical channel and a timestamp, so a platform can sort interleaved streams arriving from thousands of vehicles without guessing which packet belongs to whom. Video runs as H.264 or the newer H.265, the latter cutting the bandwidth a busy fleet pushes over the air. Audio is carried in the common telephony and broadcast codecs, the G.711 and G.726 families along with AAC and AMR, so a platform can play the cab audio without a codec hunt.
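Sorting interleaved streams by the identity carried in each header can be sketched as grouping on (SIM, channel) and ordering by timestamp. The `MediaPacket` type is a simplified, hypothetical stand-in: the real frame header carries more fields in a fixed binary layout that this sketch does not reproduce.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaPacket:
    sim: str           # terminal SIM identity from the frame header
    channel: int       # logical camera channel
    timestamp_ms: int
    payload: bytes


def demux(packets):
    """Group interleaved packets into one time-ordered stream per (SIM, channel)."""
    streams = defaultdict(list)
    for p in packets:
        streams[(p.sim, p.channel)].append(p)
    for stream in streams.values():
        stream.sort(key=lambda p: p.timestamp_ms)
    return dict(streams)
```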
The standard also fixes how the audio and video are packetised for transport, the frame markers and the timestamps that let a player rebuild the stream in order. That packaging, more than the codec names, is the part a terminal maker has to get exactly right, because a platform that meets a frame it cannot parse drops the feed even when the underlying codec is one it knows. The codec list says what the media is encoded as; the packetisation says how it is framed for transport; both have to match for the platform to play the stream.
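Frame markers let a receiver rebuild a frame split across several packets, and refuse a frame with a missing piece. The marker names and values below are hypothetical placeholders for the sub-packet flag a real header carries; sequence-number wraparound is ignored for brevity.

```python
from enum import Enum


class Part(Enum):
    ATOMIC = 0  # whole frame in one packet
    FIRST = 1
    MIDDLE = 2
    LAST = 3


def reassemble(packets):
    """Rebuild frames from (sequence, part, payload) tuples. A frame with a
    missing or out-of-place piece is dropped rather than guessed at."""
    frames, current, expect_seq = [], None, None
    for seq, part, payload in sorted(packets, key=lambda p: p[0]):
        if current is not None and seq != expect_seq:
            current = None  # gap in the sequence: abandon the partial frame
        if part is Part.ATOMIC:
            current = None
            frames.append(payload)
        elif part is Part.FIRST:
            current = bytearray(payload)
        elif current is None:
            continue  # MIDDLE or LAST without its FIRST: unusable
        elif part is Part.MIDDLE:
            current += payload
        else:  # Part.LAST closes the frame
            current += payload
            frames.append(bytes(current))
            current = None
        expect_seq = seq + 1
    return frames
```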
The move from H.264 to H.265 is the codec change a fleet feels. H.265 carries the same picture in markedly less bandwidth, which across thousands of vehicles streaming over cellular is real money saved and a real load lifted off the network. The catch is that both ends have to support it. A terminal encoding H.265 to a platform that decodes only H.264 has produced a stream nobody at the centre can read. So the codec list in the standard is a floor both sides meet on. The newer codec spreads through a fleet only as fast as the platforms behind it learn to decode it.
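The rule that both ends must support a codec reduces to picking the most preferred codec in the intersection of the two capability sets. How a real terminal and platform learn each other's capabilities is not modelled here; the function and the string labels are assumptions.

```python
def pick_video_codec(terminal_encodes: set[str], platform_decodes: set[str]) -> str:
    """Prefer H.265 for its bandwidth saving, but only if both ends support it."""
    for codec in ("H.265", "H.264"):  # order of preference
        if codec in terminal_encodes and codec in platform_decodes:
            return codec
    raise ValueError("no video codec both ends support")
```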
Beyond the live view, a platform can reach into footage already on the recorder. JT 1078 defines how it queries what the terminal holds, then asks for a stretch of it by channel and time. The recorder streams the stored video back the way it streams a live feed, with the platform able to seek through it and control the playback; the request that opens it, numbered 0x9201, names the server to stream to and the time span wanted. For footage that has to be kept beyond a single viewing, the standard also covers pulling it down to the platform as a complete file.
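The query-then-request pattern can be sketched as a filter over the terminal's recording index: find the stored files on a channel that overlap the requested span, then ask for that span. The `Recording` index shape is hypothetical, and the binary layout of the query and of the 0x9201 request is not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Recording:
    channel: int
    start: datetime
    end: datetime


def find_recordings(index: list[Recording], channel: int,
                    start: datetime, end: datetime) -> list[Recording]:
    """List stored recordings on `channel` that overlap [start, end), oldest first."""
    return sorted(
        (r for r in index if r.channel == channel and r.start < end and r.end > start),
        key=lambda r: r.start,
    )
```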
This is what lets an investigator recover the half hour around an incident weeks later, from a vehicle that has driven thousands of kilometres since. The footage lives on the card in the cab until newer recording overwrites it, so the protocol is the mechanism that gets the right stretch off the vehicle before that happens. It is also the reason a fleet sizes its storage against how long an investigation might take to begin.
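Sizing storage against an investigation's lead time is a short calculation. The sketch assumes a constant bitrate per channel, decimal units and no filesystem overhead; the figures in the usage line are illustrative, not recommendations.

```python
def retention_days(card_gb: float, channels: int, kbps_per_channel: float,
                   hours_per_day: float) -> float:
    """Days of footage a card holds before loop recording overwrites it.
    Assumes 1 GB = 10**9 bytes, 1 kbps = 1000 bit/s, no filesystem overhead."""
    bytes_per_second = channels * kbps_per_channel * 1000 / 8
    bytes_per_day = bytes_per_second * 3600 * hours_per_day
    return card_gb * 1e9 / bytes_per_day
```

With a 256 GB card, four channels at 1024 kbps and ten recording hours a day, `retention_days(256, 4, 1024, 10)` comes to roughly 13.9 days before the oldest footage is overwritten.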
Everything in JT 1078 has to survive a link that comes and goes, because a commercial vehicle spends its life moving between strong coverage and none. A truck drops into a tunnel or parks under a steel roof at the depot, and the cellular link it depends on thins to nothing and back many times a shift. The protocol is shaped throughout by that fact. Much of what separates a usable terminal from a frustrating one shows up here, where no feature list reveals it.
The sub stream is the first answer, letting a platform hold a watchable picture on a link too thin for full quality. The resumable attachment upload is the second, so the evidence uploads in pieces across a broken connection, resuming from the last confirmed piece after each drop. Recording is local first and always, written to the card the instant it is captured, so a lapse in signal costs the live view and the upload timing but never the footage itself. The vehicle keeps its own complete record on the card regardless of whether the centre can see it in the moment.
None of it appears in a glance at a spec sheet. It is the difference between a system that loses an event because the truck was in a dead spot when it happened and one that delivers the clip an hour later when the link returns. A buyer who has run a fleet through real coverage knows to ask how a terminal behaves at the edge of signal, because that edge is where much of its working life is spent.
Compliance on paper and a clean connection to a live provincial platform are two different milestones. The gap between them is where many terminals stumble. A platform built by one company and a terminal built by another are reading the same standard, but a standard still leaves room for interpretation. The two have to be brought into agreement on a working link before a vehicle counts as monitored.
The docking is the process of proving that link. The terminal registers over 808, the platform recognises it, the live view comes up, an alarm is fired and its attachment lands intact on the platform’s server, the playback path is checked. Each step can fail in a small way that a paper test never catches: a field interpreted one way at the terminal and another at the platform, a timeout set too short for a slow link, an attachment that uploads and then lands unparsed because the file information was framed in a way the platform did not expect. None of these is a failure of the standard. They are the friction of two independent implementations meeting for the first time.
A real docking runs a checklist a paper claim never faces. Registration has to hold across a signal drop; succeeding once on a bench proves little. The live view has to come up on the first click, then recover on its own after the link returns. The hardest line is the alarm: one raised on the road has to land on the platform with its attachment whole and parseable, the end-to-end path the whole safety case rests on. Playback has to reach footage a week back cleanly. Every one of these is a place two readings of the standard can quietly diverge, settled only on the live platform before a vehicle is signed off as monitored.
This is the work that turns a compliant box into a monitored vehicle. It is why an operator does not treat a JT 1078 line on a datasheet as the finish. The question to ask is whether the terminal has been docked, on a real platform, with the alarm attachment flow tested end to end. A terminal that has done that on the platform a fleet reports to in service counts for more than one with a longer feature list that has never been brought up against it.
Knowing what the standard ignores is as useful as knowing what it sets. It does not specify the camera sensor, the lens, the night-vision behaviour or the housing. It does not test whether the device survives heat, vibration or a wide supply-voltage swing. It does not cover the active-safety algorithms that decide an alarm should fire in the first place, only how the resulting clip is carried. All of those live in other standards, or in the buyer’s own testing, well outside the protocol.
It helps to name the neighbours the protocol defers to. A terminal that survives a summer cab owes that to the automotive environmental spec it was built to; speaking the video protocol contributes nothing there. A camera that reads a plate at dusk does so on the strength of its sensor, on a scale JT 1078 never mentions. The alarm that set off an upload was judged real or false by an active-safety algorithm whose output the protocol only carries. Each of those belongs to a standard of its own. A buyer who maps every question to the standard that owns it stops asking a transport protocol to vouch for the camera, the housing and the algorithm all at once.
So a purchase that leans on JT 1078 compliance alone has ticked one box and skipped several. The protocol confirms the device will talk to the platform. It says nothing about whether the footage is clear enough to act on when it arrives, whether the box will survive a summer cab, or whether the alarm that triggered the upload was a real event or a false one. Those are the questions that decide whether the watching protects anyone. The protocol is silent on every one of them.
For a fleet in China, JT 1078 is not optional. Long-distance coaches, chartered tourist coaches and hazardous-goods vehicles have to report to a monitoring platform; the platform speaks this protocol, so the terminal must speak it too. That makes compliance the floor of any shortlist, the line checked before a conversation about anything else can begin. A device that fails it is out before its camera is even discussed.
The right reading is to treat that floor as a floor. Confirm the terminal speaks JT 1078 cleanly, the alarm attachment flow included and docked on the real platform, since the safety case rests on it. Then judge the camera, the storage, the build and the behaviour at the edge of signal on their own evidence, with the protocol set aside. JT 1078 gets the device onto the network and keeps every brand legible to one platform. What the device does once it is there is a separate question, the one a buyer spends more time on.
JT 1078 compliance is not a mark of image quality. JT 1078 is a video communication protocol that defines how a terminal exchanges live, recorded and alarm video with a platform. It says nothing about sensor resolution, night-vision performance or build quality. A device can be fully compliant and still produce footage too soft to use.
JT 1078 covers the exchanges between terminal and platform: live streaming of a chosen camera, playback or download of stored footage, and the upload of the video and images attached to a safety alarm. It also names the codecs a device uses, H.264 or H.265 for video plus the common audio codecs, and fixes how the stream is packetised and how channels are numbered.
JT/T 1076 covers the on-board terminal and JT/T 1077 the platform; JT 1078 is the protocol between them, all riding on JT/T 808 for registration, position and basic messages. A complete terminal sits on the 808 and 1078 stack and is built to 1076; the platform it reports to is built to 1077.
Alarm attachments go to their own server because they are large files, and a big fleet can raise them in bursts, dozens of vehicles alarming through one morning rush. Sending that weight down the same path as the live streams would overload both. A dedicated attachment server lets the evidence upload run in the background, with the transfer resuming after a dropped link from where it stopped, every file tied to the alarm number that raised it.
A JT 1078 compliance claim is not enough on its own. Compliance is the entry condition; the terminal still has to be docked on the actual provincial platform, with the live view, alarm attachment and playback all proven on a working link. Two compliant implementations can still need adjustment to talk cleanly, so a terminal docked on the platform a fleet uses counts for more than a paper claim.
JT 1078 does not decide when an alarm fires. It defines how the video tied to an alarm is carried; the detection belongs to the active-safety devices and their own standards. JT 1078 carries the evidence once the event has already happened.