MZ-700 demos

Is there such a thing as an MZ demoscene?
hlide
Posts: 524
Joined: Thu Jan 25, 2018 9:31 pm

Re: MZ-700 demos

Post by hlide »

I have good news:

Code: Select all

	} else if(mem_bank & MEM_BANK_MON_H && (0xd000 <= addr && addr <= 0xdfff)) {
		if(!blank_vram) {
			*wait = BLANK_S - d_cpu->get_insn_clock() - 4 - get_passed_clock_since_vline();
			//d_cpu->write_signal(SIG_CPU_BUSREQ, 1, 1);
			blank_vram = true;
		}
	}
I added a method that returns the cycle count of the current instruction (d_cpu->get_insn_clock()). Now I can count the remaining cycles until BLANK=1 and stall the current instruction with a known number of /WAIT states, WITHOUT using /BUSREQ!

I had to add that -4 to reach the 83 limit (yours being 82 :/) with `wt`, and I still need to find out why.

And `emutest` is around 16ms!

Now I must check what values d_cpu->get_insn_clock() returns in `emutest` and `wt`, and determine why the -4 is needed.

There is also the case of executing code in VRAM, which may need to be checked. And what happens with an instruction that accesses VRAM twice (opcode fetch plus a read/write in VRAM)?
hlide
Posts: 524
Joined: Thu Jan 25, 2018 9:31 pm

Re: MZ-700 demos

Post by hlide »

OK, there was an error in d_cpu->get_insn_clock(), which I fixed.

Code: Select all

		if(!blank_vram) {
			int delta = d_cpu->get_insn_clock();
			*wait = BLANK_S + delta - 4 - get_passed_clock_since_vline();
			blank_vram = true;
		}
It is more logical to ADD delta and then SUBTRACT 4 because of the opcode fetch, considering what I tested statically.

There is still an issue: what happens when executing code in VRAM? In that case I would expect the memory access to happen during the opcode fetch, that is, almost immediately.

So I tried to dig into the T-states of instructions and found out they are more complex than expected. :(

Code: Select all

PUSH BC, see how the 5-3-3 clocks are distributed:

#003H T1  AB:000 DB:--  M1                                 |
#004H T2  AB:000 DB:C5  M1      MREQ RD                    | Opcode read from 000 -> C5
#005H T3  AB:000 DB:--     RFSH                            |
#006H T4  AB:000 DB:--     RFSH MREQ                       | Refresh address  000
#007H T5  AB:000 DB:--                                     |
-----------------------------------------------------------+
#008H T6  AB:0FF DB:--                                     |
#009H T7  AB:0FF DB:FF          MREQ                       | --> First try to access VRAM
#010H T8  AB:0FF DB:FF          MREQ    WR                 | Memory write to  0FF <- FF 
-----------------------------------------------------------+
#011H T9  AB:0FE DB:--                                     | 
#012H T10 AB:0FE DB:FF          MREQ                       |
#013H T11 AB:0FE DB:FF          MREQ    WR                 | Memory write to  0FE <- FF

Code: Select all

Repeated LDIR,  see how the 4-4-3-5-5 clocks are distributed:

#054H T1  AB:009 DB:--  M1                                 |
#055H T2  AB:009 DB:ED  M1      MREQ RD                    | Opcode read from 009 -> ED
#056H T3  AB:005 DB:--     RFSH                            |
#057H T4  AB:005 DB:--     RFSH MREQ                       | Refresh address  005
-----------------------------------------------------------+
#058H T1  AB:00A DB:--  M1                                 |
#059H T2  AB:00A DB:B0  M1      MREQ RD                    | Opcode read from 00A -> B0
#060H T3  AB:006 DB:--     RFSH                            |
#061H T4  AB:006 DB:--     RFSH MREQ                       | Refresh address  006
-----------------------------------------------------------+
#062H T5  AB:031 DB:--                                     |
#063H T6  AB:031 DB:00          MREQ                       | --> First try to access VRAM
#064H T7  AB:031 DB:00          MREQ RD                    | Memory read from 031 -> 00
-----------------------------------------------------------+
#065H T8  AB:041 DB:--                                     |
#066H T9  AB:041 DB:00          MREQ                       |
#067H T10 AB:041 DB:00          MREQ    WR                 | Memory write to  041 <- 00
#068H T11 AB:041 DB:00                                     |
#069H T12 AB:041 DB:00                                     |
-----------------------------------------------------------+
#070H T13 AB:041 DB:--                                     |
#071H T14 AB:041 DB:--                                     |
#072H T15 AB:041 DB:--                                     |
#073H T16 AB:041 DB:--                                     |
#074H T17 AB:041 DB:--                                     |
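The two traces above suggest a simple rule: the first memory access is attempted at T2 of the first read/write M-cycle, so its position is the sum of the earlier M-cycle lengths plus 2. Here is a minimal sketch of that rule. The helper name `first_access_tstate` and its parameters are hypothetical and not part of EmuZ-700.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of EmuZ-700): given the length in T-states of
// each M-cycle of an instruction and the index of the first M-cycle that reads
// or writes memory, return the 1-based T-state, counted from the start of the
// instruction, at which T2 of that M-cycle falls.
int first_access_tstate(const std::vector<int>& mcycles, std::size_t access_cycle)
{
	int elapsed = 0;
	for (std::size_t i = 0; i < access_cycle && i < mcycles.size(); ++i) {
		elapsed += mcycles[i];  // T-states spent in the earlier M-cycles
	}
	return elapsed + 2;  // T2 of the memory M-cycle, where /WAIT is sampled
}
```

For PUSH BC (5-3-3) with the first write in the second M-cycle, this gives 5 + 2 = 7, matching the T7 marked in the trace. For the repeated LDIR (4-4-3-5-5), counting continuously from the first opcode byte (the trace restarts its T numbering at the second fetch), it gives 4 + 4 + 2 = 10.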
Sdw
Posts: 12
Joined: Wed Jul 08, 2020 10:27 am

Re: MZ-700 demos

Post by Sdw »

Running code from VRAM sounds like a very theoretical case to me. No sane person would put code in the 4 KB where it runs slowly when there are 64 KB where it runs fast, so I'd say that can be ignored for now!

Nice to see that you still haven't given up on this, although I agree with you that it looks very hard to solve properly given the emulator's current code base.

I finally got around to looking through the source a bit. I'm nowhere near a full understanding yet, but at least I think I have identified the interesting parts, and that seems to be right, because the code you posted above works with those exact lines of code!

Regarding the 82/83 cycle thing on my Sharp, I'm certain it's not a display issue: the flicker I start seeing at 83 cycles is exactly the "outside VRAM-access-window" indication, the same as at 84+ cycles.
hlide
Posts: 524
Joined: Thu Jan 25, 2018 9:31 pm

Re: MZ-700 demos

Post by hlide »

For me it doesn't flicker. It is stable with or without those alternate rows; I don't get "some misses, some hits". For your information, I'm taking the signals from the internal video connector, which are pure TTL, and feeding them to my logic analyzer (100 MHz and 50 MHz). But I reckon the edges of the CPU clock are mostly not aligned with the video sync and blank edges. I also use those TTL signals to feed a SCART input, so without the RF box.

One thing: I put a 20 MHz CMOS Z80 in the socket instead of the original NMOS Z80. I'm not sure whether that explains the difference here.
hlide
Posts: 524
Joined: Thu Jan 25, 2018 9:31 pm

Re: MZ-700 demos

Post by hlide »

I finally got a file called Timings.xlsx which describes the M-cycles and T-states of the instructions more precisely, including the undocumented ones. But man, it will be very hard to take them into account without some big changes!
S_U_C
Posts: 38
Joined: Sun Feb 17, 2019 6:41 pm

Re: MZ-700 demos

Post by S_U_C »

Note: just to make things more interesting:
On the MZ-80K and MZ-80A, opcode timings are the same in ROM and RAM.
But the MZ-700 runs opcodes faster in RAM than in ROM, because the LSI chip controls the /WAIT line of the CPU and adds 1 extra T-cycle to opcodes in ROM. This speeds up the tape read/write routines by about 7% when they are copied to RAM.
Or should I say, ROM code runs slower than RAM code.

Port access may also add an extra T-cycle to give external equipment extra time to make the data available (triggered by /IORQ?).
hlide
Posts: 524
Joined: Thu Jan 25, 2018 9:31 pm

Re: MZ-700 demos

Post by hlide »

@S_U_C

Any access to $0000-$0FFF through the /CS0 signal (from the LSI) is emulated in EmuZ-700 by adding one /WAIT state, so it should be effective. I checked that when emulating my 512KB ROMDISK (now called IPL512) on EmuZ-700.

I'll check I/O port access, but it should already be handled, as I believe IN/OUT always have this extra cycle in their IORead/IOWrite machine cycles. Yep, it is already counted in the total T-states of IN/OUT by default, so there is no need to tweak /WAIT states unless a device needs specific extra /WAIT states.
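To make the ROM penalty concrete, here is a rough sketch of the rule described above: one extra /WAIT state for any access in $0000-$0FFF while the ROM is selected. The function names and the `rom_mapped` flag are hypothetical and not taken from the EmuZ-700 source, and the numbers in the note below are made up purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the monitor ROM sits at $0000-$0FFF (selected through /CS0) and
// every access to it costs one extra /WAIT state. `rom_mapped` is a
// hypothetical flag standing in for the bank-switching state.
int rom_wait_states(std::uint16_t addr, bool rom_mapped)
{
	return (rom_mapped && addr <= 0x0FFF) ? 1 : 0;
}

// Ratio between the ROM and RAM run time of a code sequence that needs
// `base_tstates` T-states in RAM and makes `rom_accesses` accesses to ROM.
// Precondition: base_tstates > 0.
double rom_slowdown(int base_tstates, int rom_accesses)
{
	return static_cast<double>(base_tstates + rom_accesses) / base_tstates;
}
```

As an illustration only: a sequence averaging 14 T-states per ROM access would run 15/14, or about 7%, slower from ROM, which is in the same range as the tape routine figure quoted earlier in the thread.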
hlide
Posts: 524
Joined: Thu Jan 25, 2018 9:31 pm

Re: MZ-700 demos

Post by hlide »

OK, here is how I think we should compute the wait states for:

PUSH rr:

Code: Select all

T1  AB:000 DB:--  M1                                 |
T2  AB:000 DB:C5  M1      MREQ RD                    | Opcode read from 000 -> C5
T3  AB:000 DB:--     RFSH                            |
T4  AB:000 DB:--     RFSH MREQ                       | Refresh address  000
T5  AB:000 DB:--                                     |
-----------------------------------------------------+
T6  AB:0FF DB:--                                     |
T7  AB:0FF DB:FF          MREQ                       | --> First try to access VRAM
T8  AB:0FF DB:FF          MREQ    WR                 | Memory write to  0FF <- FF 
-----------------------------------------------------+
T9  AB:0FE DB:--                                     | 
T10 AB:0FE DB:FF          MREQ                       |
T11 AB:0FE DB:FF          MREQ    WR                 | Memory write to  0FE <- FF
Now, I had to compute BLANK_S + delta - 4 - get_passed_clock_since_vline() to get the necessary wait states to closely emulate what I get with my genuine MZ-700. delta - 4 is the key for PUSH rr: delta is 11, so we get BLANK_S + 7 as the adjusted event limit. Why 7? We know PUSH rr is partially executed until it tries to write to VRAM, and it happens to do so at T7, the 7th T-state. Note that T7 is in fact the T2 of a READ/WRITE machine cycle, and that's exactly where several /WAIT states may be inserted!

So what we need to know, for any instruction which may read or write VRAM, is the T-state at which that may happen.

From the Timings file, the machine cycles seem to be: [fMfetch][fMread][fMwrite][fIORead][fIOWrite][fSpecial]. The idea is to count the T-states of each M-cycle. If we hit an fMfetch, fMread or fMwrite in VRAM, we take the T-state count T accumulated before that M-cycle and compute BLANK_S + T + 1 - get_passed_clock_since_vline()!

The most difficult part is that the Z80 interpreter does not count T-states per M-cycle, so it is a lot of work with potential for errors.
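The PUSH rr arithmetic above can be checked with a small stand-alone sketch. The function names are hypothetical. `blank_s` stands for the emulator's BLANK_S constant and `passed` for whatever get_passed_clock_since_vline() returns, and the values used in the note below are invented, not the real ones.

```cpp
#include <cassert>

// Hypothetical stand-alone version of the expression discussed above.
//   blank_s       : the emulator's BLANK_S constant (real value not shown here)
//   access_tstate : T-state within the instruction at which VRAM is accessed
//   passed        : result of get_passed_clock_since_vline()
int vram_wait_states(int blank_s, int access_tstate, int passed)
{
	return blank_s + access_tstate - passed;
}

// PUSH rr takes 11 T-states in total (delta); the post uses delta - 4.
int push_access_tstate()
{
	const int delta = 11;
	return delta - 4;  // 7, i.e. T2 of the first write M-cycle (5 + 2)
}
```

With made-up values blank_s = 200 and passed = 150, `vram_wait_states(200, push_access_tstate(), 150)` evaluates to 57 wait states.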
hlide
Posts: 524
Joined: Thu Jan 25, 2018 9:31 pm

Re: MZ-700 demos

Post by hlide »

So I made a choice.

First, I decided to remove all references to the MZ-800 from the source. Unlike the MZ-1500, which still shares a lot of hardware similarity with the MZ-700 (in particular, both have the same LSI), the MZ-800 is a completely different beast when not in MZ-700 mode, and its LSI is totally different.

Second, I added some partial T-state tables to count the accumulated T-states per memory access step (up to 6 per instruction, including opcode fetches). For instance, alongside the cc_op table for the main instructions I have an mc_op table, as follows:

Code: Select all

static const uint8_t cc_op[0x100] = {
	 4,10, 7, 6, 4, 4, 7, 4, 4,11, 7, 6, 4, 4, 7, 4,
	 8,10, 7, 6, 4, 4, 7, 4,12,11, 7, 6, 4, 4, 7, 4,
	 7,10,16, 6, 4, 4, 7, 4, 7,11,16, 6, 4, 4, 7, 4,
	 7,10,13, 6,11,11,10, 4, 7,11,13, 6, 4, 4, 7, 4,
	 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
	 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
	 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
	 7, 7, 7, 7, 7, 7, 4, 7, 4, 4, 4, 4, 4, 4, 7, 4,
	 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
	 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
	 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
	 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
	 5,10,10,10,10,11, 7,11, 5,10,10, 0,10,17, 7,11,
	 5,10,10,11,10,11, 7,11, 5, 4,10,11,10, 0, 7,11,
	 5,10,10,19,10,11, 7,11, 5, 4,10, 4,10, 0, 7,11,
	 5,10,10, 4,10,11, 7,11, 5, 6,10, 4,10, 0, 7,11
};

// Memory cycles: [fMfetch][fMread/write]*
static const uint8_t mc_op[0x100][6] = {
	//   x0       x1             x2          x3             x4       x5       x6             x7       x8       x9             xA    xB             xC             xD    xE             xF
	{4     },{4,7,10},{4,7         },{6        },{4           },{4     },{4,7   },{4           },{4     },{4     },{4,7         },{6  },{4           },{4           },{4,7},{4           }, // 0x
	{5,8   },{4,7,10},{4,7         },{6        },{4           },{4     },{4,7   },{4           },{4,7   },{4     },{4,7         },{6  },{4           },{4           },{4,7},{4           }, // 1x
	{4,7   },{4,7,10},{4,7,10,13,16},{6        },{4           },{4     },{4,7   },{4           },{4,7   },{4     },{4,7,10,13,16},{6  },{4           },{4           },{4,7},{4           }, // 2x
	{4     },{4,7,10},{4,7,10,13   },{6        },{4,8,11      },{4,8,11},{4,7,10},{4           },{4,7   },{4     },{4,7,10,13   },{6  },{4           },{4           },{4,7},{4           }, // 3x
	{4     },{4     },{4           },{4        },{4           },{4     },{4,7   },{4           },{4     },{4     },{4           },{4  },{4           },{4           },{4,7},{4           }, // 4x
	{4     },{4     },{4           },{4        },{4           },{4     },{4,7   },{4           },{4     },{4     },{4           },{4  },{4           },{4           },{4,7},{4           }, // 5x
	{4     },{4     },{4           },{4        },{4           },{4     },{4,7   },{4           },{4     },{4     },{4           },{4  },{4           },{4           },{4,7},{4           }, // 6x
	{4,7   },{4,7   },{4,7         },{4,7      },{4,7         },{4,7   },{4     },{4,7         },{4     },{4     },{4           },{4  },{4           },{4           },{4,7},{4           }, // 7x
	{4     },{4     },{4           },{4        },{4           },{4     },{4,7   },{4           },{4     },{4     },{4           },{4  },{4           },{4           },{4,7},{4           }, // 8x
	{4     },{4     },{4           },{4        },{4           },{4     },{4,7   },{4           },{4     },{4     },{4           },{4  },{4           },{4           },{4,7},{4           }, // 9x
	{4     },{4     },{4           },{4        },{4           },{4     },{4,7   },{4           },{4     },{4     },{4           },{4  },{4           },{4           },{4,7},{4           }, // Ax
	{4     },{4     },{4           },{4        },{4           },{4     },{4,7   },{4           },{4     },{4     },{4           },{4  },{4           },{4           },{4,7},{4           }, // Bx
	{5,8,11},{4,7,10},{4,7,10      },{4,7,11   },{4,7,11,14,17},{5,8,11},{4,7   },{5,8,11,14,17},{5,8,11},{4,7,10},{4,7,10      },{4  },{4,7,11,14,17},{4,7,11,14,17},{4,7},{5,8,11,14,17}, // Cx
	{5,8,11},{4,7,10},{4,7,10      },{4,7      },{4,7,11,14,17},{5,8,11},{4,7   },{5,8,11,14,17},{5,8,11},{4     },{4,7,10      },{4,7},{4,7,11,14,17},{4           },{4,7},{5,8,11,14,17}, // Dx
	{5,8,11},{4,7,10},{4,7,10      },{4,7,11,14},{4,7,11,14,17},{5,8,11},{4,7   },{5,8,11,14,17},{5,8,11},{4     },{4,7,10      },{4  },{4,7,11,14,17},{4           },{4,7},{5,8,11,14,17}, // Ex
	{5,8,11},{4,7,10},{4,7,10      },{4        },{4,7,11,14,17},{5,8,11},{4,7   },{5,8,11,14,17},{5,8,11},{6     },{4,7,10      },{4  },{4,7,11,14,17},{4           },{4,7},{5,8,11,14,17}  // Fx
};
The "<----" marks indicate the additions.

Code: Select all

int Z80::get_tstates() // <---- new method for getting the necessary information to compute VRAM wait states
{
	return (mc_index < 0) ? int() : int((*mc_tstates)[mc_index]);
}
inline uint8_t Z80::FETCHOP()
{
	unsigned pctmp = PCD;
	PC++;
	R++;
	mc_tstates = 0; // <---- no accumulated T-states yet
	mc_index = -1; // <---- no accumulated T-states yet

	// consider m1 cycle wait
	UPDATE_EXTRA_EVENT(1);
	int wait;
	uint8_t val = d_mem->fetch_op(pctmp, &wait); // <---- one memory step, T-states = 0!
	icount -= wait;
	UPDATE_EXTRA_EVENT(3 + wait);
	++mc_index; // <---- now that we have an opcode, point the index at its accumulated T-states
	return val;
}
void Z80::OP(uint8_t code)
{
	prevpc = PC - 1;
	icount -= cc_op[code];
	mc_tstates = mc_op + code; // <---- set our accumulated T-states table here !
	...
}
inline void Z80::WM8(uint32_t addr, uint8_t val)
{
	UPDATE_EXTRA_EVENT(1);
#ifdef Z80_MEMORY_WAIT // <---- always true for MZ-700
	int wait;
	d_mem->write_data8w(addr, val, &wait); // <---- here MZ-700 memory will intercept VRAM access!
	icount -= wait;
	UPDATE_EXTRA_EVENT(2 + wait);
#else
	d_mem->write_data8(addr, val);
	UPDATE_EXTRA_EVENT(2);
#endif
	++mc_index; // <---- next memory step!
}
void MEMORY::write_data8w(uint32_t addr, uint32_t data, int* wait)
{
	*wait = 0;
...
	} else if(mem_bank & MEM_BANK_MON_H && (0xd000 <= addr && addr <= 0xdfff)) {
		if(!blank_vram) {
			int delta = d_cpu->get_tstates() + 2; // <---- T-states of previous M-cycle + T2
			*wait = BLANK_S + delta - get_passed_clock_since_vline(); // <---- adjust the wait states according to the instruction cycles and BLANK signal
			blank_vram = true;
		}
	}
	write_data8(addr, data);
}
And so on. I still need to finish the mc tables for the hellish xy and ixcb (IX/IY relative opcodes).

The VRAM interception with the "right" T-states now handles cases such as:
- an opcode fetch in VRAM at any position in the instruction,
- a memory read/write in VRAM at any position in the instruction.

So far I am able to reproduce the same result as my genuine MZ-700 with `wt` and `emutest` (~16ms).

I don't think it is perfect, because the emulator performs the read/write BEFORE waiting the necessary cycles, while the genuine MZ-700 does so AFTERWARD.

Note: the case of an instruction starting with BLANK=1 and ending with BLANK=0 may not be handled correctly here.
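To illustrate how the accumulated T-state rows are meant to be walked, here is a reduced sketch. The two rows are copied from the mc_op table above. The `MemStepCounter` struct is a simplified, hypothetical stand-in for the mc_tstates/mc_index pair in the Z80 class, and it assumes that operand reads advance the index in the same way as the write shown in WM8.

```cpp
#include <cassert>
#include <cstdint>

// Two rows copied from the mc_op table: accumulated T-states per memory step.
static const std::uint8_t kPushBC[6] = {5, 8, 11};  // 0xC5 PUSH BC
static const std::uint8_t kLdHlN[6]  = {4, 7, 10};  // 0x36 LD (HL),n

// Simplified stand-in for the mc_tstates / mc_index members (hypothetical).
struct MemStepCounter {
	const std::uint8_t* tstates = nullptr;  // row for the current opcode
	int index = -1;                         // -1 while the opcode is being fetched

	void begin_fetch() { tstates = nullptr; index = -1; }
	void end_fetch(const std::uint8_t* row) { tstates = row; index = 0; }
	void next_step() { ++index; }  // called after each memory read/write

	// Mirrors get_tstates(): 0 during the opcode fetch, otherwise the
	// T-states accumulated before the current memory step.
	int get_tstates() const { return (index < 0) ? 0 : tstates[index]; }
};
```

For PUSH BC the first write sees get_tstates() == 5, so delta = 5 + 2 = 7, and the second write sees 8, so delta = 10. For LD (HL),n the write comes after the operand read, giving 7 + 2 = 9.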
Sdw
Posts: 12
Joined: Wed Jul 08, 2020 10:27 am

Re: MZ-700 demos

Post by Sdw »

Looks like you are making some great progress, and even if it might not be cycle perfect, it will be a big step up in correctness compared to current emulators!