Feature #85

open

720p support/1080i support

Added by foft almost 5 years ago. Updated almost 4 years ago.

Status:
In Progress
Priority:
Normal
Assignee:
-
Start date:
12/20/2019
Due date:
% Done:

0%

Estimated time:

Description

Implement 720p60 and 720p50 support, with suitable scaling


Files

723978_FULLTEXT01.pdf (8.72 MB), Scaling algo paper, foft, 12/20/2019 09:50 PM
problem_not_shared.bmp (1.53 MB), foft, 03/01/2020 10:30 PM
problem.bmp (2.12 MB), foft, 03/01/2020 10:30 PM

Actions #1

Updated by foft almost 5 years ago

  • Status changed from New to In Progress

HDMI syncs with a 74.25MHz pixel clock. Cyclone V seems surprisingly happy with 742.5Mbit/s GPIO outputs via DDR registers.
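
As a rough check of the numbers above: TMDS sends 10 bits per pixel on each lane, and a DDR register outputs two bits per clock. The helper name below is made up for illustration and is not from the project.

```python
def tmds_rates(pixel_clock_mhz):
    """Return (serial bit rate in Mbit/s, DDR output clock in MHz) for one TMDS lane."""
    bit_rate = pixel_clock_mhz * 10  # TMDS sends 10 bits per pixel on each lane
    ddr_clock = bit_rate / 2         # a DDR register outputs 2 bits per clock cycle
    return bit_rate, ddr_clock


for pclk in (27.0, 74.25):
    bit_rate, ddr_clock = tmds_rates(pclk)
    print(f"{pclk} MHz pixel clock: {bit_rate} Mbit/s per lane, {ddr_clock} MHz DDR clock")
```

So the 74.25MHz modes need roughly 742.5Mbit/s per lane, clocked at 371.25MHz through the DDR registers.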

Actions #2

Updated by foft almost 5 years ago

Experimented with filter based scaling. Going with an area based scaler for now instead.

Actions #3

Updated by foft almost 5 years ago

Great paper on scaling algos

Actions #4

Updated by foft almost 5 years ago

I wrote some Octave scripts to try these out: one set plugs in the MiSTer filters, the other does area scaling with a bunch of different optimization options.

Actions #5

Updated by foft almost 5 years ago

All the methods seem to need a bunch of multipliers, but fortunately we have some on Cyclone V. 25 DSP blocks on A2 and 66 DSP blocks on A4.
Each DSP block can be 3 9x9, 2 18x18 or 1 27x27 multiplier.
Only using a few already so they are mostly free.
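
A quick tally of the multiplier budget implied by those figures. The block counts and modes are the ones quoted above; the function name and the `used_blocks` parameter are illustrative assumptions.

```python
# Multipliers available per DSP block in each mode, as listed above.
MULTS_PER_DSP = {"9x9": 3, "18x18": 2, "27x27": 1}
# DSP block counts for the two FPGA sizes mentioned above.
DSP_BLOCKS = {"A2": 25, "A4": 66}


def multiplier_budget(device, used_blocks=0):
    """Multipliers of each size left if every free DSP block used that one mode."""
    free = DSP_BLOCKS[device] - used_blocks
    return {mode: free * per_block for mode, per_block in MULTS_PER_DSP.items()}


for device in DSP_BLOCKS:
    print(device, multiplier_budget(device))
```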

Actions #6

Updated by foft almost 5 years ago

I guess 1080i is also possible.
Want to note that for 720p60 I need to adjust the frame rate again. Was using 59.94 for the 27MHz option and here it's 60.

I wonder if I can get the fractional pll to give an accurate enough 27MHz and 74.25MHz at the same time?
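
The 59.94 vs 60 difference falls out of the pixel clock divided by the total frame size. The sketch below assumes the standard CEA-861 total line and frame sizes, which are not stated in this ticket.

```python
# (pixel clock in Hz, total pixels per line, total lines per frame)
# Totals are the standard CEA-861 values, assumed here rather than taken from the ticket.
MODES = {
    "480p": (27_000_000, 858, 525),
    "576p": (27_000_000, 864, 625),
    "720p60": (74_250_000, 1650, 750),
    "720p50": (74_250_000, 1980, 750),
}


def refresh_hz(pixel_clock_hz, htotal, vtotal):
    """Frame rate is the pixel clock divided by the pixels in one full frame."""
    return pixel_clock_hz / (htotal * vtotal)


for name, timing in MODES.items():
    print(f"{name}: {refresh_hz(*timing):.3f} Hz")
```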

Actions #7

Updated by foft almost 5 years ago

I implemented winscale and am debugging it!

It seems to work except I have some vertical stripes. I think this is because it's actually down-scaling on the x axis (4x 1/2 colour clock width to 1280). While the attached paper says I can downscale by up to 50%, I think this is incorrect, since a target pixel can overlap 3 source pixels as soon as it is even slightly wider than a source pixel.
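
A small sketch of the overlap argument, not the winscale implementation: it computes how much of each source pixel a single target pixel covers in one dimension. The function names and the 11/8 example ratio are illustrative assumptions. Whether a given target pixel actually reaches 3 source pixels depends on how it lines up with the source grid.

```python
import math
from fractions import Fraction


def area_weights(target_index, target_width):
    """Coverage fraction of each source pixel under one target pixel (1-D).

    target_width is the target pixel width measured in source pixels,
    so a value above 1 means down-scaling.
    """
    start = target_index * target_width
    end = start + target_width
    weights = {}
    for src in range(math.floor(start), math.ceil(end)):
        overlap = min(end, src + 1) - max(start, src)
        if overlap > 0:
            weights[src] = overlap / target_width
    return weights


def max_contributors(target_width, n_targets=1000):
    """Largest number of source pixels feeding any one of the first n_targets pixels."""
    return max(len(area_weights(i, target_width)) for i in range(n_targets))


print(max_contributors(Fraction(4, 5)))   # up-scaling
print(max_contributors(Fraction(11, 8)))  # mild down-scaling
print(area_weights(2, Fraction(11, 8)))
```

When up-scaling, a target pixel never touches more than 2 source pixels, so a 2-input blend per axis is enough. Once the target pixel is wider than a source pixel, some alignments need a third input.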

Actions #8

Updated by foft almost 5 years ago

  • Subject changed from 720p support to 720p support/1080i support

Actions #9

Updated by foft almost 5 years ago

The scaler is looking very nice now, with a few border/location bugs to fix.

Looking at the plls, need to have: 27MHz, 135MHz, 74.25MHz and 371.25MHz at the same time. This looks to be possible by feeding 54MHz from the USB pll to the HDMI pll.
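
All four frequencies are simple rational multiples of 54MHz, which the sketch below works out. It only reduces the ratios; it does not check the real PLL's counter or VCO limits, and the names are illustrative.

```python
from fractions import Fraction

REFERENCE_MHZ = Fraction(54)  # the 54MHz feed from the USB pll mentioned above

# Target clocks in MHz, written as exact fractions (74.25 = 297/4, 371.25 = 1485/4).
TARGETS_MHZ = {
    "27": Fraction(27),
    "135": Fraction(135),
    "74.25": Fraction(297, 4),
    "371.25": Fraction(1485, 4),
}


def ratio_from_reference(target_mhz, reference_mhz=REFERENCE_MHZ):
    """Exact multiply/divide ratio needed to derive target from reference."""
    return target_mhz / reference_mhz


for name, freq in TARGETS_MHZ.items():
    r = ratio_from_reference(freq)
    print(f"{name} MHz = 54 MHz * {r.numerator}/{r.denominator}")
```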

It looks like 74.25 will also work for 1080i, so may as well enable that too.
Speaking of other modes, with these two clocks can also support:
1440x576i@50 (also 288 option, but not quite 50Hz)
1440x480i@59.94 (also 240 option, but not quite 59.94Hz)

These are probably the best fit of all, for 4x colour clock modes :-)
For 720p, 480p, 576p only sampling every other pixel at the moment, since can't downscale...
For 1080i and 1440x it's either upscaling or exact.
Not that there is any software for 2x or 4x anyway!

Actions #10

Updated by foft almost 5 years ago

I put in the modelines for 1080i. I need to add interlace support (different vtotal, mid line vsync etc) to the hcnt/vcnt logic. Probably will rewrite it since it's not in my style at the moment (_next, _reg, integer based etc).

Actions #11

Updated by foft almost 5 years ago

Plumbed in a 4x4 block of pixels so I can try out polyphasic filters too.

For now it's just using a 2x2 block with the area based scale, but there is some vertical bug. Stripey!

Using 16KB of block ram now, which is a bit wasteful. 360*4 pixels/line, 8 lines = 11.25KB. However, addressing is trickier. So using 2048 bytes/line.
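
The trade-off can be sketched as follows, assuming one byte per pixel (implied by the 11.25KB figure). The function names are illustrative, not the project's.

```python
PIXELS_PER_LINE = 360 * 4  # 1440 samples per line
LINES = 8
STRIDE = 2048              # power-of-two bytes reserved per line

packed_bytes = PIXELS_PER_LINE * LINES  # tightly packed buffer
padded_bytes = STRIDE * LINES           # buffer with a 2048-byte stride


def packed_addr(line, x):
    """Tight packing: needs a multiply by 1440 (or an equivalent adder chain)."""
    return line * PIXELS_PER_LINE + x


def padded_addr(line, x):
    """Power-of-two stride: the address is just the line bits joined to the x bits."""
    return (line << 11) | x


print(packed_bytes / 1024, "KB packed vs", padded_bytes / 1024, "KB padded")
print(packed_addr(3, 100), padded_addr(3, 100))
```

With the 2048-byte stride the line number and x position sit in separate bit fields of the address, which is why the extra block ram buys simpler addressing.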

Actions #12

Updated by foft almost 5 years ago

Forgot to mention 1080i working fine as well as 720p.

I was looking at the 3d modes too, but not sure what to use those for ;-)

Actions #13

Updated by foft almost 5 years ago

I have a polyphasic version working too, I think. So far I only gave it nearest neighbour coefficients. Rebuilding with Lanczos.
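
For reference, one way to generate such coefficients. This is a sketch that assumes a 4-tap Lanczos-2 kernel, 16 phases and per-phase normalisation. None of those choices are confirmed by this ticket, and the function names are illustrative.

```python
import math


def lanczos(x, a=2):
    """Lanczos kernel: sinc(x) * sinc(x / a) inside |x| < a, zero outside."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def polyphase_taps(phases=16, taps=4, a=2):
    """One row of tap weights per sub-pixel phase, each row normalised to sum to 1."""
    table = []
    for p in range(phases):
        frac = p / phases  # output position between source pixels 1 and 2 of the window
        raw = [lanczos(k - (taps // 2 - 1) - frac, a) for k in range(taps)]
        total = sum(raw)
        table.append([w / total for w in raw])
    return table


for row in polyphase_taps(phases=4):
    print([round(w, 4) for w in row])
```

Phase 0 reduces to picking the second tap alone, which matches the nearest neighbour behaviour at exact pixel positions.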

Actions #14

Updated by foft almost 5 years ago

A week's debugging later, this filter really works!

Actions #15

Updated by foft almost 5 years ago

Adding the i2c wiring to allow these to be controlled from the firmware. So far areascale wired up and working. Now doing crtc, then will do polyphasic. I guess it's possible to include both on A4 FPGAs and only one of them on A2 FPGAs.

On that subject I really should try to reduce the resource usage in general, I think some DSP blocks are not shared that should be. That can wait though until it's all plumbed in.

Actions #16

Updated by foft almost 5 years ago

Cleaned up the i2c wiring to be more generic. crtc wiring working too. Can switch between 720p50 and 1080i50 in firmware now (manual code, not menu yet...). Just wired up polyphasic too, except filter params which are currently hardcoded Lanczos. Will come back to that.

Next up... clock switching and better plumbing into firmware, so I can select 480p/576p, 720p and 1080i from a nice menu.

Actions #17

Updated by foft almost 5 years ago

Clock switching is proving fun, due to a bunch of constraints!

1st board: 50MHz on H16 (CLK11p) - which is connected to FPLL X0_Y38,X54_Y38, but not the other two
2nd board: 50MHz on H16 (CLK11p) - which is connected to FPLL X0_Y38,X54_Y38, but not the other two AND N16 (clk6p) /H13 (clk10p) to clkgen chip. clk10 is connected to the same plls as clk11, so does not add much (except another input freq). clk6p is just to a 3rd pll X54_y1. None of them go directly to x0_y1.
mini board: 50MHz on H16 (CLK11p) - which is connected to FPLL X0_Y38,X54_Y38, but not the other two

Anyway it's possible to route a pll output to the other 2 plls via the global clock network. Though running into some clock network issues, I'm using lots of clkctrl blocks already.

Actions #18

Updated by foft almost 5 years ago

It seems pretty clear that on the 2nd board can use the reconfigurable clock to provide 27MHz and 74.25MHz hdmi clocks, by reconfiguring it. These can drive clk6p into the 3rd pll, which does x5 for the tmds clk (/2).

For the others... I've not come up with the magic source yet but I'm hopeful!

Actions #19

Updated by foft almost 5 years ago

The dev of another hdmi library posted it on hacker news. Worth a look.
https://github.com/hdl-util/hdmi

Actions #20

Updated by foft almost 5 years ago

I have all these modes working properly from the firmware.
NTSC:480i/480p/720p/1080i
PAL:576i/576p/720p/1080i
All in 4:3
480p/576p and 720p all skip a pixel, so 4x gr.0 or gr.8 isn't great.
480i and 1080i do not! Yes, 480i is better than 480p, because 480i is 1440x480i while 480p is only 720x480p.

Next up I want to add scaler selection (since we have polyphasic and area) and start to store these in the settings or flash somewhere. All the crtc and scaler settings take up block RAM that is best not wasted.

Actions #21

Updated by foft almost 5 years ago

Oh and the clkgen chip can drive the video too, for those who want to try custom modes. I've not tried it yet since it's statically set up at 30MHz iirc, but should give it a spin.

Actions #22

Updated by foft almost 5 years ago

I have the crtc/scaler settings now in the flash chip (except scaler filter).
Upside: doesn't waste space in firmware
Downside: people will need to flash with usb blaster, rpd method will brick the video.

Actions #23

Updated by foft over 4 years ago

Got this merged down to svn and also v1 and v3 building.

Was hanging on purely internal i2c. Changed master and slave (+glue) to use in/wen instead of inout and working better. Slightly corrupt output on v1. I think this might be due to extra mux on the highest frequency part, will rework that... This is only present on v1 due to pin sharing between hdmi and vga so won't impact the mini.

Actions #24

Updated by foft over 4 years ago

On v1 it's all working, with 32K block ram as system ram. Once I put it to 64K block ram as system ram (which JUST fits) I never get to basic. After some playing with the logic analyzer (tricky, no block ram!) I can see that at C4F1 the CPU reads D0 (BNE), twice. With 64KB ram the stack pointer increments (wrong), with 32KB it doesn't (correct!).

Actions #25

Updated by foft over 4 years ago

I guess it's related to the opcinfo storage somehow. With 32k I get this:
atari800core:atari800|cpu:cpu6502|cpu_65xx:cpu_6502_peter|altsyncram:Mux54_rtl_0|altsyncram_ag91:auto_generated|ALTSYNCRAM AUTO ROM Single Clock 256 1 -- -- yes no -- -- 256 256 1 -- -- 256 1 0 atari800core_eclaireXLv1.atari800core_eclaireXL0.rtl.mif M10K_X38_Y11_N0 Don't care New data New data Off No No - Unsupported Depth 1

Actions #26

Updated by foft over 4 years ago

With 64k:
atari800core:atari800|cpu:cpu6502|cpu_65xx:cpu_6502_peter|altsyncram:Mux54_rtl_0|altsyncram_ag91:auto_generated|ALTSYNCRAM AUTO ROM Single Clock 256 1 -- -- yes no -- -- 256 256 1 -- -- 256 1 0 atari800core_eclaireXLv1.atari800core_eclaireXL0.rtl.mif M10K_X38_Y13_N0 Don't care New data New data Off No No - Unsupported Depth 1

Actions #27

Updated by foft over 4 years ago

M10K is shared:
atari800core:atari800|cpu:cpu6502|cpu_65xx:cpu_6502_peter|altsyncram:Mux54_rtl_0|altsyncram_ag91:auto_generated|ALTSYNCRAM AUTO ROM Single Clock 256 1 -- -- yes no -- -- 256 256 1 -- -- 256 1 0 atari800core_eclaireXLv1.atari800core_eclaireXL0.rtl.mif M10K_X38_Y13_N0 Don't care New data New data Off No No - Unsupported Depth 1
zpu_rom:zpu_rom1|altsyncram:altsyncram_component|altsyncram_od24:auto_generated|ALTSYNCRAM AUTO ROM Single Clock 10240 32 -- -- yes no -- -- 327680 10240 32 -- -- 327680 39 0 zpu_rom.mif M10K_X22_Y14_N0, M10K_X22_Y31_N0, M10K_X22_Y33_N0, M10K_X22_Y18_N0, M10K_X22_Y12_N0, M10K_X30_Y29_N0, M10K_X30_Y26_N0, M10K_X22_Y29_N0, M10K_X22_Y20_N0, M10K_X11_Y12_N0, M10K_X30_Y25_N0, M10K_X22_Y30_N0, M10K_X11_Y19_N0, M10K_X11_Y25_N0, M10K_X3_Y18_N0, M10K_X11_Y20_N0, M10K_X22_Y26_N0, M10K_X3_Y24_N0, M10K_X11_Y14_N0, M10K_X22_Y16_N0, M10K_X3_Y19_N0, M10K_X3_Y22_N0, M10K_X11_Y21_N0, M10K_X11_Y18_N0, M10K_X11_Y15_N0, M10K_X22_Y17_N0, M10K_X22_Y15_N0, M10K_X22_Y19_N0, M10K_X11_Y13_N0, M10K_X22_Y13_N0, M10K_X30_Y31_N0, M10K_X11_Y16_N0, M10K_X22_Y32_N0, M10K_X22_Y28_N0, M10K_X30_Y30_N0, M10K_X3_Y23_N0, M10K_X3_Y20_N0, M10K_X3_Y15_N0, M10K_X38_Y13_N0 Don't care New data New data Off No No - Address Too Wide 2

Actions #28

Updated by foft over 4 years ago

With 32k, the block is not shared. I guess something is accessing memory where it shouldn't...

Actions #29

Updated by foft over 4 years ago

It's opcodeInfoTable that is getting put into altsyncram. Not sure why it's invalid yet. I'm trying the ramstyle attribute but didn't find the right voodoo yet...

Actions #30

Updated by foft over 4 years ago

Problem is with E8:
002020100112
bit 20 == high = opcStackUp.

This isn't what is in the constant table, so not clear why quartus is returning that.

Combined with 32KB, so no overlap I see:
002020000112

Where does that bit error come from?
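
The two words can be compared directly, assuming they are hexadecimal and that bit 0 is the least significant bit (which matches the "bit 20" reading above). The variable names are illustrative.

```python
shared_block = 0x002020100112  # value read with 64KB ram (M10K shared)
own_block = 0x002020000112     # value read with 32KB ram (no overlap)

diff = shared_block ^ own_block  # XOR leaves only the bits that differ
flipped_bits = [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]

print(hex(diff), flipped_bits)
```

Only a single bit differs between the two reads, and it is bit 20, the opcStackUp flag.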

Actions #31

Updated by foft over 4 years ago

Tried to upgrade from Quartus 18.0 to 19.1. Same issue.

Actions #32

Updated by foft over 4 years ago

Checked on the ram block input/output when built with 32KB and 64KB.
Same input, same mif, different output...
Check the output of ram_block_1a0 vs the input (myNextOpCode[1-7] and nextOpcInfo42)
With block shared:

With block by itself:

Actions #33

Updated by foft almost 4 years ago

There is a new version of Quartus now, I'll try it to see if Intel fixed this bug yet.

Actions #34

Updated by foft almost 4 years ago

Quartus 20.1.1 still has the bug. Raised with Intel.
