LEDE Project

  • Status Researching
  • Percent Complete 0%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity High
  • Priority Very Low
  • Reported Version All
  • Due in Version Undecided
  • Due Date Undecided
  • Votes 4
  • Private
Attached to Project: LEDE Project
Opened by pmgp - 25.07.2017
Last edited by Baptiste Jonglez - 26.07.2017

FS#929 - mt7620 abysmal wifi performance

HT40 wifi throughput went from ~80 Mbps (LEDE 17.01.0) to ~3 Mbps (from LEDE 17.01.2 through trunk; 17.01.1 wouldn’t build).
HT20 decreased from ~40 to ~10 Mbps.

The device is an Archer C20i (mt7620a) on the 2.4 GHz band; 5 GHz is unsupported (mt7610E).

Log errors:

[    9.197616] ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 6352, rev 0500 detected
[    9.205560] ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 7620 detected
[  304.206295] ieee80211 phy0: rt2x00queue_flush_queue: Warning - Queue 2 failed to flush
[  304.278242] ieee80211 phy0: rt2x00queue_flush_queue: Warning - Queue 2 failed to flush
[  306.059378] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2
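
For anyone triaging a longer log of this kind, here is a minimal Python sketch that counts the rt2x00 warnings and errors per driver function. The sample text is just the five lines quoted above; the regular expression is an assumption based on those lines only and may need adjusting for other message formats.

```python
import re
from collections import Counter

# Sample lines copied from the report above.
LOG = """\
[    9.197616] ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 6352, rev 0500 detected
[    9.205560] ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 7620 detected
[  304.206295] ieee80211 phy0: rt2x00queue_flush_queue: Warning - Queue 2 failed to flush
[  304.278242] ieee80211 phy0: rt2x00queue_flush_queue: Warning - Queue 2 failed to flush
[  306.059378] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2
"""

# Fields: timestamp, phy name, driver function, level, message.
# This pattern is inferred from the sample lines only (an assumption).
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+ieee80211\s+(?P<phy>\w+):\s+"
    r"(?P<func>\w+):\s+(?P<level>Info|Warning|Error)\s+-\s+(?P<msg>.*)$"
)


def summarize(log_text):
    """Count Warning/Error lines per (function, level); Info lines are skipped."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("level") != "Info":
            counts[(m.group("func"), m.group("level"))] += 1
    return counts


summary = summarize(LOG)
for (func, level), n in sorted(summary.items()):
    print(f"{level:7} {func}: {n}")
```

On the sample this reports two flush warnings and one write_tx_frame error, all on queue 2.
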
psyborg55 commented on 26.07.2017 11:46
pmgp commented on 26.07.2017 20:10

That's not it.
The patch
020-21-rt2800-fix-LNA-gain-assignment-for-MT7620.patch
is already in place and the problem persists (trunk).

psyborg55 commented on 26.07.2017 21:05

why do i even care about that

Baptiste Jonglez commented on 26.07.2017 22:07

pmgp: actually, this patch has been introduced between 17.01.1 and 17.01.2, so it's possibly the source of the issue rather than the fix.
The commit in question for lede-17.01: https://git.lede-project.org/?p=source.git;a=commit;h=820a39687db3b14c3264ee37548e2cac3f911bca

Could you test reverting these commits in the lede-17.01 branch (one at a time) and see which one introduces the issue?
64fa4ead3247f50f39d2f5c1a48d38df5bc3cba0
eb11207397fe39ab37407ceeafb94b340d05a9e9
820a39687db3b14c3264ee37548e2cac3f911bca

psyborg55: please be respectful towards others, you're not being helpful.

pmgp commented on 27.07.2017 18:33

Confirmed that on 17.01.0 and 17.01.1 there is no performance hit.

Reverting the patches on trunk did not restore performance (or maybe I did it wrong? The files were indeed gone from the patches folder).
The channel number has an effect on HT40 behavior (1=slow,3=stalled).

Meanwhile, these are the "rt2" patches in 17.01.2 that are not in 17.01.1:

020-02-rt2x00usb-do-not-anchor-rx-and-tx-urb-s.patch
020-03-rt2x00usb-fix-anchor-initialization.patch
020-05-rt2x00-call-entry-directly-in-rt2x00_dump_frame.patch
020-06-rt2x00-remove-queue_entry-from-skbdesc.patch
020-07-rt2500usb-don-t-mark-register-accesses-as-inline.patch
020-08-rt2x00-rt2800lib-move-rt2800_drv_data-declaration-in.patch
020-09-rt2800-identify-station-based-on-status-WCID.patch
020-10-rt2x00-separte-filling-tx-status-from-rt2x00lib_txdo.patch
020-11-rt2x00-separte-clearing-entry-from-rt2x00lib_txdone.patch
020-12-rt2x00-add-txdone-nomatch-function.patch
020-13-rt2x00-fixup-fill_tx_status-for-nomatch-case.patch
020-14-rt2x00-use-txdone_nomatch-on-rt2800usb.patch
020-15-rt2800-status-based-rate-flags-for-nomatch-case.patch
020-16-rt2800-use-TXOP_BACKOFF-for-probe-frames.patch
020-17-rt2x00-fix-rt2x00debug_dump_frame-comment.patch
020-18-rt2x00-fix-TX_PWR_CFG_4-register-definition.patch

021-01-rt2800-fix-LNA-gain-assignment-for-MT7620.patch
021-02-rt2800-do-VCO-calibration-after-programming-ALC.patch
021-03-rt2800-fix-mt7620-vco-calibration-registers.patch
021-04-rt2800-fix-mt7620-E2-channel-registers.patch

I'll try to ditch them all on 17.01.2 and see how it goes.
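
A comparison like the one above can be scripted. This is a minimal Python sketch, not the method actually used for the list: it compares two lists of patch file names and ignores the leading numeric prefix and letter case, so that a patch which was only renumbered is not reported as new. The helper names `normalize` and `only_in_new` are made up for illustration, and the sample lists are a small subset of names quoted in this thread, not the full contents of either release.

```python
import re


def normalize(name):
    """Strip leading numeric prefixes such as '020-19-' or '621-' and lowercase."""
    return re.sub(r"^(\d+-)+", "", name).lower()


def only_in_new(old_names, new_names):
    """Return names from new_names with no counterpart in old_names after normalizing."""
    old_keys = {normalize(n) for n in old_names}
    return sorted(n for n in new_names if normalize(n) not in old_keys)


# Illustrative subset only (assumption: not the complete patch directories).
old = ["621-rt2x00-add-support-for-mt7620.patch"]
new = [
    "020-18-rt2x00-fix-TX_PWR_CFG_4-register-definition.patch",
    "020-19-rt2x00-add-support-for-MT7620.patch",
    "021-01-rt2800-fix-LNA-gain-assignment-for-MT7620.patch",
]

added = only_in_new(old, new)
for name in added:
    print(name)
```

With these inputs the renumbered MT7620 support patch is treated as already present, and only the other two names are printed.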

UPDATE: Again, removing the last 4 didn't improve things; the cause must be somewhere else.

UPDATE2:

17.01.1's
621-rt2x00-add-support-for-mt7620.patch
has become 17.01.2's
020-19-rt2x00-add-support-for-MT7620.patch
which makes removing the rest of the 020s impossible unless I remove it too and brick something.

My blunt approach is at an end here; this must be debugged by someone who understands the driver.

psyborg55 commented on 29.07.2017 00:49

removed

pmgp commented on 29.07.2017 13:45

@psyborg55

tested just now

17.01.1: ~82/~78 Mbps dl/ul
your image: ~56/~52 Mbps dl/ul

no errors on log and changing channels doesn't mess up like trunk
still, performance is lower

changed only the wireless configuration on your image:

config wifi-device 'radio0'
        option type 'mac80211'
        option hwmode '11g'
        option path 'platform/10180000.wmac'
        option disabled '0'
        option channel '1'
        option htmode 'HT40'
        option txpower '20'
        option country 'US'
        option noscan '1'

config wifi-iface 'default_radio0'
        option device 'radio0'
        option network 'lan'
        option mode 'ap'
        option ssid '<redacted>'
        option hidden '1'
        option encryption 'psk2+ccmp'
        option key '<redacted>'

psyborg55 commented on 29.07.2017 16:12

use same settings when doing test, especially channel.

72/85 dl/ul in my case after reverting his mess-up

pmgp commented on 29.07.2017 20:46

I used the same configuration on both.

I got the same speed from the unpatched 17.01.2 that I got from your image (~55 Mbps).
It remains to be seen what difference allows 17.01.0/1 to reach 80 Mbps.

I read that the mt7620 without heatsink has a thermal throttling issue with high wifi load. I'm looking into that.

psyborg55 commented on 31.07.2017 08:27

removed

pmgp commented on 31.07.2017 16:51

Your second image reached ~55 Mbps and remained stable.
17.01.1 got to ~70 today but becomes unstable with multiple clients. Besides the speed, those branches are not a working solution.
Trunk now reaches 50 Mbps but sometimes stalls with multiple clients.

Before testing I added a heatsink to the router's SoC, it gets quite toasty at load.

psyborg55 commented on 31.07.2017 17:55

trunk is not expected to have stability. performance is usually lower. my second image is trunk from about 10 days ago with a minor change, nothing special.

on chaos calmer speed is over 100 Mbps

pmgp commented on 31.07.2017 21:22

Since you mentioned it, I tried the CC image and got ~75 Mbps, stable with multiple clients.

What a shame the best is in the past. It's not the first time that this LEDE/OpenWrt fork messes up radios.

pmgp commented on 08.08.2017 12:58

At this point, trunk won't go past ~50 Mbps @HT40, and the radio stalls if more than one device connects to the AP (it still shows up on scans and still allows clients to connect).
Changing the channel requires a device restart. Issuing "wifi" just makes it freeze as above.
Occasional "queue full" errors show up in the log, as in the original report.

psyborg55 commented on 09.08.2017 09:34

maybe you could read dangowrt's commit message before complaining about channel change, ha?

https://git.lede-project.org/?p=source.git;a=commit;h=9eacb9d7fc0b4c921f8d2ec91a51f10d8c3ae12f

"This makes the channel switching logic already look a bit more like
what we are used to in rt2x00.."

Namidairo commented on 11.08.2017 14:41

There was a bug introduced when LEDE pulled down the updated MT7620 patch from upstream mac80211, which might be causing this. Take a look at the patch linked from dangowrt's staging branch. (RT6352 was getting the wrong values set because of a duplicated else if.)

https://git.lede-project.org/?p=lede/dangole/staging.git;a=blob;f=package/kernel/mac80211/patches/021-rt2x00-apply-correct-TX_SW_CFG-values-for-MT7620.patch

psyborg55 commented on 11.08.2017 15:43

that patch as well as others regarding rt6352 are irrelevant until this trac issue is closed: https://dev.openwrt.org/ticket/22086

Amir Sabbaghi commented on 16.10.2017 11:59

I have found the commit that causes the drop in speed: https://github.com/lede-project/source/commit/4314646ac6343af3c9e813e6505b5ef072d2a714 Its parent commit gives more than 60 Mbps of throughput, but with this commit throughput drops to less than 20 Mbps.
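
The comment does not say how the commit was located. One common way to narrow down a regression like this is bisection (what `git bisect` automates): test the midpoint between a known-good and a known-bad commit and halve the range each time. A minimal Python sketch of the idea follows; the commit names, the throughput figures, and the 40 Mbps cut-off are all hypothetical, and it assumes every commit after the culprit is also slow.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    commits: ordered oldest to newest; commits[0] is known good and
             commits[-1] is known bad.
    is_bad:  callable that returns True if a build of that commit
             shows the regression.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history: ten commits, throughput drops at the seventh one.
commits = [f"c{i}" for i in range(10)]
throughput_mbps = {c: (65 if i < 6 else 18) for i, c in enumerate(commits)}

# Hypothetical cut-off between the "more than 60" and "less than 20" readings.
culprit = first_bad(commits, lambda c: throughput_mbps[c] < 40)
print(culprit)
```

In practice `is_bad` would mean building and flashing that revision and running the same throughput test each time, with the same channel and HT mode.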
