Preview

The preview is sanitized with DOMPurify: script tags, event handlers, and javascript: URIs are removed before rendering.

Markdown is becoming the shared language of technical documentation in Nigeria and across Africa. Every GitHub README, Stack Overflow answer, and Discord message passes through a Markdown parser. Technical writers, open-source maintainers, and platform engineers all round-trip between Markdown and HTML in their daily work. This tool supports full GitHub Flavored Markdown in both directions and keeps all of your data in your browser.

What is HTML to Markdown conversion?

HTML is the markup that browsers render: <h1>, <ul>, <table>, <a>, and so on. Markdown is a lightweight plain-text format that uses a handful of punctuation marks (# for headings, * for emphasis, - for list items) to encode the same structure. GitHub Flavored Markdown (GFM) extends the base CommonMark specification with tables, strikethrough, task lists, and autolinks. Converting HTML to Markdown pulls rendered or exported HTML back into editable plain-text Markdown, which is exactly what is needed when migrating content out of a CMS or cleaning up a README.

Which GitHub Flavored Markdown features does the output support?

The bundled turndown@7.2.0 engine, together with the turndown-plugin-gfm extension, emits the full GFM syntax from your HTML: ATX-style headings # through ###### from <h1> to <h6>, ordered and unordered lists with nesting, bold **text** and italic *text*, strikethrough ~~text~~ from <del>, inline links [text](url) and inline images ![alt](src), fenced code blocks with a language hint (```js), inline code `code`, pipe tables from <table>, GFM task lists - [ ] / - [x] from checkbox inputs, blockquotes >, and horizontal rules ---. The preview pane re-renders the emitted Markdown through marked@12.0.2 so you can verify the conversion visually.

How does HTML to Markdown conversion work?

Every conversion runs locally in your browser using three bundled libraries: no CDN, no fetch, no telemetry. The high-level steps are:

  1. Sanitize: pasted HTML first passes through DOMPurify.sanitize(html, { USE_PROFILES: { html: true } }) to strip <script> elements, every on* event handler, and javascript: URIs before anything walks the markup. DOMPurify is the same XSS sanitizer used by Mozilla MDN, Atlassian, and Microsoft 365.
  2. Convert: turndownService.turndown walks the clean DOM and emits GitHub Flavored Markdown: headings become # prefixes, lists become - / 1. items, <table> becomes a pipe table, and so on. The Markdown is written to the read-only output <textarea> via value (not innerHTML), so it is safe by construction.
  3. Render: the emitted Markdown is re-parsed by marked.parse, sanitized again by DOMPurify, and assigned to the preview pane's innerHTML, giving visual confirmation that the Markdown round-trips to the expected structure. Live mode debounces input by 150 ms so the output updates as you paste without overloading the parser.
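
The three steps above can be sketched in TypeScript as follows. This is a minimal illustration, not the tool's actual source: ConverterDeps, convertHtml, and debounce are hypothetical names, and the three library calls are passed in as plain functions so the data flow is visible without assuming how the page wires them up.

```typescript
// Hypothetical dependency shape. On the real page these roles would be played by
// DOMPurify.sanitize, turndownService.turndown, and marked.parse (an assumption
// about wiring, based only on the steps described above).
interface ConverterDeps {
  sanitize: (html: string) => string;
  toMarkdown: (cleanHtml: string) => string;
  renderMarkdown: (markdown: string) => string;
}

interface ConversionResult {
  markdown: string;    // intended for the read-only <textarea> via .value
  previewHtml: string; // already sanitized, intended for the preview's innerHTML
}

function convertHtml(rawHtml: string, deps: ConverterDeps): ConversionResult {
  // Step 1: sanitize the pasted HTML before anything walks it.
  const cleanHtml = deps.sanitize(rawHtml);
  // Step 2: convert the clean HTML to Markdown.
  const markdown = deps.toMarkdown(cleanHtml);
  // Step 3: re-render the Markdown and sanitize the result a second time.
  const previewHtml = deps.sanitize(deps.renderMarkdown(markdown));
  return { markdown, previewHtml };
}

// Generic debounce: only the last call within waitMs actually runs.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
```

In live mode, an input handler could be wrapped as debounce(handler, 150) so that a burst of paste or keystroke events triggers a single conversion.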

Why convert HTML to Markdown with this tool?

  • Privacy: every sanitize, convert, and render pass runs in your browser. Your HTML, including exported CMS posts, internal documentation, and confidential page source, never touches our servers.
  • XSS-safe by default: pasted HTML passes through DOMPurify before turndown walks it, and the preview pane runs the re-rendered HTML through DOMPurify again before assigning it to innerHTML, so pasting markup that contains <script> tags or onerror= handlers yields an inert preview and clean Markdown.
  • GFM-complete: <table> elements convert to Markdown pipe tables, <del> to strikethrough, and checkbox lists to GFM task lists. Many online converters drop tables in the HTML → Markdown direction; the bundled turndown-plugin-gfm extension preserves them.

What are common use cases for HTML to Markdown conversion?

HTML to Markdown conversion shows up in content migration, documentation, and archiving:

  • CMS migration: export WordPress or Ghost posts as HTML and convert them to Markdown to rebuild the site on a static generator such as Hugo, Jekyll, 11ty, or Astro. The conversion preserves heading levels, links, lists, and inline emphasis.
  • README cleanup: paste the rendered HTML of a wiki page or web article and pull it back into editable Markdown for a project README or documentation page, instead of retyping the formatting by hand.
  • Archiving and note-taking: capture an HTML email or web clipping and convert it to Markdown for storage in Obsidian, Notion, or a plain-text knowledge base. Markdown stays diff-friendly and survives format changes.

What does an example HTML to Markdown conversion look like?

Pasting <h2>Heading</h2><ul><li>a</li><li>b<ul><li>nested</li></ul></li></ul> produces Markdown with ## Heading and a nested bullet list, and the preview pane re-renders it to the same nested structure. Pasting a <table> with a header row and two data rows produces an aligned | col | col | pipe table, confirming that the conversion preserves headings, lists, and tables.
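
To make the shape of that output concrete, here is a small TypeScript sketch that renders a heading and a nested bullet list from an in-memory tree. ListItem, renderBulletList, and renderSection are hypothetical names invented for this example; this is not turndown's code, and turndown's exact spacing after the list marker may differ from what is shown here.

```typescript
// Hypothetical in-memory model of a <ul>: each item has text and optional children.
interface ListItem {
  text: string;
  children?: ListItem[];
}

// Render items as "- text" lines, indenting nested levels by two spaces,
// which is enough for a "- " marker to nest under CommonMark rules.
function renderBulletList(items: ListItem[], depth = 0): string[] {
  const indent = "  ".repeat(depth);
  const lines: string[] = [];
  for (const item of items) {
    lines.push(`${indent}- ${item.text}`);
    if (item.children) {
      lines.push(...renderBulletList(item.children, depth + 1));
    }
  }
  return lines;
}

// Render an ATX heading (level clamped to 1..6), a blank line, then the list.
function renderSection(heading: string, level: number, items: ListItem[]): string {
  const hashes = "#".repeat(Math.min(Math.max(level, 1), 6));
  return [`${hashes} ${heading}`, "", ...renderBulletList(items)].join("\n");
}

// Mirrors <h2>Heading</h2><ul><li>a</li><li>b<ul><li>nested</li></ul></li></ul>.
console.log(
  renderSection("Heading", 2, [
    { text: "a" },
    { text: "b", children: [{ text: "nested" }] },
  ])
);
```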

Does this HTML to Markdown converter run entirely in your browser?

Yes. Every sanitize, convert, and render pass runs locally as JavaScript in your browser tab. The three bundled libraries, turndown@7.2.0 (with turndown-plugin-gfm@1.0.2), marked@12.0.2, and DOMPurify@3.1.7, are served from the same origin as the page, so there is no CDN dependency, no fetch, no XMLHttpRequest, and no navigator.sendBeacon on input. The tool also works offline once the page has loaded. Exported posts, internal documentation, and confidential page source stay on your device.

Is the rendered preview pane safe from XSS?

Yes. Every HTML string assigned to innerHTML passes through DOMPurify.sanitize(html, { USE_PROFILES: { html: true } }) first. DOMPurify is the open-source XSS sanitizer maintained by Cure53; it is the same library that Mozilla MDN, Atlassian, and Microsoft 365 use to harden user-generated HTML. The html profile strips <script> elements, every on* event-handler attribute (onerror, onclick, and so on), and javascript: URI schemes in href / src. Pasting <img src=x onerror=alert(1)> yields a preview in which document.querySelector('#output-preview img[onerror]') returns null and no alert fires.

Are GFM tables converted from HTML?

Yes. The HTML → Markdown direction uses turndown-plugin-gfm, which adds a custom turndown rule that walks <table> nodes and emits well-formed pipe-table Markdown: a header row, a |---|---| delimiter row, then the data rows. Many online converters drop tables in this direction; this one preserves them. Strikethrough (<del>text</del> → ~~text~~) and task lists (<input type="checkbox"> → - [ ] / - [x]) convert the same way.
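
The pipe-table layout described above (header row, delimiter row, data rows) can be illustrated with a short TypeScript sketch. It is a hand-written stand-in, not the plugin's rule: toPipeTable and escapeCell are hypothetical names, and the delimiter is written as | --- | --- |, which GFM treats the same as |---|---|.

```typescript
// Escape characters that would break a GFM table cell:
// a literal pipe must be backslash-escaped, and cells cannot span lines.
function escapeCell(text: string): string {
  return text.replace(/\|/g, "\\|").replace(/\r?\n/g, " ").trim();
}

// Build a GFM pipe table: header row, delimiter row, then one line per data row.
// Rows shorter than the header are padded with empty cells; extra cells are ignored.
function toPipeTable(header: string[], rows: string[][]): string {
  const line = (cells: string[]): string =>
    `| ${cells.map(escapeCell).join(" | ")} |`;
  const delimiter = `| ${header.map(() => "---").join(" | ")} |`;
  const body = rows.map((row) => line(header.map((_, i) => row[i] ?? "")));
  return [line(header), delimiter, ...body].join("\n");
}

// A header row plus two data rows, as in the example on this page.
console.log(
  toPipeTable(
    ["Name", "Role"],
    [
      ["Ada", "Engineer"],
      ["Lin", "Writer"],
    ]
  )
);
```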

Will my HTML convert cleanly?

For the full GFM feature set (h1 through h6 headings, ordered and unordered lists with nesting, bold / italic / strikethrough, inline links, inline images, fenced code blocks with language tags, inline code, pipe tables, task lists, blockquotes, horizontal rules, and autolinks) the conversion is clean and predictable. Edge cases: HTML comments (<!-- ... -->) are stripped (Markdown has no comment syntax); inline styling and class attributes are dropped because Markdown has no equivalent; and special inline tags such as <sub> / <sup> are reduced to plain text. These are documented turndown behaviors, not bugs.

Is syntax highlighting supported in fenced code blocks?

Not in v1. Fenced code blocks render in a monospaced font on a plain background, but without per-language token highlighting. Adding syntax highlighting would mean bundling Prism or highlight.js, each of which adds 15–40 KB plus a grammar file per language and a theme matrix that would have to match the Workshop Terminal palette. For now, the renderer focuses on correctness and XSS safety; if users ask for inline highlighting, an opt-in toggle is a possible follow-up.

This HTML to Markdown converter ships turndown@7.2.0 (+ turndown-plugin-gfm@1.0.2), marked@12.0.2, and DOMPurify@3.1.7 bundled on the same origin, emits the full GFM feature set, and sanitizes every rendered HTML string before it touches the DOM. No uploads, no CDN, no telemetry: every byte stays in your browser.