hugo/parser/pageparser/pagelexer.go

697 lines
16 KiB
Go
Raw Normal View History

// Copyright 2018 The Hugo Authors. All rights reserved.
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
//
2015-11-24 03:16:36 +00:00
// Licensed under the Apache License, Version 2.0 (the "License");
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
2015-11-24 03:16:36 +00:00
// http://www.apache.org/licenses/LICENSE-2.0
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Package pageparser provides a parser for Hugo content files (Markdown, HTML etc.) in Hugo.
// This implementation is highly inspired by the great talk given by Rob Pike called "Lexical Scanning in Go"
// It's on YouTube, Google it!.
// See slides here: http://cuddle.googlecode.com/hg/talk/lex.html
package pageparser
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
import (
"fmt"
"strings"
"unicode"
"unicode/utf8"
)
// position (in bytes)
type pos int
const eof = -1
// returns the next state in scanner.
type stateFunc func(*pageLexer) stateFunc
type lexerShortcodeState struct {
currLeftDelimItem itemType
currRightDelimItem itemType
currShortcodeName string // is only set when a shortcode is in opened state
closingState int // > 0 = on its way to be closed
elementStepNum int // step number in element
paramElements int // number of elements (name + value = 2) found first
openShortcodes map[string]bool // set of shortcodes in open state
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
}
type pageLexer struct {
input string
stateStart stateFunc
state stateFunc
pos pos // input position
start pos // item start position
width pos // width of last element
lastPos pos // position of the last item returned by nextItem
contentSections int
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
lexerShortcodeState
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
// items delivered to client
items []Item
}
func Parse(s string) *Tokens {
return ParseFrom(s, 0)
}
func ParseFrom(s string, from int) *Tokens {
lexer := newPageLexer(s, pos(from), lexMainSection) // TODO(bep) 2errors
lexer.run()
return &Tokens{lexer: lexer}
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
}
// note: the input position here is normally 0 (start), but
// can be set if position of first shortcode is known
func newPageLexer(input string, inputPosition pos, stateStart stateFunc) *pageLexer {
lexer := &pageLexer{
input: input,
pos: inputPosition,
stateStart: stateStart,
lexerShortcodeState: lexerShortcodeState{
currLeftDelimItem: tLeftDelimScNoMarkup,
currRightDelimItem: tRightDelimScNoMarkup,
openShortcodes: make(map[string]bool),
},
items: make([]Item, 0, 5),
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
}
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
return lexer
}
// main loop
func (l *pageLexer) run() *pageLexer {
for l.state = l.stateStart; l.state != nil; {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
l.state = l.state(l)
}
return l
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
}
// Shortcode syntax
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
const (
leftDelimScNoMarkup = "{{<"
rightDelimScNoMarkup = ">}}"
leftDelimScWithMarkup = "{{%"
rightDelimScWithMarkup = "%}}"
leftComment = "/*" // comments in this context us used to to mark shortcodes as "not really a shortcode"
rightComment = "*/"
)
// Page syntax
const (
summaryDivider = "<!--more-->"
summaryDividerOrg = "# more"
)
func (l *pageLexer) next() rune {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
if int(l.pos) >= len(l.input) {
l.width = 0
return eof
}
// looks expensive, but should produce the same iteration sequence as the string range loop
// see: http://blog.golang.org/strings
runeValue, runeWidth := utf8.DecodeRuneInString(l.input[l.pos:])
l.width = pos(runeWidth)
l.pos += l.width
return runeValue
}
// peek, but no consume
func (l *pageLexer) peek() rune {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
r := l.next()
l.backup()
return r
}
// steps back one
func (l *pageLexer) backup() {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
l.pos -= l.width
}
// sends an item back to the client.
func (l *pageLexer) emit(t itemType) {
l.items = append(l.items, Item{t, l.start, l.input[l.start:l.pos]})
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
l.start = l.pos
}
// special case, do not send '\\' back to client
func (l *pageLexer) ignoreEscapesAndEmit(t itemType) {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
val := strings.Map(func(r rune) rune {
if r == '\\' {
return -1
}
return r
}, l.input[l.start:l.pos])
l.items = append(l.items, Item{t, l.start, val})
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
l.start = l.pos
}
// gets the current value (for debugging and error handling)
func (l *pageLexer) current() string {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
return l.input[l.start:l.pos]
}
// ignore current element
func (l *pageLexer) ignore() {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
l.start = l.pos
}
// nice to have in error logs
func (l *pageLexer) lineNum() int {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
return strings.Count(l.input[:l.lastPos], "\n") + 1
}
// nil terminates the parser
func (l *pageLexer) errorf(format string, args ...interface{}) stateFunc {
l.items = append(l.items, Item{tError, l.start, fmt.Sprintf(format, args...)})
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
return nil
}
// consumes and returns the next item
func (l *pageLexer) nextItem() Item {
item := l.items[0]
l.items = l.items[1:]
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
l.lastPos = item.pos
return item
}
func (l *pageLexer) consumeCRLF() bool {
var consumed bool
for _, r := range crLf {
if l.next() != r {
l.backup()
} else {
consumed = true
}
}
return consumed
}
func lexMainSection(l *pageLexer) stateFunc {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
for {
if l.isShortCodeStart() {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
if l.pos > l.start {
l.emit(tText)
}
if strings.HasPrefix(l.input[l.pos:], leftDelimScWithMarkup) {
l.currLeftDelimItem = tLeftDelimScWithMarkup
l.currRightDelimItem = tRightDelimScWithMarkup
} else {
l.currLeftDelimItem = tLeftDelimScNoMarkup
l.currRightDelimItem = tRightDelimScNoMarkup
}
return lexShortcodeLeftDelim
}
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
if l.contentSections <= 1 {
if strings.HasPrefix(l.input[l.pos:], summaryDivider) {
if l.pos > l.start {
l.emit(tText)
}
l.contentSections++
l.pos += pos(len(summaryDivider))
l.emit(tSummaryDivider)
} else if strings.HasPrefix(l.input[l.pos:], summaryDividerOrg) {
if l.pos > l.start {
l.emit(tText)
}
l.contentSections++
l.pos += pos(len(summaryDividerOrg))
l.emit(tSummaryDividerOrg)
}
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
}
r := l.next()
if r == eof {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
break
}
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
}
return lexDone
}
func (l *pageLexer) isShortCodeStart() bool {
return strings.HasPrefix(l.input[l.pos:], leftDelimScWithMarkup) || strings.HasPrefix(l.input[l.pos:], leftDelimScNoMarkup)
}
func lexIntroSection(l *pageLexer) stateFunc {
LOOP:
for {
r := l.next()
if r == eof {
break
}
switch {
case r == '+':
return l.lexFrontMatterSection(tFrontMatterTOML, r, "TOML", "+++")
case r == '-':
return l.lexFrontMatterSection(tFrontMatterYAML, r, "YAML", "---")
case r == '{':
return lexFrontMatterJSON
case r == '#':
return lexFrontMatterOrgMode
case !isSpace(r) && !isEndOfLine(r):
if r == '<' {
l.emit(tHTMLLead)
// Not need to look further. Hugo treats this as plain HTML,
// no front matter, no shortcodes, no nothing.
l.pos = pos(len(l.input))
l.emit(tText)
break LOOP
}
return l.errorf("failed to detect front matter type; got unknown identifier %q", r)
}
}
l.contentSections = 1
// Now move on to the shortcodes.
return lexMainSection
}
func lexDone(l *pageLexer) stateFunc {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
// Done!
if l.pos > l.start {
l.emit(tText)
}
l.emit(tEOF)
return nil
}
func lexFrontMatterJSON(l *pageLexer) stateFunc {
// Include the left delimiter
l.backup()
var (
inQuote bool
level int
)
for {
r := l.next()
switch {
case r == eof:
return l.errorf("unexpected EOF parsing JSON front matter")
case r == '{':
if !inQuote {
level++
}
case r == '}':
if !inQuote {
level--
}
case r == '"':
inQuote = !inQuote
case r == '\\':
// This may be an escaped quote. Make sure it's not marked as a
// real one.
l.next()
}
if level == 0 {
break
}
}
l.consumeCRLF()
l.emit(tFrontMatterJSON)
return lexMainSection
}
func lexFrontMatterOrgMode(l *pageLexer) stateFunc {
/*
#+TITLE: Test File For chaseadamsio/goorgeous
#+AUTHOR: Chase Adams
#+DESCRIPTION: Just another golang parser for org content!
*/
const prefix = "#+"
l.backup()
if !strings.HasPrefix(l.input[l.pos:], prefix) {
// TODO(bep) consider error
return lexMainSection
}
// Read lines until we no longer see a #+ prefix
LOOP:
for {
r := l.next()
switch {
case r == '\n':
if !strings.HasPrefix(l.input[l.pos:], prefix) {
break LOOP
}
case r == eof:
break LOOP
}
}
l.emit(tFrontMatterORG)
return lexMainSection
}
// Handle YAML or TOML front matter.
func (l *pageLexer) lexFrontMatterSection(tp itemType, delimr rune, name, delim string) stateFunc {
for i := 0; i < 2; i++ {
if r := l.next(); r != delimr {
return l.errorf("invalid %s delimiter", name)
}
}
if !l.consumeCRLF() {
return l.errorf("invalid %s delimiter", name)
}
// We don't care about the delimiters.
l.ignore()
for {
r := l.next()
if r == eof {
return l.errorf("EOF looking for end %s front matter delimiter", name)
}
if isEndOfLine(r) {
if strings.HasPrefix(l.input[l.pos:], delim) {
l.emit(tp)
l.pos += 3
l.consumeCRLF()
l.ignore()
break
}
}
}
return lexMainSection
}
func lexShortcodeLeftDelim(l *pageLexer) stateFunc {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
l.pos += pos(len(l.currentLeftShortcodeDelim()))
if strings.HasPrefix(l.input[l.pos:], leftComment) {
return lexShortcodeComment
}
l.emit(l.currentLeftShortcodeDelimItem())
l.elementStepNum = 0
l.paramElements = 0
return lexInsideShortcode
}
func lexShortcodeComment(l *pageLexer) stateFunc {
posRightComment := strings.Index(l.input[l.pos:], rightComment+l.currentRightShortcodeDelim())
if posRightComment <= 1 {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
return l.errorf("comment must be closed")
}
// we emit all as text, except the comment markers
l.emit(tText)
l.pos += pos(len(leftComment))
l.ignore()
l.pos += pos(posRightComment - len(leftComment))
l.emit(tText)
l.pos += pos(len(rightComment))
l.ignore()
l.pos += pos(len(l.currentRightShortcodeDelim()))
l.emit(tText)
return lexMainSection
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
}
func lexShortcodeRightDelim(l *pageLexer) stateFunc {
l.closingState = 0
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
l.pos += pos(len(l.currentRightShortcodeDelim()))
l.emit(l.currentRightShortcodeDelimItem())
return lexMainSection
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
}
// either:
// 1. param
// 2. "param" or "param\"
// 3. param="123" or param="123\"
// 4. param="Some \"escaped\" text"
func lexShortcodeParam(l *pageLexer, escapedQuoteStart bool) stateFunc {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
first := true
nextEq := false
var r rune
for {
r = l.next()
if first {
if r == '"' {
// a positional param with quotes
if l.paramElements == 2 {
return l.errorf("got quoted positional parameter. Cannot mix named and positional parameters")
}
l.paramElements = 1
l.backup()
return lexShortcodeQuotedParamVal(l, !escapedQuoteStart, tScParam)
}
first = false
} else if r == '=' {
// a named param
l.backup()
nextEq = true
break
}
if !isAlphaNumericOrHyphen(r) {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
l.backup()
break
}
}
if l.paramElements == 0 {
l.paramElements++
if nextEq {
l.paramElements++
}
} else {
if nextEq && l.paramElements == 1 {
return l.errorf("got named parameter '%s'. Cannot mix named and positional parameters", l.current())
} else if !nextEq && l.paramElements == 2 {
return l.errorf("got positional parameter '%s'. Cannot mix named and positional parameters", l.current())
}
}
l.emit(tScParam)
return lexInsideShortcode
}
func lexShortcodeQuotedParamVal(l *pageLexer, escapedQuotedValuesAllowed bool, typ itemType) stateFunc {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
openQuoteFound := false
escapedInnerQuoteFound := false
escapedQuoteState := 0
Loop:
for {
switch r := l.next(); {
case r == '\\':
if l.peek() == '"' {
if openQuoteFound && !escapedQuotedValuesAllowed {
l.backup()
break Loop
} else if openQuoteFound {
// the coming quoute is inside
escapedInnerQuoteFound = true
escapedQuoteState = 1
}
}
case r == eof, r == '\n':
return l.errorf("unterminated quoted string in shortcode parameter-argument: '%s'", l.current())
case r == '"':
if escapedQuoteState == 0 {
if openQuoteFound {
l.backup()
break Loop
} else {
openQuoteFound = true
l.ignore()
}
} else {
escapedQuoteState = 0
}
}
}
if escapedInnerQuoteFound {
l.ignoreEscapesAndEmit(typ)
} else {
l.emit(typ)
}
r := l.next()
if r == '\\' {
if l.peek() == '"' {
// ignore the escaped closing quote
l.ignore()
l.next()
l.ignore()
}
} else if r == '"' {
// ignore closing quote
l.ignore()
} else {
// handled by next state
l.backup()
}
return lexInsideShortcode
}
// scans an alphanumeric inside shortcode
func lexIdentifierInShortcode(l *pageLexer) stateFunc {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
lookForEnd := false
Loop:
for {
switch r := l.next(); {
case isAlphaNumericOrHyphen(r):
// Allow forward slash inside names to make it possible to create namespaces.
case r == '/':
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
default:
l.backup()
word := l.input[l.start:l.pos]
if l.closingState > 0 && !l.openShortcodes[word] {
return l.errorf("closing tag for shortcode '%s' does not match start tag", word)
} else if l.closingState > 0 {
l.openShortcodes[word] = false
lookForEnd = true
}
l.closingState = 0
l.currShortcodeName = word
l.openShortcodes[word] = true
l.elementStepNum++
l.emit(tScName)
break Loop
}
}
if lookForEnd {
return lexEndOfShortcode
}
return lexInsideShortcode
}
func lexEndOfShortcode(l *pageLexer) stateFunc {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
if strings.HasPrefix(l.input[l.pos:], l.currentRightShortcodeDelim()) {
return lexShortcodeRightDelim
}
switch r := l.next(); {
case isSpace(r):
l.ignore()
default:
return l.errorf("unclosed shortcode")
}
return lexEndOfShortcode
}
// scans the elements inside shortcode tags
func lexInsideShortcode(l *pageLexer) stateFunc {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
if strings.HasPrefix(l.input[l.pos:], l.currentRightShortcodeDelim()) {
return lexShortcodeRightDelim
}
switch r := l.next(); {
case r == eof:
// eol is allowed inside shortcodes; this may go to end of document before it fails
return l.errorf("unclosed shortcode action")
case isSpace(r), isEndOfLine(r):
l.ignore()
case r == '=':
l.ignore()
return lexShortcodeQuotedParamVal(l, l.peek() != '\\', tScParamVal)
case r == '/':
if l.currShortcodeName == "" {
return l.errorf("got closing shortcode, but none is open")
}
l.closingState++
l.emit(tScClose)
case r == '\\':
l.ignore()
if l.peek() == '"' {
return lexShortcodeParam(l, true)
}
case l.elementStepNum > 0 && (isAlphaNumericOrHyphen(r) || r == '"'): // positional params can have quotes
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
l.backup()
return lexShortcodeParam(l, false)
case isAlphaNumeric(r):
l.backup()
return lexIdentifierInShortcode
default:
return l.errorf("unrecognized character in shortcode action: %#U. Note: Parameters with non-alphanumeric args must be quoted", r)
}
return lexInsideShortcode
}
// state helpers
func (l *pageLexer) currentLeftShortcodeDelimItem() itemType {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
return l.currLeftDelimItem
}
func (l *pageLexer) currentRightShortcodeDelimItem() itemType {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
return l.currRightDelimItem
}
func (l *pageLexer) currentLeftShortcodeDelim() string {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
if l.currLeftDelimItem == tLeftDelimScWithMarkup {
return leftDelimScWithMarkup
}
return leftDelimScNoMarkup
}
func (l *pageLexer) currentRightShortcodeDelim() string {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
if l.currRightDelimItem == tRightDelimScWithMarkup {
return rightDelimScWithMarkup
}
return rightDelimScNoMarkup
}
// helper functions
func isSpace(r rune) bool {
return r == ' ' || r == '\t'
}
func isAlphaNumericOrHyphen(r rune) bool {
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
// let unquoted YouTube ids as positional params slip through (they contain hyphens)
return isAlphaNumeric(r) || r == '-'
}
var crLf = []rune{'\r', '\n'}
Shortcode rewrite, take 2 This commit contains a restructuring and partial rewrite of the shortcode handling. Prior to this commit rendering of the page content was mingled with handling of the shortcodes. This led to several oddities. The new flow is: 1. Shortcodes are extracted from page and replaced with placeholders. 2. Shortcodes are processed and rendered 3. Page is processed 4. The placeholders are replaced with the rendered shortcodes The handling of summaries is also made simpler by this. This commit also introduces some other chenges: 1. distinction between shortcodes that need further processing and those who do not: * `{{< >}}`: Typically raw HTML. Will not be processed. * `{{% %}}`: Will be processed by the page's markup engine (Markdown or (infuture) Asciidoctor) The above also involves a new shortcode-parser, with lexical scanning inspired by Rob Pike's talk called "Lexical Scanning in Go", which should be easier to understand, give better error messages and perform better. 2. If you want to exclude a shortcode from being processed (for documentation etc.), the inner part of the shorcode must be commented out, i.e. `{{%/* movie 47238zzb */%}}`. See the updated shortcode section in the documentation for further examples. The new parser supports nested shortcodes. This isn't new, but has two related design choices worth mentioning: * The shortcodes will be rendered individually, so If both `{{< >}}` and `{{% %}}` are used in the nested hierarchy, one will be passed through the page's markdown processor, the other not. * To avoid potential costly overhead of always looking far ahead for a possible closing tag, this implementation looks at the template itself, and is branded as a container with inner content if it contains a reference to `.Inner` Fixes #565 Fixes #480 Fixes #461 And probably some others.
2014-10-27 20:48:30 +00:00
func isEndOfLine(r rune) bool {
return r == '\r' || r == '\n'
}
func isAlphaNumeric(r rune) bool {
return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r)
}