How do I retrieve the domain from a URL?

In Go, how can I extract only the domain name from a URL string?

Before:

https://www.example.com/some-random-url
www.example.com/some-random-url
example.com/some-random-url
www.example.com
subdomain.example.com

After:

example.com

Also, I am limited to the Go standard library.


Accepted answer

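I think that, because your examples also include malformed URLs, you will need a regular expression to extract the domain. Below is sample code that gets the domain for the examples you shared: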

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Match an optional leading dot, then capture the last label before ".com"
    // together with the ".com" suffix. Only .com domains are matched.
    m := regexp.MustCompile(`\.?([^.]*\.com)`)

    // FindStringSubmatch returns the full match at index 0 and the captured
    // domain at index 1. It returns nil when nothing matches, so [1] would panic.
    fmt.Println(m.FindStringSubmatch("https://www.example.com/some-random-url")[1])
    fmt.Println(m.FindStringSubmatch("www.example.com/some-random-url")[1])
    fmt.Println(m.FindStringSubmatch("example.com/some-random-url")[1])
    fmt.Println(m.FindStringSubmatch("www.example.com")[1])
    fmt.Println(m.FindStringSubmatch("subdomain.example.com")[1])
}

Ideally, this covers all cases (including malformed URLs). If any URL is not parsed correctly, you can easily update the regex.

Go Playground link for the code above: here
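The pattern in this answer only recognizes names ending in .com, and indexing the result of FindStringSubmatch panics when there is no match. A minimal sketch of a safer wrapper follows; the helper name comDomain and the variable comDomainRe are hypothetical and not part of the answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// comDomainRe mirrors the pattern from the answer above, with the dots escaped.
var comDomainRe = regexp.MustCompile(`\.?([^.]*\.com)`)

// comDomain is a hypothetical helper: it reports ok=false instead of
// panicking when the pattern does not match.
func comDomain(s string) (domain string, ok bool) {
	m := comDomainRe.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	inputs := []string{
		"https://www.example.com/some-random-url",
		"http://example.org/bear",
	}
	for _, s := range inputs {
		if d, ok := comDomain(s); ok {
			fmt.Println(d)
		} else {
			fmt.Printf("no .com domain in %q\n", s)
		}
	}
}
```

Returning a boolean keeps the caller in control of what to do with inputs such as .org or .in names, which this pattern does not cover.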


Answers:


I finally figured it out.

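package main

import (
    "fmt"
    "log"
    "net/url"
    "strings"
)

func main() {
    // Named u rather than url so the net/url package is not shadowed.
    u, err := url.Parse("https://www.example.com")
    if err != nil {
        log.Fatal(err)
    }
    parts := strings.Split(u.Hostname(), ".")
    // Guard against an empty host or a hostname with fewer than two labels.
    if len(parts) < 2 {
        log.Fatalf("no domain found in host %q", u.Hostname())
    }
    domain := parts[len(parts)-2] + "." + parts[len(parts)-1]
    fmt.Println(domain)
}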

example.com

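Note that url.Parse leaves the hostname empty when the input has no scheme, as with the bare string subdomain.example.com, so indexing the split parts without a length check panics on such input.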

https://play.golang.org/p/Li0PviAr2jU
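To cover the scheme-less inputs from the question with the same net/url approach, one option is to retry the parse in scheme-relative form ("//" + input) when no host is found. The sketch below assumes that the last two labels are the desired domain, as in the answer above; the helper name lastTwoLabels is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// lastTwoLabels is a hypothetical helper that returns the last two
// dot-separated labels of the host found in raw.
func lastTwoLabels(raw string) (string, error) {
	raw = strings.TrimSpace(raw)
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		// No usable host: retry in scheme-relative form so that
		// url.Parse treats the leading part as the host.
		u, err = url.Parse("//" + raw)
		if err != nil {
			return "", err
		}
	}
	labels := strings.Split(u.Hostname(), ".")
	if len(labels) < 2 {
		return "", fmt.Errorf("no domain found in %q", raw)
	}
	return strings.Join(labels[len(labels)-2:], "."), nil
}

func main() {
	inputs := []string{
		"https://www.example.com/some-random-url",
		"www.example.com/some-random-url",
		"example.com/some-random-url",
		"www.example.com",
		"subdomain.example.com",
	}
	for _, in := range inputs {
		domain, err := lastTwoLabels(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(domain)
	}
}
```

Returning an error instead of indexing blindly avoids the panic on inputs that contain no dotted hostname.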


This solution

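import (
    "net/url"
    "regexp"
    "strings"
)

// The patterns are compiled once rather than on every call.
var (
    schemeRe = regexp.MustCompile(`^https?://`)
    wwwRe    = regexp.MustCompile(`^www\.`)
    domainRe = regexp.MustCompile(`([a-z0-9-]+\.)+[a-z0-9-]+`)
)

// extractDomain returns the first host-like token in urlLikeString,
// or an empty string if none is found.
func extractDomain(urlLikeString string) string {
    urlLikeString = strings.TrimSpace(urlLikeString)

    // For http(s) URLs, let net/url isolate the host.
    if schemeRe.MatchString(urlLikeString) {
        if read, err := url.Parse(urlLikeString); err == nil {
            urlLikeString = read.Host
        }
    }

    // Drop a leading "www." label.
    urlLikeString = wwwRe.ReplaceAllString(urlLikeString, "")

    return domainRe.FindString(urlLikeString)
}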

turns this

"   ",
"aaa",
"not domain",
"ca.mail.google.com",
"google.com",
" google.com ",
" www.google.com/a/example.com",
"www.google.com/f/example.com",
"google.com/f/example.com",
"http://google.com/f/abc.com",
"http://google.com/f/?wow=xyz.com",
"http://google.com/f/?wow=www.xyz.com",
"http://www.google.com/f/abc.com",
"https://www.google.com/f/abc.com",
"https://mail.google.com/f/abc.com",
"https://123.google.com/f/abc.com",
"https://xn-ddf3.google.com/f/abc.com",

into this

[empty string]
[empty string]
[empty string]
ca.mail.google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
mail.google.com
123.google.com
xn-ddf3.google.com

The net/url function url.Parse does not handle domain-like strings, for example: bla bla google.com
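The behavior behind that remark can be checked directly: without a scheme, url.Parse puts the whole string into the path and leaves the host empty. The helper hostAndPath below is hypothetical and only exposes the two fields for inspection.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostAndPath is a hypothetical helper that shows how url.Parse
// splits its input into host and path.
func hostAndPath(raw string) (host, path string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	return u.Host, u.Path, nil
}

func main() {
	inputs := []string{
		"https://google.com/f/abc.com",
		"google.com/f/abc.com",
		"bla bla google.com",
	}
	for _, raw := range inputs {
		host, path, err := hostAndPath(raw)
		if err != nil {
			fmt.Printf("%q: parse error: %v\n", raw, err)
			continue
		}
		fmt.Printf("%q: host=%q path=%q\n", raw, host, path)
	}
}
```

This is why the solution above only calls url.Parse for inputs that start with an http or https scheme and falls back to a regular expression otherwise.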


I think this will help

package main

import (
    "fmt"
    "log"
    "net/url"
    "strings"
)

func main() {
    strArray := []string{
        "www.google.co.in",
        "https://google.in",
        "instagram.com",
        "nymag.com",
        "http://www.example.com/?airport=approval&box=brother",
        "https://www.example.com/babies.php#birds",
        "http://example.org/bear",
        "www.google.co.in",
        "google.com",
        "https://www.bbb.org/search/business-review-form/",
        "https://www.localvisibilitysystem.com/2015/08/19/how-to-use-meetup-sponsorships-for-local-marketing-and-seo-dave-oremlands-tips/",
        "http://www.example.com/boat/advertisement?actor=bat#boundary",
        "https://www.example.com/",
        "https://www.google.com",
        "https://www.example.com/army/approval.htm?basket=bottle",
        "http://example.com/board.aspx?afternoon=appliance&angle=ball",
        "http://www.example.com/",
        "http://example.com/",
        "http://www.example.com/",
        "livejournal.com",
        "delicious.com",
        "illinois.edu",
        "instagram.com",
        "nymag.com",
        "altervista.org",
        "t.co",
        "reddit.com",
        "tinyurl.com",
    }
    for _, s := range strArray {
        // Named u rather than url so the net/url package is not shadowed.
        u, err := url.Parse(s)
        if err != nil {
            log.Fatal(err)
        }
        hostname := u.String()

        // Here the scheme prefix and a leading "www." are filtered out.
        if strings.HasPrefix(hostname, "https") {
            hostname = strings.TrimPrefix(hostname, "https://")
        } else if strings.HasPrefix(hostname, "http") {
            hostname = strings.TrimPrefix(hostname, "http://")
        }
        hostname = strings.TrimPrefix(hostname, "www.")

        // Keep only the part before the first "/".
        if i := strings.Index(hostname, "/"); i >= 0 {
            hostname = hostname[:i]
        }
        fmt.Println(hostname)
    }
}

Output:

 google.co.in
 google.in
 instagram.com
 nymag.com
 example.com
 example.com
 example.org
 google.co.in
 google.com
 bbb.org
 localvisibilitysystem.com
 example.com
 example.com
 google.com
 example.com
 example.com
 example.com
 example.com
 example.com
 livejournal.com
 delicious.com
 illinois.edu
 instagram.com
 nymag.com
 altervista.org
 t.co
 reddit.com
 tinyurl.com

This gives you the required domain for URLs like the ones above. Go Playground link: https://go.dev/play/p/vfCOAnTNqh8
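Inputs such as www.google.co.in show why taking the last two labels is not always enough: that would yield co.in. The standard library has no public suffix list, so a sketch that stays within it has to carry its own. The set multiLabelSuffixes below is an illustrative, incomplete sample and the helper registrableDomain is hypothetical; outside the standard-library constraint, the golang.org/x/net/publicsuffix package is one existing alternative.

```go
package main

import (
	"fmt"
	"strings"
)

// multiLabelSuffixes is an illustrative, hand-maintained sample of
// two-label suffixes. It is an assumption for this sketch, not a complete list.
var multiLabelSuffixes = map[string]bool{
	"co.in":  true,
	"co.uk":  true,
	"com.au": true,
}

// registrableDomain is a hypothetical helper. host must already be a bare
// hostname (no scheme, port, or path). It keeps one label in front of the
// suffix, where the suffix is two labels if listed above and one otherwise.
func registrableDomain(host string) string {
	labels := strings.Split(strings.ToLower(host), ".")
	keep := 2
	if len(labels) >= 2 && multiLabelSuffixes[strings.Join(labels[len(labels)-2:], ".")] {
		keep = 3
	}
	if len(labels) < keep {
		return host // too few labels to trim anything
	}
	return strings.Join(labels[len(labels)-keep:], ".")
}

func main() {
	for _, h := range []string{"www.google.co.in", "mail.google.com", "t.co"} {
		fmt.Println(registrableDomain(h))
	}
}
```

A map lookup keeps the check cheap, but the result is only as good as the suffix sample it is given.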