Rich Media: Skip Microformats hashtags
author rinpatch <rinpatch@sdf.org>
Tue, 18 Jun 2019 21:31:30 +0000 (00:31 +0300)
committer rinpatch <rinpatch@sdf.org>
Tue, 18 Jun 2019 21:46:30 +0000 (00:46 +0300)
When fixing this problem I incorrectly assumed that a.hashtag is
the proper way to detect hashtags, but it is just a class that Pleroma and
Mastodon add. Per microformats, a hashtag should be detected by the presence of rel=tag.

This MR adds a check for rel=tag, but keeps the a.hashtag check as a fallback.
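
As a rough illustration of the selector change described above (a standalone sketch, not part of this commit): the script below runs the same Floki pipeline on a hand-written fragment. The example.org URLs, the `html` fragment and the `first_external_url` variable are made up for illustration, and the Floki version pinned in `Mix.install` is an assumption; any release that provides `Floki.parse_fragment!/1` should behave the same way.

```elixir
# Standalone sketch; needs Elixir >= 1.12 for Mix.install.
# The Floki version is an assumption, not what Pleroma pinned in 2019.
Mix.install([{:floki, "~> 0.36"}])

# Hypothetical post body: a mention, two hashtags and one ordinary link.
html = """
<a class="mention" href="https://example.org/users/alice">@alice</a>
<a href="https://example.org/tags/cofe" rel="tag">#cofe</a>
<a href="https://example.org/tags/tea" rel="tag nofollow">#tea</a>
<a class="hashtag" href="https://example.org/tags/milk">#milk</a>
<a href="https://example.org/article">an article</a>
"""

first_external_url =
  html
  |> Floki.parse_fragment!()
  # Drop mentions, class-based hashtags, and any link whose
  # whitespace-separated rel list contains "tag".
  |> Floki.filter_out("a.mention,a.hashtag,a[rel~=\"tag\"]")
  # Collect the href of every remaining link, in document order.
  |> Floki.attribute("a", "href")
  |> Enum.at(0)

IO.inspect(first_external_url)
```

With this fragment the first remaining link is expected to be https://example.org/article. The `~=` operator matches one word in a whitespace-separated attribute value, so `rel="tag nofollow"` is filtered out as well, which an exact `[rel="tag"]` match would miss.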

lib/pleroma/html.ex
test/html_test.exs

index 8c226c9446b1f369e03c47bc5fe86a322b264d3c..2fae7281c46c30134ec89edf4adc10267539cd9f 100644 (file)
@@ -89,7 +89,7 @@ defmodule Pleroma.HTML do
     Cachex.fetch!(:scrubber_cache, key, fn _key ->
       result =
         content
-        |> Floki.filter_out("a.mention,a.hashtag")
+        |> Floki.filter_out("a.mention,a.hashtag,a[rel~=\"tag\"]")
         |> Floki.attribute("a", "href")
         |> Enum.at(0)
 
index 64513980b0bb48fb57ec375646d968cef70a86b0..b8906c46a5ef6f0674f66493a8643d0a39680f30 100644 (file)
@@ -212,5 +212,21 @@ defmodule Pleroma.HTMLTest do
 
       assert url == "https://www.pixiv.net/member_illust.php?mode=medium&illust_id=72255140"
     end
+
+    test "skips microformats hashtags" do
+      user = insert(:user)
+
+      {:ok, activity} =
+        CommonAPI.post(user, %{
+          "status" =>
+            "<a href=\"https://pleroma.gov/tags/cofe\" rel=\"tag\">#cofe</a> https://www.pixiv.net/member_illust.php?mode=medium&illust_id=72255140",
+          "content_type" => "text/html"
+        })
+
+      object = Object.normalize(activity)
+      {:ok, url} = HTML.extract_first_external_url(object, object.data["content"])
+
+      assert url == "https://www.pixiv.net/member_illust.php?mode=medium&illust_id=72255140"
+    end
   end
 end